Source | Search Technology (搜索技术)
Editor | Xiaobai (小白)
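Both Google and Baidu offer input suggestions (autocomplete), which help you type what you are looking for quickly and accurately.
For example, typing “五一” (May Day) brings up suggestions such as “五一劳动节” (May Day Labor Day).
So how would you build an input-suggestion feature like Google's?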
Analyzing the requirements of input suggestions
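When the user has typed a leading string A, we want to suggest all highly relevant words that start with A. This is prefix matching. A trie, also called a prefix tree, is an ordered search tree organized by prefix, which makes it a natural fit for input suggestions.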
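To make the requirement concrete, here is the brute-force approach a trie replaces: scan every word in the vocabulary and keep those that start with the prefix. The function name `naive_suggest` and the word list are illustrative only.

```python
# Naive prefix matching: test every word in the vocabulary.
# Each query costs O(N * len(prefix)) for N words, whatever the prefix is.
def naive_suggest(words, prefix):
    """Return every word that starts with `prefix`, in vocabulary order."""
    return [w for w in words if w.startswith(prefix)]


words = ["五一", "五一劳动节", "五花肉", "五行", "五行相生"]
print(naive_suggest(words, "五一"))  # ['五一', '五一劳动节']
print(naive_suggest(words, "五行"))  # ['五行', '五行相生']
```

A trie instead walks only `len(prefix)` nodes to reach the subtree that holds every match, so locating the matches no longer depends on the size of the vocabulary (collecting them still takes time proportional to how many there are).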
Below, we implement input suggestions with a trie in Python 3.
```python
# Python3 program to demonstrate auto-complete
# feature using Trie data structure.
# Note: This is a basic implementation of Trie
# and not the most optimized one.


class TrieNode():
    def __init__(self):
        # Initialising one node for trie
        self.children = {}
        self.last = False


class Trie():
    def __init__(self):
        # Initialising the trie structure.
        self.root = TrieNode()
        self.word_list = []

    def formTrie(self, keys):
        # Forms a trie structure with the given set of strings
        # if it does not exist already, else it merges the key
        # into it by extending the structure as required.
        for key in keys:
            self.insert(key)  # inserting one key to the trie.

    def insert(self, key):
        # Inserts a key into trie if it does not exist already.
        # And if the key is a prefix of the trie node, just
        # marks it as leaf node.
        node = self.root
        for a in key:
            if not node.children.get(a):
                node.children[a] = TrieNode()
            node = node.children[a]
        node.last = True

    def search(self, key):
        # Searches the given key in trie for a full match
        # and returns True on success else returns False.
        node = self.root
        for a in key:
            if not node.children.get(a):
                return False
            node = node.children[a]
        return node.last

    def suggestionsRec(self, node, word):
        # Method to recursively traverse the trie
        # and collect every whole word below `node`.
        if node.last:
            self.word_list.append(word)
        for a, n in node.children.items():
            self.suggestionsRec(n, word + a)

    def printAutoSuggestions(self, key):
        # Prints all the words in the trie whose common
        # prefix is the given key, thus listing out all
        # the suggestions for autocomplete.
        node = self.root
        for a in key:
            if not node.children.get(a):
                return 0  # no word starts with this prefix
            node = node.children[a]
        if node.last and not node.children:
            return -1  # the prefix is itself a word and nothing extends it
        self.word_list = []  # clear results left over from a previous call
        self.suggestionsRec(node, key)
        for s in self.word_list:
            print(s)
        return 1


# Driver Code
keys = ["五一", "五一劳动节", "五一放假安排", "五一劳动节图片",
        "五一劳动节图片 2020", "五一劳动节快乐", "五一晚会", "五一假期",
        "五一快乐", "五一节快乐", "五花肉", "五行", "五行相生"]  # keys to form the trie structure.
key = "五一"  # key for autocomplete suggestions.

# creating trie object
t = Trie()

# creating the trie structure with the
# given set of strings.
t.formTrie(keys)

# autocompleting the given key using
# our trie structure.
comp = t.printAutoSuggestions(key)

if comp == -1:
    print("No other strings found with this prefix\n")
elif comp == 0:
    print("No string found with this prefix\n")

# This code is contributed by amurdia
```
Running the program with the input 五一 prints every stored word that starts with 五一.
All the expected words come back, but their order differs from Google's. How do we fix that?
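The order comes from how the words are collected: a depth-first traversal that visits each node's children in the order they were first inserted (Python 3.7+ dicts keep insertion order). A word's position therefore depends on the input list and the shape of the tree, not on how popular it is. The compact dict-based sketch below reproduces that behavior; the `END` sentinel and the function names are illustrative, not from the article.

```python
END = "$end"  # multi-character sentinel, so it cannot clash with a single-character child key


def build(words):
    """Build a trie of nested dicts; END marks the node where a word ends."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def suggest(root, prefix):
    """Return the words under `prefix` in depth-first, insertion order."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    out = []

    def dfs(n, word):
        if END in n:
            out.append(word)
        for ch, child in n.items():
            if ch != END:
                dfs(child, word + ch)

    dfs(node, prefix)
    return out


words = ["五一", "五一劳动节", "五一放假安排", "五一劳动节图片", "五一晚会"]
print(suggest(build(words), "五一"))
# ['五一', '五一劳动节', '五一劳动节图片', '五一放假安排', '五一晚会']
```

五一劳动节图片 is printed before 五一放假安排 only because it lives in the 劳 subtree, which was created first; nothing in the trie records popularity yet.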
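The data source for input suggestions is usually the log of queries that users have typed, with a count recorded for each query so that suggestions can be ranked by popularity.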
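Counting queries is a one-liner with `collections.Counter`. This is a minimal sketch, assuming the log is simply a list of raw query strings, one per search; the `query_log` contents are made up for illustration, and real pipelines typically clean and normalize queries first (not shown).

```python
from collections import Counter

# Hypothetical raw query log: one entry per search a user submitted.
query_log = [
    "五一劳动节", "五一假期", "五一劳动节", "五一",
    "五一假期", "五一劳动节", "五花肉",
]

# Count how often each query was searched; the count serves as its popularity.
freq = Counter(query_log)

print(freq.most_common(2))  # [('五一劳动节', 3), ('五一假期', 2)]
```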
Now let's attach counts to the query vocabulary to simulate Google's ranking.
Sample queries from the log and their counts:
| Query | Count |
|---|---|
| 五一劳动节 | 10 |
| 五一劳动节图片 | 9 |
| 五一假期 | 8 |
| 五一劳动节快乐 | 7 |
| 五一放假安排 | 6 |
| 五一晚会 | 5 |
| 五一 | 4 |
| 五一快乐 | 3 |
| 五一劳动节图片 2020 | 2 |
| 五一节快乐 | 1 |
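The second program below stores each of these rows as a single trie key of the form "query,count". A small sketch of that encoding and of a matching decoding step; the `rows` list and the comma-containing sample "a,b" are illustrative, not taken from the log.

```python
# (query, count) rows, taken from the first few entries of the log sample above
rows = [("五一劳动节", 10), ("五一劳动节图片", 9), ("五一假期", 8)]

# Encode each row as "query,count", the key format the next program inserts into its trie.
keys = [f"{query},{count}" for query, count in rows]
print(keys)  # ['五一劳动节,10', '五一劳动节图片,9', '五一假期,8']

# Decode by splitting on the LAST comma, so a query that itself contains
# a comma (here the made-up "a,b", stored as "a,b,3") still decodes correctly.
decoded = {}
for k in keys + ["a,b,3"]:
    query, _, count = k.rpartition(",")
    decoded[query] = int(count)
print(decoded)  # {'五一劳动节': 10, '五一劳动节图片': 9, '五一假期': 8, 'a,b': 3}
```

A plain `split(",")` would cut "a,b,3" into three pieces and lose the query, which is why the sketch splits on the last comma only.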
Now adjust the suggestion code so that it takes query counts into account:
```python
# Python3 program to demonstrate auto-complete
# feature using Trie data structure, with suggestions
# ranked by how often each query was searched.
# Note: This is a basic implementation of Trie
# and not the most optimized one.
import operator


class TrieNode():
    def __init__(self):
        # Initialising one node for trie
        self.children = {}
        self.last = False


class Trie():
    def __init__(self):
        # Initialising the trie structure.
        self.root = TrieNode()
        # Maps each suggested query to its search count.
        self.word_list = {}

    def formTrie(self, keys):
        # Inserts every key; each key has the form "query,count".
        for key in keys:
            self.insert(key)  # inserting one key to the trie.

    def insert(self, key):
        # Inserts a key into the trie character by character
        # and marks the node where it ends.
        node = self.root
        for a in key:
            if not node.children.get(a):
                node.children[a] = TrieNode()
            node = node.children[a]
        node.last = True

    def search(self, key):
        # Returns True only if `key` was inserted as a complete key.
        node = self.root
        for a in key:
            if not node.children.get(a):
                return False
            node = node.children[a]
        return node.last

    def suggestionsRec(self, node, word):
        # Recursively collects every complete key below `node`
        # and splits "query,count" back into query and count.
        if node.last:
            query, sep, count = word.rpartition(',')
            if sep:
                self.word_list[query] = int(count)
            else:
                self.word_list[word] = 0  # key stored without a count
        for a, n in node.children.items():
            self.suggestionsRec(n, word + a)

    def printAutoSuggestions(self, key):
        # Prints every query with the given prefix, most searched first.
        node = self.root
        for a in key:
            if not node.children.get(a):
                return 0  # no key starts with this prefix
            node = node.children[a]
        if node.last and not node.children:
            return -1  # the prefix is itself a key and nothing extends it
        self.word_list = {}  # clear results left over from a previous call
        self.suggestionsRec(node, key)
        # sort by count, highest first
        ranked = sorted(self.word_list.items(),
                        key=operator.itemgetter(1), reverse=True)
        for s, _ in ranked:
            print(s)
        return 1


# Driver Code
keys = ["五一,4", "五一劳动节,10", "五一放假安排,6", "五一劳动节图片,9",
        "五一劳动节图片 2020,2", "五一劳动节快乐,7", "五一晚会,5", "五一假期,8",
        "五一快乐,3", "五一节快乐,1", "五花肉,0", "五行,0", "五行相生,0"]  # keys to form the trie structure.
key = "五一"  # key for autocomplete suggestions.

# creating trie object
t = Trie()

# creating the trie structure with the
# given set of strings.
t.formTrie(keys)

# autocompleting the given key using
# our trie structure.
comp = t.printAutoSuggestions(key)

if comp == -1:
    print("No other strings found with this prefix\n")
elif comp == 0:
    print("No string found with this prefix\n")

# This code is contributed by amurdia
```
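The program above encodes each count into the key string and sorts every match. A variant worth considering, sketched here under its own assumptions (the `Node`/`CountTrie` classes and the `k` parameter are illustrative, not the author's code), keeps the count on the node where a query ends and returns only the top `k` matches with `heapq.nlargest`, since a suggestion box shows just a handful of entries.

```python
import heapq


class Node:
    def __init__(self):
        self.children = {}  # character -> Node
        self.count = None   # None: no query ends here; int: how often that query was searched


class CountTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, query, count):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, Node())
        node.count = count  # re-inserting a query overwrites its count

    def suggest(self, prefix, k=10):
        """Return up to k queries starting with `prefix`, most searched first."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored query has this prefix
        found = []
        # Iterative DFS, so very long queries cannot hit Python's recursion limit.
        stack = [(node, prefix)]
        while stack:
            n, word = stack.pop()
            if n.count is not None:
                found.append((word, n.count))
            for ch, child in n.children.items():
                stack.append((child, word + ch))
        # nlargest keeps only the top k instead of fully sorting every match.
        return [q for q, _ in heapq.nlargest(k, found, key=lambda p: p[1])]


trie = CountTrie()
for query, count in [("五一", 4), ("五一劳动节", 10), ("五一劳动节图片", 9),
                     ("五一假期", 8), ("五一晚会", 5), ("五花肉", 0)]:
    trie.insert(query, count)

print(trie.suggest("五一", k=3))  # ['五一劳动节', '五一劳动节图片', '五一假期']
```

Because counts never become characters in the trie, a query containing a comma or digits is stored as-is, and a lookup such as `suggest("五一,")` cannot accidentally match the encoded counts.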
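The output order now matches Google's exactly.
Summary:
The above shows how to implement Google-style input suggestions with a trie. Besides a trie, are there other approaches? Are there other kinds of search indexes that handle input suggestions well?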