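Introduction

A prefix tree, or trie (pronounced /ˈtraɪ/), is also known as a "word lookup tree" or "dictionary tree".

It is a multi-way tree. Its typical use is storing and counting large numbers of strings, and search engines often use it for word-frequency statistics over text. Its advantage is that strings with a common prefix share the same path: a lookup costs time proportional to the length of the query string rather than to the number of stored strings, and shared prefixes are neither compared nor stored more than once.

The name trie comes from the middle of the word retrieval. Wikipedia gives the origin of the term: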

Tries were first described by René de la Briandais in 1959. The term trie was coined two years later by Edward Fredkin, who pronounces it /ˈtriː/ (as "tree"), after the middle syllable of retrieval. However, other authors pronounce it /ˈtraɪ/ (as "try"), in an attempt to distinguish it verbally from "tree".

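1. The logical structure of a prefix tree

A prefix tree is a multi-way tree made up of "paths" (edges) and "nodes". Starting from the root node, one path is created for each character of a stored string, in order.

The paths carry the characters of the string. Each node records how many stored strings pass through it and how many stored strings end at it. For example, a simple prefix tree that stores the four strings "abc", "abd", "bcf" and "abcd" is shown in the figure below: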
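As a text rendering of the same structure, here is a minimal sketch that builds the tree for those four strings and lists every node with its two counters. The class name TrieDump and the dump method are hypothetical names introduced only for this illustration, and the sketch assumes lowercase English letters.

```java
import java.util.ArrayList;
import java.util.List;

class TrieDump {
    static final class Node {
        int pass;                           // stored strings that pass through this node
        int end;                            // stored strings that end at this node
        final Node[] nexts = new Node[26];  // one slot per lowercase letter
    }

    final Node root = new Node();

    void insert(String word) {
        Node node = root;
        node.pass++;
        for (char c : word.toCharArray()) {
            int path = c - 'a';
            if (node.nexts[path] == null) {
                node.nexts[path] = new Node();
            }
            node = node.nexts[path];
            node.pass++;
        }
        node.end++;
    }

    // Returns one line per node, depth-first in alphabetical order.
    List<String> dump() {
        List<String> lines = new ArrayList<>();
        lines.add("(root) pass=" + root.pass + " end=" + root.end);
        dump(root, "", lines);
        return lines;
    }

    private void dump(Node node, String prefix, List<String> lines) {
        for (int i = 0; i < 26; i++) {
            Node child = node.nexts[i];
            if (child != null) {
                String path = prefix + (char) ('a' + i);
                lines.add(path + " pass=" + child.pass + " end=" + child.end);
                dump(child, path, lines);
            }
        }
    }

    public static void main(String[] args) {
        TrieDump trie = new TrieDump();
        for (String w : new String[] {"abc", "abd", "bcf", "abcd"}) {
            trie.insert(w);
        }
        // Prints nine lines, starting with "(root) pass=4 end=0"
        // and including "abc pass=2 end=1" and "abcd pass=1 end=1".
        trie.dump().forEach(System.out::println);
    }
}
```

The node reached by "abc" has pass=2 because both "abc" and "abcd" run through it, and end=1 because only "abc" stops there.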

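2. Adding strings to a prefix tree

In the classic prefix tree, the characters are recorded on the "paths" (edges) and the nodes hold the statistics: pass is the number of inserted strings that pass through the node, and end is the number of inserted strings that end at the node.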

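2.1 The logical process

Take adding "abc", "bcd" and "abcd" as an example.

1. Start with an initialized root node, root.

2. Add the character 'a'. The root's pass is incremented. Because the string is not finished yet ('b' and 'c' still follow), end stays unchanged. A new node is created at the other end of the path (it takes two nodes to form a path); its pass is incremented and its end also stays unchanged.

3. Add the character 'b'. Again a new node is created to form the path for b; its pass is incremented and its end stays unchanged.

4. Add the character 'c'. A new node is created to form the path for c and its pass is incremented. The string ends here, so this node's end is incremented as well.

At this point "abc" has been added. "bcd" and "abcd" are added the same way. Note that every insertion starts from root. Because root is the first node on every path, its pass equals the total number of strings stored in the tree. The defining feature of a prefix tree is that it reuses characters: if the new string shares a prefix with a path that already leaves root, that path is reused and the pass counts along it are incremented; new paths are created only from the point where the string diverges from what is already stored.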
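The reuse described above can be observed by counting how many nodes each insertion has to create. The following is a small sketch under the same lowercase-only assumption; the class name InsertTrace and the choice to return the number of created nodes from insert are illustrative additions, not part of the original code.

```java
class InsertTrace {
    static final class Node {
        int pass;
        int end;
        final Node[] nexts = new Node[26];
    }

    final Node root = new Node();

    // Inserts word and returns how many new nodes had to be created for it.
    int insert(String word) {
        int created = 0;
        Node node = root;
        node.pass++;
        for (char c : word.toCharArray()) {
            int path = c - 'a';
            if (node.nexts[path] == null) {
                node.nexts[path] = new Node();  // no reusable path: create one
                created++;
            }
            node = node.nexts[path];            // reuse (or enter) the path
            node.pass++;
        }
        node.end++;
        return created;
    }

    public static void main(String[] args) {
        InsertTrace trie = new InsertTrace();
        for (String w : new String[] {"abc", "bcd", "abcd"}) {
            int created = trie.insert(w);
            System.out.println(w + ": new nodes = " + created
                    + ", root.pass = " + trie.root.pass);
        }
        // abc: new nodes = 3, root.pass = 1
        // bcd: new nodes = 3, root.pass = 2
        // abcd: new nodes = 1, root.pass = 3
    }
}
```

"abcd" creates only one node because its first three characters follow the path already laid down by "abc".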

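2.2 Implementation

A path is an abstraction with no direct counterpart in code. In the simplest case, a prefix tree over lowercase English letters, each node holds an array of 26 slots, one per letter, and the path to a child exists exactly when the corresponding slot holds a Node: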

private static class Node {
    public int pass;
    public int end;
    public Node[] nexts;

    public Node() {
        pass = 0;
        end = 0;
        // 26 possible paths, one per lowercase letter
        nexts = new Node[26];
    }
}

Insertion then works as follows:

public class Code01_TrieTree {
    private Node root;

    public Code01_TrieTree() {
        root = new Node();
    }

    private static class Node {
        // ...
    }

    public void insert(String word) {
        if (word == null)
            return;
        char[] chars = word.toCharArray();
        // Start from root
        Node node = root;
        node.pass++;
        // Slot index of the current path
        int path = 0;
        // Walk the character array
        for (int i = 0; i < chars.length; i++) {
            // The 26 slots map to the 26 letters ('a' -> 0),
            // so subtracting 'a' gives the slot offset
            path = chars[i] - 'a';
            // A null slot means this path does not exist yet
            if (node.nexts[path] == null)
                node.nexts[path] = new Node();
            // Move the pointer to the node at the end of the path
            node = node.nexts[path];
            // Count this string as passing through that node
            node.pass++;
        }
        // After the loop, node is the end node of the last path:
        // record one more string ending here
        node.end++;
    }
}
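The insert method indexes nexts with chars[i] - 'a', which only works when every character lies in 'a'..'z'; any other character (an uppercase letter, a digit, a Chinese character) produces an index outside 0..25 and an ArrayIndexOutOfBoundsException. One possible guard is sketched below. The names SlotIndex, slotOf and isInsertable are hypothetical and do not appear in the original code; rejecting the word up front is only one of several reasonable policies.

```java
class SlotIndex {
    // Returns the slot 0..25 for 'a'..'z', or -1 for any other character.
    static int slotOf(char c) {
        return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
    }

    // True only if every character of word maps to a valid slot.
    // The empty string is accepted; null is not.
    static boolean isInsertable(String word) {
        if (word == null) {
            return false;
        }
        for (int i = 0; i < word.length(); i++) {
            if (slotOf(word.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(slotOf('a'));           // 0
        System.out.println(slotOf('z'));           // 25
        System.out.println(slotOf('A'));           // -1
        System.out.println(isInsertable("abc"));   // true
        System.out.println(isInsertable("aBc"));   // false
    }
}
```

A caller could check isInsertable(word) before insert and either skip the word or normalize it first, for example with toLowerCase.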

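3. Lookups in a prefix tree

A prefix tree supports two common lookups: how many times a given string was added, and how many added strings start with a given prefix.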

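3.1 Number of occurrences of a string

The lookup starts from a reference to root.

It then walks the string character by character. If a path turns out not to exist along the way (the node at the end of that path is null), the tree has never stored the string.

If the walk finishes, the node pointer is at the end node of the last character's path. That node's end records how many stored strings end with this path, so it is returned directly: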

public int search(String word) {
    if (word == null)
        return 0;
    char[] chars = word.toCharArray();
    // Node pointer, starting from root
    Node node = root;
    // Slot index of the current path
    int path = 0;
    // Walk the string
    for (int i = 0; i < chars.length; i++) {
        // Compute the slot offset
        path = chars[i] - 'a';
        // Missing path: the string was never stored
        if (node.nexts[path] == null)
            return 0;
        // Advance the node pointer
        node = node.nexts[path];
    }
    // The pointer is now at the string's end node; return its end count
    return node.end;
}
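To make the return values concrete, here is a minimal self-contained sketch that inserts a duplicate and then queries a few strings. The class name CountTrie is hypothetical; the logic follows the insert and search steps described above, again assuming lowercase letters only.

```java
class CountTrie {
    static final class Node {
        int pass;
        int end;
        final Node[] nexts = new Node[26];
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        node.pass++;
        for (char c : word.toCharArray()) {
            int path = c - 'a';
            if (node.nexts[path] == null) {
                node.nexts[path] = new Node();
            }
            node = node.nexts[path];
            node.pass++;
        }
        node.end++;
    }

    // Number of times word was inserted as a whole string.
    int search(String word) {
        if (word == null) {
            return 0;
        }
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.nexts[c - 'a'];
            if (node == null) {
                return 0;   // path missing: never stored
            }
        }
        return node.end;
    }

    public static void main(String[] args) {
        CountTrie trie = new CountTrie();
        trie.insert("abc");
        trie.insert("abc");
        trie.insert("abd");
        System.out.println(trie.search("abc"));   // 2
        System.out.println(trie.search("abd"));   // 1
        System.out.println(trie.search("ab"));    // 0: the path exists, but nothing ends there
        System.out.println(trie.search("abcd"));  // 0: the path for 'd' is missing
    }
}
```

The "ab" case shows why end is needed in addition to the existence of a path: a path that exists only as a prefix of other strings must not count as a stored string.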

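3.2 Number of occurrences of a prefix

Counting by prefix is almost identical to looking up a string. The only difference is the return value: the pass of the prefix's last node, which is the number of stored strings that pass through that node.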

public int searchPre(String pre) {
    if (pre == null)
        return 0;
    Node node = root;
    char[] chars = pre.toCharArray();
    int path = 0;
    for (int i = 0; i < chars.length; i++) {
        path = chars[i] - 'a';
        if (node.nexts[path] == null) {
            return 0;
        }
        node = node.nexts[path];
    }
    return node.pass;
}
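Since the two lookups differ only in which counter they read, the shared walk can be factored out. The sketch below is one way to do that; the class name PrefixTrie and the helper walk are hypothetical names introduced for this illustration and are not part of the original code.

```java
class PrefixTrie {
    static final class Node {
        int pass;
        int end;
        final Node[] nexts = new Node[26];
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        node.pass++;
        for (char c : word.toCharArray()) {
            int path = c - 'a';
            if (node.nexts[path] == null) {
                node.nexts[path] = new Node();
            }
            node = node.nexts[path];
            node.pass++;
        }
        node.end++;
    }

    // Follows s from root; returns the last node, or null if a path is missing.
    private Node walk(String s) {
        if (s == null) {
            return null;
        }
        Node node = root;
        for (int i = 0; i < s.length() && node != null; i++) {
            node = node.nexts[s.charAt(i) - 'a'];
        }
        return node;
    }

    // Strings equal to word: read end.
    int search(String word) {
        Node node = walk(word);
        return node == null ? 0 : node.end;
    }

    // Strings starting with pre: read pass.
    int searchPre(String pre) {
        Node node = walk(pre);
        return node == null ? 0 : node.pass;
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String w : new String[] {"abc", "abd", "bcf", "abcd"}) {
            trie.insert(w);
        }
        System.out.println(trie.searchPre("ab"));   // 3
        System.out.println(trie.search("ab"));      // 0
        System.out.println(trie.searchPre("abc"));  // 2
        System.out.println(trie.searchPre("x"));    // 0
        System.out.println(trie.searchPre(""));     // 4, the root's pass
    }
}
```

With the array-based version, an empty prefix returns the root's pass, which is the total number of stored strings.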

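4. Deleting from a prefix tree

Deletion has the same overall shape as insertion and lookup: walk the characters of the argument string, decrement pass on each node along the way, and finally decrement end on the last node.

Two points need attention:

1. Before running the actual deletion logic, always call search to check that the string exists. Without this check, deleting a string that was never added would either hit a missing path or wrongly decrement the counters of other strings that share its prefix.

2. If a node's pass drops to 0 after the decrement, the reference to that node must be set to null. This lets the garbage collector reclaim the node and everything below it, and it matches the way insert and search test whether a path exists.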

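public void delete(String word) {
    if (search(word) == 0)
        return;
    char[] chars = word.toCharArray();
    Node node = root;
    // One fewer string passes through root
    node.pass--;
    int path = 0;
    // Walk the string
    for (int i = 0; i < chars.length; i++) {
        // Compute the slot offset
        path = chars[i] - 'a';
        // If the end node of this path reaches pass == 0 after the decrement,
        // drop the reference to it and return immediately
        if (--node.nexts[path].pass == 0) {
            node.nexts[path] = null;
            return;
        }
        // Advance the node pointer
        node = node.nexts[path];
    }
    // Every pass along the way has been decremented; finally decrement end
    node.end--;
}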
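The two points above can be checked with a short, self-contained sketch. The class name DeleteDemo is hypothetical; the methods mirror the insert, search and delete logic of this article in compact form, assuming lowercase letters only.

```java
class DeleteDemo {
    static final class Node {
        int pass;
        int end;
        final Node[] nexts = new Node[26];
    }

    final Node root = new Node();

    void insert(String word) {
        Node node = root;
        node.pass++;
        for (char c : word.toCharArray()) {
            int path = c - 'a';
            if (node.nexts[path] == null) {
                node.nexts[path] = new Node();
            }
            node = node.nexts[path];
            node.pass++;
        }
        node.end++;
    }

    int search(String word) {
        if (word == null) {
            return 0;
        }
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.nexts[c - 'a'];
            if (node == null) {
                return 0;
            }
        }
        return node.end;
    }

    void delete(String word) {
        if (search(word) == 0) {
            return;                       // point 1: never touch counters for an absent string
        }
        Node node = root;
        node.pass--;
        for (char c : word.toCharArray()) {
            int path = c - 'a';
            if (--node.nexts[path].pass == 0) {
                node.nexts[path] = null;  // point 2: unlink the now-unused subtree
                return;
            }
            node = node.nexts[path];
        }
        node.end--;
    }

    public static void main(String[] args) {
        DeleteDemo trie = new DeleteDemo();
        trie.insert("abc");
        trie.insert("abcd");

        trie.delete("ab");                              // "ab" is only a prefix: no-op
        System.out.println(trie.root.pass);             // 2

        trie.delete("abcd");
        System.out.println(trie.search("abcd"));        // 0
        System.out.println(trie.search("abc"));         // 1

        trie.delete("abc");
        System.out.println(trie.root.nexts[0] == null); // true: the whole 'a' branch is gone
        System.out.println(trie.root.pass);             // 0
    }
}
```

Without the search check, delete("ab") would lower pass on the nodes for 'a' and 'b' and push the end of the 'b' node to -1, even though "ab" was never stored.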

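5. A prefix tree with paths stored in a HashMap

An array suits only a narrow range of cases, because its size is fixed when it is created and cannot grow with the alphabet. In practice, a HashMap is often used for the paths instead, which also makes it possible to store Chinese characters.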

import java.util.HashMap;

public class Code02_TrieTree {
    private Node root;

    public Code02_TrieTree() {
        this.root = new Node();
    }

    private static class Node {
        public int pass;
        public int end;
        public HashMap<Integer, Node> nexts;

        public Node() {
            this.pass = 0;
            this.end = 0;
            this.nexts = new HashMap<>();
        }
    }

    public void insert(String word) {
        if (word == null || "".equals(word))
            return;
        char[] chars = word.toCharArray();
        Node node = root;
        node.pass++;
        Integer path = 0;
        for (int i = 0; i < chars.length; i++) {
            // Use the char value (a UTF-16 code unit, not only ASCII) as the path key
            path = (int) chars[i];
            if (!node.nexts.containsKey(path))
                node.nexts.put(path, new Node());
            node = node.nexts.get(path);
            node.pass++;
        }
        node.end++;
    }

    public int search(String word) {
        if (word == null || "".equals(word))
            return 0;
        char[] chars = word.toCharArray();
        Node node = root;
        Integer path = 0;
        for (int i = 0; i < chars.length; i++) {
            // Compute the path key
            path = (int) chars[i];
            if (node.nexts.get(path) == null)
                return 0;
            node = node.nexts.get(path);
        }
        return node.end;
    }

    public int searchPre(String pre) {
        if (pre == null || "".equals(pre))
            return 0;
        char[] chars = pre.toCharArray();
        Node node = root;
        Integer path = 0;
        for (int i = 0; i < chars.length; i++) {
            path = (int) chars[i];
            if (node.nexts.get(path) == null)
                return 0;
            node = node.nexts.get(path);
        }
        return node.pass;
    }

    public void delete(String word) {
        if (search(word) == 0)
            return;
        char[] chars = word.toCharArray();
        Node node = root;
        node.pass--;
        Integer path = 0;
        for (int i = 0; i < chars.length; i++) {
            path = (int) chars[i];
            if (--node.nexts.get(path).pass == 0) {
                node.nexts.remove(path);
                return;
            }
            node = node.nexts.get(path);
        }
        node.end--;
    }

    public int size() {
        return root.pass;
    }
}
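Keying the map by char works for Chinese characters in the Basic Multilingual Plane, but a character outside it (an emoji, for instance) occupies two char values and would therefore take up two paths. As a variation, and not what the code above does, the paths can be keyed by Unicode code point instead. The sketch below assumes that variation; the class name CodePointTrie and the helper walk are hypothetical, and the strings are written with Unicode escapes so the source stays ASCII-only.

```java
import java.util.HashMap;
import java.util.Map;

class CodePointTrie {
    static final class Node {
        int pass;
        int end;
        final Map<Integer, Node> nexts = new HashMap<>();
    }

    private final Node root = new Node();

    void insert(String word) {
        if (word == null || word.isEmpty()) {
            return;
        }
        Node node = root;
        node.pass++;
        // One path per code point, so a surrogate pair counts as one character
        for (int cp : word.codePoints().toArray()) {
            node = node.nexts.computeIfAbsent(cp, k -> new Node());
            node.pass++;
        }
        node.end++;
    }

    // Follows s from root; returns the last node, or null if a path is missing.
    private Node walk(String s) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        Node node = root;
        for (int cp : s.codePoints().toArray()) {
            node = node.nexts.get(cp);
            if (node == null) {
                return null;
            }
        }
        return node;
    }

    int search(String word) {
        Node node = walk(word);
        return node == null ? 0 : node.end;
    }

    int searchPre(String pre) {
        Node node = walk(pre);
        return node == null ? 0 : node.pass;
    }

    public static void main(String[] args) {
        CodePointTrie trie = new CodePointTrie();
        String qianZhuiShu = "\u524D\u7F00\u6811";  // "prefix tree" in Chinese, 3 characters
        String qianZhui = "\u524D\u7F00";           // "prefix", 2 characters
        String qianYan = "\u524D\u8A00";            // "foreword", 2 characters
        String emojiA = "\uD83D\uDE00a";            // U+1F600 followed by 'a'
        trie.insert(qianZhuiShu);
        trie.insert(qianZhui);
        trie.insert(qianYan);
        trie.insert(emojiA);

        System.out.println(trie.searchPre("\u524D"));        // 3
        System.out.println(trie.searchPre(qianZhui));        // 2
        System.out.println(trie.search(qianZhui));           // 1
        System.out.println(trie.searchPre("\uD83D\uDE00"));  // 1
        System.out.println(trie.searchPre("\uD83D"));        // 0: half a surrogate pair is not a prefix
    }
}
```

With char keys, the last query would return 1, because the first half of the surrogate pair would be stored as a path of its own.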