文章目录
- 1. 题目
- 2. 思路及代码实现详解(Python)
- 2.1 滑动窗口
1. 题目
给定一个字符串 s s s 和一个字符串数组 w o r d s words words。 w o r d s words words 中所有字符串 长度相同。
s s s 中的 串联子串 是指一个包含 w o r d s words words 中所有字符串以任意顺序排列连接起来的子串。
例如,如果 w o r d s = [ " a b " , " c d " , " e f " ] words = ["ab","cd","ef"] words=["ab","cd","ef"], 那么 " a b c d e f " , " a b e f c d " , " c d a b e f " , " c d e f a b " , " e f a b c d " , " e f c d a b " "abcdef", "abefcd","cdabef", "cdefab","efabcd", "efcdab" "abcdef","abefcd","cdabef","cdefab","efabcd","efcdab" 都是串联子串。 " a c d b e f " "acdbef" "acdbef" 不是串联子串,因为他不是任何 w o r d s words words 排列的连接。
返回所有串联子串在 s s s 中的开始索引。你可以以 任意顺序 返回答案。
示例 1:
输入: s = " b a r f o o t h e f o o b a r m a n " , w o r d s = [ " f o o " , " b a r " ] s = "barfoothefoobarman", words = ["foo","bar"] s="barfoothefoobarman",words=["foo","bar"]
输出: [ 0 , 9 ] [0,9] [0,9]
解释:因为 w o r d s . l e n g t h = = 2 words.length == 2 words.length==2 同时 w o r d s [ i ] . l e n g t h = = 3 words[i].length == 3 words[i].length==3,连接的子字符串的长度必须为 6 6 6。
子串 " b a r f o o " "barfoo" "barfoo" 开始位置是 0 0 0。它是 w o r d s words words 中以 [ " b a r " , " f o o " ] ["bar","foo"] ["bar","foo"] 顺序排列的连接。
子串 " f o o b a r " "foobar" "foobar" 开始位置是 9 9 9。它是 w o r d s words words 中以 [ " f o o " , " b a r " ] ["foo","bar"] ["foo","bar"] 顺序排列的连接。
输出顺序无关紧要。返回 [ 9 , 0 ] [9,0] [9,0] 也是可以的。
示例 2:
输入: s = " w o r d g o o d g o o d g o o d b e s t w o r d " , w o r d s = [ " w o r d " , " g o o d " , " b e s t " , " w o r d " ] s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"] s="wordgoodgoodgoodbestword",words=["word","good","best","word"]
输出: [ ] [] []
解释:因为 w o r d s . l e n g t h = = 4 words.length == 4 words.length==4 并且 w o r d s [ i ] . l e n g t h = = 4 words[i].length == 4 words[i].length==4,所以串联子串的长度必须为 16 16 16。
s s s 中没有子串长度为 16 16 16 并且等于 w o r d s words words 的任何顺序排列的连接。
所以我们返回一个空数组。
示例 3:
输入: s = " b a r f o o f o o b a r t h e f o o b a r m a n " , w o r d s = [ " b a r " , " f o o " , " t h e " ] s = "barfoofoobarthefoobarman", words = ["bar","foo","the"] s="barfoofoobarthefoobarman",words=["bar","foo","the"]
输出: [ 6 , 9 , 12 ] [6,9,12] [6,9,12]
解释:因为 w o r d s . l e n g t h = = 3 words.length == 3 words.length==3 并且 w o r d s [ i ] . l e n g t h = = 3 words[i].length == 3 words[i].length==3,所以串联子串的长度必须为 9 9 9。
子串 " f o o b a r t h e " "foobarthe" "foobarthe" 开始位置是 6 6 6。它是 w o r d s words words 中以 [ " f o o " , " b a r " , " t h e " ] ["foo","bar","the"] ["foo","bar","the"] 顺序排列的连接。
子串 " b a r t h e f o o " "barthefoo" "barthefoo" 开始位置是 9 9 9。它是 w o r d s words words 中以 [ " b a r " , " t h e " , " f o o " ] ["bar","the","foo"] ["bar","the","foo"] 顺序排列的连接。
子串 " t h e f o o b a r " "thefoobar" "thefoobar" 开始位置是 12 12 12。它是 w o r d s words words 中以 [ " t h e " , " f o o " , " b a r " ] ["the","foo","bar"] ["the","foo","bar"] 顺序排列的连接。
提示:
- 1 < = s . l e n g t h < = 1 0 4 1 <= s.length <= 10^4 1<=s.length<=104
- 1 < = w o r d s . l e n g t h < = 5000 1 <= words.length <= 5000 1<=words.length<=5000
- 1 < = w o r d s [ i ] . l e n g t h < = 30 1 <= words[i].length <= 30 1<=words[i].length<=30
- w o r d s [ i ] words[i] words[i] 和 s s s 由小写英文字母组成
2. 思路及代码实现详解(Python)
2.1 滑动窗口
记 words \textit{words} words 的长度为 m m m, words \textit{words} words 中每个单词的长度为 n n n, s s s 的长度为 ls \textit{ls} ls。由于已知单词的长度和数量,因此可以直到串联子串的长度,并通过建立一个固定长度的窗口,并在不断滑动窗口的过程中,判断窗口中的固定长度字符串的出现次数是否与 w o r d s words words 中的一致,如果一致,则获取窗口的开始索引位置。
具体做法是,首先需要将 s s s 划分为单词组,每个单词的大小均为 n n n (首尾除外)。这样的划分方法有 n n n 种( l e n ( s ) / n len(s)/n len(s)/n 的商的情况),即先删去前 i i i ( i = 0 ∼ n − 1 i = 0 \sim n-1 i=0∼n−1)个字母后,将剩下的字母进行划分,如果末尾有不到 n n n 个字母也删去。对这 n n n 种划分得到的单词数组分别使用滑动窗口对 words \textit{words} words 进行搜寻,每次向右滑动 n n n 个距离,直到 s s s 右端剩余字符数不足 n n n。
划分成单词组后,一个窗口包含 s s s 中从 i i i 起始的前 m m m 个单词,用一个哈希表 differ \textit{differ} differ 表示窗口中单词频次和 words \textit{words} words 中单词频次之差,当差值为 0 0 0 时认为窗口中单词数量与 w o r d s words words 当中单词数量一致。初始化 differ \textit{differ} differ 时,出现在窗口中的单词,每出现一次,相应的值增加 1 1 1,出现在 words \textit{words} words 中的单词,每出现一次,相应的值减少 1 1 1。然后将窗口右移,右侧会加入一个单词,左侧会移出一个单词,并对 differ \textit{differ} differ 做相应的更新。窗口移动时,若出现 differ \textit{differ} differ 中值不为 0 0 0 的键的数量为 0 0 0,则表示这个窗口中的单词频次和 words \textit{words} words 中单词频次相同,窗口的左端点是一个待求的起始位置。划分的方法有 n n n 种,做 n n n 次滑动窗口后,即可找到所有可能的起始位置。
class Solution:def findSubstring(self, s: str, words: List[str]) -> List[int]:res = []m, n, ls = len(words), len(words[0]), len(s)for i in range(n):if i + m * n > ls:breakdiffer = Counter()for j in range(m):word = s[i + j * n: i + (j + 1) * n]differ[word] += 1for word in words:differ[word] -= 1if differ[word] == 0:del differ[word]for start in range(i, ls - m * n + 1, n):if start != i:word = s[start + (m - 1) * n: start + m * n]differ[word] += 1if differ[word] == 0:del differ[word]word = s[start - n: start]differ[word] -= 1if differ[word] == 0:del differ[word]if len(differ) == 0:res.append(start)return res
该算法遍历了 n n n 种窗口的左端起始位置,每次窗口滑动的时间复杂度为 O ( l s ) O(ls) O(ls),总的渐进复杂度为 O ( l s × n ) O(ls\times n) O(ls×n),空间复杂度为 O ( m × n ) O(m\times n) O(m×n),主要是保存单词出现频次的哈希表开销。
执行用时:97 ms
消耗内存:11.21 MB
题解来源:力扣官方题解