Massive Algorithms: LeetCode 30 - Substring with Concatenation of All Words

Substring with Concatenation of All Words | N00tc0d3r
You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and without any intervening characters.

For example, given:
S: "barfoothefoobarman"
L: ["foo", "bar"]

You should return the indices: [0,9]. (order does not matter).

X. Brute Force
http://www.cnblogs.com/grandyang/p/4521224.html
这道题让我们求串联所有单词的子串，就是说给定一个长字符串，再给定几个长度相同的单词，让我们找出串联给定所有单词的子串的起始位置，还是蛮有难度的一道题。这道题我们需要用到两个哈希表，第一个哈希表先把所有的单词存进去，然后从开头开始一个个遍历，停止条件为当剩余字符个数小于单词集里所有字符的长度。这时候我们需要定义第二个哈希表，然后每次找出给定单词长度的子串，看其是否在第一个哈希表里，如果没有，则break，如果有，则加入第二个哈希表，但相同的词只能出现一次，如果多了，也break。如果正好匹配完给定单词集里所有的单词，则把i存入结果中
https://www.hrwhisper.me/leetcode-substring-with-concatenation-of-all-words/

首先用对words中每个单词进行计数count_word(hash表)，

枚举起始位置和终点位置，每次用cur_cnt计数，若单词存在且cur_cnt[word] <= count_word[word]，总计数total_in+1, 如果rotal_in等于words的长度，放入答案ans

复杂度O(len(s) * len(words) * t)， t为计算每个单词hash值的时间。

 vector<int> findSubstring(string s, vector<string>& words) {

  vector<int> ans;

  if (s.empty() || words.empty()) return ans;

  int word_len = words[0].size();

  int n = words.size();

  unordered_map<string, int> count_word;

  for (const string &word : words)

   ++count_word[word];

  for (int i = 0; i + n * word_len <= s.size(); ++i) {

   int total_in = 0;

   unordered_map<string, int> cur_cnt;

   for (int j = i; j < i + n * word_len; j += word_len) {

    string word = s.substr(j, word_len);

    if (++cur_cnt[word] > count_word[word]) break;

    ++total_in;

}

   if (total_in == n)

    ans.push_back(i);

}

  return ans;

}

我们可以用滑动窗口思想优化上面的过程。

设word_len为单词的长度，枚举可能的起点为[0, word_len]，

然后枚举s中的单词，令start = i表示当前窗口的起点，令j = i表示当前枚举的字符串s的起点，每次枚举结束j+= word_len

在枚举s中单词的过程中保持一个合法的窗口使得cur_cnt[word] <= count_word[word]，若窗口不合法，则说明之前已经有单词存在，不断的令start对应的单词-=1，start+=word_len即可。

复杂度O(word_len * len(s) / word_len * 2)= O(len(s) * 2)，乘上2是因为一次遍历中，每个单词最多两次。（PS：这个复杂度没有计算哈希的代价，假设为O(1）)

X. Slide Window
这种方法不再是一个字符一个字符的遍历，而是一个词一个词的遍历，比如根据题目中的例子，字符串s的长度n为18，words数组中有两个单词(cnt=2)，每个单词的长度len均为3，那么遍历的顺序为0，3，6，8，12，15，然后偏移一个字符1，4，7，9，13，16，然后再偏移一个字符2，5，8，10，14，17，这样就可以把所有情况都遍历到，我们还是先用一个哈希表m1来记录words里的所有词，然后我们从0开始遍历，用left来记录左边界的位置，count表示当前已经匹配的单词的个数。然后我们一个单词一个单词的遍历，如果当前遍历的到的单词t在m1中存在，那么我们将其加入另一个哈希表m2中，如果在m2中个数小于等于m1中的个数，那么我们count自增1，如果大于了，那么需要做一些处理，比如下面这种情况, s = barfoofoo, words = {bar, foo, abc}, 我们给words中新加了一个abc，目的是为了遍历到barfoo不会停止，那么当遍历到第二foo的时候, m2[foo]=2, 而此时m1[foo]=1，这是后已经不连续了，所以我们要移动左边界left的位置，我们先把第一个词t1=bar取出来，然后将m2[t1]自减1，如果此时m2[t1]<m1[t1]了，说明一个匹配没了，那么对应的count也要自减1，然后左边界加上个len，这样就可以了。如果某个时刻count和cnt相等了，说明我们成功匹配了一个位置，那么将当前左边界left存入结果res中，此时去掉最左边的一个词，同时count自减1，左边界右移len，继续匹配。如果我们匹配到一个不在m1中的词，那么说明跟前面已经断开了，我们重置m2，count为0，左边界left移到j+len
http://bangbingsyb.blogspot.com/2014/11/leetcode-substring-with-concatenation.html
和strStr那题的双指针解法类似。关键在于如何判断以任意i起始的S的substring是否整个L的concatenation。这里显然要用到hash table。由于L中可能存在重复的word，所以hash table的key = word，val = count of the word。

在建立好L的hash table后，对每个S[i]进行检查。这里的一个技巧建立一个新的hash table记录已经找到的word。因为L的hash table需要反复利用，不能被修改

    vector<int> findSubstring(string S, vector<string> &L) {
        vector<int> allPos;
        if(L.empty()) return allPos;
        int totalWords = L.size();
        int wordSize = L[0].size();
        int totalLen = wordSize * totalWords;
        if(S.size()<totalLen) return allPos;
        
        unordered_map<string,int> wordCount;
        for(int i=0; i<totalWords; i++)
            wordCount[L[i]]++;
        
        for(int i=0; i<=S.size()-totalLen; i++) {
            if(checkSubstring(S, i, wordCount, wordSize, totalWords))
                allPos.push_back(i);
        }
        return allPos;
    }
    
    bool checkSubstring(string S, int start, unordered_map<string,int> &wordCount, int wordSize, int totalWords) {
        if(S.size()-start+1 < wordSize*totalWords) return false;
        unordered_map<string,int> wordFound;
        
        for(int i=0; i<totalWords; i++) {
            string curWord = S.substr(start+i*wordSize,wordSize);
            if(!wordCount.count(curWord)) return false;
            wordFound[curWord]++;
            if(wordFound[curWord]>wordCount[curWord]) return false;
        }
        return true;
    }

https://yuanhsh.iteye.com/blog/2187543

首先是要明确用滑块的概念来解决，始终保持L集合中的字符串在滑块中都只出现了一次，当然设置一个总计数count，当cout等于L集合长度时，即使找了一段符合要求的字符串。

需要用到的内存空间：

两张哈希表，一张保存L集合中的单词，一张用来保存当前滑块中的单词，key为单词，value为出现次数
cout计数，保存当前滑块中的单词总数
left标记，记录滑块左起点

实现的步骤：

遍历一遍单词数组L集合，构造总单词表
以单词长度为步长，遍历目标字符串，如果当前单词在总单词表内，则进入步骤3；反之，则清空当前滑块单词表，将cout置零，将left移动到下一位置
当前滑块档次表中的相应单词计数加1，检查该单词的计数是否小于等于总单词表中该单词的总数，如果是，则将count计数加1，进入步骤5；反之，进入步骤4
根据左起点left收缩滑块，直到收缩到与当前单词相同的字符串片段，将其剔除之后，滑块的收缩工作完成
如果当前count计数等于单词集合长度，记录下left左起点的位置后，将left右移，当前滑块中相应单词计数减1，总计数减1，继续循环

这里解释下步骤4中的收缩滑块，这是因为当前滑块中有单词的出现次数超过了额定的出现次数，那么就是需要收缩滑块来剔除这个单词，相当于是从滑块的左起点开始寻找该单词，找到之后，将该单词的右端点作为滑块新的左起点，这样就保证了滑块中所有单词都是小于等于额定出现次数，这样也保证了count计数的有效性。

遇到总单词表中不存在的单词的情况，在步骤2中已经说明，清空当前数据之后继续循环，也就是保证了滑块中是不会出现不存在单词表中的单词的。

最后，考虑最外圈循环，如果是从0开始作为滑块的初始起点，那么其实并没有遍历字符串中的所有可能子串，因为步长是单词长度，所以移动滑块的时候会跨过很多可能子串，所以要在外圈再加一层循环，这个循环的作用就是移动滑块的初始起点，所以循环次数就是单词的长度

https://discuss.leetcode.com/topic/68976/sliding-window-algorithm-template-to-solve-all-the-leetcode-substring-search-problem

    public List<Integer> findSubstring(String S, String[] L) {
        List<Integer> res = new LinkedList<>();
        if (L.length == 0 || S.length() < L.length * L[0].length())   return res;
        int N = S.length();
        int M = L.length; // *** length
        int wl = L[0].length();
        Map<String, Integer> map = new HashMap<>(), curMap = new HashMap<>();
        for (String s : L) {
            if (map.containsKey(s))   map.put(s, map.get(s) + 1);
            else                      map.put(s, 1);
        }
        String str = null, tmp = null;
        for (int i = 0; i < wl; i++) {
            int count = 0;  // remark: reset count 
            int start = i;
            for (int r = i; r + wl <= N; r += wl) {
                str = S.substring(r, r + wl);
                if (map.containsKey(str)) {
                    if (curMap.containsKey(str))   curMap.put(str, curMap.get(str) + 1);
                    else                           curMap.put(str, 1);
                    
                    if (curMap.get(str) <= map.get(str))    count++;
                    while (curMap.get(str) > map.get(str)) {
                        tmp = S.substring(start, start + wl);
                        curMap.put(tmp, curMap.get(tmp) - 1);
                        start += wl;
                        
                        //the same as https://leetcode.com/problems/longest-substring-without-repeating-characters/
                        if (curMap.get(tmp) < map.get(tmp)) count--;
                        
                    }
                    if (count == M) {
                        res.add(start);
                        tmp = S.substring(start, start + wl);
                        curMap.put(tmp, curMap.get(tmp) - 1);
                        start += wl;
                        count--;
                    }
                }else {
                    curMap.clear();
                    count = 0;
                    start = r + wl;//not contain, so move the start
                }
            }
            curMap.clear();
        }
        return res;
    }

https://discuss.leetcode.com/topic/6617/an-o-n-solution-with-detailed-explanation/10

    // travel all the words combinations to maintain a window
    // there are wl(word len) times travel
    // each time, n/wl words, mostly 2 times travel for each word
    // one left side of the window, the other right side of the window
    // so, time complexity O(wl * 2 * N/wl) = O(2N)

I also agree with complexity of O(wordLenSlen) as it is O(Slen/wordLen) for the loop, then O(wordLen) for substr, then also O(wordLen) for the first loop, in total O(wordLenSlen)

I think the complexity is O(KN), K = L[0].length(), N = S.length().

    // ask interviewer if words is empty, should I return empty list
    public List<Integer> findSubstring(String S, String[] L) {
        List<Integer> res = new LinkedList<>();
        if (L.length == 0 || S.length() < L.length * L[0].length())   return res;
        int N = S.length(), M = L.length, K = L[0].length();
        Map<String, Integer> map = new HashMap<>(), curMap = new HashMap<>();
        for (String s : L) {
            if (map.containsKey(s))   map.put(s, map.get(s) + 1);
            else                      map.put(s, 1);
        }
        String str = null, tmp = null;
        for (int i = 0; i < K; i++) {
            int count = 0;  // remark: reset count 
            for (int l = i, r = i; r + K <= N; r += K) {
                str = S.substring(r, r + K);
                if (map.containsKey(str)) {
                    if (curMap.containsKey(str))   curMap.put(str, curMap.get(str) + 1);
                    else                           curMap.put(str, 1);
                    
                    if (curMap.get(str) <= map.get(str))    count++;
                    while (curMap.get(str) > map.get(str)) {
                        tmp = S.substring(l, l + K);
                        curMap.put(tmp, curMap.get(tmp) - 1);
                        l += K;
                        if (curMap.get(tmp) < map.get(tmp)) count--;
                    }
                    if (count == M) {
                        res.add(l);
                        tmp = S.substring(l, l + K);
                        curMap.put(tmp, curMap.get(tmp) - 1);
                        l += K;
                        count--;
                    }
                }else {
                    curMap.clear();
                    count = 0;
                    l = r + K;
                }
            }
            curMap.clear();
        }
        return res;
    }

https://discuss.leetcode.com/topic/32038/java-12ms-beats-100/3

The two pointers method with hashmap is known by many other solutions. The idea is to slide the scan window as far as possible, and keep throwing the impossible cases based on the length test.

In short, we got the source histogram from the dictionary L and build the new histogram for each possible window comparing it with the help of Java's equals method to the source one. Additionally, for the sake of tiny optimization, we check the starting word for being in the dictionary.

https://discuss.leetcode.com/topic/6939/accepted-short-java-solution/12

  public List<Integer> findSubstring(String s, String[] words) {
 List<Integer> ans = new ArrayList<>();
 if (s == null || words.length == 0) return ans;
 int n = words.length, wordLen = words[0].length();
    Map<String, Integer> hist = new HashMap<>();
    for (String w : words) {
     hist.put(w, hist.getOrDefault(w, 0)+1);
    }
 Map<String, Integer> curHist = new HashMap<>();
    for (int i = 0; i <= s.length() - n*wordLen; i++) {
     if (hist.containsKey(s.substring(i, i+wordLen))) {
      curHist.clear();
      for (int j = 0; j < n; j++) {
       String word = s.substring(i + j*wordLen, i+(j+1)*wordLen);
       if (hist.containsKey(word)) {
        curHist.put(word, curHist.getOrDefault(word, 0) + 1);
        if (curHist.get(word) > hist.get(word))
         break;
       } else 
        break;
      }
      if (hist.equals(curHist)) ans.add(i);
     }
    }
    return ans;
}

https://leetcode.com/problems/substring-with-concatenation-of-all-words/discuss/13664/Simple-Java-Solution-with-Two-Pointers-and-Map

My idea is pretty simple. Just build a map for the words and their relative count in L. Then we traverse through S to check whether there is a match.

public static List<Integer> findSubstring(String S, String[] L) {
    List<Integer> res = new ArrayList<Integer>();
    if (S == null || L == null || L.length == 0) return res;
    int len = L[0].length(); // length of each word
    
    Map<String, Integer> map = new HashMap<String, Integer>(); // map for L
    for (String w : L) map.put(w, map.containsKey(w) ? map.get(w) + 1 : 1);
    
    for (int i = 0; i <= S.length() - len * L.length; i++) {
        Map<String, Integer> copy = new HashMap<String, Integer>(map);
        for (int j = 0; j < L.length; j++) { // checkc if match
            String str = S.substring(i + j*len, i + j*len + len); // next word
            if (copy.containsKey(str)) { // is in remaining words
                int count = copy.get(str);
                if (count == 1) copy.remove(str);
                else copy.put(str, count - 1);
                if (copy.isEmpty()) { // matches
                    res.add(i);
                    break;
                }
            } else break; // not in L
        }
    }
    return res;
}

At first I was gonna to use a set for words. Owing to the fact that duplicate is allowed in L, we need to use map instead.

    public List<Integer> findSubstring(String s, String[] words) {
     int n = words.length, m = words[0].length();
     List<Integer> res = new ArrayList();
        /*Store string array with hashtable.*/
        HashMap<String, Integer> map = new HashMap();
        for (String str : words) {
         if (map.containsKey(str)) map.put(str, map.get(str)+1);
         else map.put(str, 1);
        }
        /*m is the length of each word in array words, each time get a substring of length m to check if it exits in words*/
        for (int i = 0; i <= s.length()-n*m; i++) {
            HashMap<String, Integer> copy = new HashMap(map);
         /*if it exits, use another hashset to avoid duplicate and count the number to reach n, the number of words in array words*/
         int k = n, j = i;
         while (k > 0) {
          String str = s.substring(j, j+m);
          if (!copy.containsKey(str) || copy.get(str) < 1) break;
          copy.put(str, copy.get(str)-1);
          k--; j+=m;
         }
         if (k == 0) res.add(i);
        }
        return res;
    }

这道题看似比较复杂，其实思路和Longest Substring Without Repeating Characters差不多。因为那些单词是定长的，所以本质上和单一个字符一样。和Longest Substring Without Repeating Characters的区别只在于我们需要维护一个字典，然后保证目前的串包含字典里面的单词有且仅有一次。思路仍然是维护一个窗口，如果当前单词在字典中，则继续移动窗口右端，否则窗口左端可以跳到字符串下一个单词了。假设源字符串的长度为n，字典中单词的长度为l。因为不是一个字符，所以我们需要对源字符串所有长度为l的子串进行判断。做法是i从0到l-1个字符开始，得到开始index分别为i, i+l, i+2*l, ...的长度为l的单词。这样就可以保证判断到所有的满足条件的串。因为每次扫描的时间复杂度是O(2*n/l)(每个单词不会被访问多于两次，一次是窗口右端，一次是窗口左端)，总共扫描l次（i=0, ..., l-1)，所以总复杂度是O(2*n/l*l)=O(n)，是一个线性算法。空间复杂度是字典的大小，即O(m*l)，其中m是字典的单词数量
http://segmentfault.com/a/1190000002625580

首先是要明确用滑块的概念来解决，始终保持L集合中的字符串在滑块中都只出现了一次，当然设置一个总计数count，当cout等于L集合长度时，即使找了一段符合要求的字符串。

需要用到的内存空间：

两张哈希表，一张保存L集合中的单词，一张用来保存当前滑块中的单词，key为单词，value为出现次数
cout计数，保存当前滑块中的单词总数
left标记，记录滑块左起点

实现的步骤：

遍历一遍单词数组L集合，构造总单词表
以单词长度为步长，遍历目标字符串，如果当前单词在总单词表内，则进入步骤3；反之，则清空当前滑块单词表，将cout置零，将left移动到下一位置
当前滑块档次表中的相应单词计数加1，检查该单词的计数是否小于等于总单词表中该单词的总数，如果是，则将count计数加1，进入步骤5；反之，进入步骤4
根据左起点left收缩滑块，直到收缩到与当前单词相同的字符串片段，将其剔除之后，滑块的收缩工作完成
如果当前count计数等于单词集合长度，记录下left左起点的位置后，将left右移，当前滑块中相应单词计数减1，总计数减1，继续循环

这里解释下步骤4中的收缩滑块，这是因为当前滑块中有单词的出现次数超过了额定的出现次数，那么就是需要收缩滑块来剔除这个单词，相当于是从滑块的左起点开始寻找该单词，找到之后，将该单词的右端点作为滑块新的左起点，这样就保证了滑块中所有单词都是小于等于额定出现次数，这样也保证了count计数的有效性。

遇到总单词表中不存在的单词的情况，在步骤2中已经说明，清空当前数据之后继续循环，也就是保证了滑块中是不会出现不存在单词表中的单词的。

最后，考虑最外圈循环，如果是从0开始作为滑块的初始起点，那么其实并没有遍历字符串中的所有可能子串，因为步长是单词长度，所以移动滑块的时候会跨过很多可能子串，所以要在外圈再加一层循环，这个循环的作用就是移动滑块的初始起点，所以循环次数就是单词的长度。

    public List<Integer> findSubstring(String S, String[] L) {
        ArrayList<Integer> result = new ArrayList<Integer>();
        if (S == null || S.length() == 0 || L == null || L.length == 0)
            return result;
        int strLen = S.length();
        int wordLen = L[0].length();
        HashMap<String, Integer> map = new HashMap<String, Integer>();
        for (int i = 0; i < L.length; i++) {
            if (map.containsKey(L[i])) {
                map.put(L[i], map.get(L[i]) + 1);
            } else {
                map.put(L[i], 1);
            }
        }
        for (int i = 0; i < wordLen; i++) {
            HashMap<String, Integer> curMap = new HashMap<String, Integer>();
            int count = 0, left = i;
            for (int j = i; j <= strLen - wordLen; j += wordLen) {
                String curStr = S.substring(j, j + wordLen);
                if (map.containsKey(curStr)) {
                    if (curMap.containsKey(curStr)) {
                        curMap.put(curStr, curMap.get(curStr) + 1);
                    } else {
                        curMap.put(curStr, 1);
                    }
                    if (curMap.get(curStr) <= map.get(curStr)) {
                        count++;
                    } else {
                        while (true) {
                            String tmp = S.substring(left, left + wordLen);
                            curMap.put(tmp, curMap.get(tmp) - 1);
                            left += wordLen;
                            if (curStr.equals(tmp)) {
                                break;
                            } else {
                                count--;
                            }
                        }
                    }
                    if (count == L.length) {
                        result.add(left);
                        String tmp = S.substring(left, left + wordLen);
                        curMap.put(tmp, curMap.get(tmp) - 1);
                        left += wordLen;
                        count--;
                    }
                } else {
                    curMap.clear();
                    count = 0;
                    left = j + wordLen;
                }
            }
        }
        return result;
    }

X. Using Trie

https://discuss.leetcode.com/topic/6593/accepted-recursive-solution-using-trie-tree

The idea is quite simple. Just use a trie tree to accelerate testing whether a substring is valid. The value of each TrieNode is used to deal with duplication and to mark whether the word is used before.

      static class TrieNode {
           int value = 0;
           Map<Character, TrieNode> children = new HashMap<Character, TrieNode>();
       }

       TrieNode trie;

    // build a trie tree
    public List<Integer> findSubstring(String S, String[] L) {
        trie = buildTrie(L);
        int length = getTotalLength(L);
        List<Integer> result = new LinkedList<Integer>();
        for (int i = 0; i < S.length() - length + 1; i++) {
            if (isSubString(S, i, i + length))
                result.add(i);
        }
        return result;
    }
    
    private int getTotalLength(String[] L) {
        int sum = 0;
        for (String l : L)
            sum += l.length();
        return sum;
    }
    
    private TrieNode buildTrie(String[] L) {
        TrieNode root = new TrieNode();
        for (String l : L)
            addWord(root, l);
        return root;
    }
    
    private void addWord(TrieNode root, String s) {
        TrieNode node = root;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            TrieNode next = node.children.get(c);
            if (next == null) {
                next = new TrieNode();
                node.children.put(c, next);
            }
            node = next;
        }
        node.value++;
    }
    
    private boolean isSubString(String S, int start, int end) {
     if (start == end)
      return true;
        // search in the trie tree
        TrieNode node = trie;
        for (int i = start; i < end; i++) {
            char c = S.charAt(i);
            if (node.children.get(c) == null)
                return false;
            node = node.children.get(c);
            if (node.value > 0) {  // leaf & can be used
                node.value--; // mark as used
                if (isSubString(S, i + 1, end)) {
                    node.value++; // mark as unused
                    return true;
                }
                node.value++; // mark as unused
            }
        }
        return false;
    }

X. KMP
http://n00tc0d3r.blogspot.com/2013/06/substring-with-concatenation-of-all.html
Recall that we are looking for continuous concatenations of words in list L and all words in L are of the same length. To cover all possibilities, we can start from a letter of S[0..k-1] and search for all potential concatenation from that position. That is, we check concatenation at 0, k, 2k, ..., n-1; then 1, k+1, 2k+1, ..., n-1; then 2, k+2, 2k+2, ..., n-1, etc.; until k-1, 2k-1, 3k-1, ..., n-1.
Start from the first character of S (begin = 0), cut a substring of length k,

If it is not a valid word in list L, restart from next substring (i.e. begin += k);
If it is a valid word but has seen before (i.e. a duplicate), forward begin pointer until previous one is removed from the current window;
Otherwise, add the substring to the current collection. If we get all words (i.e. hit a concatenation), forward begin pointer by one and check the next substring.

What is the complexity now?
We hit each substring at most twice, one to add into the collection and one to remove from the collection. There are O(n) substrings in total, where n is the total number of characters in string S, and for each substring it takes O(k) = O(1) to check whether it is a valid word or not.
So, in total, this algorithm runs in time O(n*k) = O(n).

public ArrayList<Integer> findSubstring(String S, String[] L) {
ArrayList<Integer> indices = new ArrayList<Integer>();
if (L.length == 0) return indices;

int total = L.length, wordLen = L[0].length();

// store the words and frequencies in a hash table
HashMap<String, Integer> expectWords = new HashMap<String, Integer>();
for (String w : L) {
addWord(w, expectWords);
}

// find concatenations
for (int i=0; i < wordLen; ++i) {
// check if there are any concatenations
int count = 0;
HashMap<String, Integer> collectWords = new HashMap<String, Integer>(expectWords);
for (int j = i, begin = i; j <= S.length() - (total-count)*wordLen && begin <= S.length() - total*wordLen;) {
String sub = S.substring(j, j+wordLen);
if (!expectWords.containsKey(sub)) { // if not an expect word, reset
begin = j + wordLen;
j = begin;
count = 0;
collectWords.putAll(expectWords);
} else if (!collectWords.containsKey(sub)) { // if duplicate, forward begin by 1
begin = slideWindow(S, begin, wordLen, collectWords);
} else {
removeWord(sub, collectWords);
j += wordLen;
++count;
if (collectWords.isEmpty()) {
indices.add(begin);
begin = slideWindow(S, begin, wordLen, collectWords);
--count;
}
}
}
}

return indices;
}
http://www.jiuzhang.com/solutions/substring-with-concatenation-of-all-words/

    public ArrayList<Integer> findSubstring(String S, String[] L) {
        ArrayList<Integer> result = new ArrayList<Integer>();
        HashMap<String, Integer> toFind = new HashMap<String, Integer>();
        HashMap<String, Integer> found = new HashMap<String, Integer>();
        int m = L.length, n = L[0].length();
        for (int i = 0; i < m; i ++){
            if (!toFind.containsKey(L[i])){
                toFind.put(L[i], 1);
            }
            else{
                toFind.put(L[i], toFind.get(L[i]) + 1);
            }
        }
        for (int i = 0; i <= S.length() - n * m; i ++){
            found.clear();
            int j;
            for (j = 0; j < m; j ++){
                int k = i + j * n;
                String stub = S.substring(k, k + n);
                if (!toFind.containsKey(stub)) break;
                if(!found.containsKey(stub)){
                    found.put(stub, 1);
                }
                else{
                    found.put(stub, found.get(stub) + 1);
                }
                if (found.get(stub) > toFind.get(stub)) break;
            }
            if (j == m) result.add(i);
        }
        return result;
    }

X.
https://leetcode.com/problems/substring-with-concatenation-of-all-words/discuss/13658/Easy-Two-Map-Solution-(C%2B%2BJava)

    public List<Integer> findSubstring(String s, String[] words) {
        final Map<String, Integer> counts = new HashMap<>();
        for (final String word : words) {
            counts.put(word, counts.getOrDefault(word, 0) + 1);
        }
        final List<Integer> indexes = new ArrayList<>();
        final int n = s.length(), num = words.length, len = words[0].length();
        for (int i = 0; i < n - num * len + 1; i++) {
            final Map<String, Integer> seen = new HashMap<>();
            int j = 0;
            while (j < num) {
                final String word = s.substring(i + j * len, i + (j + 1) * len);
                if (counts.containsKey(word)) {
                    seen.put(word, seen.getOrDefault(word, 0) + 1);
                    if (seen.get(word) > counts.getOrDefault(word, 0)) {
                        break;
                    }
                } else {
                    break;
                }
                j++;
            }
            if (j == num) {
                indexes.add(i);
            }
        }
        return indexes;
    }

X. using recursion
https://discuss.leetcode.com/topic/12787/my-java-solution-with-recursion

    public List<Integer> findSubstring(String s, String[] words) {
        List<Integer> result = new ArrayList<Integer>();
        if (s==null || words.length == 0){
            return result;
        }
       
        HashMap<String, Integer> allWords = new HashMap<String, Integer>();
        for (int i=0;i<words.length;i++){
         String w = words[i];
         if (!allWords.containsKey(w)){
          allWords.put(w, 1);
         }
         else{
          Integer count = allWords.get(w);
          count++;
          allWords.put(w, count);
         }
        }
        int wordLength = words[0].length();
        int length = words.length*wordLength;
        for (int i=0;i<=s.length()-length;i++){
         boolean b = findWords(s, i, (Map)allWords.clone(), wordLength);
         if (b){
          result.add(i);
         }
        }
        return result;
 }
 
 private boolean findWords(String s, int startIndex, Map<String, Integer> allWords, int wordLength){
  if (allWords.isEmpty()){
   return true;
  }
  String s1 = s.substring(startIndex, startIndex+wordLength);
  if (allWords.containsKey(s1)){
   int result = allWords.get(s1);
   result--;
   if (result == 0){
    allWords.remove(s1);
   }
   else{
    allWords.put(s1, result);
   }
   return findWords(s, startIndex+wordLength, allWords, wordLength);
  }
  return false;
 }

http://www.programcreek.com/2014/06/leetcode-substring-with-concatenation-of-all-words-java/

public List<Integer> findSubstring(String s, String[] words) {
    ArrayList<Integer> result = new ArrayList<Integer>();
    if(s==null||s.length()==0||words==null||words.length==0){
        return result;
    } 
 
    //frequency of words
    HashMap<String, Integer> map = new HashMap<String, Integer>();
    for(String w: words){
        if(map.containsKey(w)){
            map.put(w, map.get(w)+1);
        }else{
            map.put(w, 1);
        }
    }
 
    int len = words[0].length();
 
    for(int j=0; j<len; j++){
        HashMap<String, Integer> currentMap = new HashMap<String, Integer>();
        int start = j;//start index of start
        int count = 0;//count totoal qualified words so far
 
        for(int i=j; i<=s.length()-len; i=i+len){
            String sub = s.substring(i, i+len);
            if(map.containsKey(sub)){
                //set frequency in current map
                if(currentMap.containsKey(sub)){
                    currentMap.put(sub, currentMap.get(sub)+1);
                }else{
                    currentMap.put(sub, 1);
                }
 
                count++;
 
                while(currentMap.get(sub)>map.get(sub)){
                    String left = s.substring(start, start+len);
                    currentMap.put(left, currentMap.get(left)-1);
 
                    count--;
                    start = start + len;
                }
 
 
                if(count==words.length){
                    result.add(start); //add to result
 
                    //shift right and reset currentMap, count & start point         
                    String left = s.substring(start, start+len);
                    currentMap.put(left, currentMap.get(left)-1);
                    count--;
                    start = start + len;
                }
            }else{
                currentMap.clear();
                start = i+len;
                count = 0;
            }
        }
    }
 
    return result;
}

we check each substrings multiple times.

public ArrayList<Integer> findSubstring(String S, String[] L) {

ArrayList<Integer> indices = new ArrayList<Integer>();

if (L.length == 0) return indices;

int total = L.length, wordLen = L[0].length();

// store the words and frequencies in a hash table

HashMap<String, Integer> words = new HashMap<String, Integer>();

for (String s : L) {

if (words.containsKey(s)) {

words.put(s, words.get(s)+1);

} else {

words.put(s, 1);

}

// find concatenations

for (int i=0; i <= S.length() - total*wordLen; ++i) {

// check if it is a concatenation

HashMap<String, Integer> target = new HashMap<String, Integer>(words);

for (int j = i; j <= S.length() - wordLen && !target.isEmpty(); j+=wordLen) {

String sub = S.substring(j, j+wordLen);

if (!target.containsKey(sub)) break;

if (target.get(sub) > 1) { // reduce the frequency

target.put(sub, target.get(sub)-1);

} else { // remove the word if only one left

target.remove(sub);

}

if (target.isEmpty()) {

indices.add(i);

}

return indices;

}

http://blog.csdn.net/linhuanmars/article/details/20342851
这道题看似比较复杂，其实思路和Longest Substring Without Repeating Characters差不多。因为那些单词是定长的，所以本质上和单一个字符一样。和Longest Substring Without Repeating Characters的区别只在于我们需要维护一个字典，然后保证目前的串包含字典里面的单词有且仅有一次。思路仍然是维护一个窗口，如果当前单词在字典中，则继续移动窗口右端，否则窗口左端可以跳到字符串下一个单词了。假设源字符串的长度为n，字典中单词的长度为l。因为不是一个字符，所以我们需要对源字符串所有长度为l的子串进行判断。做法是i从0到l-1个字符开始，得到开始index分别为i, i+l, i+2*l, ...的长度为l的单词。这样就可以保证判断到所有的满足条件的串。因为每次扫描的时间复杂度是O(2*n/l)(每个单词不会被访问多于两次，一次是窗口右端，一次是窗口左端)，总共扫描l次（i=0, ..., l-1)，所以总复杂度是O(2*n/l*l)=O(n)，是一个线性算法。空间复杂度是字典的大小，即O(m*l)，其中m是字典的单词数量。
这种移动窗口的方法在字符串处理的问题中非常常见，是一种可以把时间复杂度降低到线性的有效算法，大家可以熟悉一下。还有非常类似的题目Minimum Window Substring，思路完全一样，只是移动窗口的规则稍微不同而已。
https://github.com/rffffffff007/leetcode/blob/master/Substring%20with%20Concatenation%20of%20All%20Words.java
public ArrayList<Integer> findSubstring(String S, String[] L) {
int lens = S.length();
int lenl = L.length;
ArrayList<Integer> subPos = new ArrayList<Integer>();
if (lenl == 0)
return subPos;
int wordLen = L[0].length();
int totalLen = wordLen * lenl;
Map<String, Integer> lMap = new HashMap<String, Integer>();
Map<String, Integer> sMap = new HashMap<String, Integer>();
for (String l : L)
addMap(lMap, l); // extract as a method
for (int i = 0; i <= lens - totalLen; i++) {
sMap.clear();
int j = i;
for (; j < i + totalLen; j += wordLen) {
String sub = S.substring(j, j + wordLen);
if (lMap.containsKey(sub)) {
addMap(sMap, sub);
if (sMap.get(sub) > lMap.get(sub))
break;
} else
break;
}
if (j == i + totalLen)
subPos.add(i);
}
return subPos;
}

private void addMap(Map<String, Integer> map, String key) {
if (map.containsKey(key))
map.put(key, map.get(key) + 1);
else
map.put(key, 1);
}

因为所有的pattern长度都一样，所以这道题的匹配部分其实也可以用Rabin Karp的多模式匹配来做。
X. Brute Force
http://blog.csdn.net/fightforyourdream/article/details/14165397

// 用暴力法压线通过，未来应该用类似KMP的方法降低复杂度！
public static ArrayList<Integer> findSubstring(String S, String[] L) {
ArrayList<Integer> ret = new ArrayList<Integer>();
Hashtable<String, Integer> ht = new Hashtable<String, Integer>();
// 把L中的string全部添加入hashtable
for (String str : L) {
if(ht.containsKey(str)){
ht.put(str, ht.get(str)+1);
}else{
ht.put(str, 1);
}
}
int wordLen = L[0].length();
int numberOfWords = L.length;
// Brute force法比较
for(int i=0; i<=S.length()-numberOfWords*wordLen; i++){
// 每次要基于原hashtable新建一个hashtable
Hashtable<String, Integer> table = new Hashtable<String, Integer>(ht);
int cnt = 0;
// j每次都重复地做相同的工作
for(int j=i; j<=i+numberOfWords*wordLen-wordLen; j+=wordLen){
String substr = S.substring(j, j+wordLen);
if(table.containsKey(substr)){
// 如果只用这一句会TLE
// table.put(substr, table.get(substr)-1);
// 改成这样就能压线AC
int times = table.get(substr);
if(times == 1) table.remove(substr);
else table.put(substr, times - 1);
cnt++;
}else{
break;
}
}
if(cnt == numberOfWords){
ret.add(i);
}
}
return ret;
}

http://n00tc0d3r.blogspot.com/2013/06/substring-with-concatenation-of-all.html

 public ArrayList<Integer> findSubstring(String S, String[] L) {  
   ArrayList<Integer> indices = new ArrayList<Integer>();  
   if (L.length == 0) return indices;  
   
   int total = L.length, wordLen = L[0].length();  
   
   // store the words and frequencies in a hash table  
   HashMap<String, Integer> words = new HashMap<String, Integer>();  
   for (String s : L) {  
     if (words.containsKey(s)) {  
       words.put(s, words.get(s)+1);  
     } else {  
       words.put(s, 1);  
     }  
   }  
   
   // find concatenations  
   for (int i=0; i <= S.length() - total*wordLen; ++i) {  
     // check if it is a concatenation   
     HashMap<String, Integer> target = new HashMap<String, Integer>(words);  
     for (int j = i; j <= S.length() - wordLen && !target.isEmpty(); j+=wordLen) {  
       String sub = S.substring(j, j+wordLen);  
       if (!target.containsKey(sub)) break;  
       if (target.get(sub) > 1) {  // reduce the frequency
         target.put(sub, target.get(sub)-1);  
       } else {  // remove the word if only one left
         target.remove(sub);  
       }
     }  
     if (target.isEmpty()) {  
       indices.add(i);  
     }  
   }  
   
   return indices;  
 }

http://dp2.me/blog/?p=478
Substring with Concatenation of All Words, Leetcode 解题笔记
Read full article from Substring with Concatenation of All Words | N00tc0d3r

LeetCode 30 - Substring with Concatenation of All Words

Labels

Popular Posts