Massive Algorithms: LeetCode 472

LeetCode 472 - Concatenated Words

http://bookshadow.com/weblog/2016/12/18/leetcode-concatenated-words/

Given a list of words, please write a program that returns all concatenated words in the given list of words.

A concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.

Example:

Input: ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]

Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]

Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats"; 
 "dogcatsdog" can be concatenated by "dog", "cats" and "dog"; 
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

Note:

The number of elements of the given array will not exceed 10,000
The length sum of elements in the given array will not exceed 600,000.
The returned elements order does not matter.

X. DP
http://www.cnblogs.com/EdwardLiu/p/6213426.html

We iterate through each word and see if it can be formed by using other words. The subproblem is Word Break I.

But it is a little different from Word Break I, because our result only want words that are concantenated by 2 or more other words in the given array.

How to tell if a word is only by itself in the given array, or it can be concatenated by other words in the given array?

The trick is: it is obvious that a word can only be formed by words shorter than it. So we can first sort the input by length of each word, and only try to form one word by using words in front of it. We also do not add the current word to dictionary when determine if it can be concantenated.

小心一个test case:

Input:[""]

Output:[""]

Expected:[]

如果不加21行那句，就会直接return dp[0], true了

https://leetcode.com/problems/concatenated-words/discuss/95652/Java-DP-Solution

If you do know one optimized solution for above question is using DP, this problem is just one more step further. We iterate through each word and see if it can be formed by using other words.

Of course it is also obvious that a word can only be formed by words shorter than it. So we can first sort the input by length of each word, and only try to form one word by using words in front of it.

    public static List<String> findAllConcatenatedWordsInADict(String[] words) {
        List<String> result = new ArrayList<>();
        Set<String> preWords = new HashSet<>();
        Arrays.sort(words, new Comparator<String>() {
            public int compare (String s1, String s2) {
                return s1.length() - s2.length();
            }
        });
        
        for (int i = 0; i < words.length; i++) {
            if (canForm(words[i], preWords)) {
                result.add(words[i]);
            }
            preWords.add(words[i]);
        }
        
        return result;
    }
 
    private static boolean canForm(String word, Set<String> dict) {
        if (dict.isEmpty()) return false;
 boolean[] dp = new boolean[word.length() + 1];
 dp[0] = true;
 for (int i = 1; i <= word.length(); i++) {
     for (int j = 0; j < i; j++) {
  if (!dp[j]) continue;
  if (dict.contains(word.substring(j, i))) {
      dp[i] = true;
      break;
  }
     }
 }
 return dp[word.length()];
    }

https://discuss.leetcode.com/topic/72113/java-dp-solution

If you do know one optimized solution for above question is using DP, this problem is just one more step further. We iterate through each word and see if it can be formed by using other words.

Time complexity

Suppose the word has a average length of k, total n words, we still have a TC : n^2* k^2.

I didn't think about this carefully before. If we consider TC of word.substring(j, i) as k, then overall runtime complexity is n*k^3. Anyway it's not n^2*k^2.
http://blog.csdn.net/u014688145/article/details/70702445
一个道理，输入中混杂了字典和匹配单词，所以直接从输入中筛选即可，筛选规则就是word break中的方法，如果能够匹配，就加入到list中
筛选过程中，做了些许优化，很容易理解。单词按长度排序，长度小的一定不能由长度大的单词组成，自己不能组成自己。所以，我们才可以用一个单循环完成这些判别操作。
public List<String> findAllConcatenatedWordsInADict(String[] words) { List<String> ans = new ArrayList<>(); Set<String> set = new HashSet<>(); Arrays.sort(words, new Comparator<String>() { @Override public int compare(String o1, String o2) { return o1.length() - o2.length(); } }); for (int i = 0; i < words.length;i++){ if(wordBreak(words[i],set)){ ans.add(words[i]); } set.add(words[i]); } return ans; } //139. Word Break private boolean wordBreak(String s, Set<String> wordDict){ if(wordDict.isEmpty()) return false; boolean[] f = new boolean[s.length() + 1]; f[0] = true; for (int i = 1; i <= s.length(); i++){ for (int j = 0; j < i; j++){ if (f[j] && wordDict.contains(s.substring(j, i))){ f[i] = true; break; } } } return f[s.length()]; }
X. DFS+Cache
https://discuss.leetcode.com/topic/72953/java-156-ms-solution-recursion-with-cache

I suppose the run time should be O(n^2*k), n is the average length of words and k is number of words?

The worst case time complexity is O(n^3 * k), because substring() cost O(n).

public List<String> findAllConcatenatedWordsInADict(String[] words) {
        Set<String> dict = new HashSet<>();
        for (String s: words) dict.add(s);
        Map<String, Boolean> canform = new HashMap<>();
        List<String> res = new ArrayList<>();
        for (String s: words) {
            if (check(s, canform, dict)) res.add(s);
        }
        return res;
    }
    private boolean check(String s, Map<String, Boolean> canform, Set<String> dict) {
        if (canform.containsKey(s)) {
            return canform.get(s);
        } else {
            for (int i = 1; i <= s.length(); i++) {
                String pre = s.substring(0, i);
                if (dict.contains(pre)) {
                    String post = s.substring(i);
                    if (dict.contains(post) || check(post, canform, dict)) {
                        canform.put(s, true);
                        return true;
                    }
                }
            }
            canform.put(s, false);
            return false;

        }
    }

X. DFS
https://leetcode.com/problems/concatenated-words/discuss/95644/102ms-java-Trie-%2B-DFS-solution.-With-explanation-easy-to-understand.

public List<String> findAllConcatenatedWordsInADict(String[] words) {
        List<String> res = new ArrayList<String>();
        if (words == null || words.length == 0) {
            return res;
        }
        TrieNode root = new TrieNode();
        for (String word : words) { // construct Trie tree
            if (word.length() == 0) {
                continue;
            }
            addWord(word, root);
        }
        for (String word : words) { // test word is a concatenated word or not
            if (word.length() == 0) {
                continue;
            }
            if (testWord(word.toCharArray(), 0, root, 0)) {
                res.add(word);
            }
        }
        return res;
    }
    public boolean testWord(char[] chars, int index, TrieNode root, int count) { // count means how many words during the search path
        TrieNode cur = root;
        int n = chars.length;
        for (int i = index; i < n; i++) {
            if (cur.sons[chars[i] - 'a'] == null) {
                return false;
            }
            if (cur.sons[chars[i] - 'a'].isEnd) {
                if (i == n - 1) { // no next word, so test count to get result.
                    return count >= 1;
                }
                if (testWord(chars, i + 1, root, count + 1)) {
                    return true;
                }
            }
            cur = cur.sons[chars[i] - 'a'];
        }
        return false;
    }
    public void addWord(String str, TrieNode root) {
        char[] chars = str.toCharArray();
        TrieNode cur = root;
        for (char c : chars) {
            if (cur.sons[c - 'a'] == null) {
                cur.sons[c - 'a'] = new TrieNode();
            }
            cur = cur.sons[c - 'a'];
        }
        cur.isEnd = true;
    }
}
class TrieNode {
    TrieNode[] sons;
    boolean isEnd;
    public TrieNode() {
        sons = new TrieNode[26];
    }

http://www.learn4master.com/interview-questions/leetcode/leetcode-concatenated-words

    public List<String> findAllConcatenatedWordsInADict(String[] words) {

        int min = Integer.MAX_VALUE, secondMin = Integer.MAX_VALUE;

        int max = 0;

        Set<String> set = new HashSet<>();

        for(String word: words) {

            int len = word.length();

            if(len <= min) {

                secondMin = min;

                min = len;

            } else if(len < secondMin) {

                secondMin = len;

}

            max = Math.max(max, len);

            set.add(word);

}

        List<String> res = new ArrayList<>();

        int minLen = min + min;

        for(String w: words) {

            int len = w.length();

            if(len < minLen) continue;

            if(test(w, 0, set, min, max, 0)) {

                res.add(w);

}

}

        return res;

}

    boolean test(String w, int start, Set<String> words, int min, int max, int cnt) {

        if(start == w.length()) {

            if(cnt > 1) return true;

            return false;

}

        for(int i = min; i < max; i++) {

            if(start + i > w.length()) break;

            String tmp = w.substring(start, start + i);

            if(words.contains(tmp)) {

                if(test(w, start + i, words, min, max, cnt + 1)) {

                    return true;

}

}

}

        return false;

}

X. Using Trie
https://discuss.leetcode.com/topic/72160/simple-java-trie-dfs-solution-144ms
Most of lines are adding words into Trie Tree
This solution is like putting two pointers to search through the tree. When find a word, put the other pointer back on root then continue searching.
But I'm not sure about the time complexity of my solution. Suppose word length is len and there are n words. Is the time complexity O(len * n ^ 2)?

    public List<String> findAllConcatenatedWordsInADict(String[] words) {
        List<String> res = new ArrayList<>();
        Trie tree = new Trie();
        for (String s: words) {
            tree.insert(s);
        }
        for (String s: words) {
            if (helper(s, tree))    res.add(s);
        }
        return res;
    }
    public boolean helper(String s, Trie tree) {
        TrieNode cur = tree.root;
        char[] arr = s.toCharArray();
        for (int i = 0; i < arr.length; i++) {
            int index = arr[i] - 'a';
            cur = cur.children[index];
            if (cur == null)    return false;
            String suffix = s.substring(i+1);
            // tree.search(suffix) means the combination for this s consists of two words: cur + suffix
            // helper(suffix, tree) means there will be further level searching, the combination consists of more than two words
            if (cur.isWord && (tree.search(suffix) || helper(suffix, tree)))  return true;
        }
        return false;
    }
    class TrieNode {
        char val;
        boolean isWord;
        TrieNode[] children;
        public TrieNode(char val) {
            this.val = val;
            this.isWord = false;
            children = new TrieNode[26];
        }
    }
    class Trie {
        TrieNode root;
        public Trie() {
            this.root = new TrieNode(' ');
        }
        public void insert(String s) {
            TrieNode cur = root;
            char[] arr = s.toCharArray();
            for (int i = 0; i < arr.length; i++) {
                int index = arr[i] - 'a';
                if (cur.children[index] == null)    cur.children[index] = new TrieNode(arr[i]);
                cur = cur.children[index];
            }
            cur.isWord = true;
        }
        public boolean search(String s) {
            TrieNode cur = root;
            char[] arr = s.toCharArray();
            for (int i = 0; i < arr.length; i++) {
                int index = arr[i]-'a';
                if (cur.children[index] == null)    return false;
                cur = cur.children[index];
            }
            return cur.isWord;
        }
    }

https://discuss.leetcode.com/topic/75089/102ms-java-trie-dfs-solution-with-explanation-easy-to-understand
also https://discuss.leetcode.com/topic/72727/java-simple-150ms-trie-dfs-solution-with-some-comments

public List<String> findAllConcatenatedWordsInADict(String[] words) {
        List<String> res = new ArrayList<String>();
        if (words == null || words.length == 0) {
            return res;
        }
        TrieNode root = new TrieNode();
        for (String word : words) { // construct Trie tree
            if (word.length() == 0) {
                continue;
            }
            addWord(word, root);
        }
        for (String word : words) { // test word is a concatenated word or not
            if (word.length() == 0) {
                continue;
            }
            if (testWord(word.toCharArray(), 0, root, 0)) {
                res.add(word);
            }
        }
        return res;
    }
    public boolean testWord(char[] chars, int index, TrieNode root, int count) { // count means how many words during the search path
        TrieNode cur = root;
        int n = chars.length;
        for (int i = index; i < n; i++) {
            if (cur.sons[chars[i] - 'a'] == null) {
                return false;
            }
            if (cur.sons[chars[i] - 'a'].isEnd) {
                if (i == n - 1) { // no next word, so test count to get result.
                    return count >= 1;
                }
                if (testWord(chars, i + 1, root, count + 1)) {
                    return true;
                }
            }
            cur = cur.sons[chars[i] - 'a'];
        }
        return false;
    }
    public void addWord(String str, TrieNode root) {
        char[] chars = str.toCharArray();
        TrieNode cur = root;
        for (char c : chars) {
            if (cur.sons[c - 'a'] == null) {
                cur.sons[c - 'a'] = new TrieNode();
            }
            cur = cur.sons[c - 'a'];
        }
        cur.isEnd = true;
    }
}
class TrieNode {
    TrieNode[] sons;
    boolean isEnd;
    public TrieNode() {
        sons = new TrieNode[26];
    }

http://www.cnblogs.com/grandyang/p/6254527.html
https://yijiajin.github.io/leetcode/2016/12/27/Leetcode-Trie-Hard/

首先构造Trie，经典的写法。再对每个word由root开始进行搜索。搜索的过程中记录已经找到的word数目。如果找到了某个单词，就从下一个char开始查找，substring包含>=2个单词就加入结果中。

O(n*k^2)

- Code is too complex

LeetCode 472 - Concatenated Words

Labels

Popular Posts