LeetCode 472 - Concatenated Words


http://bookshadow.com/weblog/2016/12/18/leetcode-concatenated-words/
Given a list of words, please write a program that returns all concatenated words in the given list of words.
A concatenated word is defined as a string that is comprised entirely of at least two shorter words in the given array.
Example:
Input: ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]

Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]

Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats"; 
 "dogcatsdog" can be concatenated by "dog", "cats" and "dog"; 
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".
Note:
  1. The number of elements of the given array will not exceed 10,000
  2. The length sum of elements in the given array will not exceed 600,000.
  3. The returned elements order does not matter.

X. DP
http://www.cnblogs.com/EdwardLiu/p/6213426.html
We iterate through each word and see if it can be formed by using other words. The subproblem is Word Break I.
But it is a little different from Word Break I, because our result only want words that are concantenated by 2 or more other words in the given array.
How to tell if a word is only by itself in the given array, or it can be concatenated by other words in the given array?
The trick is: it is obvious that a word can only be formed by words shorter than it. So we can first sort the input by length of each word, and only try to form one word by using words in front of it. We also do not add the current word to dictionary when determine if it can be concantenated.
小心一个test case:
Input:[""]
Output:[""]
Expected:[]
如果不加21行那句,就会直接return dp[0], true了
https://leetcode.com/problems/concatenated-words/discuss/95652/Java-DP-Solution
If you do know one optimized solution for above question is using DP, this problem is just one more step further. We iterate through each word and see if it can be formed by using other words.


Of course it is also obvious that a word can only be formed by words shorter than it. So we can first sort the input by length of each word, and only try to form one word by using words in front of it.
    public static List<String> findAllConcatenatedWordsInADict(String[] words) {
        List<String> result = new ArrayList<>();
        Set<String> preWords = new HashSet<>();
        Arrays.sort(words, new Comparator<String>() {
            public int compare (String s1, String s2) {
                return s1.length() - s2.length();
            }
        });
        
        for (int i = 0; i < words.length; i++) {
            if (canForm(words[i], preWords)) {
                result.add(words[i]);
            }
            preWords.add(words[i]);
        }
        
        return result;
    }
 
    private static boolean canForm(String word, Set<String> dict) {
        if (dict.isEmpty()) return false;
 boolean[] dp = new boolean[word.length() + 1];
 dp[0] = true;
 for (int i = 1; i <= word.length(); i++) {
     for (int j = 0; j < i; j++) {
  if (!dp[j]) continue;
  if (dict.contains(word.substring(j, i))) {
      dp[i] = true;
      break;
  }
     }
 }
 return dp[word.length()];
    }
https://discuss.leetcode.com/topic/72113/java-dp-solution
If you do know one optimized solution for above question is using DP, this problem is just one more step further. We iterate through each word and see if it can be formed by using other words.
Of course it is also obvious that a word can only be formed by words shorter than it. So we can first sort the input by length of each word, and only try to form one word by using words in front of it.

Time complexity
Suppose the word has a average length of k, total n words, we still have a TC : n^2* k^2.

I didn't think about this carefully before. If we consider TC of word.substring(j, i) as k, then overall runtime complexity is n*k^3. Anyway it's not n^2*k^2.
http://blog.csdn.net/u014688145/article/details/70702445
一个道理,输入中混杂了字典和匹配单词,所以直接从输入中筛选即可,筛选规则就是word break中的方法,如果能够匹配,就加入到list中
筛选过程中,做了些许优化,很容易理解。单词按长度排序,长度小的一定不能由长度大的单词组成,自己不能组成自己。所以,我们才可以用一个单循环完成这些判别操作。
public List<String> findAllConcatenatedWordsInADict(String[] words) { List<String> ans = new ArrayList<>(); Set<String> set = new HashSet<>(); Arrays.sort(words, new Comparator<String>() { @Override public int compare(String o1, String o2) { return o1.length() - o2.length(); } }); for (int i = 0; i < words.length;i++){ if(wordBreak(words[i],set)){ ans.add(words[i]); } set.add(words[i]); } return ans; } //139. Word Break private boolean wordBreak(String s, Set<String> wordDict){ if(wordDict.isEmpty()) return false; boolean[] f = new boolean[s.length() + 1]; f[0] = true; for (int i = 1; i <= s.length(); i++){ for (int j = 0; j < i; j++){ if (f[j] && wordDict.contains(s.substring(j, i))){ f[i] = true; break; } } } return f[s.length()]; }
X. DFS+Cache
https://discuss.leetcode.com/topic/72953/java-156-ms-solution-recursion-with-cache
I suppose the run time should be O(n^2*k), n is the average length of words and k is number of words?
The worst case time complexity is O(n^3 * k), because substring() cost O(n).
public List<String> findAllConcatenatedWordsInADict(String[] words) {
        Set<String> dict = new HashSet<>();
        for (String s: words) dict.add(s);
        Map<String, Boolean> canform = new HashMap<>();
        List<String> res = new ArrayList<>();
        for (String s: words) {
            if (check(s, canform, dict)) res.add(s);
        }
        return res;
    }
    private boolean check(String s, Map<String, Boolean> canform, Set<String> dict) {
        if (canform.containsKey(s)) {
            return canform.get(s);
        } else {
            for (int i = 1; i <= s.length(); i++) {
                String pre = s.substring(0, i);
                if (dict.contains(pre)) {
                    String post = s.substring(i);
                    if (dict.contains(post) || check(post, canform, dict)) {
                        canform.put(s, true);
                        return true;
                    }
                }
            }
            canform.put(s, false);
            return false;

        }
    }
X. DFS
https://leetcode.com/problems/concatenated-words/discuss/95644/102ms-java-Trie-%2B-DFS-solution.-With-explanation-easy-to-understand.
public List<String> findAllConcatenatedWordsInADict(String[] words) {
        List<String> res = new ArrayList<String>();
        if (words == null || words.length == 0) {
            return res;
        }
        TrieNode root = new TrieNode();
        for (String word : words) { // construct Trie tree
            if (word.length() == 0) {
                continue;
            }
            addWord(word, root);
        }
        for (String word : words) { // test word is a concatenated word or not
            if (word.length() == 0) {
                continue;
            }
            if (testWord(word.toCharArray(), 0, root, 0)) {
                res.add(word);
            }
        }
        return res;
    }
    public boolean testWord(char[] chars, int index, TrieNode root, int count) { // count means how many words during the search path
        TrieNode cur = root;
        int n = chars.length;
        for (int i = index; i < n; i++) {
            if (cur.sons[chars[i] - 'a'] == null) {
                return false;
            }
            if (cur.sons[chars[i] - 'a'].isEnd) {
                if (i == n - 1) { // no next word, so test count to get result.
                    return count >= 1;
                }
                if (testWord(chars, i + 1, root, count + 1)) {
                    return true;
                }
            }
            cur = cur.sons[chars[i] - 'a'];
        }
        return false;
    }
    public void addWord(String str, TrieNode root) {
        char[] chars = str.toCharArray();
        TrieNode cur = root;
        for (char c : chars) {
            if (cur.sons[c - 'a'] == null) {
                cur.sons[c - 'a'] = new TrieNode();
            }
            cur = cur.sons[c - 'a'];
        }
        cur.isEnd = true;
    }
}
class TrieNode {
    TrieNode[] sons;
    boolean isEnd;
    public TrieNode() {
        sons = new TrieNode[26];
    }
http://www.learn4master.com/interview-questions/leetcode/leetcode-concatenated-words
    public List<String> findAllConcatenatedWordsInADict(String[] words) {
        int min = Integer.MAX_VALUE, secondMin = Integer.MAX_VALUE;
        int max = 0;
        Set<String> set = new HashSet<>();
        for(String word: words) {
            int len = word.length();
            if(len <= min) {
                secondMin = min;
                min = len;
            } else if(len < secondMin) {
                secondMin = len;
            }
            max = Math.max(max, len);
            set.add(word);
        }
        List<String> res = new ArrayList<>();
        int minLen = min + min;
        for(String w: words) {
            int len = w.length();
            if(len < minLen) continue;
            if(test(w, 0, set, min, max, 0)) {
                res.add(w);
            }
        }
        
        return res;
    }
    
    boolean test(String w, int start, Set<String> words, int min, int max, int cnt) {
        if(start == w.length()) {
            if(cnt > 1) return true;
            return false;
        }
        
        for(int i = min; i < max; i++) {
            if(start + i > w.length()) break;
            String tmp = w.substring(start, start + i);
            if(words.contains(tmp)) {
                if(test(w, start + i, words, min, max, cnt + 1)) {
                    return true;
                }
            }
        }
        return false;
    }

X. Using Trie
https://discuss.leetcode.com/topic/72160/simple-java-trie-dfs-solution-144ms
Most of lines are adding words into Trie Tree
This solution is like putting two pointers to search through the tree. When find a word, put the other pointer back on root then continue searching.
But I'm not sure about the time complexity of my solution. Suppose word length is len and there are n words. Is the time complexity O(len * n ^ 2)?
    public List<String> findAllConcatenatedWordsInADict(String[] words) {
        List<String> res = new ArrayList<>();
        Trie tree = new Trie();
        for (String s: words) {
            tree.insert(s);
        }
        for (String s: words) {
            if (helper(s, tree))    res.add(s);
        }
        return res;
    }
    public boolean helper(String s, Trie tree) {
        TrieNode cur = tree.root;
        char[] arr = s.toCharArray();
        for (int i = 0; i < arr.length; i++) {
            int index = arr[i] - 'a';
            cur = cur.children[index];
            if (cur == null)    return false;
            String suffix = s.substring(i+1);
            // tree.search(suffix) means the combination for this s consists of two words: cur + suffix
            // helper(suffix, tree) means there will be further level searching, the combination consists of more than two words
            if (cur.isWord && (tree.search(suffix) || helper(suffix, tree)))  return true;
        }
        return false;
    }
    class TrieNode {
        char val;
        boolean isWord;
        TrieNode[] children;
        public TrieNode(char val) {
            this.val = val;
            this.isWord = false;
            children = new TrieNode[26];
        }
    }
    class Trie {
        TrieNode root;
        public Trie() {
            this.root = new TrieNode(' ');
        }
        public void insert(String s) {
            TrieNode cur = root;
            char[] arr = s.toCharArray();
            for (int i = 0; i < arr.length; i++) {
                int index = arr[i] - 'a';
                if (cur.children[index] == null)    cur.children[index] = new TrieNode(arr[i]);
                cur = cur.children[index];
            }
            cur.isWord = true;
        }
        public boolean search(String s) {
            TrieNode cur = root;
            char[] arr = s.toCharArray();
            for (int i = 0; i < arr.length; i++) {
                int index = arr[i]-'a';
                if (cur.children[index] == null)    return false;
                cur = cur.children[index];
            }
            return cur.isWord;
        }
    }
https://discuss.leetcode.com/topic/75089/102ms-java-trie-dfs-solution-with-explanation-easy-to-understand
also https://discuss.leetcode.com/topic/72727/java-simple-150ms-trie-dfs-solution-with-some-comments
public List<String> findAllConcatenatedWordsInADict(String[] words) {
        List<String> res = new ArrayList<String>();
        if (words == null || words.length == 0) {
            return res;
        }
        TrieNode root = new TrieNode();
        for (String word : words) { // construct Trie tree
            if (word.length() == 0) {
                continue;
            }
            addWord(word, root);
        }
        for (String word : words) { // test word is a concatenated word or not
            if (word.length() == 0) {
                continue;
            }
            if (testWord(word.toCharArray(), 0, root, 0)) {
                res.add(word);
            }
        }
        return res;
    }
    public boolean testWord(char[] chars, int index, TrieNode root, int count) { // count means how many words during the search path
        TrieNode cur = root;
        int n = chars.length;
        for (int i = index; i < n; i++) {
            if (cur.sons[chars[i] - 'a'] == null) {
                return false;
            }
            if (cur.sons[chars[i] - 'a'].isEnd) {
                if (i == n - 1) { // no next word, so test count to get result.
                    return count >= 1;
                }
                if (testWord(chars, i + 1, root, count + 1)) {
                    return true;
                }
            }
            cur = cur.sons[chars[i] - 'a'];
        }
        return false;
    }
    public void addWord(String str, TrieNode root) {
        char[] chars = str.toCharArray();
        TrieNode cur = root;
        for (char c : chars) {
            if (cur.sons[c - 'a'] == null) {
                cur.sons[c - 'a'] = new TrieNode();
            }
            cur = cur.sons[c - 'a'];
        }
        cur.isEnd = true;
    }
}
class TrieNode {
    TrieNode[] sons;
    boolean isEnd;
    public TrieNode() {
        sons = new TrieNode[26];
    }
http://www.cnblogs.com/grandyang/p/6254527.html
https://yijiajin.github.io/leetcode/2016/12/27/Leetcode-Trie-Hard/
首先构造Trie,经典的写法。再对每个word由root开始进行搜索。搜索的过程中记录已经找到的word数目。如果找到了某个单词,就从下一个char开始查找,substring包含>=2个单词就加入结果中。
O(n*k^2)
- Code is too complex

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts