Find the shortest unique prefix - EPI


https://www.geeksforgeeks.org/find-all-shortest-unique-prefixes-to-represent-each-word-in-a-given-list/
Given an array of words, find all shortest unique prefixes to represent each word in the given array. Assume that no word is prefix of another.
Examples:
Input: arr[] = {"zebra", "dog", "duck", "dove"}
Output: dog, dov, du, z
Explanation: dog => dog
             dove = dov 
             duck = du
             z   => zebra 

Simple Solution is to consider every prefix of every word (starting from the shortest to largest), and if a prefix is not prefix of any other string, then print it.
    static class TrieNode
    {
        TrieNode[] child = new TrieNode[MAX];
        int freq;  // To store frequency
        TrieNode() {
            freq =1;
            for (int i = 0; i < MAX; i++)
                child[i] = null;
        }
    }
    static TrieNode root;
      
    // Method to insert a new string into Trie
    static void insert(String str)
    {
        // Length of the URL
        int len = str.length();
        TrieNode pCrawl = root;
       
        // Traversing over the length of given str.
        for (int level = 0; level<len; level++)
        {
            // Get index of child node from current character
            // in str.
            int index = str.charAt(level);
       
            // Create a new child if not exist already
            if (pCrawl.child[index] == null)
                pCrawl.child[index] = new TrieNode();
            else
               (pCrawl.child[index].freq)++;
       
            // Move to the child
            pCrawl = pCrawl.child[index];
        }
    }
       
    // This function prints unique prefix for every word stored
    // in Trie. Prefixes one by one are stored in prefix[].
    // 'ind' is current index of prefix[]
    static void findPrefixesUtil(TrieNode root, char[] prefix,
                          int ind)
    {
        // Corner case
        if (root == null)
           return;
       
        // Base case
        if (root.freq == 1)
        {
           prefix[ind] = '\0';
           int i = 0;
           while(prefix[i] != '\0')
            System.out.print(prefix[i++]);
           System.out.print("  ");
           return;
        }
       
        for (int i=0; i<MAX; i++)
        {
           if (root.child[i] != null)
           {
               prefix[ind] = (char) i;
               findPrefixesUtil(root.child[i], prefix, ind+1);
           }
        }
    }
       
    // Function to print all prefixes that uniquely
    // represent all words in arr[0..n-1]
    static void findPrefixes(String arr[], int n)
    {
        // Construct a Trie of all words
        root = new TrieNode();
        root.freq = 0;
        for (int i = 0; i<n; i++)
            insert(arr[i]);
       
        // Create an array to store all prefixes
        char[] prefix = new char[MAX_WORD_LEN];
          
        // Print all prefixes using Trie Traversal
        findPrefixesUtil(root, prefix, 0);
    }

看到prefix肯定想到的是trie咯,在trie node里面多加一个field count,表示
这个字符出现多少次
1,insert word into the trie
2, search the word, 找到第一个count为1的node,返回
因为说明这个node下面没有分支了,他就应该是唯一的
    public static final int R = 256;
    private Node root;
     
    private class Node{
        private int count;
        private boolean isEnd;
        private Node next[] = new Node[R];
         
        public Node(){
            count = 0;
            isEnd = false;
        }
 
        public Node(int count, boolean isEnd){
            this.count = count;
            this.isEnd = isEnd;
        }
    }
     
    public void insert(String str){
        if(root == null) root = new Node();
        Node curr = root;
        for(int i = 0; i < str.length(); i++){
            char c = str.charAt(i);
            if(curr.next[c] == null){
                curr.next[c] = new Node(1, false);
            }else{
                curr.next[c].count++;
            }
            curr = curr.next[c];
        }
        curr.isEnd = true;
    }
     
    public String shortestPrefix(String str){
        Node curr = root;
        int len = 0;
        for(int i = 0; i < str.length(); i++){
            char c = str.charAt(i);
            curr = curr.next[c];
            len++;
            if(curr.count == 1) break;
        }
        return str.substring(0, len);
    }
     
    public static void main(String[] args){
        TriePrefix trie = new TriePrefix();
        String[] words = {"by", "sea", "sells", "she", "shells", "shore", "the"};
        String[] words1 = {"zebra", "dog", "duck", "dove"};
        for(String word : words1){
            trie.insert(word);
        }
        String[] res = new String[words1.length];
        for(int i = 0; i < words1.length; i++){
            res[i] = trie.shortestPrefix(words1[i]);
            System.out.println(res[i]);
        }
    }
In this post a sorting based approach is discussed. On comparing the string with 2 other most similar strings in the array, we can find its shortest unique prefix. For example, if we sort the array {“zebra”, “dog”, “duck”, “dove”}, we get {“dog”, “dove”, “duck”, “zebra”}. The shortest unique prefix for the string “dove” can be found as:
Compare “dove” to “dog” –> unique prefix for dove is “dov”
Compare “dove” to “duck” –> unique prefix for dove is “do”
Now, the shortest unique prefix for “dove” is the one with greater length between “dov” and “do”. So, it is “dov”.
The shortest unique prefix for first and last string can be found by comparing them with only 1 most similar neighbor on right and left, respectively.
We can sort the array of strings and keep on doing this for every string of the array.
If we want to output the prefixes as the order of strings in the input array, we can store the string and its corresponding index in the hashmap. While adding the prefix to result array, we can get the index of the corresponding string from hashmap and add the prefix to that index.
    public String[] uniquePrefix(String[] a)
    {
        int size = a.length;
  
        /* create an array to store the results */
        String[] res = new String[size];
  
        Arrays.fill(res, "");
  
        /* hashmap to store the indexes */
        HashMap<String, Integer> hm =
                        new HashMap<String, Integer>();
  
        for (int i = 0; i < size; i++)
            hm.put(a[i], i);
  
        /* sort the array of strings */
        Arrays.sort(a);
  
        /* compare the first string with its only right neighbor */
        int j = 0;
        while (j < Math.min(a[0].length()-1, a[1].length()-1))
        {
            if (a[0].charAt(j) == a[1].charAt(j))
                j++;
            else
                break;
        }
  
        /* get the index of a[0] from HashMap */
        res[ hm.get(a[0]) ] = a[0].substring(0, j+1);
  
        /* Store the unique prefix of a[1] from its
          left neighbor */
        String temp_prefix= a[1].substring(0, j+1);
  
        for (int i = 1; i < size-1; i++)
        {
            /* compute prefix of a[i] unique from its right neighbor */
            j = 0;
            while (j < Math.min(a[i].length()-1, a[i+1].length()-1))
            {
                if (a[i].charAt(j) == a[i+1].charAt(j) )
                    j++;
                else
                    break;
            }
  
            String new_prefix = a[i].substring(0, j+1);
  
            /* compare the new prefix with previous prefix */
            if (temp_prefix.length() > new_prefix.length() )
                res[ hm.get(a[i]) ] = temp_prefix;
            else
                res[ hm.get(a[i]) ] = new_prefix;
  
            /* store the prefix of a[i+1] unique from its
               left neighbour */
            temp_prefix = a[i+1].substring(0, j+1);
  
        }
  
        /* compute the unique prefix for the last string
           in sorted array */
        String sec_last = a[size-2];
        String last = a[size-1];
  
        j = 0;
        while (j < Math.min( sec_last.length()-1, last.length()-1))
        {
            if (sec_last.charAt(j) == last.charAt(j) )
                j++;
            else
                break;
        }
  
        res[ hm.get(last) ] = last.substring(0, j+1);
  
        return res;
    }
http://www.codebytes.in/2014/10/finding-shortest-prefixes-for-strings.html
[We are using two arrays, the one given and the one we'll be using for the prefixes]

1. Make another array for prefixes. Store first characters of original strings in this array.
2. For all the prefixes to the left of it (in the prefix array), check whether this prefix has been used somewhere.
3. If it has been used, check whether you can add on character to the previous duplicate prefix, if not, add one character to the one that is being checked for duplicates.
4. Follow the same procedure for this new updated prefix (whether it was at some previous location in the array or the current one that was being checked, which is now updated)[Recursion]

    public static void checkPrefix(String[] strings, String[] pre, String s, int index){
        
        //System.out.println(Arrays.toString(pre));
        for(int i=index-1;i>=0;--i){
            if(s.matches(pre[i])){ 

                if(s.length()==strings[i].length()){
                    //System.out.println("Can't update the previous one, need to update this one");
                    pre[index] = strings[index].substring(0, s.length()+1);
                    checkPrefix(strings, pre, pre[index], index);
                    return;
                }
                else if(s.length() < strings[i].length()){ 
                    //System.out.println("Can update the previous one");
                    pre[i] = strings[i].substring(0, s.length()+1);
                    checkPrefix(strings, pre, pre[i], i);
                    return;
                } 
            }
        }
    }
    
    public static void findPrefixes(String[] strings){
        System.out.println();
        String[] pre = new String[strings.length];
         
        for(int i=0;i<pre.length;++i){
            if(i>0){
                if(strings[i].matches(strings[i-1])){
                    System.out.println("Duplicate string - Error!"); continue; 
                }
            }
            pre[i]=Character.toString(strings[i].charAt(0)); 
            checkPrefix(strings, pre, pre[i], i);
        } 
        System.out.println(Arrays.toString(strings));
        System.out.println(Arrays.toString(pre));
    }

ShortestUniquePrefix.java
  private static String randString(int len) {
    Random r = new Random();
    StringBuilder ret = new StringBuilder(len);
    while (len-- > 0) {
      ret.append((char) (r.nextInt(26) + 'a'));
    }
    return ret.toString();
  }

  public static String findShortestPrefix(String s, Set<String> D) {
    // Builds a trie according to given dictionary D.
    Trie T = new Trie();
    for (String word : D) {
      T.insert(word);
    }
    return T.getShortestUniquePrefix(s);
  }

  public static class Trie {
    private static class TrieNode {
      private boolean isString = false;
      private Map<Character, TrieNode> leaves = new HashMap<>();

      public boolean getIsString() {
        return isString;
      }

      public void setIsString(boolean string) {
        isString = string;
      }

      public Map<Character, TrieNode> getLeaves() {
        return leaves;
      }
    }

    private TrieNode root = new TrieNode();

    public boolean insert(String s) {
      TrieNode p = root;
      for (char c : s.toCharArray()) {
        if (!p.getLeaves().containsKey(c)) {
          p.getLeaves().put(c, new TrieNode());
        }
        p = p.getLeaves().get(c);
      }

      // s already existed in this trie.
      if (p.getIsString()) {
        return false;
      } else { // p.getIsString() == false
        p.setIsString(true); // Inserts s into this trie.
        return true;
      }
    }

    public String getShortestUniquePrefix(String s) {
      TrieNode p = root;
      StringBuilder prefix = new StringBuilder();
      for (char c : s.toCharArray()) {
        prefix.append(c);
        if (!p.getLeaves().containsKey(c)) {
          return prefix.toString();
        }
        p = p.getLeaves().get(c);
      }
      return "";
    }
  }

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts