Massive Algorithms: LeetCode 411 - Minimum Unique Word Abbreviation

LeetCode 411 - Minimum Unique Word Abbreviation

http://bookshadow.com/weblog/2016/10/02/leetcode-minimum-unique-word-abbreviation/

A string such as "word" contains the following abbreviations:

["word", "1ord", "w1rd", "wo1d", "wor1", "2rd", "w2d", "wo2", "1o1d", "1or1", "w1r1", "1o2", "2r1", "3d", "w3", "4"]

Given a target string and a set of strings in a dictionary, find an abbreviation of this target string with the smallest possible length such that it does not conflict with abbreviations of the strings in the dictionary.

Each number or letter in the abbreviation is considered length = 1. For example, the abbreviation "a32bc" has length = 4.

Note:

In the case of multiple answers as shown in the second example below, you may return any one of them.
Assume length of target string = m, and dictionary size = n. You may assume that m ≤ 21, n ≤ 1000, and log2(n) + m ≤ 20.

Examples:

"apple", ["blade"] -> "a4" (because "5" or "4e" conflicts with "blade")

"apple", ["plain", "amber", "blade"] -> "1p3" (other valid answers include "ap3", "a3e", "2p2", "3le", "3l1").

X. Bit mask
https://www.cnblogs.com/EdwardLiu/p/6196177.html

Refer to https://discuss.leetcode.com/topic/61799/java-bit-mask-dfs-with-pruning

bit mask refer to http://bookshadow.com/weblog/2016/10/02/leetcode-minimum-unique-word-abbreviation/

1. The key idea of my solution is to preprocess the dictionary to transfer all the words to bit sequences (int):

比如apple 和 amper 字母相同设1,不同设0,所以得到10100

又比如target='apple'，word='amble'，则word的bitmask为10011

在这个过程中ignore与target长度不一样的，得到每个字典里面的bitmask，下面称呼它为mask，所以mask里为1的位表示与target字母相同

a p p l e
a m b l e
1 0 0 1 1

2. 开始缩写target单词，只不过我们不直接缩写成十进制数，而是先缩写成二进制，1表示保留字母，0表示替换为数字。下面称呼这个当前的缩写结果为curResult, 因为curResult由target缩写而来，所以curResult里为1的位与target字母相同

例如单词manipulation的缩写m2ip6n可以转化为100110000001

m a n i p u l a t i o n
m  2  i p      6      n
1 0 0 1 1 0 0 0 0 0 0 1

3. mask里为1的位表示与target字母相同，同时curResult里为1的位也与target字母相同，如果

curResult & mask == curResult,

则说明curResult里为1的位子上，mask也一定是1，那么curResult也一定是mask所对应的那个string的一个缩写，所以这里conflict出现了，所以这个curResult要被skip掉

4. 在所有没有conflict的curResult里面，找到长度最短的一个，如何找到长度最短，可以在recursion里面维护一个当前curResult长度大小的变量，同时有一个变量保存最小长度用以更新

5. 最后将长度最短那个curResult(它目前只是一个二进制缩写)，复原成十进制缩写

  2     private int minLen;
  3     private int result;
  4     
  5     public String minAbbreviation(String target, String[] dictionary) {
  6         // only keep words whose length == target in new dict, then compute their bit masks
  7         Set<Integer> maskSet = new HashSet<>();
  8         for(String s: dictionary){
  9             if(target.length() == s.length()){
 10                 maskSet.add(getBitMask(s,target));
 11             }
 12         }
 13     
 14         // dfs with pruning
 15         minLen = target.length()+1;
 16         result = -1;
 17     
 18         dfs(target,maskSet,0,0,0);
 19     
 20         if(minLen > target.length()){
 21             return "";
 22         }
 23     
 24         // convert result to word
 25         int zeroCnt = 0;
 26         String res = "";
 27         for (int i = target.length()-1; i>=0; i--) {
 28             //遇到0要累加连续零个数,遇到1填原char
 29             int digit = (result & 1);
 30             if(digit == 0){
 31                 ++zeroCnt;
 32             } else {
 33                 if(zeroCnt > 0){
 34                     res = zeroCnt + res;
 35                     zeroCnt =0;
 36                 }
 37                 res = target.charAt(i) + res;
 38             }
 39             result >>= 1;
 40         }
 41         if(zeroCnt > 0) res = zeroCnt + res;
 42         return res;
 43     }
 44     
 45     /**
 46      *
 47      * @param target
 48      * @param maskSet masks of words in dict
 49      * @param start idx at target
 50      * @param curLen current abbr's length
 51      */
 52     private void dfs(String target,Set<Integer> maskSet,int start,int curLen,int curResult){
 53         // pruning, no need to continue, already not min length
 54         if(curLen >= minLen) return;
 55     
 56         if(start == target.length()){
 57             // check whether curResult mask conflicts with words in dict
 58             for(int mask:maskSet){
 59                 /**
 60                  * 单词manipulation的缩写m2ip6n可以转化为100110000001
 61                  *  m a n i p u l a t i o n
 62                     m  2  i p      6      n
 63                     1 0 0 1 1 0 0 0 0 0 0 1
 64                  * 0代表随意不care,如果这个mask和dict中某个mask的所有1重合代表在意的位置完全相同,
 65                  * 说明这个mask和dict中那个词冲突
 66                  * 我们要找的是不冲突的mask
 67                  */
 68                 if((curResult & mask) == curResult){
 69                     return; // conflict
 70                 }
 71             }
 72             // no conflict happens, can use
 73             if(minLen > curLen){
 74                 minLen = curLen;
 75                 result = curResult;
 76             }
 77             return;
 78         }
 79     
 80         // case 1: replace chars from start in target with number
 81         for (int i = start; i < target.length(); i++) {
 82             //被replace掉的char位置由0代替所以是curResult<<(i+1-start),没replace掉的这里不管,我们只管到i,之后的由backtrack内决定
 83             //注意:不允许word => w11d这种用数字代替但含义不同
 84             if(curLen == 0 || (curResult &1) == 1){
 85                 //后者即上一次是保留了字母
 86                 dfs(target,maskSet,i+1,curLen+1,curResult<<(i+1-start));
 87             }
 88         }
 89     
 90         // case 2: no replace from start (curResult << 1)+1代表新的这位保留了char,所以是加一
 91         dfs(target,maskSet,start+1,curLen+1,(curResult << 1)+1);
 92     }
 93     
 94     // 比如apple 和 amper 字母相同设1,不同设0,所以得到10100
 95     private int getBitMask(String s1,String s2){
 96         int mask = 0;
 97         for (int i = 0; i < s1.length(); i++) {
 98             mask <<= 1;
 99             if(s1.charAt(i) == s2.charAt(i)){
100                 mask += 1;
101             }
102         }
103         return mask;
104     }

X. Priority Queue
https://cheonhyangzhang.gitbooks.io/leetcode-solutions/content/411-minimum-unique-word-abbreviation.html
http://www.cnblogs.com/grandyang/p/5935836.html
这道题实际上是之前那两道Valid Word Abbreviation和Generalized Abbreviation的合体，我们的思路其实很简单，首先找出target的所有的单词缩写的形式，然后按照长度来排序，小的排前面，我们用优先队列来自动排序，里面存一个pair，保存单词缩写及其长度，然后我们从最短的单词缩写开始，跟dictionary中所有的单词一一进行验证，利用Valid Word Abbreviation中的方法，看其是否是合法的单词的缩写，如果是，说明有冲突，直接break，进行下一个单词缩写的验证
https://discuss.leetcode.com/topic/61327/brute-force-solution-just-check-every-possible-abbreviation/3

Use the approach of “320. Generalized Abbreviation” to generate all abbreviations of “target”;
Put all the abbreviations into a PriorityQueue according to the length of the abbreviations;
With each abbreviation, check whether it’s an abbreviation of any word in the dictionary using the approach of “408. Valid Word Abbreviation”.

    public class Abbreviation{
        public String abbr;
        public int len;
        
        public Abbreviation(String abbr, int len){
            this.abbr = abbr;
            this.len = len;
        }
    }
    
    public String minAbbreviation(String target, String[] dictionary) {
        if(dictionary.length == 0) return Integer.toString(target.length());
        PriorityQueue<Abbreviation> q = new PriorityQueue<Abbreviation>(new Comparator<Abbreviation>(){
           public int compare(Abbreviation a1, Abbreviation a2){
               return a1.len - a2.len;
           } 
        });
        generateAbbrs(q, target, "", 0, 0, false);
        System.out.println(q.size());
        while(!q.isEmpty()){
            String abbr = q.poll().abbr;
            boolean notMatch = true;
            for(int i=0; i<dictionary.length; i++){
                if(isValidAbbr(dictionary[i], abbr)){
                    notMatch = false;
                    break;
                }
            }
            if(notMatch) return abbr;
        }
        return "";
    }
    
    private void generateAbbrs(PriorityQueue<Abbreviation> q, String target, String path, int start, int len, boolean prevAbbr){
        if(start == target.length()){
            q.offer(new Abbreviation(path, len));
            return;
        }
        generateAbbrs(q, target, path+target.charAt(start), start+1, len+1, false);
        if(!prevAbbr){
            for(int i=start; i<target.length(); i++){
                String str = target.substring(start, i+1);
                generateAbbrs(q, target, path+Integer.toString(str.length()), i+1, len+1, true);
            }
        }
    }
    
    private boolean isValidAbbr(String word, String abbr){
        int index = 0, i = 0;
        while(i < abbr.length()){
            if(!Character.isDigit(abbr.charAt(i))){
                if(index >= word.length() || word.charAt(index) != abbr.charAt(i)) return false;
                index++; i++;
            }else{
                int start = i;
                while(i < abbr.length() && Character.isDigit(abbr.charAt(i))) i++;
                int number = Integer.parseInt(abbr.substring(start, i));
                index += number;
            }
        }
        return index == word.length();
    }

https://discuss.leetcode.com/topic/61346/trie-bruteforce

Abbreviation number is pretty like wild card and it can match all the characters appearing in the trie.
There's 3 functions:
addTrie: add string to the trie
search: search a string to determine if that's the one in the trie (wild card mode)
abbrGenerator: generate all the possible abbreviations given certain length (which is num parameter).

class Trie{
        Trie[] next = new Trie[26];
        boolean isEnd = false;
    }
    Trie root = new Trie();
    List<String> abbrs;
    public String minAbbreviation(String target, String[] dictionary) {
        for(String s:dictionary) {
            addTrie(s);
        }
        for(int i=0; i<target.length(); i++) {
            abbrs = new ArrayList<>();
            abbrGenerator(target, 0, "", 0, i+1);
            for(String s:abbrs) {
                if(search(s, root, 0, 0)==false) return s;
            }
        }
        return "";
    }
    public void addTrie(String s) {
        Trie cur = root;
        for(int i=0; i<s.length(); i++) {
            char c = s.charAt(i);
            if(cur.next[c-'a']==null) {
                cur.next[c-'a']=new Trie();
            }
            cur = cur.next[c-'a'];
        }
        cur.isEnd = true;
    }
    public boolean search(String target, Trie root, int i, int loop) {
        if(root==null) return false;

        if(loop!=0) {
            for(int a=0; a<26; a++) {
                if(search(target, root.next[a], i, loop-1)) return true;
            }
            return false;
        }
        if(i==target.length()) {
            if(root.isEnd) return true;
            return false;
        }
        if(Character.isDigit(target.charAt(i))) {
            int tmp = 0;
            while(i<target.length()&&Character.isDigit(target.charAt(i))) {
                tmp = tmp*10 + target.charAt(i)-'0';
                i++;
            }
            return search(target, root, i, tmp);
        } else {
            return search(target, root.next[target.charAt(i)-'a'], i+1, 0);
        }
    }
    public void abbrGenerator(String target, int i, String tmp, int abbr, int num) {
        if(i==target.length()) {
            if(num==0&&abbr==0) abbrs.add(tmp);
            if(num==1&&abbr!=0) abbrs.add(tmp+abbr);
            return;
        }
        if(num<=0) return;
        char cur = target.charAt(i);
        abbrGenerator(target, i+1, abbr==0?tmp+cur:tmp+abbr+cur, 0, abbr==0?num-1:num-2);
        abbrGenerator(target, i+1, tmp, abbr+1, num);
    }

X. 深度优先搜索（DFS） + 剪枝
https://blog.csdn.net/magicbean2/article/details/78261631
题目的基本思路是深度优先搜索（DFS），但是单纯的DFS会比较耗时，可以在DFS的过程中加入有效剪枝来提高效率。我们首先将dictionary中长度与target不同的单词去掉，以提高效率（因为长度不同的单词其缩写不可能发生冲突）。DFS从空字符串开发，逐一增加target中的字母，并尝试越过一些字母，插入数字。遍历dictionary中的单词，检查冲突，如果不存在冲突，则递归，并更新最优解。

怎么剪枝呢？可以用一个变量来记录当前时刻的最优的单词缩写长度，若DFS分支的长度大于该长度，则进行剪枝。

http://www.voidcn.com/article/p-vuhpqtnm-bn.html

    string minAbbreviation(string target, vector<string>& dictionary) {
        int len=target.size();
        if(len==0) return "";
        vector<string> dic;
        for(string s:dictionary) {
            if(s.size()==len) {
                dic.push_back(s);
            }
        }
        int len_d=dic.size();
        if(len_d==0) return intTostring(len);
        string res=target;
        dfs("",0,target,0,dic,res,len);
        return res;
    }

    void dfs(string cur,int cur_len,string& target,int pos,vector<string>&dic,string&res,int& minlen) {
        if(pos>=(int)target.size()) {
            if(cur_len<minlen) {  //剪枝
                bool f=true;
                for(string s:dic) {
                    if(check(s,cur)) {
                        f=false;break;
                    }
                }
                if(f){
                    res=cur;
                    minlen=cur_len;

                }
            }
            return;
        }
        if(minlen==cur_len) return;   //剪枝
        if(cur.empty()||!isdigit(cur.back())) {   //
            for(int i=target.size()-1;i>=pos;i--) {
                 string add=intTostring(i-pos+1);
                 dfs(cur+add,cur_len+1,target,i+1,dic,res,minlen);
            }
        }
        dfs(cur+target[pos],cur_len+1,target,pos+1,dic,res,minlen);
    }

    bool check(string s1,string s2) {
        int len1=s1.size();
        int len2=s2.size();
        int l=0,r=0;
        while(l<len1&&r<len2) {
            if(isdigit(s2[r])) {
                int dis=0;
                while(r<len2&&isdigit(s2[r])) {
                    dis=dis*10+s2[r++]-'0';
                }
                l+=dis;
            }
            else if(s2[r]=='0') return false;
            else {
                if(s1[l]==s2[r]) {
                    l++;r++;
                }
                else return false;
            }
         }
         if(l>=len1&&r>=len2) return true;
         return false;
    }

    string intTostring(int num) {
        stringstream ss;
        ss<<num;
        return ss.str();
    }

http://bookshadow.com/weblog/2016/10/02/leetcode-minimum-unique-word-abbreviation/

与上述解法同样的思路，可以将单词缩写的冲突检测用位运算实现。

将字典dictionary中的单词word通过如下函数转化为数字：

def toNumber(self, target, word):
    ans = 0
    for x in range(self.size):
        ans <<= 1
        ans += target[x] == word[x]
    return ans

例如假设target='apple'，word='amble'，则word的二进制数字为10011

a p p l e
a m b l e
1 0 0 1 1

单词缩写abbr通过如下原则转化为二进制数字：

缩写中的字母替换为二进制1，数字覆盖的位数替换为二进制0

例如单词manipulation的缩写m2ip6n可以转化为100110000001

m a n i p u l a t i o n
m  2  i p      6      n
1 0 0 1 1 0 0 0 0 0 0 1

检测单词缩写abbr与字典dicitonary中是否存在冲突通过下列函数实现：

def checkNumber(self, number):
    for w in self.wlist:
        if number & w == number:
            return False
    return True

将单词缩写abbr利用二进制转化为数字形式，例如"w3d"可以视为二进制数10001，即十进制17。

检测单词缩写abbr与字典dictionary中的单词d是否存在冲突，可以通过如下方式判断：

for d in dictionary:
    abbr & d == abbr

若循环中存在上式为真的情形，说明存在冲突。

DFS从target出发，逐一去除字母，检测冲突，若不存在冲突，则递归，并更新最优解。

剪枝策略：

利用变量len记录当前时刻的最优的单词缩写长度，若DFS分支的长度大于len，则可剪枝

def minAbbreviation(self, target, dictionary): """ :type target: str :type dictionary: List[str] :rtype: str """ self.size = len(target) self.wlist = [self.toNumber(target, d) \ for d in dictionary \ if len(d) == self.size] self.ans = (1 << self.size) - 1 self.len = self.size + 1 self.vset = set([self.ans]) self.dfs(self.ans, self.size) return self.toWord(self.ans) def dfs(self, number, depth): if depth >= self.len: return if not self.checkNumber(number): return self.ans = number self.len = depth for x in range(self.size): if (number >> x) & 1: next = number ^ (1 << x) if next not in self.vset: self.vset.add(next) self.dfs(next, depth - 1) def toNumber(self, target, word): ans = 0 for x in range(self.size - 1, -1, -1): ans <<= 1 ans += target[x] == word[x] return ans def toWord(self, number): ans = '' cnt = 0 for x in range(self.size): if number & 1: if cnt: ans += str(cnt) cnt = 0 ans += target[x] else: cnt += 1 number >>= 1 if cnt: ans += str(cnt) return ans def checkNumber(self, number): for w in self.wlist: if number & w == number: return False return True

X. Use Trie
https://github.com/fishercoder1534/Leetcode/blob/master/src/main/java/com/fishercoder/solutions/_411.java

class Trie {

Trie[] children = new Trie[26];

boolean isWord = false;

}

Trie root = new Trie();

List<String> abbrs;

public String minAbbreviation(String target, String[] dictionary) {

for (String s : dictionary) {

addTrie(s);

}

for (int i = 0; i < target.length(); i++) {

abbrs = new ArrayList<>();

abbrGenerator(target, 0, "", 0, i + 1);

for (String s : abbrs) {

if (search(s, root, 0, 0) == false) {

return s;

}

return "";

}

public void addTrie(String s) {

Trie cur = root;

for (int i = 0; i < s.length(); i++) {

char c = s.charAt(i);

if (cur.children[c - 'a'] == null) {

cur.children[c - 'a'] = new Trie();

}

cur = cur.children[c - 'a'];

}

cur.isWord = true;

}

public boolean search(String target, Trie root, int i, int loop) {

if (root == null) {

return false;

}

if (loop != 0) {

for (int a = 0; a < 26; a++) {

if (search(target, root.children[a], i, loop - 1)) {

return true;

}

return false;

}

if (i == target.length()) {

if (root.isWord) {

return true;

}

return false;

}

if (Character.isDigit(target.charAt(i))) {

int tmp = 0;

while (i < target.length() && Character.isDigit(target.charAt(i))) {

tmp = tmp * 10 + target.charAt(i) - '0';

i++;

}

return search(target, root, i, tmp);

} else {

return search(target, root.children[target.charAt(i) - 'a'], i + 1, 0);

}

public void abbrGenerator(String target, int i, String tmp, int abbr, int num) {

if (i == target.length()) {

if (num == 0 && abbr == 0) {

abbrs.add(tmp);

}

if (num == 1 && abbr != 0) {

abbrs.add(tmp + abbr);

}

return;

}

if (num <= 0) {

return;

}

char cur = target.charAt(i);

abbrGenerator(target, i + 1, abbr == 0 ? tmp + cur : tmp + abbr + cur, 0, abbr == 0 ? num - 1 : num - 2);

abbrGenerator(target, i + 1, tmp, abbr + 1, num);

}

X. Trie + BiSection
https://discuss.leetcode.com/topic/61982/java-dfs-trie-binary-search-90ms
https://scottduan.gitbooks.io/leetcode-review/content/minimum_unique_word_abbreviation.html
https://discuss.leetcode.com/topic/61346/trie-bruteforce

Use Trie to build a dictionary with a function to check abbreviation.
Use DFS with backtracking to generate the abbreviations of a given length.
Use binary search to find the smallest possible length.

-- binary search(bisection) seems good idea

-- seem not really good to use trie to check whether it's abbr.

    class Node{ // Trie Node
        Node[] nodes;
        boolean isWord;
        Node(){
            nodes = new Node[26];
            isWord = false;
        }
        void add(String str){ // add a word to Trie
            if (str.length() == 0) isWord=true; // end of a word
            else {
                int idx = str.charAt(0)-'a'; // insert a new node
                if (nodes[idx] == null) nodes[idx] = new Node();
                nodes[idx].add(str.substring(1));
            }
        }
        boolean isAbbr(String abbr, int num){
            if ( num > 0){ // number of '*'
                for (Node node : nodes){ 
                    if (node != null && node.isAbbr(abbr, num-1)) return true; 
                }
                return false; // not exist in the dictionary
            } else {
                if (abbr.length()==0) return isWord; // at the end of the addr
                int idx=0; // get the number of '*' at the start of the abbr
                while (idx < abbr.length() && abbr.charAt(idx) >='0' && abbr.charAt(idx) <='9' ) {
                    num = (num*10) + (abbr.charAt(idx++)-'0'); 
                }
                if (num>0) return isAbbr(abbr.substring(idx),num); // start with number
                else { // start with non-number
                    if (nodes[abbr.charAt(0)-'a'] != null )   
                        return nodes[abbr.charAt(0)-'a'].isAbbr(abbr.substring(1), 0);
                    else return false; // not exist in the dictionary 
                }
            }
        }
    }
    
    void getAbbs(char[] cc, int s, int len, StringBuilder sb, List<String> abbs){ //DFS with backtracking
        boolean preNum = (sb.length() > 0 ) && (sb.charAt(sb.length()-1) >= '0') && (sb.charAt(sb.length()-1) <= '9');
        if (len == 1)  { 
            if ( s  < cc.length) {
                if (s==cc.length-1) abbs.add(sb.toString() + cc[s]); // add one char
                if (! preNum ) abbs.add(sb.toString() + (cc.length-s) ); // add a number
            }
        } else if (len > 1 ) {
            int last = sb.length();
            for (int i=s+1; i < cc.length; i++ ){
                if (! preNum) { // add a number
                    sb.append(i-s);
                    getAbbs(cc, i, len-1, sb, abbs);
                    sb.delete(last, sb.length());
                }
                if (i==s+1) { // add one char
                    sb.append(cc[s]);
                    getAbbs(cc, i, len-1, sb, abbs);
                    sb.delete(last, sb.length());
                }
            }
        }
    }
    
    public String minAbbreviation(String target, String[] dictionary) {
        List<String> dict = new ArrayList();
        int len = target.length();
        for (String str : dictionary) if (str.length() == len ) dict.add(str);
        if (dict.isEmpty()) return ""+len;
        Node root = new Node();
        for (String str : dict) root.add(str);
        char[] cc = target.toCharArray();
        String ret = null;

        int min = 1, max = len; 
        while (max >= min) {
            int mid = min+( (max-min)/2 );
            List<String> abbs = new ArrayList();
            getAbbs(cc, 0, mid, new StringBuilder(), abbs);
            boolean conflict = true;
            for (String abbr: abbs){
                if ( ! root.isAbbr(abbr,0) ) {
                    conflict = false;
                    ret = abbr;
                    break;
                } 
            }
            if (conflict) {
                min = mid+1;
            } else {
                max = mid-1;
            }
        }
        return ret;
    }

https://www.jiuzhang.com/qa/1176/

LeetCode 411 - Minimum Unique Word Abbreviation

Labels

Popular Posts