LeetCode 467 - Unique Substrings in Wraparound String


https://leetcode.com/problems/unique-substrings-in-wraparound-string/
Consider the string s to be the infinite wraparound string of "abcdefghijklmnopqrstuvwxyz", so s will look like this: "...zabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcd....".
Now we have another string p. Your job is to find out how many unique non-empty substrings of p are present in s. In particular, your input is the string p and you need to output the number of different non-empty substrings of p in the string s.
Note: p consists of only lowercase English letters and the size of p might be over 10000.
Example 1:
Input: "a"
Output: 1

Explanation: Only the substring "a" of string "a" is in the string  s.
Example 2:
Input: "cac"
Output: 2
Explanation: There are two substrings "a", "c" of string "cac" in the string s.
Example 3:
Input: "zab"
Output: 6
Explanation: There are six substrings "z", "a", "b", "za", "ab", "zab" of string "zab" in the string s.

X.
https://leetcode.com/problems/unique-substrings-in-wraparound-string/discuss/95439/Concise-Java-solution-using-DP
The idea is, if we know the max number of unique substrings in p ends with 'a', 'b', ..., 'z', then the summary of them is the answer. Why is that?
  1. The max number of unique substring ends with a letter equals to the length of max contiguous substring ends with that letter. Example "abcd", the max number of unique substring ends with 'd' is 4, apparently they are "abcd", "bcd", "cd" and "d".
  2. If there are overlapping, we only need to consider the longest one because it covers all the possible substrings. Example: "abcdbcd", the max number of unique substring ends with 'd' is 4 and all substrings formed by the 2nd "bcd" part are covered in the 4 substrings already.
  3. No matter how long is a contiguous substring in p, it is in s since s has infinite length.
  4. Now we know the max number of unique substrings in p ends with 'a', 'b', ..., 'z' and those substrings are all in s. Summary is the answer, according to the question.
    public int findSubstringInWraproundString(String p) {
        // count[i] is the maximum unique substring end with ith letter.
        // 0 - 'a', 1 - 'b', ..., 25 - 'z'.
        int[] count = new int[26];
        
        // store longest contiguous substring ends at current position.
        int maxLengthCur = 0; 

        for (int i = 0; i < p.length(); i++) {
            if (i > 0 && (p.charAt(i) - p.charAt(i - 1) == 1 || (p.charAt(i - 1) - p.charAt(i) == 25))) {
                maxLengthCur++;
            }
            else {
                maxLengthCur = 1;
            }
            
            int index = p.charAt(i) - 'a';
            count[index] = Math.max(count[index], maxLengthCur);
        }
        
        // Sum to get result
        int sum = 0;
        for (int i = 0; i < 26; i++) {
            sum += count[i];
        }
        return sum;
    }
这道题最直接的方法就是遍历,时间复杂度是O(n2),这个是没有通过的
针对这道题来说,以一个字符作为字符串的结尾,只要知道连续的长度有多少,那么以这个字符结尾的子串就有多少个,这样只需要遍历一遍就可以了
public int findSubstringInWraproundString(String p) {
// count[i] is the maximum unique substring end with ith letter.
// 0 - 'a', 1 - 'b', ..., 25 - 'z'.
int[] count = new int[26];
// store longest contiguous substring ends at current position.
int maxLengthCur = 0;
for (int i = 0; i < p.length(); i++) {
if (i > 0
&& (p.charAt(i) - p.charAt(i - 1) == 1 || (p.charAt(i - 1)
- p.charAt(i) == 25))) {
maxLengthCur++;
} else {
maxLengthCur = 1;
}
int index = p.charAt(i) - 'a';
count[index] = Math.max(count[index], maxLengthCur);
}
// Sum to get result
int sum = 0;
for (int i = 0; i < 26; i++) {
sum += count[i];
}
return sum;
}
http://www.cnblogs.com/grandyang/p/6143071.html
这道题说有一个无限长的封装字符串,然后又给了我们另一个字符串p,问我们p有多少非空子字符串在封装字符串中。我们通过观察题目中的例子可以发现,由于封装字符串是26个字符按顺序无限循环组成的,那么满足题意的p的子字符串要么是单一的字符,要么是按字母顺序的子字符串。这道题遍历p的所有子字符串会TLE,因为如果p很大的话,子字符串很多,会有大量的满足题意的重复子字符串,必须要用到trick,而所谓技巧就是一般来说你想不到的方法。我们看abcd这个字符串,以d结尾的子字符串有abcd, bcd, cd, d,那么我们可以发现bcd或者cd这些以d结尾的字符串的子字符串都包含在abcd中,那么我们知道以某个字符结束的最大字符串包含其他以该字符结束的字符串的所有子字符串,说起来很拗口,但是理解了我上面举的例子就行。那么题目就可以转换为分别求出以每个字符(a-z)为结束字符的最长连续字符串就行了,我们用一个数组cnt记录下来,最后在求出数组cnt的所有数字之和就是我们要的结果啦
http://blog.csdn.net/caoyan_12727/article/details/53510574
前面的"abc"会出现的子串有:a,b,c,ab,bc,abc;
后面的“abcd”会出现的子串有:a,b,c,d,ab,bc,cd,abc,bcd,abcd,在应用普通方法进行累加的时候,会出现重复统计,当时第一反应是建立一个hashmap,但是这样空间复杂度很大且还要将每一个出现过的子串添加到hashmap中,实现难度较大;
但是,换个角度思考,在字符串种p中,出现的字母也就是26个,如果我们求得以每个字母开头的子串,然后再将它们加起来,就可以得到所有的子串,比如说:
example: xyzabcefg
longest(x) = 6, ==> we get 6 sub strings started from x, x xy xyz xyza xyzab xyzabc
longest(y) = 5 ==> y yz yza yzab yzabc
longest(c) = longest(g) = 1 ==> c, g
这样问题就可以变得简单了。此时就是如何求得longest(a)。对于上述的"abcabcd"我们可以求得两个以a为开头的所有子串的长度分别是6和10,这时我们会取较大值,因为只要是连续出现的以a为开头的字符串的形式一定是"abcd...xyzabc..."所以长度长的串一定包含长度短的所有子串,就像abcd的子串包含abc的所有子串一样。。。
  1.     int findSubstringInWraproundString(string p) {  
  2.         vector<int> letters(26, 0);  
  3.         int res = 0, len = 0;  
  4.         for (int i = 0; i < p.size(); i++) {  
  5.             int cur = p[i] - 'a';  
  6.             if (i > 0 && p[i - 1] != (cur + 26 - 1) % 26 + 'a') len = 0;  
  7.             if (++len > letters[cur]) {  
  8.                 res += len - letters[cur];//\\  
  9.                 letters[cur] = len;  
  10.             }  
  11.         }  
  12.         return res;  
  13.     } 
https://discuss.leetcode.com/topic/70654/c-concise-solution
Letters[i] represents the longest consecutive substring ended with chr(97+i), update res only when current L is bigger than letters[curr].
int findSubstringInWraproundString(string p) {
        vector<int> letters(26, 0);
        int res = 0, len = 0;
        for (int i = 0; i < p.size(); i++) {
            int cur = p[i] - 'a';
            if (i > 0 && p[i - 1] != (cur + 26 - 1) % 26 + 'a') len = 0;
            if (++len > letters[cur]) {
                res += len - letters[cur];
                letters[cur] = len;
            }
        }
        return res;
    }
https://discuss.leetcode.com/topic/70705/java-two-different-solutions-with-explanation
https://discuss.leetcode.com/topic/76020/evolve-from-brute-force-to-optimal
  1. O(n^3) check each substr, a hashtable is used to remove duplicate strings.
    int findSubstringInWraproundString(string p) {
        int n = p.size();
        unordered_set<string> ht;
        for(int i=0;i<n;i++)
            for(int j=i;j<n;j++) {
                if(j>i && p[j-1]+1!=p[j] && p[j-1]-p[j]!=25) break;
                ht.insert(p.substr(i,j-i+1));
            }
        return ht.size();
    }
  1. O(n^2 logn), Each valid substr can be represented by the first char and the length, instead of the whole string.
    int findSubstringInWraproundString(string p) {
        int n = p.size();
        set<pair<char,int>> bst;
        for(int i=0;i<n;i++)
            for(int j=i;j<n;j++) {
                if(j>i && p[j-1]+1!=p[j] && p[j-1]-p[j]!=25) break;
                bst.insert(pair<char,int>(p[i],j-i+1));
            }
        return bst.size();
    }
  1. O(n^2). For substrs starting at the same char, we only need to record the longest one. Because it covers all the shorter substrs starting from the char. The length is the number of substrings starting at the char.
    int findSubstringInWraproundString(string p) {
        int n = p.size(), len[26]={0};
        for(int i=0;i<n;i++)
            for(int j=i;j<n;j++) {
                if(j>i && p[j-1]+1!=p[j] && p[j-1]-p[j]!=25) break;
                len[p[i]-'a'] = max(len[p[i]-'a'],j-i+1);
            }
        return accumulate(len,len+26,0);
    }
  1. O(n). Getting the longest substr starting from each char can be done in linear time. We can use two pointers to keep track of the current valid substring.
    int findSubstringInWraproundString(string p) {
        int len[26]={0}, i = 0, n = p.size();
        for(int j=0;j<n;j++)
            if(j>i && p[j-1]+1!=p[j] && p[j-1]-p[j]!=25) {
                for(int k=i;k<min(j,i+26);k++) len[p[k]-'a'] = max(len[p[k]-'a'],j-k);
                i=j--;
            }
        for(int k=i;k<min(n,i+26);k++) len[p[k]-'a'] = max(len[p[k]-'a'],n-k);
        return accumulate(len,len+26,0);
    }
X. DP - no need to use dp[] at all
https://www.xiadong.info/2016/12/05/leetcode-467-unique-substrings-in-wraparound-string/
dp[i]记录以p[i]为结尾的符合要求的子串的长度. 为了防止重复, 用另一个数组tail以字母为索引保存结果中以某个字母结尾的最长的子串的长度. 当p[i]p[i-1]是连续的时候, p[i]=p[i-1]; 否则p[i]=1. 再检查tail[p[i]-'a']dp[i]的大小关系, 若tail[p[i]-'a']>=dp[i], 说明p[i]结尾的所有子串都已经在结果中了; 否则更新tail[p[i]-'a']=dp[i], 增加结果集中以p[i]这个字母结尾的子串的数量. 最后再对tail数组求和就得到结果
int findSubstringInWraproundString(string p) {
if(p.empty()) return 0;
vector<int> dp(p.length(), 0);
vector<int> tail(26, 0);
dp[0] = 1;
tail[p[0] - 'a'] = 1;
for(int i = 1; i < p.length(); i++){
if((p[i - 1] != 'z' && p[i] == p[i - 1] + 1) || (p[i - 1] == 'z' && p[i] == 'a')){
dp[i] = dp[i - 1] + 1;
}
else{
dp[i] = 1;
}
int index = p[i] - 'a';
tail[index] = max(tail[index], dp[i]);
}
int ans = 0;
for(auto i : tail){
ans += i;
}
return ans;
}
http://bookshadow.com/weblog/2016/12/04/leetcode-unique-substrings-in-wraparound-string/
按照子串的首字母分类计数
用字典cmap记录以某字母开始的子串的最大长度
遍历字符串p,维护区间[start, end],p[start ... end]是无限循环字符串s的一部分
更新p[start], p[start + 1], ... p[end]在cmap中的长度值
最终结果为cmap中value的和
def findSubstringInWraproundString(self, p): """ :type p: str :rtype: int """ pattern = 'zabcdefghijklmnopqrstuvwxyz' cmap = collections.defaultdict(int) start = end = 0 for c in range(len(p)): if c and p[c-1:c+1] not in pattern: for x in range(start, end): cmap[p[x]] = max(end - x, cmap[p[x]]) start = c end = c + 1 for x in range(start, end): cmap[p[x]] = max(end - x, cmap[p[x]]) return sum(cmap.values())
将上述代码中cmap的含义变更为记录以某字母结尾的子串的最大长度,可以使代码简化。
def findSubstringInWraproundString(self, p): """ :type p: str :rtype: int """ pattern = 'zabcdefghijklmnopqrstuvwxyz' cmap = collections.defaultdict(int) clen = 0 for c in range(len(p)): if c and p[c-1:c+1] not in pattern: clen = 1 else: clen += 1 cmap[p[c]] = max(clen, cmap[p[c]]) return sum(cmap.values())
http://stackoverflow.com/questions/40955470/unique-substrings-in-wrap-around-strings

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts