Problem C. Welcome to Code Jam - Qualification Round 2009 - Google Code Jam


Dashboard - Qualification Round 2009 - Google Code Jam
Problem
So you've registered. We sent you a welcoming email, to welcome you to code jam. But it's possible that you still don't feel welcomed to code jam. That's why we decided to name a problem "welcome to code jam." After solving this problem, we hope that you'll feel very welcome. Very welcome, that is, to code jam.
If you read the previous paragraph, you're probably wondering why it's there. But if you read it very carefully, you might notice that we have written the words "welcome to code jam" several times: 400263727 times in total. After all, it's easy to look through the paragraph and find a 'w'; then find an 'e' later in the paragraph; then find an 'l' after that, and so on. Your task is to write a program that can take any text and print out how many times that text contains the phrase "welcome to code jam".
To be more precise, given a text string, you are to determine how many times the string "welcome to code jam" appears as a sub-sequence of that string. In other words, find a sequence s of increasing indices into the input string such that the concatenation of input[s[0]], input[s[1]], ..., input[s[18]] is the string "welcome to code jam".
The result of your calculation might be huge, so for convenience we would only like you to find the last 4 digits.
Input
The first line of input gives the number of test cases, N. The next N lines of input contain one test case each. Each test case is a single line of text, containing only lower-case letters and spaces. No line will start with a space, and no line will end with a space.

Official Analysis: https://code.google.com/codejam/contest/90101/dashboard#s=a&a=2
The word we want to find is S = "welcome to code jam", in a long string T. In fact the solution is not very different when we want to find any S. It is actually illustrative to picture the cases for short words.
In case S is just a single character, you just need to count how many times this character appears in T. If S = "xy" is a string of length 2, instead of brute force all the possible positions, one can do it in linear time, start from left to the right. For each occurrence of 'y', one needs to know how many 'x's appeared before that 'y'.
The general solution follows this pattern. Let us again use S = "welcome to code jam" as an example. The formal solution will be clear from the example; and you can always download the good solutions (with nice programming techniques) from the scoreboard.
So, let us define, for each position i in TT(i) to be the string consists of the first i characters of T. And write
  • Dp[i,1]: How many times we can find "w" in T(i)?
  • Dp[i,2]: How many times we can find "we" in T(i)?
  • Dp[i,3]: How many times we can find "wel" in T(i)?
  • Dp[i,4]: How many times we can find "welc" in T(i)?
  • ...
  • Dp[i,18]: How many times we can find "welcome to code ja" in T(i)?
  • Dp[i,19]: How many times we can find "welcome to code jam" in T(i)?
Assume Dp[i,j] is computed for each j, let us see how easy we can compute, say, Dp[i+1,4]:
  1. If the (i+1)-th character of T is not 'c', then Dp[i+1,4] = Dp[i,4].
  2. If the (i+1)-th character of T is 'c', then we can include all the "welc"s found in T(i), as well as those "welc"s ends exactly on the (i+1)-th character, so Dp[i+1,4] = Dp[i,4] + Dp[i,3].
Finally, let n be the length of the text T, Dp[n,19] will be our answer.

        private void solve() {
            char[] pattern = "welcome to code jam".toCharArray();
            int patternN = pattern.length;
            int MODULO = 10000;

            char[] line = nextLine().toCharArray();

            int[] d = new int[patternN + 1];
            d[0] = 1;

            for (char c : line) {
                for (int i = patternN - 1; i >= 0; i--) {
                    if (c == pattern[i]) {
                        d[i + 1] += d[i];
                        d[i + 1] %= MODULO;
                    }
                }
            }

            String answer = Integer.toString(d[patternN]);
            while (answer.length() < 4) {
                answer = "0" + answer;
            }
            out.print(' ');
            out.print(answer);
        }

    }
http://wilanw.blogspot.com/2009/09/google-code-jam-2009-qualification.html
Hence your allowed to skip characters in T so long as you use the all the characters in P in order. This easily lends itself to a recursive solution, we just need to keep track of 2 indexes: the index of P and the index of T. This alone let's us to determine how far we are within a particular instance of the problem and lets us determine what next character in P we need to match within T. There are two conditions in which we end the recursion: we have reached the end of P (i.e. this is a valid occurrence of P inside T), we have reached the end of T but there are still characters to process in P (i.e. invalid occurrence of P in T as we haven't finished laying out the characters in P on T). We return 1 and 0 on the respective cases.

This solution will timeout for the large input as the number of possible occurrences skyrockets (this should be evident enough from the problem statement which states 400263727 occurrences of "welcome to code jam" in the small paragraph). The simple optimisation is to use memoisation and cache the indexes so we don't re-compute sub-problems we have already calculated before. Another approach is to convert the memoisation/recurrence relation into a bottom-up dynamic programming algorithm. A minor implementation note is keeping track of the last 4 digits, the safest and easiest is to use modulo arithmetic: we simply mod all our calculations with 10000. Note that it is not sufficient to do this at the end (before outputting) as the calculations may have already overflowed the integer type and modding it by 10000 will just yield the incorrect solution.
int dp[511][21];   // memo cache
string lineR;      // used to keep track of the line we are given
string textR = "welcome to code jam";   // the phrase we are looking for

int calc(int lineIdx, int textIdx) {
   // base cases
   if (textIdx >= textR.size()) return 1;
   if (lineIdx >= lineR.size()) return 0;
   // seen it before - return the previously computed value
   if (dp[lineIdx][textIdx] != -1) return dp[lineIdx][textIdx];
   dp[lineIdx][textIdx] = 0;
   for (int i = lineIdx; i < lineR.size(); i++) {
      // valid transition - add all the combinations
      if (lineR[i] == textR[textIdx]) {
         dp[lineIdx][textIdx] = (dp[lineIdx][textIdx] + calc(i+1, textIdx+1)) % 10000;
      }
   }
   return dp[lineIdx][textIdx] % 10000;
}
http://stevekrenzel.com/articles/welcome_to_code_jam
http://codejamdaemon.blogspot.com/2012/03/qualification-round-2009-alien-language.html
S = 'welcome to code jam'
lS = len(S)
T = int(stdin.next().strip())
for t in xrange(1, T+1):
    L = stdin.next().strip()
    lL = len(L)
    cache = -1 * np.ones((501,20), 'int')
    def count(l,s):
        if cache[l,s] == -1:
            if s==lS:
                cache[l,s] = 1
            else:
                try:
                    f = L[l:].index(S[s])
                    cache[l,s] = (count(l+f+1,s) + count(l+f+1,s+1)) % 10000
                except ValueError:
                    cache[l,s] = 0
        return cache[l,s]
    print 'Case #%d: %0.4d' % (t, count(0,0))
Read full article from Dashboard - Qualification Round 2009 - Google Code Jam

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts