Sunday, June 5, 2016

Princeton Part II - Burrows-Wheeler Data Compression - Programming Assignment 5


http://coursera.cs.princeton.edu/algs4/assignments/burrows.html
Implement the Burrows-Wheeler data compression algorithm. This revolutionary algorithm outcompresses gzip and PKZIP, is relatively easy to implement, and is not protected by any patents. It forms the basis of the Unix compression utility bzip2.
The Burrows-Wheeler compression algorithm consists of three algorithmic components, which are applied in succession:

  1. Burrows-Wheeler transform. Given a typical English text file, transform it into a text file in which sequences of the same character occur near each other many times.
  2. Move-to-front encoding. Given a text file in which sequences of the same character occur near each other many times, convert it into a text file in which certain characters appear more frequently than others.
  3. Huffman compression. Given a text file in which certain characters appear more frequently than others, compress it by encoding frequently occurring characters with short codewords and rare ones with long codewords.
Step 3 is the one that compresses the message: it is particularly effective because Steps 1 and 2 steps result in a text file in which certain characters appear much more frequently than others. To expand a message, apply the inverse operations in reverse order: first apply the Huffman expansion, then the move-to-front decoding, and finally the inverse Burrows-Wheeler transform. Your task is to implement Burrows-Wheeler and move-to-front components efficiently.
Binary input and binary output. To enable your programs to work with binary data, you will use BinaryStdIn and BinaryStdOut, which are described in Algorithms, 4th edition and part of algs4.jar. To display the binary output when debugging, you can use HexDump, which takes a command-line argument N, reads bytes from standard input and writes them to standard output in hexadecimal, N per line.
% more abra.txt
ABRACADABRA!

% java  edu.princeton.cs.algs4.HexDump 16 < abra.txt
41 42 52 41 43 41 44 41 42 52 41 21
96 bits
Note that in ASCII, 'A' is 41 (hex) and '!' is 21 (hex).
Huffman encoding and decoding. Huffman (Program 5.10 in Algorithms, 4th edition) implements the classic Huffman compression and expansion algorithms.
% java edu.princeton.cs.algs4.Huffman - < abra.txt | java  edu.princeton.cs.algs4.HexDump 16
50 4a 22 43 43 54 a8 40 00 00 01 8f 96 8f 94
120 bits
% java  edu.princeton.cs.algs4.Huffman - < abra.txt | java  edu.princeton.cs.algs4.Huffman +
ABRACADABRA!
You will not write any code for this step.
Move-to-front encoding and decoding. The main idea of move-to-front encoding is to maintain an ordered sequence of the characters in the alphabet, and repeatedly read in a character from the input message, print out the position in which that character appears, and move that character to the front of the sequence. As a simple example, if the initial ordering over a 6-character alphabet is A B C D E F, and we want to encode the inputCAAABCCCACCF, then we would update the move-to-front sequences as follows:
move-to-front    in   out
-------------    ---  ---
 A B C D E F      C    2 
 C A B D E F      A    1
 A C B D E F      A    0
 A C B D E F      A    0
 A C B D E F      B    2
 B A C D E F      C    2
 C B A D E F      C    0
 C B A D E F      C    0
 C B A D E F      A    2
 A C B D E F      C    1
 C A B D E F      C    0
 C A B D E F      F    5
 F C A B D E  
If the same character occurs next to each other many times in the input, then many of the output values will be small integers, such as 0, 1, and 2. The extremely high frequency of certain characters makes an ideal scenario for Huffman coding.

  • Move-to-front encoding. Your task is to maintain an ordered sequence of the 256 extended ASCII characters. Initialize the sequence by making the ith character in the sequence equal to the ith extended ASCII character. Now, read in each 8-bit character c from standard input one at a time, output the 8-bit index in the sequence where c appears, and move c to the front.
    % java MoveToFront - < abra.txt | java  edu.princeton.cs.algs4.HexDump 16
    41 42 52 02 44 01 45 01 04 04 02 26
    96 bits
    
  • Move-to-front decoding. Initialize an ordered sequence of 256 characters, where extended ASCII character i appears ith in the sequence. Now, read in each 8-bit character i (but treat it as an integer between 0 and 255) from standard input one at a time, write the ith character in the sequence, and move that character to the front. Check that the decoder recovers any encoded message.
    % java MoveToFront - < abra.txt | java MoveToFront +
    ABRACADABRA!
    
Name your program MoveToFront.java and organize it using the following API:
public class MoveToFront {
    // apply move-to-front encoding, reading from standard input and writing to standard output
    public static void encode()

    // apply move-to-front decoding, reading from standard input and writing to standard output
    public static void decode()

    // if args[0] is '-', apply move-to-front encoding
    // if args[0] is '+', apply move-to-front decoding
    public static void main(String[] args)
}
Performance requirements. The running time of move-to-front encoding and decoding should be proportional to R N (or better) in the worst case and proportional to N + R (or better) in practice on inputs that arise when compressing typical English text, where N is the number of characters in the input and R is the alphabet size.
Circular suffix array. To efficiently implement the key component in the Burrows-Wheeler transform, you will use a fundamental data structure known as the circular suffix array, which describes the abstraction of a sorted array of the N circular suffixes of a string of length N. As an example, consider the string "ABRACADABRA!" of length 12. The table below shows its 12 circular suffixes and the result of sorting them.
 i       Original Suffixes           Sorted Suffixes         index[i]
--    -----------------------     -----------------------    --------
 0    A B R A C A D A B R A !     ! A B R A C A D A B R A    11
 1    B R A C A D A B R A ! A     A ! A B R A C A D A B R    10
 2    R A C A D A B R A ! A B     A B R A ! A B R A C A D    7
 3    A C A D A B R A ! A B R     A B R A C A D A B R A !    0
 4    C A D A B R A ! A B R A     A C A D A B R A ! A B R    3
 5    A D A B R A ! A B R A C     A D A B R A ! A B R A C    5
 6    D A B R A ! A B R A C A     B R A ! A B R A C A D A    8
 7    A B R A ! A B R A C A D     B R A C A D A B R A ! A    1
 8    B R A ! A B R A C A D A     C A D A B R A ! A B R A    4
 9    R A ! A B R A C A D A B     D A B R A ! A B R A C A    6
10    A ! A B R A C A D A B R     R A ! A B R A C A D A B    9
11    ! A B R A C A D A B R A     R A C A D A B R A ! A B    2
We define index[i] to be the index of the original suffix that appears ith in the sorted array. For example, index[11] = 2 means that the 2nd original suffix appears 11th in the sorted order (i.e., last alphabetically).
Your job is to implement the following circular suffix array API, which provides the client access to the index[] values:
public class CircularSuffixArray {
    public CircularSuffixArray(String s)  // circular suffix array of s
    public int length()                   // length of s
    public int index(int i)               // returns index of ith sorted suffix
    public static void main(String[] args)// unit testing of the methods (optional)
}
Corner cases. The constructor should throw a java.lang.NullPointerException if the argument is null; the method index() should throw a java.lang.IndexOutOfBoundsException if i is outside its prescribed range (between 0 and N − 1).
Performance requirements. Your data type should use space proportional to N. The constructor should take time proportional to N log N (or better) on typical English text; the methods length() and index() should take constant time in the worst case. Warning: beginning with Java 7, Update 6, the substring() method takes time and space proportional to the length of the substring—in other words, you cannot form the N circular suffixes explicitly because that would take both quadratic time and space.
Burrows-Wheeler transform. The goal of the Burrows-Wheeler transform is not to compress a message, but rather to transform it into a form that is more amenable to compression. The transform rearranges the characters in the input so that there are lots of clusters with repeated characters, but in such a way that it is still possible to recover the original input. It relies on the following intuition: if you see the letters hen in English text, then most of the time the letter preceding it is t or w. If you could somehow group all such preceding letters together (mostly t's and some w's), then you would have an easy opportunity for data compression.

  • Burrows-Wheeler encoding. The Burrows-Wheeler transform of a string s of length N is defined as follows: Consider the result of sorting the N circular suffixes of s. The Burrows-Wheeler transform is the last column in the sorted suffixes array t[], preceded by the row number first in which the original string ends up. Continuing with the "ABRACADABRA!" example above, we highlight the two components of the Burrows-Wheeler transform in the table below.
     i     Original Suffixes          Sorted Suffixes       t    index[i]
    --    -----------------------     -----------------------    --------
     0    A B R A C A D A B R A !     ! A B R A C A D A B R A    11
     1    B R A C A D A B R A ! A     A ! A B R A C A D A B R    10
     2    R A C A D A B R A ! A B     A B R A ! A B R A C A D    7
    *3    A C A D A B R A ! A B R     A B R A C A D A B R A !   *0
     4    C A D A B R A ! A B R A     A C A D A B R A ! A B R    3
     5    A D A B R A ! A B R A C     A D A B R A ! A B R A C    5
     6    D A B R A ! A B R A C A     B R A ! A B R A C A D A    8
     7    A B R A ! A B R A C A D     B R A C A D A B R A ! A    1
     8    B R A ! A B R A C A D A     C A D A B R A ! A B R A    4
     9    R A ! A B R A C A D A B     D A B R A ! A B R A C A    6
    10    A ! A B R A C A D A B R     R A ! A B R A C A D A B    9
    11    ! A B R A C A D A B R A     R A C A D A B R A ! A B    2
    
    Since the original string ABRACADABRA! ends up in row 3, we have first = 3. Thus, the Burrows-Wheeler transform is
    3
    ARD!RCAAAABB
    
    Notice how there are 4 consecutive As and 2 consecutive Bs—these clusters make the message easier to compress.
    % java BurrowsWheeler - < abra.txt | java edu.princeton.cs.algs4.HexDump 16
    00 00 00 03 41 52 44 21 52 43 41 41 41 41 42 42
    128 bits
    
    Also, note that the integer 3 is represented using 4 bytes (00 00 00 03). The character 'A' is represented by hex 41, the character 'R' by 52, and so forth.
  • Burrows-Wheeler decoder. Now, we describe how to invert the Burrows-Wheeler transform and recover the original input string. If the jth original suffix (original string, shifted j characters to the left) is the ith row in the sorted order, we define next[i] to be the row in the sorted order where the (j + 1)st original suffix appears. For example, if first is the row in which the original input string appears, then next[first] is the row in the sorted order where the 1st original suffix (the original string left-shifted by 1) appears; next[next[first]] is the row in the sorted order where the 2nd original suffix appears; next[next[next[first]]] is the row where the 3rd original suffix appears; and so forth.

    • Decoding the message given t[], first, and the next[] array. The input to the Burrows-Wheeler decoder is the last column t[] of the sorted suffixes along with first. From t[], we can deduce the first column of the sorted suffixes because it consists of precisely the same characters, but in sorted order.
       i      Sorted Suffixes     t      next
      --    -----------------------      ----
       0    ! ? ? ? ? ? ? ? ? ? ? A        3
       1    A ? ? ? ? ? ? ? ? ? ? R        0
       2    A ? ? ? ? ? ? ? ? ? ? D        6
      *3    A ? ? ? ? ? ? ? ? ? ? !        7
       4    A ? ? ? ? ? ? ? ? ? ? R        8
       5    A ? ? ? ? ? ? ? ? ? ? C        9
       6    B ? ? ? ? ? ? ? ? ? ? A       10
       7    B ? ? ? ? ? ? ? ? ? ? A       11
       8    C ? ? ? ? ? ? ? ? ? ? A        5
       9    D ? ? ? ? ? ? ? ? ? ? A        2
      10    R ? ? ? ? ? ? ? ? ? ? B        1
      11    R ? ? ? ? ? ? ? ? ? ? B        4
      
      Now, given the next[] array and first, we can reconstruct the original input string because the first character of the ith original suffix is the ith character in the input string. In the example above, since first3, we know that the original input string appears in row 3; thus, the original input string starts with 'A' (and ends with '!'). Since next[first] = 7, the next original suffix appears in row 7; thus, the next character in the original input string is 'B'. Since next[next[first]] = 11, the next original suffix appears in row 11; thus, the next character in the original input string is 'R'.
    • Constructing the next[] array from t[] and first. Amazingly, the information contained in the Burrows-Wheeler transform suffices to reconstruct the next[] array, and, hence, the original message! Here's how. It is easy to deduce a next[] value for a character that appears exactly once in the input string. For example, consider the suffix that starts with 'C'. By inspecting the first column, it appears 8th in the sorted order. The next original suffix after this one will have the character 'C' as its last character. By inspecting the last column, the next original appears 5th in the sorted order. Thus, next[8] = 5. Similarly, 'D' and'!' each occur only once, so we can deduce that next[9] = 2 and next[0] = 3.
       i      Sorted Suffixes     t      next
      --    -----------------------      ----
       0    ! ? ? ? ? ? ? ? ? ? ? A        3
       1    A ? ? ? ? ? ? ? ? ? ? R
       2    A ? ? ? ? ? ? ? ? ? ? D
      *3    A ? ? ? ? ? ? ? ? ? ? !
       4    A ? ? ? ? ? ? ? ? ? ? R 
       5    A ? ? ? ? ? ? ? ? ? ? C
       6    B ? ? ? ? ? ? ? ? ? ? A
       7    B ? ? ? ? ? ? ? ? ? ? A
       8    C ? ? ? ? ? ? ? ? ? ? A        5
       9    D ? ? ? ? ? ? ? ? ? ? A        2
      10    R ? ? ? ? ? ? ? ? ? ? B
      11    R ? ? ? ? ? ? ? ? ? ? B
      
      However, since 'R' appears twice, it may seem ambiguous whether next[10] = 1 and next[11] = 4, or whether next[10] = 4 and next[11] = 1. Here's the key rule that resolves the apparent ambiguity:
      If sorted row i and j both start with the same character and i < j, then next[i] < next[j].
      This rule implies next[10] = 1 and next[11] = 4. Why is this rule valid? The rows are sorted so row 10 is lexicographically less than row 11. Thus the ten unknown characters in row 10 must be less than the ten unknown characters in row 11 (since both start with 'R'). We also know that between the two rows that end with 'R', row 1 is less than row 4. But, the ten unknown characters in row 10 and 11 are precisely the first ten characters in rows 1 and 4. Thus, next[10] = 1 and next[11] = 4 or this would contradict the fact that the suffixes are sorted.
    Check that the decoder recovers any encoded message.
    % java BurrowsWheeler - < abra.txt | java BurrowsWheeler +
    ABRACADABRA!
    
Name your program BurrowsWheeler.java and organize it using the following API:
public class BurrowsWheeler {
    // apply Burrows-Wheeler encoding, reading from standard input and writing to standard output
    public static void encode()

    // apply Burrows-Wheeler decoding, reading from standard input and writing to standard output
    public static void decode()

    // if args[0] is '-', apply Burrows-Wheeler encoding
    // if args[0] is '+', apply Burrows-Wheeler decoding
    public static void main(String[] args)
}
Performance requirements. The running time of your Burrows-Wheeler encoder should be proportional to N + R (or better) in the worst case, excluding the time to construct the circular suffix array. The running time of your Burrows-Wheeler decoder should be proportional to N + R (or better) in the worst case.
Analysis (optional). Once you have MoveToFront.java and BurrowsWheeler.java working, compress some of these text files; then, test it on some binary files. Calculate the compression ratio achieved for each file and report the time to compress and expand each file. (Here, compression and expansion consists of applying BurrowsWheelerMoveToFront, and Huffman in succession.) Finally, determine the order of growth of the running time of each of your encoders and decoders, both in the worst case and on typical English text inputs.
数据压缩,则成为了这最后一次作业的主题;它看似无趣,却在作业细致入微的步骤上隐藏着许多非常精致的点,其间如抽丝拨茧般细腻的过程,堪称快感连连;而其中一个Circular Suffix Array(CSA)排序的实现,可以涵盖从第一部分起讨论过的所有排序算法:题目本身并没有任何限制;学生需要认真比较每一种排序方式的优劣,并选择出其中一种或几种方法的组合,重新制造出一个最适合当前情况下的高效解决方案;在这个意义上,它相比前面的作业,与real world problem更接近了一步。
数据压缩的过程本身就是比较抽象的,而这次的作业说明与checklist更是文字翻倍而全无附图,有同学开始时看不懂题目也是正常,慢慢来啦。【下文所有中括号为题目简短说明,推荐阅读详细说明】
iOriginal SuffixesSorted Suffixesindex[i]
0A B R A C A D A B R A !! A B R A C A D A B R A11
1B R A C A D A B R A ! AA ! A B R A C A D A B R10
2R A C A D A B R A ! A BA B R A ! A B R A C A D7
*3A C A D A B R A ! A B RA B R A C A D A B R A !*0
4C A D A B R A ! A B R AA C A D A B R A ! A B R3
5A D A B R A ! A B R A CA D A B R A ! A B R A C5
6D A B R A ! A B R A C AB R A ! A B R A C A D A8
7A B R A ! A B R A C A DB R A C A D A B R A ! A1
8B R A ! A B R A C A D AC A D A B R A ! A B R A4
9R A ! A B R A C A D A BD A B R A ! A B R A C A6
10A ! A B R A C A D A B RR A ! A B R A C A D A B9
11! A B R A C A D A B R AR A C A D A B R A ! A B2
【encode部分:对源字符串的CSA(左)排序,返回排好序的CSA(右)的最后一列(ARD!RCAAAABB),及源字符串所在位置(3)。】
认真阅读作业材料即可大致了解,Burrows-Wheeler压缩算法的三部分:
Huffman压缩已实现不必关心;
Move-to-front编码最为简单可直接实现;
Burrows-Wheeler decoder被称为“the trickest part”,但紧接着便已提示到key-indexed counting与10行的核心代码长度,其实已算是提示了很多;
而其encoding需要使用Circular Suffix Array,所以最大的课题其实是,如何以最高效率实现Circular Suffix Array排序。
下面就先来看CSA的排序问题:
因由源字符串逐个循环偏移构成,CSA是一种具备很多特殊性质的数组,而对这样的数组排序照搬通用的排序算法一定有很大的优化空间没有利用,因此手工打造一个高效实用“特别化”的排序算法,就变成了眼前最实际的问题。
有上一周迭代更新的经验在前,这次几乎是立即开始了手工打造的过程,而可考虑的范围又几乎没有限制,以下是我的几个正确版本:(分数为CSA构造函数运行时间的student / reference ratio(越小越快),均取数据长度为409600的测试成绩作为样本;两个分数仅输入数据不同,第一个为随机ASCII,第二个为典型英文内容(dickens.txt);满分要求均为3以下)
  • 一版(2.85,5.16):统一使用Mergesort;
  • 二版(4.07,6.69):统一使用Comparator暴力比较(或封装为Comparable类,效率接近);
  • 三版(0.45,1.34):长度15以下切换到insertion sort,以上使用MSD radix sort;
  • 四版(1.29,1.87):长度15以下切换到insertion sort,以上使用3 way radix quick sort;
  • 五版(3.04,3.83):改编自booksite所附源码,较为低效的ManberMyers实现(使用MSD排好首位字符,接着使用简单quicksort);
这几个是在权衡利弊后选择实现(如没有选择LSD),并已保证正确性的版本(其中不少的实现使用了algs4.jar源码),可以看出目前相对最高效的方案是MSD+insertion;
因对课堂中提到的ManberMyers算法很感兴趣,所以后期做了不少尝试,但还是没有完全原创地完成一个正确版本,加上还有很多好的选择,这里还会继续努力;
课程API对Java的static块没有要求,所以可以在合适处(如MoveToFront类)利用。
最后是本作业最“Nifty”的部分,Burrows-Wheeler decoder:
iSorted Suffixesnext
0! ? ? ? ? ? ? ? ? ? ? A3
1A ? ? ? ? ? ? ? ? ? ? R0
2A ? ? ? ? ? ? ? ? ? ? D6
*3A ? ? ? ? ? ? ? ? ? ? !7
4A ? ? ? ? ? ? ? ? ? ? R8
5A ? ? ? ? ? ? ? ? ? ? C9
6B ? ? ? ? ? ? ? ? ? ? A10
7B ? ? ? ? ? ? ? ? ? ? A11
8C ? ? ? ? ? ? ? ? ? ? A5
9D ? ? ? ? ? ? ? ? ? ? A2
10R ? ? ? ? ? ? ? ? ? ? B1
11R ? ? ? ? ? ? ? ? ? ? B4
【decode部分(层层相扣,建议读原文):已知CSA最后一列(ARD!...),与源字符串所在行(3);对最后一列排序可得第一列(!AAA...);而next数组使用方式如下:因已知源字符串(排序前第0行)在排序后位置为3,而next[3] = 7,即排序后第7行(B...A)为排序前的第1行(第0行的下一行),可知源字符串第1位为'B'(根据第3行,第0位为‘A’),依此类推即可得到源字符串;next数组的构造为通过比较首尾字符出现位置得到:如第i行末位为A,而首位第一个未标记过的A出现在第j行,则next[i] = j,标记第j行。】
开始时我只看了作业说明,所以冒失地写好了一个“暴力decode”还浑然不自知,直到参考checklist后:“WTF”
事实证明那10行核心代码基本与下图所示相同:

问题的关键在于next数组的构造,可总结为如下几点:
  • key-indexed counting核心机理是index的累加,可以保证排序结果的稳定(stable);
  • 根据作业说明的分析,next数组构建的过程,确定重复字符位置的核心机理也是“稳定性”(针对多次出现的同一字符,首尾列的相对先后顺序一致);
  • 对CSA可以不显式创建字符串数组而仅保留源字符串,只添加一个简单的根据offset和index循环获取对应字符的工具函数即可,这为直接使用key-indexed counting创造了非常好的环境。
于是在对源码极简单的改变后,完成了截然不同的任务:(spoiler alert)
for (int i = 0; i < N; i++)
    count[t[i]+1]++;
for (int i = 0; i < 256; i++)
    count[i+1] += count[i];
// The trickiest part
for (int i = 0; i < N; i++)
    next[count[t[i]]++] = i;
其中N为字符串长度,count为计数数组(默认ASCII编码包含256个字符),next为目标数组,t为已知尾列字符串的对应char[];(无需显式排序找出首列,第二步count的累加已达到目的)
6行完成了next数组的构建,加上接下来还原源字符串的过程,达到10行。




No comments:

Post a Comment

Labels

GeeksforGeeks (976) Algorithm (811) LeetCode (654) to-do (599) Review (362) Classic Algorithm (334) Classic Interview (298) Dynamic Programming (263) Google Interview (233) LeetCode - Review (233) Tree (146) POJ (137) Difficult Algorithm (136) EPI (127) Different Solutions (119) Bit Algorithms (110) Cracking Coding Interview (110) Smart Algorithm (109) Math (91) HackerRank (85) Lintcode (83) Binary Search (73) Graph Algorithm (73) Greedy Algorithm (61) Interview Corner (61) Binary Tree (58) List (58) DFS (56) Algorithm Interview (53) Advanced Data Structure (52) Codility (52) ComProGuide (52) LeetCode - Extended (47) USACO (46) Geometry Algorithm (45) BFS (43) Data Structure (42) Mathematical Algorithm (42) ACM-ICPC (41) Jobdu (39) Interval (38) Recursive Algorithm (38) Stack (38) String Algorithm (38) Binary Search Tree (37) Knapsack (37) Codeforces (36) Introduction to Algorithms (36) Matrix (36) Must Known (36) Beauty of Programming (35) Sort (35) Space Optimization (34) Array (33) Trie (33) prismoskills (33) Backtracking (32) Segment Tree (32) Union-Find (32) HDU (31) Google Code Jam (30) Permutation (30) Puzzles (30) Array O(N) (29) Data Structure Design (29) Company-Zenefits (28) Microsoft 100 - July (28) to-do-must (28) Random (27) Sliding Window (27) GeeksQuiz (25) Logic Thinking (25) hihocoder (25) High Frequency (23) Palindrome (23) Algorithm Game (22) Company - LinkedIn (22) Graph (22) Hash (22) Queue (22) DFS + Review (21) TopCoder (21) Binary Indexed Trees (20) Brain Teaser (20) CareerCup (20) Company - Twitter (20) Pre-Sort (20) Company-Facebook (19) UVA (19) Probabilities (18) Follow Up (17) Codercareer (16) Company-Uber (16) Game Theory (16) Heap (16) Shortest Path (16) String Search (16) Topological Sort (16) Tree Traversal (16) itint5 (16) Iterator (15) Merge Sort (15) O(N) (15) Bisection Method (14) Difficult (14) Number (14) Number Theory (14) Post-Order Traverse (14) Priority Quieue (14) Amazon Interview (13) BST (13) Basic Algorithm (13) Codechef (13) Majority (13) mitbbs (13) Combination (12) Computational Geometry (12) KMP (12) Long Increasing Sequence(LIS) (12) Modify Tree (12) Reconstruct Tree (12) Reservoir Sampling (12) 尺取法 (12) AOJ (11) DFS+Backtracking (11) Fast Power Algorithm (11) Graph DFS (11) LCA (11) LeetCode - DFS (11) Ordered Stack (11) Princeton (11) Tree DP (11) 挑战程序设计竞赛 (11) Binary Search - Bisection (10) Company - Microsoft (10) Company-Airbnb (10) Euclidean GCD (10) Facebook Hacker Cup (10) HackerRank Easy (10) Reverse Thinking (10) Rolling Hash (10) SPOJ (10) Theory (10) Tutorialhorizon (10) X Sum (10) Coin Change (9) Divide and Conquer (9) Lintcode - Review (9) Mathblog (9) Max-Min Flow (9) Stack Overflow (9) Stock (9) Two Pointers (9) Book Notes (8) Bottom-Up (8) DP-Space Optimization (8) Graph BFS (8) LeetCode - DP (8) LeetCode Hard (8) Prefix Sum (8) Prime (8) Suffix Tree (8) System Design (8) Tech-Queries (8) Time Complexity (8) Use XOR (8) 穷竭搜索 (8) Algorithm Problem List (7) DFS+BFS (7) Facebook Interview (7) Fibonacci Numbers (7) Game Nim (7) HackerRank Difficult (7) Hackerearth (7) Interval Tree (7) Linked List (7) Longest Common Subsequence(LCS) (7) Math-Divisible (7) Miscs (7) O(1) Space (7) Probability DP (7) Radix Sort (7) Simulation (7) Xpost (7) n00tc0d3r (7) 蓝桥杯 (7) Bucket Sort (6) Catalan Number (6) Classic Data Structure Impl (6) DFS+DP (6) DP - Tree (6) How To (6) Interviewstreet (6) Kadane’s Algorithm (6) Knapsack - MultiplePack (6) Level Order Traversal (6) Manacher (6) Minimum Spanning Tree (6) One Pass (6) Programming Pearls (6) Quick Select (6) Rabin-Karp (6) Randomized Algorithms (6) Sampling (6) Schedule (6) Suffix Array (6) Threaded (6) reddit (6) AI (5) Art Of Programming-July (5) Big Data (5) Brute Force (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Crazyforcode (5) DFS+Cache (5) DP-Multiple Relation (5) DP-Print Solution (5) Dutch Flag (5) Fast Slow Pointers (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Inversion (5) Java (5) Kadane - Extended (5) Matrix Chain Multiplication (5) Microsoft Interview (5) Morris Traversal (5) Pruning (5) Quadtrees (5) Quick Partition (5) Quora (5) SPFA(Shortest Path Faster Algorithm) (5) Subarray Sum (5) Sweep Line (5) Traversal Once (5) TreeMap (5) jiuzhang (5) to-do-2 (5) 单调栈 (5) 树形DP (5) 1point3acres (4) Anagram (4) Approximate Algorithm (4) Backtracking-Include vs Exclude (4) Brute Force - Enumeration (4) Chess Game (4) Company-Amazon (4) Consistent Hash (4) Convex Hull (4) Cycle (4) DP-Include vs Exclude (4) Dijkstra (4) Distributed (4) Eulerian Cycle (4) Flood fill (4) Graph-Classic (4) HackerRank AI (4) Histogram (4) Kadane Max Sum (4) Knapsack - Mixed (4) Knapsack - Unbounded (4) Left and Right Array (4) MinMax (4) Multiple Data Structures (4) N Queens (4) Nerd Paradise (4) Parallel Algorithm (4) Practical Algorithm (4) Pre-Sum (4) Probability (4) Programcreek (4) Quick Sort (4) Spell Checker (4) Stock Maximize (4) Subsets (4) Sudoku (4) Symbol Table (4) TreeSet (4) Triangle (4) Water Jug (4) Word Ladder (4) algnotes (4) fgdsb (4) 最大化最小值 (4) A Star (3) Abbreviation (3) Algorithm - Brain Teaser (3) Algorithm Design (3) Anagrams (3) B Tree (3) Big Data Algorithm (3) Binary Search - Smart (3) Caterpillar Method (3) Coins (3) Company - Groupon (3) Company - Indeed (3) Cumulative Sum (3) DP-Fill by Length (3) DP-Two Variables (3) Dedup (3) Dequeue (3) Dropbox (3) Easy (3) Edit Distance (3) Expression (3) Finite Automata (3) Forward && Backward Scan (3) Github (3) GoLang (3) Include vs Exclude (3) Joseph (3) Jump Game (3) Knapsack-多重背包 (3) LeetCode - Bit (3) LeetCode - TODO (3) Linked List Merge Sort (3) LogN (3) Master Theorem (3) Maze (3) Min Cost Flow (3) Minesweeper (3) Missing Numbers (3) NP Hard (3) Online Algorithm (3) Pascal's Triangle (3) Pattern Match (3) Project Euler (3) Rectangle (3) Scala (3) SegmentFault (3) Stack - Smart (3) State Machine (3) Streaming Algorithm (3) Subset Sum (3) Subtree (3) Transform Tree (3) Two Pointers Window (3) Warshall Floyd (3) With Random Pointer (3) Word Search (3) bookkeeping (3) codebytes (3) Activity Selection Problem (2) Advanced Algorithm (2) AnAlgorithmADay (2) Application of Algorithm (2) Array Merge (2) BOJ (2) BT - Path Sum (2) Balanced Binary Search Tree (2) Bellman Ford (2) Binomial Coefficient (2) Bit Mask (2) Bit-Difficult (2) Bloom Filter (2) Book Coding Interview (2) Branch and Bound Method (2) Clock (2) Codesays (2) Company - Baidu (2) Complete Binary Tree (2) DFS+BFS, Flood Fill (2) DP - DFS (2) DP-3D Table (2) DP-Classical (2) DP-Output Solution (2) DP-Slide Window Gap (2) DP-i-k-j (2) DP-树形 (2) Distributed Algorithms (2) Divide and Conqure (2) Doubly Linked List (2) GoHired (2) Graham Scan (2) Graph - Bipartite (2) Graph BFS+DFS (2) Graph Coloring (2) Graph-Cut Vertices (2) Hamiltonian Cycle (2) Huffman Tree (2) In-order Traverse (2) Include or Exclude Last Element (2) Information Retrieval (2) Interview - Linkedin (2) Invariant (2) Islands (2) Knuth Shuffle (2) LeetCode - Recursive (2) Linked Interview (2) Linked List Sort (2) Longest SubArray (2) Lucene-Solr (2) MST (2) MST-Kruskal (2) Math-Remainder Queue (2) Matrix Power (2) Minimum Vertex Cover (2) Negative All Values (2) Number Each Digit (2) Numerical Method (2) Object Design (2) Order Statistic Tree (2) Palindromic (2) Parentheses (2) Parser (2) Peak (2) Programming (2) Range Minimum Query (2) Reuse Forward Backward (2) Robot (2) Rosettacode (2) Scan from right (2) Search (2) Shuffle (2) Sieve of Eratosthenes (2) SimHash (2) Simple Algorithm (2) Skyline (2) Spatial Index (2) Stream (2) Strongly Connected Components (2) Summary (2) TV (2) Tile (2) Traversal From End (2) Tree Sum (2) Tree Traversal Return Multiple Values (2) Word Break (2) Word Graph (2) Word Trie (2) Young Tableau (2) 剑指Offer (2) 数位DP (2) 1-X (1) 51Nod (1) Akka (1) Algorithm - How To (1) Algorithm - New (1) Algorithm Series (1) Algorithms Part I (1) Analysis of Algorithm (1) Array-Element Index Negative (1) Array-Rearrange (1) Auxiliary Array (1) Auxiliary Array: Inc&Dec (1) BACK (1) BK-Tree (1) BZOJ (1) Basic (1) Bayes (1) Beauty of Math (1) Big Integer (1) Big Number (1) Binary (1) Binary Tree Variant (1) Bipartite (1) Bit-Missing Number (1) BitMap (1) BitMap index (1) BitSet (1) Bug Free Code (1) BuildIt (1) C/C++ (1) CC Interview (1) Cache (1) Calculate Height at Same Recusrion (1) Cartesian tree (1) Check Tree Property (1) Chinese (1) Circular Buffer (1) Code Quality (1) Codesolutiony (1) Company - Alibaba (1) Company - Palantir (1) Company - WalmartLabs (1) Company-Apple (1) Company-Epic (1) Company-Salesforce (1) Company-Snapchat (1) Company-Yelp (1) Compression Algorithm (1) Concurrency (1) Convert BST to DLL (1) Convert DLL to BST (1) Custom Sort (1) Cyclic Replacement (1) DFS-Matrix (1) DP - Probability (1) DP Fill Diagonal First (1) DP-Difficult (1) DP-End with 0 or 1 (1) DP-Fill Diagonal First (1) DP-Graph (1) DP-Left and Right Array (1) DP-MaxMin (1) DP-Memoization (1) DP-Node All Possibilities (1) DP-Optimization (1) DP-Preserve Previous Value (1) DP-Print All Solution (1) Database (1) Detect Negative Cycle (1) Directed Graph (1) Do Two Things at Same Recusrion (1) Domino (1) Dr Dobb's (1) Duplicate (1) Equal probability (1) External Sort (1) FST (1) Failure Function (1) Fraction (1) Front End Pointers (1) Funny (1) Fuzzy String Search (1) Game (1) Generating Function (1) Generation (1) Genetic algorithm (1) GeoHash (1) Geometry - Orientation (1) Google APAC (1) Graph But No Graph (1) Graph Transpose (1) Graph Traversal (1) Graph-Coloring (1) Graph-Longest Path (1) Gray Code (1) HOJ (1) Hanoi (1) Hard Algorithm (1) How Hash (1) How to Test (1) Improve It (1) In Place (1) Inorder-Reverse Inorder Traverse Simultaneously (1) Interpolation search (1) Interview (1) Interview - Easy (1) Interview - Facebook (1) Isomorphic (1) JDK8 (1) K Dimensional Tree (1) Knapsack - Fractional (1) Knapsack - ZeroOnePack (1) Knight (1) Kosaraju’s algorithm (1) Kruskal (1) Kruskal MST (1) Kth Element (1) Least Common Ancestor (1) LeetCode - Binary Tree (1) LeetCode - Coding (1) LeetCode - Detail (1) LeetCode - Related (1) LeetCode Diffcult (1) Linked List Reverse (1) Linkedin (1) Linkedin Interview (1) Local MinMax (1) Logic Pattern (1) Longest Common Subsequence (1) Longest Common Substring (1) Longest Prefix Suffix(LPS) (1) Manhattan Distance (1) Map && Reverse Map (1) Math - Induction (1) Math-Multiply (1) Math-Sum Of Digits (1) Matrix - O(N+M) (1) Matrix BFS (1) Matrix Graph (1) Matrix Search (1) Matrix+DP (1) Matrix-Rotate (1) Max Min So Far (1) Median (1) Memory-Efficient (1) MinHash (1) MinMax Heap (1) Monotone Queue (1) Monto Carlo (1) Multi-Reverse (1) Multiple DFS (1) Multiple Tasks (1) Next Successor (1) Offline Algorithm (1) PAT (1) Parent-Only Tree (1) Partition (1) Path Finding (1) Patience Sort (1) Persistent (1) Pigeon Hole Principle (1) Power Set (1) Pratical Algorithm (1) Probabilistic Data Structure (1) Proof (1) Python (1) Queue & Stack (1) RSA (1) Ranking (1) Rddles (1) ReHash (1) Realtime (1) Recurrence Relation (1) Recursive DFS (1) Recursive to Iterative (1) Red-Black Tree (1) Region (1) Regular Expression (1) Resources (1) Reverse Inorder Traversal (1) Robin (1) Selection (1) Self Balancing BST (1) Similarity (1) Sort && Binary Search (1) String Algorithm. Symbol Table (1) String DP (1) String Distance (1) SubMatrix (1) Subsequence (1) System of Difference Constraints(差分约束系统) (1) TSP (1) Ternary Search Tree (1) Test (1) Thread (1) TimSort (1) Top-Down (1) Tournament (1) Tournament Tree (1) Transform Tree in Place (1) Tree Diameter (1) Tree Rotate (1) Trie + DFS (1) Trie and Heap (1) Trie vs Hash (1) Trie vs HashMap (1) Triplet (1) Two Data Structures (1) Two Stacks (1) USACO - Classical (1) USACO - Problems (1) UyHiP (1) Valid Tree (1) Vector (1) Wiggle Sort (1) Wikipedia (1) Yahoo Interview (1) ZOJ (1) baozitraining (1) codevs (1) cos126 (1) javabeat (1) jum (1) namic Programming (1) sqrt(N) (1) 两次dijkstra (1) 九度 (1) 二进制枚举 (1) 夹逼法 (1) 归一化 (1) 折半枚举 (1) 枚举 (1) 状态压缩DP (1) 男人八题 (1) 英雄会 (1) 逆向思维 (1)

Popular Posts