## Monday, November 30, 2015

### Sampling - Programming Pearls

http://www.hawstein.com/posts/make-thiner-programming-pearls.html

``````void GenKnuth(int m, int n) {
for(int i=0; i<n; ++i) {
if((bigrand()%(n-i)) < m) {
cout<<i<<endl;
--m;
}
}
}``````

``````void GenSets(int m, int n) {
set<int> s;
while(s.size() < m)
s.insert(bigrand() % n);
set<int>::iterator i;
for(i=s.begin(); i!=s.end(); ++i)
cout<<*i<<endl;
}``````

``````void GenShuf(int m, int n) {
int x[n];
for(int i=0; i<n; ++i)
x[i] = i;
for(int i=0; i<m; ++i) {
int j = randint(i, n-1);
swap(x[i], x[j]);
}
sort(x, x+m);
for(int i=0; i<m; ++i)
cout<<x[i]<<endl;
}``````
`http://dongxicheng.org/data-mining/hadoop-sampling/`
(1) 第一种方法来自Knuth的《The art of Computer Programming, Volume 2: Seminumerical Algorithms》

 1 2 3 4 5 6 7 8 9 10 11 12 13 `select = m`   `remaining = n`   `for` `I = [0 n )`   `  ``if` `(bigrand() % remaining) < select`   `    ``print i`   `    ``select—`   `  ``remaining—`

C++的实现如下：
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 `void` `genKnuth(``int` `m, ``int` `n) {`   `  ``for``( ``int` `i = 0; i < n; i++) {`   `   ``if``(bigrand() % (n - i) < m） {`   `    ``cout <

（2）第二种方法的复杂度只与m有关，采用了set（实际上是红黑树）节省时间。代码如下：
 1 2 3 4 5 6 7 8 9 10 11 12 13 `void` `gensets(``int` `m, ``int` `n) {`   `  ``set<``int``> S;`   `  ``while``(S.size() < m) {`   `   ``S.insert(bigrand() % n);`   `  ``}`   `  ``// print S`   `}`

（3）第三种方法克服了（2）的缺点，代码如下：
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 `void` `genshuf(``int` `m, ``int` `n) {`   `  ``int` `i, j;`   `  ``int` `*x = ``new` `int``[n];`   `  ``for``(i = 0; i < n; i++) {`   `   ``x[i] = i;`   `  ``}`   `  ``for``(i = 0; i < m; i++) {`   `   ``j = randint(i, n-1);`   `   ``int` `t = x[i]; x[i] = x[j]; x[j] = x;`   `  ``}`   `  ``sort(x, x+m);`   `  ``//print result`   `}`

（4）当m接近n时，基于集合的算法生成的很多随机数都要丢掉，因为之前的数已经存在于集合中了，为了改进这一点，算法如下：
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 `void` `genfloyd(``int` `m, ``int` `n)`   `{`   `  ``set<``int``> S;`   `  ``set<``int``>::iterator i;`   `  ``for``(``int` `j=n-m; j < n; j++) {`   `   ``int` `t = bigrand()%（j+1);`   `   ``if``(S.find(t) == S.end()){`   `   ``S.insert(t); ``// t not in S`   `  ``} ``else` `{`   `   ``S.insert(j); ``// t in S`   `  ``}`   ` ``}`   `  ``//print results`   `}`

## Labels

GeeksforGeeks (1107) LeetCode (993) Algorithm (795) Review (766) to-do (633) LeetCode - Review (514) Classic Algorithm (324) Dynamic Programming (293) Classic Interview (288) Google Interview (242) Tree (145) POJ (139) Difficult Algorithm (132) LeetCode - Phone (127) EPI (125) Different Solutions (120) Bit Algorithms (118) Lintcode (113) Smart Algorithm (109) Math (107) HackerRank (89) Binary Tree (82) Binary Search (81) Graph Algorithm (74) Greedy Algorithm (72) DFS (67) Interview Corner (61) Stack (60) List (58) BFS (54) Codility (54) ComProGuide (52) USACO (46) Trie (45) ACM-ICPC (41) Interval (41) Data Structure (40) Knapsack (40) Jobdu (39) LeetCode Hard (39) Matrix (38) String Algorithm (38) Backtracking (36) Codeforces (36) Must Known (36) Sort (35) Union-Find (34) Array (33) prismoskills (33) Segment Tree (32) Sliding Window (32) HDU (31) Google Code Jam (30) Permutation (30) Puzzles (30) Array O(N) (29) Company-Airbnb (29) Company-Zenefits (28) Palindrome (28) to-do-must (28) Priority Queue (27) Random (27) Graph (26) GeeksQuiz (25) Logic Thinking (25) Pre-Sort (25) hihocoder (25) Queue (24) Company-Facebook (23) High Frequency (23) TopCoder (23) Algorithm Game (22) Bisection Method (22) Hash (22) DFS + Review (21) Brain Teaser (20) CareerCup (20) Merge Sort (20) O(N) (20) Follow Up (19) Time Complexity (19) Two Pointers (19) UVA (19) Ordered Stack (18) Probabilities (18) Company-Uber (17) Game Theory (17) Topological Sort (17) Codercareer (16) Heap (16) Shortest Path (16) String Search (16) Tree Traversal (16) itint5 (16) Difficult (15) Iterator (15) BST (14) Number (14) Number Theory (14) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Euclidean GCD (13) KMP (13) Majority (13) mitbbs (13) Combination (12) Modify Tree (12) Reconstruct Tree (12) Reverse Thinking (12) 尺取法 (12) AOJ (11) DFS+Backtracking (11) Graph DFS (11) LCA (11) LeetCode - DFS (11) Miscs (11) Princeton (11) Proof (11) Tree DP (11) X Sum (11) 挑战程序设计竞赛 (11) Bisection (10) Bucket Sort (10) Coin Change (10) DFS+Cache (10) HackerRank Easy (10) O(1) Space (10) Rolling Hash (10) SPOJ (10) Theory (10) Tutorialhorizon (10) Mathblog (9) Max-Min Flow (9) Prefix Sum (9) Quick Sort (9) Simulation (9) Stock (9) TreeMap (9) Use XOR (9) Book Notes (8) Bottom-Up (8) DFS+BFS (8) Linked List (8) Prime (8) Suffix Tree (8) Tech-Queries (8) 穷竭搜索 (8) Expression (7) Game Nim (7) Graph BFS (7) Hackerearth (7) Inversion (7) Quick Select (7) Radix Sort (7) n00tc0d3r (7) 蓝桥杯 (7) DFS+DP (6) DP - Tree (6) Dijkstra (6) Dutch Flag (6) How To (6) Manacher (6) One Pass (6) Pruning (6) Rabin-Karp (6) Sampling (6) Schedule (6) Stream (6) Suffix Array (6) Threaded (6) TreeSet (6) Xpost (6) reddit (6) AI (5) Big Data (5) Brute Force (5) Code Kata (5) Coding (5) Convex Hull (5) Crazyforcode (5) Cycle (5) Find Rule (5) Graph Cycle (5) Immutability (5) Java (5) Maze (5) Pre-Sum (5) Quadtrees (5) Quora (5) Subarray Sum (5) Sudoku (5) Sweep Line (5) Word Search (5) jiuzhang (5) 单调栈 (5) 树形DP (5) 1point3acres (4) Abbreviation (4) Anagram (4) Anagrams (4) Chess Game (4) Distributed (4) Flood fill (4) Histogram (4) MST (4) MinMax (4) N Queens (4) Probability (4) Programcreek (4) Subset Sum (4) Subsets (4) Symbol Table (4) Triangle (4) Water Jug (4) algnotes (4) fgdsb (4) to-do-2 (4) 最大化最小值 (4) A Star (3) B Tree (3) Coins (3) Dedup (3) Dequeue (3) Dropbox (3) Easy (3) Github (3) GoLang (3) Joseph (3) Jump Game (3) K (3) LogN (3) Minesweeper (3) NP Hard (3) O(N) Hard (3) Rectangle (3) Scala (3) SegmentFault (3) Shuffle (3) Subtree (3) Trie + DFS (3) Word Ladder (3) bookkeeping (3) codebytes (3) Array Merge (2) BOJ (2) Bellman Ford (2) Bit Counting (2) Bit Mask (2) Bloom Filter (2) Clock (2) Codesays (2) DP - DFS (2) DP-3D Table (2) DP-Classical (2) DP-i-k-j (2) DP-树形 (2) Factor (2) GoHired (2) Graham Scan (2) Huffman Tree (2) Invariant (2) Islands (2) Lucene-Solr (2) Matrix Power (2) Median (2) Parentheses (2) Parser (2) Peak (2) Programming (2) Robot (2) Rosettacode (2) Search (2) SimHash (2) Skyline (2) Summary (2) TV (2) Tile (2) Tree Sum (2) Word Break (2) Word Graph (2) Word Trie (2) 剑指Offer (2) 数位DP (2) 1-X (1) 51Nod (1) Akka (1) BACK (1) BK-Tree (1) BZOJ (1) Basic (1) Bayes (1) Big Integer (1) Big Number (1) Binary (1) Bipartite (1) BitMap (1) BitMap index (1) BitSet (1) BuildIt (1) C/C++ (1) CC Interview (1) Cache (1) Chinese (1) Cloest (1) Clone (1) Code Quality (1) Company-Epic (1) Company-Yelp (1) Concurrency (1) Custom Sort (1) DFS-Matrix (1) DP-Difficult (1) DP-Graph (1) DP-MaxMin (1) Database (1) Diagonal (1) Domino (1) Dr Dobb's (1) Duplicate (1) FST (1) Fraction (1) Funny (1) Game (1) Generation (1) GeoHash (1) Google APAC (1) Gray Code (1) HOJ (1) Hanoi (1) How Hash (1) How to Test (1) Improve It (1) In Place (1) Interview (1) Isomorphic (1) JDK8 (1) Knight (1) Kruskal (1) Kth Element (1) Linkedin (1) Local MinMax (1) Matrix BFS (1) Matrix Graph (1) Matrix+DP (1) MinHash (1) MinMax Heap (1) Monto Carlo (1) Multiple DFS (1) Next Element (1) PAT (1) Parenthesis (1) Partition (1) Path Finding (1) Persistent (1) Power Set (1) PreProcess (1) Python (1) RSA (1) Ranking (1) Rddles (1) ReHash (1) Realtime (1) Region (1) Resources (1) Robin (1) Selection (1) Similarity (1) Square (1) String DP (1) SubMatrix (1) Subsequence (1) TSP (1) Test (1) Test Cases (1) Thread (1) TimSort (1) Top-Down (1) Tournament (1) Tree Rotate (1) Trie vs Hash (1) Triplet (1) Two Stacks (1) UyHiP (1) Valid Tree (1) Vector (1) Wiggle Sort (1) Wikipedia (1) ZOJ (1) ZigZag (1) codevs (1) cos126 (1) javabeat (1) jum (1) sqrt(N) (1) 两次dijkstra (1) 九度 (1) 二进制枚举 (1) 夹逼法 (1) 归一化 (1) 折半枚举 (1) 枚举 (1) 状态压缩DP (1) 男人八题 (1) 英雄会 (1) 逆向思维 (1)