Saturday, November 28, 2015

leetcode面试题8 第k大的数


leetcode面试题8 第k大的数
初阶:有两个数组A和B,假设A和B已经有序(从大到小),求A和B数组中所有数的第K大。

进阶:有N台机器,每台机器上有一个有序的大数组,需要求得所有机器上所有数中的第K大。注意,需要考虑N台机器的并行计算能力。 

解答

初阶:比较A[k/2]和B[k/2],如果A[k/2]>=B[k/2]那么A的前k/2个数一定都在前k-1大中,将A数组前k/2个数扔掉,反之扔掉B的前k/2个数。将k减小k/2。重复上述操作直到k=1。比较A和B的第一个数即可得到结果。时间复杂度O(logk)
进阶:二分答案S,将S广播给每台机器,每台机器用二分法求得有多少比该数小的数。汇总结果后可判断是该将S往上还是往下调整。
面试官角度:
初阶问题是一个难度比较大的算法题。需要有一定的算法训练功底。主要用到的思想是递归。首先容易想到的方法是合并两个数组(见面试题5,有序数组的合并),这样复杂度为O(k),那么答出这个以后,面试官会问你,还有更好的方法么?这个时候就要往O(logk)的思路去想,O(logk)就意味着需要用一种方法每次将k的规模减小一半,于是想到,每次要扔掉一个数组或两个数组中的k/2个数,于是想到去比较A[k/2]和B[k/2],仔细思考比较结果,然后想到较大的那一方的前k/2个数一定都在前k-1大的数中, 所以可以扔掉。
进阶问题的考察点是逆向思维。二分答案是一种常见的算法思路(见面试题2 抄书问题),所以当你想不出题目的时候,往往可以试试看是否可以二分答案。因为需要发挥N台机器的并行计算能力,所以想到让每台机器互不相关的做一件事情,然后将结果汇总来判断。
http://lixinzhang.github.io/jiu-zhang-suan-fa-mian-shi-ti-zong-jie.html
int FindK(vector<vector<int> > & mq, int N, int k)
{
    int ans_upper = INT_MAX;
    int ans_lower = INT_MIN;
    int sum_len = 0;
    while(sum_len != k)
    {
        ans = ans_lower + (ans_upper - ans_lower) /2;
        for (int i=0; i<N; i++)
        {
           sum_len += left_count_binarySearch(mq[i], ans);
        }
        if (sum_len > k) ans_upper = ans;
        if (sum_len < k) ans_lower = ans;
    }
    return ans;
}
http://www.cnblogs.com/zhjp11/archive/2010/02/26/1674227.html
          《编程之美》2.5节课后习题:
           1. 如果需要找出N个数中最大的K个不同的浮点数呢?比如,含有10个浮点数的数组(1.5,1.5,2.5,3.5,3.5,5,0,- 1.5,3.5)中最大的3个不同的浮点数是(5,3.5,2.5)。
           解答:上面的解法均适用,需要注意的是浮点数比较时和整数不同,另外求hashkey的方法也会略有不同。
           2. 如果是找第k到第m(0<k<=m<=n)大的数呢?
           解答:如果把问题看做m-k+1个第k大问题,则前面解法均适用。但是对于类似前k大这样的问题,最好使用解法5或者解法7,总体复杂度较低。
       3. 在搜索引擎中,网络上的每个网页都有“权威性”权重,如page rank。如果我们需要寻找权重最大的K个网页,而网页的权重会不断地更新,那么算法要如何变动以达到快速更新(incremental update)并及时返回权重最大的K个网页?
提示:堆排序?当每一个网页权重更新的时候,更新堆。还有更好的方法吗?
       解答:要达到快速的更新,我们可以解法5,使用映射二分堆,可以使更新的操作达到O(logn)
       4. 在实际应用中,还有一个“精确度”的问题。我们可能并不需要返回严格意义上的最大的K个元素,在边界位置允许出现一些误差。当用户输入一个query的时候,对于每一个文档d来说,它跟这个query之间都有一个相关性衡量权重f (query, d)。搜索引擎需要返回给用户的就是相关性权重最大的K个网页。如果每页10个网页,用户不会关心第1000页开外搜索结果的“精确度”,稍有误差是可以接受的。比如我们可以返回相关性第10 001大的网页,而不是第9999大的。在这种情况下,算法该如何改进才能更快更有效率呢?网页的数目可能大到一台机器无法容纳得下,这时怎么办呢?
      提示:归并排序?如果每台机器都返回最相关的K个文档,那么所有机器上最相关K个文档的并集肯定包含全集中最相关的K个文档。由于边界情况并不需要非常精确,如果每台机器返回最好的K’个文档,那么K’应该如何取值,以达到我们返回最相关的90%*K个文档是完全精确的,或者最终返回的最相关的K个文档精确度超过90%(最相关的K个文档中90%以上在全集中相关性的确排在前K),或者最终返回的最相关的K个文档最差的相关性排序没有超出110%*K。
      解答:正如提示中所说,可以让每台机器返回最相关的K'个文档,然后利用归并排序的思想,得到所有文档中最相关的K个。 最好的情况是这K个文档在所有机器中平均分布,这时每台机器只要K' = K / n (n为所有机器总数);最坏情况,所有最相关的K个文档只出现在其中的某一台机器上,这时K'需近似等于K了。我觉得比较好的做法可以在每台机器上维护一个堆,然后对堆顶元素实行归并排序。
       5. 如第4点所说,对于每个文档d,相对于不同的关键字q1, q2, …, qm,分别有相关性权重f(d, q1),f(d, q2), …, f(d, qm)。如果用户输入关键字qi之后,我们已经获得了最相关的K个文档,而已知关键字qj跟关键字qi相似,文档跟这两个关键字的权重大小比较靠近,那么关键字qi的最相关的K个文档,对寻找qj最相关的K个文档有没有帮助呢?
解答:肯定是有帮助的。在搜索关键字qj最相关的K个文档时,可以在qj的“近义词”相关文档中搜索部分,然后在全局的所有文档中在搜索部分。
http://www.w2bc.com/Article/72564
Read full article from leetcode面试题8 第k大的数

No comments:

Post a Comment

Labels

GeeksforGeeks (976) Algorithm (811) LeetCode (654) to-do (599) Review (362) Classic Algorithm (334) Classic Interview (298) Dynamic Programming (263) Google Interview (233) LeetCode - Review (233) Tree (146) POJ (137) Difficult Algorithm (136) EPI (127) Different Solutions (119) Bit Algorithms (110) Cracking Coding Interview (110) Smart Algorithm (109) Math (91) HackerRank (85) Lintcode (83) Binary Search (73) Graph Algorithm (73) Greedy Algorithm (61) Interview Corner (61) Binary Tree (58) List (58) DFS (56) Algorithm Interview (53) Advanced Data Structure (52) Codility (52) ComProGuide (52) LeetCode - Extended (47) USACO (46) Geometry Algorithm (45) BFS (43) Data Structure (42) Mathematical Algorithm (42) ACM-ICPC (41) Jobdu (39) Interval (38) Recursive Algorithm (38) Stack (38) String Algorithm (38) Binary Search Tree (37) Knapsack (37) Codeforces (36) Introduction to Algorithms (36) Matrix (36) Must Known (36) Beauty of Programming (35) Sort (35) Space Optimization (34) Array (33) Trie (33) prismoskills (33) Backtracking (32) Segment Tree (32) Union-Find (32) HDU (31) Google Code Jam (30) Permutation (30) Puzzles (30) Array O(N) (29) Data Structure Design (29) Company-Zenefits (28) Microsoft 100 - July (28) to-do-must (28) Random (27) Sliding Window (27) GeeksQuiz (25) Logic Thinking (25) hihocoder (25) High Frequency (23) Palindrome (23) Algorithm Game (22) Company - LinkedIn (22) Graph (22) Hash (22) Queue (22) DFS + Review (21) TopCoder (21) Binary Indexed Trees (20) Brain Teaser (20) CareerCup (20) Company - Twitter (20) Pre-Sort (20) Company-Facebook (19) UVA (19) Probabilities (18) Follow Up (17) Codercareer (16) Company-Uber (16) Game Theory (16) Heap (16) Shortest Path (16) String Search (16) Topological Sort (16) Tree Traversal (16) itint5 (16) Iterator (15) Merge Sort (15) O(N) (15) Bisection Method (14) Difficult (14) Number (14) Number Theory (14) Post-Order Traverse (14) Priority Quieue (14) Amazon Interview (13) BST (13) Basic Algorithm (13) Codechef (13) Majority (13) mitbbs (13) Combination (12) Computational Geometry (12) KMP (12) Long Increasing Sequence(LIS) (12) Modify Tree (12) Reconstruct Tree (12) Reservoir Sampling (12) 尺取法 (12) AOJ (11) DFS+Backtracking (11) Fast Power Algorithm (11) Graph DFS (11) LCA (11) LeetCode - DFS (11) Ordered Stack (11) Princeton (11) Tree DP (11) 挑战程序设计竞赛 (11) Binary Search - Bisection (10) Company - Microsoft (10) Company-Airbnb (10) Euclidean GCD (10) Facebook Hacker Cup (10) HackerRank Easy (10) Reverse Thinking (10) Rolling Hash (10) SPOJ (10) Theory (10) Tutorialhorizon (10) X Sum (10) Coin Change (9) Divide and Conquer (9) Lintcode - Review (9) Mathblog (9) Max-Min Flow (9) Stack Overflow (9) Stock (9) Two Pointers (9) Book Notes (8) Bottom-Up (8) DP-Space Optimization (8) Graph BFS (8) LeetCode - DP (8) LeetCode Hard (8) Prefix Sum (8) Prime (8) Suffix Tree (8) System Design (8) Tech-Queries (8) Time Complexity (8) Use XOR (8) 穷竭搜索 (8) Algorithm Problem List (7) DFS+BFS (7) Facebook Interview (7) Fibonacci Numbers (7) Game Nim (7) HackerRank Difficult (7) Hackerearth (7) Interval Tree (7) Linked List (7) Longest Common Subsequence(LCS) (7) Math-Divisible (7) Miscs (7) O(1) Space (7) Probability DP (7) Radix Sort (7) Simulation (7) Xpost (7) n00tc0d3r (7) 蓝桥杯 (7) Bucket Sort (6) Catalan Number (6) Classic Data Structure Impl (6) DFS+DP (6) DP - Tree (6) How To (6) Interviewstreet (6) Kadane’s Algorithm (6) Knapsack - MultiplePack (6) Level Order Traversal (6) Manacher (6) Minimum Spanning Tree (6) One Pass (6) Programming Pearls (6) Quick Select (6) Rabin-Karp (6) Randomized Algorithms (6) Sampling (6) Schedule (6) Suffix Array (6) Threaded (6) reddit (6) AI (5) Art Of Programming-July (5) Big Data (5) Brute Force (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Crazyforcode (5) DFS+Cache (5) DP-Multiple Relation (5) DP-Print Solution (5) Dutch Flag (5) Fast Slow Pointers (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Inversion (5) Java (5) Kadane - Extended (5) Matrix Chain Multiplication (5) Microsoft Interview (5) Morris Traversal (5) Pruning (5) Quadtrees (5) Quick Partition (5) Quora (5) SPFA(Shortest Path Faster Algorithm) (5) Subarray Sum (5) Sweep Line (5) Traversal Once (5) TreeMap (5) jiuzhang (5) to-do-2 (5) 单调栈 (5) 树形DP (5) 1point3acres (4) Anagram (4) Approximate Algorithm (4) Backtracking-Include vs Exclude (4) Brute Force - Enumeration (4) Chess Game (4) Company-Amazon (4) Consistent Hash (4) Convex Hull (4) Cycle (4) DP-Include vs Exclude (4) Dijkstra (4) Distributed (4) Eulerian Cycle (4) Flood fill (4) Graph-Classic (4) HackerRank AI (4) Histogram (4) Kadane Max Sum (4) Knapsack - Mixed (4) Knapsack - Unbounded (4) Left and Right Array (4) MinMax (4) Multiple Data Structures (4) N Queens (4) Nerd Paradise (4) Parallel Algorithm (4) Practical Algorithm (4) Pre-Sum (4) Probability (4) Programcreek (4) Quick Sort (4) Spell Checker (4) Stock Maximize (4) Subsets (4) Sudoku (4) Symbol Table (4) TreeSet (4) Triangle (4) Water Jug (4) Word Ladder (4) algnotes (4) fgdsb (4) 最大化最小值 (4) A Star (3) Abbreviation (3) Algorithm - Brain Teaser (3) Algorithm Design (3) Anagrams (3) B Tree (3) Big Data Algorithm (3) Binary Search - Smart (3) Caterpillar Method (3) Coins (3) Company - Groupon (3) Company - Indeed (3) Cumulative Sum (3) DP-Fill by Length (3) DP-Two Variables (3) Dedup (3) Dequeue (3) Dropbox (3) Easy (3) Edit Distance (3) Expression (3) Finite Automata (3) Forward && Backward Scan (3) Github (3) GoLang (3) Include vs Exclude (3) Joseph (3) Jump Game (3) Knapsack-多重背包 (3) LeetCode - Bit (3) LeetCode - TODO (3) Linked List Merge Sort (3) LogN (3) Master Theorem (3) Maze (3) Min Cost Flow (3) Minesweeper (3) Missing Numbers (3) NP Hard (3) Online Algorithm (3) Pascal's Triangle (3) Pattern Match (3) Project Euler (3) Rectangle (3) Scala (3) SegmentFault (3) Stack - Smart (3) State Machine (3) Streaming Algorithm (3) Subset Sum (3) Subtree (3) Transform Tree (3) Two Pointers Window (3) Warshall Floyd (3) With Random Pointer (3) Word Search (3) bookkeeping (3) codebytes (3) Activity Selection Problem (2) Advanced Algorithm (2) AnAlgorithmADay (2) Application of Algorithm (2) Array Merge (2) BOJ (2) BT - Path Sum (2) Balanced Binary Search Tree (2) Bellman Ford (2) Binomial Coefficient (2) Bit Mask (2) Bit-Difficult (2) Bloom Filter (2) Book Coding Interview (2) Branch and Bound Method (2) Clock (2) Codesays (2) Company - Baidu (2) Complete Binary Tree (2) DFS+BFS, Flood Fill (2) DP - DFS (2) DP-3D Table (2) DP-Classical (2) DP-Output Solution (2) DP-Slide Window Gap (2) DP-i-k-j (2) DP-树形 (2) Distributed Algorithms (2) Divide and Conqure (2) Doubly Linked List (2) GoHired (2) Graham Scan (2) Graph - Bipartite (2) Graph BFS+DFS (2) Graph Coloring (2) Graph-Cut Vertices (2) Hamiltonian Cycle (2) Huffman Tree (2) In-order Traverse (2) Include or Exclude Last Element (2) Information Retrieval (2) Interview - Linkedin (2) Invariant (2) Islands (2) Knuth Shuffle (2) LeetCode - Recursive (2) Linked Interview (2) Linked List Sort (2) Longest SubArray (2) Lucene-Solr (2) MST (2) MST-Kruskal (2) Math-Remainder Queue (2) Matrix Power (2) Minimum Vertex Cover (2) Negative All Values (2) Number Each Digit (2) Numerical Method (2) Object Design (2) Order Statistic Tree (2) Palindromic (2) Parentheses (2) Parser (2) Peak (2) Programming (2) Range Minimum Query (2) Reuse Forward Backward (2) Robot (2) Rosettacode (2) Scan from right (2) Search (2) Shuffle (2) Sieve of Eratosthenes (2) SimHash (2) Simple Algorithm (2) Skyline (2) Spatial Index (2) Stream (2) Strongly Connected Components (2) Summary (2) TV (2) Tile (2) Traversal From End (2) Tree Sum (2) Tree Traversal Return Multiple Values (2) Word Break (2) Word Graph (2) Word Trie (2) Young Tableau (2) 剑指Offer (2) 数位DP (2) 1-X (1) 51Nod (1) Akka (1) Algorithm - How To (1) Algorithm - New (1) Algorithm Series (1) Algorithms Part I (1) Analysis of Algorithm (1) Array-Element Index Negative (1) Array-Rearrange (1) Auxiliary Array (1) Auxiliary Array: Inc&Dec (1) BACK (1) BK-Tree (1) BZOJ (1) Basic (1) Bayes (1) Beauty of Math (1) Big Integer (1) Big Number (1) Binary (1) Binary Tree Variant (1) Bipartite (1) Bit-Missing Number (1) BitMap (1) BitMap index (1) BitSet (1) Bug Free Code (1) BuildIt (1) C/C++ (1) CC Interview (1) Cache (1) Calculate Height at Same Recusrion (1) Cartesian tree (1) Check Tree Property (1) Chinese (1) Circular Buffer (1) Code Quality (1) Codesolutiony (1) Company - Alibaba (1) Company - Palantir (1) Company - WalmartLabs (1) Company-Apple (1) Company-Epic (1) Company-Salesforce (1) Company-Snapchat (1) Company-Yelp (1) Compression Algorithm (1) Concurrency (1) Convert BST to DLL (1) Convert DLL to BST (1) Custom Sort (1) Cyclic Replacement (1) DFS-Matrix (1) DP - Probability (1) DP Fill Diagonal First (1) DP-Difficult (1) DP-End with 0 or 1 (1) DP-Fill Diagonal First (1) DP-Graph (1) DP-Left and Right Array (1) DP-MaxMin (1) DP-Memoization (1) DP-Node All Possibilities (1) DP-Optimization (1) DP-Preserve Previous Value (1) DP-Print All Solution (1) Database (1) Detect Negative Cycle (1) Directed Graph (1) Do Two Things at Same Recusrion (1) Domino (1) Dr Dobb's (1) Duplicate (1) Equal probability (1) External Sort (1) FST (1) Failure Function (1) Fraction (1) Front End Pointers (1) Funny (1) Fuzzy String Search (1) Game (1) Generating Function (1) Generation (1) Genetic algorithm (1) GeoHash (1) Geometry - Orientation (1) Google APAC (1) Graph But No Graph (1) Graph Transpose (1) Graph Traversal (1) Graph-Coloring (1) Graph-Longest Path (1) Gray Code (1) HOJ (1) Hanoi (1) Hard Algorithm (1) How Hash (1) How to Test (1) Improve It (1) In Place (1) Inorder-Reverse Inorder Traverse Simultaneously (1) Interpolation search (1) Interview (1) Interview - Easy (1) Interview - Facebook (1) Isomorphic (1) JDK8 (1) K Dimensional Tree (1) Knapsack - Fractional (1) Knapsack - ZeroOnePack (1) Knight (1) Kosaraju’s algorithm (1) Kruskal (1) Kruskal MST (1) Kth Element (1) Least Common Ancestor (1) LeetCode - Binary Tree (1) LeetCode - Coding (1) LeetCode - Detail (1) LeetCode - Related (1) LeetCode Diffcult (1) Linked List Reverse (1) Linkedin (1) Linkedin Interview (1) Local MinMax (1) Logic Pattern (1) Longest Common Subsequence (1) Longest Common Substring (1) Longest Prefix Suffix(LPS) (1) Manhattan Distance (1) Map && Reverse Map (1) Math - Induction (1) Math-Multiply (1) Math-Sum Of Digits (1) Matrix - O(N+M) (1) Matrix BFS (1) Matrix Graph (1) Matrix Search (1) Matrix+DP (1) Matrix-Rotate (1) Max Min So Far (1) Median (1) Memory-Efficient (1) MinHash (1) MinMax Heap (1) Monotone Queue (1) Monto Carlo (1) Multi-Reverse (1) Multiple DFS (1) Multiple Tasks (1) Next Successor (1) Offline Algorithm (1) PAT (1) Parent-Only Tree (1) Partition (1) Path Finding (1) Patience Sort (1) Persistent (1) Pigeon Hole Principle (1) Power Set (1) Pratical Algorithm (1) Probabilistic Data Structure (1) Proof (1) Python (1) Queue & Stack (1) RSA (1) Ranking (1) Rddles (1) ReHash (1) Realtime (1) Recurrence Relation (1) Recursive DFS (1) Recursive to Iterative (1) Red-Black Tree (1) Region (1) Regular Expression (1) Resources (1) Reverse Inorder Traversal (1) Robin (1) Selection (1) Self Balancing BST (1) Similarity (1) Sort && Binary Search (1) String Algorithm. Symbol Table (1) String DP (1) String Distance (1) SubMatrix (1) Subsequence (1) System of Difference Constraints(差分约束系统) (1) TSP (1) Ternary Search Tree (1) Test (1) Thread (1) TimSort (1) Top-Down (1) Tournament (1) Tournament Tree (1) Transform Tree in Place (1) Tree Diameter (1) Tree Rotate (1) Trie + DFS (1) Trie and Heap (1) Trie vs Hash (1) Trie vs HashMap (1) Triplet (1) Two Data Structures (1) Two Stacks (1) USACO - Classical (1) USACO - Problems (1) UyHiP (1) Valid Tree (1) Vector (1) Wiggle Sort (1) Wikipedia (1) Yahoo Interview (1) ZOJ (1) baozitraining (1) codevs (1) cos126 (1) javabeat (1) jum (1) namic Programming (1) sqrt(N) (1) 两次dijkstra (1) 九度 (1) 二进制枚举 (1) 夹逼法 (1) 归一化 (1) 折半枚举 (1) 枚举 (1) 状态压缩DP (1) 男人八题 (1) 英雄会 (1) 逆向思维 (1)

Popular Posts