Top K with Label


https://docs.google.com/document/d/1qxA2wps0IhVRWULulQ55W4SGPMu2AE5MkBB37h8Dr58/edit#heading=h.t18sa5iilvre

Top k的value,受label限制


第三轮:
value: 9,  8, 6, 8, 7.
label:  A, A, B, A,B
输入一些node, node里有value和label, 找出K个value最大的, 但每个label不能超过M个. 比如K=3, M=2, 答案是 9,  8, 7. 8 > 7, 但是A的数量不能超过2.
                                                                                                                                                                                                 A, A, B A B
先sort by value再从大到小scan,复杂度O(nlogn).
面试官又写了另外三个其它可行的复杂度, A:O(n), B: O(nlogk), C: 不记得了. 问我想再试试哪个, 我选了O(nlogk), 感觉像是heap. 一开始想错了, 面试官直接说这样不行, 很多人都这样想过但就是不对. 然后就是全程带着我做, 还让我把算法的步骤写在白板上, 说是思路更清楚. 我当时都有点儿懵,没见过这样的... 再加上还在想着上一轮bug的事儿, 精力有点不太集中. 后来想出来了, 也不难. 对每个label用heap选出M个最大的, 然后再从这些用heap选出K个最大的, 最后就是O(nlogk). 中午带着吃饭的大哥说这人好像是欧洲那边的, style确实不太一样, 有点儿严厉, 但还有点儿可爱, 挺有意思. 讨论完时间不多了, 就让写了O(nlogn)的解法, 简单问了问如何测试.
思路:
如题主描述

code

class Solution:

   def topk(self, elements, k, m):
       '''

       :param elements: [val, label]
       :param k:
       :param m:
       :return:
       '''

       # O(nlogn)
       def _topk0(elements, k, m):
           elements = sorted(elements, reverse=True)
           ans = []
           labels = {}
           for val, lb in elements:
               if labels.get(lb, 0) >= m:
                   continue
               ans.append(val)
               labels[lb] = labels.get(lb, 0) + 1
               if len(ans) == k:
                   break
           return ans

       # O(nlogk)
       # heap for each label(len == m)
       # merge heap
       def _topk1(elements, k, m):
           from collections import defaultdict
           import heapq
           labels = defaultdict(list)

           for val, label in elements:
               pq = labels[label]
               heapq.heappush(pq, val)
               if len(pq) > m:
                   heapq.heappop(pq)

           que = []
           for pq in labels.values():
               for e in pq:
                   heapq.heappush(que, e)
                   if len(que) > k:
                       heapq.heappop(que)
           return que

       # O(n)
       # 用quick select,对每个label都求出第m大,然后再merge成一个新的数组,求第k大
       def _topk2(elements, k, m):
           
           def quick_select(arr, lo, hi, p):
               i, j, k = lo, lo, hi - 1
               pi = arr[p]
               while j <= k:
                   if arr[j] == pi:
                       j += 1
                   elif arr[j] > pi:
                       arr[i], arr[j] = arr[j], arr[i]
                       i += 1
                       j += 1
                   else:
                       arr[j], arr[k] = arr[k], arr[j]
                       k -= 1
               return i

           def kth(nums,  m):
               lo, hi = 0, len(nums)
               while lo < hi:
                   mid = (lo + hi) >> 1
                   mid = quick_select(nums, lo, hi, mid)
                   if mid == m - 1:
                       break
                   elif mid < m - 1:
                       lo = mid + 1
                   else:
                       hi = mid

           from collections import defaultdict
           labels = defaultdict(list)
           for val, label in elements:
               labels[label].append(val)

           ans = []
           for nums in labels.values():
               kth(nums, m)
               ans.extend(nums[:m])

           kth(ans, k)
           return ans[:k]

       ans0 = _topk0(elements, k, m)
       ans1 = _topk1(elements, k, m)
       ans2 = _topk2(elements, k, m)
       assert( sorted(ans0) == sorted(ans1) == sorted(ans2))

import random
for _ in range(100):
   elements = []
   for i in range(30):
       elements.append([random.randint(1, 100), random.randint(0, 4)])
   k = random.randint(5, 15)
   m = random.randint(2, 5)
   so = Solution()
   so.topk(elements, k, m)

我感觉这个O(N)的算法,很可能是用quick select,对每个label都求出第m大,然后再merge成一个新的数组,求第k大

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts