TTL HashMap


支持存活时间的HashMap
设计一个HashMap,需要完成
  • put(key, value, ttl)
  • get(key)
  • size()
题目中的ttl是time to live。就是过多长时间这个value就失效了。

我能想出来O(N)的方法和O(logN)的方法,但是我觉得都不是面试官想要的。求教能不能有O(1)的方法。

https://www.1point3acres.com/bbs ... read&tid=459371
https://www.1point3acres.com/bbs ... read&tid=203157
https://www.1point3acres.com/bbs/thread-230236-1-1.html

这个put和get O(1)都好办。

不知道这个size()是不是一定要指还活着的keys。如果是,那么这个size()比较难想。如果不用多线程的话,任何时间别人叫size()都能正确回答,我能想到的也就只有 O(Log N)而且这样也会导致put变成 O(Log N)了。

这个问题我之前在一个回帖里稍微写过一点,复制过来..

Map with expiration可以参考一下比较有名的key-value cache例如Redis和Memcached的实现,其实它们就是用的lazy check TTL + daemon clean up,关于expiration方面好像没有什么更fancy的东西了。这边举一下Memcached的例子:
--
(1) Memcached的item结构中的timestamp部分:
https://github.com/memcached/mem ... er/memcached.h#L471

(2) Memcached的get操作中进行lazy expiration:
https://github.com/memcached/memcached/blob/master/items.c#L1007

(3) Memcached的daemon thread cleanup:
Daemon线程的主循环在这边:
https://github.com/memcached/memcached/blob/master/crawler.c#L365

其在主循环中遍历(Memcached中称为crawl)一定数量的item并且清理,通过加mutex锁来避免与get/set之类的操作产生race:
clean up的锁:https://github.com/memcached/memcached/blob/master/crawler.c#L398
get item的锁:https://github.com/memcached/memcached/blob/master/thread.c#L563

然后主循环第419行通过函数指针调用了crawler_expired_eval()来做具体的expire工作:
https://github.com/memcached/memcached/blob/master/crawler.c#L194
--

然后是getSize()的问题。首先面试题里提到的getSize()好像是要返回当前时间所有“真正”没有expire的元素个数 —— 但现实中什么场景下需要知道这个比较精确的净值呢?而且Redis和Memcached都不支持这一命令(它们返回的count应该都是包括已经expire但没有清除的元素的),总体上感觉getSize()只是一个只是用来面试的伪需求。

不过一定要支持的话也仍然可以用牺牲一点“实时性”的方法。

首先不妨假定expire的时间最长不能设到两天以后,那么最多就是3600 * 24 * 2 = 172800秒。现在开一个长度为172800的int型循环队列S,用数组实现(以支持对任意位置的随机访问),那么只要不到1MB的内存空间。这个队列就是记录未来每一秒内有多少个item将要expire的,每次add/update/remove item时都可以用O(1)时间维护。

同时,维护一个atomic int型的total_live_items变量,每次add/remove item时也是O(1)时间更新。最后,再用一个daemon线程,每隔一秒将total_live_item减去当前这秒对应的队列S的头部值,并将S向后滚动一秒。

这样getSize()直接返回total_live_items的值,就可以保证结果的实时性在1秒以内

当然,也可以将时间尺度分得再细一些,比如每10毫秒一个slot,那么循环队列就要扩大100倍,也就是占用200M不到的空间,也可以接受。

补充内容 (2019-3-11 13:41):
那个“(1) Memcached的item结构中的timestamp部分”的链接应该是:
https://github.com/memcached/mem ... er/memcached.h#L471

Try to expire a few timed out keys. The algorithm used is adaptive and will use few CPU cycles if there are few expiring keys, otherwise it will get more aggressive to avoid that too much memory is used by keys that can be removed from the keyspace.

#define ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP 20 /* Loopkups per loop. */


https://github.com/antirez/redis/blob/5.0/src/expire.c#L229

在key空间不是太稀疏的时候,对于带有TTL的keys,随机选取20个,如果5个(概率25%)的key已经过期,就继续做这个操作,使他们过期。


Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts