Massive Algorithms: LeetCode 460

https://leetcode.com/problems/lfu-cache/

Design and implement a data structure for a Least Frequently Used (LFU) cache. It should support the following operations: get and set.

get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
set(key, value) - Set or insert the value if the key is not already present. When the cache reaches its capacity, it should invalidate the least frequently used item before inserting a new item. For the purpose of this problem, when there is a tie (i.e., two or more keys that have the same frequency), the least recently used key would be evicted.

Follow up:
Could you do both operations in O(1) time complexity?

Example:

LFUCache cache = new LFUCache( 2 /* capacity */ );

cache.set(1, 1);
cache.set(2, 2);
cache.get(1);       // returns 1
cache.set(3, 3);    // evicts key 2
cache.get(2);       // returns -1 (not found)
cache.get(3);       // returns 3.
cache.set(4, 4);    // evicts key 1.
cache.get(1);       // returns -1 (not found)
cache.get(3);       // returns 3
cache.get(4);       // returns 4

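Doubly Linked List + Hash Table

First define the doubly linked list nodes: KeyNode (key node) and FreqNode (frequency node).

A KeyNode stores key, value, freq (frequency), prev (predecessor) and next (successor).

A FreqNode stores freq (frequency), prev (predecessor), next (successor), first (pointing to the newest KeyNode) and last (pointing to the oldest KeyNode).

The LFUCache data structure maintains the following attributes:

capacity: the capacity of the cache

keyDict: a mapping from key to KeyNode

freqDict: a mapping from freq to FreqNode

head: points to the FreqNode with the smallest frequency

The overall design is shown in the figure below: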

head --- FreqNode1 ---- FreqNode2 ---- ... ---- FreqNodeN
              |               |                       |               
            first           first                   first             
              |               |                       |               
           KeyNodeA        KeyNodeE                KeyNodeG           
              |               |                       |               
           KeyNodeB        KeyNodeF                KeyNodeH           
              |               |                       |               
           KeyNodeC         last                   KeyNodeI           
              |                                       |      
           KeyNodeD                                 last
              |
            last
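
As a rough illustration, the node layout in the figure might be declared like this in Java. This is only a sketch of the fields listed above: the helper pushFirst is a hypothetical name added here to show how first and last would be maintained, and starting freq at 1 is an assumption.

```java
// Sketch of the two node types; field names follow the description above.
class KeyNode {
    final int key;
    int value;
    int freq = 1;                // assumption: a new key starts with frequency 1
    KeyNode prev, next;          // neighbours within the same frequency bucket
    KeyNode(int key, int value) { this.key = key; this.value = value; }
}

class FreqNode {
    final int freq;
    FreqNode prev, next;         // neighbouring frequency buckets
    KeyNode first, last;         // newest and oldest KeyNode in this bucket
    FreqNode(int freq) { this.freq = freq; }

    // Hypothetical helper: insert a node as the newest entry of this bucket.
    void pushFirst(KeyNode node) {
        node.prev = null;
        node.next = first;
        if (first != null) first.prev = node; else last = node;
        first = node;
    }
}
```

With this layout, the eviction candidate is always head.last: the oldest key in the bucket with the smallest frequency.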

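The LFUCache operations are implemented as follows:

set(key, value):

If capacity is 0, ignore the operation and return.

If keyDict contains key, replace its value, update the node's frequency, and return.

Otherwise, if the size of keyDict equals capacity, remove head.last (the oldest KeyNode with the lowest frequency).

Create KeyNode(key, value), add it to keyDict, and update freqDict.

get(key):

If keyDict contains key, update the node's frequency and return the corresponding value.

Otherwise, return -1.

Updating a node's frequency:

Look up the KeyNode in keyDict, then use its freq value to find the corresponding FreqNode in freqDict.

If the FreqNode has no next node, or the frequency of its next node is not freq + 1, insert a new FreqNode with value freq + 1 to its right.

Increment the KeyNode's freq by 1, then move it from its current KeyNode list to the KeyNode list of the FreqNode for the new frequency.

If the original FreqNode's KeyNode list is empty after the move, delete the original FreqNode.

If any of these operations changed head, update head.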

https://discuss.leetcode.com/topic/69137/java-o-1-accept-solution-using-hashmap-doublelinkedlist-and-linkedhashset
Two HashMaps are used: one stores <key, value> pairs and the other stores <key, node> pairs.
A doubly linked list tracks the frequency of each key. Each list node holds the keys that share the same count in a Java built-in LinkedHashSet, which preserves insertion order.
Each time a key is referenced, first find the node that currently holds it. If the following node exists and its count is larger by exactly one, add the key to that node's keys; otherwise create a new node and insert it right after the current node.
All operations run in O(1) time (expected, since they rely on hashing).

import java.util.HashMap;
import java.util.LinkedHashSet;

public class LFUCache {
    private Node head = null;
    private int cap = 0;
    private HashMap<Integer, Integer> valueHash = null;
    private HashMap<Integer, Node> nodeHash = null;
    
    public LFUCache(int capacity) {
        this.cap = capacity;
        valueHash = new HashMap<Integer, Integer>();
        nodeHash = new HashMap<Integer, Node>();
    }
    
    public int get(int key) {
        if (valueHash.containsKey(key)) {
            increaseCount(key);
            return valueHash.get(key);
        }
        return -1;
    }
    
    public void set(int key, int value) {
        if ( cap == 0 ) return;
        if (valueHash.containsKey(key)) {
            valueHash.put(key, value);
        } else {
            if (valueHash.size() < cap) {
                valueHash.put(key, value);
            } else {
                removeOld();
                valueHash.put(key, value);
            }
            addToHead(key);
        }
        increaseCount(key);
    }
    
    private void addToHead(int key) {
        if (head == null) {
            head = new Node(0);
            head.keys.add(key);
        } else if (head.count > 0) {
            Node node = new Node(0);
            node.keys.add(key);
            node.next = head;
            head.prev = node;
            head = node;
        } else {
            head.keys.add(key);
        }
        nodeHash.put(key, head);      
    }
    
    private void increaseCount(int key) {
        Node node = nodeHash.get(key);
        node.keys.remove(key);
        
        if (node.next == null) {
            node.next = new Node(node.count+1);
            node.next.prev = node;
            node.next.keys.add(key);
        } else if (node.next.count == node.count+1) {
            node.next.keys.add(key);
        } else {
            Node tmp = new Node(node.count+1);
            tmp.keys.add(key);
            tmp.prev = node;
            tmp.next = node.next;
            node.next.prev = tmp;
            node.next = tmp;
        }

        nodeHash.put(key, node.next);
        if (node.keys.size() == 0) remove(node);
    }
    
    private void removeOld() {
        if (head == null) return;
        int old = 0;
        for (int n: head.keys) {
            old = n;
            break;
        }
        head.keys.remove(old);
        if (head.keys.size() == 0) remove(head);
        nodeHash.remove(old);
        valueHash.remove(old);
    }
    
    private void remove(Node node) {
        if (node.prev == null) {
            head = node.next;
        } else {
            node.prev.next = node.next;
        } 
        if (node.next != null) {
            node.next.prev = node.prev;
        }
    }
    
    class Node {
        public int count = 0;
        public LinkedHashSet<Integer> keys = null;
        public Node prev = null, next = null;
        
        public Node(int count) {
            this.count = count;
            keys = new LinkedHashSet<Integer>();
            prev = next = null;
        }
    }
}

X. using 3 HashMaps and LinkedHashSet
https://www.cnblogs.com/Dylan-Java-NYC/p/6475008.html

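To evict the least frequently used element, we need a variable min that tracks the lowest use count currently in the cache.

Use three maps. The first is a HashMap<Integer, Integer> keyVals that holds the ordinary key-value pairs.

The second tracks the use count of each key.

The third maps each count to the set of keys with that count.

When the first element is put, min = 1, and keyVals, keyCounts and countKeySets are updated accordingly.

On get, the key's count increases by one, and keyCounts and countKeySets are adjusted. If the key was the last one with the minimum use count, then min++.

When a new key is added after the cache has reached capacity, use min to find the least frequently used element, and adjust keyVals, keyCounts and countKeySets accordingly.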

http://www.cnblogs.com/grandyang/p/6258459.html

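This problem asks us to implement the LFU (Least Frequently Used) page replacement algorithm. We previously solved a similar problem, LRU Cache, which asked for the LRU (Least Recently Used) page replacement algorithm. The two names look alike, but the algorithms differ. As the names suggest, LRU evicts the page that has gone unused for the longest time, while LFU evicts the page that has been accessed the fewest times over a period. For example, suppose the cache size is 3 and we store 5, 4, 5, 4, 5, 7 in that order. The cache is now exactly full, because putting a key that is already present takes no extra space. If we now put 8, LRU should remove 4, because 4 has gone unused the longest, whereas LFU should remove 7, because 7 has been used the fewest times, only once. This simple example should roughly show the difference between the two.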
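
To make the difference concrete, the following minimal sketch replays the access sequence 5, 4, 5, 4, 5, 7 and reports which key each policy would evict next. The names EvictionDemo, lruVictim and lfuVictim are hypothetical and exist only for illustration. This is not a cache implementation, and it assumes the sequence never exceeds the cache capacity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EvictionDemo {
    // Key that LRU would evict: the one whose last access is oldest.
    static int lruVictim(int[] accesses) {
        // accessOrder = true moves a key to the end each time it is touched.
        LinkedHashMap<Integer, Boolean> order = new LinkedHashMap<>(16, 0.75f, true);
        for (int k : accesses) order.put(k, Boolean.TRUE);
        return order.keySet().iterator().next();
    }

    // Key that LFU would evict: lowest access count, ties broken by oldest access.
    static int lfuVictim(int[] accesses) {
        Map<Integer, Integer> count = new LinkedHashMap<>(16, 0.75f, true);
        for (int k : accesses) count.merge(k, 1, Integer::sum);
        int victim = -1;
        int best = Integer.MAX_VALUE;
        // Iteration runs from least to most recently accessed, so a strict
        // comparison keeps the oldest key among those with equal counts.
        for (Map.Entry<Integer, Integer> e : count.entrySet()) {
            if (e.getValue() < best) {
                best = e.getValue();
                victim = e.getKey();
            }
        }
        return victim;
    }

    public static void main(String[] args) {
        int[] seq = {5, 4, 5, 4, 5, 7};
        System.out.println("LRU evicts " + lruVictim(seq)); // 4
        System.out.println("LFU evicts " + lfuVictim(seq)); // 7
    }
}
```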

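This problem is more involved than the LRU one. There, a list that stores keys in time order is enough: the tail of the list is always the least recently used entry, and we delete the tail each time. Here we have to delete the key with the fewest uses, so we must count how often each key is used. We use a hash map m from each key to {value, frequency}. That is not enough: for convenience we also put all keys with the same frequency into one list, which requires another hash map freq from a frequency to the list of all keys that currently have that frequency. Because the problem asks for O(1) operations, we need to locate a key inside freq quickly, so we add a hash map iter from each key to its position in the freq list. Finally we need two variables, cap and minFreq, which hold the cache size and the current minimum frequency.

To explain the idea, let us use an example. Assume the cache size is 2 and we have already put 5 and 4, in that order. Here is how the internal data is stored. The actual values do not matter, so to keep the focus on keys and frequencies we write them as value#: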

m:

5 -> {value5, 1}

4 -> {value4, 1}

freq:

1 -> {5, 4}

iter:

4 -> list.begin() + 1

5 -> list.begin()

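This should not be hard to follow: in m, 5 has frequency 1 and 4 has frequency 1; in freq, the keys with frequency 1 are 5 and 4; iter holds, for each key, the iterator to its position in the corresponding freq list. The next operation is get(5), which takes these steps:

1. If 5 is not in m, return -1

2. Remove 5 from the frequency-1 list in freq

3. Increment the frequency stored for 5 in m

4. Append 5 to the end of the frequency-2 list in freq

5. Store in iter the position of 5 in the frequency-2 list

6. If the list for minFreq in freq is empty, increment minFreq

7. Return the value stored for 5 in m

After these steps, the internal data looks like this: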

m:

5 -> {value5, 2}

4 -> {value4, 1}

freq:

1 -> {4}

2 -> {5}

iter:

4 -> list.begin()

5 -> list.begin()

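In m, 5 now has frequency 2 and 4 has frequency 1; in freq, frequency 1 holds only 4 and frequency 2 holds only 5; iter again holds each key's iterator into its freq list. The next operation is to put 7, which takes these steps:

1. If get(7) does not return -1, update the value stored for 7 in m to the new value and return

2. If the size of m has reached cap, i.e. the cache is full, then:

  a) Remove from m the record for the first element of the minFreq list, i.e. remove 4 -> {value4, 1}

  b) Remove the record for 4 from iter, i.e. remove 4 -> list.begin()

  c) Remove the first element of the minFreq list in freq, i.e. remove 4

3. Create the mapping for 7 in m, i.e. 7 -> {value7, 1}

4. Append 7 to the end of the frequency-1 list in freq

5. Store in iter the position of 7 in the frequency-1 list

6. Reset minFreq to 1

After these steps, the internal data looks like this: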

m:

5 -> {value5, 2}

7 -> {value7, 1}

freq:

1 -> {7}

2 -> {5}

iter:

7 -> list.begin()

5 -> list.begin()
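
The walkthrough above can be replayed with a small Java sketch. It is an adaptation rather than the original author's code: the class name WalkthroughCache is hypothetical, and a LinkedHashSet per frequency stands in for the list plus the iter map, since it already supports removal by key in expected O(1) while keeping insertion order. The get(key) != -1 check assumes that values are always positive, as the problem statement says.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

class WalkthroughCache {
    final int cap;
    int minFreq = 0;
    // m: key -> {value, frequency}
    final Map<Integer, int[]> m = new HashMap<>();
    // freq: frequency -> keys with that frequency, oldest first
    final Map<Integer, LinkedHashSet<Integer>> freq = new HashMap<>();

    WalkthroughCache(int cap) { this.cap = cap; }

    int get(int key) {
        int[] entry = m.get(key);
        if (entry == null) return -1;                  // step 1
        freq.get(entry[1]).remove(key);                // step 2
        entry[1]++;                                    // step 3
        freq.computeIfAbsent(entry[1], f -> new LinkedHashSet<>())
            .add(key);                                 // steps 4 and 5
        if (freq.get(minFreq).isEmpty()) minFreq++;    // step 6
        return entry[0];                               // step 7
    }

    void put(int key, int value) {
        if (cap <= 0) return;
        if (get(key) != -1) {                          // step 1
            m.get(key)[0] = value;
            return;
        }
        if (m.size() >= cap) {                         // step 2
            LinkedHashSet<Integer> oldest = freq.get(minFreq);
            int victim = oldest.iterator().next();
            oldest.remove(victim);
            m.remove(victim);
        }
        m.put(key, new int[] {value, 1});              // step 3
        freq.computeIfAbsent(1, f -> new LinkedHashSet<>())
            .add(key);                                 // steps 4 and 5
        minFreq = 1;                                   // step 6
    }
}
```

Running put(5, ...), put(4, ...), get(5), put(7, ...) on a cache of size 2 should leave keys 5 and 7 in m, with 7 in the frequency-1 set and 5 in the frequency-2 set, matching the final state listed above.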

https://www.jianshu.com/p/14f6db6ebfc5

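The classic solution for LFU Cache uses a doubly linked list (http://dhruvbird.com/lfu.pdf). The solution above cleverly simulates that doubly linked list with three HashMaps. Another trick is to call get inside set and handle the get != -1 case. The variation in this article is that when the capacity is full, the tail of the queue is popped. Note the first keyValue HashMap: key => {value, frequency}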
https://medium.com/@nanofaroque/lfu-cache-in-o-1-in-java-4bac0892bdb3

To put and get data in O(1) in Java, we need a Map, or more specifically a HashMap.

HashMap<K,V>

Since we must find the least frequently used item to evict it before putting new data, we need a counter that tracks how many times a key (K) has been accessed. An access can be a get or a put. For that we need another Map<K,C>, where K is the key of the item and C is the counter.

HashMap<K,C>

With these two data structures we can put and get data in O(1), and we can also look up how many times an item has been used.

We also need a structure that relates each count to the keys with that count. For example, assume A has been used 5 times and B has also been used 5 times. We need to store the keys so that items with the same count are kept in insertion order (FIFO). A HashSet<K>, or more precisely a LinkedHashSet<K>, does that. But we also want to look the set up by its counter (5 in our example), so we need another map, keyed by the counter.

HashMap<C,LinkedHashSet<K>>

Finally we need a min variable that holds the current minimum count. Whenever a new item is inserted into the cache, min = 1. It is increased only when no items are left with counter == min.

https://leetcode.com/problems/lfu-cache/discuss/94521/java-o1-very-easy-solution-using-3-hashmaps-and-linkedhashset

import java.util.HashMap;
import java.util.LinkedHashSet;

public class LFUCache {
    HashMap<Integer, Integer> vals;
    HashMap<Integer, Integer> counts;
    HashMap<Integer, LinkedHashSet<Integer>> lists;
    int cap;
    int min = -1;
    public LFUCache(int capacity) {
        cap = capacity;
        vals = new HashMap<>();
        counts = new HashMap<>();
        lists = new HashMap<>();
        lists.put(1, new LinkedHashSet<>());
    }
    
    public int get(int key) {
        if(!vals.containsKey(key))
            return -1;
        int count = counts.get(key);
        counts.put(key, count+1);
        lists.get(count).remove(key);
        if(count==min && lists.get(count).size()==0)
            min++;
        if(!lists.containsKey(count+1))
            lists.put(count+1, new LinkedHashSet<>());
        lists.get(count+1).add(key);
        return vals.get(key);
    }
    
    public void set(int key, int value) {
        if(cap<=0)
            return;
        if(vals.containsKey(key)) {
            vals.put(key, value);
            get(key);
            return;
        } 
        if(vals.size() >= cap) {
            int evit = lists.get(min).iterator().next();
            lists.get(min).remove(evit);
            vals.remove(evit);
            counts.remove(evit); // also drop the evicted key's counter so counts does not keep stale entries
        }
        vals.put(key, value);
        counts.put(key, 1);
        min = 1;
        lists.get(1).add(key);
    }
}

https://github.com/laurentluce/lfu-cache/blob/master/lfucache/lfu_cache.py
http://www.laurentluce.com/posts/least-frequently-used-cache-eviction-scheme-with-complexity-o1-in-python/

X. Using TreeSet
https://discuss.leetcode.com/topic/69033/java-solution-using-priorityqueue-with-detailed-explanation/2

We need to implement get() and set() in average O(logn) time, or we will get TLE.

Obviously, we need a hashmap to remember key-value pair.
What we need to do, is to remember (frequency, recentness) for each key; and sort them to get the smallest one.
So, we need to use Collection such as TreeSet or PriorityQueue.

Now, the only question is, how to update?
It is difficult to update (frequency, recentness) in the collection, as we don't know the index.
(Maybe using binary search or hashmap can do this, I haven't tried it.)

The trick is to override equals() and hashCode(), and to have compareTo() return 0 for equal keys, so that remove() can find the entry (TreeSet locates elements through compareTo()).

    class Cache implements Comparable<Cache> {
        int key, f, r;
        public Cache(int k, int f, int r) {key=k;this.f=f;this.r=r;}
        public boolean equals(Object object) {return key==((Cache) object).key;}
        public int hashCode() {return key;}
        public int compareTo(Cache o) {return key==o.key?0:f==o.f?r-o.r:f-o.f;}
    }

    int capacity,id;
    HashMap<Integer, Integer> hashMap;
    HashMap<Integer, Cache> caches;
    TreeSet<Cache> treeSet;

    public LFUCache(int capacity) {
        this.capacity=capacity;
        id=0;
        hashMap=new HashMap<>();
        caches=new HashMap<>();
        treeSet=new TreeSet<>();
    }

    public int get(int key) {
        id++;
        if (hashMap.containsKey(key)) {
            update(key);
            return hashMap.get(key);
        }
        return -1;
    }

    public void set(int key, int value) {
        if (capacity==0) return;
        id++;
        if (hashMap.containsKey(key)) {
            update(key);
            hashMap.put(key, value);
            return;
        }
        if (hashMap.size()==capacity) {
            Cache first=treeSet.pollFirst();
            hashMap.remove(first.key);
            caches.remove(first.key);
        }
        hashMap.put(key, value);
        Cache cache=new Cache(key, 1, id);
        caches.put(key, cache);
        treeSet.add(cache);
    }

    private void update(int key) {
        int f=caches.get(key).f;
        treeSet.remove(caches.get(key));
        Cache cache=new Cache(key, f+1, id);
        caches.put(key, cache);
        treeSet.add(cache);
    }
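
The update step above removes a key's old entry and inserts a new one with a higher frequency, and the first element of the TreeSet is always the eviction candidate. The following minimal sketch isolates that ordering. It is a variant, not the code above: it assumes every access gets a unique, increasing recency stamp, so compareTo() needs no special case for equal keys. The names TreeSetOrderDemo, Entry and victimAfterTouchingFirstKey are hypothetical.

```java
import java.util.TreeSet;

class TreeSetOrderDemo {
    // Hypothetical entry type: ordered by frequency, then by recency (smaller = older).
    static final class Entry implements Comparable<Entry> {
        final int key, freq, recency;

        Entry(int key, int freq, int recency) {
            this.key = key;
            this.freq = freq;
            this.recency = recency;
        }

        @Override
        public int compareTo(Entry o) {
            if (freq != o.freq) return Integer.compare(freq, o.freq);
            return Integer.compare(recency, o.recency);
        }
    }

    static int victimAfterTouchingFirstKey() {
        TreeSet<Entry> set = new TreeSet<>();
        Entry a = new Entry(1, 1, 1); // key 1 inserted at time 1
        Entry b = new Entry(2, 1, 2); // key 2 inserted at time 2
        set.add(a);
        set.add(b);
        // "Touch" key 1 at time 3: remove the old entry, insert an updated one.
        set.remove(a);
        set.add(new Entry(1, a.freq + 1, 3));
        // The first element is the least frequent, then least recent entry.
        return set.first().key;
    }
}
```

Here key 2 ends up first because key 1 now has frequency 2. Each remove and add on a TreeSet costs O(log n), which is where the O(log n) bound of this approach comes from.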

http://www.deepakvadgama.com/blog/lfu-cache-in-O(1)/

We need 2 things to start with

Map to store key-value pairs
Map to store counts/frequency of access

How do we implement evict method? When size of map reaches max capacity, we need to find item with lowest frequency count. There are 2 problems:

We have to iterate through all values of frequencies map, find lowest count and remove corresponding key from both maps. This will take O(n) time.
Also, what if there are multiple keys with same frequency count? How do we find least recently used? That’s not possible because HashMap does not store the order of insertion.

To solve both of above problems we need to add one more data structure: Sorted map with frequency as map-keys and ‘list of item-keys with same frequency’ as map-values.

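We can add a new item to the end of the list with frequency 1.
We can find the list with the lowest frequency quickly, since the map is sorted by frequency (with java.util.TreeMap, firstKey() takes O(log n) in the number of distinct frequencies rather than strictly O(1)).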
We can delete the first item of the list (of lowest frequency) since that will be least recently used. Also O(1).

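import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LFUCache {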

    private Map<Integer, Integer> values = new HashMap<>();
    private Map<Integer, Integer> counts = new HashMap<>();
    private TreeMap<Integer, List<Integer>> frequencies = new TreeMap<>();
    private final int MAX_CAPACITY;

    public LFUCache(int capacity) {
        MAX_CAPACITY = capacity;
    }

    public int get(int key) {
        if (!values.containsKey(key)) {
            return -1;
        }

        // Move item from one frequency list to next. Not O(1) due to list iteration.
        int frequency = counts.get(key);
        frequencies.get(frequency).remove(Integer.valueOf(key)); // O(n); boxed so remove(Object) is chosen, not remove(int index)
        if (frequencies.get(frequency).size() == 0) {
            frequencies.remove(frequency);  // remove from map if list is empty
        }
        frequencies.computeIfAbsent(frequency + 1, k -> new LinkedList<>()).add(key);

        counts.put(key, frequency + 1);
        return values.get(key);
    }

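    public void set(int key, int value) {
        if (MAX_CAPACITY <= 0) {
            return; // a zero-capacity cache stores nothing
        }
        if (values.containsKey(key)) {
            values.put(key, value); // overwrite the existing value
            get(key);               // count the write as an access
            return;
        }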
        if (!values.containsKey(key)) {

            if (values.size() == MAX_CAPACITY) {
                // first item from 'list of smallest frequency'
                int lowestCount = frequencies.firstKey();
                int keyToDelete = frequencies.get(lowestCount).remove(0);
                if (frequencies.get(lowestCount).size() == 0) {
                    frequencies.remove(lowestCount); // remove from map if list is empty
                }
                values.remove(keyToDelete);
                counts.remove(keyToDelete);
            }

            values.put(key, value);
            counts.put(key, 1);
            frequencies.computeIfAbsent(1, k -> new LinkedList<>()).add(key); // starting frequency = 1
        }
    }
}

While solving our delete problem, we accidentally increased our access operation time to O(n). How? Note that all of item-keys sharing same frequency are in a list. Now if one of these items is accessed, how do we move it to list of next frequency? We will have to iterate through the list first to find the item, which in worst-case will take O(n) operations.

To solve the problem, we somehow need to jump directly to that item in the list without iteration. If we can do that, it will be easier to delete the item and add it to end of next frequency list.

Unfortunately, this is not possible using our in-built data structures. We need to create a new one (mentioned in the paper).

We need to store each item’s position.

We will create a simple class which stores item’s key, value and its position in the list.
We will convert the linked list to

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class LFUCache {

    private Map<Integer, Node> values = new HashMap<>();
    private Map<Integer, Integer> counts = new HashMap<>();
    private TreeMap<Integer, DoubleLinkedList> frequencies = new TreeMap<>();
    private final int MAX_CAPACITY;

    public LFUCache(int capacity) {
        MAX_CAPACITY = capacity;
    }

    public int get(int key) {
        if (!values.containsKey(key)) {
            return -1;
        }

        Node node = values.get(key);

        // Move item from one frequency list to next. O(1) this time.
        int frequency = counts.get(key);
        frequencies.get(frequency).remove(node);
        removeIfListEmpty(frequency);
        frequencies.computeIfAbsent(frequency + 1, k -> new DoubleLinkedList()).add(node);

        counts.put(key, frequency + 1);
        return values.get(key).value;
    }

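    public void set(int key, int value) {
        if (MAX_CAPACITY <= 0) {
            return; // a zero-capacity cache stores nothing
        }
        if (values.containsKey(key)) {
            values.get(key).value = value; // overwrite the existing value
            get(key);                      // count the write as an access
            return;
        }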
        if (!values.containsKey(key)) {

            Node node = new Node(key, value);

            if (values.size() == MAX_CAPACITY) {

                int lowestCount = frequencies.firstKey();   // smallest frequency
                Node nodeTodelete = frequencies.get(lowestCount).head(); // first item (LRU)
                frequencies.get(lowestCount).remove(nodeTodelete);

                int keyToDelete = nodeTodelete.key();
                removeIfListEmpty(lowestCount);
                values.remove(keyToDelete);
                counts.remove(keyToDelete);
            }

            values.put(key, node);
            counts.put(key, 1);
            frequencies.computeIfAbsent(1, k -> new DoubleLinkedList()).add(node); // starting frequency = 1
        }
    }

    private void removeIfListEmpty(int frequency) {
        if (frequencies.get(frequency).size() == 0) {
            frequencies.remove(frequency);  // remove from map if list is empty
        }
    }

    private class Node {
        private int key;
        private int value;
        private Node next;
        private Node prev;

        public Node(int key, int value) {
            this.key = key;
            this.value = value;
        }

        public int key() {
            return key;
        }

        public int value() {
            return value;
        }
    }

    private class DoubleLinkedList {
        private int n;
        private Node head;
        private Node tail;

        public void add(Node node) {
            if (head == null) {
                head = node;
            } else {
                tail.next = node;
                node.prev = tail;
            }
            tail = node;
            n++;
        }

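        public void remove(Node node) {

            if (node.next == null) tail = node.prev;
            else node.next.prev = node.prev;

            if (head == node) head = node.next;
            else node.prev.next = node.next;

            // Clear stale links: the node is re-added to another list later,
            // and leftover prev/next pointers would corrupt that list.
            node.prev = null;
            node.next = null;

            n--;
        }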

        public Node head() {
            return head;
        }

        public int size() {
            return n;
        }
    }
}

Step 4: Remove the counts HashMap

Note that we need the intermediate map called counts to jump to the appropriate list. We can go one step further (code not written) to remove this extra data structure.

Convert frequencies HashMap keys into a doubly linked list

Add variable reference to each item, which points to corresponding frequency

So instead of counts hashmap, we can go get frequency node directly from the item itself.

This is precisely the algorithm implemented in this paper
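
Since the code for this step was not written, here is one possible sketch of it. It is neither the paper's implementation nor the author's: the names LinkedFreqCache, FreqNode, Item, promote, nodeAfter and unlink are hypothetical, and a LinkedHashSet per frequency node is assumed in place of a hand-written item list. Each item keeps a direct reference to its frequency node, so neither a counts map nor a TreeMap is needed.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

class LinkedFreqCache {
    // One node per distinct frequency, linked in ascending order.
    static final class FreqNode {
        final int freq;
        final LinkedHashSet<Integer> keys = new LinkedHashSet<>(); // oldest first
        FreqNode prev, next;
        FreqNode(int freq) { this.freq = freq; }
    }

    // Each item points straight at its frequency node.
    static final class Item {
        int value;
        FreqNode node;
        Item(int value, FreqNode node) { this.value = value; this.node = node; }
    }

    private final int capacity;
    private final Map<Integer, Item> items = new HashMap<>();
    private final FreqNode head = new FreqNode(0); // sentinel; head.next has the lowest frequency

    LinkedFreqCache(int capacity) { this.capacity = capacity; }

    int get(int key) {
        Item item = items.get(key);
        if (item == null) return -1;
        promote(key, item);
        return item.value;
    }

    void set(int key, int value) {
        if (capacity <= 0) return;
        Item item = items.get(key);
        if (item != null) {
            item.value = value;
            promote(key, item);
            return;
        }
        if (items.size() >= capacity) {
            FreqNode lowest = head.next;
            int victim = lowest.keys.iterator().next(); // oldest key in the lowest bucket
            lowest.keys.remove(victim);
            if (lowest.keys.isEmpty()) unlink(lowest);
            items.remove(victim);
        }
        FreqNode one = nodeAfter(head, 1);
        one.keys.add(key);
        items.put(key, new Item(value, one));
    }

    // Move key from its current frequency node to the node for freq + 1.
    private void promote(int key, Item item) {
        FreqNode cur = item.node;
        FreqNode target = nodeAfter(cur, cur.freq + 1);
        cur.keys.remove(key);
        target.keys.add(key);
        item.node = target;
        if (cur.keys.isEmpty()) unlink(cur);
    }

    // Return the node with the given freq directly after 'node', creating it if absent.
    private FreqNode nodeAfter(FreqNode node, int freq) {
        if (node.next != null && node.next.freq == freq) return node.next;
        FreqNode created = new FreqNode(freq);
        created.prev = node;
        created.next = node.next;
        if (node.next != null) node.next.prev = created;
        node.next = created;
        return created;
    }

    // Never called on the sentinel, so node.prev is always non-null.
    private void unlink(FreqNode node) {
        node.prev.next = node.next;
        if (node.next != null) node.next.prev = node.prev;
    }
}
```

The sentinel head keeps insertion and unlinking free of null checks for the first real node. Every step is a hash lookup or a constant number of pointer updates, so get and set run in expected O(1).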

https://www.jianshu.com/p/437f53341f67
