LeetCode - 220 Contains Duplicate III


http://www.programcreek.com/2014/06/leetcode-contains-duplicate-iii-java/
Given an array of integers, find out whether there are two distinct indices i and j in the array such that the difference between nums[i] and nums[j] is at most t and the difference between i and j is at most k.
X. Bucket Sort: O(n)
The idea is like the bucket sort algorithm. Suppose we have consecutive buckets covering the range of nums with each bucket a width of (t+1). If there are two item with difference <= t, one of the two will happen: (1) the two in the same bucket (2) the two in neighbor buckets
https://discuss.leetcode.com/topic/27608/java-python-one-pass-solution-o-n-time-o-n-space-using-buckets
http://algobox.org/contains-duplicate-iii/
To determine whether there is a nearby almost duplicate in one pass we need to keep a sliding window with length kand a dynamic set can do this. Problem is we need to check almost duplicate each time we add a item into the window/set. If we use BST as our dynamic set, the cost of find almost duplicate is O(log n) therefore the whole solution is O(n log n).

Now can we check it in O(1) time on average? The answer is yes. Hash table can do insert, find and remove in O(1) time on average and by carefully design the hash table we can check almost duplicate inO(1) too.
The idea is like the bucket sort algorithm. Suppose we have consecutive buckets covering the range of nums with each bucket a width of t+1. If there are two item with difference <= t, one of the two will happen:
  1. the two in the same bucket
  2. the two in neighbor buckets
Note that we do not need to actually allocate a lot of buckets. At any time there will only be at most min(k, n)buckets. All we need to do is calculate the label of the bucket m = value/(t+1), and check the buckets m - 1m,m + 1. The whole algorithm is then O(n).
For Java, it has a bit of problem because in java -3 / 5 = 0 unlike in python -3 / 5 = -1. We can use a functiongetID to work around this. Or we can get the minimum of nums first and use that as the “zero point”. Or we can simply use Integer.MIN_VALUE as the “zero point”. The two alternatives may need Long in case of overflow.

the way to handle negative in getID cab be replaced by Math.floorDiv(i,w)
the bucket width has to be t+1; as we want bucket[0] contains 0... t; size is t+1;
private long getID(long i, long w) {
    return i < 0 ? (i + 1) / w - 1 : i / w;
}

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
    if (t < 0) return false;
    Map<Long, Long> d = new HashMap<>();
    long w = (long)t + 1;
    for (int i = 0; i < nums.length; ++i) {
        long m = getID(nums[i], w);
        if (d.containsKey(m))
            return true;
        if (d.containsKey(m - 1) && Math.abs(nums[i] - d.get(m - 1)) < w)
            return true;
        if (d.containsKey(m + 1) && Math.abs(nums[i] - d.get(m + 1)) < w)
            return true;
        d.put(m, (long)nums[i]);
        if (i >= k) d.remove(getID(nums[i - k], w));
    }
    return false;
}
http://zyy1217.com/2016/08/18/leetcode220/
还有一个难点就是,在整数数组里可能有负数。这样在java(C和C++)中,num/t+1会使得[-t-1,0]的所有的数都映射到0号桶。(ps:python则不会,具体参考python负数除法 )
eg:
-t-1,-t…-1属于-1号桶
0,1,2…t属于0号桶
但是使用nums[i]/t+1这个公式会使得-1号桶的元素映射到0号桶中
对此我们有两种解决方案:
  1. 将每一个数组元素的起点调整为Integer.MIN_VALUE,即每个元素改为num-Integer.MIN_VALUE,桶号都为正数
  2. 将正数元素映射到num/t+1号桶,负数元素映射到(num+1)/(t+1)-1号桶
此外,因为增加了元素的大小,这里考虑到java的Int会溢出,计算时需要使用Long类型(python不会溢出)
https://zhengyang2015.gitbooks.io/lintcode/contains-duplicate-iii-leetcode-220.html

https://leetcode.com/discuss/38206/ac-o-n-solution-in-java-using-buckets-with-explanation
As a followup question, it naturally also requires maintaining a window of size k. When t == 0, it reduces to the previous question so we just reuse the solution.
Since there is now a constraint on the range of the values of the elements to be considered duplicates, it reminds us of doing a range check which is implemented in tree data structure and would take O(LogN) if a balanced tree structure is used, or doing a bucket check which is constant time. We shall just discuss the idea using bucket here.
Bucketing means we map a range of values to the a bucket. For example, if the bucket size is 3, we consider 0, 1, 2 all map to the same bucket. However, if t == 3, (0, 3) is a considered duplicates but does not map to the same bucket. This is fine since we are checking the buckets immediately before and after as well. So, as a rule of thumb, just make sure the size of the bucket is reasonable such that elements having the same bucket is immediately considered duplicates or duplicates must lie within adjacent buckets. So this actually gives us a range of possible bucket size, i.e. t and t + 1. We just choose it to be t and a bucket mapping to be num / t.
Another complication is that negative ints are allowed. A simple num / t just shrinks everything towards 0. Therefore, we can just reposition every element to start from Integer.MIN_VALUE.

    public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
        if (k < 1 || t < 0) return false;
        Map<Long, Long> map = new HashMap<>();
        for (int i = 0; i < nums.length; i++) {
            long remappedNum = (long) nums[i] - Integer.MIN_VALUE;
            long bucket = remappedNum / ((long) t + 1);
            if (map.containsKey(bucket)
                    || (map.containsKey(bucket - 1) && remappedNum - map.get(bucket - 1) <= t)
                        || (map.containsKey(bucket + 1) && map.get(bucket + 1) - remappedNum <= t))
                            return true;
            if (map.entrySet().size() >= k) {
                long lastBucket = ((long) nums[i - k] - Integer.MIN_VALUE) / ((long) t + 1);
                map.remove(lastBucket);
            }
            map.put(bucket, remappedNum);
        }
        return false;
    }
May I ask why it's necessary to reposition every element to start from Integer.MIN_VALUE?
I suppose it is not necessary if you use a different way to do the assignment of buckets. In this case, the elements are assigned to a bucket by a simple division of a bucket size. Hence, it will, for example, assign both -2 and +2 to bucket 0 if the bucket size is > 2. If we want that elements having the same bucket 0 are immediately considered duplicates, such assignment would be undesirable. Hence, a repositioning is used to cope with this. And starting from MIN_VALUE is just because of the constraints on the input range.

X. Using TreeSet O(NlogK)
- the treeSet's size is k


public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
 if (k < 1 || t < 0)
  return false;
 
 TreeSet<Integer> set = new TreeSet<Integer>();
 
 for (int i = 0; i < nums.length; i++) {
  int c = nums[i];
  if ((set.floor(c) != null && c <= set.floor(c) + t)
  || (set.ceiling(c) != null && c >= set.ceiling(c) -t))
   return true;
 
  set.add(c); // remove first then add => corner case: nums[i - k]=c
 
  if (i >= k)
   set.remove(nums[i - k]);
 }
 
 return false;
}
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
 if (k < 1 || t < 0)
  return false;
 
 SortedSet<Long> set = new TreeSet<Long>();
 
 for (int j = 0; j < nums.length; j++) {
  long leftBoundary = (long) nums[j] - t;
  long rightBoundary = (long) nums[j] + t + 1;
  SortedSet<Long> subSet = set.subSet(leftBoundary, rightBoundary);
 
  if (!subSet.isEmpty())
   return true;
 
  set.add((long) nums[j]);
 
  if (j >= k) {
   set.remove((long) nums[j - k]);
  }
 }
 
 return false;
}
https://leetcode.com/discuss/68090/easy-ac-solution-using-treeset-long-in-java
Use subset
The time complexity of TreeSet.subSet is O(1) because it essentially creates a wrapper around the original set. However, its isEmpty operation is not instant because it searches for successors/predecessors, which, I believe, is O(log k). So the complexity is O(n log k) like other tree-based solutions.
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (nums == null || nums.length == 0) return false; TreeSet<Long> set = new TreeSet<>(); set.add((long) nums[0]); for (int i = 1; i < nums.length; i++) { if (i > k) set.remove((long) nums[i - k - 1]);//\\ long left = (long) nums[i] - t; long right = (long) nums[i] + t; if (left <= right && !set.subSet(left, right + 1).isEmpty()) return true; set.add((long) nums[i]); } return false; }
https://discuss.leetcode.com/topic/15191/java-o-n-lg-k-solution
This problem requires to maintain a window of size k of the previous values that can be queried for value ranges. The best data structure to do that is Binary Search Tree. As a result maintaining the tree of size k will result in time complexity O(N lg K). In order to check if there exists any value of range abs(nums[i] - nums[j]) to simple queries can be executed both of time complexity O(lg K)
    public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
        if (nums == null || nums.length == 0 || k <= 0) {
            return false;
        }

        final TreeSet<Integer> values = new TreeSet<>();
        for (int ind = 0; ind < nums.length; ind++) {

            final Integer floor = values.floor(nums[ind] + t);
            final Integer ceil = values.ceiling(nums[ind] - t);
            if ((floor != null && floor >= nums[ind])
                    || (ceil != null && ceil <= nums[ind])) {
                return true;
            }

            values.add(nums[ind]);
            if (ind >= k) {
                values.remove(nums[ind - k]);
            }
        }

        return false;
    }
TODO:
how about save one ceiling operation

because if floor <= nums[ind] + t and floor >= nums[ind] - t, then that's a duplicate. If floor < nums[ind] - t, there won't be a duplicate because floor is the biggest value in the treeset that is less than (nums[ind] + t). Hope it helps.
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (nums == null || nums.length == 0 || k <= 0) { return false; } final TreeSet<Long> values = new TreeSet<>(); for (int ind = 0; ind < nums.length; ind++) { Long floor = values.floor((long)nums[ind] + t); if (floor != null && floor + t >= nums[ind]) { return true; } values.add((long)nums[ind]); if (ind >= k) { values.remove((long)nums[ind - k]); } } return false; }

X. Different
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if(nums.length<2||k<1||t<0) return false; ValuePosPair[] valPosArr = new ValuePosPair[nums.length]; for(int i =0;i<nums.length;i++) valPosArr[i] = new ValuePosPair(nums[i],i); Arrays.sort(valPosArr); for(int i=0;i<valPosArr.length;i++){ for(int j=i+1;j<valPosArr.length&&((long)valPosArr[j].val-(long)valPosArr[i].val<=(long)t);j++){ if(Math.abs(valPosArr[j].pos-valPosArr[i].pos)<=k) return true; } } return false; }
class ValuePosPair implements Comparable<ValuePosPair>{ int val; int pos; ValuePosPair(int v, int p) { val = v; pos = p;} public int compareTo(ValuePosPair x){ return this.val - x.val; } }


https://dotblogs.com.tw/hatelove/2017/02/17/leet-code-220-contains-duplicate-iii-by-tdd
第一個紅燈,k =0 應回傳 false
新增一個測試案例,當 nums 長度小於 2,應回傳 false
新增一個失敗的測試案例,當 t = 0 時,應等同於 leet code 219 的需求


Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts