## Sunday, March 27, 2016

### LeetCode - 220 Contains Duplicate III

http://www.programcreek.com/2014/06/leetcode-contains-duplicate-iii-java/
Given an array of integers, find out whether there are two distinct indices i and j in the array such that the difference between nums[i] and nums[j] is at most t and the difference between i and j is at most k.
X. Bucket Sort: O(n)
The idea is like the bucket sort algorithm. Suppose we have consecutive buckets covering the range of nums with each bucket a width of (t+1). If there are two item with difference <= t, one of the two will happen: (1) the two in the same bucket (2) the two in neighbor buckets
https://discuss.leetcode.com/topic/27608/java-python-one-pass-solution-o-n-time-o-n-space-using-buckets
http://algobox.org/contains-duplicate-iii/
To determine whether there is a nearby almost duplicate in one pass we need to keep a sliding window with length `k`and a dynamic set can do this. Problem is we need to check almost duplicate each time we add a item into the window/set. If we use BST as our dynamic set, the cost of find almost duplicate is `O(log n)` therefore the whole solution is `O(n log n)`.

Now can we check it in `O(1)` time on average? The answer is yes. Hash table can do insert, find and remove in `O(1)` time on average and by carefully design the hash table we can check almost duplicate in`O(1)` too.
The idea is like the bucket sort algorithm. Suppose we have consecutive buckets covering the range of `nums` with each bucket a width of `t+1`. If there are two item with difference `<= t`, one of the two will happen:
1. the two in the same bucket
2. the two in neighbor buckets
Note that we do not need to actually allocate a lot of buckets. At any time there will only be at most `min(k, n)`buckets. All we need to do is calculate the label of the bucket `m = value/(t+1)`, and check the buckets `m - 1``m`,`m + 1`. The whole algorithm is then `O(n)`.
For Java, it has a bit of problem because in java `-3 / 5` = `0` unlike in python `-3 / 5` = `-1`. We can use a function`getID` to work around this. Or we can get the minimum of `nums` first and use that as the “zero point”. Or we can simply use `Integer.MIN_VALUE` as the “zero point”. The two alternatives may need Long in case of overflow.

the way to handle negative in getID cab be replaced by Math.floorDiv(i,w)
the bucket width has to be t+1; as we want bucket[0] contains 0... t; size is t+1;
``````private long getID(long i, long w) {
return i < 0 ? (i + 1) / w - 1 : i / w;
}

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
if (t < 0) return false;
Map<Long, Long> d = new HashMap<>();
long w = (long)t + 1;
for (int i = 0; i < nums.length; ++i) {
long m = getID(nums[i], w);
if (d.containsKey(m))
return true;
if (d.containsKey(m - 1) && Math.abs(nums[i] - d.get(m - 1)) < w)
return true;
if (d.containsKey(m + 1) && Math.abs(nums[i] - d.get(m + 1)) < w)
return true;
d.put(m, (long)nums[i]);
if (i >= k) d.remove(getID(nums[i - k], w));
}
return false;
}``````
http://zyy1217.com/2016/08/18/leetcode220/

eg:
-t-1,-t…-1属于-1号桶
0,1,2…t属于0号桶

1. 将每一个数组元素的起点调整为Integer.MIN_VALUE,即每个元素改为num-Integer.MIN_VALUE,桶号都为正数
2. 将正数元素映射到num/t+1号桶，负数元素映射到(num+1)/(t+1)-1号桶

https://zhengyang2015.gitbooks.io/lintcode/contains-duplicate-iii-leetcode-220.html

https://leetcode.com/discuss/38206/ac-o-n-solution-in-java-using-buckets-with-explanation
As a followup question, it naturally also requires maintaining a window of size k. When t == 0, it reduces to the previous question so we just reuse the solution.
Since there is now a constraint on the range of the values of the elements to be considered duplicates, it reminds us of doing a range check which is implemented in tree data structure and would take O(LogN) if a balanced tree structure is used, or doing a bucket check which is constant time. We shall just discuss the idea using bucket here.
Bucketing means we map a range of values to the a bucket. For example, if the bucket size is 3, we consider 0, 1, 2 all map to the same bucket. However, if t == 3, (0, 3) is a considered duplicates but does not map to the same bucket. This is fine since we are checking the buckets immediately before and after as well. So, as a rule of thumb, just make sure the size of the bucket is reasonable such that elements having the same bucket is immediately considered duplicates or duplicates must lie within adjacent buckets. So this actually gives us a range of possible bucket size, i.e. t and t + 1. We just choose it to be t and a bucket mapping to be num / t.
Another complication is that negative ints are allowed. A simple num / t just shrinks everything towards 0. Therefore, we can just reposition every element to start from Integer.MIN_VALUE.

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
if (k < 1 || t < 0) return false;
Map<Long, Long> map = new HashMap<>();
for (int i = 0; i < nums.length; i++) {
long remappedNum = (long) nums[i] - Integer.MIN_VALUE;
long bucket = remappedNum / ((long) t + 1);
if (map.containsKey(bucket)
|| (map.containsKey(bucket - 1) && remappedNum - map.get(bucket - 1) <= t)
|| (map.containsKey(bucket + 1) && map.get(bucket + 1) - remappedNum <= t))
return true;
if (map.entrySet().size() >= k) {
long lastBucket = ((long) nums[i - k] - Integer.MIN_VALUE) / ((long) t + 1);
map.remove(lastBucket);
}
map.put(bucket, remappedNum);
}
return false;
}
May I ask why it's necessary to reposition every element to start from Integer.MIN_VALUE?
I suppose it is not necessary if you use a different way to do the assignment of buckets. In this case, the elements are assigned to a bucket by a simple division of a bucket size. Hence, it will, for example, assign both -2 and +2 to bucket 0 if the bucket size is > 2. If we want that elements having the same bucket 0 are immediately considered duplicates, such assignment would be undesirable. Hence, a repositioning is used to cope with this. And starting from MIN_VALUE is just because of the constraints on the input range.

X. Using TreeSet O(NlogK)
- the treeSet's size is k

 ```public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (k < 1 || t < 0) return false;   TreeSet set = new TreeSet();   for (int i = 0; i < nums.length; i++) { int c = nums[i]; if ((set.floor(c) != null && c <= set.floor(c) + t) || (set.ceiling(c) != null && c >= set.ceiling(c) -t)) return true;   set.add(c); // remove first then add => corner case: nums[i - k]=c   if (i >= k) set.remove(nums[i - k]); }   return false; }```
 ```public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (k < 1 || t < 0) return false;   SortedSet set = new TreeSet();   for (int j = 0; j < nums.length; j++) { long leftBoundary = (long) nums[j] - t; long rightBoundary = (long) nums[j] + t + 1; SortedSet subSet = set.subSet(leftBoundary, rightBoundary);   if (!subSet.isEmpty()) return true;   set.add((long) nums[j]);   if (j >= k) { set.remove((long) nums[j - k]); } }   return false; }```
https://leetcode.com/discuss/68090/easy-ac-solution-using-treeset-long-in-java
Use subset
The time complexity of `TreeSet.subSet` is O(1) because it essentially creates a wrapper around the original set. However, its `isEmpty` operation is not instant because it searches for successors/predecessors, which, I believe, is O(log k). So the complexity is O(n log k) like other tree-based solutions.
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (nums == null || nums.length == 0) return false; TreeSet<Long> set = new TreeSet<>(); set.add((long) nums[0]); for (int i = 1; i < nums.length; i++) { if (i > k) set.remove((long) nums[i - k - 1]);//\\ long left = (long) nums[i] - t; long right = (long) nums[i] + t; if (left <= right && !set.subSet(left, right + 1).isEmpty()) return true; set.add((long) nums[i]); } return false; }
https://discuss.leetcode.com/topic/15191/java-o-n-lg-k-solution
This problem requires to maintain a window of size k of the previous values that can be queried for value ranges. The best data structure to do that is Binary Search Tree. As a result maintaining the tree of size k will result in time complexity O(N lg K). In order to check if there exists any value of range abs(nums[i] - nums[j]) to simple queries can be executed both of time complexity O(lg K)
``````    public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
if (nums == null || nums.length == 0 || k <= 0) {
return false;
}

final TreeSet<Integer> values = new TreeSet<>();
for (int ind = 0; ind < nums.length; ind++) {

final Integer floor = values.floor(nums[ind] + t);
final Integer ceil = values.ceiling(nums[ind] - t);
if ((floor != null && floor >= nums[ind])
|| (ceil != null && ceil <= nums[ind])) {
return true;
}

if (ind >= k) {
values.remove(nums[ind - k]);
}
}

return false;
}``````
TODO:
how about save one ceiling operation

because if floor <= nums[ind] + t and floor >= nums[ind] - t, then that's a duplicate. If floor < nums[ind] - t, there won't be a duplicate because floor is the biggest value in the treeset that is less than (nums[ind] + t). Hope it helps.
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (nums == null || nums.length == 0 || k <= 0) { return false; } final TreeSet<Long> values = new TreeSet<>(); for (int ind = 0; ind < nums.length; ind++) { Long floor = values.floor((long)nums[ind] + t); if (floor != null && floor + t >= nums[ind]) { return true; } values.add((long)nums[ind]); if (ind >= k) { values.remove((long)nums[ind - k]); } } return false; }

X. Different
public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if(nums.length<2||k<1||t<0) return false; ValuePosPair[] valPosArr = new ValuePosPair[nums.length]; for(int i =0;i<nums.length;i++) valPosArr[i] = new ValuePosPair(nums[i],i); Arrays.sort(valPosArr); for(int i=0;i<valPosArr.length;i++){ for(int j=i+1;j<valPosArr.length&&((long)valPosArr[j].val-(long)valPosArr[i].val<=(long)t);j++){ if(Math.abs(valPosArr[j].pos-valPosArr[i].pos)<=k) return true; } } return false; }
class ValuePosPair implements Comparable<ValuePosPair>{ int val; int pos; ValuePosPair(int v, int p) { val = v; pos = p;} public int compareTo(ValuePosPair x){ return this.val - x.val; } }

https://dotblogs.com.tw/hatelove/2017/02/17/leet-code-220-contains-duplicate-iii-by-tdd