Massive Algorithms: LeetCode

http://www.programcreek.com/2014/06/leetcode-contains-duplicate-iii-java/

Given an array of integers, find out whether there are two distinct indices i and j in the array such that the difference between nums[i] and nums[j] is at most t and the difference between i and j is at most k.

X. Bucket Sort: O(n)

The idea is like the bucket sort algorithm. Suppose we have consecutive buckets covering the range of nums with each bucket a width of (t+1). If there are two item with difference <= t, one of the two will happen: (1) the two in the same bucket (2) the two in neighbor buckets
https://discuss.leetcode.com/topic/27608/java-python-one-pass-solution-o-n-time-o-n-space-using-buckets
http://algobox.org/contains-duplicate-iii/

To determine whether there is a nearby almost duplicate in one pass we need to keep a sliding window with length kand a dynamic set can do this. Problem is we need to check almost duplicate each time we add a item into the window/set. If we use BST as our dynamic set, the cost of find almost duplicate is O(log n) therefore the whole solution is O(n log n).

Now can we check it in O(1) time on average? The answer is yes. Hash table can do insert, find and remove in O(1) time on average and by carefully design the hash table we can check almost duplicate inO(1) too.

The idea is like the bucket sort algorithm. Suppose we have consecutive buckets covering the range of nums with each bucket a width of t+1. If there are two item with difference <= t, one of the two will happen:

the two in the same bucket
the two in neighbor buckets

Note that we do not need to actually allocate a lot of buckets. At any time there will only be at most min(k, n)buckets. All we need to do is calculate the label of the bucket m = value/(t+1), and check the buckets m - 1, m,m + 1. The whole algorithm is then O(n).

For Java, it has a bit of problem because in java -3 / 5 = 0 unlike in python -3 / 5 = -1. We can use a functiongetID to work around this. Or we can get the minimum of nums first and use that as the “zero point”. Or we can simply use Integer.MIN_VALUE as the “zero point”. The two alternatives may need Long in case of overflow.

the way to handle negative in getID cab be replaced by Math.floorDiv(i,w)
the bucket width has to be t+1; as we want bucket[0] contains 0... t; size is t+1;

private long getID(long i, long w) {
    return i < 0 ? (i + 1) / w - 1 : i / w;
}

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
    if (t < 0) return false;
    Map<Long, Long> d = new HashMap<>();
    long w = (long)t + 1;
    for (int i = 0; i < nums.length; ++i) {
        long m = getID(nums[i], w);
        if (d.containsKey(m))
            return true;
        if (d.containsKey(m - 1) && Math.abs(nums[i] - d.get(m - 1)) < w)
            return true;
        if (d.containsKey(m + 1) && Math.abs(nums[i] - d.get(m + 1)) < w)
            return true;
        d.put(m, (long)nums[i]);
        if (i >= k) d.remove(getID(nums[i - k], w));
    }
    return false;
}

http://zyy1217.com/2016/08/18/leetcode220/

还有一个难点就是，在整数数组里可能有负数。这样在java（C和C++）中，num/t+1会使得[-t-1,0]的所有的数都映射到0号桶。（ps:python则不会,具体参考python负数除法 )

eg:
-t-1,-t…-1属于-1号桶

0,1,2…t属于0号桶

但是使用nums[i]/t+1这个公式会使得-1号桶的元素映射到0号桶中

对此我们有两种解决方案：

将每一个数组元素的起点调整为Integer.MIN_VALUE,即每个元素改为num-Integer.MIN_VALUE,桶号都为正数
将正数元素映射到num/t+1号桶，负数元素映射到(num+1)/(t+1)-1号桶

此外，因为增加了元素的大小，这里考虑到java的Int会溢出，计算时需要使用Long类型（python不会溢出）

https://zhengyang2015.gitbooks.io/lintcode/contains-duplicate-iii-leetcode-220.html

https://leetcode.com/discuss/38206/ac-o-n-solution-in-java-using-buckets-with-explanation

As a followup question, it naturally also requires maintaining a window of size k. When t == 0, it reduces to the previous question so we just reuse the solution.

Since there is now a constraint on the range of the values of the elements to be considered duplicates, it reminds us of doing a range check which is implemented in tree data structure and would take O(LogN) if a balanced tree structure is used, or doing a bucket check which is constant time. We shall just discuss the idea using bucket here.

Bucketing means we map a range of values to the a bucket. For example, if the bucket size is 3, we consider 0, 1, 2 all map to the same bucket. However, if t == 3, (0, 3) is a considered duplicates but does not map to the same bucket. This is fine since we are checking the buckets immediately before and after as well. So, as a rule of thumb, just make sure the size of the bucket is reasonable such that elements having the same bucket is immediately considered duplicates or duplicates must lie within adjacent buckets. So this actually gives us a range of possible bucket size, i.e. t and t + 1. We just choose it to be t and a bucket mapping to be num / t.

Another complication is that negative ints are allowed. A simple num / t just shrinks everything towards 0. Therefore, we can just reposition every element to start from Integer.MIN_VALUE.

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (k < 1 || t < 0) return false; Map<Long, Long> map = new HashMap<>(); for (int i = 0; i < nums.length; i++) { long remappedNum = (long) nums[i] - Integer.MIN_VALUE; long bucket = remappedNum / ((long) t + 1); if (map.containsKey(bucket) || (map.containsKey(bucket - 1) && remappedNum - map.get(bucket - 1) <= t) || (map.containsKey(bucket + 1) && map.get(bucket + 1) - remappedNum <= t)) return true; if (map.entrySet().size() >= k) { long lastBucket = ((long) nums[i - k] - Integer.MIN_VALUE) / ((long) t + 1); map.remove(lastBucket); } map.put(bucket, remappedNum); } return false; }

May I ask why it's necessary to reposition every element to start from Integer.MIN_VALUE?

I suppose it is not necessary if you use a different way to do the assignment of buckets. In this case, the elements are assigned to a bucket by a simple division of a bucket size. Hence, it will, for example, assign both -2 and +2 to bucket 0 if the bucket size is > 2. If we want that elements having the same bucket 0 are immediately considered duplicates, such assignment would be undesirable. Hence, a repositioning is used to cope with this. And starting from MIN_VALUE is just because of the constraints on the input range.

put the start to min_value, makes the bucket index all positive, which is a good thing, since we need to do module operation subsequently. Using positive number % positive number is well defined.

X. Using TreeSet O(NlogK)

- the treeSet's size is k

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
 if (k < 1 || t < 0)
  return false;
 
 TreeSet<Integer> set = new TreeSet<Integer>();
 
 for (int i = 0; i < nums.length; i++) {
  int c = nums[i];
  if ((set.floor(c) != null && c <= set.floor(c) + t)
  || (set.ceiling(c) != null && c >= set.ceiling(c) -t))
   return true;
 
  set.add(c); // remove first then add => corner case: nums[i - k]=c
 
  if (i >= k)
   set.remove(nums[i - k]);
 }
 
 return false;
}

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
 if (k < 1 || t < 0)
  return false;
 
 SortedSet<Long> set = new TreeSet<Long>();
 
 for (int j = 0; j < nums.length; j++) {
  long leftBoundary = (long) nums[j] - t;
  long rightBoundary = (long) nums[j] + t + 1;
  SortedSet<Long> subSet = set.subSet(leftBoundary, rightBoundary);
 
  if (!subSet.isEmpty())
   return true;
 
  set.add((long) nums[j]);
 
  if (j >= k) {
   set.remove((long) nums[j - k]);
  }
 }
 
 return false;
}

https://leetcode.com/discuss/68090/easy-ac-solution-using-treeset-long-in-java
Use subset

The time complexity of TreeSet.subSet is O(1) because it essentially creates a wrapper around the original set. However, its isEmpty operation is not instant because it searches for successors/predecessors, which, I believe, is O(log k). So the complexity is O(n log k) like other tree-based solutions.

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (nums == null || nums.length == 0) return false; TreeSet<Long> set = new TreeSet<>(); set.add((long) nums[0]); for (int i = 1; i < nums.length; i++) { if (i > k) set.remove((long) nums[i - k - 1]);//\\ long left = (long) nums[i] - t; long right = (long) nums[i] + t; if (left <= right && !set.subSet(left, right + 1).isEmpty()) return true; set.add((long) nums[i]); } return false; }
https://discuss.leetcode.com/topic/15191/java-o-n-lg-k-solution

This problem requires to maintain a window of size k of the previous values that can be queried for value ranges. The best data structure to do that is Binary Search Tree. As a result maintaining the tree of size k will result in time complexity O(N lg K). In order to check if there exists any value of range abs(nums[i] - nums[j]) to simple queries can be executed both of time complexity O(lg K)

    public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) {
        if (nums == null || nums.length == 0 || k <= 0) {
            return false;
        }

        final TreeSet<Integer> values = new TreeSet<>();
        for (int ind = 0; ind < nums.length; ind++) {

            final Integer floor = values.floor(nums[ind] + t);
            final Integer ceil = values.ceiling(nums[ind] - t);
            if ((floor != null && floor >= nums[ind])
                    || (ceil != null && ceil <= nums[ind])) {
                return true;
            }

            values.add(nums[ind]);
            if (ind >= k) {
                values.remove(nums[ind - k]);
            }
        }

        return false;
    }

TODO:

https://leetcode.com/discuss/47974/java-treeset-implementation-nlogk

how about save one ceiling operation

because if floor <= nums[ind] + t and floor >= nums[ind] - t, then that's a duplicate. If floor < nums[ind] - t, there won't be a duplicate because floor is the biggest value in the treeset that is less than (nums[ind] + t). Hope it helps.

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if (nums == null || nums.length == 0 || k <= 0) { return false; } final TreeSet<Long> values = new TreeSet<>(); for (int ind = 0; ind < nums.length; ind++) { Long floor = values.floor((long)nums[ind] + t); if (floor != null && floor + t >= nums[ind]) { return true; } values.add((long)nums[ind]); if (ind >= k) { values.remove((long)nums[ind - k]); } } return false; }

X. Different

https://leetcode.com/discuss/52545/java-solution-without-dictionary-sort-nums-record-positions
https://cheonhyangzhang.wordpress.com/2015/10/14/220-leetcode-java-contains-duplicate-iii-medium/

public boolean containsNearbyAlmostDuplicate(int[] nums, int k, int t) { if(nums.length<2||k<1||t<0) return false; ValuePosPair[] valPosArr = new ValuePosPair[nums.length]; for(int i =0;i<nums.length;i++) valPosArr[i] = new ValuePosPair(nums[i],i); Arrays.sort(valPosArr); for(int i=0;i<valPosArr.length;i++){ for(int j=i+1;j<valPosArr.length&&((long)valPosArr[j].val-(long)valPosArr[i].val<=(long)t);j++){ if(Math.abs(valPosArr[j].pos-valPosArr[i].pos)<=k) return true; } } return false; }

class ValuePosPair implements Comparable<ValuePosPair>{ int val; int pos; ValuePosPair(int v, int p) { val = v; pos = p;} public int compareTo(ValuePosPair x){ return this.val - x.val; } }

http://bookshadow.com/weblog/2015/06/03/leetcode-contains-duplicate-iii/

https://dotblogs.com.tw/hatelove/2017/02/17/leet-code-220-contains-duplicate-iii-by-tdd
第一個紅燈，k =0 應回傳 false
新增一個測試案例，當 nums 長度小於 2，應回傳 false
新增一個失敗的測試案例，當 t = 0 時，應等同於 leet code 219 的需求

LeetCode - 220 Contains Duplicate III

Labels

Popular Posts