Massive Algorithms: LeetCode 287 - Find the Duplicate Number

Related:
Related:
Lintcode 570 - Find the Missing Number II
LeetCode 41 - First Missing Positive
LintCode 196 - Find the Missing Number I
LeetCode 287 - Find the Duplicate Number - Google Interview

[LeetCode] Find the Duplicate Number - Eason Liu - 博客园
Find the Duplicate Number

Given an array nums containing n + 1 integers where each integer is between 1 and n (inclusive), prove that at least one duplicate number must exist. Assume that there is only one duplicate number, find the duplicate one.

Example 1:

Input: [1,3,4,2,2]
Output: 2

Example 2:

Input: [3,1,3,4,2]
Output: 3

Note:

You must not modify the array (assume the array is read only).
You must use only constant, O(1) extra space.
Your runtime complexity should be less than O(n2).
There is only one duplicate number in the array, but it could be repeated more than once

解法II：时间复杂度O(n * log n)

https://discuss.leetcode.com/topic/31661/java-easy-version-to-understand/10

The idea is calculate a middle number between low and high, then in for loop to check total numbers less or equal than middle. If numbers less than middle, then the duplicate must fall in the [low, middle] range; else in [middle+1, high]. For this solution, time is O(nlgn) and space is O(1) for my understanding.

Except declare "int count = 0;" out side of while loop. You can also consider the following two suggestions:

Check nums == null before nums.length == 0, else you can get a NullPointException as well.
You can also add data validation code below in the for loop to make it more robust.
if(x >= nums.length || x < 1) return x;

public static int findDuplicate(int[] nums) {
 if (nums.length == 0 || nums == null)
  return 0;
 int low = 1, high = nums.length - 1, mid;
 while (low < high) {
  mid = low + (high - low) / 2;
  int count = 0;
  for (int i = 0; i < nums.length; i++) {
   if (nums[i] <= mid)
    count++;
  }
  if (count > mid)
   high = mid;
  else
   low = mid + 1;
 }
 return low;
}

Also http://www.cnblogs.com/easonliu/p/4843818.html

二分查找（Binary Search）+ 鸽笼原理（Pigeonhole Principle）

参考维基百科关于鸽笼原理的词条链接：https://en.wikipedia.org/wiki/Pigeonhole_principle

“不允许修改数组” 与 “常数空间复杂度”这两个限制条件意味着：禁止排序，并且不能使用Map等数据结构

小于O(n2)的运行时间复杂度可以联想到使用二分将其中的一个n化简为log n

参考LeetCode Discuss：https://leetcode.com/discuss/60830/python-solution-explanation-without-changing-input-array

二分枚举答案范围，使用鸽笼原理进行检验

根据鸽笼原理，给定n + 1个范围[1, n]的整数，其中一定存在数字出现至少两次。

假设枚举的数字为 n / 2：

遍历数组，若数组中不大于n / 2的数字个数超过n / 2，则可以确定[1 - n /2]范围内一定有解，

否则可以确定解落在(n / 2, n]范围内。

http://algobox.org/find-the-duplicate-number/

Binary Search in the cumulative sums. We cannot use O(n) extra space therefore we have to calculate the sum every time. The algorithm is O(nlogn).

def findDuplicate(self, nums):

    l = 1

    r = len(nums)

    while l != r:

        m = (l + r) / 2

        f = sum(x <= m for x in nums)  # the count of numbers that is <= x

        if m < f:

            r = m

        else:

            l = m + 1

    return l

http://shibaili.blogspot.com/2015/10/day-132
-287-find-duplicate-number.html
Sol #1: 因为数字的范围是1 - n, 在范围内取一个值，遍历所给的数组，记下所有比这个值小的个数，进行对比。取值的方法用binary search

    int findDuplicate(vector<int>& nums) {

        int n = nums.size() - 1;

        int low = 1, high = n;

        while (low < high) {

            int mid = (low + high) / 2;

            int count = 0;

            for (int i = 0; i <= n; i++) {

                if (nums[i] <= mid) {

                    count++;

}

}

            if (count > mid) {

                high = mid;

            }else {

                low = mid + 1;

}

            if (low == high) return low;

}

}

https://richdalgo.wordpress.com/2015/10/06/find-the-duplicate-number/

def findDuplicate(self, nums):

size = len(nums)

i, j = 1, 1+size

while i <= j:

mid = (i+j)//2

x = y = 0

for num in nums:

if num < mid:

x += 1

elif num == mid:

y += 1

if y > 1:

return mid

if x > mid - 1:

j = mid - 1

else:

i = mid + 1

Java: http://blog.csdn.net/xudli/article/details/48802345

public int findDuplicate(int[] nums) {
int n = nums.length-1;
int low = 1, high= n;
int mid = 0;
while(low<high) {
mid = low + (high-low)/2;
int c= count(nums, mid); //count #numbers less than mid.
if(c<=mid) {
low = mid+1;
} else {
high = mid;
}
}
return low;
}
private int count(int[] nums, int mid) {
int c =0;
for(int i=0; i<nums.length; i++) {
if(nums[i]<=mid) c++;
}
return c;
}

X. O(N) - find cycle in linkedlist
Approach #3 Floyd's Tortoise and Hare (Cycle Detection)

First off, we can easily show that the constraints of the problem imply that a cycle must exist. Because each number in nums is between

1

and

n

, it will necessarily point to an index that exists. Therefore, the list can be traversed infinitely, which implies that there is a cycle. Additionally, because

0

cannot appear as a value in nums, nums[0] cannot be part of the cycle. Therefore, traversing the array in this manner from nums[0] is equivalent to traversing a cyclic linked list. Given this, the problem can be solved just like Linked List Cycle II.

public int findDuplicate(int[] nums) {

// Find the intersection point of the two runners.

int tortoise = nums[0];

int hare = nums[0];

do {

tortoise = nums[tortoise];

hare = nums[nums[hare]];

} while (tortoise != hare);

// Find the "entrance" to the cycle.

int ptr1 = nums[0];

int ptr2 = tortoise;

while (ptr1 != ptr2) {

ptr1 = nums[ptr1];

ptr2 = nums[ptr2];

}

return ptr1;

}

https://discuss.leetcode.com/topic/25913/my-easy-understood-solution-with-o-n-time-and-o-1-space-without-modifying-the-array-with-clear-explanation

The main idea is the same with problem Linked List Cycle II,https://leetcode.com/problems/linked-list-cycle-ii/. Use two pointers the fast and the slow. The fast one goes forward two steps each time, while the slow one goes only step each time. They must meet the same item when slow==fast. In fact, they meet in a circle, the duplicate number must be the entry point of the circle when visiting the array from nums[0]. Next we just need to find the entry point. We use a point(we can use the fast one before) to visit form begining with one step each time, do the same job to slow. When fast==slow, they meet at the entry point of the circle. The easy understood code is as follows.

int findDuplicate3(vector<int>& nums)
{
 if (nums.size() > 1)
 {
  int slow = nums[0];
  int fast = nums[nums[0]];
  while (slow != fast)
  {
   slow = nums[slow];
   fast = nums[nums[fast]];
  }

  fast = 0;
  while (fast != slow)
  {
   fast = nums[fast];
   slow = nums[slow];
  }
  return slow;
 }
 return -1;
}

http://blog.5ibc.net/p/17688.html

把数组当成一个静态链表，也就是说nums[]，数组相当于next[]，0为起始节点，节点地址范围就是0-n，一共n+1个节点。

因为数组范围是1-n，所以没有节点next会指向0。

而在有n+1个，next指针的情况下，链表必然会出现环，也就是两个next指向同一个节点，所以这题其实是相当于寻找链表的环的起始点（方法就是快慢指针，快指针在环里追上慢指针，再让一个指针跑到起始点，然后两个指针同速而行，交点就是答案）。

http://keithschwarz.com/interesting/code/find-duplicate/FindDuplicate.python.html
# The main trick we need to use to solve this problem is to notice that because
# we have an array of n elements ranging from 0 to n - 2, we can think of the
# array as defining a function f from the set {0, 1, ..., n - 1} onto itself.
# This function is defined by f(i) = A[i].  Given this setup, a duplicated
# value corresponds to a pair of indices i != j such that f(i) = f(j).  Our
# challenge, therefore, is to find this pair (i, j).  Once we have it, we can
# easily find the duplicated value by just picking f(i) = A[i].
#
# But how are we to find this repeated value?  It turns out that this is a
# well-studied problem in computer science called cycle detection.  The general
# form of the problem is as follows.  We are given a function f.  Define the
# sequence x_i as
#
#    x_0     = k       (for some k)
#    x_1     = f(x_0)
#    x_2     = f(f(x_0))
#    ...
#    x_{n+1} = f(x_n)
#
# Assuming that f maps from a domain into itself, this function will have one
# of three forms.  First, if the domain is infinite, then the sequence could be
# infinitely long and nonrepeating.  For example, the function f(n) = n + 1 on
# the integers has this property - no number is ever duplicated.  Second, the
# sequence could be a closed loop, which means that there is some i so that
# x_0 = x_i.  In this case, the sequence cycles through some fixed set of
# values indefinitely.  Finally, the sequence could be "rho-shaped."  In this
# case, the sequence looks something like this:
#
#     x_0 -> x_1 -> ... x_k -> x_{k+1} ... -> x_{k+j}
#                        ^                       |
#                        |                       |
#                        +-----------------------+
#
# That is, the sequence begins with a chain of elements that enters a cycle,
# then cycles around indefinitely.  We'll denote the first element of the cycle
# that is reached in the sequence the "entry" of the cycle.
#
# For our particular problem of finding a duplicated element in the array,
# consider the sequence formed by starting at position n - 1 and then
# repeatedly applying f.  That is, we start at the last position in the array,
# then go to the indicated index, repeating this process.  My claim is that
# this sequence is rho-shaped.  To see this, note that it must contains a cycle
# because the array is finite and after visiting n elements, we necessarily
# must visit some element twice.  This is true no matter where we start off in
# the array.  Moreover, note that since the array elements range from 0 to
# n - 2 inclusive, there is no array index that contains n - 1 as a value.
# Consequently, when we leave index n - 1 after applying the function f one
# time, we can never get back there.  This means that n - 1 can't be part of a
# cycle, but if we follow indices starting there we must eventually hit some
# other node twice.  The concatenation of the chain starting at n - 1 with the
# cycle it hits must be rho-shaped.
#
# Moreover, think about the node we encounter that starts at the entry of the
# cycle.  Since this node is at the entry of the cycle, there must be two
# inputs to the function f that both result in that index being generated.  For
# this to be possible, it must be that there are indices i != j with
# f(i) = f(j), meaning that A[i] = A[j].  Thus the index of the entry of the
# cycle must be one of the values that is duplicated in the array.
#
# There is a famous algorithm due to Robert Floyd that, given a rho-shaped
# sequence, finds the entry point of the cycle in linear time and using only
# constant space.  This algorithm is often referred to as the "tortoise and
# hare" algorithm, for reasons that will become clearer shortly.
#
# The idea behind the algorithm is to define two quantities.  First, let c be
# the length of the chain that enters the cycle, and let l be the length of the
# cycle.  Next, let l' be the smallest multiple of l that's larger than c.
# I claim that for any rho-shaped sequence l' defined above, that
#
#    x_{l'} = x_{2l'}
#
# The proof is actually straightforward and very illustrative - it's one of my
# favorite proofs in computer science.  The idea is that since l' is at least
# c, it must be contained in the cycle.  Moreover, since l' is a multiple of
# the length of the loop, we can write it as ml for some constant m.  If we
# start at position x_{l'}, which is inside the loop, then take l' more steps
# forward to get to x_{2l'}, then we will just walk around the loop m times,
# ending up right back where we started.
#
# One key trick of Floyd's algorithm is that even if we don't explicitly know l
# or c, we can still find the value l' in O(l') time.  The idea is as follows.
# We begin by keeping track of two values "slow" and "fast," both starting at
# x_0.  We then iteratively compute
#
#    slow = f(slow)
#    fast = f(f(fast))
#
# We repeat this process until we find that slow and fast are equal to one
# another.  When this happens, we know that slow = x_j for some j, and
# fast = x_{2j} for that same j.  Since x_j = x_{2j}, we know that j must be at
# least c, since it has to be contained in the cycle.  Moreover, we know that j
# must be a multiple of l, since the fact that x_j = x_{2j} means that taking j
# steps while in the cycle ends up producing the same result.  Finally, j must
# be the smallest multiple of l greater than c, since if there were a smaller
# multiple of l greater than c then we would have reached that multiple before
# we reached j.  Consequently, we must have that j = l', meaning that we can
# find l' without knowing anything about the length or shape of the cycle!
#
# To complete the construction, we need to show how to use our information
# about l' to find the entry to the cycle (which is at position x_c).  To do
# this, we start off one final variable, which we call "finder," at x_0.  We
# then iteratively repeat the following:
#
#   finder = f(finder)
#   slow   = f(slow)
#
# until finder = slow.  We claim that (1) the two will eventually hit each
# other, and (2) they will hit each other at the entry to the cycle.  To see
# this, we remark that since slow is at position x_{l'}, if we take c steps
# forward, then we have that slow will be at position x_{l' + c}.  Since l' is
# a multiple of the loop length, this is equivalent to taking c steps forward,
# then walking around the loop some number of times back to where you started.
# In other words, x_{l' + c} = x_c.  Moreover, consider the position of the
# finder variable after c steps.  It starts at x_0, so after c steps it will be
# at position x_c.  This proves both (1) and (2), since we've shown that the
# two must eventually hit each other, and when they do they hit at position x_c
# at the entry to the cycle.
#
# The beauty of this algorithm is that it uses only O(1) external memory to
# keep track of two different pointers - the slow pointer, and then the fast
# pointer (for the first half) and the finder pointer (for the second half).
# But on top of that, it runs in O(n) time.  To see this, note that the time
# required for the slow pointer to hit the fast pointer is O(l').  Since l' is
# the smallest multiple of l greater than c, we have two cases to consider.
# First, if l > c, then this is l.  Otherwise, if l < c, then we have that
# there must be some multiple of l between c and 2c.  To see this, note that
# in the range c and 2c there are c different values, and since l < c at least
# one of them must be equal to 0 mod l.  Finally, the time required to find the
# start of the cycle from this point is O(c).  This gives a total runtime of at
# most O(c + max{l, 2c}).  All of these values are at most n, so this algorithm
# runs in time O(n).

def findArrayDuplicate(array):
    assert len(array) > 0

    # The "tortoise and hare" step.  We start at the end of the array and try
    # to find an intersection point in the cycle.
    slow = len(array) - 1
    fast = len(array) - 1

    # Keep advancing 'slow' by one step and 'fast' by two steps until they
    # meet inside the loop.
    while True:
        slow = array[slow]
        fast = array[array[fast]]

        if slow == fast:
            break

    # Start up another pointer from the end of the array and march it forward
    # until it hits the pointer inside the array.
    finder = len(array) - 1
    while True:
        slow   = array[slow]
        finder = array[finder]

        # If the two hit, the intersection index is the duplicate element.
        if slow == finder:
            return slow
https://leetcode.com/problems/find-the-duplicate-number/discuss/72845/java-on-time-and-o1-space-solution-similar-to-find-loop-in-linkedlist

suppose the array is

index: 0 1 2 3 4 5

value: 2 5 1 1 4 3

first subtract 1 from each element in the array, so it is much easy to understand.
use the value as pointer. the array becomes:

index: 0 1 2 3 4 5

value: 1 4 0 0 3 2

Second if the array is

index: 0 1 2 3 4 5

value: 0 1 2 4 2 3

we must choose the last element as the head of the linked list. If we choose 0, we can not detect the cycle.

Now the problem is the same as find the cycle in linkedlist!

Nice idea! Code could be simpler: start with 0 and no need to deduct 1.

Then what about: - invalid case

0 1 2 3

1 2 0 2

0 -> 1 -> 2 -> 0 -> 1...
But the dupicate is 2.

Each integer is between 1 and n, not 0 and n.

public int findDuplicate(int[] nums) {
    int slow = 0, fast = 0;
    do{
        slow = nums[slow];
        fast = nums[nums[fast]];
    }while(slow != fast);
    slow = 0;
    while(slow != fast){
        slow = nums[slow];
        fast = nums[fast];
    }
    return slow;
}

public int findDuplicate(int[] nums) {
    int n = nums.length;
    for(int i=0;i<nums.length;i++) nums[i]--;
    int slow = n-1;
    int fast = n-1;
    do{
        slow = nums[slow];
        fast = nums[nums[fast]];
    }while(slow != fast);
    slow = n-1;
    while(slow != fast){
        slow = nums[slow];
        fast = nums[fast];
    }
    return slow+1;
}

One condition is we cannot modify the array. So the solution is

public int findDuplicate(int[] nums) {
    int n = nums.length;
    int slow = n;
    int fast = n;
    do{
        slow = nums[slow-1];
        fast = nums[nums[fast-1]-1];
    }while(slow != fast);
    slow = n;
    while(slow != fast){
        slow = nums[slow-1];
        fast = nums[fast-1];
    }
    return slow;
}

http://algobox.org/find-the-duplicate-number/

We treat each j = nums[i] as a link to nums[j], then the array is a bunch of Linked circles or P shape. Since there are one and only one duplicate number, therefore we only have one P shape. The rest are all circles. We know that 0is not gonna be pointed at, therefore it must be the end of the P shape. We then use Floyd’s loop detection algorithm to find the crossing point which is our duplicate number.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

public int findDuplicate(int[] nums) {

    int fast = 0, slow = 0, finder = 0;

    do {

        fast = nums[nums[fast]];

        slow = nums[slow];

    } while (fast != slow);

    while (finder != slow) {

        finder = nums[finder];

        slow = nums[slow];

    }

    return finder;

}

http://shaowei-su.github.io/2015/12/05/leetcode287/

问题的第二部分 - 在给定约束条件下寻找重复元素 - 可就难多了。 要解决这个问题，我们需要敏锐的洞察力，使问题通过一列的转化，变为一个完全不同的问题。

解决本题需要的主要技巧就是要注意到：由于数组的n + 1个元素范围从1到n，我们可以将数组考虑成一个从集合{1, 2, ..., n}到其本身的函数f。这个函数的定义为f(i) = A[i]。基于这个设定，重复元素对应于一对下标i != j满足 f(i) = f(j)。我们的任务就变成了寻找一对(i, j)。一旦我们找到这个值对，只需通过f(i) = A[i]即可获得重复元素。

但是我们怎样寻找这个重复值呢？这变成了计算机科学界一个广为人知的“环检测”问题。问题的一般形式如下：给定一个函数f，序列x_i的定义为

    x_0     = k       (for some k)
    x_1     = f(x_0)
    x_2     = f(f(x_0))
    ...
    x_{n+1} = f(x_n)

假设函数f从定义域映射到它本身，此时会有3种情况。首先，如果定义域是无穷的，则序列是无限长并且没有循环的。例如，函数 f(n) = n + 1，在整数范围内满足这个性质 - 没有数字是重复的。 第二， 序列可能是一个闭合循环，这意味着存在一个i使得x_0 = x_i。在这个例子中，序列在一组值内无限循环。第三，序列有可能是的“ρ型的”，此时序列看起来像下面这样：

      x_0 -> x_1 -> ... x_k -> x_{k+1} ... -> x_{k+j}
                         ^                       |
                         |                       |
                         +-----------------------+

也就是说，序列从一列链条型的元素开始进入一个环，然后无限循环。我们将环的起点称为环的“入口”。

对于从数组中寻找重复元素这个问题，考虑序列从位置n开始重复调用函数f。亦即从数组的最后一个元素开始，然后移动到其元素值对应的下标处，并且重复此过程。可以得到：此序列是ρ型的。要明白这一点，需要注意到其中一定有环，因为数组是有限的并且当访问n个元素时，一定会对某个元素访问两次。无论从数组的哪一个位置开始，这都是成立的。

另外，注意由于数组元素范围1到n，因此不存在值为0的元素。进而，从数组的第一个元素开始调用一次函数f之后，再也不会回到这里。这意味着第一个元素不会是环的一部分，但如果我们继续重复调用函数f，最终总会访问某个节点两次。从0节点开始的链条与环形相接，使得其形状一定是ρ型。

此外，考虑一下环的入口。由于节点位于环的入口，一定存在两个输入，其对应的函数f的输出值都等于入口元素下标。要使其成立，一定存在两个下标i != j，满足f(i) = f(j)，亦即A[i] = A[j]。因而环的入口一定是重复值。

这是由Robert Floyd提出的一个著名算法，给定一个ρ型序列，在线性时间，只使用常数空间寻找环的起点。这个算法经常被称为“龟兔”算法，至于原因下面就明白了。
算法背后的思想是定义两个变量。首先，令c为进入环的链的长度，然后令l为环的长度。接下来，令l'为大于c的l的倍数的最小值。可以得出结论：对于上文定义的任意ρ型序列的l'，都有

     x_{l'} = x_{2l'}

证明实际上非常直观并且具有自明性 - 这是计算机科学中我最喜欢的证明之一。思路就是由于l'至少为c，它一定包含在环内。同时，由于l'是环长度的倍数，我们可以将其写作ml，其中m为常数。如果我们从位置x_{l'}开始（其在环内），然后再走l'步到达x_{2l'}，则我们恰好绕环m次，正好回到起点。

Floyd算法的一个关键点就是即使我们不明确知道c的值，依然可以在O(l')时间内找到值l'。思路如下。我们追踪两个值"slow"和"fast"，均从x_0开始。然后迭代计算

     slow = f(slow)
     fast = f(f(fast))

我们重复此步骤直到slow与fast彼此相等。此时，我们可知存在j满足slow = x_j，并且fast = x_{2j}。 由于x_j = x_{2j}，可知j一定至少为c，因为此时已经在环中。另外，可知j一定是l的倍数，因为x_j = x_{2j}意味着在环内再走j步会得到同样的结果。最后，j一定是大于c的l的最小倍数，因为如果存在一个更小的大于c的l的倍数，我们一定会在到达j之前到达那里。所以，我们一定有j = l'，意味着我们可以在不知道环的长度或者形状的情况下找到l'。

要完成整个过程，我们需要明白如何使用l'来找到环的入口（记为x_c）。要做到这一步，我们再用最后一个变量，记为"finder"，从x_0出发。然后迭代重复执行过程：


    finder = f(finder)
    slow   = f(slow)

直到finder = slow为止。我们可知：(1) 两者一定会相遇 (2) 它们会在环的入口相遇。 要理解这两点，我们注意由于slow位于x_{l'}，如果我们向前走c步，那么slow会到达位置x_{l' + c}。由于l'是环长度的倍数，相当于向前走了c步，然后绕环几圈回到原位。换言之，x_{l' + c} = x_c。另外，考虑finder变量在行进c步之后的位置。 它由x_0出发，因此c步之后会到达x_c。这证明了(1)和(2)，由此我们已经证明两者最终会相遇，并且相遇点就是环的入口。

算法的美妙之处在于它只用O(1)的额外存储空间来记录两个不同的指针 - slow指针和fast指针（第一部分），以及finder指针（第二部分）。但是在此之上，运行时间是O(n)的。要明白这一点，注意slow指针追上fast指针的时间是O(l')。由于l'是大于c的l的最小倍数，有两种情况需要考虑。首先，如果l > c，那么就是l。 否则，如果l < c，那么我们可知一定存在l的倍数介于c与2c之间。要证明这一点，注意在范围c到2c内，有c个不同的值，由于l < c，其中一定有值对l取模运算等于0。最后，寻找环起点的时间为O(c)。这给出了总的运行时间至多为O(c + max{l, 2c})。所有这些值至多为n，因此算法的运行时间复杂度为O(n)。

http://bookshadow.com/weblog/2015/09/28/leetcode-find-duplicate-number/
def findArrayDuplicate(array):
    assert len(array) > 0

    # The "tortoise and hare" step.  We start at the end of the array and try
    # to find an intersection point in the cycle.
    slow = len(array) - 1
    fast = len(array) - 1

    # Keep advancing 'slow' by one step and 'fast' by two steps until they
    # meet inside the loop.
    while True:
        slow = array[slow]
        fast = array[array[fast]]

        if slow == fast:
            break

    # Start up another pointer from the end of the array and march it forward
    # until it hits the pointer inside the array.
    finder = len(array) - 1
    while True:
        slow   = array[slow]
        finder = array[finder]

        # If the two hit, the intersection index is the duplicate element.
        if slow == finder:
            return slow

http://blog.csdn.net/er_plough/article/details/48783645

int findDuplicate(vector<int>& nums)
{
int slow =0;
int fast =0;
while(1)
{
slow=nums[slow];
fast=nums[nums[fast]];
if(slow == fast)
break;
}
fast = 0;
while(1)
{
slow=nums[slow];
fast=nums[fast];
if(slow==fast)return slow;
}
}

http://shibaili.blogspot.com/2015/10/day-132-287-find-duplicate-number.html

    int findDuplicate(vector<int>& nums) {

        int n = nums.size();

        int slow = n, fast = n;

        while (true) {

            fast = nums[nums[fast - 1] - 1];

            slow = nums[slow - 1];

            if (slow == fast) break;

}

        slow = n;

        while (slow != fast) {

            slow = nums[slow - 1];

            fast = nums[fast - 1];

}

        return slow;

}

http://gaohow.com/blog/article?id=10

X. bit O(32n)
https://discuss.leetcode.com/topic/38493/o-32-n-solution-using-bit-manipulation-in-10-lines

We can count the sum of each 32 bits separately for the given array (stored in "b" variable) and for the array [1, 2, 3, ..., n] (stored in "a" variable). If "b" is greater than "a", it means that duplicated number has 1 at the current bit position (otherwise, "b" couldn't be greater than "a"). This way we retrieve the answer bit by bit:

public int findDuplicate(int[] nums) {
    int n = nums.length-1, res = 0;
    for (int p = 0; p < 32; ++ p) {
        int bit = (1 << p), a = 0, b = 0;
        for (int i = 0; i <= n; ++ i) {
            if (i > 0 && (i & bit) > 0) ++a;
            if ((nums[i] & bit) > 0) ++b;
        }
        if (b > a) res |= bit;
    }
    return res;
}

An interesting solution, but as lg n < 32, it will be less efficient than n lg n binary search solutions. It is one of the cases where linear algorithms are slower than log-linear, much like radix sort or binary search on the whole 32-bit integer range for some problems (like finding the median).

X. http://likemyblogger.blogspot.com/2015/09/cdatac-32msclass-solution-public-int.html

    int findDuplicate(vector<int>& nums) {

        sort(nums.begin(), nums.end());

        for(int i=1; i<nums.size(); ++i){

            if(nums[i]==nums[i-1]) return nums[i];

}

        return false;

}

// U can't change the num

    int findDuplicate(vector<int>& nums) {

        int n = nums.size();

        bool cont = true;

        while(cont){

            cont = false;

            for(int i=0; i<n; ++i){

                if(nums[i]!=i+1 && nums[nums[i]-1]!=nums[i]){

                    swap(nums[i], nums[nums[i]-1]);

                    cont = true;

}

}

}

        for(int i=0; i<n; ++i){

            if(nums[i]!=i+1) return nums[i];

}

}

http://www.1point3acres.com/bbs/thread-199569-1-1.html

输入是一个字符串数组，一个int数组，输入的字符串数组是另外一个字符串数组通过int数组变换得到的，int数组的值代表的是原来这位置上的字符串经过变换后的坐标，然后输出是求变换之前的字符串数组，要求用线性时间，o(1)额外空间

打个比方，比如一个字符串数组是"cat", "rabbit","dog", "mouse"，int数组给的2,0,3,1，意思是string数组第0个词是cat，它本来的位置是在哪呢，我们要看int数组，int数组的0在index 1上，所以说cat之前应该是1号位的，同理rabbit在string数组的1号位，而index数组3号位的值是1，说明rabbit这个词之前应该在3号位上的，依次类推，所以变换前的字符串数组应该是 dog, cat, mouse, rabbit

再打个比方，如果输入是Cat mouse dog rabbit和2,0,1,3，输出也会是dog, cat, mouse, rabbit
再打个比方，如果输入是Cat mouse dog rabbit, tiger, lion和2,0,1,3,5,4，输出会是dog, cat ,moutse, rabbit, lion, tiger

这个题感觉从变换前的数组求变换后的数组很好做，假设string数组是S，int数组是A，直接一位一位循环遍历，当A[i] != i 时候把A[i] 与 A[A[i]]以及S[i] 与S[A[i]]一直调换就可以了

我是一开始用了开数组的方法做，跟他讲了想法他才说这题要是只能用O（1）空间要怎么做。。。感觉就是和他想的不一样，他不想让你从这条路走，所以不断给你加限制条件

比如lz举得第一个例子，从int数组的第0个开始，int[0] = 2, 那么 tmp = string[0], string[0] = string[2], 然后看int[int[0]]也就是int[2], int[2] = 3, 那么string[2] = string[3], 以此类推，再看int[3], int[3] = 1, 那么string[3] = string[1], 最后看int[1], int[1] = 0, 那么string[1] = tmp.

只需要想通如何用O(1)空间把int数组还原就行了

但如果把int[] 换成 2 0 1 3会更麻烦一些，因为前面三个就是一个环了，最后一个3要再一个while loop来处理
也不麻烦外面有一个for loop就可，遇到不是index == value的元素就开始进行上面说到的这个解法，依然是O(N)的，每个元素遍历两遍大概

外面一个是for loop 碰到遇到正好在正确位置上的就continue
里面一个是while loop

能请问一下为什么while(map[j] != i) 里面要看等不等于i呢？

利用int(int[i]),把 array 转化成linked list的问题， leetcode 287用了相同trick。

public void reverseMapping(String[] str, int[] map) {. 1point 3acres 璁哄潧
for(int i=0; i<map.length; i++){
int j = i;
while(map[j] != i){
swap(str, j, map[j]);
int temp = map[j];.鐣欏璁哄潧-涓€浜�-涓夊垎鍦�
map[j] = j;
j = temp;
}. more info on 1point3acres.com
map[j] = j;
}
}
private void swap(String[] str, int i, int j) {
String temp = str[i];
str[i] = str[j];
str[j] = temp;
}

其实就是贪心算法
public String[] reverseMapping(String[] strs, int[] nums) {
      if (strs.length != nums.length || strs.length == 0)
         return strs;
      for (int i = 0; i < strs.length; i++) {
         if (nums[i] != i) {
            String s = strs[i];
            int next = i;
            while (nums[next] != i) {
                  strs[next] = strs[nums[next]];
                  int tmp = nums[next];
                  nums[next] = next;
                  next = tmp;
            }
            strs[next] = s;
            nums[next] = next;
         }
      }
      return strs;
}

这题应该是linked list style地traverse array。虽然空间复杂度要求O(1)，但是题目本身的int array

可以看做一个O(n)的额外空间。保证O(n)的时间的难点在于如果traverse中遇到环怎么办。

可以把所有traverse过的地方全都标为-1即可。每次开始traverse之前检查当前index是不是返回-1

  public String[] findOriginalArray(final String[] str, final int[] it) {
    for (int j = 0; j < str.length; j++) {
      if(it[j] == -1) continue;
      int i = j; // first item in the linked list style array traversal
      int si = it[i]; // next item's index
      it[i] = -1; // null next of current
      String tmpStr = str[i]; // save the head. Waral 鍗氬鏈夋洿澶氭枃绔�,
      while (si != j) { // back to the first item in this traversal and exit
        str[i] = str[si]; // put next to current position
        i = si; // current index becomes next's
        si = it[i]; // next become next.next
        it[i] = -1; // null next of current
      }
      str[i] = tmpStr; // put head to tail
      it[i] = -1; // null next
    }
    return str;
  }

根据linked list style的traversal array的思路我们可以把index array拆成n个无环的linked list, 每次遍历到一条新的linked list的时候（index != -1）就先把head node.val存成tmpStr, 然后去找tmpStr应该放的位置，这个过程中顺便修改了str[next]应该在的位置(str[cur])。

所以就变成了在index array里找无环的linked list并且遍历修改中间值的问题。

public static String[] recover( String[] str, int[] index) {
    for (int j = 0; j < str.length; j++) {. Waral 鍗氬鏈夋洿澶氭枃绔�,
      if(index[j] == -1) continue;. From 1point 3acres bbs
      int cur = j; // j就是linked list的头节点，我们要用cur指针去遍历这个链表直到找到环()
      int next = index[cur]; // next item's index.
      //index[cur] = -1; // 当前访问过的节点置为－1，标记visited
      String tmpStr = str[cur]; // 记录头节点的value. visit 1point3acres.com for more.
      while (next != j) { // 直到以j为头节点的无环链表全部遍历完
        str[cur] = str[next]; // put next to current position
        cur = next; // current index becomes next's
        next = index[cur]; // next become next.next
        index[cur] = -1; // 标记为visited
      }
      str[cur] = tmpStr; // 找到了tmpStr该放的位置，traversal linked list也把这条链表上的对应的值做了修改
      index[cur] = -1; // 标记为visited
    }
    return str;
  }

https://www.interviewcake.com/question/python/find-duplicate-optimize-for-space

We have a list of integers, where:

The integers are in the range $1..n$
The list has a length of $n+1$

It follows that our list has at least one integer which appears at least twice. But it may have several duplicates, and each duplicate may appear more than twice.

Write a function which finds an integer that appears more than once in our list.(If there are multiple duplicates, you only need to find one of them.)

https://github.com/danialgoodwin/dev/blob/master/code-practice/interview-cake/src/com/danialgoodwin/interviewcake/question/FindDuplicateOptimizeForSpace.java

* Space: O(n/2), time: O(n * log n), which is limited by sorting.

private static List<Integer> findDuplicates(List<Integer> numbers) {

List<Integer> repeats = new ArrayList<>();

if (numbers == null || numbers.isEmpty()) { return repeats; }

Collections.sort(numbers);

Integer previousValue = numbers.get(0);

int length = numbers.size();

for (int i = 1; i < length; i++) {

Integer num = numbers.get(i);

if (num != null && num.equals(previousValue) &&

(repeats.isEmpty() ||

!repeats.isEmpty() && !repeats.get(repeats.size() - 1).equals(num))) {

repeats.add(num);

}

previousValue = num;

}

return repeats;

}

https://www.interviewcake.com/question/python/find-duplicate-optimize-for-space-beast-mode

This is a tricky one to derive (unless you have a strong background in graph theory), so we'll get you started:

Imagine each item in the list as a node in a linked list. In any linked list ↴ , each node has a value and a "next" pointer. In this case:

The value is the integer from the list.
The "next" pointer points to the value-eth node in the list (numbered starting from 1). For example, if our value was 3, the "next" node would be the thirdnode.

https://leetcode.com/problems/find-the-duplicate-number/solution/
public int findDuplicate(int[] nums) {
Arrays.sort(nums);
for (int i = 1; i < nums.length; i++) {
if (nums[i] == nums[i-1]) {
return nums[i];
}
}

return -1;
}

public int findDuplicate(int[] nums) {
Set<Integer> seen = new HashSet<Integer>();
for (int num : nums) {
if (seen.contains(num)) {
return num;
}
seen.add(num);
}

return -1;
}
Read full article from [LeetCode] Find the Duplicate Number - Eason Liu - 博客园

LeetCode 287 - Find the Duplicate Number - Google Interview

Labels

Popular Posts