The Maximum Gap Problem : Pigeonhole Principle - Algorithms and Problem SolvingAlgorithms and Problem Solving


The Maximum Gap Problem : Pigeonhole Principle - Algorithms and Problem SolvingAlgorithms and Problem Solving
Given a unsorted array with n elements. How can we find the largest gap between consecutive numbers of sorted version in O(n)?
For example, we have a unsorted array, a=[5, 3, 1, 8, 9, 2, 4] of size n=7 then the sorted version is
[1, 2, 3, 4, 5, 8, 9]. The output of the algorithm should be 8-5 = 3.
Similarly, for a=[5, 1, 8, 9, 999999, 99999] then answer should be 999999-99999 = 900000.
One way to do it is to using counting sort or radix sort if the integer range is known and small compared to size of array. But it is not feasible solution in case we don't know the range of the values or the values can be very large.
If we think carefully we will observe that if the array contains n numbers within a contagious sequence of numbers with values between a min and max value (inclusive) then the maximum gap is one. Now, in real problem many of these numbers will be missing. It may happen that some range of values are missing in the sequence and thus creating gaps. If we can identify such missing ranges in a cheap way then we can solve this problem in O(n).

The idea is to bucketize the values between min and max value (exclusive) of the given array into a set of equal sized buckets so that at least one empty bucket is formed. Each of such empty buckets will create a gap of size equals to the difference between max value in the previous non-empty bucket and the minimum value in the next non-empty bucket. The motivation behind this is the Pigeonhole Principle which tells us that if we divide n-1 numbers in n buckets than at least one one bucket is empty
Below is a
 linear‐time 
algorithm 
for
 computing
 the 
maximum
 gap
 allowing
 the
 constant
 time 
computation.

Given
 a 
set 
S 
of 
n
>
2 
real
 numbers
 x1,
x2,
…,
xn.



1. Find
 the
 maximum, 
x­max 
and 
the 
minimum,
 x­min 
in 
S.


2. Divide 
the 
interval 
[x­min,
x­max]
 into
 (n−1) 
"buckets" 
of 
equal 
size
 δ
=
(x­max
–
x­min)/(n‐1).

3. For
 each 
of 
the
 remaining 
n‐2 
numbers 
determine 
in 
which 
bucket
 it 
falls 
using 
the
 floor 
function.
 The number 
xi 
belongs 
to
 the 
kth bucket 
Bk 
if,
 and 
only
 if, 
(xi
‐
x­min)/δ
=
k‐1.

4. For 
each 
bucket 
Bk 
compute 
xk­min 
and xk­max 
among 
the 
numbers
 that
 fall
 in 
Bk. 
If 
the 
bucket 
is 
empty return 
nothing. 
If
the
bucket
 contains 
only 
one
 number 
return 
that 
number 
as
 both
 xk­min 
and xk­max.
5. Construct 
a
 list 
L 
of 
all 
the 
ordered 
minima
 and 
maxima:
 L:
(x1­min,
x1­max),
(x2­min,
x2­max),
…
,
(x(n‐1)­min,
x(n‐1)­max),

        • Note:
Since
 there 
are 
n‐1 
buckets
 and
 only
 n‐2 
numbers,
 by 
the
 Pigeonhole 
Principle, 
at
least 
one bucket
 must 
be 
empty. 
Therefore
 the
 maximum
 distance 
between
 a
 pair
 of
consecutive
 points 
must
 be 
at
 least the 
length 
of 
the 
bucket. 
Therefore
 the
 solution 
is
 not 
found
 among 
a 
pair 
of
 points 
that
 are 
contained
 in the
 same
 bucket.

6. In
 L
 find
 the
 maximum
 distance
 between
 a 
pair 
of
 consecutive
 minimum
 and
 maximum
(xi­max,
xj­min), 
where
j >
i.

7. 
Exit 
with
 this 
number 
as 
the
 maximum 
gap.

For example, with a=[5, 3, 1, 8, 9, 2, 4],
a=[5, 3, 1, 8, 9, 2, 4], n=7
1. Min = 1, Max = 9
2. Create 7-1 = 6 buckets with equal size (or width), δ = (9-1)/(7-1) = 8/6 = 1.33
3. Populate bucket with rest of the 5 elements and compute max and min in each bucket: 
          
          b1    b1    b2    b4    b5    b6
         ____________________________________
        |_2___|__3__|__4__|__5__|______|__8__|
        ^     ^     ^     ^     ^      ^     ^
        1    2.33   3.66  5    6.33   7.66   9

4. Identify all the empty buckets i.e. gaps and compute gap value = max of previous nonempty bucket - min of next non-empty bucket. For example, in this case b5 is an empty bucket, previous non empty bucket is b4 and next non-empty bucket is b6. So, gap value at b5 = max(b5) - min(b6) = 8-5 = 3.
5. Update a global max and iterate over all empty buckets to perform step 3. As there is only one bucket the answer is 3. 
I have implemented the above algorithm using two simple auxiliary arrays to keep track of maximum and minimum in the n-1 buckets with n-2 values (excluding min and max value). The algorithm runs in O(n) time and O(n) space.
public static int maxGap(int[] a){
 int n = a.length;
 if(n < 2){
  return 0;
 }
 
 int max = Integer.MIN_VALUE;
 int min = Integer.MAX_VALUE;
 
 for(int i = 0; i < n; i++){
  max = Math.max(max, a[i]);
  min = Math.min(min, a[i]);
 }
 
 //n-1 buckets -  we only care about max and min in each buckets
 int[] bucketMaxima = new int[n-1];
 Arrays.fill(bucketMaxima, Integer.MIN_VALUE);
 int[] bucketMinima = new int[n-1];
 Arrays.fill(bucketMinima, Integer.MAX_VALUE);
 //bucket width
 float delta = (float)(max-min)/((float)n-1);
 
 //populate the bucket maxima and minima
 for(int i = 0; i < n; i++){
  if(a[i] == max || a[i] == min){
   continue;
  }
  
  int bucketIndex = (int) Math.floor((a[i]-min)/delta);
  bucketMaxima[bucketIndex] = bucketMaxima[bucketIndex] == Integer.MIN_VALUE ? a[i] : Math.max(bucketMaxima[bucketIndex], a[i]);
  bucketMinima[bucketIndex] = bucketMinima[bucketIndex] == Integer.MAX_VALUE ? a[i] : Math.min(bucketMinima[bucketIndex], a[i]);
 }
 
 //find the maxgap - maxgaps
 int prev =  min;
 int maxGap = 0;
 for(int i = 0; i< n-1; i++){
  //empty bucket according to Pigeonhole principle
  if(bucketMinima[i] == Integer.MAX_VALUE){
   continue;
  }
  
  maxGap = Math.max(maxGap, bucketMinima[i]-prev);
  prev = bucketMaxima[i];
 }
 
 maxGap = Math.max(maxGap, max-prev);
 
 return maxGap;
}
http://allenlipeng47.com/PersonalPage/index/view/199/nkey
One solution is by bitmap. Let's go through all elements, and set the bitmap 1 for each element position. Then, we go throught whole bitmap to find the max gap.
Here is a better solution. Let's think about 1, 2, 2, 4. There are 4 elements, let's put them into 5 buckets. Each buckets maintain a distance=(maxValue – minValue)/5=(4-1)/5=0.6 In this way, the bucket will look like:
Then, let’s put each element into each bucket. It will look like:
 Because we choose 5 buckets, then there is at least 1 empty bucket. And we can gurantee the first bucket and last bucket always has element. 
Thus, we know max gap exists between element in left and right bucket of empty bucket. We just find all the empty bucket, then maxValue in its left non-empty bucket, minValue in its right non-empty bucket, gap=minValue-maxValue.

http://blueocean-penn.blogspot.com/2015/11/max-gap-problem-and-pigoen-hole.html
    public static int maxGap(int[] nums){
        int min = nums[0], max = nums[0];
        for(int k : nums){
            min = Math.min(min, k);
            max = Math.max(max, k);
        }
        
        int size = nums.length;
        int interval = (max-min)/(size-1);
        /*
         * we will create either n-1 or n pigeon holes
         */
        int count = size-1;
        if(interval*count+ min <=max)
            count++;
        
        Integer[] mins = new Integer[count];
        Integer[] maxs = new Integer[count];
        mins[0] = min; maxs[count-1] = max;
        
        for(int k : nums){
            int index = (k-min)/interval;
            mins[index] = mins[index] == null ? k : (Math.min(k, mins[index]));
            maxs[index] = maxs[index] == null ? k : (Math.max(k, maxs[index]));
        }
        
        int maxGap = maxs[0] - mins[0];
        int pre = maxs[0];
        for(int i = 1; i<count; i++){
            if(mins[i]==null)
                continue;
            maxGap = Math.max(mins[i] - pre, maxGap);
            maxGap = Math.max(maxs[i]-mins[i], maxGap);
            pre = maxs[i];
        }
        
        return maxGap;
    }
http://cgm.cs.mcgill.ca/~godfried/teaching/dm-reading-assignments/Maximum-Gap-Problem.pdf
Read full article from The Maximum Gap Problem : Pigeonhole Principle - Algorithms and Problem SolvingAlgorithms and Problem Solving

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts