Bucket Sort


Bucket Sort | GeeksforGeeks
Bucket sort is mainly useful when input is uniformly distributed over a range. For example, consider the following problem. 
Sort a large set of floating point numbers which are in range from 0.0 to 1.0 and are uniformly distributed across the range. How do we sort the numbers efficiently?


Counting sort can not be applied here as we use keys as index in counting sort. Here keys are floating point numbers. 
Bucket sort it’s the perfect sorting algorithm for the sequence above. We must know in advance that the integers are fairly well distributed over an interval (i, j). Then we can divide this interval in N equal sub-intervals (or buckets). We’ll put each number in its corresponding bucket. Finally for every bucket that contains more than one number we’ll use some linear sorting algorithm.
bucketSort(arr[], n)
1) Create n empty buckets (Or lists).
2) Do following for every array element arr[i].
.......a) Insert arr[i] into bucket[n*array[i]]
3) Sort individual buckets using insertion sort.
4) Concatenate all sorted buckets.
The main step to analyze is step 3. This step also takes O(n) time on average if all numbers are uniformly distributed.

void bucketSort(float arr[], int n)
{
    // 1) Create n empty buckets
    vector<float> b[n];
    // 2) Put array elements in different buckets
    for (int i=0; i<n; i++)
    {
       int bi = n*arr[i]; // Index in bucket
       b[bi].push_back(arr[i]);
    }
    // 3) Sort individual buckets
    for (int i=0; i<n; i++)
       sort(b[i].begin(), b[i].end());
    // 4) Concatenate all buckets into arr[]
    int index = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < b[i].size(); j++)
          arr[index++] = b[i][j];
}

float arr[] = {0.897, 0.565, 0.656, 0.1234, 0.665, 0.3434};
int n = sizeof(arr)/sizeof(arr[0]);
bucketSort(arr, n);
http://www.growingwiththeweb.com/2015/06/bucket-sort.html
  • Counting sort: buckets hold only a single value
  • Bucket sort: buckets hold a range of values
  • Radix sort: buckets hold values based on digits within their values
The most common implementation of bucket sort works by splitting the array of size n into k buckets, each of which house a value range of n/k. The buckets are then sorted using a simple sorting algorithm that works well on the expected data, such as insertion sort


public class BucketSort {
    private static final int DEFAULT_BUCKET_SIZE = 5;

    public static void sort(Integer[] array) {
        sort(array, DEFAULT_BUCKET_SIZE);
    }

    public static void sort(Integer[] array, int bucketSize) {
        if (array.length == 0) {
            return;
        }

        // Determine minimum and maximum values
        Integer minValue = array[0];
        Integer maxValue = array[0];
        for (int i = 1; i < array.length; i++) {
            if (array[i] < minValue) {
                minValue = array[i];
            } else if (array[i] > maxValue) {
                maxValue = array[i];
            }
        }

        // Initialise buckets
        int bucketCount = (maxValue - minValue) / bucketSize + 1;
        List<List<Integer>> buckets = new ArrayList<List<Integer>>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<Integer>());
        }

        // Distribute input array values into buckets
        for (int i = 0; i < array.length; i++) {
            buckets.get((array[i] - minValue) / bucketSize).add(array[i]);
        }

        // Sort buckets and place back into input array
        int currentIndex = 0;
        for (int i = 0; i < buckets.size(); i++) {
            Integer[] bucketArray = new Integer[buckets.get(i).size()];
            bucketArray = buckets.get(i).toArray(bucketArray);
            InsertionSort.sort(bucketArray);
            for (int j = 0; j < bucketArray.length; j++) {
                array[currentIndex++] = bucketArray[j];
            }
        }
    }
}
http://en.algoritmy.net/article/41160/Bucket-sort
Usage
Bucket sort can be used for distributed sorting – each bucket can be ordered by a different thread or even by a different computer. Another use case is a sorting of huge input data, which cannot be loaded into the main memory by an ordinary O(n\cdot \log(n)) algorithm. This problem can be solved by dividing the data into sufficiently small buckets, sorting them one by one by appropriate algorithm, while storing rest of the data in the external memory (e.g. hard drive).
public static int[] bucketSort(int[] array, int bucketCount) {
08.if (bucketCount <= 0throw new IllegalArgumentException("Invalid bucket count");
09.if (array.length <= 1return array; //trivially sorted
10. 
11.int high = array[0];
12.int low = array[0];
13.for (int i = 1; i < array.length; i++) { //find the range of input elements
14.if (array[i] > high) high = array[i];
15.if (array[i] < low) low = array[i];
16.}
17.double interval = ((double)(high - low + 1))/bucketCount; //range of one bucket
18. 
19.ArrayList<Integer> buckets[] = new ArrayList[bucketCount];
20.for (int i = 0; i < bucketCount; i++) { //initialize buckets
21.buckets[i] = new ArrayList();
22.}
23. 
24.for (int i = 0; i < array.length; i++) { //partition the input array
25.buckets[(int)((array[i] - low)/interval)].add(array[i]);
26.}
27. 
28.int pointer = 0;
29.for (int i = 0; i < buckets.length; i++) {
30.Collections.sort(buckets[i]); //mergeSort
31.for (int j = 0; j < buckets[i].size(); j++) { //merge the buckets
32.array[pointer] = buckets[i].get(j);
33.pointer++;
34.}
35.}
36.return array;
37.}
https://www-927.ibm.com/ibm/cas/hspc/student/algorithms/BucketSort.html
Problem 2. Museum
At a nature museum, you are in charge of the ancient rock collection. Your scientists have collected a number of shiny, coloured rocks, and have compiled the information into a text file. Each rock is listed like this: rock_weight, rock_colour, so for example, {23, blue}.
As head of the rock collection, you need to write a program that will sort these rocks into separate weight categories (0-10 grams, 11-20 grams, 21-30 grams, 31-40 grams and 41+ grams) using bucket sort. Then, you must use bucket sort again to separate the weight-categorized rocks into colour order, so red is first, then orange, yellow, green, blue, purple and grey.
http://www.cs.hmc.edu/~keller/courses/cs60/s98/examples/Sort/bucketsort.java
Also refer to http://java.dzone.com/articles/algorithm-week-bucket-sort
http://en.wikipedia.org/wiki/Bucket_sort

http://www.geeksforgeeks.org/bucket-sort-to-sort-an-array-with-negative-numbers/
Bucket sort is mainly useful when input is uniformly distributed over a range. For example, consider the problem of sorting a large set of floating point numbers which are in range from 0.0 to 1.0 and are uniformly distributed across the range

How to modify Bucket Sort to sort both positive and negative numbers?
Example:


Input : arr[] = { -0.897, 0.565, 0.656, -0.1234, 0, 0.3434 } 
Output : -0.897 -0.1234  0 0.3434 0.565 0.656 
sortMixed(arr[], n)
1) Split array into two parts 
   create two Empty vector Neg[], Pos[] 
   (for negative and positive element respectively)
   Store all negative element in Neg[] by converting
   into positive (Neg[i] = -1 * Arr[i] )
   Store all +ve in pos[]  (pos[i] =  Arr[i])
2) Call function bucketSortPositive(Pos, pos.size())
   Call function bucketSortPositive(Neg, Neg.size())

bucketSortPositive(arr[], n)
3) Create n empty buckets (Or lists).
4) Do following for every array element arr[i]. 
       a) Insert arr[i] into bucket[n*array[i]]
5) Sort individual buckets using insertion sort.
6) Concatenate all sorted buckets. 
void bucketSort(vector<float> &arr, int n)
{
    // 1) Create n empty buckets
    vector<float> b[n];
    // 2) Put array elements in different
    //    buckets
    for (int i=0; i<n; i++)
    {
        int bi = n*arr[i]; // Index in bucket
        b[bi].push_back(arr[i]);
    }
    // 3) Sort individual buckets
    for (int i=0; i<n; i++)
        sort(b[i].begin(), b[i].end());
    // 4) Concatenate all buckets into arr[]
    int index = 0;
    arr.clear();
    for (int i = 0; i < n; i++)
        for (int j = 0; j < b[i].size(); j++)
            arr.push_back(b[i][j]);
}
// This function mainly slpits array into two
// and then calls bucketSort() for two arrays.
void sortMixed(float arr[], int n)
{
    vector<float>Neg ;
    vector<float>Pos;
    // traverse array elements
    for (int i=0; i<n; i++)
    {
        if (arr[i] < 0)
            // store -Ve elements by
            // converting into +ve element
            Neg.push_back (-1 * arr[i]) ;
        else
            // store +ve elements
            Pos.push_back (arr[i]) ;
    }
    bucketSort(Neg, (int)Neg.size());
    bucketSort(Pos, (int)Pos.size());
    // First store elements of Neg[] array
    // by converting into -ve
    for (int i=0; i < Neg.size(); i++)
        arr[i] = -1 * Neg[ Neg.size() -1 - i];
    // store +ve element
    for(int j=Neg.size(); j < n; j++)
        arr[j] = Pos[j - Neg.size()];
}

http://www.cnblogs.com/EdwardLiu/p/6546969.html
Give n points on 2-D plane, find the K closest points to origin

Based on bucket sort:
16     public static List<Coordinate> findK(List<Coordinate> input, int k) {
17         HashMap<Coordinate, Integer> map = new HashMap<Coordinate, Integer>();
18         int longest = 0;
19         for (Coordinate each : input) {
20             int distance = cal(each);
21             map.put(each, distance);
22             longest = Math.max(longest, distance);
23         }
24         
25         List<Coordinate>[] arr = new ArrayList[longest + 1];
26         for (Coordinate each : map.keySet()) {
27             int dis = map.get(each);
28             if (arr[dis] == null)
29                 arr[dis] = new ArrayList<Coordinate>();
30             arr[dis].add(each);
31         }
32         
33         List<Coordinate> res = new ArrayList<Coordinate>();
34         for (int i=0; i<arr.length-1 && res.size()<k; i++) {
35             if (arr[i] != null)
36                 res.addAll(arr[i]);
37         }
38         return res;
39     }
40     
41     public static int cal(Coordinate a) {
42         return a.x * a.x + a.y * a.y;
43     }

还有的方法是维护一个size为k的最大堆
13     public List<Point> KClosest(List<Point> input, int k) {
14         List<Point> res = new LinkedList<Point>();
15         PriorityQueue<Point> pq = new PriorityQueue<Point>(10, new Comparator<Point>() {
16             public int compare(Point a, Point b) {
17                 return (b.x * b.x + b.y * b.y) - (a.x * a.x + a.y * a.y);
18             }            
19         });
20         
21         for (Point each : input) {
22             pq.offer(each);
23             if (pq.size() > k) pq.poll();
24         }
25         while (!pq.isEmpty()) {
26             res.add(0, pq.poll());
27         }
28         return res;
29     }

Read full article from Bucket Sort | GeeksforGeeks

Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts