CLRS - Chapter 9: Medians and Order Statistics



Simultaneous minimum and maximum
A simple algorithm to find the minimum and maximum is to find each one independently.
There will be n − 1 comparisons for the minimum and n − 1 comparisons
for the maximum, for a total of 2n −2 comparisons. This will result in (n) time.
In fact, at most 3 n/2 comparisons are needed to find both the minimum and
maximum:
Maintain the minimum and maximum of elements seen so far.
Don.t compare each element to the minimum and maximum separately.
Process elements in pairs.
Compare the elements of a pair to each other.
Then compare the larger element to the maximum so far, and compare the
smaller element to the minimum so far.
This leads to only 3 comparisons for every 2 elements.
Setting up the initial values for the min and max depends on whether n is odd or
even.
If n is even, compare the first two elements and assign the larger to max and the
smaller to min. Then process the rest of the elements in pairs.
If n is odd, set both min and max to the first element. Then process the rest of
the elements in pairs.

Selection in expected linear time
Selection of the i th smallest element of the array A can be done in (n) time.
The function RANDOMIZED-SELECT uses RANDOMIZED-PARTITION from the
quicksort algorithm in Chapter 7. RANDOMIZED-SELECT differs from quicksort
because it recurses on one side of the partition only.
RANDOMIZED-SELECT(A, p, r, i )
if p = r
then return A[p]
q ←RANDOMIZED-PARTITION(A, p, r )
k ← q − p + 1
if i = k pivot value is the answer
then return A[q]
elseif i < k
then return RANDOMIZED-SELECT(A, p, q − 1, i )
else return RANDOMIZED-SELECT(A, q + 1, r, i − k)
Selection in worst-case linear time
Idea: Guarantee a good split when the array is partitioned.
• Will use the deterministic procedure PARTITION, but with a small modiÞcation.
Instead of assuming that the last element of the subarray is the pivot, the
modiÞed PARTITION procedure is told which element to use as the pivot.
SELECT works on an array of n > 1 elements. It executes the following steps:
1. Divide the n elements into groups of 5. Get n/5 groups: n/5
 groups with
exactly 5 elements and, if 5 does not divide n, one group with the remaining
n mod 5 elements.
2. Find the median of each of the n/5 groups:
• Run insertion sort on each group. Takes O(1) time per group since each
group has ≤ 5 elements.
• Then just pick the median from each group, in O(1) time.
3. Find the median x of the n/5 medians by a recursive call to SELECT. (If
n/5 is even, then follow our convention and Þnd the lower median.)
4. Using the modiÞed version of PARTITION that takes the pivot element as input,
partition the input array around x. Let x be the kth element of the array after
partitioning, so that there are k −1 elements on the low side of the partition and
n − k elements on the high side.
5. Now there are three possibilities:
• If i = k, just return x.
• If i < k, return the i th smallest element on the low side of the partition by
making a recursive call to SELECT.
• If i > k, return the (i −k)th smallest element on the high side of the partition
by making a recursive call to SELECT.

Show that the second smallest of n elements can be found with n + lg n- 2 comparisons in
the worst case. (Hint: Also find the smallest element.)
In the search for the smallest number, the second smallest number must have come
out smallest in every comparison made with it until it was eventually compared
with the smallest. So the second smallest is among the elements that were compared
with the smallest during the tournament. To find it, conduct another tournament
(as above) to find the smallest of these numbers. At most lg n (the height
of the tree of comparisons) elements were compared with the smallest, so finding
the smallest of these takes lg n − 1 comparisons in the worst case.
The total number of comparisons made in the two tournaments was
n − 1 + lg n − 1 = n + lg n − 2
in the worst case.

Exercises 9.3-1
In the algorithm SELECT, the input elements are divided into groups of 5. Will the algorithm
work in linear time if they are divided into groups of 7? Argue that SELECT does not run in
linear time if groups of 3 are used.

Exercises 9.3-3
Show how quicksort can be made to run in O(n lg n) time in the worst case.
A modiÞcation to quicksort that allows it to run in O(n lg n) time in the worst case
uses the deterministic PARTITION algorithm that was modiÞed to take an element
to partition around as an input parameter.
SELECT takes an array A, the bounds p and r of the subarray in A, and the rank i
of an order statistic, and in time linear in the size of the subarray A[p . . r ] it returns
the i th smallest element in A[p . . r ].
BEST-CASE-QUICKSORT(A, p, r )
if p < r
then i ← (r − p + 1)/2
x ←SELECT(A, p, r, i )
q ←PARTITION(x)
BEST-CASE-QUICKSORT(A, p, q − 1)
BEST-CASE-QUICKSORT(A, q + 1, r )
Because BEST-CASE-QUICKSORT always recurses on subarrays that are at most
half the size of the original array, the recurrence for the worst-case running time is
T (n) ≤ 2T (n/2) + O(n) = O(n lg n).

Exercises 9.3-5
Suppose that you have a "black-box" worst-case linear-time median subroutine. Give a
simple, linear-time algorithm that solves the selection problem for an arbitrary order statistic.

Given MEDIAN, here is a linear-time algorithm SELECT for Þnding the i th smallest
element in A[p . . r ]. This algorithm uses the deterministic PARTITION algorithm
that was modiÞed to take an element to partition around as an input parameter.
SELECT
(A, p, r, i )
if p = r
then return A[p]
x ←MEDIAN(A, p, r )
q ←PARTITION(x)
k ← q − p + 1
if i = k
then return A[q]
elseif i < k
then return SELECT
(A, p, q − 1, i )
else return SELECT
(A, q + 1, r, i − k)

Because x is the median of A[p . . r ], each of the subarrays A[p . . q − 1] and
A[q + 1 . . r ] has at most half the number of elements of A[p . . r ]. The recurrence
for the worst-case running time of SELECT is T (n) ≤ T (n/2) + O(n) = O(n).

Exercises 9.3-8
Let X[1 .. n] and Y [1 .. n] be two arrays, each containing n numbers already in sorted order.
Give an O(lg n)-time algorithm to find the median of all 2n elements in arrays X and Y.

TWO-ARRAY-MEDIAN(X, Y )
n ← length[X] // n also equals length[Y ]
median ← FIND-MEDIAN(X, Y, n, 1, n)
if median = NOT-FOUND
then median ← FIND-MEDIAN(Y, X, n, 1, n)
return median
FIND-MEDIAN(A, B, n, low, high)
if low > high
then return NOT-FOUND
else k ← (low+high)/2
if k = n and A[n] ≤ B[1]
then return A[n]
elseif k < n and B[n − k] ≤ A[k] ≤ B[n − k + 1]
then return A[k]
elseif A[k] > B[n − k + 1]
then return FIND-MEDIAN(A, B, n, low, k − 1)
else return FIND-MEDIAN(A, B, n, k + 1, high)

Exercises 9.3-9
Professor Olay is consulting for an oil company, which is planning a large pipeline running
east to west through an oil field of n wells. From each well, a spur pipeline is to be connected
directly to the main pipeline along a shortest path (either north or south), as shown in Figure
9.2. Given x- and y-coordinates of the wells, how should the professor pick the optimal
location of the main pipeline (the one that minimizes the total length of the spurs)? Show that
the optimal location can be determined in linear time

In order to Þnd the optimal placement for Professor Olay’s pipeline, we need only
Þnd the median(s) of the y-coordinates of his oil wells, as the following proof
explains.
Claim
The optimal y-coordinate for Professor Olay’s east-west oil pipeline is as follows:
• If n is even, then on either the oil well whose y-coordinate is the lower median
or the one whose y-coordinate is the upper median, or anywhere between them.
• If n is odd, then on the oil well whose y-coordinate is the median.

Problems 9-1: Largest i numbers in sorted order
Given a set of n numbers, we wish to find the i largest in sorted order using a comparisonbased
algorithm. Find the algorithm that implements each of the following methods with the
best asymptotic worst-case running time, and analyze the running times of the algorithms in
terms of n and i.
a. Sort the numbers, and list the i largest.
b. Build a max-priority queue from the numbers, and call EXTRACT-MAX i times.
Total worst-case running time:(n + i lg n).

==> Limit heap size to i
c. Use an order-statistic algorithm to find the ith largest number, partition around that
number, and sort the i largest numbers.
c. Use the SELECT algorithm of Section 9.3 to Þnd the i th largest number in (n)
time. Partition around that number in (n) time. Sort the i largest numbers in
(i lg i ) worst-case time (with merge sort or heapsort).
Total worst-case running time: (n + i lg i ).

Problems 9-2: Weighted median




Labels

LeetCode (1432) GeeksforGeeks (1122) LeetCode - Review (1067) Review (882) Algorithm (668) to-do (609) Classic Algorithm (270) Google Interview (237) Classic Interview (222) Dynamic Programming (220) DP (186) Bit Algorithms (145) POJ (141) Math (137) Tree (132) LeetCode - Phone (129) EPI (122) Cracking Coding Interview (119) DFS (115) Difficult Algorithm (115) Lintcode (115) Different Solutions (110) Smart Algorithm (104) Binary Search (96) BFS (91) HackerRank (90) Binary Tree (86) Hard (79) Two Pointers (78) Stack (76) Company-Facebook (75) BST (72) Graph Algorithm (72) Time Complexity (69) Greedy Algorithm (68) Interval (63) Company - Google (62) Geometry Algorithm (61) Interview Corner (61) LeetCode - Extended (61) Union-Find (60) Trie (58) Advanced Data Structure (56) List (56) Priority Queue (53) Codility (52) ComProGuide (50) LeetCode Hard (50) Matrix (50) Bisection (48) Segment Tree (48) Sliding Window (48) USACO (46) Space Optimization (45) Company-Airbnb (41) Greedy (41) Mathematical Algorithm (41) Tree - Post-Order (41) ACM-ICPC (40) Algorithm Interview (40) Data Structure Design (40) Graph (40) Backtracking (39) Data Structure (39) Jobdu (39) Random (39) Codeforces (38) Knapsack (38) LeetCode - DP (38) Recursive Algorithm (38) String Algorithm (38) TopCoder (38) Sort (37) Introduction to Algorithms (36) Pre-Sort (36) Beauty of Programming (35) Must Known (34) Binary Search Tree (33) Follow Up (33) prismoskills (33) Palindrome (32) Permutation (31) Array (30) Google Code Jam (30) HDU (30) Array O(N) (29) Logic Thinking (29) Monotonic Stack (29) Puzzles (29) Code - Detail (27) Company-Zenefits (27) Microsoft 100 - July (27) Queue (27) Binary Indexed Trees (26) TreeMap (26) to-do-must (26) 1point3acres (25) GeeksQuiz (25) Merge Sort (25) Reverse Thinking (25) hihocoder (25) Company - LinkedIn (24) Hash (24) High Frequency (24) Summary (24) Divide and Conquer (23) Proof (23) Game Theory (22) Topological Sort (22) Lintcode - Review (21) Tree - Modification (21) Algorithm Game (20) CareerCup (20) Company - Twitter (20) DFS + Review (20) DP - Relation (20) Brain Teaser (19) DP - Tree (19) Left and Right Array (19) O(N) (19) Sweep Line (19) UVA (19) DP - Bit Masking (18) LeetCode - Thinking (18) KMP (17) LeetCode - TODO (17) Probabilities (17) Simulation (17) String Search (17) Codercareer (16) Company-Uber (16) Iterator (16) Number (16) O(1) Space (16) Shortest Path (16) itint5 (16) DFS+Cache (15) Dijkstra (15) Euclidean GCD (15) Heap (15) LeetCode - Hard (15) Majority (15) Number Theory (15) Rolling Hash (15) Tree Traversal (15) Brute Force (14) Bucket Sort (14) DP - Knapsack (14) DP - Probability (14) Difficult (14) Fast Power Algorithm (14) Pattern (14) Prefix Sum (14) TreeSet (14) Algorithm Videos (13) Amazon Interview (13) Basic Algorithm (13) Codechef (13) Combination (13) Computational Geometry (13) DP - Digit (13) LCA (13) LeetCode - DFS (13) Linked List (13) Long Increasing Sequence(LIS) (13) Math-Divisible (13) Reservoir Sampling (13) mitbbs (13) Algorithm - How To (12) Company - Microsoft (12) DP - Interval (12) DP - Multiple Relation (12) DP - Relation Optimization (12) LeetCode - Classic (12) Level Order Traversal (12) Prime (12) Pruning (12) Reconstruct Tree (12) Thinking (12) X Sum (12) AOJ (11) Bit Mask (11) Company-Snapchat (11) DP - Space Optimization (11) Dequeue (11) Graph DFS (11) MinMax (11) Miscs (11) Princeton (11) Quick Sort (11) Stack - Tree (11) 尺取法 (11) 挑战程序设计竞赛 (11) Coin Change (10) DFS+Backtracking (10) Facebook Hacker Cup (10) Fast Slow Pointers (10) HackerRank Easy (10) Interval Tree (10) Limited Range (10) Matrix - Traverse (10) Monotone Queue (10) SPOJ (10) Starting Point (10) States (10) Stock (10) Theory (10) Tutorialhorizon (10) Kadane - Extended (9) Mathblog (9) Max-Min Flow (9) Maze (9) Median (9) O(32N) (9) Quick Select (9) Stack Overflow (9) System Design (9) Tree - Conversion (9) Use XOR (9) Book Notes (8) Company-Amazon (8) DFS+BFS (8) DP - States (8) Expression (8) Longest Common Subsequence(LCS) (8) One Pass (8) Quadtrees (8) Traversal Once (8) Trie - Suffix (8) 穷竭搜索 (8) Algorithm Problem List (7) All Sub (7) Catalan Number (7) Cycle (7) DP - Cases (7) Facebook Interview (7) Fibonacci Numbers (7) Flood fill (7) Game Nim (7) Graph BFS (7) HackerRank Difficult (7) Hackerearth (7) Inversion (7) Kadane’s Algorithm (7) Manacher (7) Morris Traversal (7) Multiple Data Structures (7) Normalized Key (7) O(XN) (7) Radix Sort (7) Recursion (7) Sampling (7) Suffix Array (7) Tech-Queries (7) Tree - Serialization (7) Tree DP (7) Trie - Bit (7) 蓝桥杯 (7) Algorithm - Brain Teaser (6) BFS - Priority Queue (6) BFS - Unusual (6) Classic Data Structure Impl (6) DP - 2D (6) DP - Monotone Queue (6) DP - Unusual (6) DP-Space Optimization (6) Dutch Flag (6) How To (6) Interviewstreet (6) Knapsack - MultiplePack (6) Local MinMax (6) MST (6) Minimum Spanning Tree (6) Number - Reach (6) Parentheses (6) Pre-Sum (6) Probability (6) Programming Pearls (6) Rabin-Karp (6) Reverse (6) Scan from right (6) Schedule (6) Stream (6) Subset Sum (6) TSP (6) Xpost (6) n00tc0d3r (6) reddit (6) AI (5) Abbreviation (5) Anagram (5) Art Of Programming-July (5) Assumption (5) Bellman Ford (5) Big Data (5) Code - Solid (5) Code Kata (5) Codility-lessons (5) Coding (5) Company - WMware (5) Convex Hull (5) Crazyforcode (5) DFS - Multiple (5) DFS+DP (5) DP - Multi-Dimension (5) DP-Multiple Relation (5) Eulerian Cycle (5) Graph - Unusual (5) Graph Cycle (5) Hash Strategy (5) Immutability (5) Java (5) LogN (5) Manhattan Distance (5) Matrix Chain Multiplication (5) N Queens (5) Pre-Sort: Index (5) Quick Partition (5) Quora (5) Randomized Algorithms (5) Resources (5) Robot (5) SPFA(Shortest Path Faster Algorithm) (5) Shuffle (5) Sieve of Eratosthenes (5) Strongly Connected Components (5) Subarray Sum (5) Sudoku (5) Suffix Tree (5) Swap (5) Threaded (5) Tree - Creation (5) Warshall Floyd (5) Word Search (5) jiuzhang (5)

Popular Posts