[CareerCup] 11.4 Sort the File 文件排序 - Grandyang - 博客园
11.4 Imagine you have a 20 GB file with one string per line. Explain how you would sort the file.
这道题说给了我们一个20GB大小的文件,每行有一个字符串,让我们给文件内容排序。那么既然强调了这么大的一个文件,肯定不想让我们直接进入内存中,那么我们可以把大文件分块,每块xMB,其中x的大小为我们可用的内存大小,我们对每块分别排序,然后把所有的有序块进行合并,这样我们就能得到一个有序的文件了。
http://www.shuatiblog.com/blog/2014/09/24/sort-20GB-file/
11.4 Imagine you have a 20 GB file with one string per line. Explain how you would sort the file.
这道题说给了我们一个20GB大小的文件,每行有一个字符串,让我们给文件内容排序。那么既然强调了这么大的一个文件,肯定不想让我们直接进入内存中,那么我们可以把大文件分块,每块xMB,其中x的大小为我们可用的内存大小,我们对每块分别排序,然后把所有的有序块进行合并,这样我们就能得到一个有序的文件了。
http://www.shuatiblog.com/blog/2014/09/24/sort-20GB-file/
Use External Sort. For example, for sorting 900 megabytes of data using only 100 megabytes of RAM:
- Divide the file into chunks of 100MB each. Sort using quicksort.
- Read the first 10 MB of each sorted chunk into input buffers in RAM and allocate the remaining 10 MB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)
- [Important] Perform a 9-way merge and store the result in the output buffer. Wheneverthe output buffer fills, write it to the final sorted file and empty it. Whenever any of the 9input buffers empties, fill it with the next 10 MB of its associated 100 MB sorted chunk until no more data from the chunk is available.