[Tutor] concurrent file reading using python
Abhishek Pratap
abhishek.vit at gmail.com
Tue Mar 27 00:46:53 CEST 2012
Thanks Walter and Steven for the insight. I guess I will post my
question to the main Python mailing list and see if people have
anything to say.
-Abhi
On Mon, Mar 26, 2012 at 3:28 PM, Walter Prins <wprins at gmail.com> wrote:
> Abhi,
>
> On 26 March 2012 19:05, Abhishek Pratap <abhishek.vit at gmail.com> wrote:
>> I want to utilize the power of the cores on my server and read big
>> files (> 50GB) simultaneously by seeking to N locations, then process
>> each chunk separately and merge the output. This is very similar to
>> the MapReduce concept.
>>
>> What I want to know is the best way to read a file concurrently. I
>> have read about file-handle.seek() and os.lseek() but am not sure if
>> that's the way to go. Any use cases would be of help.
>
> Your idea won't work. Reading from disk is not a CPU-bound process,
> it's an I/O-bound process. That is, the speed at which you can read
> from a conventional mechanical hard disk drive is not constrained by
> how fast your CPU is, but generally by how fast your disk(s) can read
> data from the disk surface, which is limited by the rotation speed and
> areal density of the data on the disk (and the seek time), and by how
> fast it can shovel the data down its I/O bus. And *that* speed is
> still orders of magnitude slower than your RAM and your CPU. So, in
> reality even just one of your cores will spend the vast majority of
> its time waiting for the disk when reading your 50GB file. There's
> therefore *no* way to make your file reading faster by adding more
> *CPU cores* -- the only way is by improving your disk I/O
> throughput. You can for example stripe several hard disks together in
> RAID0 (but that increases the risk of data loss due to data being
> spread over multiple drives) and/or ensure you use a faster I/O
> subsystem (move to SATA3 if you're currently using SATA2 for example),
> and/or use faster hard disks (use 10,000 or 15,000 RPM instead of
> 7,200, or switch to SSD [solid state] disks.) Most of these options
> will cost you a fair bit of money though, so consider these thoughts
> in that light.
>
> Walter
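
For reference, here is a minimal sketch of the seek-based approach asked
about in the original question. It is only an illustration under stated
assumptions: the file is line-oriented, and the per-chunk work (here just
counting lines) stands in for whatever real processing is needed. The
names chunk_offsets, count_lines and parallel_line_count are hypothetical,
not from any library. As Walter points out, this cannot make the disk
itself deliver data faster; any gain would come only from parallelising
CPU-heavy processing of each chunk.

```python
import os
from multiprocessing import Pool


def chunk_offsets(path, n_chunks):
    """Split a file into up to n_chunks byte ranges that end on line boundaries."""
    size = os.path.getsize(path)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            # Jump to an approximate split point, then move forward to the
            # start of the next line so that no line is cut in half.
            f.seek(size * i // n_chunks)
            f.readline()
            pos = min(f.tell(), size)
            if pos > bounds[-1]:
                bounds.append(pos)
    if bounds[-1] < size:
        bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def count_lines(task):
    """Placeholder per-chunk work: count the lines in the range [start, end)."""
    path, start, end = task
    count = 0
    # Each worker opens its own file handle and seeks to its own offset.
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            if not f.readline():
                break
            count += 1
    return count


def parallel_line_count(path, n_workers=4):
    """Map count_lines over the chunks in worker processes, then merge by summing."""
    tasks = [(path, start, end) for start, end in chunk_offsets(path, n_workers)]
    with Pool(n_workers) as pool:
        return sum(pool.map(count_lines, tasks))
```

On platforms that start worker processes by spawning rather than forking,
parallel_line_count() should be called from under an
if __name__ == "__main__": guard. Timing it against a plain single-process
loop over the same file is probably the quickest way to see whether the
job is limited by the disk or by the CPU.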