How to write fast into a file in python?

Carlos Nepomuceno carlosnepomuceno at outlook.com
Fri May 17 14:18:15 EDT 2013


You've hit the bullseye! ;)

Thanks a lot!!!

> Oh, I forgot to mention: you have a bug in this function. You're already
> including the newline in the len(line), so there is no need to add one.
> The result is that you only generate 44MB instead of 50MB.

That's because I'm running on Windows.
What's the fastest way to check if '\n' translates to 2 bytes on file?

> Here are the results of profiling the above on my computer. Including the
> overhead of the profiler, it takes just over 50 seconds to run your file
> on my computer.
>
> [steve at ando ~]$ python -m cProfile fastwrite5.py
> 17846645 function calls in 53.575 seconds
>

Didn't know the cProfile module.Thanks a lot!

> Ordered by: standard name
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 1 30.561 30.561 53.575 53.575 fastwrite5.py:1(<module>)
> 1 0.000 0.000 0.000 0.000 {cStringIO.StringIO}
> 5948879 5.582 0.000 5.582 0.000 {len}
> 1 0.004 0.004 0.004 0.004 {method 'close' of 'cStringIO.StringO' objects}
> 1 0.000 0.000 0.000 0.000 {method 'close' of 'file' objects}
> 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
> 5948879 9.979 0.000 9.979 0.000 {method 'format' of 'str' objects}
> 1 0.103 0.103 0.103 0.103 {method 'getvalue' of 'cStringIO.StringO' objects}
> 5948879 7.135 0.000 7.135 0.000 {method 'write' of 'cStringIO.StringO' objects}
> 1 0.211 0.211 0.211 0.211 {method 'write' of 'file' objects}
> 1 0.000 0.000 0.000 0.000 {open}
>
>
> As you can see, the time is dominated by repeatedly calling len(),
> str.format() and StringIO.write() methods. Actually writing the data to
> the file is quite a small percentage of the cumulative time.
>
> So, here's another version, this time using a pre-calculated limit. I
> cheated and just copied the result from the fastwrite5 output :-)
>
> # fasterwrite.py
> filename = 'fasterwrite.dat'
> with open(filename, 'w') as f:
> for i in xrange(5948879): # Actually only 44MB, not 50MB.
> f.write('%d\n' % i)
>

I had the same idea but kept the original method because I didn't want to waste time creating a function for calculating the actual number of iterations needed to deliver 50MB of data. ;)

> And the profile results are about twice as fast as fastwrite5 above, with
> only 8 seconds in total writing to my HDD.
>
> [steve at ando ~]$ python -m cProfile fasterwrite.py
> 5948882 function calls in 28.840 seconds
>
> Ordered by: standard name
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 1 20.592 20.592 28.840 28.840 fasterwrite.py:1(<module>)
> 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
> 5948879 8.229 0.000 8.229 0.000 {method 'write' of 'file' objects}
> 1 0.019 0.019 0.019 0.019 {open}
>

I thought there would be a call to format method by "'%d\n' % i". It seems the % operator is a lot faster than format.
I just stopped using it because I read it was going to be deprecated. :(
Why replace such a great and fast operator by a slow method? I mean, why format is been preferred over %?

> Without the overhead of the profiler, it is a little faster:
>
> [steve at ando ~]$ time python fasterwrite.py
>
> real 0m16.187s
> user 0m13.553s
> sys 0m0.508s
>
>
> Although it is still slower than the heavily optimized dd command,
> but not unreasonably slow for a high-level language:
>
> [steve at ando ~]$ time dd if=fasterwrite.dat of=copy.dat
> 90781+1 records in
> 90781+1 records out
> 46479922 bytes (46 MB) copied, 0.737009 seconds, 63.1 MB/s
>
> real 0m0.786s
> user 0m0.071s
> sys 0m0.595s
>
>
>
>
> --
> Steven
> --
> http://mail.python.org/mailman/listinfo/python-list 		 	   		  


More information about the Python-list mailing list