should writing Unicode files be so slow
Antoine Pitrou
solipsis at pitrou.net
Fri Mar 19 20:19:07 EDT 2010
On Fri, 19 Mar 2010 17:18:17 +0000, djc wrote:
>
> changing
> with open(filename, 'rU') as tabfile:
> to
> with codecs.open(filename, 'rU', 'utf-8', 'backslashreplace') as tabfile:
>
> and
> with open(outfile, 'wt') as out_part:
> to
> with codecs.open(outfile, 'w', 'utf-8') as out_part:
>
> causes a program that runs in 43 seconds to take 4 minutes to process
> the same data.
codecs.open() (and the object it returns) is slow because it is
implemented in pure Python.
Accelerated reading and writing of Unicode files is available in Python
2.7 and 3.1, using the new `io` module.
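For reference, a minimal sketch of the same two open() calls rewritten to
use io.open (the file names here are placeholders standing in for those
in the quoted message):

    import io

    # Hypothetical file names, standing in for the ones in the program above.
    filename = 'input.tab'
    outfile = 'output.tab'

    # Reading: io.open returns a C-accelerated TextIOWrapper,
    # unlike the pure-Python StreamReader from codecs.open().
    with io.open(filename, 'r', encoding='utf-8',
                 errors='backslashreplace') as tabfile:
        lines = tabfile.readlines()

    # Writing: the encoding is selected the same way.
    with io.open(outfile, 'w', encoding='utf-8') as out_part:
        out_part.writelines(lines)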
Regards
Antoine.