codecs.open [was: PEP 385: the eol-type issue]
M.-A. Lemburg wrote:
... and because of this, the feature is already available if you use codecs.open() instead of the built-in open():
Neil Hodgson asked:
So should I not add an issue for the basic open because codecs.open should be used for this case?
In Python 3, why does codecs.open even still exist?

As best I can tell, codecs.open should be the same as the regular open, but for a Unicode file -- and all text files are treated as Unicode in Python 3.0.

So at this point, are there any differences beyond:

(a) The built-in open doesn't work on multi-byte line endings other than the multi-character CRLF. (In other words, it goes by the traditional operating system conventions developed when a char was a byte, but the Unicode standard allows for a few more possibilities, which are currently rare in practice.)

(b) The codecs version is much slower, because it hasn't seen the optimization effort.

-jJ
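To make point (a) concrete, here is a minimal sketch of how the two functions can disagree about where a line ends. It assumes CPython 3 behaviour, where the built-in open in its default newline mode recognizes only \n, \r and \r\n, while the codecs stream reader splits text the way str.splitlines does. The sample text and temporary file are made up for illustration.

```python
import codecs
import os
import tempfile
import warnings

# U+2028 LINE SEPARATOR sits in the middle of the only "\n"-terminated line.
text = "first\u2028second\n"

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(text.encode("utf-8"))

    # Built-in open: only \n, \r and \r\n count as line endings.
    with open(path, encoding="utf-8") as f:
        builtin_lines = f.readlines()

    # codecs.open: the decoded text is split on the wider set of
    # Unicode line boundaries, so U+2028 ends a line as well.
    with warnings.catch_warnings():
        # Newer Python versions may warn that codecs.open is deprecated.
        warnings.simplefilter("ignore", DeprecationWarning)
        with codecs.open(path, encoding="utf-8") as f:
            codecs_lines = f.readlines()
finally:
    os.remove(path)

print(builtin_lines)  # one line
print(codecs_lines)   # two lines
```

The same bytes therefore yield one line through the built-in open and two through codecs.open, which is the kind of difference the eol-type discussion has to account for.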
Jim Jewett <jimjjewett <at> gmail.com> writes:
In Python 3, why does codecs.open even still exist?
I don't remember anyone proposing to deprecate it, so I suppose that's the (social) reason.
So at this point, are there any differences beyond:
(c) The built-in open is probably a little more featureful, especially when it comes to seek() and tell().
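As an illustration of the seek() and tell() support mentioned in (c), the sketch below shows only the built-in side: in text mode, tell() returns an opaque position that can later be handed back to seek(), even when the file holds multi-byte characters. The file contents are invented for the example, and nothing here measures how codecs.open behaves in the same situation.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # newline="\n" keeps the bytes on disk identical on every platform.
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
        f.write("h\u00e9llo\nw\u00f6rld\n")

    with open(path, encoding="utf-8") as f:
        first = f.readline()
        position = f.tell()   # opaque cookie, not a character count
        second = f.readline()
        f.seek(position)      # jump back to the start of the second line
        again = f.readline()
finally:
    os.remove(path)

print(second == again)
```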
(b) The codecs version is much slower, because it hasn't seen the optimization effort.
By the way, the built-in open would also benefit from an optimization of codecs.py's IncrementalEncoder classes: they are just thin Python wrappers around C function calls, and the overhead of calling a Python method is very significant when doing a lot of small unicode writes with a non-optimized codec. (A couple of dominant codecs have been optimized by means of internal shortcuts bypassing codecs.py: latin-1, utf-8, utf-16.)

Regards

Antoine.
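For readers unfamiliar with the IncrementalEncoder interface discussed above, a minimal sketch of how it is driven from Python follows. Each small write turns into one encode() method call, which is where the per-call overhead comes from. The chunk list is invented for illustration, and the example shows only the calling pattern, not any timing.

```python
import codecs

# Hypothetical sequence of small writes.
chunks = ["sm", "all ", "wri", "tes"]

# One encoder object keeps state (such as "BOM already written") across calls.
encoder = codecs.getincrementalencoder("utf-16")()

# One Python-level method call per small write.
pieces = [encoder.encode(chunk) for chunk in chunks]
# Flush any remaining state at the end of the stream.
pieces.append(encoder.encode("", final=True))

encoded = b"".join(pieces)
print(encoded == "".join(chunks).encode("utf-16"))
```

Because the encoder is stateful, the byte order mark is emitted only with the first piece, and the concatenated result matches a single one-shot encode of the whole string.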