[Python-Dev] Bytes path support
Stephen J. Turnbull
stephen at xemacs.org
Sat Aug 23 12:14:47 CEST 2014
Oleg Broytman writes:
> This is the core of the problem. Python2 favors Unix model but
> Windows people pays the price. Python3 reverses that
This is certainly not true. What is true is that Python 3 makes no
attempt to make it easy to write crappy software in the old Unix
style that breaks when unexpected character encodings are encountered.
Python 3 is designed to make it easier to write reliable software,
even if it will only ever be used on one platform. Nevertheless, it's
still a reasonable language for writing byte-shoveling software, with
the last piece in place as of the acceptance of PEP 461.
As of that PEP, you can use regexps for tokenizing byte streams and
%-formatting to conveniently produce them. If you want to treat them
piecewise as character streams with different encodings, you have a
large library of codecs, which provide an incremental decoder
interface. While AFAIK no codec implements a decode-until-error mode,
that's not all that much of a loss, as many encodings overlap, so a
decoding error is not a reliable signal that the encoding has changed.
E.g., if you start decoding using a latin-1 codec, decoding the whole
document will succeed, even if it switches to windows-1251 in the
meantime.
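
To make the above concrete, here is a rough sketch of those pieces,
assuming Python 3.5 or later (where PEP 461's bytes %-formatting is
available). The "name=value;" record format, the sample data, and all
variable names are made up for illustration; only the re and codecs
calls are standard library.

```python
import codecs
import re

# Tokenize a byte stream with a bytes pattern (pattern and input are both bytes).
# The "name=value;" record format is hypothetical.
raw = b"name=caf\xe9;size=42;"
record = re.compile(rb"([a-z]+)=([^;]*);")
fields = dict(record.findall(raw))

# PEP 461: produce bytes with %-formatting, no text round trip needed.
line = b"%s=%d;" % (b"size", 42)

# Incremental decoding: a multibyte character split across chunks is
# buffered until the rest of it arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
chunks = [b"caf\xc3", b"\xa9 ok"]
text = "".join(decoder.decode(chunk) for chunk in chunks)
text += decoder.decode(b"", final=True)

# Overlapping encodings: latin-1 accepts every byte value, so decoding
# never fails, even when part of the data was written as windows-1251.
mixed = "\u00e9".encode("latin-1") + "\u042f".encode("windows-1251")
decoded = mixed.decode("latin-1")  # no error, but the second character is wrong
```

Note that the last step gives no hint that the data changed encoding
partway through, which is why a decode-until-error mode would not help
much here.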
Oleg, I gather Russian is your native language. That's moderately
complicated, I admit. But the Russians are a distant second to the
Japanese in self-destructive proliferation of incompatible character
coding standards and non-standard variants. After 24 years of dealing
with the mess that is East Asian encodings (which is even bound up
with the "religion" of Japanese exceptionalism -- some Japanese have
argued that there is a spiritual superiority to Japanese JIS codes!),
I cannot believe you are going to find a better environment for
dealing with these issues than Python 3.