[Python-3000] Pre-PEP: Easy Text File Decoding
Paul Prescod
paul at prescod.net
Mon Sep 11 16:15:07 CEST 2006
On 9/11/06, Marcin 'Qrczak' Kowalczyk <qrczak at knm.org.pl> wrote:
>
> "Paul Prescod" <paul at prescod.net> writes:
>
> > Guido's goal was that quick and dirty text processing should "just
> > work" for newbies and encoding-disinterested expert programmers.
>
> What does 'guess' mean for creating files?
I wasn't sure about this one. But on Windows and Mac it seems safe to
generate UTF-8 with a BOM. TextEdit, Vim and Notepad all auto-detect the
UTF-8 BOM and do the right thing.
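For illustration, this is what writing and re-reading "UTF-8 with BOM" looks like with Python 3's standard `utf-8-sig` codec. It is only a sketch of the behaviour under discussion, not the PEP's mechanism, and the file name `example.txt` is arbitrary.

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # arbitrary file name

    # "utf-8-sig" writes the UTF-8 BOM (bytes EF BB BF) before the text.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write("caf\u00e9\n")

    with open(path, "rb") as f:
        raw = f.read()

    # On read, "utf-8-sig" strips a leading BOM; plain "utf-8" keeps it
    # as the character U+FEFF at the start of the string.
    with open(path, encoding="utf-8-sig") as f:
        text_sig = f.read()
    with open(path, encoding="utf-8") as f:
        text_plain = f.read()

print(raw.startswith(codecs.BOM_UTF8))    # True
print(text_sig == "caf\u00e9\n")          # True
print(text_plain == "\ufeffcaf\u00e9\n")  # True
```

Reading with `utf-8-sig` also accepts UTF-8 files that have no BOM, which is why it is a convenient choice when files may come from either kind of editor.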
> 2. Files are created in UTF-8.
>
> Then files encoded with the locale encoding will be silently
> recoded to UTF-8, causing trouble for further work with the file
> (it can't even be typed to the terminal).
It can be on the Mac terminal, and on the increasing number of Linux
distributions that default to UTF-8. Perhaps it should use the Unix locale
for output by default, but only on Unix and not on Mac/Windows.
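The policy floated here is still tentative. As a hypothetical sketch of it, the helper `default_output_encoding` (an invented name, not part of any proposal) chooses UTF-8 with a BOM on Windows and macOS and falls back to the locale's preferred encoding on other Unix-like systems, using only the standard `sys.platform` and `locale.getpreferredencoding`.

```python
import locale
import sys


def default_output_encoding(platform=None):
    """Pick an encoding for newly created text files (hypothetical policy).

    Windows and macOS get UTF-8 with a BOM; other Unix-like systems
    defer to the user's locale.
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("win") or platform == "darwin":
        return "utf-8-sig"
    # On a UTF-8 Linux install this is typically "UTF-8"; on an older
    # Latin-1 setup it would name that legacy encoding instead.
    return locale.getpreferredencoding(False)


print(default_output_encoding("win32"))   # utf-8-sig
print(default_output_encoding("darwin"))  # utf-8-sig
print(default_output_encoding())          # depends on the running system
```

Keeping the platform as an optional parameter is only a convenience so the policy can be checked without running on each OS.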
> I've implemented a hack which allows simple programs to "just work" in
> case of UTF-8. It's a modified encoder/decoder which escapes malformed
> UTF-8 sequences with '\0' bytes, and thus allows arbitrary byte
> sequences to round-trip UTF-8 decoding and encoding. It's not used by
> default and it's never used when "UTF-8" is specified explicitly,
> because it's not the true UTF-8, but I have an environment variable
> which says "if the locale is UTF-8, use the modified UTF-8 as the
> default encoding".
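The exact escape format of this hack isn't given, so what follows is a hypothetical sketch of the idea rather than that implementation. It assumes a malformed byte b is written as a NUL character followed by `chr(b)`, and a literal NUL byte as two NUL characters, so decoding and then re-encoding reproduces any byte string. The handler name `nulescape` and the functions `decode_lossless`/`encode_lossless` are invented for illustration. (Python 3.1 later standardized a comparable round-trip as the `surrogateescape` error handler from PEP 383, which maps undecodable bytes to lone surrogates instead of NUL escapes.)

```python
import codecs


def _nul_escape(exc):
    """Replace each undecodable byte b with a NUL character plus chr(b)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join("\0" + chr(b) for b in bad), exc.end


# Hypothetical handler name, registered with the standard codecs registry.
codecs.register_error("nulescape", _nul_escape)


def decode_lossless(data: bytes) -> str:
    """Decode UTF-8, escaping malformed bytes so the result round-trips."""
    # A NUL byte never occurs inside a valid multi-byte UTF-8 sequence, so
    # splitting on it is safe. Literal NULs become two NUL characters;
    # malformed bytes are always >= 0x80, so their escapes never collide.
    return "\0\0".join(part.decode("utf-8", "nulescape")
                       for part in data.split(b"\x00"))


def encode_lossless(text: str) -> bytes:
    """Inverse of decode_lossless: turn escapes back into the original bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\0":
            if i + 1 == len(text):
                raise ValueError("dangling escape at end of string")
            nxt = text[i + 1]
            # Two NULs mean a literal NUL byte; NUL + chr(b) means raw byte b.
            # bytearray.append raises ValueError if ord(nxt) > 255.
            out.append(0 if nxt == "\0" else ord(nxt))
            i += 2
        else:
            out += ch.encode("utf-8")
            i += 1
    return bytes(out)


data = b"caf\xc3\xa9 \xff\x00end"
print(encode_lossless(decode_lossless(data)) == data)  # True
```

Valid UTF-8 without NUL bytes decodes exactly as with plain `utf-8`; only malformed bytes and literal NULs are represented differently, which is the trade-off that makes this "not the true UTF-8".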
That's an interesting idea. I'm not sure if you are proposing it as being
applicable to this PEP or not...
Paul Prescod