[Python-3000] Pre-PEP: Easy Text File Decoding

Mon Sep 11 16:15:07 CEST 2006

On 9/11/06, Marcin 'Qrczak' Kowalczyk <qrczak at knm.org.pl> wrote:
>
> "Paul Prescod" <paul at prescod.net> writes:
>
> > Guido's goal was that quick and dirty text processing should "just
> > work" for newbies and encoding-disinterested expert programmers.
>
> What does 'guess' mean for creating files?

I wasn't sure about this one. But on Windows and Mac it seems safe to
generate UTF-8-with-BOM. TextEdit, Vim and Notepad all auto-detect the UTF-8
BOM and do the right thing.
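
To make the UTF-8-with-BOM idea concrete, here is a minimal sketch. It assumes a present-day Python 3 and its standard "utf-8-sig" codec; the helper names are hypothetical and are not part of the proposal itself.

```python
import codecs
import os
import tempfile


def write_text_with_bom(path, text):
    """Write text as UTF-8 preceded by a BOM (hypothetical helper)."""
    # The "utf-8-sig" codec emits the 3-byte BOM EF BB BF before the data.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


def read_text_maybe_bom(path):
    """Read UTF-8 text, dropping a leading BOM if there is one."""
    # On decoding, "utf-8-sig" skips the BOM when present and otherwise
    # behaves like plain UTF-8.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return f.read()


def demo():
    """Round-trip a short string through a temporary file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_text_with_bom(path, "zażółć gęślą jaźń\n")
        with open(path, "rb") as f:
            raw = f.read()
        # Report whether the BOM is on disk, and the text as read back.
        return raw.startswith(codecs.BOM_UTF8), read_text_maybe_bom(path)
    finally:
        os.remove(path)
```

The same codec is used for reading and writing, so a file produced this way reads back without a stray U+FEFF at the start.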

> 2. Files are created in UTF-8.
>
>    Then files encoded with the locale encoding will be silently
>    recoded to UTF-8, causing trouble for further work with the file
>    (it can't be even typed to the terminal).

It can on the terminal on the Mac, and on the increasing number of Linux
distributions that default to UTF-8. Perhaps it should by default use the Unix
locale for output, but only on Unix and not on Mac/Windows.
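
A rough sketch of that rule follows, assuming a present-day Python 3. The function name and its two override parameters are hypothetical and exist only to make the rule easy to try out; nothing here is settled behaviour.

```python
import locale
import sys


def default_output_encoding(platform=None, locale_encoding=None):
    """Pick an encoding for newly created text files (illustrative only).

    Both parameters are hypothetical overrides; when omitted, the values
    are taken from the running interpreter.
    """
    if platform is None:
        platform = sys.platform
    # Windows and Mac: UTF-8 with a BOM, which common editors auto-detect.
    if platform.startswith("win") or platform == "darwin":
        return "utf-8-sig"
    # Other Unix systems: follow the user's locale.
    if locale_encoding is None:
        locale_encoding = locale.getpreferredencoding(False)
    # Fall back to UTF-8 if the locale gives no answer.
    return locale_encoding or "utf-8"
```

With this rule, a file written under an ISO-8859-2 locale on Linux stays in ISO-8859-2, which addresses the silent-recoding objection on Unix while leaving Mac and Windows on UTF-8-with-BOM.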

> I've implemented a hack which allows simple programs to "just work" in
> case of UTF-8. It's a modified encoder/decoder which escapes malformed
> UTF-8 sequences with '\0' bytes, and thus allows arbitrary byte
> sequences to round-trip UTF-8 decoding and encoding. It's not used by
> default and it's never used when "UTF-8" is specified explicitly,
> because it's not the true UTF-8, but I have an environment variable
> which says "if the locale is UTF-8, use the modified UTF-8 as the
> default encoding".

That's an interesting idea. I'm not sure if you are proposing it as being
applicable to this PEP or not...
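
For readers who want to see the round-trip property in action, here is a sketch using a different mechanism from the '\0' escaping described above: the "surrogateescape" error handler in present-day Python 3, which maps each undecodable byte to a lone surrogate code point. It is not the quoted implementation, only an illustration of the same goal, and the function names are hypothetical.

```python
def decode_lenient(raw):
    """Decode bytes as UTF-8, preserving malformed bytes for round-tripping."""
    # Each byte that is not valid UTF-8 becomes a lone surrogate in the
    # range U+DC80..U+DCFF instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="surrogateescape")


def encode_lenient(text):
    """Encode back to bytes, restoring any bytes escaped by decode_lenient."""
    # The lone surrogates are turned back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


def is_true_utf8(text):
    """Report whether text can be encoded with the strict UTF-8 codec."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```

As with the modified codec in the quoted message, the result is not true UTF-8: text that contains escaped bytes is rejected by the strict codec, so the lenient mode has to be chosen deliberately.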

 Paul Prescod