[Python-3000] locale-aware strings ?

Paul Prescod paul at prescod.net
Wed Sep 6 19:21:33 CEST 2006


On 9/6/06, Oleg Broytmann <phd at oper.phd.pp.ru> wrote:
> On Wed, Sep 06, 2006 at 03:55:04AM -0700, Paul Prescod wrote:
>    These situations are caused because of the lack of metadata or clear
> encoding-friendly standards. Ogg, for example, is encoding friendly - it
> clearly states that tags (comments) must be in UTF-8, and all Ogg Vorbis
> files I have seen were really in UTF-8, and all tag editors and players
> write/use UTF-8.

Michael Urman disagrees with you. He says that he sometimes sees
Latin-1 encoded files. Let's trace back how that could have happened.

1. The end-user must have had Latin-1 as their system encoding.

2. The programmer of the ID tagging app had not thought through encoding issues.

3. The programming language either implicitly encoded the data
according to the locale or treated it as binary data (unless the
programmer did this on purpose, which would imply that he was VERY
confused and not just lazy).
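A minimal sketch of how steps 1 to 3 can yield a Latin-1 tag in a format that requires UTF-8. It assumes a hypothetical tag writer, `write_tag_naively`, that encodes with whatever the user's system encoding is. To keep the example deterministic, the encoding is passed in as `system_encoding`; a real program might instead have picked it up implicitly from the locale.

```python
# Hypothetical tag writers illustrating steps 1-3 above. The names are
# illustrative only; they are not from any real tagging library.

def write_tag_naively(text: str, system_encoding: str) -> bytes:
    # Step 3: the text is encoded according to the user's locale,
    # without the programmer ever deciding on an encoding.
    return text.encode(system_encoding)


def write_tag_for_ogg(text: str) -> bytes:
    # What the Ogg comment format asks for: always UTF-8.
    return text.encode("utf-8")


title = "Café del Mar"

# Step 1: the end-user's system encoding happens to be Latin-1.
latin1_bytes = write_tag_naively(title, system_encoding="latin-1")
utf8_bytes = write_tag_for_ogg(title)

print(latin1_bytes)  # b'Caf\xe9 del Mar'
print(utf8_bytes)    # b'Caf\xc3\xa9 del Mar'

# A reader that follows the format and decodes tags as UTF-8
# rejects the Latin-1 bytes.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

The same string produces different bytes depending only on the user's settings. Nothing in the naive writer's code reveals that the output violates the format.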

>    I fail to see how Python can help here.

Python can refuse to be the programming language in Step 3 that
guesses the appropriate encoding without consulting the programmer or
end-user.
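For illustration only: Python 3 as eventually released takes this stance when text and bytes are mixed. Concatenating `str` and `bytes` raises `TypeError` instead of silently encoding. (Text-mode `open()` without an `encoding=` argument still falls back to a locale-derived default, so the refusal is not universal.) The `build_comment` helper below is hypothetical and assumes an Ogg-style `FIELD=value` comment.

```python
# Python 3 refuses to implicitly encode str when it is combined with bytes;
# the programmer has to name an encoding.

def build_comment(field: bytes, value: str, encoding: str) -> bytes:
    # Hypothetical helper: builds a "FIELD=value" comment with the
    # encoding chosen explicitly by the caller rather than guessed.
    return field + b"=" + value.encode(encoding)


try:
    b"TITLE=" + "Café"  # mixing bytes and str is rejected
except TypeError as exc:
    print("refused to guess:", exc)

print(build_comment(b"TITLE", "Café", "utf-8"))  # b'TITLE=Caf\xc3\xa9'
```

Requiring the `encoding` argument moves the decision from the language to the programmer, who can apply the format's rule (UTF-8 for Ogg) or ask the end-user.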

 Paul Prescod

