[Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Tue Sep 30 21:46:36 CEST 2008


2008/9/30 Marcin 'Qrczak' Kowalczyk <qrczak at knm.org.pl>:

> I've experimentally implemented (not for Python) a different escaping
> scheme with a goal similar to that of UTF-8b: undecodable bytes are
> prefixed with U+0000 instead of being converted to unpaired
> surrogates, and '\x00' decodes as U+0000 U+0000.

This was not my idea: Mono did it first.
http://go-mono.com/docs/index.aspx?link=T%3AMono.Unix.UnixEncoding
"In short, it's a Glorious Hack. Rejoice. Or something."

Note that many people, including people on the Unicode list, consider
this evil because they view it as a non-standard modification of
UTF-8. I am undecided on how evil it is.

(My implementation differs from Mono's in how strict it is about which
Unicode sequences can be encoded: Mono encodes all of them and mine
does not. On the other hand, mine is a bijection and Mono's is not.
Both implementations decode all byte sequences, of course.)
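
For illustration, here is a minimal Python 3 sketch of the scheme quoted
above. It is not the actual implementation (which was not written for
Python) and not Mono's code. The names nul_decode and nul_encode are
hypothetical, and two details are assumptions: an undecodable byte 0xNN
is represented as U+0000 followed by U+00NN, and the encoder is made
strict by rejecting any string that the decoder could not have produced.

```python
ESC = "\x00"  # escape prefix, per the scheme described in the quoted text


def nul_decode(data: bytes) -> str:
    """Decode any byte string; never raises.

    Valid UTF-8 decodes normally, a NUL byte becomes U+0000 U+0000, and
    each undecodable byte 0xNN becomes U+0000 U+00NN (an assumption).
    """
    out = []
    pos = 0
    while pos < len(data):
        try:
            good, bad = data[pos:].decode("utf-8"), b""
            pos = len(data)
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into the slice data[pos:].
            good = data[pos:pos + exc.start].decode("utf-8")
            bad = data[pos + exc.start:pos + exc.end]
            pos += exc.end
        out.append(good.replace(ESC, ESC + ESC))   # double genuine NULs
        out.extend(ESC + chr(b) for b in bad)      # escape invalid bytes
    return "".join(out)


def nul_encode(text: str) -> bytes:
    """Inverse of nul_decode; raises ValueError on anything else."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != ESC:
            # Lone surrogates raise UnicodeEncodeError (a ValueError).
            out += ch.encode("utf-8")
            i += 1
            continue
        nxt = text[i + 1:i + 2]
        if nxt == ESC:
            out.append(0)
        elif nxt and 0x80 <= ord(nxt) <= 0xFF:
            out.append(ord(nxt))
        else:
            raise ValueError("U+0000 is not followed by a valid escape")
        i += 2
    result = bytes(out)
    # Strictness check: escaped bytes that happen to form valid UTF-8
    # would decode to something else, so such strings are rejected.
    if nul_decode(result) != text:
        raise ValueError("string is not in the image of nul_decode")
    return result


raw = b"caf\xc3\xa9\x00\xff.txt"
text = nul_decode(raw)
assert text == "caf\u00e9\x00\x00\x00\xff.txt"
assert nul_encode(text) == raw
```

The round-trip check at the end of nul_encode is what makes this sketch
a bijection between byte strings and the accepted subset of Unicode
strings: for example, U+0000 U+00C3 U+0000 U+00A9 is rejected, because
the bytes C3 A9 already decode to a single character. Dropping that
check would give a more permissive encoder, at the cost of no longer
being a bijection.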

-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/

