[Python-Dev] Re: Relaxing Unicode error handling
David Eppstein
eppstein at ics.uci.edu
Sun Jan 4 13:18:22 EST 2004
In article <3FF6FFE3.60608 at v.loewis.de>,
"Martin v. Loewis" <martin at v.loewis.de> wrote:
> > Or, am I missing the point entirely, and there's some other circumstance
> > where one gets UnicodeErrors besides .decode()? If the use case is
> > mixing strings and unicode objects (i.e. adding, joining, searching,
> > etc.), then I'd have to say a big fat -1, as opposed to merely a -0 for
> > having other ways to spell .decode(codec,"ignore").
>
> Yes, it is these use cases: Somebody invokes an SQL method, which
> happens to return a Unicode string, and then adds a latin-1 byte
> string to it. It works for all ASCII byte strings, but then the
> customer happens to enter accented characters, and the application
> crashes without offering to save recent changes.
>
> So I guess that's -1 from you.
I am -1 also on allowing the programmer to let such errors pass silently.
The world of unicode encodings and decodings is painful enough without
giving people more freedom to create broken files. I have an application
which has to guess which encoding to use for certain files (the file
format specifies an encoding but nobody pays attention), and one of the
sample files I found on the web can't be read correctly because it mixes
UTF-8 and, I think, CP1252. I am very much in favor of anything that
prevents more such files from being created.
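
To make the failure mode concrete, here is a minimal sketch of what a
mixed-encoding file does to a reader. It is written in Python 3 syntax,
which postdates this thread, and the sample bytes and the helper name
try_decode are made up for illustration; they are not taken from the
file I mentioned, and this is not how my application actually guesses.

```python
# Hypothetical sample data: one line written as UTF-8, the next as CP1252.
utf8_part = "caf\u00e9\n".encode("utf-8")       # e-acute becomes b'\xc3\xa9'
cp1252_part = "na\u00efve\n".encode("cp1252")   # i-diaeresis becomes b'\xef'
mixed = utf8_part + cp1252_part


def try_decode(data, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes strictly.

    Hypothetical helper: a real guesser would need more candidates and
    some sanity checks on the decoded text.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding decoded the data cleanly")


# Strict decoding refuses the mixed data, so the problem is at least visible.
try:
    mixed.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# The "ignore" handler lets the error pass silently and drops the bad byte.
lossy = mixed.decode("utf-8", "ignore")

# Falling back to CP1252 succeeds, but the UTF-8 line turns into mojibake.
text, used = try_decode(mixed)

print(strict_failed, repr(lossy), used, repr(text))
```

Neither fallback recovers the original text: "ignore" loses a character
without saying so, and the CP1252 guess decodes every byte but garbles
the line that really was UTF-8. Only the strict error tells you the file
is broken.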
--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science