[Python-Dev] Re: Relaxing Unicode error handling

David Eppstein eppstein at ics.uci.edu
Sun Jan 4 13:18:22 EST 2004


In article <3FF6FFE3.60608 at v.loewis.de>,
 "Martin v. Loewis" <martin at v.loewis.de> wrote:

> > Or, am I missing the point entirely, and there's some other circumstance 
> > where one gets UnicodeErrors besides .decode()?  If the use case is 
> > mixing strings and unicode objects (i.e. adding, joining, searching, 
> > etc.), then I'd have to say a big fat -1, as opposed to merely a -0 for 
> > having other ways to spell .decode(codec,"ignore"). 
> 
> Yes, it is these use cases: Somebody invokes an SQL method, which 
> happens to return a Unicode string, and then adds a latin-1 byte
> string to it. It works for all ASCII byte strings, but then the
> customer happens to enter accented characters, and the application
> crashes without offering to save recent changes.
> 
> So I guess that's -1 from you.

I am -1 also on allowing the programmer to let such errors pass silently.
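
To make the failure mode concrete, here is a minimal sketch in present-day
Python. It assumes the implicit coercion under discussion behaves like an
explicit ASCII decode of the byte string, which is how Python 2 handled
mixing str and unicode objects. The helper names strict_ascii and
lossy_ascii are hypothetical and exist only for illustration.

```python
# A latin-1 encoded byte string, as it might arrive from user input.
name = "caf\u00e9".encode("latin-1")      # b'caf\xe9'


def strict_ascii(data):
    """Decode as ASCII and raise on any non-ASCII byte (the current behaviour)."""
    return data.decode("ascii")


def lossy_ascii(data):
    """Decode as ASCII and silently drop bytes that do not fit."""
    return data.decode("ascii", "ignore")


try:
    strict_ascii(name)
    strict_failed = False
except UnicodeDecodeError:
    # The error is loud, so the programmer learns the data is not ASCII.
    strict_failed = True

# The relaxed variant "works", but the accented character is gone for good.
silently_damaged = lossy_ascii(name)

# Stating the real encoding explicitly keeps the data intact.
explicit = name.decode("latin-1")
```

The strict path fails visibly, the relaxed path loses a character without
any warning, and only the explicit decode preserves the text.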

The world of unicode encodings and decodings is painful enough without 
giving people more freedom to create broken files. I have an application 
which has to guess which encoding to use for certain files (the file 
format specifies an encoding but nobody pays attention), and one of the 
sample files I found on the web can't be read correctly because it mixes 
UTF-8 and, I think, CP1252.  I am very much in favor of anything that 
prevents more such files from being created.
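
As an illustration of why such mixed files are hard to recover, the sketch
below tries a list of candidate encodings in order and returns the first
one that decodes without error. This is an assumed, simplified strategy
and not the actual code of the application described above; guess_decode
and the sample bytes are hypothetical.

```python
def guess_decode(data, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits the data")


# A clean UTF-8 file decodes as expected.
clean = "\u00e9".encode("utf-8")                  # b'\xc3\xa9'
clean_text, clean_encoding = guess_decode(clean)

# A file that mixes both encodings: the first e-acute is UTF-8,
# the second is CP1252.
mixed = "\u00e9".encode("utf-8") + b" " + "\u00e9".encode("cp1252")
mixed_text, mixed_encoding = guess_decode(mixed)

# Strict UTF-8 rejects the lone 0xE9 byte, so the guess falls back to
# CP1252, which turns the UTF-8 pair into two wrong characters.
```

No single encoding decodes the mixed bytes correctly: strict UTF-8 fails
outright, and CP1252 succeeds only by garbling the UTF-8 portion.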

-- 
David Eppstein                      http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science



