[Python-ideas] Python 3000 TIOBE -3%
Carl M. Johnson
cmjohnson.mailinglist at gmail.com
Wed Feb 15 06:03:10 CET 2012
If I may, I would like to offer one argument for surrogateescape over latin-1 as the newbie approach. Suppose I am naively processing text files to create a webpage, and one of my filters is a "smart quotes" filter that changes "" to “”. Of course, there's no way to smarten quotes if you don't know the encoding of your input or output files; you'll just make a mess. In this situation, Latin-1 lets you mojibake it up: if your input turns out not to have been Latin-1, the final result will be corrupted by the quote smartener. On the other hand, if you use encoding="ascii", errors="surrogateescape", Python will complain, because the smart quotes being added aren't ASCII. In other words, surrogateescape forces naive users to stick to ASCII unless they can determine what encoding they want to use for their input/output. It's not perfect, but I think it strikes a better balance than letting users shoot themselves in the foot.
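A minimal sketch of the scenario, for illustration only. The `smarten` helper and the sample input are hypothetical, and the Latin-1 path assumes the output is written as UTF-8 (an assumption not stated above; the curly quotes cannot be encoded as Latin-1 either).

```python
def smarten(text):
    """Hypothetical naive filter: alternate opening and closing curly quotes."""
    out = []
    opening = True
    for ch in text:
        if ch == '"':
            out.append('\u201c' if opening else '\u201d')
            opening = not opening
        else:
            out.append(ch)
    return ''.join(out)

# The input is really UTF-8, but the naive script does not know that.
raw = 'caf\u00e9 "hi"'.encode('utf-8')

# Latin-1 path: decoding never fails, so the two bytes of the accented
# letter silently become two separate characters. Writing the result as
# UTF-8 (assumed here) bakes that mistake into the output.
latin1_result = smarten(raw.decode('latin-1')).encode('utf-8')

# ascii + surrogateescape path: unknown bytes survive as lone surrogates,
# so an unmodified round trip gives back the original bytes...
text = raw.decode('ascii', errors='surrogateescape')
round_trip = text.encode('ascii', errors='surrogateescape')

# ...but the added curly quotes are not ASCII, so encoding raises.
try:
    smarten(text).encode('ascii', errors='surrogateescape')
    complained = False
except UnicodeEncodeError:
    complained = True
```

Here `latin1_result` is mojibake rather than the intended text, while the surrogateescape path refuses to write anything until an encoding that can represent the new quotes is chosen.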