str(bytes) in Python 3.0

Lorenzo Gatti gatti at dsdata.it
Sat Apr 12 12:32:03 EDT 2008


On Apr 12, 5:51 pm, Kay Schluehr <kay.schlu... at gmx.net> wrote:
> On 12 Apr., 16:29, Carl Banks <pavlovevide... at gmail.com> wrote:
>
> > > And making an utf-8 encoding default is not possible without writing a
> > > new function?
>
> > I believe the Zen in effect here is, "In the face of ambiguity, refuse
> > the temptation to guess."  How do you know if the bytes are utf-8
> > encoded?
>
> How many "encodings" would you define for a Rectangle constructor?
>
> Making things infinitely configurable is very nice and shows that the
> programmer has worked hard. Sometimes however it suffices to provide a
> mandatory default and some supplementary conversion methods. This
> still won't exhaust all possible cases but provides a reasonable
> coverage.

There is no sensible default because many incompatible encodings are
in common use; programmers need to take responsibility for tracking or
guessing string encodings according to their needs, in ways that
depend on application architecture, characteristics of users and data,
and various risk and quality trade-offs.
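
To illustrate why no default can be right for everyone, here is a minimal
sketch in Python 3. The sample word and the two encodings are arbitrary
choices made for the example, not anything taken from the thread: the same
bytes decode "successfully" into different text under different encodings,
and only a strict decoder with the wrong guess complains.

```python
# Sketch only: the sample text and encodings are illustrative assumptions.
data = "café".encode("utf-8")            # b'caf\xc3\xa9'

as_utf8 = data.decode("utf-8")           # the text the writer intended
as_latin1 = data.decode("latin-1")       # no error, but the text is mangled

# The reverse mistake: Latin-1 bytes handed to a strict UTF-8 decoder.
latin1_data = "café".encode("latin-1")   # b'caf\xe9'
try:
    latin1_data.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    # 0xE9 starts a multi-byte sequence that is never completed.
    strict_failed = True

print(repr(as_utf8), repr(as_latin1), strict_failed)
```

Note that the Latin-1 decode never fails, since every byte value maps to
some character; a wrong guess in that direction goes unnoticed until
someone reads the output.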

In languages that, like Java, have a default encoding for convenience,
documents are routinely mangled by sloppy programmers who think that
they live in an ASCII or UTF-8 fairy land and that they don't need
tight control of the encoding of all text that enters and leaves the
system.
The only way forward is to stop indulging this obsolete attitude with
lenient APIs; being forced to learn that encodings are important is
better than, say, discovering unrecoverable data corruption in a
working system.
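
As a rough sketch of the difference between a strict and a lenient API,
assuming Python 3 semantics and a made-up input value: str() on bytes
without an encoding does not decode at all, mixing bytes with text is
refused outright, and a lenient errors="replace" decode quietly destroys
the original byte.

```python
# Sketch only: `data` is a hypothetical input, here Latin-1 encoded bytes.
data = b"caf\xe9"

# Without an encoding argument, str() returns the repr of the bytes object
# instead of guessing how to decode it.
as_repr = str(data)

# Concatenating text and bytes raises TypeError rather than converting.
try:
    "abc" + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# A lenient decode with the wrong encoding substitutes U+FFFD for the bad
# byte; the original value cannot be recovered from the result.
lossy = data.decode("utf-8", errors="replace")
round_trip = lossy.encode("utf-8")

# Stating the encoding explicitly gives the intended text.
text = data.decode("latin-1")

print(as_repr, mixed_ok, repr(lossy), round_trip == data, text)
```

The lossy result is the kind of corruption that cannot be undone once it
has been stored, which is why an early TypeError or UnicodeDecodeError is
the cheaper failure.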

Regards,
Lorenzo Gatti




