[Python-Dev] Unicode docs
Tim Peters
tim.one@home.com
Tue, 15 May 2001 03:33:06 -0400
I don't know that the Unicode docs need massive work, but the docs that are
there simply don't answer the technical questions people have: they're too
thin.
Let's keep it simple. Contrast the Library manual's:
    unicode(string[, encoding[, errors]])
        Decodes string using the codec for encoding. Error handling is
        done according to errors. The default behavior is to decode UTF-8
        in strict mode, meaning that encoding errors raise ValueError. See
        also the codecs module.
with Andrew's description (from http://www.amk.ca/python/2.0/):
    unicode(string [, encoding] [, errors])
        Creates a Unicode string from an 8-bit string. encoding is a
        string naming the encoding to use. The errors parameter specifies
        the treatment of characters that are invalid for the current
        encoding; passing 'strict' as the value causes an exception
        to be raised on any encoding error, while 'ignore' causes errors
        to be silently ignored and 'replace' uses U+FFFD, the official
        replacement character, in case of any problems.
The latter addresses several *fundamental* questions untouched by the former:
what are the datatypes of the arguments and the result, what values does
errors accept, and what do they mean? The first blurb answers some others:
what's the default encoding, and which exception is raised? Neither is
complete on its own, but the reference manual should have a complete answer
to all such questions. It doesn't have to go on at great length.
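To make those questions concrete, here is a rough sketch of the three errors
values in action. It is written for present-day Python 3, where the job of
unicode(string, encoding, errors) is done by bytes.decode(encoding, errors);
that substitution and the sample bytes are my assumptions, not something
taken from either blurb.

```python
# Assumption: Python 3, where bytes.decode() stands in for the 2.x unicode() built-in.
data = b"caf\xe9"  # Latin-1 bytes for "café"; the trailing \xe9 is not valid UTF-8

# 'strict' raises UnicodeDecodeError, which is a subclass of ValueError.
try:
    data.decode("utf-8", "strict")
    strict_error = None
except ValueError as exc:
    strict_error = type(exc).__name__

# 'ignore' silently drops the undecodable byte.
ignored = data.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD, the official replacement character.
replaced = data.decode("utf-8", "replace")

print(strict_error, repr(ignored), repr(replaced))
```

The argument is a byte string, the result is a Unicode string, and the errors
value decides whether bad input raises, vanishes, or shows up as U+FFFD.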
A round-trip example would be invaluable.
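Something along these lines would do. This is only a sketch, assuming
present-day Python 3 (str.encode and bytes.decode rather than the 2.x
unicode() built-in) and a made-up sample string:

```python
# Assumption: Python 3 spelling of the round trip; the sample text is illustrative only.
text = "na\u00efve caf\u00e9"        # Unicode string with two non-ASCII characters

encoded = text.encode("utf-8")       # Unicode string -> 8-bit byte string
decoded = encoded.decode("utf-8")    # 8-bit byte string -> Unicode string

# The round trip is lossless: decoding what we encoded gives back the original.
assert decoded == text

# Each of the two non-ASCII characters takes two bytes in UTF-8.
print(len(text), len(encoded))
```

Naming the encoding explicitly on both sides keeps the example from leaning
on whatever the default happens to be.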
If Fred wanted to incorporate a brief overview too, a light rework of
Andrew/Moshe's writeup would be an excellent start.