[Python-Dev] Adding .decode() method to Unicode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Tue, 12 Jun 2001 11:42:52 +0200


> str.encode()
> str.decode()
> uni.encode()
> #uni.decode() # still missing

It's not missing. str.decode and uni.encode go through a single codec;
that's easy. str.encode is somewhat more confusing, because it really
is unicode(str).encode. Now, you are not proposing that uni.decode is
str(uni).decode, are you?

If not that, what else would it mean? And if it means something else,
it is clearly not symmetric to str.encode, so it is not "missing".
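
To make the asymmetry concrete, here is a minimal Python 2 sketch;
the 'utf-8' codec and the sample strings are only illustrative, and
it assumes an interpreter where all three methods from the list
above are available:

    s = 'abc'      # byte string
    u = u'abc'     # Unicode string

    u.encode('utf-8')    # one codec call: Unicode -> bytes
    s.decode('utf-8')    # one codec call: bytes -> Unicode

    # str.encode first coerces to Unicode using the default encoding,
    # so this is effectively unicode(s).encode('utf-8'):
    s.encode('utf-8')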

> One very useful application for this method is XML unescaping,
> which turns numeric XML entities into Unicode chars. 

Ok. Please show me how that would work. More precisely, please write a
PEP describing the rationale for this feature, including use case
examples and precise semantics of the proposed addition.

> The key argument for these interfaces is that they provide
> an extensible transformation mechanism for string and binary
> data. 

That is too general for me to understand; I need to see detailed
examples that solve real-world problems.

Regards,
Martin

P.S. I don't think that unescaping XML character entities into
Unicode characters is a useful application in itself. This is
normally done by the XML parser, which not only has to deal with
character entities, but also with general entities and a lot of
other markup. Very few people write XML parsers, and those who do
use the string methods and the sre module successfully (if the
parser is written in Python; a C parser would do the unescaping
before even passing the text to Python).
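
As a footnote to the P.S., a minimal sketch of the kind of re-based
unescaping of numeric character references it alludes to; the helper
name, the regular expression and the sample string are hypothetical,
not an existing API:

    import re

    _charref = re.compile(r'&#(\d+|[xX][0-9a-fA-F]+);')

    def unescape_charrefs(text):
        # Replace decimal (&#233;) and hexadecimal (&#xE9;) character
        # references with the corresponding Unicode character.
        def repl(m):
            ref = m.group(1)
            if ref[0] in 'xX':
                return unichr(int(ref[1:], 16))
            return unichr(int(ref))
        return _charref.sub(repl, text)

    assert unescape_charrefs(u'caf&#233;') == u'caf\xe9'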