[issue2630] repr() should not escape non-ASCII characters

Tue May 6 10:26:36 CEST 2008

Marc-Andre Lemburg <mal at egenix.com> added the comment:

On 2008-05-06 00:07, Guido van Rossum wrote:
> Guido van Rossum <guido at python.org> added the comment:
> 
> On Fri, Apr 18, 2008 at 1:46 AM, Marc-Andre Lemburg
> <report at bugs.python.org> wrote:
>> On 2008-04-18 05:35, atsuo ishimoto wrote:
>>  > atsuo ishimoto <ishimoto at users.sourceforge.net> added the comment:
>>  >
>>  > Is a codec whose encode() returns a Unicode object allowed in Python 3?
>>
>>  Sure, why not ?
> 
> Actually, it is not. In Py3k, x.encode() always requires x to be a str
> (i.e. unicode) instance and return a bytes instance. y.decode()
> requires y to be a bytes instance and returns a str (i.e. unicode)
> instance.

So you've limited the codec design to just doing Unicode<->bytes
conversions?

The original codec design was to have the codec decide which
types to take on input and to generate on output, e.g. to
escape characters in Unicode (converting Unicode to Unicode),
work on compressed 8-bit strings (converting 8-bit strings to
8-bit strings), etc.
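
To illustrate the difference, here is a minimal sketch, assuming a recent
Python 3 release in which the rot_13 and zlib_codec transforms are
reachable through codecs.encode()/codecs.decode(). The sample strings are
made up for illustration:

```python
import codecs

# The str.encode() method is restricted to str -> bytes.
encoded = "caf\u00e9".encode("utf-8")

# The codecs module functions leave the types to the codec:
# rot_13 maps str -> str ...
rotated = codecs.encode("repr", "rot_13")

# ... and zlib_codec maps bytes -> bytes (compressed 8-bit data).
packed = codecs.encode(b"8-bit data " * 4, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")
```

The method form fixes the input and output types, while the function form
leaves them to the codec.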

>>  I think you have to ask another question: Is repr() allowed to
>>  return a string (instead of Unicode) in Py3k ?
> 
> In Py3k, "strings" *are* unicode. The str data type is Unicode.

With "strings" I always refer to 8-bit strings, i.e. 8-bit data that
is encoded in some encoding.

> If you're asking about repr() possibly returning a bytes instance,
> definitely not.
> 
>>  If not, then unicode_repr() will have to check the return value of
>>  the codec and convert it back to Unicode as necessary.
> 
> What codec?

The idea is to have a codec which takes the Unicode object and
converts it to its repr()-value.

Now, since you apparently cannot go the direct way anymore (i.e. have
the codec encode Unicode to Unicode), you'd have to first use a codec
which converts the Unicode object to its repr()-value represented as a
bytes object and then convert the bytes object back to Unicode in
unicode_repr().

With the original design, this extra step wouldn't have been
necessary.
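
The two-step route can be sketched roughly as follows. This is only an
illustration: it uses the existing unicode_escape codec as a stand-in for
a repr codec (it adds no quotes and does not escape them, so it is not
the real repr()), and repr_via_bytes is a hypothetical helper name:

```python
def repr_via_bytes(text):
    """Escape text by going str -> bytes -> str (hypothetical helper)."""
    # Step 1: the codec turns the Unicode object into escaped ASCII bytes.
    raw = text.encode("unicode_escape")
    # Step 2: the extra step, converting the bytes back into a str.
    return raw.decode("ascii")


escaped = repr_via_bytes("caf\u00e9")
```

With a Unicode-to-Unicode codec, step 2 would not be needed.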

>>  > I started to think a codec is not necessary; a Python function is enough.
>>
>>  That's what we currently have with unicode_repr(), but it doesn't
>>  solve the problem.
> 
> I'm lost here.

See my previous replies on this ticket.

> PS. Atsuo's PEP has now been checked in as PEP 3138. Discussion should
> start soon on the python-3000 list.

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2630>
__________________________________