"Decoding unicode is not supported" in unusual situation

Prasad, Ramit ramit.prasad at jpmorgan.com
Thu Mar 8 17:58:44 EST 2012


>     Right. The real problem is that Python 2.7 doesn't have distinct
> "str" and "bytes" types.  type(bytes()) returns <type 'str'>.
> "str" is assumed to be ASCII 0..127, but that's not enforced.
> "bytes" and "str" should have been distinct types, but
> that would have broken much old code.  If they were distinct, then
> constructors could distinguish between string type conversion
> (which requires no encoding information) and byte stream decoding.
> 
>     So it's possible to get junk characters in a "str", and they
> won't convert to Unicode.  I've had this happen with databases which
> were supposed to be ASCII, but occasionally a non-ASCII character
> would slip through.

In Python 2.7, bytes and str are just aliases for each other; both
names refer to the same type object:

>>> id( bytes )
505366496
>>> id( str )
505366496
>>> type( bytes )
<type 'type'>
>>> type( str )
<type 'type'>
>>> bytes == str 
True
>>> bytes is str
True


And I do not think str was ever intended to be just
ASCII, because chr() accepts any integer from 0 through 255
(that is, range(256)) and returns a str.
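
A rough sketch of what that means in practice: a byte string can hold
values above 127, and decoding it as ASCII then fails unless you name a
wider encoding or an error handler. The sample bytes and variable names
here are made up for illustration, and the code is written so that it
should run on both Python 2.7 and 3.x.

```python
# Sketch only; the sample data is invented for illustration.
# bytes(bytearray(...)) yields a str on Python 2.7 and a bytes object on 3.x.
raw = bytes(bytearray([72, 105, 233]))  # 'H', 'i', and byte 0xE9 (outside ASCII)

# Strict ASCII decoding rejects the byte above 127.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Latin-1 maps every byte value 0..255 to a code point, so it never fails
# (whether it is the *right* encoding for the data is a separate question).
as_latin1 = raw.decode("latin-1")

# Alternatively, keep ASCII but substitute U+FFFD for undecodable bytes.
as_replaced = raw.decode("ascii", "replace")

print(ascii_ok)
print(repr(as_latin1))
print(repr(as_replaced))
```

Which of the two fallbacks is appropriate depends on whether you know
the real encoding of the data; "replace" loses information, while a
wrong guess at the encoding silently produces the wrong characters.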


Ramit



--


> -----Original Message-----
> From: python-list-bounces+ramit.prasad=jpmorgan.com at python.org
> [mailto:python-list-bounces+ramit.prasad=jpmorgan.com at python.org] On Behalf
> Of John Nagle
> Sent: Thursday, March 08, 2012 4:24 PM
> To: python-list at python.org
> Subject: Re: "Decoding unicode is not supported" in unusual situation
> 
> On 3/7/2012 6:18 PM, Ben Finney wrote:
> > Steven D'Aprano<steve+comp.lang.python at pearwood.info>  writes:
> >
> >> On Thu, 08 Mar 2012 08:48:58 +1100, Ben Finney wrote:
> >>> I think that's a Python bug. If the latter succeeds as a no-op, the
> >>> former should also succeed as a no-op. Neither should ever get any
> >>> errors when ‘s’ is a ‘unicode’ object already.
> >>
> >> No. The semantics of the unicode function (technically: a type
> >> constructor) are well-defined, and there are two distinct behaviours:
> 
> 
>     This is all different in Python 3.x, where "str" is Unicode and
> "bytes" really are a distinct type.
> 
> 				John Nagle


