[Python-bugs-list] [ python-Bugs-411267 ] s.encode('latin-1') passes non-latin-1 c

Tue, 27 Mar 2001 00:13:52 -0800

Bugs item #411267, was updated on 2001-03-25 19:31
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=411267&group_id=5470

Category: Unicode
Group: None
>Status: Closed
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Nobody/Anonymous (nobody)
Summary: s.encode('latin-1') passes non-latin-1 c

Initial Comment:
>>> u'\x81'.encode('latin-1')
'\201'

This should probably raise an exception.

  -- erno@iki.fi
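For reference, here is a minimal sketch in modern Python 3 of what the 'latin-1' codec accepts and rejects. It uses bytes literals rather than the Python 2.1-era output shown above. The helper name try_latin1 is hypothetical. The sketch assumes only the standard str.encode behaviour: code points U+0000 through U+00FF map to the byte with the same value, and anything higher raises UnicodeEncodeError.

```python
# Minimal sketch, assuming Python 3. try_latin1 is a hypothetical helper.
# The 'latin-1' codec maps U+0000..U+00FF to the byte with the same value
# and raises UnicodeEncodeError for any code point above U+00FF.

def try_latin1(text):
    """Return the Latin-1 bytes for text, or None if it cannot be encoded."""
    try:
        return text.encode('latin-1')
    except UnicodeEncodeError:
        return None

print(try_latin1('\x81'))    # C1 control U+0081 -> b'\x81'
print(try_latin1('\xe9'))    # U+00E9 -> b'\xe9'
print(try_latin1('\u20ac'))  # EURO SIGN U+20AC is outside Latin-1 -> None
```

Under this codec, the reported input is accepted by design. Only code points above U+00FF are rejected.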

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2001-03-27 00:13

Message:
Logged In: YES 
user_id=21627

Please have a look at RFC 1345, which clearly lists the
characters from 0x80 to 0x9F (everything below 0xA0) as part
of ISO_8859-1:1987. The C1 control characters are shared among
all character sets of ISO 8859, so they are not specific to
Latin-1.

Please also have a look at
http://czyborra.com/charsets/iso8859.html,
which confirms that C1 sits between US-ASCII and G1. So you
probably need to consult a copy of ISO/IEC 8859-1:1998 to
prove us wrong. Note that the Unicode charts do not
contradict this theory: they only label the characters
*specific* to Latin-1 as such. The characters shared with
other parts of 8859 (C0, G0, and C1) are not named "Latin-1"
in these charts.
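The claim that C1 is shared across ISO 8859 can be checked against the codecs Python ships. The sketch below, assuming Python 3, decodes each byte 0x80..0x9F under a few ISO 8859 codecs. It confirms that every one yields the same C1 code point. The codec list is an illustrative choice, not an exhaustive survey. For contrast, byte 0xA4 in G1 differs between Latin-1 and ISO 8859-15. The function name c1_is_shared is hypothetical.

```python
# Sketch, assuming Python 3: bytes 0x80..0x9F decode to the same C1 control
# code points (U+0080..U+009F) under several ISO 8859 codecs, while G1
# bytes such as 0xA4 are part-specific. c1_is_shared is a hypothetical name.

CODECS = ['latin-1', 'iso8859_2', 'iso8859_5', 'iso8859_15']
C1_BYTES = range(0x80, 0xA0)

def c1_is_shared(codecs=CODECS):
    """True if every codec decodes each C1 byte to the identical code point."""
    for name in codecs:
        for b in C1_BYTES:
            if bytes([b]).decode(name) != chr(b):
                return False
    return True

print(c1_is_shared())                      # True
print(bytes([0xA4]).decode('latin-1'))     # CURRENCY SIGN U+00A4
print(bytes([0xA4]).decode('iso8859_15'))  # EURO SIGN U+20AC
```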

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-03-26 08:16

Message:
Logged In: NO 

http://www.unicode.org/charts/PDF/U0080.pdf
shows 0080-009F in Unicode as something called "C1
controls" and 00A0-00FF as ISO-8859-1, aka Latin-1.

  -- erno
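The chart's split can also be seen through Unicode general categories. The sketch below assumes Python 3's standard unicodedata module. It shows that every code point in U+0080..U+009F has category 'Cc' (control), while none in U+00A0..U+00FF does. This matches the chart's grouping. It says nothing by itself about which range the ISO standard calls Latin-1.

```python
import unicodedata

# Sketch, assuming Python 3's unicodedata: classify the upper half of the
# first 256 code points by general category. 'Cc' marks control characters.
c1 = [chr(i) for i in range(0x80, 0xA0)]
upper = [chr(i) for i in range(0xA0, 0x100)]

print(all(unicodedata.category(ch) == 'Cc' for ch in c1))     # True
print(any(unicodedata.category(ch) == 'Cc' for ch in upper))  # False
print(unicodedata.name('\xe9'))  # LATIN SMALL LETTER E WITH ACUTE
```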

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2001-03-26 07:50

Message:
Logged In: NO 

Every reference I can find on the web (and my Linux
latin1(7) manual page) says they are 160-255...?

I think the reason characters 128-159 are not used
might be that, with the high bit stripped off, they would be
ASCII control characters and not printable.

  -- erno
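The high-bit argument is simple arithmetic. The following sketch, assuming Python 3, clears bit 7 (value & 0x7F) for both halves of the upper range. The helper strip_high_bit is a hypothetical name. Bytes 128..159 fold onto 0..31, the ASCII (C0) control range. Bytes 160..255 fold onto 32..127, which is printable ASCII except for 255, which lands on DEL (127).

```python
# Sketch, assuming Python 3: what bytes 128..255 become if the high bit is
# stripped. strip_high_bit is a hypothetical helper for illustration.

def strip_high_bit(value):
    """Clear bit 7, as a 7-bit channel would."""
    return value & 0x7F

c1_stripped = [strip_high_bit(v) for v in range(128, 160)]
g1_stripped = [strip_high_bit(v) for v in range(160, 256)]

print(min(c1_stripped), max(c1_stripped))                # 0 31
print(min(g1_stripped), max(g1_stripped))                # 32 127
print(any(chr(v).isprintable() for v in c1_stripped))    # False
```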

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2001-03-26 01:14

Message:
Logged In: YES 
user_id=38388

u'\x81' is a perfectly valid Latin-1 character; in fact, the
first 256 Unicode characters are the Latin-1 characters.
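The identity described here can be illustrated directly. The sketch below assumes Python 3. It decodes all 256 byte values with the 'latin-1' codec and checks that each byte maps to the code point with the same number. It also checks that encoding the result round-trips to the original bytes.

```python
# Sketch, assuming Python 3: Latin-1 decoding maps every byte value to the
# code point with the same number, so U+0000..U+00FF and Latin-1 coincide.
all_bytes = bytes(range(256))
text = all_bytes.decode('latin-1')

print(len(text))                                            # 256
print(all(ord(ch) == b for ch, b in zip(text, all_bytes)))  # True
print(text.encode('latin-1') == all_bytes)                  # True
```

Because the mapping is an identity on 0..255, 'latin-1' is often used to pass arbitrary bytes through a text API losslessly.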

----------------------------------------------------------------------
