[Python-bugs-list] [ python-Bugs-411267 ] s.encode('latin-1') passes non-latin-1 c
noreply@sourceforge.net
Tue, 27 Mar 2001 00:13:52 -0800
Bugs item #411267, was updated on 2001-03-25 19:31
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=411267&group_id=5470
Category: Unicode
Group: None
>Status: Closed
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Nobody/Anonymous (nobody)
Summary: s.encode('latin-1') passes non-latin-1 c
Initial Comment:
>>> u'\x81'.encode('latin-1')
'\201'
this should probably raise an exception.
-- erno@iki.fi
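For reference, the behaviour being reported can be reproduced in modern Python 3, where the result prints as b'\x81' rather than the Python 2 repr '\201'. The sketch below is illustrative only: fits_latin1 is a hypothetical helper, not part of any library. It shows that the Latin-1 codec accepts every code point up to U+00FF, including the C1 control U+0081, and raises UnicodeEncodeError only above that.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text has a Latin-1 byte."""
    try:
        text.encode('latin-1')
    except UnicodeEncodeError:
        return False
    return True

# U+0081 (a C1 control) encodes to the single byte 0x81.
print('\x81'.encode('latin-1'))   # b'\x81'
print(fits_latin1('\x81'))        # True

# U+20AC (EURO SIGN) lies above U+00FF, so the codec rejects it.
print(fits_latin1('\u20ac'))      # False
```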
----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2001-03-27 00:13
Message:
Logged In: YES
user_id=21627
Please have a look at RFC 1345, which clearly lists the
characters from 0x80 to 0x9F as part of ISO_8859-1:1987. The
C1 control characters are shared among all character sets of
ISO 8859, so they are not specific to Latin-1.
Please also have a look at
http://czyborra.com/charsets/iso8859.html
which confirms that C1 sits between US-ASCII and G1. So you
probably need to consult a copy of ISO/IEC 8859-1:1998 to
prove us wrong. Note that the Unicode charts do not
contradict this theory: they only label the characters
*specific* to Latin-1 as such; the characters shared with
other parts of 8859 (C0, G0, and C1) are not named "Latin-1"
in these charts.
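The claim that the C1 range is common to the whole ISO 8859 family can be spot-checked against Python's own codec tables. This is a sketch assuming Python 3; it uses only a few codecs from the standard encodings package (latin_1, iso8859_2, iso8859_5, iso8859_15), and the names C1_BYTES and c1_is_shared are illustrative. It checks that each byte from 0x80 to 0x9F decodes to the code point with the same value under every listed codec.

```python
# Bytes 0x80-0x9F and the matching C1 code points U+0080-U+009F.
C1_BYTES = bytes(range(0x80, 0xA0))
C1_TEXT = ''.join(chr(cp) for cp in range(0x80, 0xA0))

# A sample of ISO 8859 codecs shipped with Python.
ISO_8859_CODECS = ['latin_1', 'iso8859_2', 'iso8859_5', 'iso8859_15']

def c1_is_shared(codecs) -> bool:
    """True if every codec maps 0x80-0x9F straight onto U+0080-U+009F."""
    return all(C1_BYTES.decode(name) == C1_TEXT for name in codecs)

print(c1_is_shared(ISO_8859_CODECS))  # True
```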
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2001-03-26 08:16
Message:
Logged In: NO
http://www.unicode.org/charts/PDF/U0080.pdf
shows 0080-009F in Unicode being something called "C1
controls" and 00A0-00FF being ISO-8859-1, aka Latin-1.
-- erno
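The chart's split between "C1 controls" and the 00A0-00FF block also appears in the Unicode Character Database as general categories. A minimal Python 3 sketch using the standard unicodedata module (describe is a hypothetical helper) prints the category and name of a few code points on either side of 00A0:

```python
import unicodedata

def describe(cp: int) -> str:
    """Format a code point as 'U+XXXX <category> <name>'."""
    ch = chr(cp)
    # Control characters have no character name, so supply a placeholder.
    name = unicodedata.name(ch, '<no name>')
    return f'U+{cp:04X} {unicodedata.category(ch)} {name}'

for cp in (0x80, 0x81, 0x9F, 0xA0, 0xE9):
    print(describe(cp))
```

Every code point from U+0080 to U+009F reports category Cc (control) and has no name, while U+00A0 is NO-BREAK SPACE in category Zs. The category describes what kind of character it is; it says nothing either way about membership in Latin-1.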
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2001-03-26 07:50
Message:
Logged In: NO
Every reference I can find on the web (and my Linux
latin1(7) manual page) says they are 160-255...?
I think the reason the 128-159 characters are not used
might be that, with the high bit stripped off, they would be
ASCII control characters and not printable.
-- erno
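The arithmetic behind the high-bit explanation is easy to confirm, even though the historical motivation remains a guess. Clearing bit 7 (masking with 0x7F) maps 0x80-0x9F onto 0x00-0x1F, the C0 control characters of US-ASCII. A small Python 3 sketch, with strip_high_bit as an illustrative helper name:

```python
def strip_high_bit(b: int) -> int:
    """Clear bit 7, as a 7-bit channel would."""
    return b & 0x7F

# Apply the mask to every byte in the C1 range 0x80-0x9F.
stripped = [strip_high_bit(b) for b in range(0x80, 0xA0)]

print(min(stripped), max(stripped))                  # 0 31
print(any(chr(b).isprintable() for b in stripped))   # False
```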
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-03-26 01:14
Message:
Logged In: YES
user_id=38388
u'\x81' is a perfectly valid Latin-1 character; in fact, the
first 256 Unicode characters are the Latin-1 characters.
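Lemburg's statement can be checked directly in Python 3: decoding all 256 byte values as Latin-1 yields code points U+0000 through U+00FF in order, and encoding the result reproduces the original bytes. This is a sketch; the variable names are illustrative.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))
decoded = all_bytes.decode('latin-1')

# Byte n should decode to the code point with the same number n.
identity = all(ord(ch) == n for n, ch in enumerate(decoded))
# Encoding back must give the exact original bytes.
round_trip = decoded.encode('latin-1') == all_bytes

print(identity, round_trip)  # True True
```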