[Python-Dev] Python3 "complexity"

Nick Coghlan ncoghlan at gmail.com
Thu Jan 9 19:08:46 CET 2014


On 9 Jan 2014 22:25, "Kristján Valur Jónsson" <kristjan at ccpgames.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Victor Stinner [mailto:victor.stinner at gmail.com]
> > Sent: 9. janúar 2014 13:51
> > To: Kristján Valur Jónsson
> > Cc: Antoine Pitrou; python-dev at python.org
> > Subject: Re: [Python-Dev] Python3 "complexity"
> >
> > 2014/1/9 Kristján Valur Jónsson <kristjan at ccpgames.com>:
> > > This definition is funny, because according to Wikipedia, it is a
> > > "superset" of 8859-1 (latin1)
> >
> > Bytes 0x80..0x9f are unassigned in ISO/CEI 8859-1... but are assigned in
> > (IANA's) ISO-8859-1.
> >
> > Python implements the latter, ISO-8859-1.
> >
> > Wikipedia says "This encoding is a superset of ISO 8859-1, but differs from
> > the IANA's ISO-8859-1".
> >
>
> Thanks.  That's entirely non-confusing :)
> "ISO-8859-1 is the IANA preferred name for this standard when
> supplemented with the C0 and C1 control codes from ISO/IEC 6429."
>
> So anyway, yes, Python's "latin1" encoding does cover the entire 256-value
> byte range. But on Windows we use cp1252 instead, which does not;
> instead it defines useful and common Windows characters in many of the
> control character slots.
> Hence the need for "surrogateescape" to be able to roundtrip characters.
>
> Again, this is non-obvious, and given my experience with cp1252, I
> had no way of guessing that the "subset", i.e. latin1, would indeed cover
> the whole range. So I have learned two things since my initial foray into
> parsing ASCII files with Python 3: surrogateescape, and "latin1 in Python
> == IANA's ISO-8859-1, which does indeed define the whole 8-bit range".

http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html is
currently linked from the Unicode HOWTO. However, I'd be happy to offer it
for direct inclusion to help make it more discoverable.
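
As a rough illustration of the "ASCII-compatible file, unknown encoding" case from this thread, here is a sketch assuming Python 3's built-in `open()` with `errors="surrogateescape"`. The file name `config.txt`, its contents and the `mode=debug` edit are hypothetical. Only the ASCII text is modified; the non-ASCII bytes pass through untouched because the same codec and error handler are used for reading and writing.

```python
import os
import tempfile

# Hypothetical input: mostly ASCII, plus bytes from some unknown
# ASCII-compatible encoding (a UTF-8 "é" and a stray 0xFF byte).
original = b"name=caf\xc3\xa9\nmode=debug\nflag=\xff\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "config.txt")
    with open(path, "wb") as f:
        f.write(original)

    # Decode as ASCII; each non-ASCII byte becomes a lone surrogate instead
    # of raising UnicodeDecodeError. newline="" disables newline translation.
    with open(path, encoding="ascii", errors="surrogateescape", newline="") as f:
        text = f.read()

    # Edit only the ASCII parts we understand.
    text = text.replace("mode=debug", "mode=release")

    # Encoding with the same settings turns the surrogates back into the
    # original bytes, so nothing outside the edit changes.
    with open(path, "w", encoding="ascii", errors="surrogateescape", newline="") as f:
        f.write(text)

    with open(path, "rb") as f:
        result = f.read()

print(result)
```

Using `encoding="latin-1"` instead would also round-trip every byte, since (as discussed above) it defines the whole 8-bit range; the trade-off is that non-ASCII bytes then show up as ordinary latin-1 characters rather than as easily recognisable escaped surrogates.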

Cheers,
Nick.

>
> K

