[Tutor] converting string to text
Marc Tompkins
marc.tompkins at gmail.com
Wed Jul 10 19:19:02 CEST 2013
On Wed, Jul 10, 2013 at 3:45 AM, Dave Angel <davea at davea.name> wrote:
>
> Get rid of the BOM from the data file, and it'll work fine. You don't
> specify what version of Python you're using, so I have to guess. But
> there's a utf-8 BOM conversion of a BOM at the beginning of that file, and
> that's not numeric. Best would be to change the way you generate that
> file, and don't put in a BOM for utf-8.
>
> BOM's are markers that are put at the beginning of certain encodings of
> files to distinguish between BE and LE encodings. But since your file is
> utf-8, a BOM is unnecessary and confusing.
Just jumping in to translate a bit of jargon...
BOM stands for Byte Order Mark
(http://www.opentag.com/xfaq_enc.htm#enc_bom).
BE stands for "big-endian", and LE stands for "little-endian".
Since the first digital computers were built, there have been two schools
of thought as to how multi-byte numbers should be stored: with the "most
significant" byte first, or the "least significant" byte first. The two
schools are called "big-endian" and "little-endian", after a famous
controversy in "Gulliver's Travels". The BOM is a sequence of bytes at the
beginning of a Unicode string that tells the reader whether the rest of the
string will be big-endian or little-endian. UTF-8 is defined as a sequence
of single bytes, so it is endian-agnostic and a BOM is not actually needed.
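To make the byte-order idea concrete, here is a small sketch. It assumes
Python 3, and the helper name detect_utf16_order is made up for this
example. It encodes the same character in both byte orders and then uses
the BOM constants from the standard codecs module to tell them apart.

```python
import codecs

text = "A"

# The same character in both byte orders. Naming the byte order
# explicitly means the encoder does not write a BOM.
big = text.encode("utf-16-be")     # most significant byte first
little = text.encode("utf-16-le")  # least significant byte first

# The BOM is the character U+FEFF written in the encoding's own byte order.
bom_be = codecs.BOM_UTF16_BE
bom_le = codecs.BOM_UTF16_LE


def detect_utf16_order(data):
    """Hypothetical helper: report the byte order named by a leading BOM."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little"
    return None  # no BOM, so the byte order has to be known some other way


print(big, little)
print(detect_utf16_order(bom_be + big))
print(detect_utf16_order(bom_le + little))
```

The generic "utf-16" decoder does the same check for you: it reads the BOM,
picks the byte order, and drops the BOM from the result.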
> It may even be illegal, but I'm not sure about that.
>
No, it's not illegal; the Unicode standard permits a BOM in utf-8, but
neither requires nor recommends it. Some tools (Windows Notepad, for one)
write it anyway - so now even utf-8 comes in two flavors (with and without
BOM)!
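If you can't change how the file is generated, Python can also skip the BOM
when reading. This is only a sketch, assuming Python 3; the sample bytes and
the helper name to_int are invented for the example, since the original data
file isn't shown in this thread. The "utf-8-sig" codec drops a leading BOM
if there is one and otherwise behaves like plain utf-8.

```python
import codecs

# Made-up file contents: a utf-8 BOM followed by a number.
raw = codecs.BOM_UTF8 + b"42\n"

# Plain utf-8 keeps the BOM as the invisible character U+FEFF.
with_bom = raw.decode("utf-8")

# utf-8-sig strips a leading BOM, and is harmless if none is present.
without_bom = raw.decode("utf-8-sig")


def to_int(s):
    """Hypothetical helper: return int(s), or None if s is not numeric."""
    try:
        return int(s)
    except ValueError:
        return None


print(repr(with_bom), to_int(with_bom))
print(repr(without_bom), to_int(without_bom))
```

When reading from disk, the same codec name can be passed to open(), as in
open(filename, encoding="utf-8-sig").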