[Tutor] name shortening in a csv module output

Dave Angel davea at davea.name
Thu Apr 23 23:40:34 CEST 2015


On 04/23/2015 05:08 PM, Mark Lawrence wrote:
>
> Slight aside, why a BOM, all I ever think of is Inspector Clouseau? :)
>

As I recall, it stands for "Byte Order Mark".  Relevant mainly to 
multi-byte storage formats (e.g. UTF-16), it lets the reader decide 
which of the byte orders was used.

For example, a file that reads

    fe ff 00 41 00 42

might be a big-endian version of UTF-16

while

    ff fe 41 00 42 00

might be the little-endian version of the same data.

I probably have it all inside out and backwards, but that's the general 
idea.  If the BOM appears backwards, you switch between BE and LE and 
the data will make sense.
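
Here is a minimal sketch of that check in Python, assuming the data is 
UTF-16 and starts with a BOM (U+FEFF, stored as fe ff in big-endian and 
ff fe in little-endian).  The function name sniff_utf16 is made up for 
this illustration; the BOM constants come from the standard codecs 
module.

```python
import codecs

def sniff_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, using the leading BOM to pick the byte order."""
    if data.startswith(codecs.BOM_UTF16_BE):    # b'\xfe\xff'
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):    # b'\xff\xfe'
        return data[2:].decode("utf-16-le")
    raise ValueError("no UTF-16 BOM found")

big = bytes.fromhex("feff00410042")       # BOM, then "AB", big-endian
little = bytes.fromhex("fffe41004200")    # BOM, then "AB", little-endian
print(sniff_utf16(big), sniff_utf16(little))
```

In everyday code the plain "utf-16" codec does this check for you, so 
big.decode("utf-16") and little.decode("utf-16") should give the same 
string.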


The same concept was used many years ago in two places I know of. 
Binary files representing faxes (TIFF images) had "II" or "MM" at the 
beginning, for Intel (little-endian) or Motorola (big-endian) byte order.
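
As a rough sketch, assuming the fax files in question are TIFF files 
(whose header starts with "II" or "MM", followed by the number 42 stored 
in the indicated byte order), a reader could pick the byte order like 
this.  The function name tiff_byte_order is invented for the example.

```python
import struct

def tiff_byte_order(header: bytes) -> str:
    """Return 'little' or 'big' based on the first four header bytes."""
    marker = header[:2]
    if marker == b"II":          # "Intel" order: least significant byte first
        fmt, order = "<H", "little"
    elif marker == b"MM":        # "Motorola" order: most significant byte first
        fmt, order = ">H", "big"
    else:
        raise ValueError("no II/MM marker")
    # The next two bytes hold 42 in the byte order just announced.
    (version,) = struct.unpack(fmt, header[2:4])
    if version != 42:
        raise ValueError("marker found, but the version field is not 42")
    return order

print(tiff_byte_order(b"II\x2a\x00"))
print(tiff_byte_order(b"MM\x00\x2a"))
```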

The UCSD p-System program format instead used a number (I think it was 
0001) which would decode wrong if you were on the wrong processor type.  
The idea was that instead of coding an explicit check, you just looked 
at one piece of data, and if it was wrong, you had to swap all the 
byte-pairs.  That way if you read the file on the same kind of machine, 
no work was needed at all.
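
A small sketch of that scheme in Python, under loud assumptions: the 
file is a sequence of 16-bit words, the first word is the marker, and 
the marker value is 0x0001 (taken from the recollection above, not from 
any format documentation).  The names MAGIC and load_words are 
hypothetical.

```python
import struct

MAGIC = 0x0001   # assumed marker value, not a documented constant

def load_words(raw: bytes) -> tuple:
    """Read 16-bit words in native order; swap byte-pairs only if needed."""
    count = len(raw) // 2                        # raw must have even length
    words = struct.unpack("=%dH" % count, raw)   # "=" means native byte order
    if words[0] == MAGIC:
        return words[1:]                         # same machine type: no work
    # Marker decoded wrong, so swap the two bytes of every word.
    swapped = tuple(((w & 0xFF) << 8) | (w >> 8) for w in words)
    if swapped[0] != MAGIC:
        raise ValueError("marker not recognised in either byte order")
    return swapped[1:]

native = struct.pack("=3H", 0x0001, 0x1234, 0xABCD)
foreign = struct.pack("=3H", 0x0100, 0x3412, 0xCDAB)   # same data, bytes swapped
print(load_words(native) == load_words(foreign))
```

Both calls should return the same pair of words, whichever kind of 
machine runs the code.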

Seems to me the Java bytecode does something similar, but I don't know.  
All of these are from memory, and subject to mis-remembering.


-- 
DaveA

