(Simple?) Unicode Question

Shashank Singh shashank.sunny.singh at gmail.com
Thu Aug 27 12:39:06 EDT 2009


Hi All!

I have a very simple (and probably stupid) question eluding me.
When exactly is the char-set information needed?

To make my question clear consider reading a file.
While reading a file, all I get is basically an array of bytes.

Now suppose a file has 10 bytes in it (all is data, no metadata,
forget the BOM and stuff for a little while). I read it into an array of 10
bytes, replace, say, 2nd bytes and write all the bytes back to a new
file.

Do i need the character encoding mumbo jumbo anywhere in this?

Further, does anything, except a printing device need to know the
encoding of a piece of "text"? I mean, as long as we are not trying
to get a symbolic representation of a "text" or get "i"th character
of it, all we need to do is to carry the intended encoding as
an auxiliary information to the data stored as byte array.

Right?

--shashank
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20090827/aabb66ae/attachment.html>


More information about the Python-list mailing list