[Python-3000] encoding hell

tomer filiba tomerfiliba at gmail.com
Sat Sep 2 17:53:59 CEST 2006


i'm quite finished with the base of iostack (streams and layers), and
have moved to implementing the adapters layer (especially the dreaded
TextAdapter).

as was discussed earlier, streams and layers work with bytes, while
adapters may work with arbitrary objects (be it struct-style records,
serialized objects, characters and whatnot).

the question that arises is -- how far should we stretch this abstraction?
for example, the TextAdapter reads and writes characters to the
stream, after they go through encoding or decoding, so from the
programmer's point of view, he's working with *characters*, not *bytes*.
that means the programmer need not be aware of how the characters
are "physically" stored in the underlying stream.
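
to make that concrete, here is a minimal sketch of what i mean by a
text adapter. it is not the iostack code: the class body, the
one-byte-at-a-time read and the "pending" buffer are all assumptions
made up for illustration, and it leans on the incremental codecs from
the stdlib codecs module.

```python
import codecs
import io


class TextAdapter:
    """hypothetical sketch: characters on top, bytes underneath."""

    def __init__(self, stream, encoding="utf-8"):
        self.stream = stream
        self.decoder = codecs.getincrementaldecoder(encoding)()
        self.encoder = codecs.getincrementalencoder(encoding)()
        self.pending = ""  # decoded characters not yet handed out

    def read(self, count):
        # keep pulling bytes until we have enough characters or hit EOF
        while len(self.pending) < count:
            chunk = self.stream.read(1)
            if not chunk:
                # flush the decoder; raises if a character was cut short
                self.pending += self.decoder.decode(b"", final=True)
                break
            self.pending += self.decoder.decode(chunk)
        result, self.pending = self.pending[:count], self.pending[count:]
        return result

    def write(self, text):
        self.stream.write(self.encoder.encode(text))


raw = io.BytesIO("h\u00e9llo".encode("utf-8"))
adapter = TextAdapter(raw)
print(repr(adapter.read(2)))  # two characters...
print(raw.tell())             # ...but three bytes consumed
```

the caller asks for two characters and never sees that the second one
took two bytes in the underlying stream.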

that's all very nice, but what do we do when it comes to seek()ing?
do you want to seek by character position or by byte position?
logically you are working with characters, but with a variable-width
encoding, seeking by character position would be impossible to
implement without first decoding the entire stream in-memory...
which is unacceptable of course.
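
a rough sketch of why character-position seeking is expensive. the
helper name seek_to_char is made up, and utf-8 is only an assumed
example encoding. it walks the stream from the start and decodes every
byte in front of the target, which in the worst case is the whole
stream.

```python
import codecs
import io


def seek_to_char(stream, char_index, encoding="utf-8"):
    """position stream at the first byte of character number char_index.

    hypothetical helper: there is no shortcut here, it has to decode
    everything that comes before the target character.
    """
    stream.seek(0)
    decoder = codecs.getincrementaldecoder(encoding)()
    seen = 0
    while seen < char_index:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream has fewer characters than requested")
        # decode() returns "" until a whole character has been fed in
        seen += len(decoder.decode(byte))
    return stream.tell()


# four characters taking 1, 2, 3 and 1 bytes in utf-8
data = "a\u00e9\u20acb".encode("utf-8")
stream = io.BytesIO(data)
print([seek_to_char(stream, i) for i in range(4)])  # [0, 1, 3, 6]
```

character index and byte offset drift apart as soon as the first
multibyte character shows up, so the mapping can't be computed without
looking at the data.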

and if seek()ing is byte-oriented, then you must somehow seek
only to the beginning of a multibyte character sequence... how
would you do that?
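
for utf-8 specifically there is a trick, sketched below: continuation
bytes always look like 10xxxxxx, so after a byte-level seek you can
skip forward until you hit a byte that starts a character. the helper
name align_to_utf8_char is made up, and this is only an illustration
for one encoding. it doesn't answer the question for encodings in
general, since not every encoding lets you recognize a character start
from a single byte.

```python
import io


def align_to_utf8_char(stream, offset):
    """seek to offset, then skip forward past utf-8 continuation bytes.

    hypothetical helper, utf-8 only.
    """
    stream.seek(offset)
    while True:
        byte = stream.read(1)
        # continuation bytes are 10xxxxxx; anything else starts a character
        if not byte or (byte[0] & 0xC0) != 0x80:
            break
    if byte:
        # we read one byte too far: step back onto the character start
        stream.seek(-1, io.SEEK_CUR)
    return stream.tell()


# four characters taking 1, 2, 3 and 1 bytes in utf-8
data = "a\u00e9\u20acb".encode("utf-8")
stream = io.BytesIO(data)

try:
    data[2:].decode("utf-8")  # offset 2 is in the middle of a character
except UnicodeDecodeError as exc:
    print("naive byte seek breaks decoding:", exc.reason)

print(align_to_utf8_char(stream, 2))  # 3, the start of the next character
```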

my solution would be completely leaving seek() and tell() out of the
3rd layer -- it's a byte-level operation.

does anyone think differently? if so, what's your solution?

- - - -

you can find the latest sources here (note: i haven't tested it yet,
many things are likely to be broken, it's still being redesigned):
http://sebulbasvn.googlecode.com/svn/trunk/iostack/
http://sebulbasvn.googlecode.com/svn/trunk/sock2/


-tomer

