[Python-3000] encoding hell

Sat Sep 2 22:23:32 CEST 2006

tomer filiba wrote:
> i'm quite finished with the base of iostack (streams and layers), and
> have moved to implementing the adapters layer (especially the dreaded
> TextAdapter).
> 
> as was discussed earlier, streams and layers work with bytes, while
> adapters may work with arbitrary objects (be it struct-style records,
> serialized objects, characters and whatnot).
> 
> the question that arises is -- how far should we stretch this abstraction?
> for example, the TextAdapter reads and writes characters to the
> stream, after they go through encoding or decoding, so from the programmer's
> point of view, he's working with *characters*, not *bytes*.
> that means the programmer need not be aware of how the characters
> are "physically" stored in the underlying stream.
> 
> that's all very nice, but what do we do when it comes to seek()ing?
> do you want to seek by character position or by byte position?
> logically you are working with characters, but it would be impossible
> to implement without first decoding the entire stream in-memory...
> which is unacceptable of course.
> 
> and if seek()ing is byte-oriented, then you must somehow seek
> only to the beginning of a multibyte character sequence... how
> would you do that?
> 
> my solution would be completely leaving seek() and tell() out of the
> 3rd layer -- it's a byte-level operation.
> 
> anyone thinks differently? if so, what's your solution?

Well, for comparison with other APIs:

The .NET equivalent, System.IO.TextReader, does not have a "seek" method
at all.

The Java version, java.io.BufferedReader, has a "skip()" method which
only allows seeking forward.

Sounds to me like copying the Java model would work.
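
As a rough illustration of what that could look like, here is a minimal
sketch of a forward-only, character-oriented reader. The class name
ForwardTextReader, its methods and the chunk_size parameter are all
hypothetical and are not part of iostack; the only real API used is the
incremental decoder from the codecs module. Because skip() counts decoded
characters rather than bytes, it can never land in the middle of a
multibyte sequence, and nothing beyond the current chunk is decoded.

```python
import codecs
import io


class ForwardTextReader:
    """Hypothetical character-level reader over a byte stream (a sketch only)."""

    def __init__(self, stream, encoding="utf-8", chunk_size=4096):
        self._stream = stream
        # The incremental decoder buffers incomplete multibyte sequences
        # between calls, so chunk boundaries may fall mid-character.
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._chunk_size = chunk_size
        self._pending = ""  # decoded characters not yet handed out

    def read(self, n):
        """Return up to n characters; fewer only at end of stream."""
        while len(self._pending) < n:
            data = self._stream.read(self._chunk_size)
            if not data:
                # Flush the decoder; raises if the stream ends mid-character.
                self._pending += self._decoder.decode(b"", final=True)
                break
            self._pending += self._decoder.decode(data)
        result, self._pending = self._pending[:n], self._pending[n:]
        return result

    def skip(self, n):
        """Discard up to n characters and return how many were skipped."""
        return len(self.read(n))


# "naive cafe" with two non-ASCII characters, read in 3-byte chunks so that
# chunk boundaries split the multibyte sequences.
raw = io.BytesIO("na\u00efve caf\u00e9".encode("utf-8"))
reader = ForwardTextReader(raw, chunk_size=3)
reader.skip(6)          # skips "na\u00efve " (six characters, seven bytes)
print(reader.read(4))   # the remaining four characters
```

There is deliberately no seek() or tell() here; going backwards would have
to be done on the underlying byte stream, as tomer suggests.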

-- Talin