Re: [Python-Dev] Decoding incomplete unicode

Aug. 19, 2004

Walter Dörwald wrote:
...
M.-A. Lemburg wrote:
...
Walter Dörwald wrote:
...
Let's compare example uses:
1) Having feed() as part of the StreamReader API:
---
s = u"???".encode("utf-8")
r = codecs.getreader("utf-8")()
for c in s:
   print r.feed(c)
---
I consider adding a .feed() method to the stream codec
bad design. .feed() is something you do on a stream, not
a codec.
I don't care about the name, we can call it
stateful_decode_byte_chunk() or whatever. (In fact I'd
prefer to call it decode(), but that name is already
taken by another method. Of course we could always
rename decode() to _internal_decode() like Martin
suggested.)
It's not that name that doesn't fit, it's the fact
that you are mixing a stream action into a codec which
I'd rather see well separated.
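For comparison, here is a minimal sketch of the stateful chunk decoding being argued about. It uses codecs.getincrementaldecoder(), which later Python versions provide; it is not the feed() method proposed in this thread. The helper name feed_bytes is made up for illustration.

```python
import codecs

def feed_bytes(data, encoding="utf-8"):
    """Decode `data` one byte at a time, returning the text produced per byte.

    Hypothetical helper: the decoder object keeps incomplete multi-byte
    sequences between calls, so no stream object is involved.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    return [decoder.decode(data[i:i + 1]) for i in range(len(data))]

# U+00E4 takes two bytes in UTF-8, U+20AC takes three.
chunks = feed_bytes("\u00e4\u20ac".encode("utf-8"))
print(chunks)
```

A character only appears in the output once its last byte has been fed in; the earlier calls return empty strings.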
...
...
...
2) Explicitly using a queue object:
---
from whatever import StreamQueue
s = u"???".encode("utf-8")
q = StreamQueue()
r = codecs.getreader("utf-8")(q)
for c in s:
   q.write(c)
   print r.read()
---
This is probably how an advanced codec writer would use the APIs
to build new stream interfaces.
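A rough sketch of what option 2 could look like, assuming a trivial in-memory queue. StreamQueue is not an existing class: the one below is a hypothetical stand-in with just write() and read(). It relies on the stream reader keeping undecoded bytes between read() calls, as codecs.StreamReader does in current Python versions.

```python
import codecs

class StreamQueue:
    """Hypothetical byte queue: write() appends, read() drains."""

    def __init__(self):
        self._buffer = b""

    def write(self, data):
        self._buffer += data

    def read(self, size=-1):
        # Return at most `size` bytes, or everything when size is negative.
        if size is None or size < 0:
            size = len(self._buffer)
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data

q = StreamQueue()
r = codecs.getreader("utf-8")(q)
encoded = "\u00e4\u20ac".encode("utf-8")
pieces = []
for i in range(len(encoded)):
    q.write(encoded[i:i + 1])   # the stream side: push one byte
    pieces.append(r.read())     # the codec side: decode whatever is complete
print(pieces)
```

Here the queue is the stream and the reader is the codec, so the two roles stay separate, which is the point being made for this option.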
...
...
...
3) Using a special wrapper that implicitly creates a queue:
---
from whatever import StreamQueueWrapper
s = u"???".encode("utf-8")
r = StreamQueueWrapper(codecs.getreader("utf-8"))
for c in s:
   print r.feed(c)
---
This could be turned into something more straightforward,
e.g.
from codecs import EncodedStream
# Load data
s = u"???".encode("utf-8")
# Write to encoded stream (one byte at a time) and print
# the read output
q = EncodedStream(input_encoding="utf-8", output_encoding="unicode")
This is confusing, because there is no encoding named "unicode".
This should probably read:
q = EncodedQueue(encoding="utf-8", errors="strict")
Fine.

I was thinking of something similar to EncodedFile()
which also has two separate encodings, one for the file side
of things and one for the Python side.
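For readers unfamiliar with it, this is roughly how the existing codecs.EncodedFile() pairs two encodings. The sketch uses Python 3 bytes objects and io.BytesIO as the underlying file; the variable names are only for illustration.

```python
import codecs
import io

# Writing: the caller supplies UTF-8 bytes, the file receives Latin-1 bytes.
raw_out = io.BytesIO()
writer = codecs.EncodedFile(raw_out, data_encoding="utf-8",
                            file_encoding="latin-1")
writer.write("\u00e4".encode("utf-8"))
written = raw_out.getvalue()

# Reading: the file holds Latin-1 bytes, the caller gets UTF-8 bytes back.
raw_in = io.BytesIO(b"\xe4")
reader = codecs.EncodedFile(raw_in, data_encoding="utf-8",
                            file_encoding="latin-1")
read_back = reader.read()

print(written, read_back)
```

data_encoding is the encoding seen by the calling code and file_encoding the one used in the underlying file, which matches the "Python side" and "file side" distinction above.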
...
...
for c in s:
   q.write(c)
   print q.read()
# Make sure we have processed all data:
if q.has_pending_data():
   raise ValueError, 'data truncated'
This should be the job of the error callback, the last part should
probably be:
for c in s:
   q.write(c)
   print q.read()
print q.read(final=True)
Ok; both methods have their use cases. (You seem to be obsessed
with this final argument ;-)
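The two ways of detecting truncated input can be sketched side by side. This again uses the incremental decoder API from later Python versions rather than the EncodedQueue discussed here; decode_chunks and has_pending_data are hypothetical helpers, and the latter assumes the decoder reports its buffered bytes as the first item of getstate(), as the standard UTF-8 decoder does.

```python
import codecs

def decode_chunks(chunks, encoding="utf-8"):
    """Decode byte chunks, then flush with final=True.

    The final call lets the error handling report truncated input:
    with errors="strict" it raises UnicodeDecodeError.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    out = [decoder.decode(chunk) for chunk in chunks]
    out.append(decoder.decode(b"", final=True))
    return "".join(out)

def has_pending_data(decoder):
    """Explicit check: are undecoded bytes still buffered in the decoder?"""
    return bool(decoder.getstate()[0])

print(decode_chunks([b"\xe2", b"\x82", b"\xac"]))

dec = codecs.getincrementaldecoder("utf-8")()
dec.decode(b"\xe2\x82")          # incomplete three-byte sequence
print(has_pending_data(dec))
```

With final=True the decision about truncated data goes through the codec's error handling; the explicit check leaves it to the caller.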
...
...
...
I very much prefer option 1).
I prefer the above example because it's easy to read and
makes things explicit.
...
"If the implementation is hard to explain, it's a bad idea."
The user usually doesn't care about the implementation, only its
interfaces.
Bye,
   Walter Dörwald
-- 
Marc-Andre Lemburg
eGenix.com