[Python-Dev] Decoding incomplete unicode
M.-A. Lemburg
mal at egenix.com
Thu Aug 19 22:09:09 CEST 2004
Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>
>> Walter Dörwald wrote:
>>
>>> Let's compare example uses:
>>>
>>> 1) Having feed() as part of the StreamReader API:
>>> ---
>>> s = u"???".encode("utf-8")
>>> r = codecs.getreader("utf-8")()
>>> for c in s:
>>>     print r.feed(c)
>>> ---
>>
>>
>> I consider adding a .feed() method to the stream codec
>> bad design. .feed() is something you do on a stream, not
>> a codec.
>
>
> I don't care about the name, we can call it
> stateful_decode_byte_chunk() or whatever. (In fact I'd
> prefer to call it decode(), but that name is already
> taken by another method. Of course we could always
> rename decode() to _internal_decode() like Martin
> suggested.)
It's not the name that doesn't fit; it's the fact
that you are mixing a stream action into a codec, two
things which I'd rather see well separated.
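
A minimal sketch of that separation, written in Python 3 syntax. The
StreamQueue class below is an assumption made for illustration (a plain
byte FIFO), not an existing module; only codecs.getreader() is the real
library call. The queue knows nothing about encodings, and the reader
only ever calls read() on it:

```python
import codecs


class StreamQueue:
    """Hypothetical byte FIFO: write() appends bytes, read() drains them."""

    def __init__(self):
        self._buffer = b""

    def write(self, data):
        self._buffer += data

    def read(self, size=-1):
        # A negative or missing size means "everything currently queued".
        if size is None or size < 0:
            size = len(self._buffer)
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


# Three two-byte UTF-8 characters, fed one byte at a time.
data = "\u00e4\u00f6\u00fc".encode("utf-8")

queue = StreamQueue()
reader = codecs.getreader("utf-8")(queue)

outputs = []
for i in range(len(data)):
    queue.write(data[i:i + 1])      # stream action: done on the queue
    outputs.append(reader.read())   # codec action: done by the reader

decoded = "".join(outputs)
```

The stream side (buffering bytes) and the codec side (turning bytes into
text) stay in separate objects, so either one can be replaced on its own.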
>>> 2) Explicitly using a queue object:
>>> ---
>>> from whatever import StreamQueue
>>>
>>> s = u"???".encode("utf-8")
>>> q = StreamQueue()
>>> r = codecs.getreader("utf-8")(q)
>>> for c in s:
>>>     q.write(c)
>>>     print r.read()
>>> ---
>>
>>
>> This is probably how an advanced codec writer would use the APIs
>> to build new stream interfaces.
>
>>> 3) Using a special wrapper that implicitly creates a queue:
>>> ----
>>> from whatever import StreamQueueWrapper
>>> s = u"???".encode("utf-8")
>>> r = StreamQueueWrapper(codecs.getreader("utf-8"))
>>> for c in s:
>>>     print r.feed(c)
>>> ----
>>
>>
>>
>> This could be turned into something more straightforward,
>> e.g.
>>
>> from codecs import EncodedStream
>>
>> # Load data
>> s = u"???".encode("utf-8")
>>
>> # Write to encoded stream (one byte at a time) and print
>> # the read output
>> q = EncodedStream(input_encoding="utf-8", output_encoding="unicode")
>
>
> This is confusing, because there is no encoding named "unicode".
> This should probably read:
>
> q = EncodedQueue(encoding="utf-8", errors="strict")
Fine.
I was thinking of something similar to EncodedFile()
which also has two separate encodings, one for the file side
of things and one for the Python side.
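
For reference, a short sketch of how codecs.EncodedFile() handles its two
encodings, in Python 3 syntax. The in-memory io.BytesIO file and the
choice of Latin-1 for the file side are assumptions made only for this
example:

```python
import codecs
import io

# File side: bytes stored as Latin-1. Python side: bytes exchanged as UTF-8.
raw = io.BytesIO()
recoder = codecs.EncodedFile(raw, data_encoding="utf-8", file_encoding="latin-1")

# Writing UTF-8 data through the wrapper stores Latin-1 in the file.
recoder.write("\u00e4".encode("utf-8"))
stored = raw.getvalue()

# Reading goes the other way: Latin-1 in the file comes back as UTF-8.
raw.seek(0)
read_back = recoder.read()
```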
>> for c in s:
>>     q.write(c)
>>     print q.read()
>>
>> # Make sure we have processed all data:
>> if q.has_pending_data():
>>     raise ValueError, 'data truncated'
>
>
> This should be the job of the error callback, the last part should
> probably be:
>
> for c in s:
>     q.write(c)
>     print q.read()
> print q.read(final=True)
Ok; both methods have their use cases. (You seem to be obsessed
with this final argument ;-)
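
Both approaches can be sketched side by side in Python 3 syntax. This
uses codecs.getincrementaldecoder() from the standard library rather than
the EncodedQueue class discussed above, which is only a proposal; the
helper names decode_chunks() and has_pending_data() are made up for this
example:

```python
import codecs


def decode_chunks(chunks, encoding="utf-8", errors="strict"):
    """Decode byte chunks and flush with final=True at the end.

    With final=True, leftover incomplete bytes are passed to the error
    handler, so errors="strict" raises UnicodeDecodeError on truncation.
    """
    decoder = codecs.getincrementaldecoder(encoding)(errors)
    pieces = [decoder.decode(chunk) for chunk in chunks]
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


def has_pending_data(decoder):
    """Explicit check: is the decoder still holding undecoded bytes?"""
    buffered, _flags = decoder.getstate()
    return bool(buffered)


data = "\u00e4\u00f6\u00fc".encode("utf-8")
chunks = [data[i:i + 1] for i in range(len(data))]

# Complete input: both approaches agree that nothing is left over.
text = decode_chunks(chunks)

# Truncated input (last byte missing), checked explicitly.
decoder = codecs.getincrementaldecoder("utf-8")()
partial = decoder.decode(data[:-1])
truncated = has_pending_data(decoder)

# Truncated input, handled by the error callback instead.
replaced = decode_chunks(chunks[:-1], errors="replace")
```

The explicit check leaves the decision to the caller; the final flag
routes truncation through whatever error handler the decoder was created
with.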
>>> I very much prefer option 1).
>>
>>
>> I prefer the above example because it's easy to read and
>> makes things explicit.
>>
>>> "If the implementation is hard to explain, it's a bad idea."
>>
>>
>> The user usually doesn't care about the implementation, only its
>> interfaces.
>
>
> Bye,
> Walter Dörwald
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Aug 19 2004)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/