[Python-3000] [Python-Dev] Betas today - I hope
M.-A. Lemburg
mal at egenix.com
Fri Jun 13 12:14:49 CEST 2008
On 2008-06-13 11:32, Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>> On 2008-06-12 16:59, Walter Dörwald wrote:
>>> M.-A. Lemburg wrote:
>>>> .transform() and .untransform() use the codecs to apply same-type
>>>> conversions. They do apply type checks to make sure that the
>>>> codec does indeed return the same type.
>>>>
>>>> E.g. text.transform('xml-escape') or data.transform('base64').
>>>
>>> So what would a base64 codec do with the errors argument?
>>
>> It could use it to e.g. try to recover as much data as possible
>> from broken input data.
>>
>> Currently (in Py2.x), it raises an exception if you pass in anything
>> but "strict".
>>
>>>>> I think for transformations we don't need the full codec machinery:
>>>> > ...
>>>>
>>>> No need to invent another wheel :-) The codecs already exist for
>>>> Py2.x and can be used by the .encode()/.decode() methods in Py2.x
>>>> (where no type checks occur).
>>>
>>> By using a new API we could get rid of old warts. For example: Why
>>> does the stateless encoder/decoder return how many input
>>> characters/bytes it has consumed? It must consume *all* bytes anyway!
>>
>> No, it doesn't and that's the point in having those return values :-)
>>
>> Even though the encoder/decoders are stateless, that doesn't mean
>> they have to consume all input data. The caller is responsible to
>> make sure that all input data was in fact consumed.
>>
>> You could for example have a decoder that stops decoding after
>> having seen a block end indicator, e.g. a base64 line end or
>> XML closing element.
>
> So how should the UTF-8 decoder know that it has to stop at a closing
> XML element?
The UTF-8 decoder doesn't support this, but you could write a codec
that applies this kind of detection, e.g. one that does not try to
decode a partial UTF-8 byte sequence at the end of the input, which
would otherwise result in an error.
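
To illustrate the point about the consumed count, here is a minimal
sketch in Python 3 syntax. The helper name decode_complete_prefix() is
made up for this example; it relies on codecs.utf_8_decode() and its
"final" flag, and simply reports how much input was consumed so that the
caller can deal with the rest.

```python
import codecs

def decode_complete_prefix(data, errors="strict"):
    """Stateless UTF-8 decode that may consume less than len(data).

    Hypothetical helper, not part of the codecs module.
    """
    # final=False asks the decoder to leave a truncated trailing
    # sequence undecoded instead of raising UnicodeDecodeError.
    text, consumed = codecs.utf_8_decode(data, errors, False)
    return text, consumed

# b"D\xc3": the input is cut in the middle of the two-byte "ö".
data = "Dörwald".encode("utf-8")[:2]
text, consumed = decode_complete_prefix(data)
leftover = data[consumed:]  # bytes the caller still has to handle
```

Here the call returns ("D", 1) and leaves b"\xc3" to the caller, which
is exactly the case where the consumed count differs from len(input).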
>> Just because all codecs that ship with Python always try to decode
>> the complete input doesn't mean that the feature isn't being used.
>
> I know of no other code that does. Do you have an example of this use?
I already gave you a few examples.
>> The interface was designed to allow for the above situations.
>
> Then could we at least have a new codec method that does:
>
> def statelessencode(self, input):
>     (output, consumed) = self.encode(input)
>     assert len(input) == consumed
>     return output
You mean as a method of the Codec class?
Sure, we could do that, but please use a different name,
e.g. .encodeall() and .decodeall() - .encode() and .decode()
are already stateless (and so would the new methods be), so
"stateless" isn't all that meaningful in this context.
We could also add such a check to the PyCodec_Encode() and
PyCodec_Decode() functions. They currently do not apply the above check.
In Python, those two functions are exposed as codecs.encode()
and codecs.decode().
--
Marc-Andre Lemburg
eGenix.com