[Patches] [ python-Patches-1443155 ] Incremental codecs for CJKCodecs

SourceForge.net noreply at sourceforge.net
Sat Mar 25 20:56:43 CET 2006


Patches item #1443155, was opened at 2006-03-04 19:45
Message generated for change (Comment added) made by doerwalter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1443155&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Modules
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Hye-Shik Chang (perky)
Assigned to: Hye-Shik Chang (perky)
Summary: Incremental codecs for CJKCodecs

Initial Comment:
Here's a supplemental patch for SF #1436130 to
implement CJKCodecs part of the Incremental codec
specification. This patch is implemented in an
interface of Walter's fourth patch on #1436130. Please
test this whether it agrees the design.

----------------------------------------------------------------------

>Comment By: Walter Dörwald (doerwalter)
Date: 2006-03-25 20:56

Message:
Logged In: YES 
user_id=89016

OK, a few notes on the patch:

In the test "c = codecs.lookup('utf-8')[3](s)" should be
written as "c = codecs.getwriter('utf-8')(s)". Someday in
the future CodecInfo may no longer be a tuple.

Instead of htmlentitydefs.entitydefs, the test could use
htmlentitydefs.name2codepoint.

I'd like to see a few tests for error callbacks that return
the wrong objects and for callbacks that return an offset
that is not exc.end.

The ERROR_ISCUSTOM() macro looks wrong to me: smaller than 1
*and* greater that 3?

In mbiencoder_encode()

r = encoder_encode_stateful(STATEFUL_ECTX(self), data, final);
if (r == NULL)
    return NULL;
return r;

can be simplified to:

return encoder_encode_stateful(STATEFUL_ECTX(self), data,
final);

Apart from that the patch looks good to me, so go ahead and
check it in.


----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2006-03-18 21:47

Message:
Logged In: YES 
user_id=89016

Maybe the statement is a bit misleading in its current form.
I didn't mean that error handling prevents the flushing of
the buffers, just that error handling prevents raising an
error. I hope the following is clearer: "If final is true
the codec must encode/decode the input completely and must
flush all buffers. If this isn't possible (e.g. because of
incomplete byte sequences still remaining in the buffer on
decoding) it must initiate error handling just like in the
stateless case (which might raise an exception)."

I'll take a look at the patch in the next few days.


----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2006-03-18 19:32

Message:
Logged In: YES 
user_id=55188

I updated the patch for the .errors visibility.

I like the statement.  But the current implementation of the
patch is slightly different for a corner case; the current
implementation keeps buffers if an error is occurred.  That
is somewhat natural because error cases doesn't make
side-effects usually.  But in other side, I agree to your
statement in a view of a direct interpretation of
final=True. What do you think about this?

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2006-03-18 16:24

Message:
Logged In: YES 
user_id=89016

What other interpretation of the final parameter can we use
that doesn't make it completely useless? What about the
following: "If final is true the codec must encode/decode
the input completely and must flush all buffers. If this
isn't possible (e.g. because of incomplete byte sequences on
decoding) it must raise an exception (unless prevented by an
error handler)"?

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2006-03-16 13:32

Message:
Logged In: YES 
user_id=55188

1) Because CJKCodecs had an internal stateful framework, I
implemented just an interface using it for IncrementalCodec.
It treats final=True as a simple `flush' message(which
doesn't reset or terminate the codec). The behavior is quite
useful for real-time stream processing such as sockets and
tail log watchers. If we disallow that, such usages may
require its own sequence detectors.

For "to reset or not" issue, I think we can simply follow
how iconv does.  iconv doesn't reset the internal state for
iconv(ic, NULL, NULL, ..).


2) Aah.  I didn't notice that .errors is a part of public
API.  The current CJKCodecs can't support it easily yet. 
I'll fix it and upload a updated patch soon.  Thank you for
your review!

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2006-03-15 13:08

Message:
Logged In: YES 
user_id=89016

The patch doesn't apply cleanly (conflicts in
Lib/test/test_multibytecodec.py and Tools/unicode/Makefile).
Could you update the patch?

I haven't looked at the C code to closely yet.

Two notes: 1) The tests often call incencoder.encode() or
incdecoder.decode() again after the method has been called
with final=True before. I'm not sure that this should be
allowed. If we allow it, it should be documented in what
state the codec is after calling with final=True (probably
it should be back to the initial state (i.e. like calling
reset())). 2) It seems to me that it isn't possible to
change the error handling during the lifetime of a codec.

Anyway, thanks for the quick patch.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1443155&group_id=5470


More information about the Patches mailing list