<p>Your approach (doing the right thing for both Python and C, with a new API to avoid the C performance problem) sounds good to me.</p>
<p>--<br>
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)</p>
<div class="gmail_quote">On Nov 4, 2011 7:58 AM, Martin v. Löwis &lt;<a href="mailto:martin@v.loewis.de">martin@v.loewis.de</a>&gt; wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
> I started such a hack for the UTF-8 codec... It is really tricky, we should not<br>
> do that!<br>
<br>
With the proper encapsulation, it's not that tricky. I have written<br>
functions PyUnicode_IndexToWCharIndex and PyUnicode_WCharIndexToIndex,<br>
and PyUnicodeEncodeError_GetStart and friends would use those functions.<br>
I'd also need new functions such as PyUnicodeEncodeError_GetStartIndex to access<br>
the "true" start field.<br>
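<br>
The index mapping described above can be sketched in Python. This is only an illustration, not the C implementation: the names index_to_wchar_index and wchar_index_to_index are hypothetical, and the sketch assumes a 16-bit wchar_t (UTF-16 code units), where each character above U+FFFF takes two units. Both directions scan the string, which is where the O(n) cost comes from.<br>

```python
# Illustrative sketch only; function names are hypothetical and a
# 16-bit wchar_t (UTF-16 code units) is ASSUMED.

def _width(ch: str) -> int:
    """Number of UTF-16 code units needed for one code point."""
    return 2 if ord(ch) > 0xFFFF else 1


def index_to_wchar_index(s: str, index: int) -> int:
    """Map a code point index in s to a UTF-16 code unit index (O(n))."""
    if not 0 <= index <= len(s):
        raise IndexError("index out of range")
    return sum(_width(ch) for ch in s[:index])


def wchar_index_to_index(s: str, windex: int) -> int:
    """Map a UTF-16 code unit index back to a code point index (O(n)).

    An index that falls in the middle of a surrogate pair maps to the
    character that contains it.
    """
    if windex < 0:
        raise IndexError("index out of range")
    units = 0
    for i, ch in enumerate(s):
        width = _width(ch)
        if windex < units + width:
            return i
        units += width
    if windex == units:
        return len(s)  # one-past-the-end is a valid position
    raise IndexError("index out of range")


if __name__ == "__main__":
    text = "a\U0001F600b"  # 3 code points, 4 UTF-16 code units
    for i in range(len(text) + 1):
        print(i, index_to_wchar_index(text, i))
```

A round trip through both helpers returns the original code point index, which is the property a backwards-compatible GetStart would rely on.<br>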
<br>
>> That would be expensive to compute<br>
><br>
> Yeah, O(n) should be avoided when it is possible.<br>
<br>
Ok. I'll wait half a day or so for people to reconsider (now knowing<br>
that it's actually feasible to be fully backwards compatible); if nobody<br>
speaks up, I'll go ahead and accept the breakage.<br>
<br>
Regards,<br>
Martin<br>
</blockquote></div>