Python & Unicode decimal interpretation
Scott David Daniels
scott.daniels at acm.org
Sat Dec 3 11:14:24 EST 2005
Martin v. Löwis wrote:
> Scott David Daniels wrote:
>> In reading over the source for CPython's PyUnicode_EncodeDecimal,
>> I see a dance to handle characters which are neither dec-equiv nor
>> in Latin-1. Does anyone know about the intent of such a conversion?
>
> To support this:
>
> >>> int(u"\N{DEVANAGARI DIGIT SEVEN}")
> 7
OK, that much I have handled. I am fiddling with direct-to-number
conversions and wondering about cases like:
>>> int(u"\N{DEVANAGARI DIGIT SEVEN}" + XXX
+ u"\N{DEVANAGARI DIGIT SEVEN}")
where XXX does not pass the digit test, and so must either:
(A) be dropped, giving a result of 77
or (B) get translated (e.g. to u'234') giving 72347
or (C) get translated (to u'2' + YYY + u'4') where YYY will
require further handling ...
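For reference, here is a minimal sketch of what the Python-level int() does with such a string. It assumes a current Python 3 interpreter (not the CPython source under discussion), uses a plain "x" as a stand-in for XXX, and try_int is a hypothetical helper introduced only for this illustration:

```python
seven = "\N{DEVANAGARI DIGIT SEVEN}"

def try_int(text):
    """Hypothetical helper: return int(text), or None if int() rejects it."""
    try:
        return int(text)
    except ValueError:
        # UnicodeEncodeError is a subclass of ValueError, so it is caught too.
        return None

# Every character is a decimal digit, so the conversion succeeds.
all_digits = try_int(seven + seven)

# "x" stands in for XXX: it fails the digit test, and int() rejects the
# whole string rather than dropping or translating the character.
with_letter = try_int(seven + "x" + seven)
```

Under that assumption, none of (A), (B), or (C) is reachable through a plain int() call; the string is simply rejected.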
I don't really understand how the "ignore" or "something_else"
cases can be triggered from Python source (that is, where they come
from). Are they only there for access from C programs?
> In the "ignore" case, no output is produced at all, for the unencodable
> character; this is the same way that '?' would be treated (it is
> also unencodable).
If I understand you correctly, I can consider the digit stream to end
as soon as I hit a non-digit (except when handling bases 11-36, where
letters also count as digits).
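A rough sketch of that "stop at the first non-digit" reading, for base 10 only. The function name leading_decimal_value is made up for this illustration, and unicodedata.decimal() stands in for whatever digit test the C code applies:

```python
import unicodedata

def leading_decimal_value(text):
    """Accumulate decimal digits from the start of text.

    Stops at the first character with no decimal value and returns
    (value, index_of_first_unconsumed_character).
    """
    value = 0
    for index, ch in enumerate(text):
        digit = unicodedata.decimal(ch, None)  # None when ch is not a decimal digit
        if digit is None:
            return value, index
        value = value * 10 + digit
    return value, len(text)

# Devanagari seven, ASCII seven, then a non-digit that ends the stream.
result = leading_decimal_value("\N{DEVANAGARI DIGIT SEVEN}7x9")
```

Bases 11-36 would need an extra step that maps letters to values 10-35, which this sketch leaves out.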
> In the something_else case, a user-defined exception handler could
> treat the error in any way it liked, e.g. encoding all letters
> u'A' to digit '0'. This might be different from the way this error
> handler would treat '?'.
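To make the "something_else" case concrete, here is a sketch of a user-defined error handler registered through codecs.register_error. It is exercised with an ordinary ASCII encode, since int() takes no errors argument at the Python level; the handler name letters_to_zero and the choice of a Greek letter as the unencodable character are assumptions made for this illustration:

```python
import codecs

def letters_to_zero(exc):
    """Hypothetical handler: replace each unencodable character with '0'."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which to resume encoding.
    return "0" * len(bad), exc.end

codecs.register_error("letters_to_zero", letters_to_zero)

# GREEK CAPITAL LETTER ALPHA cannot be encoded as ASCII, so the handler runs.
text = "7" + "\N{GREEK CAPITAL LETTER ALPHA}" + "7"
encoded = text.encode("ascii", "letters_to_zero")
```

With the built-in "ignore" handler in place of letters_to_zero, the same encode call drops the character instead of replacing it.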
--Scott David Daniels
scott.daniels at acm.org