ConfigParser and Unicode
Riccardo Galli
riccardo_cut-me at cut.me.sideralis.net
Thu Mar 18 15:29:57 EST 2004
On Thu, 18 Mar 2004 19:10:08 +0000, thehaas wrote:
> "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> thehaas at binary.net wrote:
>> > Obviously, 'Grüß'!='Gr\xfc\xdf' .
>
>> It is not at all obvious that they are different. In fact, they
>> are the same, assuming the second string is encoded in Latin-1.
>
>> > Any ideas on how I can get the correct value?
>
>> Pray tell: what is the correct value?
>
> The correct value is 'Grüß', or at least have it equal to that.
>
> Maybe I should back up -- I'm interfacing into a Windows API. In that API, I see 'Grüß' as:
> >>> plist[-1].Reference
> u'Gr\xfc\xdf'
>
> My value in goodProcList is:
> >>> goodProcRef[18]
> 'Gr\xfc\xdf'
>
> (yeah, goodProcList isn't in Unicode -- that's probably the cause of all this)
>
> When I test their equality:
>
> >>> goodProcRef[18] == plist[-1].Reference
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)
>
> If I try to manually encode goodProcRef[18], I get the same thing:
>
> >>> goodProcRef[18].encode('utf-8')
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)
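In my experience, you must first decode a byte string into a unicode string before you can encode it. Calling .encode() on a plain str makes Python decode it implicitly with the 'ascii' codec first, and that implicit step is what raises the UnicodeDecodeError.
So: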
>>> goodProcRef='Gr\xfc\xdf'.decode('latin-1')
>>> goodProcRef
u'Gr\xfc\xdf'
Now you can compare goodProcRef and plist[-1].Reference and get True.
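For readers on Python 3, where str is always Unicode and byte strings are a separate bytes type, the same decode-then-compare step might look roughly like this. The helper name to_text is hypothetical, and Latin-1 is assumed to be the encoding of the byte string, as in the example values here.

```python
# A minimal sketch for Python 3; to_text is a hypothetical helper,
# not part of the standard library or of the original poster's code.

def to_text(value, encoding="latin-1"):
    """Return value as a Unicode string, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

raw = b"Gr\xfc\xdf"        # byte string, like goodProcRef[18]
reference = u"Gr\xfc\xdf"  # Unicode string, like plist[-1].Reference

# Normalize both sides to Unicode before comparing.
same = to_text(raw) == to_text(reference)
print(same)
```

Normalizing both operands first avoids relying on any implicit conversion between byte strings and Unicode strings.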
Once both strings are unicode strings, you can encode them easily:
>>> goodProcRef.encode('UTF8')
'Gr\xc3\xbc\xc3\x9f'
>>> plist[-1].Reference.encode('UTF8')
'Gr\xc3\xbc\xc3\x9f'
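The two steps can also be combined into one conversion from Latin-1 bytes to UTF-8 bytes. This is only a sketch for Python 3, and the function name latin1_to_utf8 is made up for illustration.

```python
# A minimal sketch for Python 3; latin1_to_utf8 is a hypothetical name.

def latin1_to_utf8(data):
    """Decode Latin-1 bytes to Unicode text, then encode that text as UTF-8."""
    text = data.decode("latin-1")   # bytes -> Unicode string
    return text.encode("utf-8")     # Unicode string -> UTF-8 bytes

utf8_bytes = latin1_to_utf8(b"Gr\xfc\xdf")
print(repr(utf8_bytes))
```

Each of the two non-ASCII characters takes one byte in Latin-1 and two bytes in UTF-8, so the result is two bytes longer than the input.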
Hope this helps,
Riccardo
--
-=Riccardo Galli=-
_,e.
s~ ``
~@. ideralis Programs
. ol
`**~ http://www.sideralis.net