ConfigParser and Unicode

Riccardo Galli riccardo_cut-me at cut.me.sideralis.net
Thu Mar 18 15:29:57 EST 2004


On Thu, 18 Mar 2004 19:10:08 +0000, thehaas wrote:

> "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> thehaas at binary.net wrote:
>> > Obviously, 'Grüß'!='Gr\xfc\xdf' .  
> 
>> It is not at all obvious that they are different. In fact, they
>> are the same, assuming the second string is encoded in Latin-1.
> 
>> > Any ideas on how I can get the correct value?
> 
>> Pray tell: what is the correct value?
> 
> The correct value is 'Grüß', or at least have it equal to that.
> 
> Maybe I should back up -- I'm interfacing into a Windows API.  In that API, I see 'Grüß' as:
>   >>> plist[-1].Reference
>   u'Gr\xfc\xdf'
> 
> My value in goodProcList is:
>   >>> goodProcRef[18]
>   'Gr\xfc\xdf'
> 
> (yeah, goodProcList isn't in Unicode -- that's probably the cause of all this)
> 
> When I test their equality:
> 
>   >>> goodProcRef[18] == plist[-1].Reference
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in ?
>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)
> 
> If I try to manually encode goodProcRef[18], I get the same thing:
> 
>     >>> goodProcRef[18].encode('utf-8')
>     Traceback (most recent call last):
>        File "<stdin>", line 1, in ?
>     UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)

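In my experience, you must first decode a byte string before you can encode
it. Calling .encode() on a plain str makes Python decode it as ASCII first,
which is why the error you see is a UnicodeDecodeError.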

So decode it using the encoding it was actually stored in (Latin-1 here):
>>> goodProcRef='Gr\xfc\xdf'.decode('latin-1')
>>> goodProcRef
u'Gr\xfc\xdf'

Now you can compare goodProcRef and plist[-1].Reference and get True.
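
The same comparison can be sketched in Python 3 syntax, where the byte string
above corresponds to bytes and the unicode string to str. The helper name
to_text and its Latin-1 default are assumptions made for illustration; use
whatever encoding your data was really written in.

```python
def to_text(value, encoding="latin-1"):
    """Return value as text, decoding it first if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"Gr\xfc\xdf"           # bytes, e.g. as read from a file (assumed Latin-1)
reference = "Gr\u00fc\u00df"  # text, e.g. as returned by an API

# Normalize both sides to text before comparing them.
same = to_text(raw) == to_text(reference)
print(same)  # True
```

Decoding at the point where the data enters the program, and working with
text everywhere else, avoids mixing the two types in a comparison at all.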

Once both values are unicode strings, you can encode them easily:

>>> goodProcRef.encode('UTF8')
'Gr\xc3\xbc\xc3\x9f'
>>> plist[-1].Reference.encode('UTF8')
'Gr\xc3\xbc\xc3\x9f'
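
Put together, converting a Latin-1 byte string to UTF-8 is a decode followed
by an encode. A minimal sketch in Python 3 syntax; the function name
transcode is hypothetical, and the source encoding is assumed to be Latin-1
as in the session above.

```python
def transcode(data, source="latin-1", target="utf-8"):
    """Re-encode bytes: decode from the source encoding, encode to the target."""
    return data.decode(source).encode(target)


utf8_bytes = transcode(b"Gr\xfc\xdf")
print(utf8_bytes)  # b'Gr\xc3\xbc\xc3\x9f'
```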

Hope this helps,
Riccardo

-- 
-=Riccardo Galli=-

 _,e.
s~  ``
 ~@.   ideralis Programs
.   ol 
 `**~  http://www.sideralis.net


