non-ascii handling

Jens Vagelpohl jens at zope.com
Thu Apr 11 14:39:30 CEST 2002


michael,

thanks for the answer, that helped a bit.

in handling these various kinds of strings (both UTF-8 encoded unicode and 
latin-1 encoded unicode for web browser consumption) i always end up 
running into trouble at some point because in some situations strings get 
encoded more than once. does anyone know of a quick test to 
determine whether a string is already encoded in a certain encoding? my 
knowledge of regular expressions (which i assume it would take for that) is 
extremely limited at best.
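
One way to approach such a test without regular expressions is to try decoding the bytes and see whether that fails. The following is only a sketch, written for Python 3 (where byte strings and text are separate types). The names looks_like_utf8 and encode_once are hypothetical helpers, and the Latin-1 fallback is an assumption about where the data comes from.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def encode_once(value) -> bytes:
    """Return UTF-8 bytes without encoding a second time.

    Assumption: byte strings that are not valid UTF-8 are ISO-8859-1.
    """
    if isinstance(value, str):
        # Text is encoded exactly once.
        return value.encode("utf-8")
    if looks_like_utf8(value):
        # Already valid UTF-8 (this includes pure ASCII): pass through.
        return value
    # Fallback under the stated assumption: reinterpret as Latin-1.
    return value.decode("iso-8859-1").encode("utf-8")


latin1_bytes = "Ströder".encode("iso-8859-1")
utf8_bytes = "Ströder".encode("utf-8")
print(looks_like_utf8(latin1_bytes), looks_like_utf8(utf8_bytes))
```

This is a heuristic rather than a proof: it only says whether the bytes are valid UTF-8. A string that has already been encoded twice is still valid UTF-8, so the check cannot undo damage that has already happened, and some Latin-1 byte sequences can pass as UTF-8 by coincidence.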

jens


On Wednesday, April 10, 2002, at 01:17 , Michael Ströder wrote:

> Jens,
>
> Sorry for answering so late.
>
> Jens Vagelpohl wrote:
>> i have a product that uses python-ldap and i'm trying to make sure 
>> everything works when non-ascii characters are used in a DN. from what i 
>> have been reading about OpenLDAP it either wants pure ASCII passed to it 
>> (for search terms, DNs etc) or UTF-8-encoded unicode strings.
>
> Depends on the attribute. BTW: ASCII is a proper subset of UTF-8. Or, 
> better said: the characters encoded in ASCII are mapped to the very same 
> encoding in UTF-8.
>
>> my question is: does python-ldap do any automatic string conversions?
>
> No! And I refused a patch which does. It cannot be done without applying 
> knowledge about the schema (syntax of an attribute). Review the archives.
>
>> i get search results just fine using a non-ascii search term when i do 
>> not convert the term myself and hand it to ldap.search_s, but i never get 
>> results if i convert the string by myself and then hand it to the 
>> search_s method.
>
> If you have a Unicode object with an LDAP search filter then you have to 
> encode that before calling method search_s().
>
> Example (valid on my Linux console with ISO-8859-1):
>
> filter = unicode('cn=*Ströder*','iso-8859-1')
> l.search_s(search_root,ldap.SCOPE_SUB,filter.encode('utf-8'))
>
> Note that filter is a Unicode object created by passing a string and the 
> known character set to the unicode() function.
>
> Ciao, Michael.
>
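
The encoding step from the example above, and the double encoding described at the top of this message, can be sketched in Python 3 as follows. No LDAP call is made here; the variable names are illustrative, and the raw bytes stand in for what an ISO-8859-1 console would deliver.

```python
# Bytes as typed on an ISO-8859-1 console (0xF6 is "ö" in Latin-1).
raw = b"cn=*Str\xf6der*"

# Step 1: decode with the known character set to get a text object.
filter_text = raw.decode("iso-8859-1")

# Step 2: encode exactly once to UTF-8 for the directory server.
filter_utf8 = filter_text.encode("utf-8")

# For pure ASCII the ASCII and UTF-8 encodings are byte-for-byte identical.
ascii_filter = "cn=*Smith*"
same_bytes = ascii_filter.encode("ascii") == ascii_filter.encode("utf-8")

# What goes wrong when UTF-8 bytes are mistaken for Latin-1 and encoded again.
double_encoded = filter_utf8.decode("iso-8859-1").encode("utf-8")

print(filter_utf8)
print(same_bytes)
print(double_encoded)
```

The double-encoded value is still valid UTF-8, but it now spells a different string, which is one way a search can silently return no results.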




