non-ascii handling
Jens Vagelpohl
jens at zope.com
Thu Apr 11 14:39:30 CEST 2002
michael,
thanks for the answer, that helped a bit.
in handling these various kinds of strings (both UTF-8 encoded unicode and
latin-1 encoded unicode for web browser consumption) i always end up
running into trouble at some point because in some situations strings get
encoded more than once. does anyone know of a quick test to
determine whether a string is already encoded in a certain encoding? my
knowledge of regular expressions (which i assume it would take for that) is
extremely limited at best.
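
a regular expression is not strictly needed for such a test. The following is a minimal sketch in present-day Python 3 (an assumption; this thread predates it), where the hypothetical helper `is_valid_utf8` simply attempts a strict decode. It can only tell whether bytes are well-formed UTF-8. It cannot tell whether text was encoded twice, because doubly encoded text is still valid UTF-8, and any byte sequence at all is valid latin-1.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


name = "Ströder"
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("iso-8859-1")

# Double encoding: UTF-8 bytes misread as latin-1, then encoded again.
double_encoded = utf8_bytes.decode("iso-8859-1").encode("utf-8")

print(is_valid_utf8(utf8_bytes))      # well-formed UTF-8
print(is_valid_utf8(latin1_bytes))    # a lone 0xF6 byte is not valid UTF-8
print(is_valid_utf8(double_encoded))  # still well-formed, but no longer the same bytes
```

Since the check cannot catch the double-encoded case, it is probably more robust to keep text as Unicode internally and encode only once, at the point where it leaves the program.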
jens
On Wednesday, April 10, 2002, at 01:17 , Michael Ströder wrote:
> Jens,
>
> Sorry for answering so late.
>
> Jens Vagelpohl wrote:
>> i have a product that uses python-ldap and i'm trying to make sure
>> everything works when non-ascii characters are used in a DN. from what i
>> have been reading about OpenLDAP it either wants pure ASCII passed to it
>> (for search terms, DNs etc) or UTF-8-encoded unicode strings.
>
> Depends on the attribute. BTW: ASCII is a proper subset of UTF-8. Or, better
> put: every character that ASCII can encode is encoded as the very same byte
> in UTF-8.
>
>> my question is: does python-ldap do any automatic string conversions?
>
> No! And I refused a patch which does. It cannot be done without applying
> knowledge about the schema (syntax of an attribute). Review the archives.
>
>> i get search results just fine using a non-ascii search term when i do
>> not convert the term myself and hand it to ldap.search_s, but i never get
>> results if i convert the string by myself and then hand it to the
>> search_s method.
>
> If you have a Unicode object with an LDAP search filter then you have to
> encode that before calling method search_s().
>
> Example (valid on my Linux console with ISO-8859-1):
>
> filter = unicode('cn=*Ströder*','iso-8859-1')
> l.search_s(search_root,ldap.SCOPE_SUB,filter.encode('utf-8'))
>
> Note that filter is a Unicode object created by passing a string and the
> known character set to the unicode() function.
>
> Ciao, Michael.
>
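
Two points from the reply above (encode a Unicode filter exactly once before handing it on, and ASCII text is unchanged by UTF-8 encoding) can be sketched in present-day Python 3, which is an assumption since the quoted example uses the older `unicode()` function. The helper `to_utf8` and the sample DN are hypothetical and are not part of python-ldap.

```python
def to_utf8(value):
    """Encode str to UTF-8 exactly once; pass bytes through (hypothetical helper)."""
    if isinstance(value, bytes):
        return value  # assumed to be UTF-8 already, so it is not re-encoded
    return value.encode("utf-8")


search_filter = "cn=*Ströder*"
encoded = to_utf8(search_filter)

# Applying the helper a second time is a no-op, so nothing is encoded twice.
encoded_again = to_utf8(encoded)

# Pure ASCII text yields identical bytes under ASCII and UTF-8.
ascii_dn = "cn=Jens,dc=example,dc=com"  # made-up DN, for illustration only
same_bytes = ascii_dn.encode("ascii") == ascii_dn.encode("utf-8")
```

The design choice here is to let the type carry the state: text stays `str` until the last moment, and anything that is already `bytes` is treated as encoded. The helper trusts that such bytes really are UTF-8, which the caller has to guarantee.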