Escaping of binary characters

Thu May 12 00:57:49 CEST 2005

> Mark Hammond wrote:
> >
> > I don't actually have neat sample code - I'm observing this
> > inside Zope.
>
> Is this a publicly available Zope component like LDAPUserFolder?

It is exactly LDAPUserFolder ;)

> > However, what happens is:
> >
> > * We query for the attribute 'objectGUID'.  We get back a 16-byte string -
> >   a raw binary representation of the 128-bit GUID.  This part works fine -
> >   we get the binary value from LDAP correctly.
>
> Just curious because I'm always interested to learn anything people are
> doing via LDAP:
>
> Do you store the objectGUID to reference the single entry later?

Yes.

> Does this reference have to survive renaming of the entry?

It is the persistent "user ID" - clearly it should survive renaming (and
indeed every operation other than "delete").

MS explicitly recommends using objectGUID rather than the DN or any other
attribute for such an ID.

> > * Later, we call search_s with a filter string '(objectGUID={string})',
> >   after calling escape_filter_chars with the exact value as previously
> >   fetched.  The filter fails, but succeeds with my implementation of
> >   escape_filter_chars.
>
> Is this code specific to Active Directory (it seems so to me)? Or does
> your code have to work with any LDAP server that has a configurable,
> unique and DN-independent attribute similar to objectGUID (e.g.
> entryUUID comes to mind for OpenLDAP 2.2+)?

I'm afraid I don't know the answer to that.  Active Directory is the only
place where I have needed to use a binary attribute.  I suspect entryUUID
will face a similar issue, as will any other attempt to store a binary
string.

[As a side note, when using the AD interfaces directly (i.e., not via LDAP),
you can specify the value for this GUID in a number of different ways.  When
going via LDAP, it appears only the raw binary value works.  I suspect MS
were trying to stay "standard" when talking via LDAP.]
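Since only the raw binary value works over LDAP, a client that has a GUID in its familiar text form needs to turn it back into the 16 octets before searching. A minimal sketch follows; it assumes AD stores objectGUID in the Microsoft mixed-endian GUID layout (first three fields little-endian), which is what Python's `uuid.UUID.bytes_le` produces. The helper names are hypothetical.

```python
import uuid


def guid_bytes_from_string(guid_text: str) -> bytes:
    """Convert a GUID in text form to its 16 raw octets.

    Assumption: AD's objectGUID uses the mixed-endian layout that
    uuid.UUID.bytes_le produces (first three fields little-endian).
    """
    return uuid.UUID(guid_text).bytes_le


def guid_string_from_bytes(raw: bytes) -> str:
    """Inverse of guid_bytes_from_string, e.g. for logging or display."""
    if len(raw) != 16:
        raise ValueError("objectGUID must be exactly 16 octets")
    return str(uuid.UUID(bytes_le=raw))


raw = guid_bytes_from_string("01234567-89ab-cdef-0123-456789abcdef")
print(raw.hex())
print(guid_string_from_bytes(raw))
```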

> IMHO searching with the exact objectGUID returns exactly one entry
> anyway. Therefore you could also use the entry's DN and retrieve the
> entry with a base level search.

Yes we could, but that sounds like an extreme solution to an escaping issue.

> Well, I still don't see why you need an octet-string objectGUID in a
> search filter.

Basically, we have just configured LDAPUserFolder to use objectGUID as the
user ID, and the way LDAPUserFolder works triggers this search: it searches
on a previously fetched user ID to locate that user's attributes.

As I mentioned, it all works perfectly if the string is escaped more
aggressively.
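A minimal sketch of that more aggressive escaping, applied to a binary value such as a 16-byte objectGUID, is shown below. It uses the condition proposed later in this message, rewritten for Python 3 `bytes`; the function name `escape_filter_bytes` is hypothetical and this is not python-ldap's own implementation.

```python
def escape_filter_bytes(value: bytes) -> str:
    """Escape an octet string for use as an LDAP filter assertion value.

    Every byte outside printable ASCII (' '..'~'), plus the filter
    metacharacters backslash, '*', '(' and ')', becomes a backslash
    followed by two hex digits; printable bytes pass through unchanged.
    """
    out = []
    for b in value:
        c = chr(b)
        if c < ' ' or c > '~' or c in "\\*()":
            out.append("\\%02x" % b)
        else:
            out.append(c)
    return "".join(out)


# A made-up 16-byte GUID value, not a real objectGUID.
guid = bytes.fromhex("67452301ab89efcd0123456789abcdef")
filterstr = "(objectGUID=%s)" % escape_filter_bytes(guid)
print(filterstr)
```

Because every non-printable octet is hex-escaped, the resulting filter string is plain ASCII regardless of what the binary value contains.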

> > it should read:
> >
> > if c < ' ' or c > '~' or c in "\\*()":
> >
> > which includes some extra punctuation. As far as I can tell, that
> > will leave all 'printable' characters alone, so the output should be
> > as readable as (even if slightly different from) the current
> > implementation's.
>
> Hmm, if I understand you correctly, this still escapes non-ASCII chars
> which could otherwise be displayed as UTF-8-encoded Unicode chars.
>
> I'm also afraid this significantly slows down this function, although
> that is probably not a big deal in most applications.

Yes, all good points.  As mentioned, I can arrange to avoid your escaping
function and am happy to do so.  But as this appears to be the only obstacle
to using octet strings, it seems a shame to leave it alone.  It is easily
avoided for me, so I'm happy with whatever you decide.
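One possible compromise for the readability concern raised above is sketched here, under the assumption that the caller passes binary values as `bytes` and text values as `str`. Binary values get the aggressive escaping, while text values only get the mandatory escapes, so UTF-8 text such as a name with umlauts stays readable in the filter. The dispatcher `escape_assertion_value` is hypothetical and not part of python-ldap.

```python
def escape_assertion_value(value):
    """Hypothetical escaper that picks a rule based on the value's type.

    bytes: escape every octet outside printable ASCII plus \\ * ( ).
    str:   escape only NUL, backslash, '*', '(' and ')'; non-ASCII
           characters are left as-is for readability.
    """
    if isinstance(value, bytes):
        return "".join(
            chr(b) if 0x20 <= b <= 0x7E and b not in b"\\*()" else "\\%02x" % b
            for b in value
        )
    return "".join(
        "\\%02x" % ord(c) if c in "\x00\\*()" else c
        for c in value
    )


print(escape_assertion_value("Müller (x)"))
print(escape_assertion_value(b"M\xc3\xbc*"))
```

The type check costs one `isinstance` call per value, so text values are escaped much as they are today, and only callers that explicitly pass `bytes` take the slower per-octet path.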

Cheers,

Mark.