[Tutor] Continue Matching after First Match

Sun May 20 21:04:31 CEST 2007

Hi Tom,

Tom Tucker wrote:
> Why the cStringIO stuff?  The input data shown below is collected from
> os.popen.  I was trying to find an easy way of matching my regex.

Ah, ldap...

> Matching with a string seemed easier than looping through the output
> collected.  Hmm.  Come to think of it, I guess I could match on the
> first "^dn", capture that output and then keep looping until "^cn:" is
> seen. Then repeat.
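
The loop you describe might look something like the sketch below. The
sample data and the iter_records name are made up for illustration, and
it ignores LDIF details such as folded (continuation) lines and base64
values, so treat it as a starting point rather than a real parser:

```python
def iter_records(lines):
    """Yield one dict per record; a record starts at each 'dn:' line."""
    record = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("dn:"):
            # A new dn closes the previous record, if there was one.
            if record is not None:
                yield record
            record = {"dn": line[3:].strip()}
        elif record is not None and ":" in line:
            # Attributes may repeat, so collect the values in a list.
            key, value = line.split(":", 1)
            record.setdefault(key.strip(), []).append(value.strip())
        elif record is not None and not line.strip():
            # A blank line also ends the current record.
            yield record
            record = None
    if record is not None:
        yield record


# Hypothetical sample data, not your real directory output.
sample = """\
dn: uid=jdoe,ou=people,dc=example,dc=com
uid: jdoe
cn: John Doe
dn: uid=asmith,ou=people,dc=example,dc=com
uid: asmith
cn: Anne Smith
"""

records = list(iter_records(sample.splitlines()))
```

Since os.popen returns a file-like object, you should be able to pass it
to iter_records directly instead of reading everything into a string
first.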

Honestly, I'm not very good with regular expressions -- and try to avoid
them when possible. But in cases where they seem to be the best option,
I have formed a heavy dependence on regex debuggers like kodos.
http://kodos.sourceforge.net/
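
That said, if you do stay with a regex: re.search and re.match stop at
the first hit, while re.finditer (or re.findall) keeps going through the
whole string. With re.MULTILINE, "^" matches at the start of every line
rather than only at the start of the string. A rough sketch, with
made-up sample data and a pattern that is only my guess at what you are
after:

```python
import re

# Hypothetical sample data, not your real directory output.
sample = """\
dn: uid=jdoe,ou=people,dc=example,dc=com
uid: jdoe
cn: John Doe
dn: uid=asmith,ou=people,dc=example,dc=com
uid: asmith
cn: Anne Smith
"""

# re.MULTILINE lets ^ and $ anchor on each line of the string.
pattern = re.compile(r"^(uid|cn):[ \t]*(.*)$", re.MULTILINE)

# finditer yields every non-overlapping match, not just the first one.
matches = [(m.group(1), m.group(2)) for m in pattern.finditer(sample)]
```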

> Anyways, any suggestions to fix the below code?
<snip>

Have you had a look at the python-ldap package?

http://python-ldap.sourceforge.net/

You could probably access ldap directly with python, if that's an
option. Or you could subclass the ldif parser that ships with
python-ldap (but make sure a blank line separates each entry in your
data, or the parser will choke with a
'ValueError: Two lines starting with dn: in one record.'):

import ldif
from cStringIO import StringIO

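class MyLDIF(ldif.LDIFParser):
    """Collect the uid and cn attributes of every entry parsed."""

    def __init__(self, inputfile):
        ldif.LDIFParser.__init__(self, inputfile)
        self.users = []

    def handle(self, dn, entry):
        # Called once per record; each attribute maps to a list of values.
        # Raises KeyError if an entry has no uid or cn.
        self.users.append((entry['uid'], entry['cn']))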

raw = """\
<snip your ldif example with newlines added between dns>
"""

if __name__ == '__main__':
    io = StringIO(raw)
    lp = MyLDIF(io)
    lp.parse()
    for uid, cn in lp.users:
        # each attribute is a list of values; print the first one
        print uid[0]
        print cn[0]

... or ...

You could also use ldif.LDIFRecordList directly, without creating a
custom parser class. It collects a list of (dn, entry) tuples in its
all_records attribute. The module author warns that 'It can be a memory
hog!', and I can imagine this is true if you are working with a
particularly large ldap directory.

io = StringIO(raw)
directory = ldif.LDIFRecordList(io)
directory.parse()
for dn, entry in directory.all_records:
    print entry['uid'][0]
    print entry['cn'][0]
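
In both versions the trailing [0] is there because each entry is a dict
that maps an attribute name to a list of values, since an attribute can
occur more than once. The entry below is a hypothetical one, just to
show the shape (newer python-ldap releases may hand back bytes rather
than str values):

```python
# Hypothetical entry, shaped like what the parser passes to handle().
entry = {
    "uid": ["jdoe"],
    "cn": ["John Doe"],
    "mail": ["jdoe@example.com", "john.doe@example.com"],
}

first_uid = entry["uid"][0]   # single-valued attribute: take element 0
all_mail = entry["mail"]      # multi-valued attribute: keep the whole list

# .get() with a default avoids a KeyError when an attribute is missing.
phone = entry.get("telephoneNumber", [None])[0]
```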