[python-ldap] LDIFParser seems to only parse 200 out of 9K odd records

Ritesh Nadhani riteshn at gmail.com
Thu Sep 7 15:06:57 EDT 2017


I was able to put together some sample data with a bit of text manipulation (attached).

It reproduces the problem: only 1 record is parsed instead of 2.
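
A minimal sketch of one possible workaround, assuming the folded comment lines quoted below (a "#" line followed by a continuation line that starts with a space) are what stops the parser. It unfolds continuation lines and drops comments before the data reaches LDIFParser. The helper name unfold_and_strip_comments and the example.com sample entries are made up for illustration; they are not part of python-ldap or of the real file.

```python
import io


def unfold_and_strip_comments(lines):
    """Yield logical LDIF lines: continuation lines joined, comment lines dropped.

    Hypothetical helper, not part of python-ldap.
    """
    logical = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        # A line starting with a single space continues the previous
        # non-empty logical line (this also applies to comment lines).
        if line.startswith(" ") and logical:
            logical += line[1:]
            continue
        # Emit the previous logical line unless it was a comment.
        if logical is not None and not logical.startswith("#"):
            yield logical
        logical = line
    if logical is not None and not logical.startswith("#"):
        yield logical


# Made-up two-record sample with a folded comment between the records,
# shaped like the lines quoted below (example.com is a placeholder).
SAMPLE = (
    "dn: CN=User One,DC=example,DC=com\n"
    "cn: User One\n"
    "\n"
    "# refldap://DomainDnsZones.example.com/DC=DomainDnsZones,DC=example\n"
    " ,DC=com\n"
    "\n"
    "dn: CN=User Two,DC=example,DC=com\n"
    "cn: User Two\n"
)

cleaned = list(unfold_and_strip_comments(io.StringIO(SAMPLE)))
record_count = sum(1 for line in cleaned if line.startswith("dn:"))
print("Records after cleaning: {}".format(record_count))
```

The cleaned lines could then be joined with newlines and handed to the MyLDIF class from the quoted script through a file-like object. Whether that restores the full record count on the real file is untested here.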

On Thu, Sep 7, 2017 at 11:53 AM, Ritesh Nadhani <riteshn at gmail.com> wrote:
> After more debugging, it seems like it's these lines:
>
>   36030 # refldap://DomainDnsZones.arubanetworks.com/DC=DomainDnsZones,DC=arubanetworks
>   36031  ,DC=com
>   36032
>   36033 # refldap://ForestDnsZones.arubanetworks.com/DC=ForestDnsZones,DC=arubanetworks
>   36034  ,DC=com
>   36035
>   36036 # refldap://arubanetworks.com/CN=Configuration,DC=arubanetworks,DC=com
>
> ...
>
> Any record after that first occurrence is not parsed.
>
> On Thu, Sep 7, 2017 at 11:10 AM, Ritesh Nadhani <riteshn at gmail.com> wrote:
>> Hello
>>
>> I have code like this (taken from the official docs):
>>
>> [riteshn@niara4 ldap]$ more myldif.py
>> import sys
>> from ldif import LDIFParser,LDIFWriter
>>
>> class MyLDIF(LDIFParser):
>>    def __init__(self,input,output):
>>       LDIFParser.__init__(self,input)
>>       self.writer = LDIFWriter(output)
>>       self.count = 0
>>
>>    def handle(self,dn,entry):
>>       # self.writer.unparse(dn,entry)
>>       self.count = self.count + 1
>>
>>
>> parser = MyLDIF(open(sys.argv[1], 'rb'), sys.stdout)
>> parser.parse()
>> sys.stdout.write("Parsed: {} records".format(parser.count))
>>
>> ..
>>
>> I have a file generated by ldapsearch with the -LLL option and paged results:
>>
>> ldapsearch .... -E pr=200/noprompt  ...
>>
>> When I parse the above file, it seems it only parses 200 records and
>> stops. Is there something about the paged result comment that breaks
>> the parsing?
>>
>> Since the file contains confidential PII, I cannot attach the file but
>> here are some statistics:
>>
>> [riteshn@niara4 ldap]$ python myldif.py log_win_ad_user.7
>> Parsed: 200 records[riteshn@niara4 ldap]$ grep "dn: " log_win_ad_user.7 | wc -l
>> 9043
>>
>> ...
>>
>> To me it seems this line is the culprit:
>>
>> # pagedresults: cookie=AQAAADQCAAD/////hiZDEOF1HmHDzgiafe+UPajr0z0XvrN8Nrs4JYyW
>>  TBX86O5bP1QQQ65fZxTI5IAhAAAAAAEAAAAAAAAAbjoAAAUAAAAFAAAAAgAAAAAAAAAAAAAABQAAAE
>>  MACgCYBwAAlwcAAJgHAAAAAAAALA3SQhI2IEmzx0Fkiqk54QAAAAACAAAAAQAAAAAAAAAAAAAA////
>>  /8kAAADIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEAAAD/////oZEJStHOUUMCUHAQiI
>>  cFT2u5R8OYP4qRkojEgkWk8+IAAAAAf7AAAAB/gAA6bgAAAAAAAAAAAAD//////////wAAAAAuAQkA
>>  DgAAAAUAAAAAAQAASU5ERVhfMDAwOTAxMkV/sAAAAH+wAAAA//////////////////////////////
>>  //////////////////////////////////////////////////////////////////////////////
>>  //////////////////////////////////////////////////////////////////////////////
>>  //////////////////////////////////////////////////////////////////////////////
>>  /////////////////////////////////////////////////////////////////////wAA
>>
>> ...
>>
>> Any ideas?
>> --
>> Ritesh
>
>
>
> --
> Ritesh



-- 
Ritesh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: newaccounts.ldif
Type: application/octet-stream
Size: 4066 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ldap/attachments/20170907/78ba2876/attachment.obj>

