Partly erratic wrong behaviour, Python 3, lxml

Jussi Piitulainen jpiitula at ling.helsinki.fi
Thu Mar 4 22:47:49 CET 2010


This is the full data file on which my regress/Tribug exhibits the
behaviour that I find incomprehensible, described in the first post in
this thread. The comment at the beginning of the file below was
written before I commented out some records in the data, so the actual
numbers are no longer ten expected and thirty sometimes observed, but
the wrong number is still always the correct number tripled (5 and 15,
I think).

---regress/tridata.py follows---
# Exercise lxml.etree.parse(body).xpath(title)
# which I think should always return a list of
# ten elements but sometimes returns thirty,
# with each of the ten in triplicate. And this
# seems impossible to me. Yet I see it happening.

body = b'''<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/          http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2010-03-02T09:38:47Z</responseDate>
<request verb="ListRecords" from="2004-01-01T00:00:00Z" until="2004-12-31T23:59:59Z" metadataPrefix="oai_dc">http://localhost/pmh/que</request>
<ListRecords>
<record>
<header><!-- x --><!-- -->
   <identifier>jrc32003R0055-pl.xml/2/0</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>Rozporz&#261;dzenie</dc:title>
   </oai_dc:dc>
</metadata>
</record>
<!-- <record>
<header>
   <identifier>jrc32003R0055-pl.xml/2/1</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>Komisji</dc:title>
   </oai_dc:dc>
</metadata>
</record>
<record>
<header>
   <identifier>jrc32003R0055-pl.xml/2/2</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>(WE)</dc:title>
   </oai_dc:dc>
</metadata>
</record>
<record>
<header>
   <identifier>jrc32003R0055-pl.xml/2/3</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>nr</dc:title>
   </oai_dc:dc>
</metadata>
</record>
<record>
<header>
   <identifier>jrc32003R0055-pl.xml/2/4</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>55/2003</dc:title>
   </oai_dc:dc>
</metadata>
</record>
<record>
<header>
   <identifier>jrc32003R0055-pl.xml/3/0</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>z</dc:title>
   </oai_dc:dc>
</metadata>
</record> -->
<record>
<header>
   <identifier>jrc32003R0055-pl.xml/3/1</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>dnia</dc:title>
   </oai_dc:dc>
</metadata>
</record>
<record>
<header>
   <identifier>jrc32003R0055-pl.xml/3/2</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>13</dc:title>
   </oai_dc:dc>
</metadata>
</record>
<record>
<header>
   <identifier>jrc32003R0055-pl.xml/3/3</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>stycznia</dc:title>
   </oai_dc:dc>
</metadata>
</record>
<record>
<header>
   <identifier>jrc32003R0055-pl.xml/3/4</identifier>
   <datestamp>2004-08-15T19:45:00Z</datestamp>
   <setSpec>pl</setSpec>
</header>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc        http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:title>2003</dc:title>
   </oai_dc:dc>
</metadata>
</record>
</ListRecords>
</OAI-PMH>
'''

title = '//*[name()="record"]//*[name()="dc:title"]'
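
As a cross-check on the expected count, here is a minimal sketch that
counts the dc:title elements under each record using only the standard
library, so the result does not depend on lxml. The names sample,
count_titles and count_titles_lxml are hypothetical and not part of
regress/Tribug, and sample is a shortened stand-in for the data above,
with two live records and one commented out. The lxml function is only
a guess at how the query is run; etree.parse needs a file-like object,
so the bytes are wrapped in io.BytesIO.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URIs copied from the data file (the dc URI has no trailing slash there).
OAI = "http://www.openarchives.org/OAI/2.0/"
DC = "http://purl.org/dc/elements/1.1"

# Shortened stand-in for body: two live records, one inside a comment.
sample = b'''<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<ListRecords>
<record>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1">
      <dc:title>dnia</dc:title>
   </oai_dc:dc>
</metadata>
</record>
<!-- <record>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1">
      <dc:title>13</dc:title>
   </oai_dc:dc>
</metadata>
</record> -->
<record>
<metadata>
   <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1">
      <dc:title>stycznia</dc:title>
   </oai_dc:dc>
</metadata>
</record>
</ListRecords>
</OAI-PMH>
'''

def count_titles(data):
    """Count dc:title elements below record elements, without lxml."""
    root = ET.fromstring(data)  # the default parser drops comments
    return sum(len(record.findall(".//{%s}title" % DC))
               for record in root.iter("{%s}record" % OAI))

try:
    from lxml import etree
except ImportError:  # lxml is optional for this cross-check
    etree = None

def count_titles_lxml(data):
    """Same count through lxml, using the XPath expression from the data file."""
    query = '//*[name()="record"]//*[name()="dc:title"]'
    return len(etree.parse(io.BytesIO(data)).xpath(query))

print("stdlib count:", count_titles(sample))
if etree is not None:
    print("lxml count:", count_titles_lxml(sample))
```

If the two counts ever disagree on the same bytes, that would point at
the XPath evaluation rather than at the data.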


