[Tutor] memory error

Danny Yoo dyoo at hashcollision.org
Thu Jul 2 19:15:44 CEST 2015


>
> So I got my code working now and it looks like this
>
> TAG = '{http://www.mediawiki.org/xml/export-0.10/}page'
> doc = etree.iterparse(wiki)
>
> for _, node in doc:
>     if node.tag == TAG:
>         title = node.find("{http://www.mediawiki.org/xml/export-0.10/}title").text
>         if title in page_titles:
>             print (etree.tostring(node))
>         node.clear()
>
> It's mostly giving me what I want.  However, it is adding extra formatting (I believe namespaces and attributes).  I was wondering if there was a way to strip these out when I'm printing the node with tostring()?


I suspect that you'll want to walk over the node explicitly.  Rather
than use etree.tostring(), which serializes the node's entire subtree
(tags, namespace declarations, attributes and all), you'll probably
want to write a function that walks over only the selected portions
of the tree structure.  You can see:

    https://docs.python.org/2/library/xml.etree.elementtree.html#tutorial

for an introduction to navigating portions of the tree, given a node.
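Here is a minimal sketch of such a walk, using the standard library's
xml.etree.ElementTree.  The sample document, the helper names
local_name() and walk(), and the choice to report only <title> and
<text> are my own assumptions for illustration; adjust them to
whatever parts of the page you actually need.  The walk visits every
child but reports only the selected tags, and it never emits
attributes or namespace URIs:

```python
import xml.etree.ElementTree as etree

# Namespace URI taken from the original post; other dump versions may differ.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

# Made-up sample input, shaped like a MediaWiki export.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Python</title>
    <ns>0</ns>
    <revision>
      <id>1</id>
      <text xml:space="preserve">A programming language.</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Drop a leading '{namespace-uri}' from an element tag, if present."""
    return tag.split("}", 1)[-1]


def walk(node, wanted, depth=0):
    """Yield (depth, local_name, text) for each element whose name is in `wanted`.

    Every child is visited, but only the selected elements are reported;
    attributes and namespace URIs are never included.
    """
    name = local_name(node.tag)
    if name in wanted:
        yield depth, name, (node.text or "").strip()
    for child in node:
        yield from walk(child, wanted, depth + 1)


page = etree.fromstring(SAMPLE).find(NS + "page")
for depth, name, text in walk(page, {"title", "text"}):
    print("  " * depth + name + ": " + text)
```

Because each result is a plain (depth, name, text) tuple, you decide
the output format yourself instead of accepting whatever tostring()
produces.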


As a more general response: you have significantly more information
about the problem than we do, and at the moment we don't have enough
context to help effectively.  Do you have a sample *input* file that
folks here can use to run your program?  Sample input makes the
problem reproducible, and reproducibility puts us on the same footing
as you in knowing exactly what the program's inputs are.
See: http://sscce.org/
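For instance, a self-contained reproduction might embed a tiny sample
document directly in the program, so that anyone can run it without
your full dump.  The sample below is made up (only the namespace URI
comes from your code), page_titles is a hypothetical stand-in for your
own collection, and I'm assuming the standard library's ElementTree
rather than lxml:

```python
import io
import xml.etree.ElementTree as etree

NS = "{http://www.mediawiki.org/xml/export-0.10/}"
TAG = NS + "page"

# A tiny, made-up sample: small enough to paste into an email, but shaped
# like the real input (same namespace, same nesting).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Python</title><revision><text>A language.</text></revision></page>
  <page><title>Perl</title><revision><text>Another language.</text></revision></page>
</mediawiki>"""

# Hypothetical stand-in for the poster's page_titles collection.
page_titles = {"Python"}

matches = []
doc = etree.iterparse(io.BytesIO(SAMPLE))  # default: "end" events only
for _, node in doc:
    if node.tag == TAG:
        title = node.find(NS + "title").text
        if title in page_titles:
            matches.append(etree.tostring(node))  # bytes, namespaces included
        node.clear()  # free the finished page's children

for m in matches:
    print(m.decode())
```

Running this should show the same kind of namespace-qualified output
you described, but now on input that everyone here can run too.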

As for the form of the desired output: can you say more precisely
which parts of the document you want?  Rather than just saying, "this
doesn't look the way I want it to", it may be more helpful to say,
"here's *exactly* what I'd like it to look like..." and show us the
desired text output.

By expressing what you want as a collection of concrete input/output
examples, you gain an added benefit: once you have revised your
program, you can re-run it and check whether it produces what you
anticipate.  In other words, you can use these concrete examples as a
"regression test suite".  Software engineers use this technique
regularly in their day-to-day work: rather than make-believe that
their incomplete programs are working, they write out explicitly what
those programs should do, and then work on the programs until they
actually do it.
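As a sketch of what that can look like: each case below pairs a
concrete input with the exact output expected from it.  The
"title: text" output format, the sample documents, and the name
extract_pages() are all hypothetical; the point is the shape of the
test, not the particular format:

```python
import io
import xml.etree.ElementTree as etree

NS = "{http://www.mediawiki.org/xml/export-0.10/}"


def extract_pages(source, wanted_titles):
    """Return one 'title: text' line per selected page, joined by newlines.

    Hypothetical output format; swap in the real desired output once
    it has been pinned down.
    """
    lines = []
    for _, node in etree.iterparse(source):
        if node.tag == NS + "page":
            title = node.findtext(NS + "title")
            if title in wanted_titles:
                text = node.findtext(NS + "revision/" + NS + "text", default="")
                lines.append(title + ": " + text.strip())
            node.clear()
    return "\n".join(lines)


# Each case: (input document, titles wanted, exact expected output).
CASES = [
    (
        b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Python</title><revision><text>A language.</text></revision></page>
  <page><title>Perl</title><revision><text>Another one.</text></revision></page>
</mediawiki>""",
        {"Python"},
        "Python: A language.",
    ),
    (
        b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Perl</title><revision><text>Another one.</text></revision></page>
</mediawiki>""",
        {"Python"},
        "",
    ),
]


def run_regression_tests():
    """Re-run every known case and fail loudly on the first mismatch."""
    for xml_bytes, wanted, expected in CASES:
        actual = extract_pages(io.BytesIO(xml_bytes), wanted)
        assert actual == expected, (actual, expected)
    print("all %d cases pass" % len(CASES))


run_regression_tests()
```

Whenever you change extract_pages(), re-running the file tells you
right away whether the known cases still produce what you expect.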


Good luck to you!

