Re: [lxml-dev] lxml-dev Digest, Vol 75, Issue 1

Hello guys, I am sorry that I am sending this as a response, but there are two issues I'd like to point out:

1. There is a memory leak when using lxml.html.parse (or etree) repeatedly in a loop. In particular, creating etrees in a loop leaves the old trees in memory: they are not freed properly when you reuse the same Python variable to store the results. For now I haven't tried to resolve it, because the re module (regular expressions) is just fine for URL extraction; however, I would prefer to use XPath for extracting a variety of links more easily, from a coding point of view. Plus, I think the overhead of tree building is not that high (I don't know for sure, though).

2. Speaking of XPath for URL extraction, I think lxml.html has some issues with it (this is my impression from reading the module's code). The question is: why not use XPath to make the code half the size and twice as neat (i.e. cleaner and better formed; I hope my vocabulary is correct)? Maybe faster, too.

Best Regards, Dimitrios

On 12/02/2010 11:35 AM, lxml-dev-request@codespeak.net wrote:

Dimitrios Pritsos, 02.12.2010 12:17:
I am sorry that I am sending this as a response
No need to do so if you want to start a new topic. Just send a message directly to the list address. Replies are for replying.
I can reproduce this. I'll take a look ASAP. Stefan

Stefan Behnel, 02.12.2010 13:48:
It's easily reproducible. I can parse a document repeatedly in a loop using lxml.html.parse() and see the memory consumption of the Python process grow. I reproduced it with 2.3-pre; I don't know if 2.2 suffers from the same problem. I'll see about that when I've figured out what happens. It's only a problem with the HTML parser, and it's not related to lxml.html. This is enough to reproduce it:

from lxml import etree

p = etree.HTMLParser()
while True:
    etree.parse("somefile.html", p)

Stefan
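For anyone wanting to watch the effect themselves, here is a self-contained, bounded variant of the loop above that samples the process's peak RSS via the stdlib resource module (Unix-only). The temporary file stands in for "somefile.html", which is just a placeholder name in the original snippet; whether the RSS actually grows depends on the libxml2 build, as discussed below.

```python
import os
import resource
import tempfile

from lxml import etree

# Write a small HTML file so the sketch is self-contained; the file
# name "somefile.html" in the original snippet is only a placeholder.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
    f.write("<html><body><p>hello</p></body></html>")
    path = f.name

parser = etree.HTMLParser()
rss_before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
for _ in range(1000):  # bounded stand-in for the original `while True`
    tree = etree.parse(path, parser)
rss_after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

os.unlink(path)
print(tree.getroot().tag)      # html
print(rss_after - rss_before)  # grows per iteration on affected libxml2 builds
```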

Stefan Behnel, 02.12.2010 20:11:
I think it may be an issue with libxml2. The memory consumption seems to be stable with 2.7.7 and 2.7.8 but not with my system's 2.7.6. What's the version you use? Could you try the latest one? http://codespeak.net/lxml/dev/FAQ.html#i-think-i-have-found-a-bug-in-lxml-wh... Stefan

Hi,
FWIW, memory looks stable here (vintage version):

python2.4 -i -c 'from lxml import etree; print etree.__version__; print "%s (%s) - %s (%s)" % (etree.LIBXML_VERSION, etree.LIBXML_COMPILED_VERSION, etree.LIBXSLT_VERSION, etree.LIBXSLT_COMPILED_VERSION)'
2.2.6
(2, 6, 32) ((2, 6, 32)) - (1, 1, 23) ((1, 1, 23))
Holger

On 02/12/10 23:13, Stefan Behnel wrote:
Hello All, I am sorry for the late response. I've tried it with 2.7.7 and 2.7.8; the memory leak persists, even if you do this:

xhtml_tree = lxml.html.parse(open('myhtmlfile.html', 'r'))
del xhtml_tree

HAPPY NEW YEAR

Regards, Dimitrios

Dimitrios Pritsos, 02.12.2010 12:17:
Sure, lxml.html has specific support for extracting URLs from parsed documents.
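As a concrete illustration of that built-in support, a minimal sketch using lxml.html's iterlinks(), which yields (element, attribute, link, pos) tuples for href/src-style attributes in document order (the HTML snippet here is invented for the example):

```python
import lxml.html

doc = lxml.html.fromstring(
    '<html><body>'
    '<a href="http://example.com/a">a</a>'
    '<img src="logo.png">'
    '</body></html>'
)

# iterlinks() yields (element, attribute, link, pos) for every
# link-carrying attribute it knows about.
links = [link for element, attribute, link, pos in doc.iterlinks()]
print(links)  # ['http://example.com/a', 'logo.png']
```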
Plus I think that the overhead of tree building is not that high (I don't know for sure, though).
Likely slower than re, but also likely fast enough.
Such as ... ?
Maybe. If you want to provide a patch that simplifies the code and back it with sufficient evidence that it's at least as fast as before and doesn't degrade functionality, I'll be happy to accept it. Stefan

On 12/02/2010 02:53 PM, Stefan Behnel wrote:
But what about the memory leak? I am sorry if there is a solution already. However, I believe this is not intuitive at all (I mean the whole tree staying in memory like garbage instead of being replaced). I don't think I am experienced enough to fix this.
I just think it is harder to keep all the definitions of HTML 4.0 (XHTML 1.0, 1.1, etc.) in the code and keep it up to date. XPath, I think, would be more general. Just that :)
As for lxml.html, I think I can send something XPath-based that is optionally multiprocessing/multi-threaded too. But it still needs some work, and I don't have enough time to finish it right now, because this is a critical phase for my PhD and job.
Stefan
Thank you very much for your fast response! Dimitrios

Dimitrios Pritsos, 02.12.2010 16:35:
But what about the Memory Leakage
See my other mail.
How would that be more general? The expressions would simply select what the code currently selects as well. Could you provide an example of what you have in mind?
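For the record, one possible shape of the XPath-based selection being discussed: picking up href/src attributes directly with a single expression. This is a sketch of the idea only, not the patch that was asked for, and the HTML snippet is invented for the example.

```python
from lxml import etree

doc = etree.HTML(
    '<html><body>'
    '<a href="/page">page</a>'
    '<script src="app.js"></script>'
    '</body></html>'
)

# Select all href and src attribute values in one XPath expression.
urls = doc.xpath('//@href | //@src')
print(urls)  # ['/page', 'app.js']
```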
I don't think multi-threading (and especially not multiprocessing) makes any sense here. It should be applied at the document level, not within a single document. Stefan

participants (3)
- Dimitrios Pritsos
- jholg@gmx.de
- Stefan Behnel