Hello guys,
I am sorry to send this as a reply, but there are two issues
I'd like to point out:
1. There is a memory leak when using lxml.html.parse (or etree)
repeatedly in a loop. In particular, trees created in a loop stay
around and are not properly freed when you reuse the same Python
variable to store the results. So far I haven't tried to resolve it,
because the re (regular expression) module is just fine for URL
extraction. However, I would prefer to use XPath, since it makes
extracting a variety of links easier from a coding point of view. Plus,
I think the overhead of tree building is not that large (I don't know
for sure, though).
2. Speaking of XPath for URL extraction, I think lxml.html has some
issues in URL extraction (this is my impression from reading the code of
that module). So the question is: why not use XPath to make the code
half the size and twice as neat (cleaner and better formed, I hope my
vocabulary is correct), and maybe faster too?
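
To make both points concrete, here is a minimal sketch of the kind of XPath
based extraction I have in mind. The function name extract_links and the
sample pages are only illustrations (they are not part of lxml), and the set
of attributes in the expression is my own assumption about which links
matter. Passing smart_strings=False asks lxml for plain strings instead of
"smart" strings that keep a reference to their parent element; I have not
verified whether that is related to the memory behaviour in point 1.

```python
import lxml.html


def extract_links(html_text):
    """Return the link targets found in html_text as plain strings.

    Hypothetical helper for illustration; not part of lxml itself.
    """
    tree = lxml.html.fromstring(html_text)
    # One XPath union covers several kinds of links at once.
    # smart_strings=False returns plain strings with no getparent()
    # back-reference to the parsed tree.
    return tree.xpath(
        "//a/@href | //img/@src | //link/@href",
        smart_strings=False,
    )


# Made-up sample pages, parsed in a loop while reusing the same variable.
pages = [
    '<html><body><a href="/a">A</a><img src="pic.png"></body></html>',
    '<html><head><link href="style.css"></head><body></body></html>',
]

all_links = []
for page in pages:
    links = extract_links(page)
    all_links.extend(links)
```

Covering another kind of link then only means adding one more branch to the
XPath expression, which is the "smaller and neater" part I was hoping for.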
Best Regards,
Dimitrios
On 12/02/2010 11:35 AM, lxml-dev-request(a)codespeak.net wrote:
> Send lxml-dev mailing list submissions to
> lxml-dev(a)codespeak.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://codespeak.net/mailman/listinfo/lxml-dev
> or, via email, send a message with subject or body 'help' to
> lxml-dev-request(a)codespeak.net
>
> You can reach the person managing the list at
> lxml-dev-owner(a)codespeak.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of lxml-dev digest..."
>
>
> Today's Topics:
>
> 1. Re: Compile failure (felix)
> 2. Re: Compile failure (jholg(a)gmx.de)
> 3. Re: SyntaxErrors with Python 3 (Stefan Behnel)
> 4. Re: Schema validation - no file position (Stefan Behnel)
> 5. Re: Schema validation - no file position (Krzysztof Jakubczyk)
> 6. Re: Schema validation - no file position (Stefan Behnel)
> 7. Re: read .xlsx spreadsheets with lxml ? (Chris Withers)
> 8. How to register extension functions for XSL transformations
> (Shaung)
> 9. Re: How to register extension functions for XSL
> transformations (jholg(a)gmx.de)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 23 Nov 2010 11:14:19 +0100
> From: felix<crucialfelix(a)gmail.com>
> Subject: Re: [lxml-dev] Compile failure
> To: Stefan Behnel<stefan_ml(a)behnel.de>
> Cc: lxml-dev(a)codespeak.net
> Message-ID:
> <AANLkTinBaMukcy=pCVHjmmmAq=kiPN=nGE_Gd27_z5sT(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Sat, Oct 30, 2010 at 10:49 AM, Stefan Behnel<stefan_ml(a)behnel.de> wrote:
>
>> felix, 26.10.2010 15:28:
>>
>> According to this:
>>> http://codespeak.net/lxml/build.html
>>>
>>> we should avoid installing Cython
>>>
>>> but using easy_install to build fails saying the cython generated file is
>>> missing
>>>
>> I doubt that it's failing because of that. However, you didn't provide the
>> output of the build, so I can't guess what happened that actually made the
>> build fail.
>
> sorry, that output had scrolled off by the time I realized I should submit a
> report. I have another server so fortunately I can fail there and show you.
>
> crucial@crucial-systems:~/working/lxml$ python setup.py build
> /home/crucial/working/lxml/versioninfo.py:53: UserWarning: unrecognized
> .svn/entries format; skipping /home/crucial/working/lxml/
> warn("unrecognized .svn/entries format; skipping "+base)
> Building lxml version 2.3.beta1.
> *NOTE: Trying to build without Cython, pre-generated 'src/lxml/lxml.etree.c'
> needs to be available.*
> Using build configuration of libxslt 1.1.26
> Building against libxml2/libxslt in the following directory: /usr/lib
> running build
> running build_py
> running build_ext
> building 'lxml.etree' extension
> gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall
> -Wstrict-prototypes -fPIC -I/usr/local/include/libxml2
> -I/usr/include/python2.6 -c src/lxml/lxml.etree.c -o
> build/temp.linux-x86_64-2.6/src/lxml/lxml.etree.o -w
> gcc: src/lxml/lxml.etree.c: No such file or directory
> gcc: no input files
> error: command 'gcc' failed with exit status 1
>
>
>
>> The latest build instructions for the SVN trunk are in the SVN trunk as
>> "doc/build.txt", or *(not always completely up-to-date)* here:
>>
> exactly
>
>
>
>
>> *but then I succeeded with the old sudo easy_install lxml*
>>>
>>> because now I have Cython
>>>
>> Again, I doubt that this is the reason.
>>
> sudo easy_install lxml
> failed before
>
> after installing Cython it says it uses Cython (not Trying to build without
> Cython) and it worked.
>
> nothing else having changed I thought it was a reasonable guess that it
> worked because it used Cython because Cython is installed.
>
>
>> Stefan
>>
>