Hi dear people of lxml,
I wrote a small lib in Python using lxml to generate graph files in a
specific format, GEXF.
This little thing is called pygexf:
http://packages.python.org/pygexf/
http://github.com/paulgirard/pygexf
First, thanks for your great work on lxml, I am loving it!
Second, I am experiencing serious problems installing lxml on my Mac OS X.
I read that this hasn't been completely ported yet, but some workarounds
with static libs seem to have worked for some of us.
I will only focus on this method: STATIC_DEPS=true sudo easy_install lxml
I am having problems with gcc.
I have 3 different versions of gcc :
gcc-4 : from a fink install of gcc42
gcc-4.0
gcc-4.2 : both from the Xcode Mac OS X tools
Note: I am switching the /usr/bin/gcc symlink between the different versions
to test all of them.
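(A cleaner way to test a single compiler without touching the symlink might be
to set the CC environment variable, which distutils normally honours, e.g.:
$ sudo env CC=/usr/bin/gcc-4.2 STATIC_DEPS=true easy_install lxml
but I haven't verified that on my machine, so below I stick to switching the
symlink.)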
Now here are the different errors I have with the various gcc versions :
gcc-4
$ls -la /usr/bin/gcc
lrwxr-xr-x 1 root wheel 13 2 jul 11:43 /usr/bin/gcc -> /sw/bin/gcc-4
$STATIC_DEPS=true sudo easy_install lxml
searching for lxml
[...]
Building against libxml2/libxslt in the following directory: /usr/lib
gcc: unrecognized option '-no-cpp-precomp'
cc1: error: unrecognized command line option "-mno-fused-madd"
cc1: error: unrecognized command line option "-arch"
cc1: error: unrecognized command line option "-arch"
cc1: error: unrecognized command line option "-Wno-long-double"
error: Setup script exited with error: command 'gcc' failed with exit
status 1
gcc-4.2
$ STATIC_DEPS=true sudo easy_install lxml
Searching for lxml
[...]
Using build configuration of libxslt 1.1.12
Building against libxml2/libxslt in the following directory: /usr/lib
cc1: error: unrecognized command line option "-Wno-long-double"
cc1: error: unrecognized command line option "-Wno-long-double"
lipo: can't open input file: /var/tmp//ccDn5F16.out (No such file or
directory)
error: Setup script exited with error: command 'gcc' failed with exit
status 1
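As far as I understand, the Apple-specific flags like "-Wno-long-double" and
"-arch" come from the configuration recorded when the system Python itself was
built, which would explain why a non-Apple gcc doesn't recognize them. To check
exactly which flags distutils will pass to the compiler, I believe one can run:
$ python -c "from distutils import sysconfig; print sysconfig.get_config_var('CFLAGS')"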
The error list for gcc-4.0 is huge.
I won't post it here, but I can if necessary.
So here I am, facing build problems.
I am not an expert in this kind of thing.
I usually develop on Linux (Ubuntu), but many users of my small lib
(including me) are Mac users.
I don't really want to change my code to use another XML lib, so I hope
I'll finally find a way...
If anyone can help with this issue, it would be more than great.
Thanks for reading,
Paul
PS: I couldn't try the Darwin ports method; I couldn't understand how to
use it...
--
Paul Girard
digital manager, médialab
paul.girard(a)sciences-po.fr
01 45 49 63 58
médialab | Sciences Po
medialab.sciences-po.fr <http://medialab.sciences-po.fr>
13 rue de l'université
75007 PARIS
Eugene Van den Bulke, 02.07.2010 11:53:
> Stefan Behnel, 02.07.2010 11:42:
>> Eugene Van den Bulke, 02.07.2010 10:49:
>>> I am experimenting with web scraping using lxml.
>>>
>>> I have played a little with BeautifulSoup in the past and scrapy
>>> recently.
>>>
>>> I am recoding something I did with scrapy with lxml but encounter a
>>> problem I am not sure how to iron out.
>>>
>>> With scrapy, hxs is an xpath selector which has a select and re method
>>>
>>> types = hxs.select('.//a[@href]/@href').re(r'type=([A-Z]*)')
>>>
>>> Which will return a list of the matches in href.
>>>
>>> How would I do the same thing with lxml?
>>>
>>> types = doc.xpath('.//a[@href]/@href') ...
Note that the [@href] predicate is redundant here; './/a/@href' is enough.
>> http://lmgtfy.com/?q=lxml+regular+expressions&l=1
>
> I did read the doc before I took the liberty to post ... I am afraid I
> just don't get it.
Personally, I wouldn't even use XPath regular expressions here. I'd rather
do something like this:
from lxml import html
import re

parse_type_value = re.compile(r'type=([A-Z]*)').findall
root = html.parse(the_file).getroot()

# iterlinks() yields (element, attribute, link, pos) for every link in the document
for el, attr, link, pos in root.iterlinks():
    if 'type=' in link:
        print el.tag, parse_type_value(link)
Note that this will give you all links, not only those in <a> href's. If
you really only want those, the XPath expression above will do just fine.
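For completeness, a rough sketch of that pure-XPath variant, assuming the same
'type=([A-Z]*)' pattern and the parsed document in 'doc' from your example (the
helper name is just for illustration):

extract_types = re.compile(r'type=([A-Z]*)').findall
types = [t for href in doc.xpath('.//a/@href')
           for t in extract_types(href)]

That should give you the flat list of matches that scrapy's .re() returns.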
Stefan
Hi,
I am experimenting with web scraping using lxml.
I have played a little with BeautifulSoup in the past and scrapy recently.
I am recoding something I did with scrapy with lxml but encounter a
problem I am not sure how to iron out.
With scrapy, hxs is an xpath selector which has a select and re method
types = hxs.select('.//a[@href]/@href').re(r'type=([A-Z]*)')
Which will return a list of the matches in href.
How would I do the same thing with lxml?
types = doc.xpath('.//a[@href]/@href') ...
Thanks a lot,
--
EuGeNe -- I lend my books on COlivri http://www.colivri.org/user/eugene, do you?