[Tutor] RE expressions
Johan Nilsson
johanpo at googlemail.com
Fri Aug 15 22:58:44 CEST 2008
Hi all python experts
I am trying to work with BeautifulSoup and re and am running into a problem.
What I want to do is open a webpage and get some information. This is
working fine.
I then want to follow some of the links on this page and process them. I
manage to filter out the links that I am interested in with simple re
expressions. My problem is that I now have a number of strings that look
like
'text "http:\123\interesting_adress\etc\etc\" more text'
I have figured out that if it wasn't for the \ a simple
p=re.compile('\"\w+\"') would do the trick. From what I understand \w only
covers the set [a-zA-Z0-9_] and hence not the "\".
I assume the solution is right in front of my eyes, and I have been looking
at the screen for too long. Any hints would be appreciated.
In [72]: p=re.compile('"\w+\"')
In [73]: p.findall('asdsa"123abc123"jggfds')
Out[73]: ['"123abc123"']
In [74]: p.findall('asdsa"123abc\123"jggfds')
Out[74]: ['"123abcS"']
/Johan