[Tutor] RE expressions

Johan Nilsson johanpo at googlemail.com
Fri Aug 15 22:58:44 CEST 2008


Hi all Python experts,


I am trying to work with BeautifulSoup and re, and I am running into a problem.

What I want to do is open a webpage and get some information. This is
working fine. I then want to follow some of the links on this page and
process them. I manage to filter out the links I am interested in with
simple regular expressions. My problem is that I now have a number of
strings that look like

'text  "http:\123\interesting_adress\etc\etc\" more text'

I have figured out that if it weren't for the \, a simple
p=re.compile('\"\w+\"') would do the trick. From what I understand, \w only
covers the set [a-zA-Z0-9_] and hence not the "\".
I assume the solution is right in front of my eyes and I have just been
staring at the screen for too long. Any hints would be appreciated.
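One possible approach, offered as a sketch rather than an answer confirmed in the thread: instead of \w+, match any run of characters that are not a double quote ([^"]*) and capture only the part between the quotes. Extending the class instead (for example [\w\\:]+) would also work, but then every allowed character has to be listed. Writing both the pattern and the test string as raw strings (r'...') keeps each backslash literal. The sample text is the one from the question; the names quoted and sample are illustrative.

```python
import re

# A double quote, then any characters except a double quote, then a closing
# double quote. The group captures only the text between the quotes.
# The raw-string prefix keeps backslashes in the pattern literal.
quoted = re.compile(r'"([^"]*)"')

# Raw string, so the backslashes in the sample stay literal characters.
sample = r'text  "http:\123\interesting_adress\etc\etc\" more text'

print(quoted.findall(sample))
```

Note that the captured text keeps the backslash that sits just before the closing quote; calling .rstrip('\\') on each result removes it if it is unwanted.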


In [72]: p=re.compile('"\w+\"')

In [73]: p.findall('asdsa"123abc123"jggfds')
Out[73]: ['"123abc123"']

In [74]: p.findall('asdsa"123abc\123"jggfds')
Out[74]: ['"123abcS"']
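The 'S' in the second result comes from the string literal, not the regex. In an ordinary (non-raw) Python string, \123 is an octal escape for character code 83, which is 'S', so the string passed to findall contains no backslash at all. A small check using only the standard library (variable names are illustrative):

```python
import re

p = re.compile(r'"\w+"')

plain = 'asdsa"123abc\123"jggfds'  # \123 is an octal escape, i.e. 'S'
raw = r'asdsa"123abc\123"jggfds'   # raw string: the backslash is kept

print(plain == 'asdsa"123abcS"jggfds')  # True
print(p.findall(plain))  # ['"123abcS"']
print(p.findall(raw))    # [] because \w+ cannot match the backslash
```

So to test a pattern against text that really contains backslashes, the test string itself should be a raw string (or use doubled backslashes).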

/Johan


