python re - a not needed

kepes.krisztian kepes.krisztian at peto.hu
Thu Dec 16 04:06:42 EST 2004


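Hi!

I want to extract information from an HTML page, and I need to match every character except <.
"Every character" means everything above chr(31), including the characters above chr(128) (Hungarian accented letters).
The .* pattern is very greedy: it consumes the < characters too.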
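
To illustrate the greedy behaviour described above, here is a minimal sketch. The sample string is hypothetical and not taken from the original page; it only shows how .* runs past a < while the non-greedy form .*? stops at the first closing tag.

```python
import re

# Hypothetical sample input, not taken from the original message.
html = '<a>first</a> and <a>second</a>'

# Greedy: .* runs to the end of the string, then backtracks to the LAST </a>.
greedy = re.search(r'<a>(.*)</a>', html).group(1)

# Non-greedy: .*? expands one character at a time and stops at the FIRST </a>.
lazy = re.search(r'<a>(.*?)</a>', html).group(1)

print(greedy)
print(lazy)
```

The greedy match includes the intermediate tags, while the non-greedy one returns only the text of the first element.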

If I could use a "not", I could simply define a regexp like this:
[not<]*</a>

It would match everything inside the link, up to the closing </a>.
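
The "not" idea above corresponds to what Python's re module calls a negated character class: a ^ placed directly after the opening bracket, as in [^<]. The following is a minimal sketch; the sample link and the names href and text are assumptions made for illustration.

```python
import re

# Hypothetical sample input with Hungarian accented letters.
html = '<a href="hirek.html">Árvíztűrő tükörfúrógép</a><b>more</b>'

# [^"] matches any character except a double quote, and
# [^<] matches any character except '<', so neither can run past its delimiter.
pattern = re.compile(r'<a href="([^"]*)">([^<]*)</a>')
m = pattern.search(html)
href, text = m.groups()

print(href)
print(text)
```

Note that a negated class also matches newlines, so link text that spans several lines is still captured.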

I wrote this program, but I think it is too complex:

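import re

# Collect the ASCII characters 33..64, skipping < and >.
# re.escape() protects the ones that are special in a regexp (such as - and ^).
l = []
for i in range(33, 65):
    if i != ord('<') and i != ord('>'):
        l.append(re.escape(chr(i)))
# Inside [...] the | is a literal character, not "or", so join without it.
punct = ''.join(l)
# Allowed characters: word characters, whitespace, chr(128)..chr(255), punctuation.
allowed = r'\w\s%s-%s%s' % (chr(128), chr(255), punct)
sre = '<Subj>([%s]{1,1024})</d>' % allowed
#sre='<Subj>([?!\\<]{1,1024})</d>'
s = '<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>'


print(sre)
print(s)
cp = re.compile(sre)
m = cp.search(s)
print(m.groups())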

Does Python's regexp syntax have an exclusion, or some kind of "not" function? How do I use it?
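
For reference, the re module has two constructs that act as a "not": the negated character class [^...] and the negative lookahead (?!...). The sketch below applies both to the sample string from the program above; the variable names by_class and by_lookahead are made up for this illustration.

```python
import re

s = '<Subj>xmvccv ÁÁÁ sdfkdsfj eirfie</d><A></d>'

# Negated character class: 1 to 1024 characters, none of them '<'.
by_class = re.search(r'<Subj>([^<]{1,1024})</d>', s)

# Negative lookahead: take one character at a time, but only while
# the closing marker '</d>' does not start at the current position.
by_lookahead = re.search(r'<Subj>((?:(?!</d>).){1,1024})</d>', s)

print(by_class.group(1))
print(by_lookahead.group(1))
```

The negated class is the simpler choice when a single character marks the end of the wanted text. The lookahead form is useful when the text may contain a < that does not begin the closing marker.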

Thanks for your help,
 kk



More information about the Python-list mailing list