How can I exclude a word by using re?
could.net at gmail.com
Tue Aug 16 03:09:11 CEST 2005
I want to use re because I want to extract something from an HTML
page. It would be very complicated without re. But while using re, I
found that I must exclude a whole word, "</td>", and there are
a great many "</td>" in this HTML.
My re is as below:
ur'<a href="(?P<url>[^<>]+\.mp3)"( )target=_blank>'
There should be over 30 matches in the HTML, but re.finditer(html)
finds nothing because the last line of my re is wrong. I can't use
"(?P<name>.+)</td>" because there are many "</td>" in the HTML
and I just want the ".+" to match what comes before the first "</td>".
So I think that if there were some way to exclude a word, this would
be solved. If something like "NOT(WORD)" existed, I would just write
the last line of the re as "(?P<name>(NOT(</td>))+)</td>".
But I still have no idea after thinking and trying for a very long time.
In other words, I want the "</td>" of "(?P<name>.+)</td>" to be
exactly the first "</td>" in this match. And there is more than one
match in this html, so this must be done by using re.
And I can't use your ideas because what I have to deal with is
very complicated HTML, not just a single line of text.
I could paste part of the HTML here, but it is rather too long.
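
For reference, the "NOT(WORD)" idea described above corresponds to two
standard re constructs: a non-greedy quantifier (".+?") and a negative
lookahead ("(?!...)"). The following is only a minimal sketch under
assumptions: the sample HTML, the names lazy, strict and extract, and the
simplified single space before "target" are all hypothetical, since the
real page and the rest of the pattern are not shown here.

```python
import re

# Hypothetical sample; the real HTML is not shown in the post.
html = (
    '<tr><td><a href="songs/one.mp3" target=_blank>First song</a></td>'
    '<td>3:05</td></tr>\n'
    '<tr><td><a href="songs/two.mp3" target=_blank>Second song</a></td>'
    '<td>4:10</td></tr>\n'
)

# Non-greedy: ".+?" expands only as far as needed, so the "</td>" that
# follows it is the first one after the link.
lazy = re.compile(
    r'<a href="(?P<url>[^<>]+\.mp3)" target=_blank>(?P<name>.+?)</td>',
    re.DOTALL,
)

# Negative lookahead: take one character at a time, but only at positions
# where "</td>" does not begin. This is the "NOT(</td>)" idea spelled out.
strict = re.compile(
    r'<a href="(?P<url>[^<>]+\.mp3)" target=_blank>'
    r'(?P<name>(?:(?!</td>).)+)</td>',
    re.DOTALL,
)


def extract(pattern, text):
    """Return a list of (url, name) pairs, one per match in text."""
    return [(m.group("url"), m.group("name")) for m in pattern.finditer(text)]


for url, name in extract(lazy, html):
    print(url + " -> " + name)
```

On this sample both patterns should give the same pairs. The lookahead
form guarantees that the name group can never contain "</td>", while the
non-greedy form is shorter and is usually enough when "</td>" directly
follows the text to be captured.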
On 8/15/05, John Machin <sjmachin at lexicon.net> wrote:
> could ildg wrote:
> > In re, the punctuation "^" can exclude a single character, but I want
> > to exclude a whole word now. For example, I have a string "hi, how are
> > you. hello", and I want to extract everything before the word "hello".
> > I can't use ".*[^hello]" because "^" only excludes the single chars "h",
> > "e", "l" or "o". Will somebody tell me how to do it? Thanks.
> (1) Why must you use re? It's often a good idea to use string methods
> where they can do the job you want.
> (2) What do you want to have happen if "hello" is not in the string?
> C:\junk>type upto.py
> def upto(strg, what):
>     k = strg.find(what)
>     if k > -1:
>         return strg[:k]
>     return None # or raise an exception
> helo = "hi, how are you? HELLO I'm fine, thank you hello hello hello. that's it"
> print repr(upto(helo, "HELLO"))
> print repr(upto(helo, "hello"))
> print repr(upto(helo, "hi"))
> print repr(upto(helo, "goodbye"))
> print repr(upto("", "goodbye"))
> print repr(upto("", ""))
> 'hi, how are you? '
> "hi, how are you? HELLO I'm fine, thank you "