[Tutor] regular expression question

Kent Johnson kent37 at tds.net
Thu Apr 7 20:51:12 CEST 2005


D Elliott wrote:
> I wonder if anyone can help me with an RE. I also wonder if there is an 
> RE mailing list anywhere - I haven't managed to find one.
> 
> I'm trying to use this regular expression to delete particular strings 
> from a file before tokenising it.
> 
> I want to delete all strings that have a full stop (period) when it is 
> not at the beginning or end of a word, and also when it is not followed 
> by a closing bracket. I want to delete file names (eg. fileX.doc), and 
> websites (when www/http not given) but not file extensions (eg. this is 
> in .jpg format). I also don't want to delete the last word of each 
> sentence just because it precedes a fullstop, or if there's a fullstop 
> followed by a closing bracket.
> 
> fullstopRe = re.compile (r'\S+\.[^)}]]+')

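There are two problems with this:
- The ] inside the [] group must be escaped like this: [^)}\]]. As written, the first ] closes the 
group and the second one is a literal ] that has to appear in the text.
- [^)}\]] also matches whitespace, so the pattern will match a full stop at the end of a word and 
run on into the words that follow.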
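Here is a small sketch of both problems, in case it helps to see them. The sample sentence and the 
variable names are made up for illustration; only the two patterns come from the discussion above.

```python
import re

# The pattern as posted: the first ] closes the class, so "]+" is a literal bracket.
original = re.compile(r'\S+\.[^)}]]+')
# The same pattern with the ] escaped inside the class.
escaped = re.compile(r'\S+\.[^)}\]]+')

sample = "It ends here. Then more."  # made-up test sentence

original_matches = original.findall(sample)
escaped_matches = escaped.findall(sample)
print(original_matches)
print(escaped_matches)
```

The first list is empty because the sample contains no literal ]. The second is 
['here. Then more.']: the class accepts the space after the full stop, so the match runs from the 
last word of one sentence to the end of the text.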

It's not clear from your description whether the closing bracket must immediately follow the full 
stop or whether it can be anywhere after it. If it must follow immediately, use
\S+\.[^)}\]\s]\S*
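
A quick way to try this pattern out, as a sketch only. The sample text is invented to cover the 
cases from the question (a file name, a bare website, a stop before a closing bracket, a file 
extension and a sentence end), and the variable names are my own.

```python
import re

# Stop must be followed by something other than ), }, ] or whitespace.
immediate = re.compile(r'\S+\.[^)}\]\s]\S*')

# Made-up sample covering the cases in the question.
sample = "Open fileX.doc or example.com (see notes.) This is in .jpg format."

matches = immediate.findall(sample)
cleaned = immediate.sub('', sample)
print(matches)
print(cleaned)
```

On this sample only fileX.doc and example.com are matched. Deleting them leaves the spaces on both 
sides behind, so you may want to collapse double spaces afterwards.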

If you want to allow the bracket anywhere after the stop, you must force the match to run up to the 
whitespace that ends the word; otherwise you will match foo.bar when the word is foo.bar]. I think 
this works:
(\S+\.[^)}\]\s]+)(\s)

but the match now includes the trailing whitespace, so you have to put the second group (\2) back 
in your substitution string.
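
A sketch of that substitution, assuming you simply want the matched word removed and the whitespace 
kept. The sample line and the names are made up for illustration.

```python
import re

# Group 1 is the word to delete, group 2 is the whitespace that ends it.
anywhere = re.compile(r'(\S+\.[^)}\]\s]+)(\s)')

sample = "Delete fileX.doc but keep [foo.bar] here.\n"  # made-up test line

# Replace each match with just its trailing whitespace (group 2).
cleaned = anywhere.sub(r'\2', sample)
print(repr(cleaned))
```

Here fileX.doc is removed and [foo.bar] is left alone. Note that because the pattern requires a 
whitespace character after the word, a candidate at the very end of the text with nothing after it 
would not be matched.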

BTW C:\Python23\pythonw.exe C:\Python24\Tools\Scripts\redemo.py is very helpful with questions like 
this...

Kent


