[Tutor] regular expression question

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Fri Apr 8 01:29:10 CEST 2005



> I wonder if anyone can help me with an RE. I also wonder if there is an
> RE mailing list anywhere - I haven't managed to find one.

Hi Debbie,

I haven't found one either.  There appear to be a lot of good resources
here:

    http://dmoz.org/Computers/Programming/Languages/Regular_Expressions/

> I'm trying to use this regular expression to delete particular strings
> from a file before tokenising it.

Why not tokenize the file first, and then drop the strings with a period?
You may not need to do all your tokenization at once.  Can you do it in
phases?
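
Here's a rough sketch of what I mean by doing it in phases.  The function
names and the "interior period" rule (a period with a word character on
each side) are my own assumptions for illustration, not anything from your
program, so adjust them to fit:

```python
import re

# Assumed rule: a period with a word character on both sides counts as
# an "interior" period (as in "fileX.doc").
INTERIOR_PERIOD = re.compile(r"\w\.\w")

def tokenize(text):
    """Phase 1: naive whitespace tokenization."""
    return text.split()

def drop_interior_period_tokens(tokens):
    """Phase 2: discard tokens that contain an interior period."""
    return [t for t in tokens if not INTERIOR_PERIOD.search(t)]

tokens = tokenize("Please open fileX.doc before Thursday.")
print(drop_interior_period_tokens(tokens))
```

With this rule, "fileX.doc" is dropped, while "Thursday." and a bare
extension like ".jpg" are kept, because their periods do not sit between
two word characters.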


> I want to delete all strings that have a full stop (period) when it is
> not at the beginning or end of a word, and also when it is not followed
> by a closing bracket.

Let's make sure we're using the same concepts.  By "string", do you mean
"word"?  That is, if we have something like:

     "I went home last Thursday."

do you expect the regular expression to match against the whole thing?

     "I went home last Thursday."

Or do you expect it to match against the specific end word?

     "Thursday."

I'm just trying to make sure we're using the same terms.  How specific
do you want your regular expression to be?



Going back to your question:

> I want to delete all strings that have a full stop (period) when it is
> not at the beginning or end of a word, and also when it is not followed
> by a closing bracket.

At first glance, I think you're looking for a "lookahead assertion":

http://www.amk.ca/python/howto/regex/regex.html#SECTION000540000000000000000
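
As a small sketch of the idea: a negative lookahead, written (?!...), lets
a pattern say "a period, as long as what comes next is not a closing
bracket".  The sample text and the "<.>" marker below are made up just to
show which periods match, and I'm assuming "closing bracket" means either
")" or "]":

```python
import re

# A literal period NOT immediately followed by ")" or "]".
# The lookahead only peeks at the next character; it does not consume it.
period_not_before_bracket = re.compile(r"\.(?![\])])")

text = "(see foo.) Open fileX.doc now."

# Mark each matching period so the effect is easy to see.
marked = period_not_before_bracket.sub("<.>", text)
print(marked)
```

The period in "foo.)" is left alone because the lookahead sees the ")",
while the other two periods are marked.  This only covers the bracket
condition, not the rest of your requirements.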




> I want to delete file names (eg. fileX.doc), and websites (when www/http
> not given) but not file extensions (eg. this is in .jpg format). I also
> don't want to delete the last word of each sentence just because it
> precedes a fullstop, or if there's a fullstop followed by a closing
> bracket.

Does this need to be part of the same regular expression?



There are a lot of requirements here: can we encode this in some kind of
test class, so that we're sure we're hitting all your requirements?

Here's what I think you're looking for so far, written in terms of a unit
test:


######
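import re
import unittest

class DebbiesRegularExpressionTest(unittest.TestCase):
    def setUp(self):
        # Placeholder: replace with the real pattern under test.
        self.fullstopRe = re.compile("... fill me in")

    def testRecognizingEndWord(self):
        # The sentence-final word, period included, should be matched.
        self.assertEqual(
            ["Thursday."],
            self.fullstopRe.findall("I went home last Thursday."))

    def testEndWordWithBracket(self):
        # A word whose period is followed by a closing bracket ("foo.]")
        # should not be matched; only "bar." should be.
        self.assertEqual(
            ["bar."],
            self.fullstopRe.findall("[this is foo.] bar. licious"))

if __name__ == '__main__':
    unittest.main()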
######

If these tests don't match with what you want, please feel free to edit
and add more to them so that we can be more clear about what you want.



Best of wishes to you!


