[Tutor] Thanks for Regex help

D Elliott debe at comp.leeds.ac.uk
Fri Apr 8 11:20:32 CEST 2005


Thanks to Matt, Kent and Danny for helping me with my regex question. I 
will try your suggestions this morning.

In response to Danny's question about tokenising first: there are reasons
why I don't want to do this. The initial problem was that filenames in my
test data were being tokenised as separate words. E.g.
DataMarchAccounts.txt would be tokenised as two words, neither of which
is a real word that can be found in an English dictionary. (Filenames are
often not proper words, which is why I needed to delete the whole
string, and by 'string' I mean any consecutive run of non-whitespace
characters.) I don't want to subsequently analyse any 'non-words',
only real words that will then be automatically checked against a lexicon.
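
To illustrate the idea of deleting the whole string before tokenising,
here is a minimal sketch. It is not the RE discussed in the thread: the
pattern, the name FILENAME_RE and the helper strip_filenames are all
hypothetical, and it assumes a "filename" is any run of non-whitespace
characters ending in a dot plus a 1-4 character alphanumeric extension.

```python
import re

# Hypothetical pattern (an assumption, not the RE from the thread):
# a run of non-whitespace characters, then a dot and a 1-4 character
# alphanumeric extension, followed by whitespace or the end of the text.
FILENAME_RE = re.compile(r"\S+\.[A-Za-z0-9]{1,4}(?=\s|$)")

def strip_filenames(text):
    """Delete every whitespace-delimited string that looks like a filename."""
    cleaned = FILENAME_RE.sub("", text)
    # Collapse the gaps left behind by the deleted strings.
    return " ".join(cleaned.split())

print(strip_filenames("See DataMarchAccounts.txt for the figures"))
```

A pattern this simple would probably also delete things like dotted
abbreviations (for example "U.K"), so it would likely need tweaking
against real test data.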

Well - my code is all done now, apart from the tweaking of this one RE. 
BTW - I am new to Python and had never done any programming before that, 
so you may see some more questions from me in the future...

Cheers again,
Debbie
-- 
***************************************************
Debbie Elliott
Computer Vision and Language Research Group,
School of Computing,
University of Leeds,
Leeds LS2 9JT
United Kingdom.
Tel: 0113 3437288
Email: debe at comp.leeds.ac.uk
***************************************************

