[Tutor] can this be done more easily
Alan Gauld
alan.gauld at btinternet.com
Mon Aug 30 12:34:10 CEST 2010
"Roelof Wobben" <rwobben at hotmail.com> wrote
> import string
I know your tutorial uses the string module, but you really should
get out of that habit. Functions such as string.lower() and
string.split() no longer exist in Python 3, and they are less
efficient than the built-in string methods.
They invariably require more typing, too! :-)
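For comparison, here is a minimal sketch of the str method calls that replace the string-module functions used in the quoted code. The sample text is made up for illustration.

```python
# Python 3 str methods replacing the removed string-module functions
s = "Now is the Time--Yes"

lowered = s.lower()            # was string.lower(s)
parts = lowered.split("--")    # was string.split(lowered, "--")
rejoined = " ".join(parts)     # was string.join(parts, " ")

print(lowered)   # now is the time--yes
print(parts)     # ['now is the time', 'yes']
print(rejoined)  # now is the time yes
```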
> def extract_words(s):
>     """
>     >>> extract_words('Now is the time! "Now", is the time? Yes, now.')
>     ['now', 'is', 'the', 'time', 'now', 'is', 'the', 'time', 'yes', 'now']
This is a bad example since it uses " twice in the same string.
It should probably read:
>>> extract_words("Now is the time! 'Now', is the time? Yes, now.")
>     word = ""
>     s = string.lower(s)
So this should be
s = s.lower()
>     for char in s:
>         if ord(char) >= 65 and ord(char) <= 122 or ord(char) == 32 or ord(char) == 45:
>             word = word + char
You don't really need to process it letter by letter, and even if you do,
you don't need the ord() tests; just use isalpha() and friends:
if char.isalpha() or char in " -": word += char
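A minimal sketch of that letter-by-letter loop using isalpha(), assuming (as the quoted ord() test intends) that spaces and hyphens should be kept as well; the name keep_word_chars is hypothetical. As a side benefit, it rejects the six characters [ \ ] ^ _ ` that sit between 'Z' (90) and 'a' (97) and therefore slip through the 65 to 122 range test. Note that isalpha() also accepts non-ASCII letters.

```python
def keep_word_chars(s):
    """Return s with everything except letters, spaces and hyphens removed."""
    word = ""
    for char in s:
        # isalpha() replaces the ord() range test; space and hyphen are kept explicitly
        if char.isalpha() or char in " -":
            word += char
    return word


print(keep_word_chars('Time! "Now", yes_[1]'))  # Time Now yes
```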
>     word = string.split(word, "--")
>     word = string.join(word, " ")
Becomes
word = " ".join(word.split("--"))
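But you could just have used replace() for this. strip() (see below) only
removes characters from the ends of a string, so on its own it cannot
break a word at an embedded "--".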
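To illustrate the replace() alternative: replacing each "--" with a space gives the same result as splitting on "--" and joining with spaces. The sample text is made up.

```python
text = "she spoke--fancy that"

# split-and-join approach, written with methods
via_join = " ".join(text.split("--"))
# a single replace() call does the same job
via_replace = text.replace("--", " ")

print(via_join == via_replace)  # True
print(via_replace)              # she spoke fancy that
```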
>     word = word.replace("  ", " ")
But oddly, here you use the string method rather than string.replace().
A good choice!
>     word = string.split(word, " ")
word = word.split(" ")
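One further simplification worth knowing: called with no argument, split() splits on runs of whitespace and discards empty strings, so a preceding replace() of double spaces becomes unnecessary. The sample string is made up.

```python
words = "now is  the   time "

# splitting on a single space keeps an empty string for every extra space
print(words.split(" "))  # ['now', 'is', '', 'the', '', '', 'time', '']
# no-argument split() treats any run of whitespace as one separator
print(words.split())     # ['now', 'is', 'the', 'time']
```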
>     return word
> if __name__ == '__main__':
>     import doctest
>     doctest.testmod()
> But now I wonder if this can be done more easily?
Yes. You can do it all at the word level, and you can remove
the characters you don't want rather than testing for the
presence of the ones you do.
Look at the documentation for strip(): you will see that you
can pass it a string of the characters you want removed from
both ends.
So a single call to split() followed by strip() on each word
should do the vast bulk of the work for you.
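A minimal sketch of that word-level approach, assuming the punctuation to remove is the set appearing in the doctest plus a few common marks; the PUNCTUATION constant is hypothetical. A replace() call still handles "--" before splitting.

```python
PUNCTUATION = '!?,.;:"\''  # hypothetical: marks to strip from the ends of words


def extract_words(s):
    """
    >>> extract_words("Now is the time! 'Now', is the time? Yes, now.")
    ['now', 'is', 'the', 'time', 'now', 'is', 'the', 'time', 'yes', 'now']
    """
    # treat a double hyphen as a word break, then split on any whitespace
    words = s.lower().replace("--", " ").split()
    # strip punctuation from both ends of each word, dropping anything left empty
    stripped = (w.strip(PUNCTUATION) for w in words)
    return [w for w in stripped if w]


if __name__ == "__main__":
    import doctest
    doctest.testmod()
```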
HTH,
--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/