[Tutor] Extracting words(quest 2)
Nicole Seitz
nicole.seitz@urz.uni-hd.de
Tue, 26 Mar 2002 16:22:03 +0100
Hi!
Thanks, this was very helpful! There are some lines (see below) that I
don't understand, though. I hope you don't mind explaining.
On Monday, 25 March 2002 19:35, you wrote:
>
> import re, pprint
>
> def indexWord(filename):
>     wordDict = {}
>     lineCount = 0
>     file = open(filename)
>     expr = re.compile("\w+", re.LOCALE)
What exactly does the LOCALE flag mean?
>     while 1:
>         line = file.readline()
>         if not line:
>             break
>         lineCount = lineCount + 1
>         resultList = expr.findall(line)
So I can't use match() here?
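For comparison, here is a minimal sketch of how match() and findall() behave on a single line. The sample line and the plain r"\w+" pattern (without the LOCALE flag) are assumptions made for illustration; they are not taken from the quoted program.

```python
import re

expr = re.compile(r"\w+")
line = "the quick brown fox"  # made-up sample line

# match() only tries at the very start of the string and returns
# a single match object, or None if the pattern does not match there.
first = expr.match(line)
print(first.group())

# findall() scans the whole string and returns a list of all
# non-overlapping matches, which is what a word index needs.
allWords = expr.findall(line)
print(allWords)

# A line that starts with a space gives no match() result at all.
noMatch = expr.match(" leading space")
print(noMatch)
```

So match() is usable, but it would only ever report the first word of each line.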
>
>
> if __name__ == "__main__":
>     filename = r"c:\foo\bar\baz.txt"
>     wordDict = indexWord(filename)
What's happening here?
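A minimal sketch of what the `if __name__ == "__main__":` test does may help here. The function count_words below is a hypothetical stand-in for indexWord, used only to keep the example self-contained.

```python
def count_words(text):
    """Hypothetical stand-in for indexWord: count the words in a string."""
    return len(text.split())


if __name__ == "__main__":
    # Python sets __name__ to "__main__" only when this file is run
    # directly as a script. When the file is imported from another
    # module, __name__ holds the module's name instead, so this block
    # is skipped and only the function definitions are made available.
    print(count_words("one two three"))
```

The effect is that the same file can be run as a program and also imported as a module without the test code running on import.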
By the way, the program now works pretty well, though the output is
sometimes a bit awkward, for example when there are many occurrences of
one word. It doesn't look very nice. I'm trying to change this now.
Oh, I nearly forgot:
Why are the words in the output in alphabetical order?
Nicole
> - If a word occurs twice in the same line, it will be listed twice. To
> avoid this behaviour, you need to filter resultList (left as an exercise
> :-).
>
> >Would be very thankful for suggestions!!
>
> HTH.
>
> Alex
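
One possible way to do the filtering of resultList mentioned in the note above, as a sketch only: the sample line and the names seen and uniqueWords are assumptions for illustration, not part of the quoted program.

```python
import re

expr = re.compile(r"\w+")
line = "the cat saw the other cat"  # made-up sample line

resultList = expr.findall(line)

# Keep only the first occurrence of each word, preserving the
# order in which the words appear in the line.
seen = set()
uniqueWords = []
for word in resultList:
    if word not in seen:
        seen.add(word)
        uniqueWords.append(word)

print(uniqueWords)
```

With this filter, each line number would be recorded at most once per word, whichever way the rest of the index is built.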