[Tutor] Extracting words (quest 2)

Nicole Seitz nicole.seitz@urz.uni-hd.de
Tue, 26 Mar 2002 16:22:03 +0100


Hi!

Thanks, this was very helpful! There are some lines (see below) that I don't
understand, though. I hope you don't mind explaining.

On Monday, 25 March 2002 19:35, you wrote:

>
> import re, pprint
>
> def indexWord(filename):
>      wordDict = {}
>      lineCount = 0
>      file = open(filename)
>      expr = re.compile(r"\w+", re.LOCALE)

What exactly does the LOCALE flag mean?
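For reference, a small sketch assuming a modern Python 3 interpreter (this thread predates Python 3): re.LOCALE makes \w and related classes follow the current locale's idea of a letter. In Python 3, str patterns are already Unicode-aware, re.ASCII narrows \w back to [a-zA-Z0-9_], and since Python 3.6 re.LOCALE is only accepted with bytes patterns. The sample text is made up for illustration.

```python
import re

text = "Am Montag, 25. März 2002"

# Python 3 str patterns are Unicode-aware by default, so "ä" counts as a word character.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so "März" is split at the umlaut.
ascii_words = re.findall(r"\w+", text, re.ASCII)

# re.LOCALE with a str pattern is rejected in Python 3.6+ (ValueError).
try:
    re.compile(r"\w+", re.LOCALE)
    locale_with_str_ok = True
except ValueError:
    locale_with_str_ok = False

# With a bytes pattern, re.LOCALE is allowed and \w follows the current locale.
bytes_words = re.findall(rb"\w+", b"Montag 25", re.LOCALE)

print(unicode_words)
print(ascii_words)
print(locale_with_str_ok)
print(bytes_words)
```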

>      while 1:
>          line = file.readline()
>          if not line:
>              break
>          lineCount = lineCount + 1
>          resultList = expr.findall(line)

So I can't use match() here?
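A sketch of the difference, assuming Python 3 and a made-up sample line: match() only tries at the very start of the string and returns at most one match object, while findall() scans the whole line and returns every word. So match() alone would give you only the first word of each line; finditer() is the option if you want match objects (with positions) for every word.

```python
import re

expr = re.compile(r"\w+")
line = "the cat sat on the mat"

# match() only tries at position 0 and returns a single match object (or None).
first_word = expr.match(line).group()

# If the line starts with a non-word character, match() finds nothing at all.
no_match = expr.match("  the cat")

# findall() scans the whole line and returns every non-overlapping match as a string.
all_words = expr.findall(line)

# finditer() yields a match object for every word, so positions are available too.
positions = [(m.group(), m.start()) for m in expr.finditer(line)]

print(first_word)
print(no_match)
print(all_words)
print(positions)
```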


>
>
> if __name__ == "__main__":
>      filename = r"c:\foo\bar\baz.txt"
>      wordDict = indexWord(filename)

What's happening here?
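A minimal sketch of both parts of that block, assuming Python 3. The `if __name__ == "__main__":` test is true only when the file is run as a script, not when it is imported, so the test call to indexWord does not fire when another program imports the module. The module written to a temporary file here (`wordindex_demo.py`, its `calls` list and `main()`) is hypothetical, and runpy.run_path is used only to simulate the two cases. The r prefix on the path matters as well: without it, \f and \b would be turned into form-feed and backspace characters.

```python
import os
import runpy
import tempfile

# A hypothetical module that records whether its main() ran.
MODULE_SOURCE = '''
calls = []

def main():
    calls.append("main ran")

if __name__ == "__main__":
    main()
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wordindex_demo.py")
    with open(path, "w") as f:
        f.write(MODULE_SOURCE)

    # Run as a script: __name__ is "__main__", so main() is called.
    as_script = runpy.run_path(path, run_name="__main__")

    # Run under a module name, as an import would: main() is skipped.
    as_import = runpy.run_path(path, run_name="wordindex_demo")

print(as_script["calls"])
print(as_import["calls"])

# The r prefix keeps backslashes literal; without it, \f and \b become control characters.
raw = r"c:\foo\bar\baz.txt"
cooked = "c:\foo\bar\baz.txt"
print(len(raw), len(cooked))
```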
By the way, the program now works pretty well, though the output is sometimes
a bit awkward, for example when one word occurs many times. It doesn't look
very nice, so I'm trying to change that now.
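One possible way to tidy the output, assuming (hypothetically, since that part of the script is not quoted) that wordDict maps each word to a list of line numbers: print each word once with its total count and its distinct line numbers, and let textwrap wrap long entries. format_index and its width parameter are illustrative names, not part of the original program.

```python
import textwrap

def format_index(word_dict, width=60):
    """Render a hypothetical {word: [line numbers]} dict, one wrapped entry per word."""
    out = []
    for word in sorted(word_dict):
        line_numbers = word_dict[word]
        # Show the total number of occurrences once, then each line number only once.
        distinct = sorted(set(line_numbers))
        entry = f"{word} ({len(line_numbers)}x): " + ", ".join(str(n) for n in distinct)
        # Wrap long entries and indent the continuation lines.
        out.append(textwrap.fill(entry, width=width, subsequent_indent="    "))
    return "\n".join(out)

print(format_index({"the": [1, 1, 2, 5], "cat": [1]}))
```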
Oh, I nearly forgot: how come the words in the output are in alphabetical order?
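Most likely the answer is pprint, assuming the unquoted end of the script prints wordDict with pprint.pprint (the script does import pprint). pprint sorts dictionary keys before printing; since Python 3.8 this can be switched off with sort_dicts=False. A small sketch with made-up data, using pformat so the result can be inspected as a string:

```python
import pprint

word_dict = {"zebra": [3], "apple": [1, 2], "mango": [2]}

# Default behaviour: keys are sorted before the dict is formatted.
sorted_repr = pprint.pformat(word_dict)

# Python 3.8+: keep the dict's own (insertion) order instead.
unsorted_repr = pprint.pformat(word_dict, sort_dicts=False)

print(sorted_repr)
print(unsorted_repr)
```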

Nicole


> - If a word occurs twice in the same line, it will be listed twice. To
> avoid this behaviour, you need to filter resultList (left as an exercise
> :-).
>
> > I would be very thankful for suggestions!
>
> HTH.
>
> Alex
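For the exercise Alex leaves above, one sketch (Python 3.7+, where dicts keep insertion order) is to pass resultList through dict.fromkeys, which drops repeated words while keeping the order in which they first appear. The sample line is made up; a set would also work if the order within the line does not matter.

```python
import re

expr = re.compile(r"\w+")
line = "the cat saw the other cat"

resultList = expr.findall(line)

# dict.fromkeys keeps only the first occurrence of each word, in order.
uniqueWords = list(dict.fromkeys(resultList))

print(resultList)
print(uniqueWords)
```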