basic python questions

Fredrik Lundh fredrik at
Sat Nov 18 11:27:50 CET 2006

nateastle at wrote:

> I have a simple assignment for school but am unsure where to go. The
> assignment is to read in a text file, split out the words and say which
> line each word appears in alphabetical order. I have the basic outline
> of the program done which is:

looks like an excellent start to me.

> def Xref(filename):
>     try:
>         fp = open(filename, "r")
>         lines = fp.readlines()
>         fp.close()
>     except:
>         raise "Couldn't read input file \"%s\"" % filename
>     dict = {}
>     for line_num in xrange(len(lines)):
>         if lines[line_num] == "":  continue
>         words = lines[line_num].split()
>         for word in words:
>             if not dict.has_key(word):
>                 dict[word] = []
>             if line_num+1 not in dict[word]:
>                 dict[word].append(line_num+1)
>     return dict
>
> My question is, how do I easily parse out punctuation marks

it depends a bit on how you define the term "word".

if you're using regular text, with a limited set of punctuation 
characters, you can simply do e.g.

     word = word.strip(".,!?:;")
     if not word:
         continue # nothing left once the punctuation is stripped

inside the "for word" loop.  this won't handle such characters if they 
appear inside words, but that's probably good enough for your task.
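
as a rough sketch of that stripping step on its own (the clean_words helper and the sample sentence are made up for illustration; they're not part of the original program):

```python
def clean_words(line, punctuation=".,!?:;"):
    """Split a line on whitespace and strip punctuation from both ends of each word."""
    cleaned = []
    for word in line.split():
        word = word.strip(punctuation)
        if not word:
            continue  # the token was nothing but punctuation
        cleaned.append(word)
    return cleaned


sample = clean_words("Hello, world! ... it's done.")
print(sample)
```

note that strip only touches the ends of each word, so the apostrophe inside "it's" survives, while a token made up entirely of punctuation is dropped.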

another, slightly more advanced approach is to use regular expressions, 
such as re.findall(r"\w+", text) to get a list of all alphanumeric "words" 
in the text.  that'll have other drawbacks (e.g. it'll split up words like 
"couldn't" and "cross-reference", unless you tweak the regexp), and is 
probably overkill.
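
to make the drawback concrete, here's a small sketch comparing the plain pattern with one possible tweak. the tweaked pattern is just an assumption about what "tweak the regexp" could look like (it allows an apostrophe or hyphen between runs of word characters); the sample text is made up.

```python
import re

text = "I couldn't find the cross-reference."

# plain pattern: runs of alphanumeric characters (and underscores) only
plain = re.findall(r"\w+", text)

# tweaked pattern: also accept ' or - when it sits between word characters
tweaked = re.findall(r"\w+(?:['-]\w+)*", text)

print(plain)
print(tweaked)
```

the plain pattern yields "couldn" and "t" as separate items, while the tweaked one keeps "couldn't" and "cross-reference" whole.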

> and how do I sort the list and

how to sort the dictionary when printing the cross-reference, you mean? 
just use "sorted" on the dictionary; that'll get you a sorted list of 
the keys.
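
for instance, a minimal sketch of printing the cross-reference in alphabetical order (the dictionary contents and the output layout are invented for the example):

```python
# hypothetical result of the cross-reference function: word -> line numbers
xref = {"the": [1, 3], "cat": [1], "sat": [2, 3]}

# iterating over sorted(xref) visits the keys in alphabetical order
report = []
for word in sorted(xref):
    numbers = ", ".join(str(n) for n in xref[word])
    report.append("%s: %s" % (word, numbers))

print("\n".join(report))
```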


to avoid duplicates and simplify sorting, you probably want to normalize 
the case of the words you add to the dictionary, e.g. by converting all 
words to lowercase.
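
a quick illustration of why the case matters, using a made-up word list: without normalization "The" and "the" count as different keys, and all capitalized words sort before the lowercase ones.

```python
words = ["The", "cat", "the", "Zebra", "apple"]

# as-is: "The" and "the" are distinct, and uppercase sorts first
as_is = sorted(set(words))

# lowercased: duplicates collapse and the order is plainly alphabetical
normalized = sorted(set(word.lower() for word in words))

print(as_is)
print(normalized)
```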

> if there is anything else that I am doing wrong in this code

there are plenty of things that could be tweaked, tuned, and written in 
a slightly shorter way by an experienced Python programmer, but assuming 
that this is a general programming assignment, I don't see anything 
seriously "wrong" in your code (just make sure you test it on a file 
that doesn't exist before you hand it in).
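
for comparison, here's one possible tidied-up version, as a sketch only and written for a current Python rather than the one the original targets. the names xref and xref_file, the choice of punctuation, and the lowercasing are assumptions pulled together from the suggestions above. it also raises a real exception object instead of a string, since string exceptions were already deprecated in Python 2.5 and are rejected by later versions.

```python
def xref(lines):
    """Map each lowercased word to the list of line numbers it appears on."""
    index = {}
    for line_num, line in enumerate(lines, 1):
        for word in line.split():
            word = word.strip(".,!?:;").lower()
            if not word:
                continue  # nothing but punctuation
            numbers = index.setdefault(word, [])
            if line_num not in numbers:
                numbers.append(line_num)
    return index


def xref_file(filename):
    """Build the cross-reference for a file, reporting read errors clearly."""
    try:
        with open(filename) as fp:
            return xref(fp)
    except IOError as exc:
        raise IOError("Couldn't read input file %r: %s" % (filename, exc))


index = xref(["The cat sat.", "", "The dog, the cat!"])
for word in sorted(index):
    print(word, index[word])
```

splitting the work into a function that takes any iterable of lines and a thin wrapper that opens the file makes it easy to try the logic on a plain list, as in the last few lines.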

