[Tutor] thesaurus

Dave Angel davea at ieee.org
Mon Jul 13 13:57:54 CEST 2009


Pete Froslie wrote:
> The url function btw:
>
> def url():
>
>
>    fin = open("journey_test.txt", "r")
>    response = re.split(r"[/|/,\n, , ,:\"\"\.?,)(\-\<>\[\]'\r']",
> fin.read())
>    thesaurus = API_URL + response[word_number] + '/'  #API_URL is
> established at the start of the code
>    return thesaurus
>
>   
> yes. Essentially, it grabs each word from a text file and combines it with
> the other stuff to create a url string that I send through an API to an
> online thesaurus. I was attempting to strip it this way as a weak method for
> cleaning out the words I don't want searched from the text file.
>
> Along with the other functions, the code currently scans a text file and
> replaces each of its words with one of their synonyms.. slightly legible
> gibberish, but that's what I'm interested in for now. It is a project to
> help me start learning Python, as well as being intended to take a different
> form in an artwork I'm working on.
>
> thanks.
> petef
>
>   
BTW, constants like API_URL are fine uses for globals.

I can't get my head around this url() function definition, or how it 
would be called.  It appears that your program looks like:

Open the file and find out how many words are in it.
for each word,
    open the file again and parse the whole thing yet again, just to 
figure out the nth word; then throw away the parsing, plus any 
information about where in the file the word was found, keeping only a 
generated URL.
    use the URL to access a website, looking up a replacement word.
    re-open the file, parse it till you find something resembling the 
word involved, then substitute the synonym, and write it back out.


So if you have a file with 1000 words in it, you'll open it and parse it 
2001 times.  And part of your symptoms result from the second parsing 
trying to replace words that aren't really words.

If I were you, I'd simplify the problem, and solve the simplified 
problem in a pretty way, keeping in mind what the complete problem is.  
Then expand it incrementally from there.

For example, instead of a website, you can build a map of synonyms (see 
my example earlier).  Just a dozen should be fine for testing.
Instead of a file with arbitrary text, you could use a simple string of 
words, without punctuation, capitalization or other complexities.

Now, write a set of functions that solve that problem, and gradually add 
in the complexities of the original problem.
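For instance, a toy version of the simplified problem might look something 
like this (the synonym map, the sample sentence, and the function names are 
mine, purely for illustration):

```python
# A small hand-built synonym map stands in for the online thesaurus.
SYNONYMS = {
    "big": "large",
    "small": "tiny",
    "fast": "quick",
    "road": "path",
}

def synonym(word):
    """Return a replacement for word, or the word itself if none is known."""
    return SYNONYMS.get(word, word)

def replace_words(text):
    """Replace every word in a simple space-separated string --
    no punctuation, capitalization, or file I/O to worry about yet."""
    return " ".join(synonym(w) for w in text.split())

print(replace_words("the big dog ran down the road"))
# prints: the large dog ran down the path
```

Once that works, each complexity (punctuation, capitalization, reading from 
a file, the real thesaurus lookup) can be added one at a time, testing as 
you go.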


And try to do each operation once, both for efficiency and to avoid 
errors when the operation is slightly different the two times you do 
it.  To me that means the final program should have this kind of flow:

Open the file, and start parsing.
    if the next sequence is punctuation and white space, copy it to the 
output.
    if the next sequence is a word, extract it, call synonym() on it, 
and copy that to the output.
    continue till the infile is done.
Perhaps copy the outfile on top of the infile.
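A sketch of that single-pass flow, using a re.findall() pattern that 
alternates between word runs and everything else (the pattern, the file 
names, and the synonym() lookup here are my own illustration, not your 
original code):

```python
import re

# Toy stand-in for the thesaurus lookup.
SYNONYMS = {"big": "large", "road": "path"}

def synonym(word):
    return SYNONYMS.get(word, word)

def rewrite(infile_name, outfile_name):
    """Read the input once.  Word runs go through synonym();
    punctuation and whitespace runs are copied through unchanged."""
    with open(infile_name) as fin, open(outfile_name, "w") as fout:
        text = fin.read()
        # Tokens alternate: a run of word characters, or a run of
        # anything else.  Together they cover the whole file.
        for token in re.findall(r"[A-Za-z']+|[^A-Za-z']+", text):
            if token[0].isalpha() or token[0] == "'":
                fout.write(synonym(token))
            else:
                fout.write(token)
```

This way the file is opened and parsed exactly once, and nothing but 
actual words ever reaches the lookup.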

See the suggestions for functions I made much earlier in this thread.


DaveA

