[Tutor] thesaurus
Dave Angel
davea at ieee.org
Mon Jul 13 13:57:54 CEST 2009
Pete Froslie wrote:
> The url function btw:
>
> def url():
>     fin = open("journey_test.txt", "r")
>     response = re.split(r"[/|/,\n, , ,:\"\"\.?,)(\-\<>\[\]'\r']",
>                         fin.read())
>     # API_URL is established at the start of the code
>     thesaurus = API_URL + response[word_number] + '/'
>     return thesaurus
>
>
> yes. Essentially, it grabs each word from a text file and combines it with
> the other stuff to create a url string that I send through an API to an
> online thesaurus. I was attempting to strip it this way as a weak method for
> cleaning out the words I don't want searched from the text file.
>
> Along with the other functions, the code currently scans a text file and
> replaces each of its words with one of their synonyms... slightly legible
> gibberish, but that's what I'm interested in for now. It is a project to
> help me start learning Python, as well as being intended to take a different
> form in an artwork I'm working on.
>
> thanks.
> petef
>
>
BTW, constants like API_URL are fine uses for globals.
I can't get my head around this url() function definition, or how it
would be called. It appears that your program flow looks like:

  Open the file and find out how many words are in it.
  For each word:
    Open the file again, and parse the whole thing yet again, to figure
    out the nth word. Then throw away the parsing, plus any information
    about where in the file the word was found, keeping only a generated
    URL.
    Use the URL to access a website, looking up a replacement word.
    Re-open the file, parse it till you find something resembling the
    word involved, then substitute the synonym, and write it back out.
So if you have a file with 1000 words in it, you'll open it and parse it
2001 times. And part of your symptoms result from the second parsing
trying to replace words that aren't really words.
If I were you, I'd simplify the problem, and solve the simplified
problem in a pretty way, keeping in mind what the complete problem is.
Then expand it incrementally from there.
For example, instead of a website, you can build a map of synonyms (see
my example earlier). Just a dozen should be fine for testing.
Instead of a file with arbitrary text, you could use a simple string of
words, without punctuation, capitalization or other complexities.
Now, write a set of functions that solve that problem, and gradually add
in the complexities of the original problem.
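To make that concrete, here's a minimal sketch of the simplified problem. The synonym map and the sample sentence are invented for illustration; a dozen hand-picked pairs is plenty for testing before the real thesaurus gets involved:

```python
# A tiny hand-built synonym map; these pairs are made up for testing.
SYNONYMS = {
    "big": "large",
    "small": "tiny",
    "fast": "quick",
    "road": "path",
}

def synonym(word):
    """Return a synonym if we know one, else the word unchanged."""
    return SYNONYMS.get(word, word)

def replace_words(text):
    """Replace every known word in a simple, punctuation-free string."""
    return " ".join(synonym(word) for word in text.split())

print(replace_words("the big dog ran down the road"))
# -> the large dog ran down the path
```

Once this works, synonym() is the single place to swap in the real API lookup, and replace_words() is the single place to add handling for punctuation and capitalization.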
And try to do each operation only once, both for efficiency and to avoid
errors when the operation is slightly different the two times you do it.
To me that means the final program should have this kind of flow:
  Open the file, and start parsing:
    If the next sequence is punctuation and whitespace, copy it to
    the output.
    If the next sequence is a word, extract it, call synonym() on it,
    and copy that to the output.
    Continue till the input file is done.
  Perhaps copy the output file on top of the input file.
See the suggestions for functions I made much earlier in this thread.
DaveA