[Tutor] comparing files

D Elliott debe at comp.leeds.ac.uk
Wed Sep 15 20:52:33 CEST 2004


I am completely new to programming and have been learning Python for about
a week. I have read and worked through the first few chapters of:

- Python Tutorial (van Rossum et al.)
- Non-Programmers Tutorial for Python (Cogliati)
- Learn to program using Python (Gauld)
- How to Think Like a Computer Scientist (Downey et al.)

For my PhD in machine translation evaluation, my first programming task is
to automatically detect (and then count) all the words that the system
failed to translate into English (i.e. words that are still in French).
My plan is as follows:

- Read in a file containing MT output (usually about 400 words)
- Compare it with a file containing a complete English word list
- Write all words that do not appear in the word list to a separate file
- Count the words in the MT output and print the percentage that were not
found (the assumption is that these will be untranslated words; obviously
this will have to be tested and tweaked)
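The four steps above map fairly directly onto a short Python script. What follows is a minimal sketch, not a tested tool. It assumes the word list has one word per line, that both files are UTF-8, and that a "word" is a run of letters, possibly joined by apostrophes so that French elisions such as l'homme stay together. The names load_wordlist, check_translation and WORD_RE are hypothetical.

```python
import re

# Assumed tokenizer (hypothetical): a word is a run of Unicode letters,
# optionally joined by straight apostrophes ("l'homme", "don't").
# Digits and punctuation are skipped entirely.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def load_wordlist(path):
    """Read the English word list (assumed one word per line) into a set."""
    with open(path, encoding="utf-8") as f:
        # A set gives fast 'in' lookups; blank lines are ignored.
        return {line.strip().lower() for line in f if line.strip()}


def check_translation(mt_path, wordlist_path, out_path):
    """Write every word of the MT output that is missing from the word
    list to out_path, one per line, and return
    (total_words, words_not_found, percentage_not_found)."""
    english = load_wordlist(wordlist_path)

    # Step 1: read the MT output and split it into words.
    with open(mt_path, encoding="utf-8") as f:
        words = WORD_RE.findall(f.read())

    # Step 2: compare each word against the word list (case-insensitive).
    not_found = [w for w in words if w.lower() not in english]

    # Step 3: write the words that were not found to a separate file.
    with open(out_path, "w", encoding="utf-8") as out:
        for w in not_found:
            out.write(w + "\n")

    # Step 4: count the words and work out the percentage not found.
    total = len(words)
    percent = 100.0 * len(not_found) / total if total else 0.0
    return total, len(not_found), percent
```

Called as, say, check_translation("mt_output.txt", "english_words.txt", "not_found.txt"), it returns the total word count, the number of words not found and the percentage. Lower-casing each word before the lookup means sentence-initial capitals still match. Proper nouns, however, will still be reported as "not found", which is one of the things the heuristic will need to be tested and tweaked for.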

I now know how to read and write files, but not how to compare them. Would
you say this is a particularly advanced task? My supervisor seemed to
think that I could learn how to do this within a week by just skimming
through the books and finding the relevant code. Is this realistic for a
complete beginner? I, on the other hand, prefer to fully understand what I
am doing! (BTW, my supervisor does not know Python.)
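In this task, "comparing" the two files is not a line-by-line diff of the kind the standard library's filecmp and difflib modules perform. It is a lookup: for each word in the MT output, ask whether it appears in the English word list. Storing the word list in a Python set keeps each lookup fast (the in operator on a list scans it element by element), and set difference gives each distinct unknown word once. A small sketch with made-up data, assuming a lower-case comparison is acceptable:

```python
# Made-up example data; in practice both would be read from the files.
english_words = {"the", "house", "is", "big", "but"}
mt_output = "The maison is big but the maison is vieille".split()

# Membership test: every occurrence is checked, so repeats are kept.
not_found = [w for w in mt_output if w.lower() not in english_words]
print(not_found)  # ['maison', 'maison', 'vieille']

# Set difference: each distinct unknown word appears only once.
unknown = {w.lower() for w in mt_output} - english_words
print(sorted(unknown))  # ['maison', 'vieille']
```

The per-occurrence list is the one to base the percentage on; the set is convenient for reviewing which distinct words were flagged.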

Could anyone please tell me how long you think it should take a keen
beginner to get to that level, and which aspects of Python you would
recommend I learn first? Does anyone know of a book or tutorial that
shows how to do the above tasks?

Thanks in advance to anyone who can enlighten me:)
Debbie
-- 
***************************************************
Debbie Elliott
Computer Vision and Language Research Group,
School of Computing,
University of Leeds,
Leeds LS2 9JT
United Kingdom.
Tel: 0113 3437288
Email: debe at comp.leeds.ac.uk
***************************************************