[Tutor] comparing files
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Wed Sep 15 22:33:44 CEST 2004
On Wed, 15 Sep 2004, D Elliott wrote:
> I am completely new to programming and have been learning Python for
> about a week. I have looked through and worked through the first few
> chapters of:
>
> - Python Tutorial (Rossum et al)
> - Non-Programmers Tutorial for Python (Cogliati)
> - Learn to program using Python (Gauld)
> - How to think like a computer scientist (Downey et al)
Hi Debbie,
I'd say it's ambitious, but still realistic. You may want to stretch the
time for another week or two, but doing it in a week is still possible if
you work hard at it.
I recommend focusing on getting the core concepts of programming down.
The tutorials that you're looking at should be of great help. If you see
something that talks about how to write a class, or how to use modules,
skim it for now: you can come back to that material later, when you're
more familiar with the language.
I'd also recommend skimming the Python Tutorial rather than reading it too
deeply yet. The material in Guido's tutorial touches mostly on the
differences between Python and other languages. Its target audience is
folks who are already programmers.
The other three tutorials are tailored toward beginners, and those should
be more approachable.
> The idea I have is as follows:
>
> - Read in a file containing MT output (usually about 400 words)
> - Compare it with a file containing a complete English word list
> - Print all words that do not appear in the wordlist in a separate file
> - Count the words in the file and print the percentage of not found words
> (The assumption is that these will be untranslated words - obviously this
> will have to be tested and tweaked)
Yes, if you finish and understand the material from those tutorials, you
should be able to do this. A straightforward solution requires the
following concepts:
    1. File IO and string manipulation: you'll need to break your two
       input files (MT text and English words) into words.
    2. Basic data structures to hold the list of words in memory.
       "Dictionaries", in particular, will help you do the comparisons in an
       efficient way. You can actually get away with using basic "lists"
       too, although your program may be a little less efficient.
    3. Control flow. You should feel comfortable with things like
       'loops' and conditional 'if' statements.
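
To make those three concepts a bit more concrete, here is a minimal sketch of how they might fit together. It is only an illustration, not a finished solution: the sample strings are made up and stand in for your two files, and splitting on whitespace ignores punctuation, which you would have to deal with in real text.

```python
# Hypothetical sample data standing in for the contents of the two files.
mt_text = "the cat sat on the Matte with a chien"
english_words_text = "a\ncat\non\nsat\nthe\nwith\n"

# 1. String manipulation: break both inputs into lowercase words.
mt_words = mt_text.lower().split()
english_words = english_words_text.lower().split()

# 2. Data structure: a dictionary keyed by word makes each lookup fast.
known = {}
for word in english_words:
    known[word] = True

# 3. Control flow: loop over the MT words and test each one.
not_found = []
for word in mt_words:
    if word not in known:
        not_found.append(word)

print(not_found)   # ['matte', 'chien']
```

If you replaced the dictionary with a plain list, the 'not in' test would still work; it would just have to scan the whole list for every word.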
I'd strongly suggest one more concept:
4. Functions. They're great for managing the complexity of programs.
I've seen programs written that don't use functions, and frankly, most of
them are a big mess. *grin*
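
As a rough idea of what I mean, here is one way the whole task could be divided into small functions. All of the function names and the file paths are invented for this example, and I've used a "set" for the lookups (a close relative of the dictionary), so treat it as a sketch under those assumptions rather than the one right design.

```python
def read_words(path):
    """Return the lowercase, whitespace-separated words in a text file."""
    with open(path, encoding="utf-8") as f:
        return f.read().lower().split()


def find_unknown(words, known_words):
    """Return the words, in order, that do not appear in known_words."""
    known = set(known_words)          # fast membership tests
    unknown = []
    for word in words:
        if word not in known:
            unknown.append(word)
    return unknown


def percent_unknown(words, unknown):
    """Return the percentage of words that were not found."""
    if not words:                     # avoid dividing by zero on empty input
        return 0.0
    return 100.0 * len(unknown) / len(words)


def report(mt_path, wordlist_path, out_path):
    """Write the unknown words to out_path, one per line; return the percentage."""
    words = read_words(mt_path)
    unknown = find_unknown(words, read_words(wordlist_path))
    with open(out_path, "w", encoding="utf-8") as out:
        for word in unknown:
            out.write(word + "\n")
    return percent_unknown(words, unknown)
```

Each function does one job, so you can try it out on its own in the interpreter before putting the pieces together, e.g. report("mt.txt", "wordlist.txt", "unknown.txt") with whatever file names you actually use.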
Those areas seem core to writing interesting programs; does anyone have
other suggestions?
> I, on the other hand, prefer to fully understand what I am doing! (BTW -
> my supervisor does not know Python)
Please feel free to bring up programming questions on this Tutor list;
we're here to help. We won't do homework, of course, but we can help you
identify useful programming concepts and clarify the material that
you're reading.
If you see something that you don't understand, ask. The volunteers on
this list will either try to explain it, or point you toward online
material that explains it well.
Good luck to you.