dummy needs help with Python
Tim Chase
python.list at tim.thechases.com
Sat Dec 27 10:28:10 EST 2008
> I am trying to find somebody who can give me a simple python
> program I can use to "program by analogy". I just want to
> read two CSV files and match them on several fields,
> manipulate some of the fields, and write a couple of output
> files.
...
> Please forgive me if this is so, and take pity on a stranger
> in a strange land.
Pittsburgh is a little strange, but not *that* bad :)
Just for fun, I threw together a simple (about 30 lines) program
to do what you describe. Consider it a bit of slightly belated
Christmas pity on the assumption that this isn't classwork (a
little googling suggests that it's not homework). It's 100%
untested, so if it formats your hard-drive, steals your spouse,
wrecks your truck, kicks your dog, makes a mess of your
trailer-home, and drinks all your beer, caveat coder. But you've
got the source, so you can vet it...and it's even commented a bit
for pedagogical amusement if you plan to mung with it :)
from csv import reader

SMALL = 'a.txt'
OTHER = 'b.txt'
smaller_file = {} # key->line mapping dict for the smaller file

f_a = open(SMALL)
r_a = reader(f_a)
#a_headers = next(r_a) # optionally discard a header row
# build up the map in smaller_file of key->line
for i, line in enumerate(r_a):
    a1, a2, a3, a4, a5 = line # name the fields
    key = a1, a3, a5 # the fields to match on
    if key in smaller_file:
        print("Duplicate key [%r] in %s:%i" % (key, SMALL, i+1))
        #continue # does the 1st or 2nd win? uncomment for 1st
    smaller_file[key] = line
f_a.close()

b = open(OTHER)
r_b = reader(b)
#b_headers = next(r_b) # optionally discard a header row
for i, line in enumerate(r_b):
    b1, b2, b3, b4, b5, b6, b7, b8, b9 = line
    key = b2, b8, b9 # must line up with the a1, a3, a5 key above
    if key not in smaller_file:
        print("Key for line #%i (%r) not in %s" % (i+1, key, SMALL))
        continue
    a1, a2, a3, a4, a5 = smaller_file[key]
    # do manipulation with a[1-5]/b[1-9] here
    # and do something with them
b.close()
It makes more sense if instead of calling them a[1-5]/b[1-9], you
actually use the field-names that may have been in the header rows
such as

    cost_center, store, location, manager_id = line
    key = cost_center, store, location
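
If the files really do have header rows, one way to get named fields
without unpacking by position is csv.DictReader, which uses the
first row as the field names. A minimal sketch, assuming Python 3
and assuming the header names are the ones used above; keyed_rows
and the SAMPLE data are made up for illustration:

```python
import csv
import io

# Hypothetical sample data standing in for a real file with a header row.
SAMPLE = "cost_center,store,location,manager_id\nCC1,Main St,PA,42\n"

def keyed_rows(f, key_fields):
    """Yield (key, row) pairs; each row is a dict keyed by header name."""
    for row in csv.DictReader(f):
        yield tuple(row[name] for name in key_fields), row

# io.StringIO stands in for an open file object here.
rows = list(keyed_rows(io.StringIO(SAMPLE),
                       ("cost_center", "store", "location")))
```

Each key is a tuple just like in the program above, so it can go
straight into the smaller_file dict.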
You may also have to manipulate some of the values to make
key-matches work, such as

    cc, store, loc, mgr = line
    cc = cc.strip().upper()
    store = store.strip().title()
    key = cc, store, loc

ensuring that you do the same manipulations for both files.
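
One way to make sure both files get the same treatment is to put
the cleanup in a single function and call it from both loops. A
small sketch (Python 3); make_key is a hypothetical name, and the
particular cleanups are just the ones shown above:

```python
def make_key(cc, store, loc):
    """Normalize the match fields the same way for either file."""
    return cc.strip().upper(), store.strip().title(), loc

# The same call is used when loading a.txt and when scanning b.txt.
key_from_a = make_key(" ab12 ", "main street ", "PA")
key_from_b = make_key("AB12", "Main Street", "PA")
```

Whether loc also needs cleaning up depends on your data.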
The code above reads the entire smaller file into memory and uses
it for fast lookup. However, if you have gargantuan files, you
may need to process them differently. You don't detail the
fields/organization of the files, so if they're both sorted by
key, you can change the algorithm to behave like the standard
*nix "join" command.
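
A rough sketch of that sorted-merge idea (Python 3), which only
ever holds one row from each file in memory. It assumes both
inputs are already sorted ascending by key in the order Python
compares the keys, and that keys are unique in the first input;
merge_join and the sample rows are made up for illustration:

```python
def merge_join(rows_a, rows_b, key_a, key_b):
    """Yield (a_row, b_row) pairs whose keys match.

    Both inputs must be sorted ascending by key, and keys are
    assumed to be unique in rows_a.
    """
    iter_a = iter(rows_a)
    a = next(iter_a, None)
    for b in rows_b:
        kb = key_b(b)
        # Skip rows of A whose keys sort before the current B key.
        while a is not None and key_a(a) < kb:
            a = next(iter_a, None)
        if a is not None and key_a(a) == kb:
            yield a, b

a_rows = [["1", "x"], ["3", "y"], ["5", "z"]]
b_rows = [["1", "p"], ["2", "q"], ["3", "r"], ["3", "s"], ["6", "t"]]
pairs = list(merge_join(a_rows, b_rows,
                        key_a=lambda r: (r[0],),
                        key_b=lambda r: (r[0],)))
```

In real use rows_a and rows_b would be csv readers over the two
files rather than lists. Rows of b with no match are silently
skipped here, which may or may not be what you want.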
Other asides: you may have to tweak treatment of a header-row
(and correspondingly the line-numbers), as well as
conflict-handling for keys in your a.txt source if they exist,
along with the behavior when a key can't be found in a.txt but is
requested in b.txt (maybe set some defaults instead of logging
the error and skipping the row?), and then lastly and most
importantly, you have to fill in the manipulations you desire and
then actually do something with the processed results (write them
to a file, upload them to a database, send them via email, output
them to a text-to-speech engine and have it speak them, etc).
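
As one possible shape for those last two pieces, here is a sketch
(Python 3) that substitutes default values when a key is missing
and writes the combined rows out with csv.writer. The function
name write_joined, the defaults, and the sample rows are all
assumptions for illustration, not part of the program above:

```python
import csv
import io

def write_joined(smaller_file, rows_b, key_of_b, out, defaults):
    """Write each b row prefixed by its matching a row (or defaults)."""
    writer = csv.writer(out)
    for b_row in rows_b:
        # dict.get falls back to the defaults instead of skipping the row
        a_row = smaller_file.get(key_of_b(b_row), defaults)
        writer.writerow(list(a_row) + list(b_row))

smaller = {("k1",): ["k1", "v1"]}
rows_b = [["k1", "x"], ["k2", "y"]]
out = io.StringIO()  # stands in for a file opened for writing
write_joined(smaller, rows_b, lambda r: (r[0],), out, defaults=["", ""])
lines = out.getvalue().splitlines()
```

When writing to a real file in Python 3, the csv documentation
recommends opening it with newline='' so the writer controls the
line endings.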
> I come from 30 years of mainframe programming so I understand
> how computers work at a bits/bytes /machine language/ source
> vs.executable/reading core dumps level, and I can program in
> a lot of languages most people using Python have never even
> heard of,
If there's such urgency, I hope you resorted to simply using one
of the multitude of other languages you know -- Even in C, this
wouldn't be too painful as projects go (there's a phrase you
won't hear me utter frequently). Or maybe try your hand at it in
Pascal, shell-scripting (see the "join" command) or even assembly
language. Not sure I'd use Logo, Haskell, Erlang, or Prolog. :)
> My problem is that I want to do this all yesterday, and the
> Python text I bought is not easy to understand. I don't have
> time to work my way through the online Python tutorial.
As Rick mentioned, there are a number of free online sources for
tutorials, books, and the like. Dive Into Python is one of the
classics. Searching the archives of comp.lang.python for
"beginner books" will yield the same thread coming up every
couple weeks. For future reference, if you've got time-sensitive
projects to tackle "yesterday", it's usually not the best time to
try and learn a new language. Good luck in your exploration of
Python.
-tkc