attribute access and indirection
Ethan Furman
ethan at stoneleaf.us
Thu Oct 22 09:21:13 EDT 2009
Greetings, List!
Say I have an old-fashioned dbf-style table with a single name field of
50 characters:

names = dbf.Table(':memory:', 'name C(50)')
Then I add a bunch of names from who-knows-where:
for name in some_iterable():
    names.append(name)
Now I want to know how many start with a 'J'...
I have two options: 1) brute force:
matches = [rec for rec in names if rec.name[0] == 'J']
or, 2) binary search after ordering:
names.order('name[:1]')
matches = names.search('J', startswith=True)
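The same two approaches can be sketched with the stdlib alone (hypothetical stand-in data; `bisect` plays the role of the table's ordered search — the real dbf API differs):

```python
import bisect

# Hypothetical stand-in for the table's name field.
names = ["Alice", "John", "Jane", "Bob", "Jim"]

# 1) Brute force: scan every record.
matches = [n for n in names if n[:1] == "J"]

# 2) Binary search after ordering: sort once, then find the
#    half-open range of entries whose first letter is "J".
names.sort(key=lambda n: n[:1])
keys = [n[:1] for n in names]
lo = bisect.bisect_left(keys, "J")
hi = bisect.bisect_right(keys, "J")
matches2 = names[lo:hi]

assert sorted(matches) == sorted(matches2)
```

The payoff of option 2 is that the sort cost is paid once, while each subsequent lookup is O(log n) instead of a full scan.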
So far so good. Now it gets a little more complicated. In my use case
I am trying to match records from one database to records from another
database; therefore, I do _not_ know what text I will be searching for,
only the fields I will be using.
If I only had one criterion, I'd still be okay:
different_table.order('zipcode[:5], last_name')
for record in original_table:
    matches = different_table.search([record.zipcode[:5], record.last_name])
However, I have three different sets of matches:
'first_name[:1], last_name, city, dlvryaddrs[:4]'
'first_name[:1], last_name[:5], dlvryaddrs[:8]'
'first_name, last_name, city, state'
This is not a problem for the ordering, as I can just do
for criteria in choices:
    different_table.order(criteria)
The problem comes at the matching stage: the .search method is
expecting a list of the pieces it is supposed to find, so what I need is
a way to apply, for example, 'first_name[:1], last_name[:5],
dlvryaddrs[:8]', to the current record to yield the text to search for.
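One way to sketch that step (records modeled as plain dicts here, and the `make_key` helper is a hypothetical name — the field names and slice syntax just follow the criteria strings above):

```python
import re

def make_key(record, criteria):
    """Parse 'field[:n], field, ...' and pull the corresponding
    (possibly sliced) values out of record."""
    key = []
    for piece in criteria.split(','):
        piece = piece.strip()
        m = re.fullmatch(r'(\w+)(?:\[:(\d+)\])?', piece)
        if m is None:
            raise ValueError('bad criteria piece: %r' % piece)
        value = record[m.group(1)]
        if m.group(2) is not None:
            value = value[:int(m.group(2))]
        key.append(value)
    return key

record = {'first_name': 'Ethan', 'last_name': 'Furman',
          'city': 'Portland', 'dlvryaddrs': '123 Main St'}
make_key(record, 'first_name[:1], last_name[:5], dlvryaddrs[:8]')
# -> ['E', 'Furma', '123 Main']
```

The returned list is shaped exactly like the literal list you would otherwise hand to .search by hand.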
Current code follows, more comments towards the end.
<code>
import dbf
import shutil
from collections import defaultdict
from cookbook.utils import index

source_tables = [ '/temp/kaz15514',
                  '/temp/kks15515',
                  '/temp/kmn15585',
                  '/temp/knv15516',
                  '/temp/ktx15722',
                  '/temp/kwa15584',
                  '/temp/mco15902',
                  '/temp/msq15994' ]

counts = defaultdict(int)

for i in index(source_tables):
    source_tables[i] = dbf.Table(source_tables[i])

shutil.copy('z:/orders/25105/mbk16508_02', '.')
match_back = dbf.Table('mbk16508_02')
match_back.add_fields('f1ltcta4 C(100), f1l5a8 C(100), ftltctst C(100)')

for field, criteria in \
        (('f1ltcta4', 'first_name[:1], last_name, city, dlvryaddrs[:4]'),
         ('f1l5a8', 'first_name[:1], last_name[:5], dlvryaddrs[:8]'),
         ('ftltctst', 'first_name, last_name, city, state')):
    match_back.order(criteria)
    for table in source_tables:
        counts = defaultdict(int)
        for record in match_back:
            matches = table.search(?????)
            .
            .
            .
</code>
The only idea I have at the moment is to parse the string (much like I
do in the order method) and, once it's parsed, pluck out the needed
pieces. If that is the best and/or most practical way to do it, I was
thinking of adding __call__ to the record class. Then, besides being
able to do:
matches = table.search([record.zip4[:5], record.dlvryaddrs])
I could also do:
matches = table.search(record('zip4[:5], dlvryaddrs'))
or, equivalently,
criteria = 'this, that[:7], the_other'
matches = table.search(record(criteria))
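A minimal sketch of what that __call__ might look like (a toy Record class stands in for the real dbf record class; the regex-based parsing is an assumption about how the criteria string would be handled):

```python
import re

class Record(object):
    """Toy stand-in for a dbf record: fields become attributes."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __call__(self, criteria):
        """Turn 'field[:n], field, ...' into the list of values
        that .search() would expect."""
        values = []
        for piece in criteria.split(','):
            m = re.fullmatch(r'(\w+)(?:\[:(\d+)\])?', piece.strip())
            if m is None:
                raise ValueError('bad criteria piece: %r' % piece)
            value = getattr(self, m.group(1))
            if m.group(2) is not None:
                value = value[:int(m.group(2))]
            values.append(value)
        return values

record = Record(this='abc', that='1234567890', the_other='xyz')
record('this, that[:7], the_other')
# -> ['abc', '1234567', 'xyz']
```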
Any better ideas? Am I missing anything already in the stdlib?
Any and all tips appreciated!
~Ethan~
P.S.
<shameless plug> Python d-Base currently lives at
http://groups.google.com/group/python-dbase, and its main purpose in
life is to ease the transition between old dbf files and newer sql
tables. It can, however, be used for read/write access to dBase III and
VFP 6 tables, including memo fields.
Success stories and bug reports both equally welcome! :D
</shameless plug>