pytables - best practices / mem leaks
py_genetic
conor.robinson at gmail.com
Tue Jul 18 16:45:27 EDT 2006
py_genetic wrote:
> I have an H5 file with one group (off the root) and two large main
> tables, and I'm attempting to aggregate my data into 50+ new groups (off
> the root) with two tables per subgroup.
>
> sys info:
> PyTables version: 1.3.2
> HDF5 version: 1.6.5
> numarray version: 1.5.0
> Zlib version: 1.2.3
> BZIP2 version: 1.0.3 (15-Feb-2005)
> Python version: 2.4.2 (#1, Jul 13 2006, 20:16:08)
> [GCC 4.0.1 (Apple Computer, Inc. build 5250)]
> Platform: darwin-Power Macintosh (v10.4.7)
> Byte-ordering: big
>
> Ran all PyTables tests included with the package and received an OK.
>
>
> Using the following code I get one of three errors:
>
> 1. Illegal Instruction
>
> 2. Malloc(): trying to call free() twice
>
> 3. Bus Error
>
> I believe all three stem from the same issue, involving a malloc()
> memory problem in the PyTables C libraries. I also believe this may be
> due to how I've written my sorting script.
>
> The script executes fine and all goes well until I'm sorting about
> group 20 to 30, when I get one of the three errors above, depending on
> how and when I flush() and close() the file. When I open the file after
> the error using h5ls, all tables are in perfect order up to the crash,
> and if I continue from that point everything runs fine until Python
> throws the same error again after another 10 sorts or so. The somewhat
> random crashing is what leads me to believe I have a memory leak or
> that my method of doing this is incorrect.
>
> Is there a better way to aggregate data using PyTables/Python? This
> seems straightforward enough.
>
> Thanks,
> Conor
>
> # function to agg state data from main neg/pos tables into neg/pos state tables
>
> import string
> import tables
>
>
> def aggstate(state, h5file):
>
>     print state
>
>     class PosRecords(tables.IsDescription):
>         sic = tables.IntCol(0, 1, 4, 0, None, 0)
>         numsic = tables.IntCol(0, 1, 4, 0, None, 0)
>         empsiz = tables.StringCol(1, '?', 1, None, 0)
>         salvol = tables.StringCol(1, '?', 1, None, 0)
>         popcod = tables.StringCol(1, '?', 1, None, 0)
>         state = tables.StringCol(2, '?', 1, None, 0)
>         zip = tables.IntCol(0, 1, 4, 0, None, 1)
>
>     class NegRecords(tables.IsDescription):
>         sic = tables.IntCol(0, 1, 4, 0, None, 0)
>         numsic = tables.IntCol(0, 1, 4, 0, None, 0)
>         empsiz = tables.StringCol(1, '?', 1, None, 0)
>         salvol = tables.StringCol(1, '?', 1, None, 0)
>         popcod = tables.StringCol(1, '?', 1, None, 0)
>         state = tables.StringCol(2, '?', 1, None, 0)
>         zip = tables.IntCol(0, 1, 4, 0, None, 1)
>
>     group1 = h5file.createGroup("/", state+"_raw_records",
>                                 state+" raw records")
>
>     table1 = h5file.createTable(group1, "pos_records", PosRecords,
>                                 state+" raw pos record table")
>     table2 = h5file.createTable(group1, "neg_records", NegRecords,
>                                 state+" raw neg record table")
>
>     table = h5file.root.raw_records.pos_records
>     point = table1.row
>     for x in table.iterrows():
>         if x['state'] == state:
>             point['sic'] = x['sic']
>             point['numsic'] = x['numsic']
>             point['empsiz'] = x['empsiz']
>             point['salvol'] = x['salvol']
>             point['popcod'] = x['popcod']
>             point['state'] = x['state']
>             point['zip'] = x['zip']
>
>             point.append()
>
>     h5file.flush()
>
>     table = h5file.root.raw_records.neg_records
>     point = table2.row
>     for x in table.iterrows():
>         if x['state'] == state:
>             point['sic'] = x['sic']
>             point['numsic'] = x['numsic']
>             point['empsiz'] = x['empsiz']
>             point['salvol'] = x['salvol']
>             point['popcod'] = x['popcod']
>             point['state'] = x['state']
>             point['zip'] = x['zip']
>
>             point.append()
>
>     h5file.flush()
>
>
>
> states = ['AL','AK','AZ','AR','CA','CO','CT','DC','DE','FL',
>           'GA','HI','ID','IL','IN','IA','KS','KY','LA','ME',
>           'MD','MA','MI','MN','MS','MO','MT','NE','NV','NH',
>           'NJ','NM','NY','NC','ND','OH','OK','OR','PA','RI',
>           'SC','SD','TN','TX','UT','VT','VA','WA','WV','WI','WY']
>
> h5file = tables.openFile("200309_data.h5", mode = 'a')
>
> for state in states:
>     aggstate(state, h5file)
>
> h5file.close()

The problem with my posting above is that h5file.flush() should have
been table.flush() (flush the table, not the whole file object).
Although h5file.flush() is a real method, I don't believe it writes the
tables correctly; it caused all kinds of issues as time went on, and I
think it overlaps with .close(), causing more problems. I also now flush
table1 and table2 right after creating the new group and its two tables
in each iteration. Things are stable now. PyTables is great.
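
For anyone finding this thread later, here is a minimal sketch of the corrected copy loop. It is not the exact script I ran. It only assumes the table interface already used above (iterrows(), .row, append(), flush()); the helper name copy_state_rows and the FIELDS tuple are made up for illustration.

```python
# Column names taken from the PosRecords/NegRecords descriptions above.
FIELDS = ('sic', 'numsic', 'empsiz', 'salvol', 'popcod', 'state', 'zip')


def copy_state_rows(src_table, dst_table, state, fields=FIELDS):
    """Append the rows of src_table whose 'state' equals `state` to dst_table.

    Hypothetical helper: both arguments only need to behave like the
    tables in the post (iterrows(), .row, flush()).
    Returns the number of rows copied.
    """
    new_row = dst_table.row            # row buffer of the destination table
    copied = 0
    for rec in src_table.iterrows():
        if rec['state'] == state:
            for name in fields:
                new_row[name] = rec[name]
            new_row.append()           # queue the filled row for writing
            copied += 1
    # Flush the destination table itself, not the whole file object.
    dst_table.flush()
    return copied
```

Inside aggstate() this would replace each of the two loops, for example copy_state_rows(h5file.root.raw_records.pos_records, table1, state), with table2 and neg_records handled the same way.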