speed up pandas calculation
Skip Montanaro
skip.montanaro at gmail.com
Wed Jul 30 19:57:46 EDT 2014
> df = pd.read_csv('nhamcsopd2010.csv', index_col='PATCODE',
>                  low_memory=False)
> col_init = list(df.columns.values)
> keep_col = ['PATCODE', 'PATWT', 'VDAY', 'VMONTH', 'VYEAR', 'MED1',
>             'MED2', 'MED3', 'MED4', 'MED5']
> for col in col_init:
>     if col not in keep_col:
>         del df[col]
I'm no pandas expert, but a few things come to mind. First, where is
your code slow? Profile it, even if only with a few well-placed prints. If
the time goes to read_csv there might be little you can do, unless you load
the same data repeatedly, in which case you can save a pickled data frame
as a caching measure. Second, you loop over the columns, deciding one by one
whether to keep or toss each. Instead try

    df = df[keep_col]

(with 'PATCODE' left out of keep_col, since index_col='PATCODE' already
made it the index rather than a column).
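
A minimal sketch of that one-step selection, run against a tiny in-memory stand-in for the real file. The sample rows and the AGE and SEX columns are invented for illustration, and only a few of the MED columns are included. It assumes PATCODE is the index, as in your read_csv call, so it is not listed among the columns to keep.

```python
import io

import pandas as pd

# Tiny stand-in for nhamcsopd2010.csv; the rows and the AGE/SEX
# columns are made up purely for illustration.
csv_text = """PATCODE,PATWT,VDAY,VMONTH,VYEAR,MED1,AGE,SEX
1,10.5,3,1,2010,aspirin,34,1
2,20.0,4,2,2010,ibuprofen,51,2
"""

df = pd.read_csv(io.StringIO(csv_text), index_col='PATCODE')

# PATCODE became the index above, so it is not a column any more
# and is left out of the list of columns to keep.
keep_col = ['PATWT', 'VDAY', 'VMONTH', 'VYEAR', 'MED1']

# One selection instead of one del per unwanted column.
df = df[keep_col]

print(list(df.columns))
```

The result has only the listed columns, in the order given in keep_col, with PATCODE still available as df.index.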
Third, if deleting those other columns is costly, can you perhaps just
ignore them?
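
One way to ignore them, sketched under the assumption that the unwanted columns are never needed: ask read_csv for only the wanted columns via its usecols parameter, time the read crudely, and pickle the result so later runs can skip the CSV parse. The sample data, the AGE and SEX columns, and the cache file name subset.pkl are all hypothetical.

```python
import io
import os
import tempfile
import time

import pandas as pd

# Tiny stand-in for the real CSV; the rows and the AGE/SEX columns
# are invented for illustration.
csv_text = """PATCODE,PATWT,VDAY,AGE,SEX
1,10.5,3,34,1
2,20.0,4,51,2
"""

# Here PATCODE must be listed, because read_csv needs to read it
# before it can use it as the index.
keep_col = ['PATCODE', 'PATWT', 'VDAY']

# Crude timing around the read, in the spirit of "well-placed prints".
start = time.perf_counter()
df = pd.read_csv(io.StringIO(csv_text), usecols=keep_col,
                 index_col='PATCODE')
elapsed = time.perf_counter() - start
print('read_csv took %.4f seconds' % elapsed)

# Cache the trimmed frame; subset.pkl is a hypothetical file name.
cache_path = os.path.join(tempfile.mkdtemp(), 'subset.pkl')
df.to_pickle(cache_path)

# A later run could load the pickle instead of re-parsing the CSV.
cached = pd.read_pickle(cache_path)
print(list(cached.columns))
```

Whether this helps depends on where the time actually goes, so measure on the real file before and after.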
Can't be more investigative right now. I don't have pandas on Android. :-)
Skip