speed up pandas calculation

Vincent Davis vincent at vincentdavis.net
Thu Jul 31 02:57:15 CEST 2014


On Wed, Jul 30, 2014 at 6:28 PM, Vincent Davis <vincent at vincentdavis.net>
wrote:

> The real slow part seems to be
> for n in drugs:
>     df[n] = df[['MED1','MED2','MED3','MED4','MED5']].isin([drugs[n]]).any(1)
>

I was wrong: this loop is fast. It was selecting the columns that was slow.
Keeping only the columns I need, using

keep_col = ['PATCODE', 'PATWT', 'VDAYR', 'VMONTH', 'MED1', 'MED2', 'MED3',
            'MED4', 'MED5']
df = df[keep_col]

took the time down from 19 seconds to 2 seconds.
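
For anyone who wants to try this, here is a minimal, self-contained sketch of
the same idea. The sample data, the UNUSED column, and the contents of the
drugs dict are made up for illustration and are not from my real data. It
also assumes each drugs[n] is a single value, as in the loop above, and it
selects the MED columns once before the loop, which is a small addition on
top of what I actually did.

```python
import pandas as pd

# Hypothetical sample data; the real frame has many more columns.
df = pd.DataFrame({
    "PATCODE": [1, 2, 3],
    "PATWT": [10.0, 20.0, 30.0],
    "VDAYR": [1, 2, 3],
    "VMONTH": [1, 1, 2],
    "MED1": ["aspirin", "ibuprofen", "none"],
    "MED2": ["none", "aspirin", "none"],
    "MED3": ["none", "none", "none"],
    "MED4": ["none", "none", "none"],
    "MED5": ["none", "none", "codeine"],
    "UNUSED": [0, 0, 0],  # stands in for the columns that are not needed
})

# Hypothetical mapping of new column name -> drug value to look for.
drugs = {"has_aspirin": "aspirin", "has_codeine": "codeine"}

keep_col = ["PATCODE", "PATWT", "VDAYR", "VMONTH",
            "MED1", "MED2", "MED3", "MED4", "MED5"]
med_cols = ["MED1", "MED2", "MED3", "MED4", "MED5"]

# Drop the unneeded columns up front; copy() so later assignments
# go to an independent frame rather than a slice of the original.
df = df[keep_col].copy()

# Select the medication columns once instead of on every iteration.
meds = df[med_cols]

for name, value in drugs.items():
    # True for each row where any MED column equals the drug value.
    df[name] = meds.isin([value]).any(axis=1)

print(df[["PATCODE"] + list(drugs)])
```

Whether this gives the same speedup will depend on how wide the original
frame is, so time it on your own data before relying on it.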


Vincent Davis
720-301-3003