[BangPypers] Beginner questions with pandas and sql

Anupam Mediratta mediratta at gmail.com
Fri Oct 13 08:07:02 EDT 2017


Hello Folks,

I started using Pandas and am running into some issues while using it.

Primarily, my questions are how to:


   - Set a primary key or index on a dataframe
   - Construct a join of two dataframes


More precisely:

1. Running SQL queries on a dataframe takes a lot more time than I would
expect (the same query as an Excel filter is much faster).

I am setting the primary key as follows:

import pandas as pd

df_o = pd.read_csv('order_c.csv', encoding='latin1')
df_oi = df_o.set_index(['c_br_code', 'n_srno', 'c_item_code'])

The three columns passed to set_index are the key columns; their
combination is the primary key.

Similarly, I have another CSV file from which I construct df_gi, and then I
run a join query like this:

sq = """select * from df_oi join df_gi
        on df_oi.c_br_code = df_gi.c_order_br_code
       and df_oi.n_srno = df_gi.n_order_no
       and df_oi.c_item_code = df_gi.c_item_code"""

But this query never finishes. It uses 5 GB of memory, whereas the two CSV
files are 250 MB and 350 MB respectively.
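
For comparison, the same three-column join can probably be expressed
directly with pandas' own merge, without going through a SQL layer. The
following is only a minimal sketch: the tiny in-memory frames stand in for
the two CSV files, the non-key columns n_qty and n_grn_qty are hypothetical,
and it assumes the key columns are kept as ordinary columns rather than
moved into the index.

```python
import pandas as pd

# Tiny stand-in frames; the real data would come from pd.read_csv.
# n_qty and n_grn_qty are hypothetical columns used only for illustration.
df_o = pd.DataFrame({
    "c_br_code": ["B1", "B1", "B2"],
    "n_srno": [1, 2, 1],
    "c_item_code": ["X", "Y", "X"],
    "n_qty": [10, 20, 30],
})
df_g = pd.DataFrame({
    "c_order_br_code": ["B1", "B2", "B3"],
    "n_order_no": [1, 1, 9],
    "c_item_code": ["X", "X", "Z"],
    "n_grn_qty": [10, 25, 5],
})

# Inner join on the three key pairs, mirroring the ON clause of the SQL query.
joined = pd.merge(
    df_o,
    df_g,
    how="inner",
    left_on=["c_br_code", "n_srno", "c_item_code"],
    right_on=["c_order_br_code", "n_order_no", "c_item_code"],
)
print(joined)
```

If the key combination is not unique in both files, the result has one row
for every matching pair, so the output can be far larger than either input.
Checking df_o.duplicated(subset=[...]).any() on the key columns is one way to
find out.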

Also, any select query on df_oi still takes a lot of time (much more than
Excel filters).
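
A plain select can also be written with pandas indexing instead of SQL. The
sketch below shows two options under the same assumptions as my data layout:
a boolean mask on ordinary columns, and a key lookup on a sorted MultiIndex.
The sample rows and the n_qty column are hypothetical.

```python
import pandas as pd

df_o = pd.DataFrame({
    "c_br_code": ["B1", "B1", "B2"],
    "n_srno": [1, 2, 1],
    "c_item_code": ["X", "Y", "X"],
    "n_qty": [10, 20, 30],  # hypothetical non-key column
})

# Option 1: boolean mask on ordinary columns (roughly a WHERE clause).
mask = (df_o["c_br_code"] == "B1") & (df_o["n_srno"] == 2)
by_mask = df_o[mask]

# Option 2: set the three key columns as a MultiIndex and sort it,
# then look rows up by key with .loc.
df_oi = df_o.set_index(["c_br_code", "n_srno", "c_item_code"]).sort_index()
qty = df_oi.loc[("B1", 2, "Y"), "n_qty"]  # full key plus column -> one value
by_branch = df_oi.loc["B1"]               # first level only -> rows for B1

print(by_mask)
print(qty)
print(by_branch)
```

Sorting the index after set_index matters for the second option, since pandas
can warn about or slow down lookups on an unsorted MultiIndex.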

So I am sure I am missing something here. Can you please help?

Thanks

