[BangPypers] Beginner questions with pandas and sql
mediratta at gmail.com
Fri Oct 13 08:07:02 EDT 2017
I have started using pandas and am running into some issues with it.
My main questions are:
- How to set a primary key or index on a dataframe
- How to construct a join of two dataframes
1. Running SQL queries on a dataframe takes a lot more time than I would
expect (the same query as an Excel filter is much faster).
I am setting the primary key like this:
import pandas as pd
df_o = pd.read_csv('order_c.csv', encoding='latin1')
df_oi = df_o.set_index(['c_br_code', 'n_srno', 'c_item_code'])
where the three columns passed to set_index (highlighted in red in the
original message) together form the primary key.
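For illustration, here is a minimal sketch of the indexing step, written in the order it has to run (read first, then index). The small inline frame and its qty column are made-up stand-ins for order_c.csv, and verify_integrity=True is an optional check, not something from my original code, that makes set_index raise if the key combination repeats.

```python
import pandas as pd

# Stand-in for pd.read_csv('order_c.csv', encoding='latin1'); values are made up.
df_o = pd.DataFrame({
    "c_br_code": ["B1", "B1", "B2"],
    "n_srno": [1, 2, 1],
    "c_item_code": ["X", "Y", "X"],
    "qty": [10, 20, 30],  # hypothetical payload column
})

key_cols = ["c_br_code", "n_srno", "c_item_code"]

# verify_integrity=True raises ValueError if the combined key is duplicated.
df_oi = df_o.set_index(key_cols, verify_integrity=True)

# Report whether the three-column key is unique across all rows.
print(df_oi.index.is_unique)
```

If set_index raises here, the three columns do not actually form a unique key in the file.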
Similarly, I have another CSV file from which I construct df_gi, and then
I run a join query like this:
sq = ("select * from df_oi join df_gi "
      "on df_oi.c_br_code = df_gi.c_order_br_code "
      "and df_oi.n_srno = df_gi.n_order_no "
      "and df_oi.c_item_code = df_gi.c_item_code")
But this query never finishes. It takes 5 GB of memory, whereas the two CSV
files are 250 MB and 350 MB respectively.
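For comparison, the same three-column join can be written with pandas' own merge instead of a SQL string. This is a minimal sketch, not a confirmed fix: the inline frames, the qty and grn_qty columns, and the name df_g (standing for the second file before indexing) are assumptions for illustration. It also checks for repeated key combinations, since a join on non-unique keys multiplies matching rows, which is one possible but unverified reason for memory growing far beyond the file sizes.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files; only the key column names come from the post.
df_o = pd.DataFrame({
    "c_br_code": ["B1", "B1", "B2"],
    "n_srno": [1, 2, 1],
    "c_item_code": ["X", "Y", "X"],
    "qty": [10, 20, 30],  # hypothetical payload column
})
df_g = pd.DataFrame({
    "c_order_br_code": ["B1", "B2", "B2"],
    "n_order_no": [1, 1, 9],
    "c_item_code": ["X", "X", "Z"],
    "grn_qty": [10, 25, 5],  # hypothetical payload column
})

left_keys = ["c_br_code", "n_srno", "c_item_code"]
right_keys = ["c_order_br_code", "n_order_no", "c_item_code"]

# Check whether either side repeats a key combination before joining.
dup_left = bool(df_o.duplicated(subset=left_keys).any())
dup_right = bool(df_g.duplicated(subset=right_keys).any())
print("duplicate keys:", dup_left, dup_right)

# Inner join on the three key pairs, operating on plain columns (no set_index needed).
joined = df_o.merge(df_g, left_on=left_keys, right_on=right_keys, how="inner")
print(len(joined))
```

The merge works on ordinary columns, so the frames do not need to be indexed first for this form of the join.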
Also, any select query on df_oi still takes a lot of time (much more than
Excel filters).
So I am sure I am missing something. Can you please help?
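As a point of reference for the select case, a filter can also be expressed without SQL, either as a boolean mask on a column or as a lookup on a sorted MultiIndex. The sketch below uses a made-up frame in place of order_c.csv and a hypothetical qty column; whether it is faster on the real data is not something I have measured.

```python
import pandas as pd

# Made-up stand-in for the order file; only the key column names come from the post.
df_o = pd.DataFrame({
    "c_br_code": ["B1", "B1", "B2"],
    "n_srno": [1, 2, 1],
    "c_item_code": ["X", "Y", "X"],
    "qty": [10, 20, 30],  # hypothetical payload column
})

# Option 1: boolean mask on an ordinary column (similar to a spreadsheet filter).
b1_rows = df_o[df_o["c_br_code"] == "B1"]

# Option 2: index by the three key columns and sort, then look rows up by key.
df_oi = df_o.set_index(["c_br_code", "n_srno", "c_item_code"]).sort_index()

# Full-key lookup: a list with one tuple returns a one-row DataFrame.
one_order = df_oi.loc[[("B1", 2, "Y")]]

# Partial-key lookup on the first level returns every row for that branch.
branch = df_oi.loc["B1"]

print(len(b1_rows), len(one_order), len(branch))
```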