what is happening in panda "where" clause
Peter Otten
__peter__ at web.de
Fri Sep 22 10:10:05 EDT 2017
Exposito, Pedro (RIS-MDW) wrote:
> This code does a "where" clause on a panda data frame...
>
> Code:
> import pandas as pd;
> col_names = ['Name', 'Age', 'Weight', "Education"];
> # create panda dataframe
> x = pd.read_csv('test.dat', sep='|', header=None, names = col_names);
> # apply "where" condition
> z = x[ (x['Age'] == 55) ]
> # prints row WHERE age == 55
> print (z);
>
> What is happening in this statement:
> z = x[ (x['Age'] == 55) ]
>
> Thanks,
Let's take it apart into individual steps:
Make up example data:
>>> import pandas as pd
>>> x = pd.DataFrame([["Jim", 44], ["Sue", 55], ["Alice", 66]],
...                  columns=["Name", "Age"])
>>> x
    Name  Age
0    Jim   44
1    Sue   55
2  Alice   66
Have a look at the inner expression:
>>> x["Age"] == 55
0    False
1     True
2    False
So this is basically a vector of boolean values. If you want more details:
in numpy, operations involving a scalar and an array work via
"broadcasting". In pure Python you would write something similar as
>>> [v == 55 for v in x["Age"]]
[False, True, False]
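To see the broadcasting step in isolation, here is a minimal sketch using a plain numpy array instead of a DataFrame column. The names ages and mask are made up for illustration and are not part of the original code:

```python
import numpy as np

# Hypothetical stand-in for the "Age" column of the example data.
ages = np.array([44, 55, 66])

# The scalar 55 is compared against every element of the array,
# producing a boolean array of the same length.
mask = ages == 55

print(mask)        # one boolean per element of ages
print(mask.dtype)  # the result is a boolean array
```

The pandas comparison x["Age"] == 55 behaves the same way, except that the result is a Series that also carries the row labels.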
Use a list of booleans like this as an index (here with two True entries):
>>> x[[False, True, True]]
    Name  Age
1    Sue   55
2  Alice   66
[2 rows x 2 columns]
This is again in line with numpy arrays -- if you pass an array of boolean
values as an index the values in the True positions are selected. In pure
Python you could achieve that with
>>> index = [v == 55 for v in x["Age"]]
>>> index
[False, True, False]
>>> [v for b, v in zip(index, x["Age"]) if b]
[55]
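Putting the two steps back together, the following sketch rebuilds the statement from the question on the made-up example data. The intermediate name mask is my own addition, and the numpy part at the end is only there to show the analogous behaviour on a plain array:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame([["Jim", 44], ["Sue", 55], ["Alice", 66]],
                 columns=["Name", "Age"])

# Step 1: the comparison yields a boolean Series aligned with x's index.
mask = x["Age"] == 55

# Step 2: indexing the frame with that Series keeps only the True rows.
z = x[mask]
print(z)

# The two steps give the same frame as the one-liner from the question.
assert z.equals(x[(x["Age"] == 55)])

# numpy analogue of step 2: a boolean array used as an index.
ages = np.array([44, 55, 66])
print(ages[ages == 55])
```

Splitting the expression like this can also help when debugging, because you can inspect mask on its own before using it to select rows.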