pandas split and melt()
Peter Otten
__peter__ at web.de
Wed Jun 26 04:13:55 EDT 2019
Sayth Renshaw wrote:
> Hi
>
> Having fun with pandas filtering a work excel file.
> My current script opens selected and filters the data and saves as excel.
>
> import pandas as pd
> import numpy as np
>
> log = pd.read_excel("log_dump_py.xlsx")
> df = log.filter(items=['Completed', 'Priority', 'Session date',
> 'Consultant', 'Coach',
> 'Delivery Method', 'Focus of Coaching', 'Leader', 'Site',
> 'Coaching Description','Motor/Property',
> ],)
> completed_tasks = df.loc[(df['Completed'] == 'Yes') &
> (df['Motor/Property'] == 'Motor') & (df['Delivery Method'] == 'Group
> Coaching')] print(completed_tasks.head(n=5))
> completed_tasks.to_excel("filtered_logs.xlsx")
>
> This leaves me with a set of several columns. The main column of concern
> for this example is a consultant
>
> Session date Consultant
> 2019-06-21 11:15:00 WNEWSKI, Joan;#17226;#BALIN,
Jock;#18139;#DUNE,
> Colem;#17230;
>
> How can I split the consultant column, keep only names and drop the
> numbers and for every session date create a line with data and consultants
> name?
>
> NB. There are varied amounts of consultants so splitting across columns is
> uneven. if it was even melt seems like it would be good
> https://dfrieds.com/data-analysis/melt-unpivot-python-pandas
>
>
> Thanks
>
> Sayth
Since I didn't find a cool shortcut I decided to use brute force:
$ cat pandas_explode_column.py
import pandas as pd
df = pd.DataFrame(
[
[
"2019-06-21 11:15:00",
"WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE, Colem;#17230;"
],
[
"2019-06-22 10:00:00", "Doe, John;#42;Robbins, Rita;"
]
],
columns=["Session date", "Consultant"]
)
def explode_consultants(consultants):
consultants = (c.lstrip("#") for c in consultants.split(";"))
return (c for c in consultants if c.strip("0123456789"))
def explode_column(df, column, split):
for _index, row in df.iterrows():
for part in split(row[column]):
yield [part if c == column else row[c] for c in df.columns]
def explode(df, column, split):
return pd.DataFrame(
explode_column(df, "Consultant", split), columns=df.columns
)
df2 = explode(df, "Consultant", explode_consultants)
print(df)
print(df2)
$ python3 pandas_explode_column.py
Session date Consultant
0 2019-06-21 11:15:00 WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE...
1 2019-06-22 10:00:00 Doe, John;#42;Robbins, Rita;
[2 rows x 2 columns]
Session date Consultant
0 2019-06-21 11:15:00 WNEWSKI, Joan
1 2019-06-21 11:15:00 BALIN, Jock
2 2019-06-21 11:15:00 DUNE, Colem
3 2019-06-22 10:00:00 Doe, John
4 2019-06-22 10:00:00 Robbins, Rita
[5 rows x 2 columns]
$
More information about the Python-list
mailing list