pandas split and melt()
Sayth Renshaw
flebber.crue at gmail.com
Wed Jun 26 04:53:24 EDT 2019
>
> Since I didn't find a cool shortcut I decided to use brute force:
>
> $ cat pandas_explode_column.py
> import pandas as pd
>
> df = pd.DataFrame(
> [
> [
> "2019-06-21 11:15:00",
> "WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE, Colem;#17230;"
> ],
> [
> "2019-06-22 10:00:00", "Doe, John;#42;Robbins, Rita;"
> ]
> ],
> columns=["Session date", "Consultant"]
> )
>
> def explode_consultants(consultants):
> consultants = (c.lstrip("#") for c in consultants.split(";"))
> return (c for c in consultants if c.strip("0123456789"))
>
> def explode_column(df, column, split):
> for _index, row in df.iterrows():
> for part in split(row[column]):
> yield [part if c == column else row[c] for c in df.columns]
>
> def explode(df, column, split):
> return pd.DataFrame(
> explode_column(df, "Consultant", split), columns=df.columns
> )
>
> df2 = explode(df, "Consultant", explode_consultants)
>
> print(df)
> print(df2)
> $ python3 pandas_explode_column.py
> Session date Consultant
> 0 2019-06-21 11:15:00 WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE...
> 1 2019-06-22 10:00:00 Doe, John;#42;Robbins, Rita;
>
> [2 rows x 2 columns]
> Session date Consultant
> 0 2019-06-21 11:15:00 WNEWSKI, Joan
> 1 2019-06-21 11:15:00 BALIN, Jock
> 2 2019-06-21 11:15:00 DUNE, Colem
> 3 2019-06-22 10:00:00 Doe, John
> 4 2019-06-22 10:00:00 Robbins, Rita
>
> [5 rows x 2 columns]
> $
Mind a little blown :-). Going to have to play and break this several times to fully get it.
Thanks
Sayth
More information about the Python-list
mailing list