pandas split and melt()
Peter Otten
__peter__ at web.de
Wed Jun 26 07:08:56 EDT 2019
Sayth Renshaw wrote:
> Peter Otten wrote:
>> def explode_consultants(consultants):
Should have called that split_consultants(); takes a string and
>>     consultants = (c.lstrip("#") for c in consultants.split(";"))
splits by ";", removes leading "#"
>>     return (c for c in consultants if c.strip("0123456789"))
filters out the digit-only values.
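
A quick way to see those three steps in isolation, as a minimal sketch. It
uses the split_consultants name suggested above and returns a list instead
of a generator so the result can be printed directly; that change is mine,
not part of the quoted code:

```python
def split_consultants(consultants):
    # Split on ";" and drop the leading "#" that marks some entries.
    parts = (c.lstrip("#") for c in consultants.split(";"))
    # A part made only of digits (or an empty part) becomes "" once the
    # digits are stripped, so it is falsy and gets filtered out.
    return [c for c in parts if c.strip("0123456789")]


names = split_consultants("Doe, John;#42;Robbins, Rita;")
print(names)  # ['Doe, John', 'Robbins, Rita']
```

The trailing ";" produces an empty string, which is dropped by the same
test that removes the numeric IDs.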
>> def explode_column(df, column, split):
>>     for _index, row in df.iterrows():
iterates over the rows of the data frame
>>         for part in split(row[column]):
iterates over the parts returned by split(), here the names extracted by
explode_consultants() from the "Consultant" field
>>             yield [part if c == column else row[c] for c in df.columns]
Makes one row per consultant, replacing the contents of the "Consultant"
column with the current consultant (bound to the part variable). For
example, with

part = "Doe, John"

the yield is equivalent to

yield [
    "Doe, John" if c == "Consultant" else row[c]
    for c in ["Session date", "Consultant"]
]
>> def explode(df, column, split):
Make a new data frame from the rows generated by explode_column(), reusing
the column names from the original data frame.
>>     return pd.DataFrame(
>>         explode_column(df, "Consultant", split), columns=df.columns
Here's a bug: "Consultant" is hardcoded, but the column argument should be
used instead.
>>     )
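
With the hardcoded name replaced by the column argument, the whole thing
might look like the sketch below. The sample frame is an assumption: the
first row only uses the part of the original string that is visible in the
truncated output, and the dates are kept as plain strings.

```python
import pandas as pd


def split_consultants(consultants):
    # Split on ";", drop leading "#", discard digit-only and empty parts.
    parts = (c.lstrip("#") for c in consultants.split(";"))
    return (c for c in parts if c.strip("0123456789"))


def explode_column(df, column, split):
    # Yield one output row per part produced by split() for each input row.
    for _index, row in df.iterrows():
        for part in split(row[column]):
            yield [part if c == column else row[c] for c in df.columns]


def explode(df, column, split):
    # Pass the column argument through instead of a hardcoded name.
    # list() materializes the generator before handing it to pandas.
    return pd.DataFrame(
        list(explode_column(df, column, split)), columns=df.columns
    )


# Sample data (assumed, shortened from the output shown in this thread).
df = pd.DataFrame(
    [
        ["2019-06-21 11:15:00", "WNEWSKI, Joan;#17226;#BALIN, Jock;#18139"],
        ["2019-06-22 10:00:00", "Doe, John;#42;Robbins, Rita;"],
    ],
    columns=["Session date", "Consultant"],
)

df2 = explode(df, "Consultant", split_consultants)
print(df2)
```
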
>> df2 = explode(df, "Consultant", explode_consultants)
>>
>> print(df)
>> print(df2)
>> $ python3 pandas_explode_column.py
>>           Session date                                         Consultant
>> 0  2019-06-21 11:15:00  WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE...
>> 1  2019-06-22 10:00:00                       Doe, John;#42;Robbins, Rita;
>>
>> [2 rows x 2 columns]
>>           Session date     Consultant
>> 0  2019-06-21 11:15:00  WNEWSKI, Joan
>> 1  2019-06-21 11:15:00    BALIN, Jock
>> 2  2019-06-21 11:15:00    DUNE, Colem
>> 3  2019-06-22 10:00:00      Doe, John
>> 4  2019-06-22 10:00:00  Robbins, Rita
>>
>> [5 rows x 2 columns]
>> $
>
> Mind a little blown :-). Going to have to play and break this several
> times to fully get it.
Here's another usage example for explode() which "explodes" the "ham" column
using range():
>>> df3 = pd.DataFrame(
...     [[2, "foo"], [3, "bar"], [0, "baz"]], columns=["ham", "spam"]
... )
>>> explode(df3, "ham", range)
   ham  spam
0    0   foo
1    1   foo
2    0   bar
3    1   bar
4    2   bar

[5 rows x 2 columns]
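
As an aside, pandas 0.25.0 and later ship a built-in DataFrame.explode()
that expands list-like cells into rows. This is a different technique from
the handwritten explode() above, not a drop-in replacement: it expects the
cells to already hold lists, and an empty list becomes a NaN row rather
than disappearing. A rough sketch of the range() example done that way,
assuming such a pandas version is installed:

```python
import pandas as pd

df3 = pd.DataFrame(
    [[2, "foo"], [3, "bar"], [0, "baz"]], columns=["ham", "spam"]
)

# Turn each count into a list first; the built-in method only expands
# list-like cells, it does not call a split function itself.
as_lists = df3.assign(ham=df3["ham"].map(lambda n: list(range(n))))

exploded = (
    as_lists.explode("ham")
    .dropna(subset=["ham"])  # range(0) gave an empty list, i.e. a NaN row
    .reset_index(drop=True)  # renumber rows 0..n-1 like the version above
)
print(exploded)
```

Note that the "ham" column ends up with object dtype here, so an explicit
astype(int) may be wanted afterwards.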