pandas split and melt()

Wed Jun 26 07:08:56 EDT 2019

Sayth Renshaw wrote:

> Peter Otten wrote:

>> def explode_consultants(consultants):

Should have called that split_consultants(); takes a string and

>>         consultants = (c.lstrip("#") for c in consultants.split(";"))

splits it on ";", strips leading "#" characters from each part, and

>>         return (c for c in consultants if c.strip("0123456789"))

filters out the digit-only values (and the empty string left behind by a 
trailing ";").
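
A minimal standalone sketch of that helper, under the split_consultants() 
name suggested above. Returning a list instead of a generator is a change 
made here only so the result is easy to inspect; the sample string is the 
second row of the example data.

```python
def split_consultants(consultants):
    # Split on ";" and strip any leading "#" from each part.
    parts = (c.lstrip("#") for c in consultants.split(";"))
    # Keep only parts that are non-empty after removing surrounding digits;
    # this drops the numeric IDs and empty strings alike.
    return [c for c in parts if c.strip("0123456789")]


names = split_consultants("Doe, John;#42;Robbins, Rita;")
print(names)  # ['Doe, John', 'Robbins, Rita']
```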

>> def explode_column(df, column, split):
>>     for _index, row in df.iterrows():

iterates over the rows of the data frame

>>         for part in split(row[column]):

iterates over the names that split() (here explode_consultants()) extracts 
from the "Consultant" field

>>             yield [part if c == column else row[c] for c in df.columns]

Makes one row per consultant, replacing the contents of the "Consultant" 
column with the current consultant (bound to the part variable). For example, 
with part = "Doe, John" this amounts to

    yield [
        "Doe, John" if c == "Consultant" else row[c]
        for c in ["Session date", "Consultant"]
    ]

>> def explode(df, column, split):

Make a new data frame from the rows generated by explode_column(), reusing 
the column names from the original data frame.

>>     return pd.DataFrame(
>>         explode_column(df, "Consultant", split), columns=df.columns

Here's a bug -- "Consultant" is hardcoded, but the column argument should be 
used instead, i.e. explode_column(df, column, split).

>>     )
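
For reference, a sketch of the two functions with that fix applied. 
Everything except the corrected argument and the comments follows the quoted 
code; the small df3 frame at the end is just test data.

```python
import pandas as pd


def explode_column(df, column, split):
    # Yield one output row per part produced by split(row[column]).
    for _index, row in df.iterrows():
        for part in split(row[column]):
            yield [part if c == column else row[c] for c in df.columns]


def explode(df, column, split):
    # Pass the column argument through instead of hardcoding "Consultant".
    return pd.DataFrame(
        explode_column(df, column, split), columns=df.columns
    )


df3 = pd.DataFrame(
    [[2, "foo"], [3, "bar"], [0, "baz"]], columns=["ham", "spam"]
)
result = explode(df3, "ham", range)
print(result)
```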

>> df2 = explode(df, "Consultant", explode_consultants)
>> 
>> print(df)
>> print(df2)
>> $ python3 pandas_explode_column.py
>>           Session date                                         Consultant
>> 0  2019-06-21 11:15:00  WNEWSKI, Joan;#17226;#BALIN, Jock;#18139;#DUNE...
>> 1  2019-06-22 10:00:00                       Doe, John;#42;Robbins, Rita;
>> 
>> [2 rows x 2 columns]
>>           Session date     Consultant
>> 0  2019-06-21 11:15:00  WNEWSKI, Joan
>> 1  2019-06-21 11:15:00    BALIN, Jock
>> 2  2019-06-21 11:15:00    DUNE, Colem
>> 3  2019-06-22 10:00:00      Doe, John
>> 4  2019-06-22 10:00:00  Robbins, Rita
>> 
>> [5 rows x 2 columns]
>> $
> 
> Mind a little blown :-). Going to have to play and break this several
> times to fully get it.

Here's another usage example for explode() (with the fix above applied) which 
"explodes" the "ham" column using range(). Note that the "baz" row disappears 
because range(0) is empty:

>>> df3 = pd.DataFrame(
...     [[2, "foo"], [3, "bar"], [0, "baz"]], columns=["ham", "spam"]
... )
>>> explode(df3, "ham", range)
   ham spam
0    0  foo
1    1  foo
2    0  bar
3    1  bar
4    2  bar

[5 rows x 2 columns]
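
As an aside, pandas 0.25 and later ship a built-in DataFrame.explode() method 
that turns each element of a list-like cell into its own row. The following 
is only a sketch of how the consultant example might look with it, assuming 
such a pandas version; the one-row frame is made-up test data modelled on the 
example above, and split_consultants() is the list-returning variant of the 
helper. One difference to keep in mind: the built-in method keeps a row whose 
list is empty (with NaN in that column), whereas the generator version drops 
it.

```python
import pandas as pd


def split_consultants(consultants):
    # Split on ";", strip leading "#", drop digit-only and empty parts.
    parts = (c.lstrip("#") for c in consultants.split(";"))
    return [c for c in parts if c.strip("0123456789")]


df = pd.DataFrame(
    {
        "Session date": ["2019-06-22 10:00:00"],
        "Consultant": ["Doe, John;#42;Robbins, Rita;"],
    }
)

# Replace each string with a list of names, then let pandas expand the lists.
lists = df.assign(Consultant=df["Consultant"].map(split_consultants))
df2 = lists.explode("Consultant").reset_index(drop=True)
print(df2)
```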