[Tutor] Only one value different columns

Alan Gauld alan.gauld at yahoo.co.uk
Thu Mar 19 06:53:43 EDT 2020


On 19/03/2020 09:42, André Pinto wrote:
> I have a dataset with 22500 items (rows) and most rows show columns with
> the same values. 

Most? Or always?

> It is like this:
> 
> Item    col1   col2    col 3     col 4  col5
> XYZ    4                     4            4
> PQR               12                               12

> I need so:
> 
> Item    col1   col2    col 3     col 4  col5
> XYZ                            4            
> PQR                                                   12

It is not obvious from that which column the output should use.
Is it always the same column that the second instance of the
value was originally in? Or is that just coincidence in your example?

> How can I keep only one value per row, determining the column I want with
> the respective value.

You can, provided you can define the rules. We don't have enough
information to work them out for you.

But you can write a function that will process the row and
return a new row.

> Note: for each line the column I want to keep the value changes.
> It is possible?

Of course, provided you have a set of rules (or an equation)
that predicts which column, or a table of column against row.

You can encapsulate that within another function which is
called by the row processing function.
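
As a rough sketch of the table approach, the helper could be a simple
lookup. This assumes the wanted column for each item is known in advance;
the names WANTED_COLUMN and column_for, and the table entries, are made up
purely for illustration:

```python
# Hypothetical lookup table: item name -> column whose value should be kept.
# These entries are illustrative only; real data needs its own table.
WANTED_COLUMN = {
    "XYZ": "col3",
    "PQR": "col5",
}

def column_for(item):
    """Return the name of the column to keep for this item."""
    return WANTED_COLUMN[item]
```

If the column can be computed from the row instead, the body of
column_for() would hold that rule rather than a dictionary lookup.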

Your top level function will look something like:

results = []
for row in data:
    results.append(process(row))

and

process(row) will extract the value, calculate the
required column (using your helper function) and build
a new output row using those values and return it.
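
A minimal sketch of how those pieces might fit together, assuming each row
is a dict mapping column names to values, with empty cells stored as None.
The rule in pick_column() (keep the last non-empty column) is only a
placeholder, since the real rule has not been stated. COLUMNS, pick_column
and the sample data are likewise invented for illustration:

```python
COLUMNS = ["col1", "col2", "col3", "col4", "col5"]  # assumed column names

def pick_column(row):
    """Placeholder rule: choose the last column that holds a value.

    Replace this with the real rule or a lookup table.
    """
    filled = [c for c in COLUMNS if row.get(c) is not None]
    return filled[-1] if filled else None

def process(row):
    """Return a new row that keeps the value only in the chosen column."""
    keep = pick_column(row)
    new_row = {"Item": row["Item"]}
    for c in COLUMNS:
        new_row[c] = row.get(c) if c == keep else None
    return new_row

# Made-up sample rows, loosely modelled on the example in the question.
data = [
    {"Item": "XYZ", "col1": 4, "col2": None, "col3": 4, "col4": 4, "col5": None},
    {"Item": "PQR", "col1": None, "col2": 12, "col3": None, "col4": None, "col5": 12},
]
results = [process(row) for row in data]
```

Keeping the column choice in its own function means only pick_column()
has to change once the actual rule is known.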

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



