[Chennaipy] Finding duplicates of row in csv

Rajagopal Jagannathan rajagopal.jagannathan at gmail.com
Mon Feb 12 08:18:51 EST 2018


You can do this in multiple ways.

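In a shell script:

sort <filename> | uniq -c

This prefixes each distinct line with the number of times it occurs, so
any line with a count above 1 is a duplicate. Use uniq -d instead if you
only want the duplicated lines printed.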

If you have to use Python, a quick way to do it is to load the csv into
a Pandas dataframe and call dataframe.duplicated(), which returns a
boolean Series marking each row that repeats an earlier row.
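
A minimal sketch of the pandas approach, assuming the file is
whitespace-separated like the sample in the question (with the stray
periods in the last row cleaned up). The function name
find_duplicate_rows and the SAMPLE string are made up for illustration;
for a real comma-separated file you would drop the sep argument.

```python
import io

import pandas as pd

# Sample data modelled on the question (assumed whitespace-separated).
SAMPLE = """Name age employer
Kumar 28 133678
Kumar 28 133678
Anil 42 133567
"""


def find_duplicate_rows(csv_file, **read_kwargs):
    """Return every row that occurs more than once in the csv."""
    df = pd.read_csv(csv_file, **read_kwargs)
    # keep=False flags all members of a duplicate group,
    # not only the second and later occurrences.
    mask = df.duplicated(keep=False)
    return df[mask]


dupes = find_duplicate_rows(io.StringIO(SAMPLE), sep=r"\s+")
print(dupes)
```

With the default keep="first", duplicated() would flag only the second
Kumar row; keep=False is used here so both copies show up.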

If you don't want to use pandas, you can loop through the csv and build
a dictionary (hashmap) with the row as the key and its count as the
value. Any row whose count ends up above 1 is a duplicate.
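
A rough sketch of the plain-Python version, using only the standard
library. It assumes a comma-separated file with a header row; the name
count_duplicate_rows and the SAMPLE string are hypothetical. Each row is
turned into a tuple because lists cannot be used as dictionary keys.

```python
import csv
import io
from collections import Counter

# Sample data modelled on the question (assumed comma-separated).
SAMPLE = "Name,age,employer\nKumar,28,133678\nKumar,28,133678\nAnil,42,133567\n"


def count_duplicate_rows(lines):
    """Map each row that occurs more than once to its occurrence count."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row (assumed to be present)
    # Counter is a dict subclass: key = row as a tuple, value = count.
    counts = Counter(tuple(row) for row in reader)
    return {row: n for row, n in counts.items() if n > 1}


duplicates = count_duplicate_rows(io.StringIO(SAMPLE))
for row, n in duplicates.items():
    print(n, row)
```

For a file on disk, pass open(path, newline="") instead of the StringIO
object.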

Hope this helps.

On Sun, Feb 11, 2018 at 9:07 AM, Saravanan Muthu <saravana4285 at gmail.com>
wrote:

> Hello All,
>      I have a csv with multiple columns, and I need to find the
> duplicate entries. I have imported the csv and assigned the rows to a
> dictionary. Please share a logic to find the duplicates. Sample data is below:
>
> Name age employer
> Kumar 28 133678
> Kumar 28 133678
> Anil. 42.   133567
>
> The Kumar entry needs to be found.