[Tutor] Re: removal of duplicates from .csv files

Rob Andrews randrews@planhouse.com
Fri, 26 Jan 2001 16:43:09 -0600


Belated thanks to everyone who's lent helpful thoughts on this question and
on my question about working with replace().  I'm not quite there with
either of them, and together they constitute the biggest Python challenge
I've undertaken so far, so I look forward to hacking on this quite a bit
over the next few days.

My understanding is that the .csv project is a big enough challenge that if
I make any real progress with it, it'll make a nice little Open Source
contribution.  So if I get very far with it, that's my plan.  If nothing
else, I find myself so engrossed that I'm learning more Python, which makes
me happy 'nuff.

Rob

----- Original Message -----
From: <alan.gauld@bt.com>
To: <slyskawa@yahoo.com>; <tutor@python.org>
Sent: Friday, January 26, 2001 11:41 AM
Subject: RE: [Tutor] Re: removal of duplicates from .csv files


> > > I have been given several comma-delimited (.csv) files, [...]
> > > charged with is to remove duplicate entries.
>
> > One approach you may want to consider is to create a
> > dictionary with the phone number and/or address as a key.
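
A minimal sketch of the dictionary idea above, assuming a current Python 3 and its standard csv module. The column names "phone" and "address" and the sample rows are hypothetical, since the thread does not show the real file layout. Each row is stored under its key, so a later duplicate replaces an earlier one.

```python
import csv
import io


def dedupe_keep_last(rows, key_fields=("phone", "address")):
    """Return rows with duplicates removed, keeping the last row per key."""
    unique = {}
    for row in rows:
        # Normalise lightly so "1 Main St" and "1 main st " count as the same key.
        key = tuple(row[field].strip().lower() for field in key_fields)
        unique[key] = row  # a later duplicate replaces the earlier one
    return list(unique.values())


# Hypothetical sample data standing in for one of the .csv files.
sample = io.StringIO(
    "name,phone,address\n"
    "Ann,555-0100,1 Main St\n"
    "Bob,555-0199,9 Oak Ave\n"
    "Ann Smith,555-0100,1 Main St\n"
)
rows = list(csv.DictReader(sample))
deduped = dedupe_keep_last(rows)
```

The whole file is held in memory, which is the caveat raised in the reply below. How much normalising the key needs depends on how messy the real data is.
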
>
> That was the approach I was going to suggest provided
> you have enough memory...
>
> One question you must answer is which duplicate you want to eliminate.
> Assuming only the 2 key fields are duplicates,
> which of the other data is the right one to keep?
>
> If it's always the first one then that's easier using
> sort and a custom compare function; if it's always the
> last one that's easier with a dictionary...
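
A rough sketch of the keep-the-first-one case, written for a current Python 3. It swaps the custom compare function mentioned above for a key function, since Python 3's sort takes key= rather than a compare function. The records, and the choice of column 1 (a phone number) as the key, are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_keep_first(records, key_index=1):
    """Sort by the key column, then keep the first record of each run of equal keys."""
    get_key = itemgetter(key_index)
    # sorted() is stable, so records with equal keys stay in their original file order.
    ordered = sorted(records, key=get_key)
    return [next(group) for _, group in groupby(ordered, key=get_key)]


# Hypothetical records: name, phone, address.
records = [
    ["Ann", "555-0100", "1 Main St"],
    ["Bob", "555-0199", "9 Oak Ave"],
    ["Ann Smith", "555-0100", "1 Main St"],
]
kept = dedupe_keep_first(records)
```

The result comes back ordered by key rather than in file order. If file order matters, a dictionary that only stores a key the first time it is seen would do the same job without sorting.
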
>
> Alan G.
>
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor