[Tutor] Merging table-like files with overlapping values in one column

Kent Johnson kent37 at tds.net
Thu Aug 21 12:42:31 CEST 2008


On Thu, Aug 21, 2008 at 5:56 AM, Kat <think_fishbone at yahoo.com> wrote:

> I have several input files where, in each file, every line has a space-separated pair of values. The files are essentially tables with two columns. There are no duplicates in the first column values within each file, but they overlap when all files are considered. I'd like to merge them into one file, keyed on the first-column values, with the second-column values from all files combined.
>
> My second idea is to convert each file into a dictionary (since the first column's values are unique within each file), then I can create a combined dictionary which allows multiple values for each key, then output that. Does that sound reasonable?

Yes, as long as the order of entries doesn't matter - a dict does not
preserve order. Make a dict that maps a key to a list of values
(collections.defaultdict is useful for this). Read each file and add
its pairs to the dict. Then iterate the dict and write to a new file.
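
The steps above could be sketched roughly as follows. This is only an
illustration, not a finished program: the names merge_pairs and
format_merged are made up for the example, and it assumes every useful
line has exactly two whitespace-separated fields (anything else is
skipped).

```python
from collections import defaultdict

def merge_pairs(sources):
    """Collect second-column values under their first-column key.

    `sources` is any iterable of line iterables, e.g. open file
    objects or plain lists of strings (an assumption for this sketch).
    """
    merged = defaultdict(list)
    for lines in sources:
        for line in lines:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            key, value = parts
            merged[key].append(value)
    return merged

def format_merged(merged):
    """Return one output line per key: the key, then all its values."""
    return [" ".join([key] + values) for key, values in merged.items()]
```

With real files, the sources would be the opened input files, and each
string from format_merged(merged) would be written to the output file
followed by a newline.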

If you do care about order, there are various implementations of
ordered dictionaries available.
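
As one possible illustration, the standard library gained
collections.OrderedDict in Python 2.7 and 3.1, and since Python 3.7 a
plain dict also keeps insertion order. A minimal sketch, assuming
"order" means the order in which keys are first seen across the files
(merge_pairs_ordered is a made-up name for the example):

```python
from collections import OrderedDict

def merge_pairs_ordered(sources):
    """Like a key -> list-of-values merge, but keys keep first-seen order.

    `sources` is an iterable of line iterables (open files or lists
    of strings); lines without exactly two fields are skipped.
    """
    merged = OrderedDict()
    for lines in sources:
        for line in lines:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            key, value = parts
            # setdefault creates the list the first time a key appears
            merged.setdefault(key, []).append(value)
    return merged
```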

Kent

