key and ..
justin walters
walters.justin01 at gmail.com
Thu Nov 17 23:06:46 EST 2016
On Thu, Nov 17, 2016 at 7:05 PM, Val Krem via Python-list <
python-list at python.org> wrote:
>
>
> Hi all,
> Sorry for asking such a basic question, but I am trying to merge two
> files (file1 and file2) and do some stuff. Merge the two files by the first
> column (key). Here is the description of the files and what I would like to do.
>
>
> file1
>
> key c1 c2
> 1 759 939
> 2 345 154571
> 3 251 350711
> 4 3749 22159
> 5 676 76953
> 6 46 756
>
>
> file2
> key p1 p2
> 1 759 939
> 2 345 154571
> 3 251 350711
> 4 3915 23254
> 5 7676 77953
> 7 256 4562
>
> create file3
> a) merge the two files by (key) for keys that exist in both file1 and file2
> b) create two variables dcp1 = c1- p1 and dcp2= c2-p2
> c) sort file3 by dcp2(descending) and output
>
> create file4:- which exist in file1 but not in file2
> create file5:- that exist in file2 but not in file1;
>
>
> Desired output files
>
> file3
> key c1 c2 p1 p2 dcp1 dcp2
> 4 3749 22159 3915 23254 -166 -1095
> 5 676 76953 7676 77953 -7000 -1000
> 1 759 939 759 939 0 0
> 2 345 154571 345 154571 0 0
> 3 251 350711 251 350711 0 0
>
> file4
> key c1 p1
> 6 46 756
>
> file5
> key p1 p2
> 7 256 4562
>
>
>
> Thank you in advance
1. Open each file with the built-in open() and read its contents into a
string variable.
2. Use str.splitlines() (or str.split('\n')) to split that string into a
list of lines.
3. For each file, build a list of dictionaries by splitting each line at
whitespace and calling int() on the value of each column. The header line
gives you the dictionary keys.
4. Do the math between the matching dicts from each file, storing the
results in a new dict. You can write this out directly to the file or
append it to a new list.
5. Use open() in write mode to write the resulting lines to a new file.
6. Build a set of the key values from each list (the dicts themselves are
not hashable, so they cannot go into a set directly) and use
set.difference() or set.intersection() to find the keys you need, then
select the matching rows into a new list. Sets are unordered, so you may
want to run the rows through sorted(rows, key=lambda row: row['key']).
7. Repeat step 5 to write out files 4 and 5. There is no need to build the
sets again. Just take the difference in the other direction.
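
The steps above can be sketched roughly as follows. This is only one
possible reading of them, assuming Python 3.5 or later and files that are
small enough to hold in memory. The function names (parse_table,
merge_tables, format_table) are made up for illustration. Rows are indexed
by key in a dict rather than kept in a plain list, so the set operations
can be done on the keys. The sort is descending on dcp2 as item (c) asks;
the sample file3 in the question is listed in ascending order, so drop
reverse=True if that is the order actually wanted.

```python
def parse_table(text):
    """Parse whitespace-separated text with a header row into {key: row_dict}."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        values = [int(value) for value in line.split()]
        row = dict(zip(header, values))
        rows[row["key"]] = row
    return rows


def merge_tables(rows1, rows2):
    """Return (file3, file4, file5) as lists of row dicts."""
    merged = []
    # Keys present in both files; sorted so that ties in dcp2 stay in key order.
    for key in sorted(rows1.keys() & rows2.keys()):
        row = {**rows1[key], **rows2[key]}
        row["dcp1"] = row["c1"] - row["p1"]
        row["dcp2"] = row["c2"] - row["p2"]
        merged.append(row)
    # Stable sort, largest dcp2 first.
    merged.sort(key=lambda row: row["dcp2"], reverse=True)
    # Keys in one file but not the other.
    only1 = [rows1[key] for key in sorted(rows1.keys() - rows2.keys())]
    only2 = [rows2[key] for key in sorted(rows2.keys() - rows1.keys())]
    return merged, only1, only2


def format_table(rows):
    """Turn a list of row dicts back into header-plus-lines text."""
    if not rows:
        return ""
    header = list(rows[0])
    lines = [" ".join(header)]
    lines += [" ".join(str(row[column]) for column in header) for row in rows]
    return "\n".join(lines) + "\n"
```

With real files you would read each one with open(name).read(), pass the
text to parse_table(), and write each format_table() result out with
open(name, 'w').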
This isn't the fastest or most efficient way of doing it, but it is
probably the most straightforward.
If these files are quite large you may want to take a different approach in
the interest of performance
and memory. If you don't want to use dicts, you should have no problem
substituting tuples or
nested lists.
The whole thing could be made into a generator as well.
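
As a rough idea of what the generator version might look like: the sketch
below yields one row at a time instead of building full lists. The names
iter_rows and iter_common are hypothetical, and it assumes the first line
of each input is the header. One file still has to be held in a dict for
the lookup, so this only saves memory on the other one.

```python
def iter_rows(lines):
    """Yield one dict per data line; `lines` is any iterable of strings,
    for example an open file object."""
    lines = iter(lines)
    header_line = next(lines, None)
    if header_line is None:
        return  # empty input, nothing to yield
    header = header_line.split()
    for line in lines:
        if line.strip():
            yield dict(zip(header, map(int, line.split())))


def iter_common(lines1, lines2):
    """Yield merged rows for keys found in both inputs, in file1 order."""
    # The second input is loaded fully so rows can be looked up by key.
    lookup = {row["key"]: row for row in iter_rows(lines2)}
    for row in iter_rows(lines1):
        other = lookup.get(row["key"])
        if other is not None:
            yield {
                **row,
                **other,
                "dcp1": row["c1"] - other["p1"],
                "dcp2": row["c2"] - other["p2"],
            }
```

Sorting by dcp2 still needs all of the merged rows at once, so the result
would have to go through sorted() before being written out.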
Basically, there are a lot of ways to approach this.
Hope that helped at least a little bit.