[Tutor] how to sort the file out
Peter Otten
__peter__ at web.de
Wed Sep 7 10:52:03 CEST 2011
lina wrote:
> HI, I have two files, one is reference file, another is waiting for adjust
> one,
>
> File 1:
>
> 1 C1
> 2 O1
[...]
> 33 C19
> 34 O5
> 35 C21
>
> File 2:
> 3 H16
> 4 H5
[...]
> 39 H62
> 40 O2
> 41 H22
>
> I wish the field 2 from file 2 arranged the same sequence as the field
> 2 of file 1.
>
> Thanks for any suggestions,
>
> I drove my minds into nuts already, three hours passed and I still
> failed to achieve this.
You could have written the above after three minutes. To get the most out of
this mailing list you should give some details of what you tried and how it
failed. This gives us valuable information about your level of knowledge and
confidence that you are trying to learn rather than get solutions on the
cheap.
However, I'm in the mood for some spoonfeeding:
indexfile = "tmp_index.txt"
datafile = "tmp_data.txt"
sorteddatafile = "tmp_output.txt"

def make_lookup(lines):
    r"""Build a dictionary that maps the second column to the line number.

    >>> make_lookup(["aaa bbb\n", "ccc ddd\n"]) == {'bbb': 0, 'ddd': 1}
    True
    """
    position_lookup = {}
    for lineno, line in enumerate(lines):
        second_field = line.split()[1]
        position_lookup[second_field] = lineno
    return position_lookup

with open(indexfile) as f:
    position_lookup = make_lookup(f)

# With your sample data the global position_lookup dict looks like this now:
# {'C1': 0, 'O1': 1, 'C2': 2,... , 'O5': 33, 'C21': 34}

def get_position(line):
    r"""Extract the second field from the line and look up the
    associated line number in the global position_lookup dictionary.

    Example:
    get_position("15 C2\n")
    The line is split into ["15", "C2"]
    The second field is "C2"
    Its associated line number in position_lookup: 2
    --> the function returns 2
    """
    second_field = line.split()[1]
    return position_lookup[second_field]

with open(datafile) as f:
    # sort the lines in the data file using the line number in the index
    # file as the sort key
    lines = sorted(f, key=get_position)

with open(sorteddatafile, "w") as f:
    f.writelines(lines)
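
One thing to watch for: get_position() raises a KeyError if a line in the data file has a second field that never occurs in the index file. If that can happen with your data, here is a rough sketch of one way to deal with it, using in-memory lists instead of files so you can run it directly. The sort_by_index() helper and the sample lines are made up for illustration and are not part of the script above; putting unknown entries at the end is just one possible policy.

```python
def make_lookup(lines):
    """Map the second field of each index line to its 0-based line number."""
    return {line.split()[1]: lineno for lineno, line in enumerate(lines)}


def sort_by_index(data_lines, position_lookup):
    """Sort data lines by where their second field appears in the index.

    Lines whose second field is missing from the index go to the end,
    keeping their original relative order (sorted() is stable).
    """
    def key(line):
        second_field = line.split()[1]
        # Assumption: unknown fields should sort after every known one,
        # so fall back to infinity instead of raising KeyError.
        return position_lookup.get(second_field, float("inf"))

    return sorted(data_lines, key=key)


# Made-up sample data, not taken from the original files.
index_lines = ["1 C1\n", "2 O1\n", "3 C2\n"]
data_lines = ["7 H9\n", "8 C2\n", "9 C1\n", "10 O1\n"]

result = sort_by_index(data_lines, make_lookup(index_lines))
print("".join(result), end="")
```

Here "H9" is not in the index, so its line ends up last; the other three lines come out in the order C1, O1, C2. If a missing entry should count as an error instead, keep the plain position_lookup[second_field] lookup and let the KeyError tell you which field is the problem.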