[Tutor] how to sort the file out

Peter Otten __peter__ at web.de
Wed Sep 7 10:52:03 CEST 2011


lina wrote:

> Hi, I have two files: one is a reference file, the other is the one waiting
> to be adjusted.
> 
> File 1:
> 
> 1 C1
> 2 O1
[...]
> 33 C19
> 34 O5
> 35 C21
> 
> File 2:
> 3 H16
> 4 H5
[...]
> 39 H62
> 40 O2
> 41 H22
> 
> I would like field 2 of file 2 arranged in the same sequence as field 2
> of file 1.
> 
> Thanks for any suggestions,
> 
> I have driven myself nuts already; three hours have passed and I still
> have not managed to achieve this.

You could have written the above after three minutes. To get the most out of 
this mailing list you should give some details of what you tried and how it 
failed. This gives us valuable information about your level of knowledge and 
confidence that you are trying to learn rather than get solutions on the 
cheap.

However, I'm in the mood for some spoonfeeding:

indexfile = "tmp_index.txt"
datafile = "tmp_data.txt"
sorteddatafile = "tmp_output.txt"

def make_lookup(lines):
    r"""Build a dictionary that maps the second column to the line number.

    >>> make_lookup(["aaa bbb\n", "ccc ddd\n"]) == {'bbb': 0, 'ddd': 1}
    True
    """
    position_lookup = {}
    for lineno, line in enumerate(lines):
        second_field = line.split()[1]
        position_lookup[second_field] = lineno
    return position_lookup

with open(indexfile) as f:
    position_lookup = make_lookup(f)

# With your sample data the global position_lookup dict looks like this now:
# {'C1': 0, 'O1': 1, 'C2': 2,... , 'O5': 33, 'C21': 34}

def get_position(line):
    r"""Extract the second field from the line and look up the
    associated line number in the global position_lookup dictionary.
    
    Example:
    get_position("15 C2\n")
    The line is split into ["15", "C2"]
    The second field is "C2"
    Its associated line number in position_lookup: 2
    --> the function returns 2
    """
    second_field = line.split()[1]
    return position_lookup[second_field]

with open(datafile) as f:
    # sort the lines in the data file using the line number in the index
    # file as the sort key
    lines = sorted(f, key=get_position)

with open(sorteddatafile, "w") as f:
    f.writelines(lines)
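
One caveat with the script above: if a line in the data file has a second
field that never appears in the index file, position_lookup[second_field]
raises a KeyError, and a blank line makes line.split()[1] raise an
IndexError. The following is only a minimal sketch of one way to tolerate
both cases. It assumes that such lines should simply go to the end of the
output in their original order (that policy is an assumption, not something
stated in the question), and sort_by_reference is a hypothetical helper name.
It works on in-memory lists so it can be tried without any files:

```python
def make_lookup(lines):
    """Map the second field of each line to its line number."""
    lookup = {}
    for lineno, line in enumerate(lines):
        fields = line.split()
        if len(fields) >= 2:  # ignore blank or one-column lines
            lookup[fields[1]] = lineno
    return lookup


def sort_by_reference(data_lines, reference_lines):
    """Return data_lines ordered like the second column of reference_lines.

    Assumed policy: lines whose second field is missing from the
    reference (or that have no second field) are placed at the end.
    """
    lookup = make_lookup(reference_lines)
    unknown = float("inf")  # compares greater than every line number

    def key(line):
        fields = line.split()
        if len(fields) < 2:
            return unknown
        return lookup.get(fields[1], unknown)

    # sorted() is stable, so lines with equal keys keep their input order
    return sorted(data_lines, key=key)


reference = ["1 C1\n", "2 O1\n", "3 C2\n"]
data = ["7 C2\n", "8 H9\n", "9 C1\n", "10 O1\n"]
result = sort_by_reference(data, reference)
print("".join(result), end="")
```

With this sample the lines come out in the order C1, O1, C2, followed by the
unmatched H9 line. To use it on real files, pass the open file objects (or
lists of their lines) instead of the sample lists.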




