[Tutor] how to sort the file out

lina lina.lastname at gmail.com
Wed Sep 7 12:04:30 CEST 2011


On Wed, Sep 7, 2011 at 4:52 PM, Peter Otten <__peter__ at web.de> wrote:
> lina wrote:
>
>> Hi, I have two files: one is a reference file, the other is the one
>> waiting to be adjusted.
>>
>> File 1:
>>
>> 1 C1
>> 2 O1
> [...]
>> 33 C19
>> 34 O5
>> 35 C21
>>
>> File 2:
>> 3 H16
>> 4 H5
> [...]
>> 39 H62
>> 40 O2
>> 41 H22
>>
>> I would like field 2 of file 2 arranged in the same sequence as field
>> 2 of file 1.
>>
>> Thanks for any suggestions,
>>
>> I have driven myself nuts already: three hours have passed and I still
>> have not managed to achieve this.
>
> You could have written the above after three minutes. To get the most out of
> this mailing list you should give some details of what you tried and how it
> failed. This gives us valuable information about your level of knowledge and
> confidence that you are trying to learn rather than get solutions on the
> cheap.
>
> However, I'm in the mood for some spoonfeeding:

LOL ... thanks.

I am at a very low level in Python; many times I just gave up out of
frustration with it and escaped back to bash and awk.


>
> indexfile = "tmp_index.txt"
> datafile = "tmp_data.txt"
> sorteddatafile = "tmp_output.txt"
>
> def make_lookup(lines):
>    r"""Build a dictionary that maps the second column to the line number.
>
>    >>> make_lookup(["aaa bbb\n", "ccc ddd\n"]) == {'bbb': 0, 'ddd': 1}
>    True
>    """
>    position_lookup = {}
>    for lineno, line in enumerate(lines):
>        second_field = line.split()[1]
>        position_lookup[second_field] = lineno
>    return position_lookup
>
> with open(indexfile) as f:
>    position_lookup = make_lookup(f)
>
> # With your sample data the global position_lookup dict looks like this now:
> # {'C1': 0, 'O1': 1, 'C2': 2,... , 'O5': 33, 'C21': 34}
>
> def get_position(line):
>    r"""Extract the second field from the line and look up the
>    associated line number in the global position_lookup dictionary.
>
>    Example:
>    get_position("15 C2\n")
>    The line is split into ["15", "C2"]
>    The second field is "C2"
>    Its associated line number in position_lookup: 2
>    --> the function returns 2
>    """
>    second_field = line.split()[1]
>    return position_lookup[second_field]
>
> with open(datafile) as f:
>    # sort the lines in the data file using the line number in the index
>    # file as the sort key
>    lines = sorted(f, key=get_position)
>
> with open(sorteddatafile, "w") as f:
>    f.writelines(lines)
>

It's an amazing opportunity to learn. I will try it now.
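
For anyone who wants to try the idea without creating the three files first,
here is a minimal in-memory sketch of the same approach: build a dictionary
from the reference lines, then sort the data lines with the looked-up line
number as the key. The sample lines and the helper name sort_by_reference are
made up for illustration. Sending labels that are missing from the reference
to the end is an assumption of this sketch; the code quoted above would raise
a KeyError for such a line instead.

```python
# Hypothetical sample data standing in for the two files in the thread.
index_lines = ["1 C1\n", "2 O1\n", "3 C2\n", "4 H5\n"]
data_lines = ["3 H5\n", "4 C2\n", "5 O1\n", "6 C1\n", "7 X9\n"]


def make_lookup(lines):
    """Map the second field of each line to its 0-based line number."""
    return {line.split()[1]: lineno for lineno, line in enumerate(lines)}


def sort_by_reference(data, reference):
    """Order the data lines by where their second field occurs in reference.

    Assumption (not from the thread): a label that is absent from the
    reference sorts after all known labels instead of raising KeyError.
    """
    lookup = make_lookup(reference)
    unknown = len(lookup)  # larger than every real line number
    return sorted(data, key=lambda line: lookup.get(line.split()[1], unknown))


result = sort_by_reference(data_lines, index_lines)
print("".join(result), end="")
```

Because sorted() is stable, lines with unknown labels keep their original
relative order at the end of the output.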




-- 
Best Regards,

lina

