[Tutor] improve the code
Peter Otten
__peter__ at web.de
Fri Nov 4 09:10:42 CET 2011
lina wrote:
> On Wed, Nov 2, 2011 at 12:14 AM, Peter Otten <__peter__ at web.de> wrote:
>> lina wrote:
>>
>>>> sorted(new_dictionary.items())
>>>
>>> Thanks, it works, but there is still a minor question,
>>>
>>> can I sort based on the general numerical value?
>>>
>>> namely not:
>>> :
>>> :
>>> 83ILE 1
>>> 84ALA 2
>>> 8SER 0
>>> 9GLY 0
>>> :
>>> :
>>>
>>> rather 8 9 ...83 84,
>>>
>>> Thanks,
>>
>> You need a custom key function for that one:
>>
>>>>> import re
>>>>> def gnv(s):
>> ...     parts = re.split(r"(\d+)", s)
>> ...     parts[1::2] = map(int, parts[1::2])
>> ...     return parts
>> ...
>>>>> items = [("83ILE", 1), ("84ALA", 2), ("8SER", 0), ("9GLY", 0)]
>>>>> sorted(items, key=lambda pair: (gnv(pair[0]), pair[1]))
>> [('8SER', 0), ('9GLY', 0), ('83ILE', 1), ('84ALA', 2)]
>
>
> Thanks, I can follow the procedure and get the exact results, but
> still don't understand this part
>
> parts = re.split(r'"(\d+)",s)
>
> r"(\d+)", sorry,
>
>>>> items
> [('83ILE', 1), ('84ALA', 2), ('8SER', 0), ('9GLY', 0)]
>
>
>>>> parts = re.split(r"(\d+)",items)
> Traceback (most recent call last):
> File "<pyshell#78>", line 1, in <module>
> parts = re.split(r"(\d+)",items)
> File "/usr/lib/python3.2/re.py", line 183, in split
> return _compile(pattern, flags).split(string, maxsplit)
> TypeError: expected string or buffer
I was a bit lazy and hoped you would accept the gnv() function as a black
box...
Here's a step-through:
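re.split() takes a pattern that describes where to split, and the string to
split. (You passed it the whole items list instead of a single string, hence
the TypeError.) In the following example the pattern is the character "_":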
>>> re.split("_", "alpha_beta___gamma")
['alpha', 'beta', '', '', 'gamma']
You can see that this simple form works just like
"alpha_beta___gamma".split("_"), and finds an empty string between two
adjacent "_". If you want both "_" and "___" to work as a single separator
you can change the pattern to "_+", where the "+" means one or more of the
previous:
>>> re.split("_+", "alpha_beta___gamma")
['alpha', 'beta', 'gamma']
If we want to keep the separators, we can wrap the whole expression in
parens:
>>> re.split("(_+)", "alpha_beta___gamma")
['alpha', '_', 'beta', '___', 'gamma']
Now for the step that is not so obvious: the separator does not have to be
"_"; it can just as well be a run of digits. Regular expressions have two
ways to spell "any digit": [0-9] or \d:
>>> re.split("([0-9]+)", "alpha1beta123gamma")
['alpha', '1', 'beta', '123', 'gamma']
I chose the other one, \d (which also accepts non-ASCII digits):
>>> re.split(r"(\d+)", "alpha1beta123gamma")
['alpha', '1', 'beta', '123', 'gamma']
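To make the difference between the two spellings visible, here is a small
sketch. The sample string is made up for the demonstration and is not from
lina's data. It assumes Python 3 with a str pattern, where \d matches any
Unicode decimal digit unless the re.ASCII flag is given:

```python
import re

# Made-up sample: "\u0663" is the Arabic-Indic digit three.
sample = "alpha\u0663beta"

# \d matches the non-ASCII digit, so the string is split around it.
unicode_parts = re.split(r"(\d+)", sample)

# [0-9] only knows the ten ASCII digits, so nothing is split.
ascii_parts = re.split(r"([0-9]+)", sample)

# With re.ASCII, \d behaves like [0-9].
flagged_parts = re.split(r"(\d+)", sample, flags=re.ASCII)

print(unicode_parts)
print(ascii_parts)
print(flagged_parts)

# int() accepts Unicode decimal digits too, so the later conversion still works.
print(int(unicode_parts[1]))
```

If your data only ever contains ASCII digits, both spellings behave the same.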
At this point we are sure that the list contains an alternating sequence of
non-integer str, integer str, ..., non-integer str, the first and the last
always being a non-integer str (possibly an empty one).
>>> parts = re.split(r"(\d+)", "alpha1beta123gamma")
So
>>> parts[1::2]
['1', '123']
will always give us the parts that can be converted to an integer
>>> parts
['alpha', '1', 'beta', '123', 'gamma']
>>> parts[1::2] = map(int, parts[1::2])
>>> parts
['alpha', 1, 'beta', 123, 'gamma']
We need to do the conversion because strings won't sort the way we like:
>>> sorted(["2", "20", "10"])
['10', '2', '20']
>>> sorted(["2", "20", "10"], key=int)
['2', '10', '20']
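The same reasoning carries over to lists, which is what makes the list
returned by gnv() usable as a sort key: Python compares lists item by item
from the left, and the first pair of items that differ decides. A small
sketch (the variable names are only for illustration):

```python
# Strings compare character by character, so "10" sorts before "2".
as_text = sorted(["2", "20", "10"])

# Integers compare by value.
as_numbers = sorted([2, 20, 10])

# Lists compare item by item; here the second item decides.
int_key_first = ["", 9, "GLY"] < ["", 83, "ILE"]      # 9 < 83
str_key_first = ["", "9", "GLY"] < ["", "83", "ILE"]  # "9" > "8"

print(as_text)
print(as_numbers)
print(int_key_first, str_key_first)
```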
We now have the complete gnv() function
>>> def gnv(s):
...     parts = re.split(r"(\d+)", s)
...     parts[1::2] = map(int, parts[1::2])
...     return parts
...
and can successfully sort a simple list of strings like
>>> values = ["83ILE", "84ALA", "8SER", "9GLY"]
>>> sorted(values, key=gnv)
['8SER', '9GLY', '83ILE', '84ALA']
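Here is the same function as a plain script rather than an interactive
session. The extra sample strings ("ALA84", "ILE") are made up to check one
property that matters in Python 3, where int and str cannot be compared:
because every key starts with a (possibly empty) str and then alternates,
keys built from differently shaped strings still line up str against str and
int against int.

```python
import re

def gnv(s):
    """Split s into alternating text and digit runs; digit runs become ints."""
    parts = re.split(r"(\d+)", s)        # odd indexes hold the digit runs
    parts[1::2] = map(int, parts[1::2])  # convert those to int in place
    return parts

# Made-up mix: digits first, digits last, no digits at all.
mixed = ["ALA84", "8SER", "ILE", "83ILE"]

keys = [gnv(s) for s in mixed]
result = sorted(mixed, key=gnv)

print(keys)
print(result)
```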
The sorted() function calls gnv() internally once for every item in the list
and uses the results to determine the order of the items. Before
sorted()/list.sort() had the key argument you had to do this manually with
"decorate, sort, undecorate":
>>> decorated = [(gnv(item), item) for item in values]
>>> decorated
[(['', 83, 'ILE'], '83ILE'), (['', 84, 'ALA'], '84ALA'), (['', 8, 'SER'],
'8SER'), (['', 9, 'GLY'], '9GLY')]
>>> decorated.sort()
>>> decorated
[(['', 8, 'SER'], '8SER'), (['', 9, 'GLY'], '9GLY'), (['', 83, 'ILE'],
'83ILE'), (['', 84, 'ALA'], '84ALA')]
>>> undecorated = [item for key, item in decorated]
>>> undecorated
['8SER', '9GLY', '83ILE', '84ALA']
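Put together as a script, the three steps might look like the sketch below.
It repeats gnv() so that it runs on its own, and checks that the manual
approach gives the same result as the key argument for this data:

```python
import re

def gnv(s):
    parts = re.split(r"(\d+)", s)
    parts[1::2] = map(int, parts[1::2])
    return parts

values = ["83ILE", "84ALA", "8SER", "9GLY"]

# Decorate: pair every item with its sort key.
decorated = [(gnv(item), item) for item in values]

# Sort: tuples compare by their first element (the key) first.
decorated.sort()

# Undecorate: throw the keys away again.
undecorated = [item for key, item in decorated]

print(undecorated)
print(undecorated == sorted(values, key=gnv))
```

One difference to keep in mind: when two keys are equal, the decorated
tuples fall back to comparing the original items, whereas key= never
compares the items themselves.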
For your actual data
>>> items
[('83ILE', 1), ('84ALA', 2), ('8SER', 0), ('9GLY', 0)]
you need to extract the first item from an (x, y) pair
>>> def first_gnv(item):
...     return gnv(item[0])
...
>>> first_gnv(("83ILE", 1))
['', 83, 'ILE']
but what if there are items with the same x? In that case the key cannot tell
them apart; sorted() is stable, so such items simply stay in the order they
had in the input:
>>> sorted([("83ILE", 1), ("83ILE", 2)], key=first_gnv)
[('83ILE', 1), ('83ILE', 2)]
>>> sorted([("83ILE", 2), ("83ILE", 1)], key=first_gnv)
[('83ILE', 2), ('83ILE', 1)]
Let's take y into account, too:
>>> def first_gnv(item):
...     return gnv(item[0]), item[1]
...
>>> sorted([("83ILE", 1), ("83ILE", 2)], key=first_gnv)
[('83ILE', 1), ('83ILE', 2)]
>>> sorted([("83ILE", 2), ("83ILE", 1)], key=first_gnv)
[('83ILE', 1), ('83ILE', 2)]
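The two variants of the key function can be compared side by side. In this
sketch the names by_x and by_x_then_y are hypothetical stand-ins for the two
versions of first_gnv() above, and the sample pairs are made up:

```python
import re

def gnv(s):
    parts = re.split(r"(\d+)", s)
    parts[1::2] = map(int, parts[1::2])
    return parts

def by_x(item):
    # Key looks at x only; equal x means equal key.
    return gnv(item[0])

def by_x_then_y(item):
    # Key is a tuple: y breaks ties between equal x.
    return gnv(item[0]), item[1]

pairs = [("83ILE", 2), ("9GLY", 0), ("83ILE", 1)]

# Equal keys keep their input order (2 before 1).
x_only = sorted(pairs, key=by_x)

# The tuple key puts 1 before 2 regardless of input order.
x_then_y = sorted(pairs, key=by_x_then_y)

print(x_only)
print(x_then_y)
```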
We're done!
>>> sorted(items, key=first_gnv)
[('8SER', 0), ('9GLY', 0), ('83ILE', 1), ('84ALA', 2)]
(If you look back into my previous post, can you find the first_gnv()
function?)
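For the dictionary you started with, a complete script might look like the
following sketch. The contents of new_dictionary are an assumption, filled in
from the pairs shown in this thread:

```python
import re

def gnv(s):
    parts = re.split(r"(\d+)", s)
    parts[1::2] = map(int, parts[1::2])
    return parts

# Assumed contents, mirroring the items shown in the thread.
new_dictionary = {"83ILE": 1, "84ALA": 2, "8SER": 0, "9GLY": 0}

# Sort the (key, value) pairs by the numerical value embedded in the key.
ordered = sorted(new_dictionary.items(), key=lambda pair: gnv(pair[0]))

for name, count in ordered:
    print(name, count)
```

Because the keys of a dictionary are unique, no two pairs can have the same
x here, so looking at x alone is enough in this particular case.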