[Python-Dev] Release of astoptimizer 0.3

Serhiy Storchaka storchaka at gmail.com
Wed Sep 12 09:36:24 CEST 2012


On 12.09.12 00:47, Victor Stinner wrote:
>> set([x for ...]) => {x for ...}
>> dict([(k, v) for ...]) => {k: v for ...}
>> dict((k, v) for ...) => {k: v for ...}
>> ''.join([s for ...]) => ''.join(s for ...)
>> a.extend([s for ...]) => a.extend(s for ...)
>
> These optimizations look correct.

Actually, a generator can be slower than a list comprehension, especially on 
Python 2. I think this is an opportunity to optimize how generators are 
handled.
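As a rough illustration, a micro-benchmark of the ''.join rewrite quoted above (numbers are hypothetical and vary by interpreter and version; on CPython the list comprehension often wins because str.join materializes a non-sequence argument into a list anyway):

```python
import timeit

# Both forms build the same string; the genexp feeds str.join through
# the iterator protocol instead of handing it a ready-made list.
list_t = timeit.timeit("''.join([str(i) for i in range(1000)])", number=200)
gen_t = timeit.timeit("''.join(str(i) for i in range(1000))", number=200)
print('listcomp %.3fs  genexp %.3fs' % (list_t, gen_t))
```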

>> (f(x) for x in a) => map(f, a)
>> (x.y for x in a) => map(operator.attrgetter('y'), a)
>> (x[0] for x in a) => map(operator.itemgetter(0), a)
>> (2 * x for x in a) => map((2).__mul__, a)
>> (x in b for x in a) => map(b.__contains__, a)
>> map(lambda x: x.strip(), a) => (x.strip() for x in a)
>
> Is it faster? :-)

Yes, significantly so for large sequences. But this transformation is not 
safe in the general case. For short sequences there can be a regression (the 
cost of the "map" name lookup and the function call).
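A sketch of the equivalences, assuming the names involved are the builtins and b is not rebound during iteration; note the arithmetic rewrite is exactly where the "not safe in general" caveat bites, since 2 * x may dispatch to type(x).__rmul__ rather than (2).__mul__:

```python
import operator

a = [(1, 'x'), (2, 'y'), (3, 'z')]
nums = [1, 2, 3]

# Item and attribute access rewrites:
assert [x[0] for x in a] == list(map(operator.itemgetter(0), a))
assert [x.real for x in nums] == list(map(operator.attrgetter('real'), nums))
# Arithmetic rewrite: only safe when x is proven to be a plain int.
assert [2 * x for x in nums] == list(map((2).__mul__, nums))
# Containment rewrite:
b = {2, 3}
assert [x in b for x in nums] == list(map(b.__contains__, nums))
```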

>> x in ['i', 'em', 'cite'] => x in {'i', 'em', 'cite'}
>
> A list can contain non-hashable objects, whereas a set can not.

Agreed, it is applicable only if x is proven to be a str. At least the list 
can be replaced by a tuple.
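A sketch of why the set rewrite needs x to be proven hashable (values are hypothetical):

```python
x = 'em'
# For a str the three forms agree:
assert (x in ['i', 'em', 'cite']) == (x in ('i', 'em', 'cite')) == (x in {'i', 'em', 'cite'})
# But an unhashable x works silently with the list/tuple and raises with the set:
assert ([] in ['i', 'em', 'cite']) is False
try:
    [] in {'i', 'em', 'cite'}
except TypeError:
    pass  # unhashable type: 'list'
else:
    raise AssertionError('expected TypeError')
```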

>> x == 'i' or x == 'em' or x == 'cite' => x in {'i', 'em', 'cite'}
>
> You need to know the type of x. Depending on the type, x.__eq__ and
> x.__contains__ may be completly different.

Then => x in ('i', 'em', 'cite'), and go further only if x is obviously of 
the appropriate type.

>> for ...: f.write(...) => __fwrite = f.write; for ...: __fwrite(...)
>
> f.write lookup cannot be optimized.

Yes, it is a dangerous transformation and it is difficult to prove its 
safety. But name lookup is one of the main performance brakes in Python.
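A sketch of the hoisting in plain Python (safe only if f.write cannot be rebound inside the loop; io.StringIO stands in for a real file here):

```python
import io

f = io.StringIO()
fwrite = f.write  # one attribute lookup instead of one per iteration
for i in range(5):
    fwrite(str(i))
assert f.getvalue() == '01234'
```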

>> x = x + 1 => x += 1
>> x = x + ' ' => x += ' '
>
> I don't know if these optimizations are safe.

It is safe if x is proven to be a number or a string: for example, if x is a 
local variable, initialized with a number/string and modified only with 
numbers/strings. Counters and string accumulators are common patterns.

>> 'x=%s' % repr(x) => 'x=%a' % (x,)
>
> I don't understand this one.

Sorry, it should be => 'x=%r' % (x,). And for more arguments: 'x[' + 
repr(k) + ']=' + repr(v) + ';' => 'x[%r]=%r;' % (k, v). Same for str and 
ascii.

It is not safe (repr can be shadowed).
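The corrected equivalences, under the assumption that repr is the builtin:

```python
k, v = 'key', [1, 2]
# 'x=%s' % repr(x) is equivalent to 'x=%r' % (x,) when repr is not shadowed:
assert 'x=%s' % repr(v) == 'x=%r' % (v,)
# Concatenation of repr() pieces collapses into one format string:
assert 'x[' + repr(k) + ']=' + repr(v) + ';' == 'x[%r]=%r;' % (k, v)
```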

>> 'x=%s' % x + s => 'x=%s%s' % (x, s)
>> x = x + ', [%s]' % y => x = '%s, [%s]' % (x, y)
>
> Doesn't work if s type is not str.

Yes, this is only partially applicable. In many cases s is a literal or a 
freshly formatted string.

>> range(0, x) => range(x)
>
> Is it faster?

Slightly.
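In Python 3 the two forms even compare equal as range objects, so the rewrite is behavior-preserving whenever range is the builtin:

```python
# range objects comparing equal means they describe the same sequence:
assert range(0, 10) == range(10)
assert list(range(0, 10)) == list(range(10))
```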

>> while True: s = f.readline(); if not s: break; ... => for s in f: ...
>
> Too much assumptions on f type.
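For file-like objects that iterate by lines the two forms do agree, though as noted an optimizer cannot prove that for an arbitrary f (sketch using io.StringIO in place of a real file):

```python
import io

data = 'a\nb\nc\n'

f = io.StringIO(data)
lines_while = []
while True:
    s = f.readline()
    if not s:
        break
    lines_while.append(s)

f = io.StringIO(data)
lines_for = [s for s in f]
assert lines_while == lines_for == ['a\n', 'b\n', 'c\n']
```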

I personally would prefer a 2to3-like "modernizer" (as a separate utility 
and as IDE plugins) that finds such patterns and offers to replace them with 
a more modern, readable (and possibly more efficient) variant. The decision 
on whether a transformation applies in a particular case would remain with 
the human. The automatic optimizer would be left with only the simple 
transformations that would hurt readability if written by hand, and the 
optimizations that cannot be expressed in source code.


