I could use some help making this Python code run faster using only Python code.

Thu Sep 20 19:35:41 EDT 2007

mensanator at aol.com wrote:
> On Sep 20, 5:46 pm, Paul Hankin <paul.han... at gmail.com> wrote:
>> On Sep 20, 10:59 pm, Python Maniac <raych... at hotmail.com> wrote:
>>
>>> I am new to Python; however, I would like some feedback from those who
>>> know more about Python than I do at this time.
>>> def scrambleLine(line):
>>>     s = ''
>>>     for c in line:
>>>         s += chr(ord(c) | 0x80)
>>>     return s
>>> def descrambleLine(line):
>>>     s = ''
>>>     for c in line:
>>>         s += chr(ord(c) & 0x7f)
>>>     return s
>>> ...
>> Well, scrambleLine will remove line endings, so when you're descrambling
>> you'll be processing the entire file at once. This is particularly bad
>> because of the way your functions work, adding a character at a time to s.
>>
>> Probably your easiest bet is to iterate over the file using read(N)
>> for some small N rather than doing a line at a time. Something like:
>>
>> process_bytes = (descrambleLine, scrambleLine)[action]
>> while 1:
>>     r = f.read(16)
>>     if not r: break
>>     ff.write(process_bytes(r))
>>
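A minimal sketch of the read(N) loop above, written for Python 3 and assuming both files are opened in binary mode ('rb' and 'wb'), so read() returns bytes. The names process_stream, scramble_bytes and descramble_bytes and the 64 KiB chunk size are illustrative choices, not from the thread. Because each byte is transformed independently, any chunk size gives the same output.

```python
import io

CHUNK_SIZE = 64 * 1024  # illustrative; any size works since bytes are independent


def scramble_bytes(data):
    """Set the high bit of every byte (a bytes version of scrambleLine)."""
    return bytes(b | 0x80 for b in data)


def descramble_bytes(data):
    """Clear the high bit of every byte (a bytes version of descrambleLine)."""
    return bytes(b & 0x7F for b in data)


def process_stream(src, dst, func, chunk_size=CHUNK_SIZE):
    """Read src in fixed-size chunks, transform each chunk, write it to dst."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # b'' signals end of file
            break
        dst.write(func(chunk))


if __name__ == "__main__":
    # In-memory streams stand in for open(path, 'rb') / open(path, 'wb').
    src = io.BytesIO(b"hello\nworld\n")
    scrambled = io.BytesIO()
    process_stream(src, scrambled, scramble_bytes, chunk_size=4)
    scrambled.seek(0)
    restored = io.BytesIO()
    process_stream(scrambled, restored, descramble_bytes, chunk_size=4)
    print(restored.getvalue())  # b'hello\nworld\n'
```

The round trip only restores the input exactly when every byte is below 0x80 (plain ASCII), which is the same assumption the original functions make.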
>> In general, rather than building strings by starting with an empty
>> string and repeatedly adding to it, you should use ''.join(...)
>>
>> For instance...
>> def descrambleLine(line):
>>   return ''.join(chr(ord(c) & 0x7f) for c in line)
>>
>> def scrambleLine(line):
>>   return ''.join(chr(ord(c) | 0x80) for c in line)
>>
>> It's less code, more readable and faster!
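A further option not raised in this thread is to precompute a 256-entry lookup table and let bytes.translate apply it, so the per-byte work happens inside the interpreter's C code rather than in a Python-level loop. This sketch assumes Python 3 bytes input; the table and function names are hypothetical. (Python 2's str.translate accepts a 256-character table in the same way.)

```python
# Precomputed 256-entry tables: position i holds the byte that i maps to.
SCRAMBLE_TABLE = bytes(i | 0x80 for i in range(256))
DESCRAMBLE_TABLE = bytes(i & 0x7F for i in range(256))


def scramble_translate(data):
    """Set the high bit of every byte via a single table lookup pass."""
    return data.translate(SCRAMBLE_TABLE)


def descramble_translate(data):
    """Clear the high bit of every byte via a single table lookup pass."""
    return data.translate(DESCRAMBLE_TABLE)


if __name__ == "__main__":
    sample = b"abcdefghijklmnopqrstuvwxyz"
    # Same result as the per-byte generator approach.
    assert scramble_translate(sample) == bytes(b | 0x80 for b in sample)
    print(descramble_translate(scramble_translate(sample)))  # b'abcdefghijklmnopqrstuvwxyz'
```

Whether this wins for 26-byte inputs depends on call overhead; the gap should grow with longer inputs, but that is worth checking with timeit rather than assuming.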
> 
> I would have thought that also from what I've heard here.
> 
> def scrambleLine(line):
>     s = ''
>     for c in line:
>         s += chr(ord(c) | 0x80)
>     return s
> 
> def scrambleLine1(line):
>     return ''.join([chr(ord(c) | 0x80) for c in line])
> 
> if __name__=='__main__':
>     from timeit import Timer
>     t = Timer("scrambleLine('abcdefghijklmnopqrstuvwxyz')",
>               "from __main__ import scrambleLine")
>     print t.timeit()
> 
> ##  scrambleLine
> ##  13.0013366039
> ##  12.9461998318
> ##
> ##  scrambleLine1
> ##  14.4514098748
> ##  14.3594400695
> 
> How come it's not? Then I noticed you don't have brackets in
> the join statement. So I tried without them and got
> 
> ##  17.6010847978
> ##  17.6111472418
> 
> Am I doing something wrong?
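One way to rerun this comparison under Python 3 is sketched below. The function names are hypothetical stand-ins for scrambleLine, scrambleLine1 and the bracket-less variant, number=100000 is an arbitrary repeat count, and the timings will differ from the 2007 figures above depending on interpreter version and input length.

```python
from timeit import timeit


def scramble_concat(line):
    """Repeated += (the original scrambleLine)."""
    s = ''
    for c in line:
        s += chr(ord(c) | 0x80)
    return s


def scramble_join_list(line):
    """''.join over a list comprehension (scrambleLine1)."""
    return ''.join([chr(ord(c) | 0x80) for c in line])


def scramble_join_genexpr(line):
    """''.join over a generator expression (no brackets)."""
    return ''.join(chr(ord(c) | 0x80) for c in line)


if __name__ == "__main__":
    sample = 'abcdefghijklmnopqrstuvwxyz'
    for func in (scramble_concat, scramble_join_list, scramble_join_genexpr):
        # globals= lets the timed statement see func and sample.
        seconds = timeit('func(sample)',
                         globals={'func': func, 'sample': sample},
                         number=100000)
        print(func.__name__, round(seconds, 3))
```

Passing globals= to timeit (available since Python 3.5) avoids the "from __main__ import ..." setup string used above.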
> 
>> --
>> Paul Hankin
> 
> 

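I got similar results. I believe join is slower here because it iterates
over the sequence twice [1]: the first pass determines the size of the
buffer to allocate and the second populates the buffer. That would also
explain why the version without brackets is slowest: when join is handed
a generator instead of a list, it first has to collect the items into a
list before it can make those two passes.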
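To make the two-pass idea concrete, here is a simplified pure-Python model of it. It is not CPython's actual C implementation; it uses bytes and a preallocated bytearray because Python has no mutable str buffer, and the name two_pass_join is hypothetical. The list(items) step shows why a generator costs extra: it can only be iterated once, so it has to be collected before the sizing pass.

```python
def two_pass_join(sep, items):
    """Simplified model of a two-pass join over bytes objects.

    Mirrors the strategy only: materialize the input, size the buffer,
    then copy into it.
    """
    # A generator can only be iterated once, so collect it first.
    parts = list(items)
    if not parts:
        return b''
    # Pass 1: compute the exact size of the result.
    total = sum(len(p) for p in parts) + len(sep) * (len(parts) - 1)
    buf = bytearray(total)
    # Pass 2: copy each piece (and separator) into the preallocated buffer.
    pos = 0
    for i, p in enumerate(parts):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(p)] = p
        pos += len(p)
    return bytes(buf)


if __name__ == "__main__":
    print(two_pass_join(b', ', (bytes([b]) for b in b'abc')))  # b'a, b, c'
```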

Ian

[1] http://mail.python.org/pipermail/python-list/2007-September/458119.html