I could use some help making this Python code run faster using only Python code.
Ian Clark
iclark at mail.ewu.edu
Thu Sep 20 19:35:41 EDT 2007
mensanator at aol.com wrote:
> On Sep 20, 5:46 pm, Paul Hankin <paul.han... at gmail.com> wrote:
>> On Sep 20, 10:59 pm, Python Maniac <raych... at hotmail.com> wrote:
>>
>>> I am new to Python; however, I would like some feedback from those who
>>> know more about Python than I do at this time.
>>> def scrambleLine(line):
>>>     s = ''
>>>     for c in line:
>>>         s += chr(ord(c) | 0x80)
>>>     return s
>>> def descrambleLine(line):
>>>     s = ''
>>>     for c in line:
>>>         s += chr(ord(c) & 0x7f)
>>>     return s
>>> ...
>> Well, scrambleLine will remove line-endings, so when you're descrambling
>> you'll be processing the entire file at once. This is particularly bad
>> because of the way your functions work, adding a character at a time to s.
>>
>> Probably your easiest bet is to iterate over the file using read(N)
>> for some small N rather than doing a line at a time. Something like:
>>
>> process_bytes = (descrambleLine, scrambleLine)[action]
>> while 1:
>>     r = f.read(16)
>>     if not r: break
>>     ff.write(process_bytes(r))
>>
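Under Python 3, where files opened in binary mode yield bytes, the chunked loop above might be sketched as follows. This is a minimal sketch, not the poster's code: the names process_stream, scramble_bytes and descramble_bytes are hypothetical, f and ff are assumed to be the input and output files, and in-memory BytesIO objects stand in for real files opened with 'rb' and 'wb'.

```python
import io


def scramble_bytes(data: bytes) -> bytes:
    # Set the high bit of every byte (Python 3 analogue of scrambleLine).
    return bytes(b | 0x80 for b in data)


def descramble_bytes(data: bytes) -> bytes:
    # Clear the high bit of every byte (Python 3 analogue of descrambleLine).
    return bytes(b & 0x7F for b in data)


def process_stream(src, dst, action, chunk_size=16):
    """Copy src to dst in fixed-size chunks, transforming each chunk.

    action: 0 = descramble, 1 = scramble, mirroring the tuple indexing
    in the quoted snippet. Hypothetical helper for illustration.
    """
    process_bytes = (descramble_bytes, scramble_bytes)[action]
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object means end of file
            break
        dst.write(process_bytes(chunk))


# Round trip through in-memory "files".
src = io.BytesIO(b"hello\nworld\n")
scrambled = io.BytesIO()
process_stream(src, scrambled, 1)
scrambled.seek(0)
restored = io.BytesIO()
process_stream(scrambled, restored, 0)
print(restored.getvalue())  # b'hello\nworld\n'
```

Because each chunk is at most chunk_size bytes, the per-call work stays small even though the scrambled file contains no newline bytes to split on.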
>> In general, rather than building strings by starting with an empty
>> string and repeatedly adding to it, you should use ''.join(...)
>>
>> For instance...
>> def descrambleLine(line):
>>     return ''.join(chr(ord(c) & 0x7f) for c in line)
>>
>> def scrambleLine(line):
>>     return ''.join(chr(ord(c) | 0x80) for c in line)
>>
>> It's less code, more readable and faster!
>
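The join-based versions suggested above can be checked for equivalence with a quick round trip. This is a sketch in Python 3 with hypothetical snake_case names; scramble_line_loop shows the accumulate-in-a-list-then-join form of the same idiom, which helps when the per-character logic does not fit in a single expression.

```python
def scramble_line(line):
    # Set bit 7 on every character (join over a generator expression).
    return ''.join(chr(ord(c) | 0x80) for c in line)


def descramble_line(line):
    # Clear bit 7, undoing scramble_line for ASCII input.
    return ''.join(chr(ord(c) & 0x7F) for c in line)


def scramble_line_loop(line):
    # Same result: collect the pieces in a list, then join once at the end
    # instead of concatenating onto a growing string.
    pieces = []
    for c in line:
        pieces.append(chr(ord(c) | 0x80))
    return ''.join(pieces)


text = "Hello, world!\n"
assert descramble_line(scramble_line(text)) == text
assert scramble_line(text) == scramble_line_loop(text)
```

Note that the scrambled text contains no "\n", which is the line-ending problem Paul points out.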
> I would have thought that also from what I've heard here.
>
> def scrambleLine(line):
>     s = ''
>     for c in line:
>         s += chr(ord(c) | 0x80)
>     return s
>
> def scrambleLine1(line):
>     return ''.join([chr(ord(c) | 0x80) for c in line])
>
> if __name__=='__main__':
>     from timeit import Timer
>     t = Timer("scrambleLine('abcdefghijklmnopqrstuvwxyz')",
>               "from __main__ import scrambleLine")
>     print t.timeit()
>
> ## scrambleLine
> ## 13.0013366039
> ## 12.9461998318
> ##
> ## scrambleLine1
> ## 14.4514098748
> ## 14.3594400695
>
> How come it's not? Then I noticed you don't have brackets in
> the join statement. So I tried without them and got
>
> ## 17.6010847978
> ## 17.6111472418
>
> Am I doing something wrong?
>
>> --
>> Paul Hankin
>
>
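The timing experiment above can be repeated under Python 3 with all variants side by side. This sketch is an illustration, not the posters' script: the function names, TEXT, and number=100_000 are hypothetical choices, and scramble_translate is an addition not discussed in the thread that uses str.translate with a precomputed table. Absolute numbers depend on the machine and Python version, so no results are shown.

```python
import timeit

TEXT = "abcdefghijklmnopqrstuvwxyz"


def scramble_concat(line):
    # Original approach: repeated string concatenation.
    s = ''
    for c in line:
        s += chr(ord(c) | 0x80)
    return s


def scramble_listcomp(line):
    # join over a list comprehension (the bracketed version).
    return ''.join([chr(ord(c) | 0x80) for c in line])


def scramble_genexp(line):
    # join over a generator expression (no brackets).
    return ''.join(chr(ord(c) | 0x80) for c in line)


# Not from the thread: map code points 0-127 to 128-255 in one table.
# Agrees with the other variants for code points below 256.
SCRAMBLE_TABLE = {i: i | 0x80 for i in range(128)}


def scramble_translate(line):
    # The per-character loop runs inside str.translate, not in bytecode.
    return line.translate(SCRAMBLE_TABLE)


if __name__ == "__main__":
    variants = (scramble_concat, scramble_listcomp,
                scramble_genexp, scramble_translate)
    for fn in variants:
        seconds = timeit.timeit(lambda: fn(TEXT), number=100_000)
        print(f"{fn.__name__:20s} {seconds:.3f} s")
```

One factor often cited for the += version holding up on short inputs is that CPython can sometimes extend a string in place when nothing else references it. That is an implementation detail, so the ranking may not carry over to other interpreters or to much longer inputs.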
I got similar results. I believe join performs slower here because it
iterates over the sequence twice [1]: the first pass determines the size
of the buffer to allocate, and the second populates the buffer. Since
join needs two passes, it must first turn a generator expression into a
list, which would explain why the version without brackets is slower still.
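The two-pass strategy described above can be modeled in pure Python. This is a simplified sketch for bytes, so the buffer can be a preallocated bytearray; two_pass_join is a hypothetical name and this is not CPython's actual C implementation. It only shows why an arbitrary iterable, such as a generator, has to be materialized before the size-counting pass.

```python
def two_pass_join(sep: bytes, iterable) -> bytes:
    # Materialize first: a generator can only be traversed once,
    # but the algorithm below needs to walk the items twice.
    items = list(iterable)
    if not items:
        return b""

    # Pass 1: compute the exact size of the result.
    total = sum(len(x) for x in items) + len(sep) * (len(items) - 1)
    buf = bytearray(total)  # allocate the output buffer once

    # Pass 2: copy separators and items into the buffer.
    pos = 0
    for i, x in enumerate(items):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(x)] = x
        pos += len(x)
    return bytes(buf)


print(two_pass_join(b", ", [b"a", b"bc", b"d"]))  # b'a, bc, d'
```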
Ian
[1] http://mail.python.org/pipermail/python-list/2007-September/458119.html
More information about the Python-list
mailing list