Text Parsing - character at a time...

Jeff Epler jepler at unpythonic.net
Fri Jul 9 08:12:43 EDT 2004


It's not clear what you mean by claiming that "creating a new string for
every character" is inefficient:
$ timeit 'int()'
100000 loops, best of 3: 1.26 usec per loop
$ timeit 'str()'
1000000 loops, best of 3: 1.28 usec per loop
$ timeit 'chr(0)'
1000000 loops, best of 3: 1.73 usec per loop

If your output is a transformation of your input, I'd write
    def transform(input):
        def _transform():
            for c in input:
                yield a string zero or more times
        return ''.join(_transform())
Python should automatically do some nice overallocation tricks to make
this fairly efficient.  You could also write
    def transform(input):
        result = []
        for c in input:
            result.append(a string) zero or more times
        return ''.join(result)
and if you care about the absolute fastest code you'll benchmark both of
them.
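
To make the two sketches above concrete, here is a minimal runnable
version. The transformation rule (drop digits, double vowels, pass
everything else through) and the names transform_gen and transform_list
are made up purely for illustration; substitute whatever your parser
actually needs to emit.

```python
VOWELS = 'aeiou'  # hypothetical rule: vowels are emitted twice


def transform_gen(text):
    """Generator variant: yield zero or more strings per character."""
    def _transform():
        for c in text:
            if c.isdigit():
                continue        # emit nothing for this character
            yield c             # emit once
            if c in VOWELS:
                yield c         # emit a second time
    return ''.join(_transform())


def transform_list(text):
    """List variant: append zero or more strings per character."""
    result = []
    for c in text:
        if c.isdigit():
            continue
        result.append(c)
        if c in VOWELS:
            result.append(c)
    return ''.join(result)
```

Both functions should return the same string for the same input; only
the way the pieces are collected before the final join differs.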

A common "gotcha" for starting programmers would be to write something
like
    def transform(input):
        result = ''
        for c in input:
            result += a string zero or more times
        return result
because in this case Python won't (currently, anyway) do any clever
overallocation tricks, but instead will copy the partial result at the
site of each +=, so the total work grows roughly quadratically with the
length of the output.
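
If you want to see how much this matters on your own interpreter, a
rough benchmark along these lines may help. It is only a sketch: the
function names, the input size, and the repeat counts are arbitrary
assumptions, and the numbers depend heavily on the Python version (some
later CPython releases can sometimes extend the string in place, which
narrows the gap), so measure rather than assume.

```python
import timeit


def concat_plus(text):
    """Build the result with repeated += on a string."""
    result = ''
    for c in text:
        result += c
    return result


def concat_join(text):
    """Collect pieces in a list and join once at the end."""
    result = []
    for c in text:
        result.append(c)
    return ''.join(result)


def compare(n=10000, repeat=5):
    """Return the best-of-`repeat` time in seconds for each approach."""
    text = 'x' * n
    return {
        '+=': min(timeit.repeat(lambda: concat_plus(text),
                                number=10, repeat=repeat)),
        'join': min(timeit.repeat(lambda: concat_join(text),
                                  number=10, repeat=repeat)),
    }


if __name__ == '__main__':
    for name, seconds in compare().items():
        print(name, seconds)
```

Try several values of n: if += really copies every time, its timings
should grow much faster than those of the join version as n increases.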

Jeff