[Python-Dev] Cost-Free Slice into FromString constructors--Long

Bob Ippolito bob at redivi.com
Thu May 25 17:36:00 CEST 2006

On May 25, 2006, at 3:28 PM, Jean-Paul Calderone wrote:

> On Thu, 25 May 2006 15:01:36 +0000, Runar Petursson  
> <runar at runar.net> wrote:
>> We've been talking this week about ideas for speeding up the parsing of
>> longs coming out of files or the network.  The use case is having a large
>> string with embedded longs and parsing them into real longs.  One approach
>> would be to use a simple slice:
>>     long(mystring[x:y])
>> but that is an expensive operation in a tight loop.  The proposed solution
>> is to add further keyword arguments to long(), such as:
>>     long(mystring, base=10, start=x, end=y)
>> The start/end would allow for negative indexes, as slices do, but otherwise
>> simply limit the scope of the parsing.  There are other solutions, using
>> buffer-like objects and such, but this seems like a simple win for anyone
>> parsing a lot of text.  I implemented it in a branch,
>> runar-longslice-branch, but it would need to be updated with Tim's latest
>> improvements to long.  Then you may ask, why not do it for everything else
>> that parses from a string--to which I say we should.  Thoughts?
> This really seems like a poor option.  Why fix the problem with a
> hundred special cases instead of a single general solution?
>
> Hmm, one reason could be that the general solution doesn't work:
>
>   exarkun at kunai:~$ python
>   Python 2.4.3 (#2, Apr 27 2006, 14:43:58)
>   [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
>   Type "help", "copyright", "credits" or "license" for more information.
>   >>> long(buffer('1234', 0, 3))
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in ?
>   ValueError: null byte in argument for long()
>   >>> long(buffer('123a', 0, 3))
>   Traceback (most recent call last):
>     File "<stdin>", line 1, in ?
>   ValueError: invalid literal for long(): 123a

One problem with buffer() is that it does a memcpy of the buffer. A  
zero-copy version of buffer (a view on some object that implements  
the buffer API) would be nice.
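
As a rough illustration of the idea, here is a minimal sketch in modern
Python (3.x), where memoryview gives a view on any object implementing the
buffer protocol without copying it.  The helper name parse_long and its
keyword arguments are hypothetical, modelled on the start/end proposal quoted
above; this is not the code from the branch mentioned there.  A pure-Python
loop like this shows the intended semantics only, not the speed a C
implementation would be after.

```python
# Map ASCII digit/letter byte values to their numeric value (0-35).
_DIGITS = {}
for _i, _c in enumerate("0123456789abcdefghijklmnopqrstuvwxyz"):
    _DIGITS[ord(_c)] = _i
    _DIGITS[ord(_c.upper())] = _i


def parse_long(data, base=10, start=0, end=None):
    """Parse an integer from data[start:end] without building a slice.

    Hypothetical helper: `data` must be a bytes-like object (bytes,
    bytearray, ...) holding ASCII digits.  start/end follow slice rules,
    including negative indexes.
    """
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")

    view = memoryview(data)  # view on the underlying buffer, no copy
    # Normalize start/end exactly the way slicing would.
    lo, hi, _ = slice(start, end).indices(len(view))

    sign = 1
    if lo < hi and view[lo] in b"+-":
        sign = -1 if view[lo] == ord("-") else 1
        lo += 1
    if lo >= hi:
        raise ValueError("no digits in the requested range")

    value = 0
    for i in range(lo, hi):
        # Unknown bytes map to `base`, which always fails the check below.
        digit = _DIGITS.get(view[i], base)
        if digit >= base:
            raise ValueError("invalid digit at offset %d" % i)
        value = value * base + digit
    return sign * value


print(parse_long(b"1234", start=0, end=3))      # 123
print(parse_long(b"xx-42yy", start=2, end=-2))  # -42
```

Only the first three bytes of b"1234" are examined in the first call, so the
trailing "4" neither ends up in the result nor needs to be copied out of the
way first.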


