The Cost of Dynamism (was Re: Python 2.x or 3.x, which is faster?)
BartC
bc at freeuk.com
Mon Mar 21 20:49:20 EDT 2016
On 21/03/2016 23:20, Dennis Lee Bieber wrote:
> On Mon, 21 Mar 2016 17:31:21 +0000, BartC <bc at freeuk.com> declaimed the
> following:
I wasn't going to post it but here it is anyway:
http://pastebin.com/FLbWSdpT
(I've added some spaces for your benefit. This also builds a histogram
of names so as to do something useful. Note that despite my concerns
about speed, this module can process itself in around 100ms.)
>>
>> def readtoken(psource):
>> global lxsptr, lxsymbol
>
> Why is "lxsymbol" a global, and not something returned by the function
> (I can understand your making lxsptr global as you intend to come back in
> with it later).
Ideally there would be a descriptor or handle passed around which
contains the current state of the tokeniser, and where you stick the
current token values. But for a speed test, I was worried about
attribute lookups.
In the first Python version, I used 'nonlocals' (belonging to an
enclosing function), but they were just as slow as globals!
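To illustrate the trade-off being discussed, here is a minimal sketch of the descriptor/handle approach: tokeniser state travels in one object instead of module-level globals. The class and function names here are hypothetical, not from the pastebin code; `__slots__` is one common way to trim the attribute-lookup overhead being worried about.

```python
# Hypothetical sketch: tokeniser state carried in a handle object
# instead of globals like lxsptr/lxsymbol.
class LexState:
    __slots__ = ('source', 'pos', 'symbol')  # slots reduce per-attribute overhead

    def __init__(self, source):
        self.source = source   # the text being tokenised
        self.pos = 0           # current index (the role lxsptr plays)
        self.symbol = None     # last token read (the role lxsymbol plays)

def read_name(st):
    # Consume a run of letters and record it as a 'name' token.
    start = st.pos
    while st.pos < len(st.source) and st.source[st.pos].isalpha():
        st.pos += 1
    st.symbol = ('name', st.source[start:st.pos])
    return st.symbol

st = LexState("abc def")
print(read_name(st))   # ('name', 'abc')
```

Whether this beats globals in practice depends on how many attribute lookups each token costs; timing both variants with `timeit` would settle it.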
>> lxsubcode = 0
>>
> Unused in the rest of the sample
This is a global. Some lxsymbol values will set it, for the rest it's
neater if it's zeroed.
>> while (1):
>
> while True: #At least since Python 2.x... No () needed
>
>> c=psource[lxsptr]
>
> Is the spacebar broken? How about some whitespace between language
> elements... They don't take up that much memory
(It's not broken but it wouldn't be consistent.)
> Given that you state you expect to only be working with 8-bit bytes...
>
>> if d<256:
>
> this will always be true
Unfortunately Python 3 doesn't play along. There could be some Unicode
characters in the string, with values above 255. (And if I used
byte-sequences, I don't know what would work and what wouldn't.)
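The point about Python 3 strings is easy to demonstrate: `str` is a sequence of Unicode code points, so `ord(c)` can exceed 255 and a 256-entry dispatch table needs the range check.

```python
# Python 3 str holds Unicode code points, not bytes, so a character's
# ordinal can be far above 255.
s = "a€"
codes = [ord(c) for c in s]
print(codes)            # [97, 8364] -- '€' is U+20AC, well beyond 255
fits = [d < 256 for d in codes]
print(fits)             # [True, False] -- the check in the tokeniser is needed
```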
>> lxsymbol = disptable[d](psource,c)
>
> Looks like you are indexing a 256-element table of functions, using the
> numeric value of the character/byte as the index... Only to then pass your
> entire source string along with the character from it to the function.
No, it passes only a reference to the entire string. The current
position is in 'lxsptr'. Yes the mix of parameters and globals is messy.
All globals might be better (in the original non-Python version, 'globals' would have module scope, and would not be visible outside the tokeniser module unless explicitly exported. Semi-global...).
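For readers who haven't seen the pastebin, the dispatch scheme under discussion can be sketched like this; the handler names and token shapes are invented for illustration, only the 256-entry table indexed by character ordinal follows the description above.

```python
# Hypothetical sketch of a 256-entry dispatch table: each slot maps a
# character's ordinal to a handler function; every handler receives the
# whole source string (by reference -- no copy) plus the current position.
def handle_digit(src, pos):
    start = pos
    while pos < len(src) and src[pos].isdigit():
        pos += 1
    return ('number', src[start:pos]), pos

def handle_space(src, pos):
    return None, pos + 1          # skip whitespace, emit no token

def handle_other(src, pos):
    return ('punct', src[pos]), pos + 1

disptable = [handle_other] * 256
for d in range(ord('0'), ord('9') + 1):
    disptable[d] = handle_digit
disptable[ord(' ')] = handle_space

def tokens(src):
    pos, out = 0, []
    while pos < len(src):
        d = ord(src[pos])
        handler = disptable[d] if d < 256 else handle_other
        tok, pos = handler(src, pos)
        if tok:
            out.append(tok)
    return out

print(tokens("12 34"))   # [('number', '12'), ('number', '34')]
```

Passing `src` costs only a reference copy, so the table lookup plus one call per character is the whole per-token overhead.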
> I have no idea what your disptable functions look like but...
>
> while psource:
> c, psource = psource[0], psource[1:]
I don't think this will work. Slicing creates a hard copy of the rest of
the string. Performance is going to be n-squared.
(I tried a mock of this line, working with a duplicate of the data; the
time to process a 600-line module doubled. I'm still waiting on the 6MB
data file, and it's been seven minutes so far; it normally takes 7 seconds.
I was surprised at one time that slices don't create 'views', but I've
since implemented view-slices and I can appreciate the problems.)
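The copy-versus-view distinction can be shown directly. This is a small demonstration, not BartC's view-slice implementation: `str` and `bytes` slicing always allocates a fresh object (hence the quadratic cost of repeatedly slicing off the head), while `memoryview` over a bytes-like object is a genuine zero-copy view.

```python
# str/bytes slicing copies: psource[1:] allocates len(psource)-1 new
# characters every iteration, so the loop above is O(n**2) overall.
s = "hello world"
t = s[1:]
print(t)                          # 'ello world' -- a brand-new string

# memoryview gives an actual view: slicing is constant time, no copy,
# and the slice stays backed by the original buffer.
data = b"hello world"
mv = memoryview(data)
tail = mv[1:]
print(tail.tobytes())             # b'ello world'
print(tail.obj is data)           # True -- same underlying buffer
```

This only helps for bytes-like data; Python has no built-in view type over `str`, which is presumably why the view-slice experiment mentioned above had to be done outside Python.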
--
Bartc