[python-uk] A stack with better performance than using a list

Jonathan Hartley tartley at tartley.com
Thu Jun 8 12:26:17 EDT 2017


Yep, that's a great elimination of the suspicious small overheads.

line_profiler is beautiful, I'll definitely be adding it to my toolbox, 
thanks for that!

I tried a variant of accumulating the output and printing it all as a 
single string, but of course this didn't help, printing is already buffered.

     Jonathan


On 6/8/2017 03:54, Stestagg wrote:
> I honestly can't see a way to improve this in python.  My best 
> solution is:
>
> def main(lines):
>     stack = []
>     sa = stack.append
>     sp = stack.pop
>     si = stack.__getitem__
>     for line in lines:
>         meth = line[:3]
>         if meth == b'pus':
>             sa(int(line[5:]))
>         elif meth == b'pop':
>             sp()
>         else:
>             parts = line[15:].split()
>             end = len(stack)-1
>             amount = int(parts[1])
>             for x in range(int(parts[0])):
>                 index = end - x
>                 stack[index] += amount
>         print(stack[-1] if stack else None)
>
> which comes out about 25% faster than your solution.
>
> One tool that's interesting to use here is: line_profiler: 
> https://github.com/rkern/line_profiler
>
> putting a @profile decorator on the above main() call, and running 
> with kernprof produces the following output:
>
> Line #Hits TimePer Hit % TimeLine Contents
>
> ==============================================================
>
> 12 @profile
>
> 13 def main(lines):
>
> 14 144.00.0stack = []
>
> 15 2000001 9495990.5 11.5for line in lines:
>
> 16 200000011269440.6 13.7meth = line[:3]
>
> 17 2000000 9746350.5 11.8if meth == b'pus':
>
> 18 100000010027331.0 12.2stack.append(int(line[5:]))
>
> 19 1000000 4787560.55.8elif meth == b'pop':
>
> 20999999 5971140.67.2stack.pop()
>
> 21 else:
>
> 22 166.00.0parts = line[15:].split()
>
> 23 122.00.0end = len(stack)-1
>
> 24 111.00.0amount = int(parts[1])
>
> 25500001 2412270.52.9for x in range(int(parts[0])):
>
> 26500000 2734770.53.3index = end - x
>
> 27500000 3090330.63.7stack[index] += amount
>
> 28 200000022958031.1 27.8print(stack[-1])
>
>
> which shows that there's no obvious bottleneck (line by line) here 
> (for my sample data).
>
> Note the print() overhead dominates the runtime, and that's with me 
> piping the output to /dev/null directly.
>
> I had a go at using arrays, deques, and numpy arrays in various ways 
> without luck, but we're getting fairly close to the native python 
> statement execution overhead here (hence folding it all into one 
> function).
>
> My only thoughts would be to see if there were some magic that could 
> be done by offloading the work onto a non-python library somehow.
>
> Another thing that might help some situations (hence my previous 
> questions) would be to implement the add_to_first_n as a lazy operator 
> (i.e. have a stack of the add_to_first_n values and dynamically add to 
> the results of pop() but that would proabably be much slow in the 
> average case.
>
> Steve
>
> On Wed, Jun 7, 2017 at 7:34 PM Jonathan Hartley <tartley at tartley.com 
> <mailto:tartley at tartley.com>> wrote:
>
>     Hey.
>
>     Thanks for engaging, but I can't help with the most important of
>     those questions - the large data sets on which my solution failed
>     due to timeout are hidden from candidates. Not unreasonable to
>     assume that they do exercise deep stacks, and large args to
>     add_to_first_n, etc.
>
>     Yes, the input looks exactly like your example. All args are
>     integers. The question asked for output corresponding to the top
>     of the stack after every operation. I omitted this print from
>     inside the 'for' loop in 'main', thinking it irrelevant.
>
>     I converted to integers inside 'dispatch'. 'args' must have
>     actually been created with:
>
>     args = [int(i) for i in tokens[1:]]
>
>     Where len(tokens) is never going to be bigger than 3.
>
>     Return values (from 'pop') were unused.
>
>
>     On 6/7/2017 13:25, Stestagg wrote:
>>     Do you have any more context?
>>     For example, is the add_to_first_n likely to be called with very
>>     large numbers, or very often? Does the stack get very deep, or
>>     stay shallow?
>>
>>     I'm assuming that lines look like this:
>>
>>     push 1
>>     push 2
>>     add_to_first_n 2 10
>>     pop
>>     pop
>>
>>     with all arguments as integers, and the final value being
>>     returned from main()?
>>     How did you convert from string inputs to numeric values?
>>     How did you manage return values?
>>
>>     :D
>>
>>     On Wed, Jun 7, 2017 at 6:51 PM Jonathan Hartley
>>     <tartley at tartley.com <mailto:tartley at tartley.com>> wrote:
>>
>>         I recently submitted a solution to a coding challenge, in an
>>         employment context. One of the questions was to model a
>>         simple stack. I wrote a solution which appended and popped
>>         from the end of a list. This worked, but failed with timeouts
>>         on their last few automated tests with large (hidden) data sets.
>>
>>         From memory, I think I had something pretty standard:
>>
>>         class Stack:
>>
>>         def __init__(self):
>>                 self.storage = []
>>
>>         def push(arg):
>>                 self.storage.append(arg)
>>
>>         def pop():
>>         return self.storage.pop() if self.storage else None
>>
>>         def add_to_first_n(n, amount):
>>         for n in range(n):
>>                     self.storage[n] += amount
>>
>>         def dispatch(self, line)
>>                 tokens = line.split()
>>                 method = getattr(self, tokens[0])
>>                 args = tokens[1:]
>>                 method(*args)
>>
>>         def main(lines):
>>             stack = Stack()
>>         for line in lines:
>>                 stack.dispatch(line)
>>
>>
>>         (will that formatting survive? Apologies if not)
>>
>>         Subsequent experiments have confirmed that appending to and
>>         popping from the end of lists are O(1), amortized.
>>
>>         So why is my solution too slow?
>>
>>         This question was against the clock, 4th question of 4 in an
>>         hour. So I wasn't expecting to produce Cython or C optimised
>>         code in that timeframe (Besides, my submitted .py file runs
>>         on their servers, so the environment is limited.)
>>
>>         So what am I missing, from a performance perspective? Are
>>         there other data structures in stdlib which are also O(1) but
>>         with a better constant?
>>
>>         Ah. In writing this out, I have begun to suspect that my
>>         slicing of 'tokens' to produce 'args' in the dispatch is
>>         needlessly wasting time. Not much, but some.
>>
>>         Thoughts welcome,
>>
>>             Jonathan
>>
>>         -- 
>>         Jonathan Hartleytartley at tartley.com <mailto:tartley at tartley.com>     http://tartley.com
>>         Made out of meat.+1 507-513-1101 <tel:%28507%29%20513-1101>         twitter/skype: tartley
>>
>>         _______________________________________________
>>         python-uk mailing list
>>         python-uk at python.org <mailto:python-uk at python.org>
>>         https://mail.python.org/mailman/listinfo/python-uk
>>
>>
>>
>>     _______________________________________________
>>     python-uk mailing list
>>     python-uk at python.org <mailto:python-uk at python.org>
>>     https://mail.python.org/mailman/listinfo/python-uk
>
>     -- 
>     Jonathan Hartleytartley at tartley.com <mailto:tartley at tartley.com>     http://tartley.com
>     Made out of meat.+1 507-513-1101 <tel:%28507%29%20513-1101>         twitter/skype: tartley
>
>     _______________________________________________
>     python-uk mailing list
>     python-uk at python.org <mailto:python-uk at python.org>
>     https://mail.python.org/mailman/listinfo/python-uk
>
>
>
> _______________________________________________
> python-uk mailing list
> python-uk at python.org
> https://mail.python.org/mailman/listinfo/python-uk

-- 
Jonathan Hartley    tartley at tartley.com    http://tartley.com
Made out of meat.   +1 507-513-1101        twitter/skype: tartley

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-uk/attachments/20170608/347d5229/attachment-0001.html>


More information about the python-uk mailing list