A technique from a chatbot
Mark Bourne
nntp.mbourne at spamgourmet.com
Fri Apr 5 15:59:54 EDT 2024
Stefan Ram wrote:
> Mark Bourne <nntp.mbourne at spamgourmet.com> wrote or quoted:
>> I don't think there's a tuple being created. If you mean:
>> ( word for word in list_ if word[ 0 ]== 'e' )
>> ...that's not creating a tuple. It's a generator expression, which
>> generates the next value each time it's called for. If you only ever
>> ask for the first item, it only generates that one.
>
> Yes, that's also how I understand it!
>
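A small sketch of that laziness, for anyone who wants to see it. The helper name `first_starting_with` and the `examined` list are made up for illustration; they only record which words the generator actually looked at before `next()` got its answer.

```python
def first_starting_with(words, letter, examined):
    """Return the first word starting with `letter`, or '' if there is none.

    `examined` is a list that records every word the generator tested.
    """
    def check(word):
        examined.append(word)
        return word[0] == letter

    # Parentheses here create a generator expression, not a tuple.
    candidates = (word for word in words if check(word))
    # next() pulls values only until the first one is produced.
    return next(candidates, '')


examined = []
result = first_starting_with(['apple', 'egg', 'eel', 'fig'], 'e', examined)
print(result)    # egg
print(examined)  # ['apple', 'egg']
```

'eel' and 'fig' are never tested, because nothing asks the generator for a second value.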
> In the meantime, I wrote code for a microbenchmark, shown below.
>
> This code, when executed on my computer, shows that the
> next+generator approach is a bit faster when compared with
> the procedural break approach. But when the order of the two
> approaches is being swapped in the loop, then it is shown to
> be a bit slower. So let's say, it takes about the same time.
There could be some caching going on, meaning whichever is done second
comes out a bit faster.
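
One way to reduce that kind of order effect, as a rough sketch rather than a rigorous benchmark: alternate which approach runs first and keep the best time for each. The function names `using_break`, `using_next` and `best_times` are my own, and wrapping each approach in a function adds call overhead that the inline version quoted below does not have, so the numbers are not directly comparable.

```python
import timeit


def using_break(words):
    for word in words:
        if word[0] == 'e':
            found = word
            break
    else:
        found = ''
    return found


def using_next(words):
    return next((word for word in words if word[0] == 'e'), '')


def best_times(words, rounds=5, number=1000):
    """Return the best total time per approach, alternating which runs first."""
    candidates = {'break': using_break, 'next': using_next}
    best = {name: float('inf') for name in candidates}
    for round_number in range(rounds):
        names = list(candidates)
        if round_number % 2:
            names.reverse()  # swap the order on every other round
        for name in names:
            func = candidates[name]
            elapsed = timeit.timeit(lambda: func(words), number=number)
            best[name] = min(best[name], elapsed)
    return best


words = ['apple', 'banana', 'cherry', 'egg', 'fig']
print(best_times(words))
```

Taking the minimum over several rounds is a common way to discount runs that were slowed by whatever else the machine was doing.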
> However, I also tested code with an early return (not shown below),
> and this was shown to be faster than both code using break and
> code using next+generator by a factor of about 1.6, even though
> the code with return has the "function call overhead"!
To be honest, that's how I'd probably write it - not because of any
thought that it might be faster, but just because it's clearer. And if
there's a `do_something_else()` that needs to be called regardless of
whether a word was found, split it into two functions:
```
def first_word_beginning_with_e(target, wordlist):
    for w in wordlist:
        if w.startswith(target):
            return w
    return ''

def find_word_and_do_something_else(target, wordlist):
    result = first_word_beginning_with_e(target, wordlist)
    do_something_else()
    return result
```
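
For a version that can be run as-is, here is a sketch with a stand-in for `do_something_else()`. The stand-in and the `calls` list are hypothetical, there only to show that the second function calls it whether or not a word was found. I've also dropped the `_e` from the first function's name here, since the letter is passed in as `target`.

```python
calls = []  # records each call to the stand-in below


def do_something_else():
    # Hypothetical stand-in; the real function is not shown in this thread.
    calls.append('called')


def first_word_beginning_with(target, wordlist):
    for w in wordlist:
        if w.startswith(target):
            return w  # early return: stop at the first match
    return ''


def find_word_and_do_something_else(target, wordlist):
    result = first_word_beginning_with(target, wordlist)
    do_something_else()  # runs whether or not a word was found
    return result


print(find_word_and_do_something_else('e', ['apple', 'egg', 'eel']))  # egg
print(repr(find_word_and_do_something_else('e', ['apple', 'fig'])))   # ''
print(len(calls))                                                     # 2
```

Note that `w.startswith(target)` also copes with empty strings in the list, where `word[ 0 ]` would raise an IndexError.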
> But please be aware that such results depend on the implementation
> and version of the Python implementation being used for the benchmark
> and also of the details of how exactly the benchmark is written.
>
> import random
> import string
> import timeit
>
> print( 'The following loop may need a few seconds or minutes, '
>        'so please bear with me.' )
>
> time_using_break = 0
> time_using_next = 0
>
> for repetition in range( 100 ):
>     for i in range( 100 ): # Yes, this nesting is redundant!
>
>         list_ = \
>         [ ''.join \
>           ( random.choices \
>             ( string.ascii_lowercase, k=random.randint( 1, 30 )))
>           for i in range( random.randint( 0, 50 ))]
>
>         start_time = timeit.default_timer()
>         for word in list_:
>             if word[ 0 ]== 'e':
>                 word_using_break = word
>                 break
>         else:
>             word_using_break = ''
>         time_using_break += timeit.default_timer() - start_time
>
>         start_time = timeit.default_timer()
>         word_using_next = \
>         next( ( word for word in list_ if word[ 0 ]== 'e' ), '' )
>         time_using_next += timeit.default_timer() - start_time
>
>         if word_using_next != word_using_break:
>             raise Exception( 'word_using_next != word_using_break' )
>
> print( f'{time_using_break = }' )
> print( f'{time_using_next = }' )
> print( f'{time_using_next / time_using_break = }' )
>