[Tutor] simple unicode question

Wed Aug 27 16:57:51 CEST 2014

----- Original Message -----
> From: Steven D'Aprano <steve at pearwood.info>
> To: tutor at python.org
> Cc: 
> Sent: Tuesday, August 26, 2014 2:08 PM
> Subject: Re: [Tutor] simple unicode question
> 
> On Tue, Aug 26, 2014 at 03:58:17AM -0700, Albert-Jan Roskam wrote:
> 
>>  Interesting, you follow a "look before you leap" approach here, 
>>  whereas "they" always say it is easier to "ask forgiveness than 
>>  permission" in Python. 
> 
> Anyone who says it is *ALWAYS* easier is lying or confused :-)
> 
> It is *often*, perhaps even *usually* better to use an AFTP approach, but 
> not necessarily easier. Sometimes it is more work, but safer, or better. 
> For instance, it is wrong (i.e. buggy) to use a LBYL approach here:
> 
> 
> if os.path.exists(filename):
>     f = open(filename)
>     process(f.read())
> 
> Why is it wrong? Just because the file exists, doesn't mean you can open 
> it for reading. And just because it exists at the instant you do the 
> test, doesn't mean it will still exist a millisecond later when Python 
> tries to open the file. Perhaps another process or user has deleted it 
> in the fraction of a second between the two lines.
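
For comparison, here is a minimal sketch of the same read done the AFTP way. The name read_text_or_none is made up for illustration, and catching OSError assumes Python 3.3 or later (on Python 2 the equivalent would be IOError):

```python
def read_text_or_none(filename):
    """Return the file's text, or None if the OS refuses to open or read it."""
    try:
        # No existence check first: just try, and let open() report failure.
        with open(filename) as f:
            return f.read()
    except OSError:
        # Covers a missing file, missing permissions, a directory, etc.
        return None
```

There is no gap between "check" and "use" here, because the attempt to open the file is itself the check. Returning None is only one possible way to report failure; re-raising or logging may suit real code better.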
> 
> But sometimes AFTP is the wrong approach too. Consider a case where you 
> have to do a number of things, and only if *all* of them can be done do 
> you want to proceed. For example, baking a cake:
> 
> - pour the cake mix and milk into a bowl;
> - mix;
> - pour into a pan;
> - put into the oven;
> - cook for 30 minutes.
> 
> If you don't have an oven, there's no point mixing the cake mix and milk 
> together; you'll just waste it. So you need to LBYL: make sure you have 
> the cake mix AND the milk AND a bowl AND a pan AND an oven before you 
> even start. If even one thing is missing, you don't proceed.
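
The cake example could be sketched roughly like this. All the names (REQUIRED, missing_items, bake_cake) are hypothetical, and the "kitchen" is assumed to be any container that supports "in":

```python
REQUIRED = ("cake mix", "milk", "bowl", "pan", "oven")


def missing_items(kitchen):
    """Return the required items that are absent from kitchen, in order."""
    return [item for item in REQUIRED if item not in kitchen]


def bake_cake(kitchen):
    """Check every precondition first; only then start the irreversible steps."""
    missing = missing_items(kitchen)
    if missing:
        # Nothing has been poured or mixed yet, so nothing is wasted.
        raise RuntimeError("cannot bake, missing: " + ", ".join(missing))
    return ["pour", "mix", "pour into pan", "put into oven", "cook 30 minutes"]
```

The point of looking first is that every check happens before the first step that cannot be undone.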
> 
> 
>>  But LBYL is much faster, which is relevant 
>>  because the function could be called millions and millions of times. 
> 
> Not necessarily! It depends on how often you have failures. Let's time 
> two examples: looking up something in a dict, where the key is never 
> missing. Copy and paste this code:
> 
> 
> from timeit import Timer
> setup = 'd = {"a": 1, "b": 2}'
> 
> t1 = Timer("""
> if "a" in d: x = d["a"]
> if "b" in d: x = d["b"]
> """, setup)  # LBYL version
> 
> t2 = Timer("""
> try:
>     x = d["a"]
> except KeyError:
>     pass
> try:
>     x = d["b"]
> except KeyError:
>     pass
> """, setup)  # AFTP version
> 
> 
> 
> And here are the results when I run it:
> 
> py> min(t1.repeat())
> 0.3348677200265229
> py> min(t2.repeat())
> 0.23994551179930568
> 
> So in this case, the LBYL example is significantly slower.
> 
> Now let's try it again, only this time the key will be missing half the 
> time:
> 
> 
> t3 = Timer("""
> if "a" in d: x = d["a"]
> if "c" in d: x = d["c"]
> """, setup)  # LBYL version
> 
> t4 = Timer("""
> try:
>     x = d["a"]
> except KeyError:
>     pass
> try:
>     x = d["c"]
> except KeyError:
>     pass
> """, setup)  # AFTP version
> 
> 
> And the results:
> 
> py> min(t3.repeat())
> 0.24967589927837253
> py> min(t4.repeat())
> 0.8413973557762802
> 
> Now the LBYL version is faster.
> 
> 
>>  I have noticed before that try-except is quite an expensive structure 
>>  to initialize (for instance membership testing with 'in' is cheaper 
>>  than try-except-KeyError when getting items from a dictionary)
> 
> Funny you should say that :-)
> 
> Actually, try...except is almost free to *initialise*. Setting up the 
> try block is very, very cheap, it takes hardly any time:
> 
> py> import timeit
> py> timeit.timeit("len([])")  # No try
> 0.19250816199928522
> py> timeit.timeit("""
> ... try:
> ...     len([])
> ... except: pass""")  # With try
> 0.21191818173974752
> 
> That's not a lot of difference: less than 0.02 seconds over timeit's 
> default of one million runs, i.e. less than 0.02μs per run.
> 
> But, *catching* the exception is quite expensive. So the overhead of 
> putting code inside a try block is negligible, virtually free, but the 
> cost of actually catching the exception is quite heavy. So if there are 
> lots and lots of exceptions, LBYL will probably be faster. But if they 
> are rare, then the cost of looking before you leap isn't worth it, you 
> should just leap. The exact crossover point will depend on how costly it 
> is to look first versus the cost of the exception, but as a very rough 
> rule of thumb, I go by:
> 
> if *fewer* than 1 in 10 operations will raise an exception, then use 
> try...except; but if *more* than 1 in 10 operations will raise an 
> exception, and it is safe to do so, then LBYL may be appropriate.
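
To look for the crossover on a given machine, the two timings above can be repeated at several miss rates. This is only a sketch: the function name compare and the choice of ten lookups per loop are assumptions for illustration, and the numbers will differ between Python versions and machines.

```python
from timeit import timeit


def compare(misses_per_10, number=20000):
    """Time 10 dict lookups per loop, misses_per_10 of them with a missing key.

    Returns (lbyl_seconds, aftp_seconds) for `number` loops.
    """
    keys = ["c"] * misses_per_10 + ["a"] * (10 - misses_per_10)
    setup = "d = {'a': 1, 'b': 2}; keys = %r" % (keys,)
    lbyl = """
for k in keys:
    if k in d:
        x = d[k]
"""
    aftp = """
for k in keys:
    try:
        x = d[k]
    except KeyError:
        pass
"""
    return (timeit(lbyl, setup, number=number),
            timeit(aftp, setup, number=number))


if __name__ == "__main__":
    for misses in range(0, 11, 2):
        t_lbyl, t_aftp = compare(misses)
        print("%3d%% missing: LBYL %.4fs  AFTP %.4fs"
              % (misses * 10, t_lbyl, t_aftp))
```

The loop overhead is the same in both versions, so it is the difference between the two columns that matters, not the absolute times.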

Thanks a lot, Steven! This kind of stuff should be in a book. I haven't seen any Python book that discusses this kind of "it depends on your data" consideration. Do you know of any real-life examples where the code is written so that it can switch from AFTP to LBYL and vice versa? It would probably look ugly, and it would add overhead. But if you could switch strategy after processing, say, 1000 records, why not do it if it saves time?
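
Something like the following is what I have in mind. It is only a rough sketch, not taken from any real project: the name lookup_all, the window of 1000 records and the 1-in-10 threshold are all assumptions, and I have not measured whether the bookkeeping eats up the gain.

```python
def lookup_all(d, keys, window=1000, threshold=0.1):
    """Yield d[k] for every k in keys that is present, skipping missing keys.

    Starts with try/except. After every `window` lookups, the strategy for
    the next window is chosen from the miss rate observed in the last one.
    """
    use_try = True
    seen = misses = 0
    for k in keys:
        if use_try:
            # AFTP branch: cheap when misses are rare.
            try:
                value = d[k]
            except KeyError:
                misses += 1
            else:
                yield value
        else:
            # LBYL branch: avoids the cost of catching many exceptions.
            if k in d:
                yield d[k]
            else:
                misses += 1
        seen += 1
        if seen == window:
            # Few misses: use try/except next; many misses: test with "in".
            use_try = misses / seen < threshold
            seen = misses = 0
```

Both branches produce the same values, so the switch only affects speed. The counting itself costs something on every record, which is exactly the overhead I am wondering about.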

regards,
Albert-Jan