[Tutor] simple unicode question

Steven D'Aprano steve at pearwood.info
Tue Aug 26 14:08:33 CEST 2014


On Tue, Aug 26, 2014 at 03:58:17AM -0700, Albert-Jan Roskam wrote:

> Interesting, you follow a "look before you leap" approach here, 
> whereas "they" always say it is easier to "ask forgiveness than 
> permission" in Python. 

Anyone who says it is *ALWAYS* easier is lying or confused :-)

It is *often*, perhaps even *usually*, better to use an AFTP ("ask 
forgiveness than permission", more commonly abbreviated EAFP) approach, 
but not necessarily easier. Sometimes it is more work, but safer or better. 
For instance, it is wrong (i.e. buggy) to use an LBYL approach here:


if os.path.exists(filename):
    f = open(filename)
    process(f.read())

Why is it wrong? Just because the file exists, doesn't mean you can open 
it for reading. And just because it exists at the instant you do the 
test, doesn't mean it will still exist a millisecond later when Python 
tries to open the file. Perhaps another process or user has deleted it 
in the fraction of a second between the two lines.
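
For comparison, here is a minimal sketch of the same task written in the 
AFTP style. The names read_and_process and process are placeholders for 
illustration, and returning None on failure is only an assumption about 
what the caller wants:

```python
def read_and_process(filename, process):
    """Try to read the file and process it; return None if it cannot be read."""
    try:
        # Just try to open it: there is no separate existence check
        # that could go stale before the open happens.
        with open(filename) as f:
            data = f.read()
    except OSError:
        # Covers a missing file, a permissions problem, and so on.
        # (On Python 2, catch IOError instead.)
        return None
    return process(data)
```

Only the open and read are inside the try block, so an exception raised 
by process itself is not silently swallowed.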

But sometimes AFTP is the wrong approach too. Consider a case where you 
have to do a number of things, and only if *all* of them can be done do 
you want to proceed. For example, baking a cake:

- pour the cake mix and milk into a bowl;
- mix;
- pour into a pan;
- put into the oven;
- cook for 30 minutes.

If you don't have an oven, there's no point mixing the cake mix and milk 
together; you'll just waste them. So you need to LBYL: make sure you have 
the cake mix AND the milk AND a bowl AND a pan AND an oven before you 
even start. If even one thing is missing, you don't proceed.
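
The cake example can be sketched in code roughly like this. Everything 
here (REQUIRED, missing_items, bake_cake, and modelling the kitchen as a 
set of strings) is made up for illustration:

```python
# Hypothetical list of everything the recipe needs.
REQUIRED = ("cake mix", "milk", "bowl", "pan", "oven")


def missing_items(kitchen):
    """Return the required items that are not in the kitchen, in order."""
    return [item for item in REQUIRED if item not in kitchen]


def bake_cake(kitchen):
    """Return the steps performed, or raise if anything is missing."""
    # Look before you leap: check *everything* before using anything up.
    missing = missing_items(kitchen)
    if missing:
        raise RuntimeError("cannot bake, missing: " + ", ".join(missing))
    # Only now is it safe to start on the irreversible steps.
    return [
        "pour the cake mix and milk into a bowl",
        "mix",
        "pour into a pan",
        "put into the oven",
        "cook for 30 minutes",
    ]
```

The point is that all the checks happen up front, before the first step 
that cannot be undone.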


> But LBYL is much faster, which is relevant 
> because the function could be called millions and millions of times. 

Not necessarily! It depends on how often you have failures. Let's time 
two versions of the same task: looking up something in a dict, where the 
key is never missing. Copy and paste this code:


from timeit import Timer
setup = 'd = {"a": 1, "b": 2}'

t1 = Timer("""
if "a" in d: x = d["a"]
if "b" in d: x = d["b"]
""", setup)  # LBYL version

t2 = Timer("""
try:
    x = d["a"]
except KeyError:
    pass
try:
    x = d["b"]
except KeyError:
    pass
""", setup)  # AFTP version



And here are the results when I run it:

py> min(t1.repeat())
0.3348677200265229
py> min(t2.repeat())
0.23994551179930568

So in this case, the LBYL version is significantly slower: it takes 
about 40% longer.

Now let's try it again, only this time the key will be missing half the 
time:


t3 = Timer("""
if "a" in d: x = d["a"]
if "c" in d: x = d["c"]
""", setup)  # LBYL version

t4 = Timer("""
try:
    x = d["a"]
except KeyError:
    pass
try:
    x = d["c"]
except KeyError:
    pass
""", setup)  # AFTP version


And the results:

py> min(t3.repeat())
0.24967589927837253
py> min(t4.repeat())
0.8413973557762802

Now the LBYL version is faster, by a factor of more than three.


> I have noticed before that try-except is quite an expensive structure 
> to initialize (for instance membership testing with 'in' is cheaper 
> than try-except-KeyError when getting items from a dictionary)

Funny you should say that :-)

Actually, try...except is almost free to *initialise*. Setting up the 
try block is very, very cheap; it takes hardly any time:

py> import timeit
py> timeit.timeit("len([])")  # No try
0.19250816199928522
py> timeit.timeit("""
... try:
...     len([])
... except: pass""")  # With try
0.21191818173974752

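That's not a lot of difference: less than 0.02μs per call (timeit runs 
the statement a million times by default, so the totals above are in 
seconds per million calls).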

*Catching* an exception, on the other hand, is quite expensive. So 
putting code inside a try block is virtually free, but you pay heavily 
each time an exception is actually raised and caught. If there are 
lots and lots of exceptions, LBYL will probably be faster. But if they 
are rare, then the cost of looking before you leap isn't worth it: you 
should just leap. The exact crossover point will depend on how costly it 
is to look first versus the cost of the exception, but as a very rough 
rule of thumb, I go by:

if *fewer* than 1 in 10 operations will raise an exception, then use 
try...except; but if *more* than 1 in 10 operations will raise an 
exception, and it is safe to do so, then LBYL may be appropriate.
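
If you want to look for the crossover point on your own machine, 
something along these lines may help. It is only a sketch: the helper 
names (lbyl, aftp, make_keys, compare) and the chosen failure rates are 
my own for illustration, and the timings will vary with your hardware 
and Python version:

```python
from timeit import timeit


def lbyl(d, keys):
    """Count successful lookups, checking for the key first."""
    hits = 0
    for k in keys:
        if k in d:
            x = d[k]
            hits += 1
    return hits


def aftp(d, keys):
    """Count successful lookups, catching KeyError on a miss."""
    hits = 0
    for k in keys:
        try:
            x = d[k]
        except KeyError:
            pass
        else:
            hits += 1
    return hits


def make_keys(miss_percent, n=1000):
    """Return n keys, of which miss_percent percent are not in the dict."""
    misses = n * miss_percent // 100
    return ["missing"] * misses + ["a"] * (n - misses)


def compare(miss_percent, number=200):
    """Return (LBYL seconds, AFTP seconds) for the given failure rate."""
    d = {"a": 1, "b": 2}
    keys = make_keys(miss_percent)
    t_lbyl = timeit(lambda: lbyl(d, keys), number=number)
    t_aftp = timeit(lambda: aftp(d, keys), number=number)
    return t_lbyl, t_aftp


if __name__ == "__main__":
    for pct in (0, 1, 10, 50):
        t_lbyl, t_aftp = compare(pct)
        print("%3d%% missing: LBYL %.4f  AFTP %.4f" % (pct, t_lbyl, t_aftp))
```

Both functions do the same work apart from how they handle a missing 
key, so any difference in the timings comes from the look versus the 
exception.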



-- 
Steven

