[Tutor] simple unicode question
Steven D'Aprano
steve at pearwood.info
Tue Aug 26 14:08:33 CEST 2014
On Tue, Aug 26, 2014 at 03:58:17AM -0700, Albert-Jan Roskam wrote:
> Interesting, you follow a "look before you leap" approach here,
> whereas `they` always say it is easier to "ask forgiveness than
> permission" in python.
Anyone who says it is *ALWAYS* easier is lying or confused :-)
It is *often*, perhaps even *usually* better to use an AFTP approach, but
not necessarily easier. Sometimes it is more work, but safer, or better.
For instance, it is wrong (i.e. buggy) to use a LBYL approach here:
import os

if os.path.exists(filename):  # look first...
    f = open(filename)        # ...then leap, but the file may be gone by now
    process(f.read())
Why is it wrong? Just because the file exists doesn't mean you can open
it for reading. And just because it exists at the instant you do the
test doesn't mean it will still exist a millisecond later when Python
tries to open the file. Perhaps another process or user has deleted it
in the fraction of a second between the two lines.
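The AFTP alternative is to just try to open the file and deal with the
failure if it happens. A minimal sketch, assuming Python 3 (where
`FileNotFoundError` and `PermissionError` are both subclasses of
`OSError`); `read_text_or_none` is a hypothetical name, not a standard
function:

```python
def read_text_or_none(filename):
    """Return the contents of filename, or None if it can't be opened.

    read_text_or_none is a hypothetical helper, for illustration only.
    """
    try:
        f = open(filename)
    except OSError:
        # Missing file, no permission, a directory, etc. There is no gap
        # between "check" and "use": the open() call *is* the check.
        return None
    with f:
        return f.read()
```

Only the `open()` call sits inside the `try` block, so an unrelated error
while reading still propagates instead of being silently turned into None.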
But sometimes AFTP is the wrong approach too. Consider a case where you
have to do a number of things, and only if *all* of them can be done do
you want to proceed. For example, baking a cake:
- pour the cake mix and milk into a bowl;
- mix;
- pour into a pan;
- put into the oven;
- cook for 30 minutes.
If you don't have an oven, there's no point mixing the cake mix and milk
together; you'll just waste it. So you need to LBYL: make sure you have
the cake mix AND the milk AND a bowl AND a pan AND an oven before you
even start. If even one thing is missing, you don't proceed.
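The cake example boils down to checking every precondition before taking
any step that can't be undone. A sketch only: the names `REQUIRED` and
`bake_cake`, and representing the kitchen as a set of strings, are
hypothetical choices made for illustration:

```python
REQUIRED = {"cake mix", "milk", "bowl", "pan", "oven"}  # hypothetical checklist

def bake_cake(pantry):
    """Return the steps performed, or raise before doing anything at all."""
    # Look before you leap: check *everything* first, so nothing is
    # wasted if even one item is missing.
    missing = REQUIRED - set(pantry)
    if missing:
        raise ValueError("cannot bake, missing: " + ", ".join(sorted(missing)))
    # Only now is it safe to start the irreversible steps.
    return [
        "pour the cake mix and milk into a bowl",
        "mix",
        "pour into a pan",
        "put into the oven",
        "cook for 30 minutes",
    ]
```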
> But LBYL is much faster, which is relevant
> because the function could be called millions and millions of times.
Not necessarily! It depends on how often you have failures. Let's time
two versions of the same job, looking up keys in a dict, first where the
key is never missing. Copy and paste this code:
from timeit import Timer
setup = 'd = {"a": 1, "b": 2}'
t1 = Timer("""
if "a" in d: x = d["a"]
if "b" in d: x = d["b"]
""", setup)  # LBYL version
t2 = Timer("""
try:
    x = d["a"]
except KeyError:
    pass
try:
    x = d["b"]
except KeyError:
    pass
""", setup)  # AFTP version
And here are the results when I run it:
py> min(t1.repeat())
0.3348677200265229
py> min(t2.repeat())
0.23994551179930568
So in this case, the LBYL example is significantly slower: about 0.33
versus 0.24 seconds per million runs, or roughly 40% more time.
Now let's try it again, only this time the key will be missing half the
time:
t3 = Timer("""
if "a" in d: x = d["a"]
if "c" in d: x = d["c"]
""", setup)  # LBYL version
t4 = Timer("""
try:
    x = d["a"]
except KeyError:
    pass
try:
    x = d["c"]
except KeyError:
    pass
""", setup)  # AFTP version
And the results:
py> min(t3.repeat())
0.24967589927837253
py> min(t4.repeat())
0.8413973557762802
Now the LBYL version is faster, by more than a factor of three.
> I have noticed before that try-except is quite an expensive structure
> to initialize (for instance membership testing with `in` is cheaper
> than try-except-KeyError when getting items from a dictionary)
Funny you should say that :-)
Actually, try...except is almost free to *initialise*. Setting up the
try block is very, very cheap; it takes hardly any time:
py> timeit.timeit("len([])") # No try
0.19250816199928522
py> timeit.timeit("""
... try:
...     len([])
... except: pass""")  # With try
0.21191818173974752
That's not a lot of difference: since each figure is the total for a
million loops, the try block adds less than 0.02μs per loop.
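The same kind of measurement can separate the cost of *entering* a try
block from the cost of actually raising and catching an exception. A
sketch, assuming Python 3.6+ (for f-strings and underscores in numeric
literals). The absolute numbers depend on your machine and Python
version (CPython 3.11 made entering a `try` block essentially free when
nothing is raised), so no particular timings are claimed here:

```python
from timeit import timeit

setup = 'd = {"a": 1}'

# No try block at all.
plain = timeit('x = d["a"]', setup, number=1_000_000)

# Inside a try block, but no exception is ever raised.
guarded = timeit('''
try:
    x = d["a"]
except KeyError:
    pass
''', setup, number=1_000_000)

# The key is missing, so KeyError is raised and caught on every loop.
caught = timeit('''
try:
    x = d["c"]
except KeyError:
    pass
''', setup, number=1_000_000)

print(f"no try:         {plain:.3f}s")
print(f"try, no raise:  {guarded:.3f}s")
print(f"try, caught:    {caught:.3f}s")
```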
But *catching* the exception is quite expensive. So while putting code
inside a try block is negligible, virtually free, every exception that
is actually raised and caught costs quite a lot. If there are lots and
lots of exceptions, LBYL will probably be faster; but if they are rare,
the cost of looking before you leap isn't worth it, and you should just
leap. The exact crossover point will depend on how costly it is to look
first versus the cost of the exception, but as a very rough rule of
thumb, I go by:
if *fewer* than 1 in 10 operations will raise an exception, then use
try...except; but if *more* than 1 in 10 operations will raise an
exception, and it is safe to do so, then LBYL may be appropriate.
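A rough way to find the crossover on your own machine is to sweep the
miss rate and time both approaches. This is a sketch under assumptions:
`compare` is a hypothetical helper, integer keys stand in for real data,
and the results will vary with Python version and hardware, so no
particular numbers are claimed here:

```python
import random
from timeit import timeit

def compare(miss_fraction, n_keys=1000, number=200):
    """Time LBYL vs. try/except lookups when ~miss_fraction of keys are absent.

    Returns (lbyl_seconds, aftp_seconds). compare() is a hypothetical helper.
    """
    rng = random.Random(42)  # fixed seed so runs are repeatable
    d = {i: i for i in range(n_keys)}
    # Keys >= n_keys are never in d, so each one causes a miss.
    keys = [rng.randrange(n_keys) if rng.random() >= miss_fraction else n_keys + i
            for i in range(n_keys)]

    def lbyl():
        for k in keys:
            if k in d:
                _ = d[k]

    def aftp():
        for k in keys:
            try:
                _ = d[k]
            except KeyError:
                pass

    # timeit() accepts a callable as well as a string statement.
    return timeit(lbyl, number=number), timeit(aftp, number=number)

for frac in (0.0, 0.01, 0.1, 0.5):
    lbyl_t, aftp_t = compare(frac)
    print(f"{frac:>5.0%} missing: LBYL {lbyl_t:.4f}s, try/except {aftp_t:.4f}s")
```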
--
Steven