unicode() vs. s.decode()
Thorsten Kampe
thorsten at thorstenkampe.de
Sat Aug 8 13:00:11 EDT 2009
* Michael Ströder (Sat, 08 Aug 2009 15:09:23 +0200)
> Thorsten Kampe wrote:
> > * Steven D'Aprano (08 Aug 2009 03:29:43 GMT)
> >> But why assume that the program takes 8 minutes to run? Perhaps it takes
> >> 8 seconds to run, and 6 seconds of that is the decoding. Then halving
> >> that reduces the total runtime from 8 seconds to 5, which is a noticeable
> >> speed increase to the user, and significant if you then run that program
> >> tens of thousands of times.
> >
> > Exactly. That's why it doesn't make sense to benchmark decode()/unicode()
> > in isolation - meaning out of the context of your actual program.
>
> Thorsten, the point is that you're too arrogant to admit that making such a
> general statement as you did, without knowing *anything* about the context,
> is simply false.
I made a general statement in reply to a very general question ("These both
expressions are equivalent but which is faster or should be used for any
reason?"). If you had specific needs or reasons, then you evidently failed
to provide that specific "context" in your question.
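For the record, the two spellings in question do the same thing. A minimal
Python 2 sketch (the sample string and the 'utf-8' encoding are made-up
examples, not taken from your code):

# Python 2: both expressions turn an encoded byte string into a
# unicode object. Sample string and encoding are illustrative only.
s = 'K\xc3\xa4se'           # UTF-8 bytes for u'K\xe4se'

u1 = unicode(s, 'utf-8')    # built-in constructor
u2 = s.decode('utf-8')      # str method, added in Python 2.2

assert u1 == u2             # same result either way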
> >> By all means remind people that premature optimization is a waste of
> >> time, but it's possible to take that attitude too far, to Planet
> >> Bizarro. At the point that you start insisting, and emphasising, that a
> >> three second time difference is "*exactly*" zero,
> >
> > Exactly. Because it was not generated in a real world use case but by
> > running a simple loop one million times. Why one million times? Because
> > by running it "only" one hundred thousand times the difference would
> > have seemed even less relevant.
>
> I ran it one million times to mitigate the influence of other background
> processes on the timing, which is a common technique when benchmarking.
Err, no. That is what "repeat" is for, and it defaults to 3 ("This means
that other processes running on the same computer may interfere with the
timing. The best thing to do when accurate timing is necessary is to
repeat the timing a few times and use the best time. [...] the default
of 3 repetitions is probably enough in most cases.")
Three times, not one million times. You choose one million times (for the
loop count) when the thing you're testing is very fast (like decoding) and
you don't want results in the 0.00000n range - which is what you asked for
and what you got.
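A minimal timeit sketch of that distinction (the payload string is a
made-up example):

import timeit

# number scales a very fast statement into a measurable total;
# repeat reruns the whole measurement so you can take the best run,
# damping interference from other processes.
t = timeit.Timer("s.decode('utf-8')",
                 setup="s = 'K\\xc3\\xa4se' * 100")
print min(t.repeat(repeat=3, number=1000000))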
> > I already gave good advice:
> > 1. don't benchmark
> > 2. don't benchmark until you have an actual performance issue
> > 3. if you do benchmark, then benchmark the whole application and not
> >    single commands
>
> You don't know anything about what I'm doing and what my aim is. So your
> general rules don't apply.
See above. You asked a general question, you got a general answer.
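To illustrate point 3: profile the whole run instead of timing an isolated
expression. A rough sketch using the stdlib profiler (main() here is a
hypothetical stand-in for your actual application):

import cProfile
import pstats

def main():
    # hypothetical stand-in for the real application
    data = ['K\xc3\xa4se'] * 100000
    return [s.decode('utf-8') for s in data]

cProfile.run('main()', 'profile.out')
# see where the time actually goes before optimizing anything
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)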
> > It's really easy: Michael has working code. With that he can easily
> > write two versions - one that uses decode() and one that uses unicode().
>
> Yes, I have working code which was originally written before .decode() was
> added in Python 2.2. Therefore I wondered whether it would improve
> readability to replace unicode() with s.decode(), since the software no
> longer supports Python versions prior to 2.3 anyway. But performance is
> also one aspect, hence my question and my testing.
You haven't done any testing yet. Running decode/unicode one million
times in a loop is not testing. If you don't believe me, then at least
read Alex Martelli's optimization chapter in Python in a Nutshell (the
chapter is available via Google Books).
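Roughly what such a test could look like: time the two variants inside a
real code path rather than as bare expressions (the process_with_*
functions and the input data are hypothetical stand-ins for your code):

import timeit

def process_with_unicode(lines):
    return [unicode(line, 'utf-8') for line in lines]

def process_with_decode(lines):
    return [line.decode('utf-8') for line in lines]

lines = ['K\xc3\xa4se\n'] * 10000  # made-up workload

for name in ('process_with_unicode', 'process_with_decode'):
    t = timeit.Timer('%s(lines)' % name,
                     setup='from __main__ import %s, lines' % name)
    print name, min(t.repeat(repeat=3, number=100))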
Thorsten