unicode() vs. s.decode()

Sat Aug 8 09:09:23 EDT 2009

Thorsten Kampe wrote:
> * Steven D'Aprano (08 Aug 2009 03:29:43 GMT)
>> But why assume that the program takes 8 minutes to run? Perhaps it takes 
>> 8 seconds to run, and 6 seconds of that is the decoding. Then halving 
>> that reduces the total runtime from 8 seconds to 5, which is a noticeable 
>> speed increase to the user, and significant if you then run that program 
>> tens of thousands of times.
> 
> Exactly. That's why it doesn't make sense to benchmark decode()/unicode()
> in isolation - meaning out of the context of your actual program.

Thorsten, the point is that you're too arrogant to admit that making a general
statement like yours without knowing *anything* about the context is simply
wrong. So this is not a technical matter. It's mainly an issue with your
attitude.

>> By all means, remind people that premature optimization is a 
>> waste of time, but it's possible to take that attitude too far to Planet 
>> Bizarro. At the point that you start insisting, and emphasising, that a 
>> three second time difference is "*exactly*" zero,
> 
> Exactly. Because it was not generated in a real world use case but by 
> running a simple loop one million times. Why one million times? Because 
> by running it "only" one hundred thousand times the difference would 
> have seemed even less relevant.

I ran it one million times to reduce the influence of other background
processes on the timing, which is a common technique when benchmarking. I
was mainly interested in the percentage difference, which is indeed
significant. The absolute times also depend strongly on the hardware the
software is running on, so your comments about the absolute times are complete
nonsense. I want this software to run with acceptable response times even on
hardware much slower than my development machine.
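
A micro-benchmark of this kind can be sketched roughly as follows. This is
not the code from the original test: the sample string, the function names
and the repeat counts are made up for illustration, and the fallback to str()
is an assumption added only so the sketch also runs on Python 3, where
unicode() no longer exists. Taking the minimum of several repeats is one
common way to reduce noise from background processes.

```python
import sys
import timeit

# Python 2 has unicode(); on Python 3 the equivalent constructor is str().
# (Assumption for illustration: the thread itself is about Python 2 only.)
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821

# Hypothetical sample input: a short UTF-8 encoded byte string.
SAMPLE = u"Michael Str\xf6der".encode("utf-8")


def via_constructor(data=SAMPLE):
    """Decode by calling the text constructor: unicode(s, 'utf-8')."""
    return text_type(data, "utf-8")


def via_method(data=SAMPLE):
    """Decode by calling the method: s.decode('utf-8')."""
    return data.decode("utf-8")


def compare(number=100000, repeat=5):
    """Time both variants and return (t_constructor, t_method, ratio).

    The minimum over several repeats is used because it is the run
    least disturbed by other processes.
    """
    t_ctor = min(timeit.repeat(via_constructor, number=number, repeat=repeat))
    t_meth = min(timeit.repeat(via_method, number=number, repeat=repeat))
    return t_ctor, t_meth, t_meth / t_ctor


if __name__ == "__main__":
    t_ctor, t_meth, ratio = compare()
    print("constructor: %.4fs  decode(): %.4fs  ratio: %.2f"
          % (t_ctor, t_meth, ratio))
```

The ratio is the interesting figure here. The absolute times will differ from
machine to machine and between Python versions.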

> I already gave good advice:
> 1. don't benchmark
> 2. don't benchmark until you have an actual performance issue
> 3. if you benchmark, then the whole application and not single commands

You don't know anything about what I'm doing and what my aim is. So your
general rules don't apply.

> It's really easy: Michael has working code. With that he can easily 
> write two versions - one that uses decode() and one that uses unicode().

Yes, I have working code which was originally written before .decode() was
added in Python 2.2. Therefore I wondered whether it would improve
readability to replace unicode() with s.decode(), since the software no longer
supports Python versions prior to 2.3 anyway. But performance is also a
consideration, hence my question and testing.

Ciao, Michael.