unicode() vs. s.decode()

garabik-news-2005-05 at kassiopeia.juls.savba.sk
Sat Aug 8 18:16:49 CEST 2009

Thorsten Kampe <thorsten at thorstenkampe.de> wrote:
> * garabik-news-2005-05 at kassiopeia.juls.savba.sk (Fri, 7 Aug 2009 
> 17:41:38 +0000 (UTC))
>> Thorsten Kampe <thorsten at thorstenkampe.de> wrote:
>> > If you increase the number of loops to one million or one billion or
>> > whatever even the slightest completely negligible difference will
>> > occur. The same thing will happen if you just increase the corpus of
>> > words to a million, trillion or whatever. The performance
>> > implications of that are exactly none.
>> I am not sure I understood that. Must be my English :-)
> I guess you understand me very well and I understand you very well. If 

I did not. Really. But then it has been explained to me, so I think I do
now :-)

> the performance gain you want to prove doesn't show with 600,000 words, 
> you test again with 18,000,000 words and if that is not impressive 
> enough with 600,000,000 words. Great.

18e6 words is what I am working with _now_. Most of the data is already
collected; there are going to be a few more books, but that's all. And the
optimization I was talking about means going home from work one hour
later or earlier. Quite noticeable for me.
600e6 words is the main corpus. The data is already there, waiting to be
processed at some point, once we finish our current project. That is
real life, not a thought experiment.

> Or if a million repetitions of your "improved" code don't show the 
> expected "performance advantage" you run it a billion times. Even 
> greater. Keep on optimizing.

No, we do not have one billion words (yet - I assume you are talking
about American billion - if you are talking about European billion, we
would be masters of the world with a billion word corpus!).
However, that might change once we start collecting www data (which is a
separate project, to be started in a year or two).
Then we'll do some more optimization, because the time differences will
be more noticeable. Easy as that.
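
To make the arithmetic concrete, here is a rough sketch of how a tiny
per-call difference scales with corpus size. It is written for Python 3,
where str(s, 'utf-8') stands in for Python 2's unicode(s, 'utf-8'); the
sample word, the helper names per_call_seconds and extrapolate, and the
repetition counts are all made up for illustration, and the measured
numbers will differ from machine to machine.

```python
import timeit


def per_call_seconds(stmt, setup, number=100_000, repeat=5):
    """Best-of-`repeat` wall time for one execution of `stmt`, in seconds."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number


def extrapolate(per_call_diff, n_words):
    """Total seconds gained or lost if the call runs once per word."""
    return per_call_diff * n_words


# Hypothetical sample word, encoded once outside the timed statement.
setup = "s = 'slovo'.encode('utf-8')"

t_ctor = per_call_seconds("str(s, 'utf-8')", setup)    # constructor form
t_meth = per_call_seconds("s.decode('utf-8')", setup)  # method form
diff = abs(t_ctor - t_meth)

# Corpus sizes mentioned in this thread.
for n_words in (600_000, 18_000_000, 600_000_000):
    total = extrapolate(diff, n_words)
    print("%11d words: about %.2f s difference" % (n_words, total))
```

This only counts one decode per word. A real pipeline calls such
functions many times per word, which is where the hour comes from.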

| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!
