Python(2.5) reads an input file FASTER than pure C(Mingw)
Lie.1296 at gmail.com
Wed Apr 30 00:19:32 CEST 2008
On Apr 28, 2:14 am, n00m <n... at narod.ru> wrote:
> Lie wrote:
> > On Apr 27, 6:28�am, n00m <n... at narod.ru> wrote:
> > > No so simple, guys.
> > > E.g., I can't solve (in Python) this:http://www.spoj.pl/problems/INTEST/
> > > Keep getting TLE (time limit exceeded). Any ideas? After all, it's
> > > weekend.
> > > 450. Enormous Input Test
> > > Problem code: INTEST
> > > The purpose of this problem is to verify whether the method you are
> > > using to read input data is sufficiently fast to handle problems
> > > branded with the enormous Input/Output warning. You are expected to be
> > > able to process at least 2.5MB of input data per second at runtime.
> > > Input
> > > The input begins with two positive integers n k (n, k<=107). The next
> > > n lines of input contain one positive integer ti, not greater than
> > > 109, each.
> > > Output
> > > Write a single integer to output, denoting how many integers ti are
> > > divisible by k.
> > > Example
> > > Input:
> > > 7 3
> > > 1
> > > 51
> > > 966369
> > > 7
> > > 9
> > > 999996
> > > 11
> > > Output:
> > > 4
> > The problem is faulty.
> > First, the bottleneck on reading a huge input is the harddisk speed
> > and the string processing, the result of the competition doesn't prove
> > anything but how fast the string processor and the harddisk is.
> > Python's string processing is not a very fast one as it creates a copy
> > of the string every now and then.
> In case you didn't pay attention:
> Python and C++ were tested on the same PC.
In case you didn't pay attention, what I'm mainly concerned about is
running the same algorithm in the same machine in the same programming
language but in different phase of the moon that would give two
significantly different timing.
Harddisk seek time is based on harddisk's fragmentation, MFT's (or
equivalent) size, what "other" program are currently doing with the
harddisk, and by harddisk age. You can't avoid fragmentation, except
if you ghosted the harddisk and reinstall the OS after every test. You
can't prevent the OS to switch to other tasks while the test is
running they might possibly requested for a huge harddisk query. And
I'm sure you can't prevent hardware degradation (although the impact
of this is probably not significant). Heck I can go on to other phases
of the moon like how full and fragmented the memory (RAM) is, how much
swap is being used.
My secondary concern, is how the string processing is done in the
language. In python, every string operation creates a new string, in
C, a string processing is just a mere address passing, for this test,
the security given by python is unnecessary since it is read only. We
know which one is the more superior in speed, but we can't do anything
about this since this is implementation detail of the language.
More information about the Python-list