[Tutor] speed issue

Fri Aug 13 20:18:38 CEST 2004

On Wed, 11 Aug 2004, Conrad wrote:
>
> def interval():
>         possible = input_file.readline().split()
>         period[0] = (int(possible[0]))
>         period[1] = (int(possible[1]))
>         #Next two lines slow it down ((O^2)?)
> 	for x in range(period[0], period[1] + 1):
>                 intervals[str(x)] = 0
>
> def find_vals():
>         interviewed = int(input_file.readline())
>         for x in range(interviewed):
>                 time_range = map(int, input_file.readline().split())
>                 if time_range[0] < period[0]:
>                         time_range[0] = period[0]
>                 if time_range[1] > period[1]:
>                         time_range[1] = period[1]
> 		#This also slows it way down
>                 for x in range(time_range[0], time_range[1] + 1):
>                         intervals[str(x)] += 1

Hi Conrad,

Ok, I see, so you're keeping an intervals dictionary, and then whenever
someone passes by, for that interval that they stay, you raise the
"people" count.

This seems fine: this approach should give correct results.  I think that
you can avoid doing some string->int conversions.  There are places where
you do:

    intervals[str(x)] = 0

and it should be possible to just say:

    intervals[x] = 0

Integers are perfectly ok as dictionary keys.

On small numbers your program is fine.  But there is one problem with the
approach, and it has to do with the way it behaves when the problem
scales: it starts chugging as soon as the intervals get really large.
According to:

    http://spoj.sphere.pl/?a=problem&pcode=PICAD

you can expect to see intervals between:

    0<=p<=k<=100000000.

So a hideous test case that your program should consider is:

###
0 100000000
5000
0 100000000
0 100000000
0 100000000
0 100000000
0 100000000
0 100000000
....  [you get the idea]
###

You know what your program is going to do in this case, and you know that
it's doing a heck of a lot of work.  It's basically making an interval one
hundred million elements long, and then, element by element, incrementing
each about five thousand times.

The big problem here is that by physically representing each interval as a
dense set, the run-time is proportional to the length of the intervals it
deals with.

If you can just represent your intervals as endpoints, then that may help
speed up the program, but then the program will have to adjust to this
change in interval representation.  So in short: there's no really good
way of making your program any faster except by change of algorithm.

Good luck to you!