Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently

Jane Austine janeaustine50 at hotmail.com
Fri Oct 8 05:19:25 CEST 2004


William Park <opengeometry at yahoo.ca> wrote in message news:<2sgc66F1enc5nU1 at uni-berlin.de>...
> Jane Austine <janeaustine50 at hotmail.com> wrote:
> > Hello.
> > 
> > I have student_names and their scores, which are between 0..100.
> > 
> > I want to retrieve in the sorted order in terms of score, Nth
> > student's name to N+10th student's name along with their scores.
> > (suppose a web page showing top 10 students and there are "next page"
> > link and if you follow the link there shows next top 10 students with
> > "prev" and "next" page links.)
> > 
> > The list is dynamically resized -- elements are added or deleted. The
> > list could go fairly long enough that pulling all the data on memory
> > at once is not a good choice.
> > 
> > What algorithm and data/file structure are most efficient? (the
> > environment is "cgi" based web environment)
> > 
> > Simplest approach would be using btree from bsddb. But there are only
> > "first"/"last" and "next"/"prev" moves. If I have to retrieve 20..29th
> > students, I have got to "first" and then "next" 19 times before I
> > finally get to 20th student, which is a waste of time(and memory).
> > Maybe I could keep the cursor position among sessions if I were using
> > a standalone server. However, it is in CGI environment.
> > 
> > What can I do instead? What data structure allows you to access
> > nth..n+10th data from the sorted list in a constant time?
> 
> This is an example of "premature optimization".  How many student do you
> have anyways?  10,000 students with 100 bytes for name and score comes
> to 1MB.  Just keep the list as textfile, and read a block at time.

Thank you for reminding me Knuth's words. However, there are other
information on each records, and the whole list could grow upto
10MB~100MB.

Another problem is, the list update action could take a long time and
big memory. The list could shrink and expand. Do I have to sort them
all in memory and then update the textfile _everytime_ it's changed?



More information about the Python-list mailing list