programming unlimited "categories" in python ?

Stephen shriek at gmx.co.uk
Mon Oct 22 17:18:38 EDT 2001


Scratching my head over what I thought was a simple problem.

I'm developing a catalog application (storing its data in a 
relational database), where catalog entries are categorized 
by one or more categories/subcategories.  It would be nice to
have "unlimited" levels of subcategories and that seemed
simple enough ~ "Use a parent/child" I hear you say ~ and that's
what I did but I've since found it's not very flexible further
down the line. 

Let me demonstrate with some example categories/subcategories
which we place in a category tree, giving each node a unique ID.
The root node is deemed to have node ID zero (0).

Root -- 1. ByLocation --- 3. Africa   --- 4.  Mozambique
                                      --- 6.  SouthAfrica
                      --- 5. Europe   --- 9.  Portugal
     -- 2. BySeverity --- 7. Critical --- 8.  Death
                                      --- 11. Handicap
                      --- 10. Moderate--- 

This structure can be stored in a single table of a database.

Parent_ID    Category_ID     Category_Label
0            1               ByLocation
0            2               BySeverity
1            3               Africa
3            4               Mozambique
1            5               Europe
3            6               SouthAfrica
2            7               Critical
7            8               Death
5            9               Portugal
etc


So far so good.   Cataloged items/illnesses record their
categories in a one-to-many table.  For example, an illness
with categories  "4" and "8" occurs in Mozambique and can
result in death.  

This appears scalable. 

Likewise you can easily select all illnesses occuring in 
Portugal using a JOIN and filtering category ID "9".

So what's the problem ?

The problem arises if one asks "Which illnesses occur
in Africa ?".  First, one has to find all category IDs
for which this is true. This may be easy for the example
above but imagine if the category ("Africa") has a sub-
-category for each and every country then further
subcategorized by major cities. To find all possible
categories, one would have to do a recursive select
finding each subscategory with the appropriate "parent ID".
This does not seem like a very efficient method.

Faced with this, it seems like a more dumbed down solution
is more appropriate, sacrificing scalability for speed.

However, I can't help but feel I've overlooked something
more simple and hence I'm seeking a nudge in the right 
direction. Thanking you in advance.
 
Stephen.



More information about the Python-list mailing list