programming unlimited "categories" in python ?

Thomas Jensen thomasNO at SPAM.obscure.dk
Tue Oct 23 01:29:56 CEST 2001

```shriek at gmx.co.uk (Stephen) wrote in

> Scratching my head over what I thought was a simple problem.
>
[snip - categories]
>
> Let me demonstrate with some example categories/subcategories
> which we place in a category tree, giving each node a unique ID.
> The root node is deemed to have node ID zero (0).
>
> Root -- 1. ByLocation --- 3. Africa   --- 4.  Mozambique
>                                       --- 6.  SouthAfrica
>                       --- 5. Europe   --- 9.  Portugal
>      -- 2. BySeverity --- 7. Critical --- 8.  Death
>                                       --- 11. Handicap
>                       --- 10. Moderate---
>
> This structure can be stored in a single table of a database.
>
> Parent_ID    Category_ID     Category_Label
> 0            1               ByLocation
> 0            2               BySeverity
> 1            3               Africa

[snip]

> The problem arises if one asks "Which illnesses occur
> in Africa ?".  First, one has to find all category IDs
> for which this is true. This may be easy for the example
> above but imagine if the category ("Africa") has a sub-
> -category for each and every country then further
> subcategorized by major cities. To find all possible
> categories, one would have to do a recursive select
> finding each subscategory with the appropriate "parent ID".
> This does not seem like a very efficient method.
>
> Faced with this, it seems like a more dumbed down solution
> is more appropriate, sacrificing scalability for speed.
>
> However, I can't help but feel I've overlooked something
> more simple and hence I'm seeking a nudge in the right
> direction. Thanking you in advance.

If You're willing to sacrifice the memory, You could cache the
information.

First extract all the category/parent pairs into a list. (eg. SELECT
parent_id, category_id FROM Categories)
then create and fill a dictionary which contain the parent and all
children:

>>> catlist = [(0, 1), (0, 2), (1, 3), (3, 4), (1, 5), (3, 6), (2, 7),
(7, 8), (5, 9), (2, 10), (7, 11)]
>>>
>>> catdict = {0: (None, [])} # (parent, children)
>>> for catpair in catlist:
...  catdict[catpair[1]] = (catpair[0], [])
...
>>> for catpair in catlist:
...  parent = catdict.get(catpair[0])
...  while parent:
...   parent[1].append(catpair[1]) # append to list of children
...   parent = catdict.get(parent[0]) # parent's parent
...
>>> print catdict
{11: (7, []), 10: (2, []), 9: (5, []), 8: (7, []), 7: (2, [8, 11]), 6:
(3, []), 5: (1, [9]), 4: (3, []), 3: (1, [4, 6]), 2: (
0, [7, 8, 10, 11]), 1: (0, [3, 4, 5, 6, 9]), 0: (None, [1, 2, 3, 4, 5,
6, 7, 8, 9, 10, 11])}

Now you can get all the children of a given category by calling
catdict[category_id].

Ofcourse You'd have to keep the dictionary in sync when updating the
DB.

--
Best Regards
Thomas Jensen

```