programming unlimited "categories" in python ?

Grant Beasley gbeasley at tsa.ac.za
Tue Oct 23 07:41:13 EDT 2001


Something I've used for developing threaded discussion boards might help you
here, though I'm not sure whether it's more or less efficient, and it can't
go to great depths without the codes taking up a lot of space.

Assign each category a 2-digit code (if you need more categories, increase
this to 3 or 4 digits). This code need only be unique on its level, i.e.
among the children of the same parent. It does not need to be completely
unique. E.g., ByLocation = '01', BySeverity = '02', Africa = '01',
Europe = '02', Mozambique = '01', South Africa = '02', Critical = '01',
Death = '01', etc.
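
As a rough sketch of the numbering step (not the code I actually run): the
nested dict TREE and the helper assign_level_codes below are made-up names
for illustration, and the codes are simply handed out in insertion order,
which assumes a Python version where dicts keep their order.

```python
# Hypothetical category tree built from the example labels in this thread.
TREE = {
    "ByLocation": {
        "Africa": {"Mozambique": {}, "SouthAfrica": {}},
        "Europe": {"Portugal": {}},
    },
    "BySeverity": {
        "Critical": {"Death": {}, "Handicap": {}},
        "Moderate": {},
    },
}


def assign_level_codes(tree, width=2):
    """Give each node a zero-padded code that is unique among its siblings only.

    Returns a dict mapping label -> (code, coded_children).
    """
    coded = {}
    for position, (label, children) in enumerate(tree.items(), start=1):
        if position >= 10 ** width:
            raise ValueError("too many siblings for a %d-digit code" % width)
        code = str(position).zfill(width)
        coded[label] = (code, assign_level_codes(children, width))
    return coded


codes = assign_level_codes(TREE)
```

Note that Africa and Critical both end up with '01'. That is fine, because
they sit under different parents.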

Then when assigning the category code to your record, concatenate all the
codes together, thus South Africa = '010102' (ByLocation+Africa+SouthAfrica)
and Death = '020101' (BySeverity+Critical+Death). As you can see, your codes
could get pretty long, but the nice part comes in selecting stuff in a
category. To find out whether a particular record is in a particular
category, test whether RecordCategorisationID.startswith(CategoryID). E.g.,
comparing a record with ID '010102' (South Africa) to Africa, '0101', would
be true; comparing it to BySeverity, '02', would be false.
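
A minimal sketch of the concatenation and the prefix test, using the example
codes from this message. LEVEL_CODES, full_code and in_category are
hypothetical names chosen for the illustration, not part of any library.

```python
# Per-level codes keyed by the path from the top level (illustrative data).
LEVEL_CODES = {
    ("ByLocation",): "01",
    ("BySeverity",): "02",
    ("ByLocation", "Africa"): "01",
    ("ByLocation", "Europe"): "02",
    ("ByLocation", "Africa", "Mozambique"): "01",
    ("ByLocation", "Africa", "SouthAfrica"): "02",
    ("BySeverity", "Critical"): "01",
    ("BySeverity", "Critical", "Death"): "01",
}


def full_code(*path):
    """Concatenate the per-level codes along a path from the top level down."""
    return "".join(LEVEL_CODES[path[:depth]] for depth in range(1, len(path) + 1))


def in_category(record_code, category_code):
    """A record belongs to a category if its code starts with the category's code."""
    return record_code.startswith(category_code)


south_africa = full_code("ByLocation", "Africa", "SouthAfrica")  # '010102'
death = full_code("BySeverity", "Critical", "Death")             # '020101'
africa = full_code("ByLocation", "Africa")                       # '0101'
by_severity = full_code("BySeverity")                            # '02'

print(in_category(south_africa, africa))       # True
print(in_category(south_africa, by_severity))  # False
```

If the codes live in a database column, the same test can presumably be done
in SQL with a prefix match such as LIKE '0101%', which avoids the recursive
select altogether.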

I think you get the idea. I have this implemented with my IDs as strings,
but it would likely be more efficient to convert the codes to binary, i.e.
to pack them into numbers. The same principle applies, but you'll be dealing
with numbers. I haven't even considered the maths, but I would imagine it
could be made fairly efficient.
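
One way the numeric version might look, purely as a guess since I haven't
worked it out: pack each level's code into a fixed number of bits and keep
the depth alongside the packed value. BITS_PER_LEVEL, pack and in_category
are assumptions made up for this sketch.

```python
BITS_PER_LEVEL = 8  # assumption: at most 255 siblings per level


def pack(levels):
    """Pack a list of per-level integer codes into (value, depth)."""
    value = 0
    for code in levels:
        if not 0 < code < (1 << BITS_PER_LEVEL):
            raise ValueError("level code out of range: %r" % code)
        value = (value << BITS_PER_LEVEL) | code
    return value, len(levels)


def in_category(record, category):
    """Numeric equivalent of startswith: compare the record's leading levels."""
    rec_value, rec_depth = record
    cat_value, cat_depth = category
    if rec_depth < cat_depth:
        return False
    # Drop the record's extra trailing levels, then compare what is left.
    shift = BITS_PER_LEVEL * (rec_depth - cat_depth)
    return (rec_value >> shift) == cat_value


south_africa = pack([1, 1, 2])  # ByLocation, Africa, SouthAfrica
africa = pack([1, 1])           # ByLocation, Africa
by_severity = pack([2])         # BySeverity

print(in_category(south_africa, africa))       # True
print(in_category(south_africa, by_severity))  # False
```

The depth has to be stored separately here because, unlike a string, the
packed number alone does not say how many levels it contains.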

HTH

Cheers
Grant Beasley


"Stephen" <shriek at gmx.co.uk> wrote in message
news:97ae44ee.0110221318.6eec382d at posting.google.com...
> Scratching my head over what I thought was a simple problem.
>
> I'm developing a catalog application (storing its data in a
> relational database), where catalog entries are categorized
> by one or more categories/subcategories.  It would be nice to
> have "unlimited" levels of subcategories and that seemed
> simple enough ~ "Use a parent/child" I hear you say ~ and that's
> what I did but I've since found it's not very flexible further
> down the line.
>
> Let me demonstrate with some example categories/subcategories
> which we place in a category tree, giving each node a unique ID.
> The root node is deemed to have node ID zero (0).
>
> Root -- 1. ByLocation --- 3. Africa   --- 4.  Mozambique
>                                       --- 6.  SouthAfrica
>                       --- 5. Europe   --- 9.  Portugal
>      -- 2. BySeverity --- 7. Critical --- 8.  Death
>                                       --- 11. Handicap
>                       --- 10. Moderate---
>
> This structure can be stored in a single table of a database.
>
> Parent_ID    Category_ID     Category_Label
> 0            1               ByLocation
> 0            2               BySeverity
> 1            3               Africa
> 3            4               Mozambique
> 1            5               Europe
> 3            6               SouthAfrica
> 2            7               Critical
> 7            8               Death
> 5            9               Portugal
> etc
>
>
> So far so good.   Cataloged items/illnesses record their
> categories in a one-to-many table.  For example, an illness
> with categories  "4" and "8" occurs in Mozambique and can
> result in death.
>
> This appears scalable.
>
> Likewise you can easily select all illnesses occurring in
> Portugal using a JOIN and filtering category ID "9".
>
> So what's the problem ?
>
> The problem arises if one asks "Which illnesses occur
> in Africa ?".  First, one has to find all category IDs
> for which this is true. This may be easy for the example
> above but imagine if the category ("Africa") has a
> subcategory for each and every country then further
> subcategorized by major cities. To find all possible
> categories, one would have to do a recursive select
> finding each subcategory with the appropriate "parent ID".
> This does not seem like a very efficient method.
>
> Faced with this, it seems like a more dumbed down solution
> is more appropriate, sacrificing scalability for speed.
>
> However, I can't help but feel I've overlooked something
> more simple and hence I'm seeking a nudge in the right
> direction. Thanking you in advance.
>
> Stephen.




