programming unlimited "categories" in python ?
Terry Reedy
tjreedy at home.com
Mon Oct 22 19:07:47 EDT 2001
"Stephen" <shriek at gmx.co.uk> wrote in message
news:97ae44ee.0110221318.6eec382d at posting.google.com...
> Scratching my head over what I thought was a simple problem.
>
> I'm developing a catalog application (storing its data in a
> relational database), where catalog entries are categorized
> by one or more categories/subcategories. It would be nice to
> have "unlimited" levels of subcategories and that seemed
> simple enough ~ "Use a parent/child" I hear you say ~ and that's
> what I did but I've since found it's not very flexible further
> down the line.
This is a database rather than Python question, but since you asked,
I'll answer from my database experience.
> Let me demonstrate with some example categories/subcategories
> which we place in a category tree, giving each node a unique ID.
> The root node is deemed to have node ID zero (0).
>
> Root -- 1. ByLocation --- 3. Africa --- 4. Mozambique
>                                     --- 6. SouthAfrica
>                       --- 5. Europe --- 9. Portugal
>      -- 2. BySeverity --- 7. Critical --- 8. Death
>                                       --- 11. Handicap
>                       --- 10. Moderate---
Location and Severity are two separate variables that should be coded
completely separately from each other. Severity, if not numeric, is
usually an ordered series of categories, such as 1=mild, 2=moderate,
3=handicap, 4=critical, 5=death. Coding three-level Location is more
complicated. Either
1. Split it into three variables - continent, country, city, with
continent required and country and city optional -or-
2. Encode the levels into 'one' entry (which you split into pieces as
needed). You could try a separator-delimited entry such as
continent-country-city (like Internet addresses, the Windows registry, etc.)
or a positional-subfield encoding (like many product IDs and library book
classification schemes, as in either the Dewey Decimal or Library of
Congress systems). For instance, 0 = Africa, 10 = Mozambique, 11 = (capital
city), 20 = Angola, 1000 = Europe, etc. In other words, thousands =
continent, hundreds/tens = country, ones position = city.
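A minimal Python sketch of option 2, assuming the digit layout from the example above (thousands = continent, hundreds/tens = country, ones = city, with 0 meaning "not specified"). The function names are hypothetical and only illustrate the idea; in_subtree() shows the separator-delimited variant.

```python
# Hypothetical helpers, not from any library.
# Layout assumed from the example: thousands = continent,
# hundreds/tens = country, ones = city; 0 means "not specified".

def encode_location(continent, country=0, city=0):
    """Pack the three levels into one integer code."""
    if not (0 <= country <= 99 and 0 <= city <= 9):
        raise ValueError("country must be 0-99 and city 0-9")
    return continent * 1000 + country * 10 + city

def decode_location(code):
    """Split a code back into (continent, country, city)."""
    continent, rest = divmod(code, 1000)
    country, city = divmod(rest, 10)
    return continent, country, city

def continent_range(continent):
    """Half-open range [low, high) of all codes in one continent."""
    return continent * 1000, (continent + 1) * 1000

def in_subtree(path, prefix, sep="-"):
    """Separator-delimited variant: does path lie at or under prefix?"""
    parts, want = path.split(sep), prefix.split(sep)
    return parts[:len(want)] == want

# Africa = continent 0, Mozambique = country 1, its capital = city 1
print(encode_location(0, 1, 1))                   # 11
print(decode_location(1000))                      # (1, 0, 0), i.e. Europe
print(in_subtree("Africa-Mozambique", "Africa"))  # True
```

With either encoding, "everything in Africa" becomes a single range test (low <= code < high) or a prefix match, with no recursion over parent IDs.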
> This structure can be stored in a single table of a database.
>
> Parent_ID  Category_ID  Category_Label
>     0           1       ByLocation
>     0           2       BySeverity
>     1           3       Africa
>     3           4       Mozambique
>     1           5       Europe
>     3           6       SouthAfrica
>     2           7       Critical
>     7           8       Death
>     5           9       Portugal
> etc
Mixing two variables like this is a mistake.
> So far so good. Cataloged items/illnesses record their
> categories in a one-to-many table. For example, an illness
> with categories "4" and "8" occurs in Mozambique and can
> result in death.
>
> This appears scalable.
>
> Likewise you can easily select all illnesses occurring in
> Portugal using a JOIN and filtering category ID "9".
>
> So what's the problem ?
>
> The problem arises if one asks "Which illnesses occur
> in Africa ?". First, one has to find all category IDs
> for which this is true. This may be easy for the example
> above but imagine if the category ("Africa") has a
> subcategory for each and every country, then further
> subcategorized by major cities. To find all possible
> categories, one would have to do a recursive select
> finding each subcategory with the appropriate "parent ID".
> This does not seem like a very efficient method.
If location were coded with continent as either a separate variable/field
or a separate subfield, this query would be trivial, as it should be.
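As a rough illustration, here is a sketch using Python's sqlite3 module with continent, country, and city as separate columns and severity as an ordered number. The table name, column names, and sample rows (illness_a, etc.) are made up for the example and are not taken from the original application.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE occurrence (
        illness   TEXT NOT NULL,
        continent TEXT NOT NULL,     -- required
        country   TEXT,              -- optional
        city      TEXT,              -- optional
        severity  INTEGER NOT NULL   -- 1=mild ... 5=death
    )
""")
conn.executemany(
    "INSERT INTO occurrence VALUES (?, ?, ?, ?, ?)",
    [
        ("illness_a", "Africa", "Mozambique", None, 5),
        ("illness_b", "Africa", "SouthAfrica", None, 2),
        ("illness_c", "Europe", "Portugal", None, 4),
        ("illness_a", "Europe", "Portugal", None, 5),
    ],
)

def illnesses_in(conn, continent, min_severity=1):
    """Illnesses recorded anywhere in a continent, at or above a severity."""
    rows = conn.execute(
        "SELECT DISTINCT illness FROM occurrence "
        "WHERE continent = ? AND severity >= ? ORDER BY illness",
        (continent, min_severity),
    )
    return [name for (name,) in rows]

print(illnesses_in(conn, "Africa"))                  # ['illness_a', 'illness_b']
print(illnesses_in(conn, "Africa", min_severity=4))  # ['illness_a']
```

The "Africa" question is one WHERE clause, and because severity is stored as an ordered number, "critical or worse" is a simple comparison rather than another tree walk.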
Terry J. Reedy