Data Model:

Aaron Brady castironpi at gmail.com
Mon Apr 13 04:19:13 EDT 2009


On Apr 13, 2:29 am, Anthony <alantho... at gmail.com> wrote:
> On Apr 12, 9:36 pm, Aaron Brady <castiro... at gmail.com> wrote:
>
>
>
> > On Apr 12, 10:33 pm, Anthony <alantho... at gmail.com> wrote:
>
> > > On Apr 12, 8:10 pm, Aaron Brady <castiro... at gmail.com> wrote:
>
> > > > On Apr 12, 9:14 pm, Anthony <alantho... at gmail.com> wrote:
>
> > > > > I'm struggling on whether or not to implement GroupItem (below) with
> > > > > two separate models, or with one model that has a distinguishing key:
>
> > > > > Given:
> > > > > class ParentGroup:
> > > > >     a group of values represented by class GroupItem
>
> > > > > class ChildGroup:
> > > > >     a group of values represented by class GroupItem
> > > > >     foreign-key to ParentGroup (many Children sum to one Parent)
>
> > > > > Option A:
> > > > > class GroupItem:
> > > > >     foreign-key to ParentGroup
> > > > >     foreign-key to ChildGroup
> > > > >     GroupItemType in (ParentItem, ChildItem)
> > > > >     value
> > > > >     value-type
>
> > > > > Option B:
> > > > > class ParentGroupItem
> > > > >     foreign-key to ParentGroup
> > > > >     value
> > > > >     value-type
>
> > > > > class ChildGroupItem
> > > > >     foreign-key to ChildGroup
> > > > >     value
> > > > >     value-type
>
> > > > > What are my considerations when making this decision?
>
> > > > > Thanks!
>
> > > > You want a ChildItem to have membership in two collections:
> > > > ParentGroup and ChildGroup.  You also want a ParentItem to have
> > > > membership in one collection.  For example:
>
> > > > parentA: itemPA1, itemPA2, childA, childB
> > > > childA: itemCA1, itemCA2
> > > > childB: itemCB1, itemCB2
>
> > > > Or, listing by child,
>
> > > > itemPA1: parentA
> > > > itemPA2: parentA
> > > > itemCA1: childA
> > > > itemCA2: childA
> > > > itemCB1: childB
> > > > itemCB2: childB
> > > > childA: parentA
> > > > childB: parentA
>
> > > > Correct so far?
>
> > > Thanks for the insightful response.
>
> > > Yes, everything you say is correct, with one clarification:  The
> > > ChildItem can be a member of ParentGroup OR ChildGroup, but never both
> > > at the same time.
>
> > I see.  You described a collection class.  Its members are items or
> > other collections.  They are never nested more than two levels deep.
>
> > However, in your example, you implied a collection class whose
> > attributes are aggregates of its members'.  For simplicity, you can
> > use methods to compute the aggregate attributes.
>
> > class Group:
> >   def calculate_total_produced( self ):
> >     return sum( x.total_produced for x in self.members )
>
> > If you want to cache them for performance, the children will have to
> > notify the parent when one of their attributes changes, which is at
> > least a little more complicated.  The class in the simpler structure
> > could even derive from 'set' or other built-in collection if you
> > want.  Are you interested in the more complicated faster technique?
>
> Yes, in my example, the top level collection class is implicitly the
> aggregate of the lower level class.  However, data entry will take
> place at the top level, not necessarily at the lower level.  This
> means that the lower level values will never drive the top level
> value.  Instead, the aggregate of the lower levels will be validated
> against the top level.  If there is a discrepancy, then the remainder
> will be applied to an additional "Unregistered" instance of the lower
> level.
>
> e.g.
>
> Group: Johnson - Total Units Produced 25;  Units Consumed 18;
>   Chris Johnson - Units Produced 18; Units Consumed 10;
>   Jim Johnson   - Units Produced 3;  Units Consumed 5;
>
> The group totals are the basis for any validations.  In this example,
> another entry will be created to account for the discrepancy:
>
>   Unregistered -  Units Produced 4;  Units Consumed 3
>
> As far as child notification of the parent, I plan to only allow data
> entry on a form that includes both parent and child level values.
> Validation of top level to child level aggregates can happen at this
> time.  This should remove the need for notification, right?
>
> Am I looking at 6 of one and half dozen of the other between options A
> and B at this point?  I'm currently leaning towards option B.  Is
> there anything I will be losing performance-wise by not choosing
> option A?
>
> Thanks again for conversing with me on this.

It sounds like you want your total to also be a degenerate instance of
the Item class.  It has a name and two numeric attributes.  I once
learned that it "meant the right thing" to merely derive the Total
class from the Item class, and raise an exception if anything tries
to access non-aggregate values on it.
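
A minimal sketch of that idea, under my own assumptions: the attribute
names and the owner_details method are hypothetical, made up only to
stand in for "non-aggregate" behavior.

```python
class Item:
    """A named entry with two numeric attributes (names are assumptions)."""

    def __init__(self, name, produced, consumed):
        self.name = name
        self.produced = produced
        self.consumed = consumed

    def owner_details(self):
        # Hypothetical behavior that only makes sense for an individual item.
        return "details for " + self.name


class Total(Item):
    """A degenerate Item: same fields, but individual-only access is refused."""

    def owner_details(self):
        raise TypeError("a Total has no individual owner details")


johnson = Total("Johnson", 25, 18)
chris = Item("Chris Johnson", 18, 10)
```

Code that only reads name, produced and consumed can treat johnson and
chris alike; anything that asks johnson for individual details fails
loudly instead of returning something meaningless.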

Option A wastes a little space, which you probably needn't worry
about.  In C, you can create a 'union' type that holds the parent
foreign-key -or- the child foreign-key, but not both, and use the
"GroupItemType" flag to signal which.  The space required is whichever
is larger. </trivia>  In Python, you can just use one object, and
treat it differently depending on the 'GroupItemType' flag.

From what I understand so far, it's more consistent with
object-oriented ideals to write a separate class for separate behavior.
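
Roughly, the two options could look like this in Python.  This is only a
sketch: the groups are plain strings here for brevity (an assumption; in
a real model they would be references to group objects or rows), and the
group() method is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


# Option A: one class, with a flag saying which foreign key is meaningful.
@dataclass
class GroupItem:
    item_type: str                      # "parent" or "child"
    value: float
    value_type: str
    parent_group: Optional[str] = None  # unused when item_type == "child"
    child_group: Optional[str] = None   # unused when item_type == "parent"

    def group(self):
        # Every consumer has to branch on the flag.
        if self.item_type == "parent":
            return self.parent_group
        return self.child_group


# Option B: one class per kind, so there is no flag and no unused field.
@dataclass
class ParentGroupItem:
    parent_group: str
    value: float
    value_type: str

    def group(self):
        return self.parent_group


@dataclass
class ChildGroupItem:
    child_group: str
    value: float
    value_type: str

    def group(self):
        return self.child_group
```

With A, nothing stops an item from carrying both keys or a flag that
disagrees with them; with B, that state cannot be expressed at all.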

If I were designing a relational structure, i.e. tables in a relational
database, I would probably create a separate table for the totals,
since they don't have the same or equivalent (or isomorphic) semantics
as the items.  Does that help?
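
As a rough illustration with the standard sqlite3 module: the table and
column names below are my own assumptions, not anything from your model,
and the numbers are the Johnson example from earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_total (
    group_name TEXT PRIMARY KEY,
    produced   INTEGER NOT NULL,
    consumed   INTEGER NOT NULL
);
CREATE TABLE group_item (
    id         INTEGER PRIMARY KEY,
    group_name TEXT NOT NULL REFERENCES group_total(group_name),
    member     TEXT,
    produced   INTEGER NOT NULL,
    consumed   INTEGER NOT NULL
);
""")

# The entered totals live in their own table.
conn.execute("INSERT INTO group_total VALUES ('Johnson', 25, 18)")
conn.executemany(
    "INSERT INTO group_item (group_name, member, produced, consumed) "
    "VALUES (?, ?, ?, ?)",
    [("Johnson", "Chris Johnson", 18, 10), ("Johnson", "Jim Johnson", 3, 5)],
)

# Compare each entered total with the sum of its items.
row = conn.execute("""
    SELECT t.produced, SUM(i.produced), t.consumed, SUM(i.consumed)
    FROM group_total t JOIN group_item i ON i.group_name = t.group_name
    WHERE t.group_name = 'Johnson'
    GROUP BY t.group_name
""").fetchone()
```

Here row pairs each entered total with the corresponding item sum, which
is the comparison your validation step needs.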

On the other hand, you could just have a separate Unregistered class
for unregistered entries, or just leave the name blank or set to None
on a normal item instance.
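
For instance, using a normal item with the name set to None.  This is a
sketch under assumed names (Item, unregistered_remainder), with the
figures from your Johnson example:

```python
class Item:
    def __init__(self, name, produced, consumed):
        self.name = name  # None marks the "Unregistered" entry (an assumption)
        self.produced = produced
        self.consumed = consumed


def unregistered_remainder(total_produced, total_consumed, items):
    """Return an Item covering the gap between the entered totals and
    the items, or None if they already agree."""
    produced = total_produced - sum(i.produced for i in items)
    consumed = total_consumed - sum(i.consumed for i in items)
    if produced == 0 and consumed == 0:
        return None
    return Item(None, produced, consumed)


members = [Item("Chris Johnson", 18, 10), Item("Jim Johnson", 3, 5)]
extra = unregistered_remainder(25, 18, members)
# extra.produced is 25 - (18 + 3) and extra.consumed is 18 - (10 + 5)
```

A negative remainder would mean the individuals add up to more than the
group total; how to handle that case is up to you, and the sketch does
not try to decide it.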

It doesn't sound like you need to modify both totals and individuals
at the same time, but you do need to modify both at different times.
Some incoming data is 'total' data, and some is individual.

> As far as child notification of the parent, I plan to only allow data
> entry on a form that includes both parent and child level values.
> Validation of top level to child level aggregates can happen at this
> time.  This should remove the need for notification, right?

Yes.  If you won't be changing child members, or will be doing so from
a uniform subset of code, then they won't need to notify their
parents.  You can do that from the input section.

On child entry:
  create child
  add child
  update parent

This is riskier:
On multiple child entries:
  for each entry:
    create child
    add child
  update parent

If this sequence occurs in more than one place, consider wrapping it in
a function, as usual, to guarantee that every call site does the same
thing.
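
Something along these lines, as a sketch.  I am assuming that "update
parent" means refreshing a stored aggregate, and Child, Parent and
add_children are hypothetical names.

```python
class Child:
    def __init__(self, name, produced):
        self.name = name
        self.produced = produced


class Parent:
    def __init__(self):
        self.children = []
        self.cached_produced = 0

    def update(self):
        # Recompute the stored aggregate from the current children.
        self.cached_produced = sum(c.produced for c in self.children)


def add_children(parent, entries):
    """Create and add every child, then refresh the parent exactly once."""
    try:
        for name, produced in entries:
            parent.children.append(Child(name, produced))
    finally:
        # Runs even if an entry is malformed, so the stored aggregate
        # always matches whatever children were actually added.
        parent.update()
```

The try/finally is what takes the risk out of the batch version: an
error halfway through the loop still leaves the parent consistent with
the children that made it in.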

> I'm currently leaning towards option B.  Is
> there anything I will be losing performance-wise by not choosing
> option A?

If you have to recalculate the totals every time you need them, that
will be slower than just accessing a data field, but the alternative
will take more space.  It is another 'time-space' trade-off.
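
The two ends of that trade-off, sketched with hypothetical class names
and with members reduced to bare numbers for brevity:

```python
class ComputedGroup:
    """No extra storage; every read of the total walks all members."""

    def __init__(self):
        self.members = []

    def add(self, produced):
        self.members.append(produced)

    @property
    def total_produced(self):
        return sum(self.members)


class StoredGroup:
    """One extra field; reads are a plain attribute access, but every
    change to the members has to keep the field in sync."""

    def __init__(self):
        self.members = []
        self.total_produced = 0

    def add(self, produced):
        self.members.append(produced)
        self.total_produced += produced
```

Both give the same answer as long as all changes go through add(); the
stored version goes wrong as soon as something modifies members directly.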


