Data Model:

Anthony alanthonyc at gmail.com
Mon Apr 13 03:29:26 EDT 2009


On Apr 12, 9:36 pm, Aaron Brady <castiro... at gmail.com> wrote:
> On Apr 12, 10:33 pm, Anthony <alantho... at gmail.com> wrote:
>
>
>
> > On Apr 12, 8:10 pm, Aaron Brady <castiro... at gmail.com> wrote:
>
> > > On Apr 12, 9:14 pm, Anthony <alantho... at gmail.com> wrote:
>
> > > > I'm struggling on whether or not to implement GroupItem (below) with
> > > > two separate models, or with one model that has a distinguishing key:
>
> > > > Given:
> > > > class ParentGroup:
> > > >     a group of values represented by class GroupItem
>
> > > > class ChildGroup:
> > > >     a group of values represented by class GroupItem
> > > >     foreign-key to ParentGroup (many Children sum to one Parent)
>
> > > > Option A:
> > > > class GroupItem:
> > > >     foreign-key to ParentGroup
> > > >     foreign-key to ChildGroup
> > > >     GroupItemType in (ParentItem, ChildItem)
> > > >     value
> > > >     value-type
>
> > > > Option B:
> > > > class ParentGroupItem
> > > >     foreign-key to ParentGroup
> > > >     value
> > > >     value-type
>
> > > > class ChildGroupItem
> > > >     foreign-key to ChildGroup
> > > >     value
> > > >     value-type
>
> > > > What are my considerations when making this decision?
>
> > > > Thanks!
>
> > > You want a ChildItem to have membership in two collections:
> > > ParentGroup and ChildGroup.  You also want a ParentItem to have
> > > membership in one collection.  For example:
>
> > > parentA: itemPA1, itemPA2, childA, childB
> > > childA: itemCA1, itemCA2
> > > childB: itemCB1, itemCB2
>
> > > Or, listing by child,
>
> > > itemPA1: parentA
> > > itemPA2: parentA
> > > itemCA1: childA
> > > itemCA2: childA
> > > itemCB1: childB
> > > itemCB2: childB
> > > childA: parentA
> > > childB: parentA
>
> > > Correct so far?
>
> > Thanks for the insightful response.
>
> > Yes, everything you say is correct, with one clarification:  The
> > ChildItem can be a member of ParentGroup OR ChildGroup, but never both
> > at the same time.
>
> I see.  You described a collection class.  Its members are items or
> other collections.  They are never nested more than two levels deep.
>
> However, in your example, you implied a collection class whose
> attributes are aggregates of its members'.  For simplicity, you can
> use methods to compute the aggregate attributes.
>
> class Group:
>   def calculate_total_produced( self ):
>     # Return the sum so callers can actually use the aggregate.
>     return sum( x.total_produced for x in self.members )
>
> If you want to cache them for performance, the children will have to
> notify the parent when one of their attributes changes, which is at
> least a little more complicated.  The class in the simpler structure
> could even derive from 'set' or other built-in collection if you
> want.  Are you interested in the more complicated faster technique?

Yes, in my example, the top level collection class is implicitly the
aggregate of the lower level class.  However, data entry will take
place at the top level, not necessarily at the lower level.  This
means that the lower level values will never drive the top level
value.  Instead, the aggregate of the lower levels will be validated
against the top level.  If there is a discrepancy, then the remainder
will be applied to an additional "Unregistered" instance of the lower
level.

e.g.

Group: Johnson - Total Units Produced 25;  Units Consumed 18;
  Chris Johnson - Units Produced 18; Units Consumed 10;
  Jim Johnson   - Units Produced 3;  Units Consumed 5;

The group totals are the basis for any validations.  In this example,
another entry will be created to account for the discrepancy:

  Unregistered -  Units Produced 4;  Units Consumed 3
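
A rough Python sketch of that remainder calculation, just to pin down the
arithmetic. The names Totals and unregistered_remainder are made up for
illustration and are not part of any existing code:

```python
from dataclasses import dataclass


@dataclass
class Totals:
    """Hypothetical container for one row of produced/consumed figures."""
    name: str
    produced: int
    consumed: int


def unregistered_remainder(group, members):
    """Return the 'Unregistered' entry that makes the members sum to the group.

    Returns None when the members already match the group totals.
    """
    produced = group.produced - sum(m.produced for m in members)
    consumed = group.consumed - sum(m.consumed for m in members)
    if produced == 0 and consumed == 0:
        return None
    return Totals("Unregistered", produced, consumed)


group = Totals("Johnson", produced=25, consumed=18)
members = [
    Totals("Chris Johnson", produced=18, consumed=10),
    Totals("Jim Johnson", produced=3, consumed=5),
]
remainder = unregistered_remainder(group, members)
print(remainder)  # Totals(name='Unregistered', produced=4, consumed=3)
```

The sketch does not decide what should happen when the members add up to more
than the group total; in that case it simply returns negative values.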

As for child notification of the parent, I plan to allow data entry
only on a form that includes both parent and child level values.
Validation of top level to child level aggregates can happen at this
time.  This should remove the need for notification, right?

Am I looking at six of one and half a dozen of the other between options A
and B at this point?  I'm currently leaning towards option B.  Is
there anything I will be losing performance-wise by not choosing
option A?
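
To make the comparison concrete, here is roughly how I picture the two
options as plain Python classes. This is only a sketch: the field names
(item_type, parent_group_id, child_group_id) and the use of integer ids in
place of real foreign keys are assumptions for illustration, not an actual
schema.

```python
from dataclasses import dataclass
from typing import Optional


# Option A: one item class with a discriminator and two nullable references.
@dataclass
class GroupItem:
    item_type: str                      # "ParentItem" or "ChildItem"
    value: int
    value_type: str
    parent_group_id: Optional[int] = None
    child_group_id: Optional[int] = None

    def __post_init__(self):
        # The "one group or the other, never both" rule has to be checked by hand.
        if self.item_type == "ParentItem":
            ok = self.parent_group_id is not None and self.child_group_id is None
        elif self.item_type == "ChildItem":
            ok = self.child_group_id is not None and self.parent_group_id is None
        else:
            ok = False
        if not ok:
            raise ValueError("item_type and group references disagree")


# Option B: two item classes, each with exactly one required reference.
@dataclass
class ParentGroupItem:
    parent_group_id: int
    value: int
    value_type: str


@dataclass
class ChildGroupItem:
    child_group_id: int
    value: int
    value_type: str
```

Written this way, option A needs an explicit check to keep the type key and
the two references consistent, while option B gets that from the structure
itself at the cost of two nearly identical classes.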

Thanks again for conversing with me on this.


