[XML-SIG] Inverse relationship design in XML, Python, & Zope

Douglas Bates bates@stat.wisc.edu
15 Jan 2002 11:43:08 -0600


This is a design question but it does relate to XML, Python, and Zope
so I hope it is not too far off-topic for these lists.

I apologize for the length of the question.  It seems that design
questions often require a lot of explanation and background.

 I begin with XML representing, say, articles within issues within
 a volume of a journal.

<jvol>
  <jname>J. of the Amer. Statist. Assoc.</jname>
  <volume>98</volume>
  <year>2001</year>
  <jissue>
    <inum>455</inum>
    <month>September</month>
    <jart>
       <ppf>785</ppf>
       <ppl>793</ppl>
       <atl>Estimation of the ... Microscopy</atl>
       <aug>
          <au>Higdon, Dave</au>
          <au>Yamamoto, Steve</au>
       </aug>
       <kwdg>
          <kwd>Blind deconvolution</kwd>
          <kwd>Magnetic imaging</kwd>
       </kwdg>
    </jart>
    <jart>
       ...
    </jart>
  </jissue>
  <jissue>
    <inum>456</inum>
    ...
  </jissue>
</jvol>

 This information is decomposed and stored, along with other types of
 bibliographic information, in a PostgreSQL database.  We use Zope to
 provide the web interface for queries on this database.  We would
 like to have the result of a query returned as an XML object that
 contains elements, each of which has complete information on a
 title that matches the query, where a title can be an article or a
 book or another type of reference.  For example, a query could map to
 SQL like

  SELECT xmlrep FROM titles WHERE idTitle IN
    (SELECT idTitle FROM authors WHERE aname ~ 'Higdon, *'
     INTERSECT
     SELECT idTitle FROM authors WHERE aname ~ 'Yamamoto, *');

 The way I envision it, the xmlrep corresponding to that jart element
 above would be, say, a pjart element (presentation of a journal
 article).  It must include all the information on the article and the
 issue and the volume and the journal if we are to be able to format
 it.  The entire result from the query might be

<qresult>
  <pjart>
    <jname>J. of the Amer. Statist. Assoc.</jname>
    <volume>98</volume>
    <year>2001</year>
    <inum>455</inum>
    <month>September</month>
    <ppf>785</ppf>
    <ppl>793</ppl>
    <atl>Estimation of the ... Microscopy</atl>
    <aug>
       <au>Higdon, Dave</au>
    </aug>
    <kwdg>
       <kwd>Blind deconvolution</kwd>
       <kwd>Magnetic imaging</kwd>
    </kwdg>
  </pjart>
  <pbook>
    <btl>Title of a book</btl>
    <bpub>Book Publisher, Inc.</bpub>
    <aug>
       <au>Yamamoto, Steve</au>
       <au>Higdon, Dave</au>
    </aug>
    ...
  </pbook>
</qresult>

 This is what I meant by an inverse relationship design.  I must get
 the information on the journal, volume, and issue when I select the
 article.

 The simple way to do this is to store the entire xmlrep for each
 title in the titles table.  It works but it is not exactly elegant
 and it requires that we store a lot of redundant text in that xmlrep
 column.

 The more elegant solution is to construct the information from XML
 fragments or text fragments in several different tables in the
 database.  Our tables are normalized so the journal table, with
 primary key idJournal, contains information on the journal, the issn,
 etc. and our jour_vol_issue table, with primary key idE, contains an
 idJournal, and a volume number, year, and issue number.  SQL is
 designed to reconstruct this kind of information but it does so by
 returning rows in a query result and it would not be easy to make a
 single query return, say, information on articles and information on
 books.

 If we are going to store fragments that will be pasted together to
 create the XML for presenting a journal article or a book or a
 proceedings article or ..., where should we construct the XML and
 what are good tools to use to do this?
 
 We could:

   0) Store all the information on each title in the xmlrep column and
      not do any processing.

   1) Write a database function, say in PL/PGSQL, that constructs this
      XML on the fly from the different tables in the database.

   2) Write DTML methods, Python methods or external Python methods
      for Zope.

   3) Use XSLT or Xpath to reconstruct.

I think know how to do 0, 1, or 2.  I'm not sure about 3 because I
don't yet know enough about XSL.

Your suggestions will be appreciated.

-- 
Douglas Bates                            bates@stat.wisc.edu
Statistics Department                    608/262-2598
University of Wisconsin - Madison        http://www.stat.wisc.edu/~bates/