[XML-SIG] Using XPATH as references

Michael McLay mclay@nist.gov
Fri, 11 Aug 2000 12:21:08 -0400 (EDT)

My apologies for a very long message.  I think I'm getting closer to
understanding how to use XML properly for the problem now.  I have
an example of how I think XPATH could be used to reference between
objects near the end of the message.  Skip to "EXAMPLE:" near the end
if you aren't interested in a detailed explanation of the PCB/PCA
problem domain.

tpassin@home.com writes:
 > Michael McLay told us about his PCB layout XML schema -
 > [much interesting description elided]
 > >
 > > I'd appreciate feedback on the approach used to define pointers
 > > between structures.  Is there a standard way that would also be
 > > efficient for a file that may contain millions of these references?
 > > I would be interested in seeing the example rewritten using any
 > > alternative notations, such as XPATH.
 > >
 > Michael, this is very interesting.  Looking at your examples, I didn't think
 > the nesting levels were especially deep.  As for the references, you
 > certainly want to use IDs in elements, as you are.  But what about the colon
 > in the reference IDs?  That's not according to the Namespace Rec, is it?
 > Maybe another character would be better.

Thanks for the feedback.  I did model the usage of element IDs in the
GenCAM format after the Namespaces Rec; however, the Namespaces Rec is
only concerned with element and attribute names.  Namespaces, AFAIK,
are not allowed in element IDs.  At least I haven't found a single
reference or example that would illustrate how I would use namespaces
in an ID.

Unfortunately standard IDs from XML don't work for the information
model defined in GenCAM.  I've never seen an example or reference
to building data namespaces.  My problem domain needs an efficient
mechanism for resolving millions of name references to names nested
inside of "data namespaces" (the "data namespaces" are implemented as
GROUP elements in the GenCAM schema).  I'll give an example and maybe
someone can explain how I could apply IDs to the example.

The GenCAM file can contain multiple printed circuit board
definitions.  Each  BOARD has a unique name which is the ID for the
board.  The board names are unique within the top-level namespace of
the GenCAM file.

Each board includes many COMPONENTS and the components on a board are
uniquely identified by reference designators, the IDs for the
COMPONENT on a board.  A COMPONENT is required for each instance of a
DEVICE placed on a board.  CAD tools traditionally use designators
"R1", "R2"... for resistors; "U1", "U2"... for integrated circuits and
so on.  You can find the numbers next to components on older assemblies
because these numbers were silkscreened onto the surface of boards
when boards were hand assembled.

In GenCAM all COMPONENTS used in all boards are stored in
a COMPONENTS section.  The COMPONENTS section is divided by GROUP tags.
To fully identify a COMPONENT in a GenCAM file you need the name of
the GROUP and the reference designator of the COMPONENT.  So if there
are BOARDs named "bd1" and "bd2" there could be COMPONENTs that are
identified using "bd1:R1", "bd2:R2", "bd1:U1", "bd2:U2", etc.  The
"bd1:R1" COMPONENT may reference a DEVICE named "stdparts:CR1206" while
"bd2:R2" might reference a DEVICE named "digikey:100ohm1W".
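
As an illustration of the GROUP-qualified names described above, here is
a minimal Python sketch of a two-level index (GROUP name, then local ID)
that resolves references such as "bd2:R2" with two dictionary lookups.
The class name GroupIndex and the device_ref key are assumptions made
for this sketch; they are not taken from the GenCAM specification.

```python
from typing import Any, Dict


class GroupIndex:
    """Hypothetical index: GROUP name -> local ID -> object."""

    def __init__(self) -> None:
        self._groups: Dict[str, Dict[str, Any]] = {}

    def add(self, group: str, local_id: str, obj: Any) -> None:
        # Create the GROUP's table on first use, then store the object.
        self._groups.setdefault(group, {})[local_id] = obj

    def resolve(self, ref: str) -> Any:
        # Split "group:local" at the first colon only.
        group, sep, local_id = ref.partition(":")
        if not sep:
            raise ValueError(f"reference {ref!r} lacks a GROUP qualifier")
        return self._groups[group][local_id]


index = GroupIndex()
# "device_ref" is an assumed attribute name for the COMPONENT -> DEVICE link.
index.add("bd1", "R1", {"device_ref": "stdparts:CR1206"})
index.add("bd2", "R2", {"device_ref": "digikey:100ohm1W"})
index.add("digikey", "100ohm1W", {"device_type": "RES"})

component = index.resolve("bd2:R2")
device = index.resolve(component["device_ref"])
```

Each lookup costs two hash probes regardless of file size, which is one
way to keep millions of references cheap once the index is built.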

The current implementation of GenCAM requires that all data be
included in a single file.  In the XML implementation groups
can be imported by URL reference.  For example:

<import href="http://www.acme.com/devices/stdparts_rev1" "stdparts">
<import href="http://www.digikey.com/devices/catalog00524" "digikey">

Perhaps there is a way to do this efficiently with the current XML
capabilities.  Any suggestions?

 > If you are interested in alternative access methods, I'd suggest getting the
 > following book if you don't already have it:
 > Data on the Web
 > Abiteboul, Buneman, and Suciu
 > Morgan Kaufmann Publishers
 > ISBN 1-55860-622-X
 > This book discusses semi-structured data (and XML data), and storing and
 > accessing it.  It also covers dealing with cycles in the data graphs.

I'll look this up.  Thanks for the reference.

 > I think the actual reference ID values would depend on how the structure
 > would be stored.  If they are going to be expanded into existing data
 > structures, the numbers would probably have to be translated into some
 > internal form expected by the existing system, so it hardly matters what
 > their exact format is (except for human reviewability, of course).  With
 > millions of elements possible, maybe a high-powered object database would be
 > a good thing to look at.  In that case, you'd want to see what kind of
 > object IDs the ODBMS wants to use.

The standards committee spent months working out the details of how
names must be used in GenCAM so that the information model fully
describes the data required to manufacture and test PCB
and PCA products.  The committee was constrained by the need to make
it reasonable for CAD and CAM vendors to implement.  The community is
satisfied that the object relationships are captured properly.  The
only problem is understanding how to use the features of XML in
mapping from the current syntax to XML.


In the following example is the value of


the following string?


Here's the example:

    <GROUP id="digikey" >
      <DEVICE id="100ohm1W" device_type="RES" package_ref="pkg1:0402res" >
        <PART enterprise_ref="ANY" enterprise_part_id="LRC1206-01-R068" />
    <GROUP id="cmp1" >
      <COMPONENT id="R1" xy_ref="(10.0,150.0)" >
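
For readers who want to try a path-style reference against this
fragment, here is a minimal Python sketch using the limited XPath subset
in the standard library's xml.etree.ElementTree.  The closing tags, the
GENCAM wrapper element, and the device_ref attribute holding a path are
all assumptions added so the fragment parses; they are not part of the
example as posted.

```python
import xml.etree.ElementTree as ET

# Closed-up copy of the fragment. <GENCAM> and device_ref are hypothetical.
DOC = """
<GENCAM>
  <GROUP id="digikey">
    <DEVICE id="100ohm1W" device_type="RES" package_ref="pkg1:0402res">
      <PART enterprise_ref="ANY" enterprise_part_id="LRC1206-01-R068"/>
    </DEVICE>
  </GROUP>
  <GROUP id="cmp1">
    <COMPONENT id="R1" xy_ref="(10.0,150.0)"
               device_ref="GROUP[@id='digikey']/DEVICE[@id='100ohm1W']"/>
  </GROUP>
</GENCAM>
"""

root = ET.fromstring(DOC)

# Locate the component by GROUP id and reference designator.
component = root.find("GROUP[@id='cmp1']/COMPONENT[@id='R1']")

# Follow the path stored in the (assumed) device_ref attribute.
device = root.find(component.get("device_ref"))
print(device.get("id"), device.get("device_type"))
```

Note that each find() walks the tree, so for millions of references a
prebuilt index would likely be needed on top of this.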

This may solve how to do references within a file.  How would I
write the path if I need to reference an object outside of this file?
It wouldn't be as simple as:


This would be cool, but the name is very long and I have millions of
these references in a file.  How do I shorten the repeated part of
this statement, i.e., 


I'd like to give that a namespace name that could be used just like
element and attribute names are shortened in the meta data.
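
The kind of shortening asked about here can be pictured as a prefix
table, much like namespace declarations: each short name is bound once
to a URL, and every reference carries only the prefix.  The Python
sketch below reuses the two URLs from the import example earlier in
this message; the expand function and the "#" separator between URL
and local name are assumptions, not anything defined by GenCAM or XML.

```python
# Prefix table built from the two <import> examples in this message.
IMPORTS = {
    "stdparts": "http://www.acme.com/devices/stdparts_rev1",
    "digikey": "http://www.digikey.com/devices/catalog00524",
}


def expand(ref: str, imports: dict = IMPORTS) -> str:
    """Expand "prefix:local" to a full reference (hypothetical helper)."""
    prefix, sep, local = ref.partition(":")
    if not sep or prefix not in imports:
        # Not an imported prefix: treat it as a reference within the file.
        return ref
    # Joining with "#" is an assumed convention for this sketch.
    return f"{imports[prefix]}#{local}"


full = expand("digikey:100ohm1W")
local = expand("bd1:R1")
```

The long URL is then stored once per file instead of once per reference.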

 > Since the PCB can be seen as a sort of drawing, I was thinking that using
 > SVG as a base might be interesting.  But if you are trying to map to the
 > existing systems as closely as possible, that would be different.

We are looking at SVG.  Ideally GenCAM would be rewritten to use SVG,
with all GenCAM-specific data placed in the SVG extension
mechanism.  Legacy system issues require we take an intermediate