Steven R. Newcomb srn@techno.com
Mon, 15 Feb 1999 19:05:22 -0600

[Paul Prescod:]

> The question was, how can this sort of
> transformation be done *efficiently*. The
> progenitor of the thread already had a solution
> to the transformation problem. It just was not
> efficient. Losslessly encoding an RDBMS as a
> text file is easy with or without groves. Doing
> it efficiently can be difficult in either case.

"Efficiently" is too vague.  Which kinds of
expenses are we attempting to minimize, and in
what proportions?  In some cases, at least,
applying the grove paradigm is a big win in terms
of overall efficiency.  But the up-front cost of
using the grove paradigm is pretty significant;
you have to do a lot of rigorous thinking, and
that isn't cheap.  It is like the setup charge for
4-color high-speed presswork: it makes no sense to
incur such a charge for a run of 500 copies, and
the break-even point is often over 10,000 copies.
But there is no cheaper way to make 20,000 or more
color copies, and the more copies you make, the
cheaper it gets.  There is no way to make the
setup charge cheaper without increasing the
overall cost per copy, throughout the marketable
life of the printed product.
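
To make the presswork arithmetic concrete, here is a
minimal sketch in Python.  The prices are invented
purely for illustration (they are not real printing
quotes), and the function names are hypothetical.
Costs are in whole cents to keep the arithmetic exact.

```python
def total_cost(setup, per_copy, copies):
    """Total cost of a run: one-time setup charge plus per-copy cost."""
    return setup + per_copy * copies


def break_even_copies(setup_a, per_copy_a, setup_b, per_copy_b):
    """Smallest run length at which method B (high setup, low per-copy
    cost) costs no more than method A (low setup, high per-copy cost)."""
    if per_copy_b >= per_copy_a:
        raise ValueError("method B never catches up with method A")
    if setup_b <= setup_a:
        return 0
    extra_setup = setup_b - setup_a
    saving_per_copy = per_copy_a - per_copy_b
    # Integer ceiling division: round up to the next whole copy.
    return -(-extra_setup // saving_per_copy)


# Assumed, illustrative figures (in cents):
BRUTE_FORCE = (0, 50)       # no setup charge, 50 cents per copy
PRESSWORK = (400_000, 10)   # $4,000 setup charge, 10 cents per copy

threshold = break_even_copies(*BRUTE_FORCE, *PRESSWORK)

for copies in (500, threshold, 20_000):
    print(copies,
          total_cost(*BRUTE_FORCE, copies),
          total_cost(*PRESSWORK, copies))
```

With these assumed figures the two methods cost the same
at 10,000 copies; below that the brute-force method is
cheaper, and above it the gap in favor of the press
widens with every additional copy.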

> > The implementation strategy is to apply the grove
> > paradigm.

> But what algorithms, data structures and SQL
> queries does one use to do this efficiently?

These things all depend on the nature of the
information, the importance of being able to do
things with the information that you might not be
able to anticipate, and the breadth of the known
application space, among many other things.  How
could it possibly be otherwise?

I don't see how there can possibly be one
efficient algorithmic answer for all cases, even
without considering the number and breadth of
applications for which RDBMSs are suboptimal, but
have been used anyway for reasons that had little
to do with efficiency.  In many cases, a quick and
dirty solution is very appropriate; such solutions
are always immediate-need-driven and efficiency
may matter very little.  If you need only two
copies, it's perfectly reasonable to use some sort
of brute force method, while expecting the cost
per copy to be extremely high, compared to the
cost per copy of an optimized run of 20,000.
However, in cases where a comparatively high
upfront "setup charge" for a given information set
is not an obstacle, it's nice to know there's a
technologically supportable paradigm, itself
supported by a cogent philosophy, for developing a
solution.  The specific algorithms and data
structures will suggest themselves when the grove
paradigm is thoughtfully applied, and in every
situation, they will be somewhat different.

(There are, of course, some archetypical solutions
that one should know about and apply, such as the
notions of hyperlinking, addressing, use by
reference, namespace management, activity
policies, popular graphical user interfaces,
widespread display protocols like HTML, etc. etc.
Significant efficiency enhancements can often be
realized by applying existing methodologies,
existing interchange syntaxes, and existing


Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com  ftp.techno.com

voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax    +1 972 994 0087 (at ISOGEN: +1 214 953 3152)

3615 Tanner Lane
Richardson, Texas 75082-2618 USA