[XML-SIG] Developer's Day

Thomas B. Passin tpassin@idsonline.com
Thu, 16 Dec 1999 23:23:55 -0500

David Niergarth wrote:
> Thanks for the pointer to REBOL -- I hadn't heard of it before. In your
> post to the XML-SIG you mentioned
Actually, REBOL looks very interesting. There isn't enough documentation
yet, so the learning curve is on the steep side. At the risk of being
off-topic (and off-Python), I'm including a REBOL script (my only one) that
retrieves a URL and extracts a particular section from the HTML. The
REBOL [...] header section is essentially a comment.
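
REBOL [
 Title: "Zone Forecast Extractor"
 File: %zone.r
 Purpose: {Extract the Virginia Zone Forecast and display it.}
]

; Fetch the whole page as one string
zone: read http://iwin.nws.noaa.gov/iwin/va/zone.html

print ""
; The braces make the next two lines a string literal, so they are not
; executed; if run, they would print the text of the page's <title>
{parse zone [thru <title> copy result to </title>]
print result}

print "Current Fairfax County Zone Forecast"

; Jump to the first mention of Fairfax, then copy the text between
; the next "..." and the following "$$"
fairfax: find zone "Fairfax"
parse fairfax [thru "..." copy forecast to "$$"]

print forecast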


My point is not to urge anyone to switch from Python to REBOL, but to
illustrate how simple it can be. Getting a URL in Python isn't much more
involved if you ignore error handling. What matters is that the REBOL folks
have decided their system will support standard network operations in a
built-in way, just as file handling is built in. Whether Python does it
with standard libraries or with built-in functions and types, basic URL and
XML handling should come included and be easy to use.
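For comparison, here is a rough Python 3 equivalent of the REBOL script, offered as a sketch rather than something tested against the NOAA page (which may no longer be served). Like the REBOL version it ignores error handling when fetching; it assumes the page is Latin-1 text, and it mimics REBOL's find and parse steps with plain str.find. The function names fetch and extract_forecast are hypothetical.

```python
import urllib.request
from typing import Optional

# URL taken from the REBOL script; the page may no longer be served.
ZONE_URL = "http://iwin.nws.noaa.gov/iwin/va/zone.html"


def fetch(url: str) -> str:
    """Read the whole page as text, with no error handling (like REBOL's read)."""
    with urllib.request.urlopen(url) as response:
        # Assumption: Latin-1, which decodes any byte sequence without errors.
        return response.read().decode("latin-1")


def extract_forecast(page: str, place: str = "Fairfax") -> Optional[str]:
    """Mimic: find page place, then parse [thru "..." copy forecast to "$$"].

    Returns None if the place name, the "..." or the "$$" is missing.
    """
    start = page.find(place)            # find zone "Fairfax"
    if start == -1:
        return None
    dots = page.find("...", start)      # thru "..."
    if dots == -1:
        return None
    body_start = dots + len("...")
    end = page.find("$$", body_start)   # copy forecast to "$$"
    if end == -1:
        return None
    return page[body_start:end]


# Usage (needs network access):
#   print("Current Fairfax County Zone Forecast")
#   print(extract_forecast(fetch(ZONE_URL)))
```

Splitting the fetch from the extraction keeps the text-slicing part testable on a saved copy of the page, without touching the network.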

> > Shallow parsing is a bit weird but easy.
> I'm curious what you mean by "shallow parsing". It's a topic I haven't
> seen mentioned on the list except for a posting I made a while back
> pointing out an article by Robert D. Cameron related to "shallow
> parsing" XML with regular expressions
> ( ftp://fas.sfu.ca/pub/cs/TR/1998/CMPT1998-17.html ).

Yes, I got the phrase from your post, and the link was very helpful. Thank
you very much.

> More generally, shallow parsing seems to be
> mentioned in the context of parsing natural languages. I'd be interested
> in understanding what you mean by it or in what domain you've used it.

I take "shallow parsing" to mean getting the elements and their content but
not the nested hierarchical structure.
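To make that definition concrete, here is a minimal Python sketch of a flat scan: one regular expression splits a document into markup and text tokens, and no tree is ever built. The pattern is a deliberately simplified, hypothetical one, not the expression from Cameron's paper, which is designed to cope with the full range of XML markup; this one ignores comments, CDATA sections and ">" inside attribute values.

```python
import re
from typing import List, Tuple

# Simplified, hypothetical pattern: a markup token runs from "<" to the next
# ">", anything else up to the next "<" is a text token. A stray "<" with no
# closing ">" matches neither alternative and is skipped.
TOKEN = re.compile(r"<[^>]*>|[^<]+")


def shallow_tokens(xml_text: str) -> List[Tuple[str, str]]:
    """Return a flat list of (kind, token) pairs in document order; no tree."""
    tokens = []
    for match in TOKEN.finditer(xml_text):
        token = match.group()
        kind = "markup" if token.startswith("<") else "text"
        tokens.append((kind, token))
    return tokens
```

For example, shallow_tokens("<row><cell>1</cell></row>") yields five tokens, <row>, <cell>, 1, </cell> and </row>, in document order, with no record of which element contains which.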
For example, I have a case where a spreadsheet is translated into XML (yes,
really!). Each row becomes an element, and each cell in the row becomes a
child element of the row. In this case each row is independent and on a
separate line in the file. So, line by line, I just extract each named
element using regular expressions, knowing in advance that the cell
elements have no children of their own. Fast and easy: shallow parsing.

Tom Passin