[XML-SIG] Developer's Day
Thomas B. Passin
tpassin@idsonline.com
Thu, 16 Dec 1999 23:23:55 -0500
David Niergarth wrote:
>
> Thanks for the pointer to REBOL -- I hadn't heard of it before. In your
> post to the XML-SIG you mentioned
>
Actually, REBOL looks very interesting. There isn't much documentation
yet, so the learning curve is on the steep side. At the risk of being
off-topic (and off-Python), I'm including a REBOL script - my only one - that
retrieves a URL and extracts a particular section from the HTML. The
REBOL [...] section is essentially a comment.
----------------------------------------------------------------------------
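REBOL [
    Title: "Zone Forecast Extractor"
    File: %zone.r
    Purpose: {Extract the Virginia Zone Forecast and display it.}
]
; Fetch the page; read takes a URL directly.
zone: read http://iwin.nws.noaa.gov/iwin/va/zone.html
print ""
; The braces make the next two lines a string literal, so they are not run.
{parse zone [thru <title> copy result to </title>]
print result}
print "Current Fairfax County Zone Forecast"
; Skip ahead to the first occurrence of "Fairfax".
fairfax: find zone "Fairfax"
; Copy everything after the next "..." up to the next "$$".
parse fairfax [thru "..." copy forecast to "$$"]
print forecast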
----------------------------------------------------------------------------
My point is not to urge anyone to switch from Python to REBOL, but to
illustrate how simple this can be. Getting a URL in Python isn't much more
involved if you ignore error handling. What matters is that the REBOL folks
have decided that their system will support standard network operations in a
built-in way, just as file handling is built in. Whether Python does it
with standard libraries or built-in functions and types, basic URL and XML
handling should come included and be easy to use.
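
For comparison, a rough Python counterpart of the REBOL script might look
like the sketch below. It assumes a current Python with urllib.request in
the standard library. The helper names (extract_between, fairfax_forecast,
fetch, main) are made up for illustration, the marker strings are simply the
ones the REBOL script uses, and error handling is left out.

```python
from urllib.request import urlopen


def extract_between(text, start, end):
    """Return the text after the first `start` up to the next `end`, or None."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)
    if stop == -1:
        return None
    return text[begin:stop]


def fairfax_forecast(page):
    """Jump to "Fairfax", then copy from the next "..." up to the next "$$".

    Note that str.find is case-sensitive.
    """
    pos = page.find("Fairfax")
    if pos == -1:
        return None
    return extract_between(page[pos:], "...", "$$")


def fetch(url):
    """Read a URL and return its body as text (no error handling)."""
    with urlopen(url) as response:
        # Assumption: latin-1 never fails to decode, which is good enough here.
        return response.read().decode("latin-1")


def main():
    # URL taken from the REBOL script; it may no longer be reachable.
    zone = fetch("http://iwin.nws.noaa.gov/iwin/va/zone.html")
    print("Current Fairfax County Zone Forecast")
    print(fairfax_forecast(zone))
```

Calling main() does the fetch and prints the result. The extraction
functions take a plain string, so they can be tried without a network
connection.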
> > Shallow parsing is a bit weird but easy.
>
> I'm curious what you mean by "shallow parsing". It's a topic I haven't
> seen mentioned on the list except for a posting I made a while back (by
> me) pointing out an article by Robert D. Cameron related to "shallow
> parsing" XML with regular expressions ( ftp://fas.sfu.ca/pub/cs
> TR/1998/CMPT1998-17.html ).
Yes, I got the phrase from your post and the link was very helpful - thank
you very much.
> More generally shallow parsing seems to be
> mentioned in the context of parsing natural languages. I'd be interested
> in understanding what you mean by it or in what domain you've used it.
I take "shallow parsing" to mean getting the elements and their content but
not the nested hierarchical structure.
For example, I have a case where a spreadsheet is translated into XML (yes,
really!). Each row becomes an element, and each cell in the row becomes a
child element of the row. In this case each row is independent and sits on
its own line in the file. So, line by line, I just extract each named
element using regular expressions, knowing in advance that each cell element
has no children of its own. Fast and easy - shallow parsing.
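
As a rough illustration of the line-by-line approach, here is a minimal
Python sketch. The element names (row, name, qty) and the sample data are
invented for the example, and the pattern relies on the assumptions stated
above: one row per line and leaf elements only. It further assumes the cell
elements carry no attributes.

```python
import re

# Matches a leaf element such as <qty>3</qty>. The backreference \1 makes
# the closing tag agree with the opening one. Assumes no attributes and
# no nested children inside the cell.
CELL = re.compile(r"<(\w+)>([^<]*)</\1>")


def parse_row(line):
    """Return {element name: text} for the leaf elements found on one line."""
    return {name: text for name, text in CELL.findall(line)}


def parse_rows(lines):
    """Shallow-parse each non-empty line independently."""
    return [parse_row(line) for line in lines if line.strip()]


# Hypothetical sample: each spreadsheet row is one line of XML.
SAMPLE = """\
<row><name>widget</name><qty>3</qty></row>
<row><name>gadget</name><qty>12</qty></row>
"""

rows = parse_rows(SAMPLE.splitlines())
```

The enclosing row element never matches because its content starts with
another tag, so only the cells are picked up. Values come back as plain
strings, and entity references such as &amp; are left undecoded.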
>
Tom Passin