
[M.-A. Lemburg]
What is QIO ?
See DejaNews (I don't save URLs). "Quick" line-oriented text input adapted from INN. Someone rewrote that as a Python extension module.
Looks nice indeed, but how does executable code fit into that definition ?
See the URL above I didn't save <wink>. PARSE's "pattern" argument is a block. Blocks can be (& often are) nested. Whether any given block is code or data is all the same to REBOL, so passing nested code blocks in PARSE's pattern argument is easy. Because blocks are lexically scoped, assignments (etc) inside a block are (well, can be) visible to its context; etc. It's a very Lispish approach. REBOL is essentially Scheme under the covers, but with syntax much more like Forth's (whitespace-separated strings of arbitrary non-whitespace characters, with few pre-assigned meanings or restrictions -- in fact, it's impossible for a compiler to determine where a REBOL function call begins or ends! can't be known until runtime).
(mxTextTools allows you to write your own parsing elements in Python, BTW; it should be possible to use those mechanisms to achieve a similar integration.)
It can't capture the flavor -- although I don't know that it needs to <wink>. There's no distinction between "the pattern language" and "the computational language" in REBOL or Icon, and it's hard to explain what a maddening distinction that can be once you've lived without it. mxTextTools embedding would feel more like Icon, where the matching engine is fully exposed to the programmer (REBOL hides it, allowing only "approved" interactions).
OTOH, making lots of calls to analyze short strings is slow.
That's why mxTextTools converts these search idioms into byte codes which it executes at C level. Some future version will even "precompile" the tuple input and then omit the type checks during the search...that should give another noticeable speedup. Note that recursion etc. can be done at C level too -- Python function calls are not needed.
That's also the curse of having distinct languages; e.g., Python already had recursion, but you needed to reimplement it in a different way with different syntax and different rules in your pattern language. In Icon etc, there's no difference between a recursive pattern and a recursive function, except in *what* it computes. The machinery is all the same, and both more powerful and easier to learn because of that.
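To make the "recursive pattern is just a recursive function" point concrete, here is a minimal sketch in plain Python. It is neither Icon nor mxTextTools, and the helper name match_balanced is made up for illustration. It matches a balanced parenthesized group, and a nested group is handled by an ordinary recursive call.

```python
def match_balanced(text, pos=0):
    """Return the index just past a balanced "(...)" group starting at pos,
    or None if no such group starts there. Hypothetical helper."""
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1
    while pos < len(text):
        ch = text[pos]
        if ch == ")":
            return pos + 1
        if ch == "(":
            # A nested group is matched by the same function: recursion in
            # the "pattern" is plain recursion in the host language.
            pos = match_balanced(text, pos)
            if pos is None:
                return None
        else:
            pos += 1
    return None  # input ended before the closing parenthesis
```

Because the matcher is an ordinary function, anything else the language offers (dictionaries, callbacks, early exits) is available in the middle of a match without a second notation.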
... Just for kicks, here is the mysplit() function using mxTextTools:
    from mx.TextTools import *

    table = (
        # Match all whitespace
        (None,AllInSet,whitespace_set,+1),
        # Match and tag all non-whitespace
        ('text',AllInSet + AppendMatch,nonwhitespace_set,+1),
        # Loop until EOF
        (None,EOF,Here,-2),
        )

    def mysplit(text):
        return tag(text,table)[1]
The timings:

    mysplit:       5.84 sec.
    string.split:  3.62 sec.
Note that you can customize the above to split text at any character set you like, not just whitespace... without compiling or writing C code.
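For readers unfamiliar with tag tables, here is a rough plain-Python paraphrase of the three-entry table. It assumes the fourth tuple element is the relative jump taken when an entry fails to match, and that a successful match moves on to the next entry. The function name mysplit_sketch and its separators parameter are hypothetical, and nothing here calls mxTextTools.

```python
import string

def mysplit_sketch(text, separators=string.whitespace):
    """Plain-Python paraphrase of the three-entry table (not mxTextTools)."""
    result = []
    pos, end = 0, len(text)
    while True:
        # Entry 1: consume a run of separator characters, if any.
        while pos < end and text[pos] in separators:
            pos += 1
        # Entry 2: consume a run of non-separators and append it.
        start = pos
        while pos < end and text[pos] not in separators:
            pos += 1
        if pos > start:
            result.append(text[start:pos])
        # Entry 3: stop at end of input, otherwise jump back to entry 1.
        if pos >= end:
            return result
```

Passing a different separators string, e.g. mysplit_sketch("a,b;;c", ",;"), corresponds to swapping the character sets in the table.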
That's equally true of the example I posted <wink>. Now what if I wanted to stop splitting right after I find a keyword, recognized as such because it's a key in some passed-in dictionary? In my example, I make an obvious local code change, from

    while s.notmany(white):  # consume non-whitespace
        result.append(s.get_match())
        s.many(white)

to

    while s.notmany(white):  # consume non-whitespace
        word = s.get_match()
        result.append(word)
        if dictionary.has_key(word):
            break
        s.many(white)

What does it do to your example? Or what if the target string isn't "a string" (the code I posted only assumes the "str" object responds to indexing and slicing -- any buffer object is fine -- so my example doesn't change at all)? Or what if you need to pass the tokens on as they're found, pipeline style? Etc.

This is why I do complex string processing in Icon <0.9 wink>. OTOH, at what it does well, mxTextTools runs quicker than Icon. Its biggest problem has always been that e.g. nobody knows what the hell

    (None,EOF,Here,-2),

*means* at first glance -- or third <wink>.

an-extreme-on-the-transparency-vs-speed-curve-ly y'rs - tim
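The scanner object s used in the loop above is not shown in this message. What follows is a minimal, self-contained sketch of what such an object might look like, so the keyword-stopping variant can be run. The Scanner class, its internals, and split_until_keyword are assumptions made for illustration, not the code originally posted. The membership test is written as "word in dictionary", the current spelling of dictionary.has_key(word).

```python
class Scanner:
    """Hypothetical stand-in for the scanner object `s`."""

    def __init__(self, text):
        self.text = text
        self.pos = 0     # current scan position
        self.start = 0   # start of the most recent match

    def _scan(self, chars, want_member):
        # Advance over characters whose membership in `chars` equals
        # want_member; report whether at least one was consumed.
        self.start = self.pos
        while (self.pos < len(self.text)
               and (self.text[self.pos] in chars) == want_member):
            self.pos += 1
        return self.pos > self.start

    def many(self, chars):
        return self._scan(chars, True)

    def notmany(self, chars):
        return self._scan(chars, False)

    def get_match(self):
        return self.text[self.start:self.pos]


def split_until_keyword(text, dictionary, white=" \t\n\r\f\v"):
    s = Scanner(text)
    result = []
    s.many(white)                 # skip leading whitespace
    while s.notmany(white):       # consume non-whitespace
        word = s.get_match()
        result.append(word)
        if word in dictionary:    # stop right after the first keyword
            break
        s.many(white)
    return result
```

The stop condition is a two-line change inside an ordinary loop, which is the kind of local edit the message argues for.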