Stream programming

Sat Mar 24 12:05:28 CET 2012

On 3/24/2012 4:23, Steven D'Aprano wrote:
> On Fri, 23 Mar 2012 17:00:23 +0100, Kiuhnm wrote:
>> I've been writing a little library for handling streams as an excuse for
>> doing a little OOP with Python.
>> I don't share some of the views on readability expressed on this ng.
>> Indeed, I believe that a piece of code may very well start as complete
>> gibberish and become a pleasure to read after some additional
>> information is provided.
> [...]
>> numbers - push - avrg - 'med' - pop - filter(lt('med'), ge('med'))\
>>       - ['same', 'same'] - streams(cat) - 'same'
>> Ok, we're at the "complete gibberish" phase.
>> Time to give you the "additional information".
> There are multiple problems with your DSL. Having read your explanation,
> and subsequent posts, I think I understand the data model, but the syntax
> itself is not very good and far from readable. It is just too hard to
> reason about the code.
> Your syntax conflicts with established, far more common, use of the same
> syntax: you use - to mean "call a function" and | to join two or more
> streams into a flow.
> You also use () for calling functions, and the difference between - and
> () isn't clear. So a mystery there -- your DSL seems to have different
> function syntax, depending on... what?
> The semantics are unclear even after your examples. To understand your
> syntax, you give examples, but to understand the examples, the reader
> needs to understand the syntax. That suggests that the semantics are
> unclear even in your own mind, or at least too difficult to explain in
> simple examples.
> Take this example:
>> Flows can be saved (push) and restored (pop) :
>>     [1,2,3,4] - push - by(2) - 'double' - pop | val('double')
>>         <=>  [1,2,3,4] | [2,4,6,8]
> What the hell does that mean? The reader initially doesn't know what
> *any* of push, by(2), pop or val('double') means. All they see is an
> obfuscated series of calls that starts with a stream as input, makes a
> copy of it, and doubles the entries in the copy: you make FIVE function
> calls to perform TWO conceptual operations. So the reader can't even map
> a function call to a result.
> With careful thought and further explanations from you, the reader (me)
> eventually gets a mental model here. Your DSL has a single input which is
> pipelined through a series of function calls by the - operator, plus a
> separate stack. (I initially thought that, like Forth, your DSL was stack
> based. But it isn't, is it?)
> It seems to me that the - operator is only needed as syntactic sugar to
> avoid using reverse Polish notation and an implicit stack. Instead of the
> Forth-like:
> [1,2,3,4] dup 2 *
> your DSL has an explicit stack, and an explicit - operator to call a
> function. Presumably "[1,2] push" would be a syntax error.
> I think this is a good example of an inferior syntax. Contrast your:
> [1,2,3,4] - push - by(2) - 'double' - pop | val('double')
> with the equivalent RPL:
> [1,2,3,4] dup 2 *

I was just explaining how push and pop work.
I also said that
   [1,2,3,4] - [id,by(2)]
would be the recommended way to do it.

> Now *that* is a pleasure to read, once you wrap your head around reverse
> Polish notation and the concept of a stack. Which you need in your DSL
> anyway, to understand push and pop.

I don't see why. Push and pop are not needed. They're just handy,
mainly to modify a flow, collect a result, and go back to how the flow
was before the push.
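The behaviour described here (save a copy of the flow, transform it, collect the result under a name, then go back to the saved flow) can be sketched with an explicit stack. This is a hypothetical method-chaining version, not the DSL's operator syntax. `Pipeline`, `apply` and `name` are made-up stand-ins for `- f` and `- 'double'`, and the flow is simplified to a single stream. Because push stores a copy on a stack, pushes can be nested and each pop returns to the matching earlier state.

```python
from typing import Callable, Dict, Iterable, List


class Pipeline:
    """Hypothetical sketch: a current flow plus a stack of saved copies."""

    def __init__(self, stream: Iterable):
        self.current: List = list(stream)
        self.saved: List[List] = []       # stack filled by push, emptied by pop
        self.named: Dict[str, List] = {}  # results tagged with a name, e.g. 'double'

    def push(self) -> "Pipeline":
        self.saved.append(list(self.current))  # a copy: the current flow is untouched
        return self

    def apply(self, f: Callable) -> "Pipeline":
        self.current = [f(x) for x in self.current]
        return self

    def name(self, key: str) -> "Pipeline":
        self.named[key] = list(self.current)
        return self

    def pop(self) -> "Pipeline":
        self.current = self.saved.pop()  # back to the flow as it was before push
        return self


p = Pipeline([1, 2, 3, 4]).push().apply(lambda x: 2 * x).name("double").pop()
print(p.current, p.named["double"])  # [1, 2, 3, 4] [2, 4, 6, 8]
```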
It has nothing to do with RPN (which RPL is based on).

> You say that this is an "easier way to get the same result":
> [1,2,3,4] - [id, by(2)]
> but it isn't, is it? The more complex example above ends up with two
> streams joined in a single flow:
> [1,2,3,4]|[2,4,6,8]
> whereas the shorter version using the magic "id" gives you a single
> stream containing nested streams:
> [[1,2,3,4], [2,4,6,8]]

Says who?

Here are the rules again:
A flow can be transformed:
   [1,2] - f <=> [f(1),f(2)]
   ([1,2] | [3,4]) - f <=> [f(1,3),f(2,4)]
   ([1,2] | [3,4]) - [f] <=> [f(1),f(2)] | [f(3),f(4)]
   ([1,2] | [3,4]) - [f,g] <=> [f(1),f(2)] | [g(3),g(4)]
   [1,2] - [f,g] <=> [f(1),f(2)] | [g(1),g(2)]
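The five rules above can be sketched with operator overloading. This is a minimal sketch under assumptions, not the library's code. `Flow` is a hypothetical name, and because plain Python lists define neither `-` nor `|`, the starting stream has to be wrapped in a `Flow` object. The rules don't say what happens when the number of functions differs from the number of streams; here `zip` simply truncates.

```python
from typing import Callable, Iterable, List, Union


class Flow:
    """Hypothetical flow: an ordered list of streams (plain Python lists)."""

    def __init__(self, *streams: Iterable):
        self.streams: List[list] = [list(s) for s in streams]

    def __or__(self, other: Union["Flow", Iterable]) -> "Flow":
        # Join: Flow([1,2]) | [3,4] is one flow holding two streams.
        extra = other.streams if isinstance(other, Flow) else [list(other)]
        return Flow(*self.streams, *extra)

    def __ror__(self, other: Iterable) -> "Flow":
        # Allows a plain list on the left: [1,2] | Flow([3,4]).
        return Flow(list(other), *self.streams)

    def __sub__(self, f: Union[Callable, Iterable[Callable]]) -> "Flow":
        if callable(f):
            # flow - f: f receives one item from each stream, position by position.
            return Flow([f(*items) for items in zip(*self.streams)])
        funcs = list(f)  # any iterable of functions is accepted
        if len(funcs) == 1:
            # flow - [f]: f is applied to every stream separately.
            return Flow(*([funcs[0](x) for x in s] for s in self.streams))
        if len(self.streams) == 1:
            # [1,2] - [f,g]: one stream fans out into one stream per function.
            return Flow(*([g(x) for x in self.streams[0]] for g in funcs))
        # flow - [f,g]: the i-th function transforms the i-th stream (zip truncates).
        return Flow(*([g(x) for x in s] for g, s in zip(funcs, self.streams)))

    def __repr__(self) -> str:
        return " | ".join(map(str, self.streams))
```

The parentheses in `(Flow([1, 2]) | [3, 4]) - f` are required: `|` binds more loosely than `-` in Python, so without them Python would first try to evaluate `[3, 4] - f` and raise a `TypeError`.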

Read the last line.
What's very interesting is that [f,g] is an iterable as well, so your
functions can be generated as needed.
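Since the function list only needs to be iterable, it can be produced lazily. A minimal sketch of the one-stream fan-out rule as a plain function. `fan_out` and `by` are hypothetical stand-ins for the DSL's `- [f,g]` and `by(k)`.

```python
from typing import Callable, Iterable, List


def by(k: int) -> Callable[[int], int]:
    """Hypothetical stand-in for the DSL's by(k): multiply each item by k."""
    return lambda x: x * k


def fan_out(stream: Iterable, funcs: Iterable[Callable]) -> List[list]:
    """[1,2] - [f,g] <=> [f(1),f(2)] | [g(1),g(2)], for any iterable of functions."""
    items = list(stream)  # materialize once so every function sees every item
    return [[f(x) for x in items] for f in funcs]


# The functions come from a generator expression, not a literal list.
print(fan_out([1, 2, 3], (by(k) for k in range(1, 4))))
# [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```

Building each function through a factory such as `by(k)` binds the current value of `k`. Writing `[lambda x: x * k for k in range(1, 4)]` instead would give three functions that all use the final `k`, because Python closures look the variable up when they are called.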

> So, how could you make this more readable?
> * Don't fight the reader's expectations. If they've programmed in Unix
> shells, they expect | as the pipelining operator. If they haven't, they
> probably will find >> easy to read as a dataflow operator. Either way,
> they're probably used to seeing a|b as meaning "or" (as in "this stream,
> or this stream") rather than the way you seem to be using it ("this
> stream, and this stream").
> Here's my first attempt at improved syntax that doesn't fight the user:
> [1,2,3,4] >> push >> by(2) >> 'double' >> pop & val('double')

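There are problems with your syntax. In Python, + and - have the same
precedence and group left to right, whereas >> binds more loosely than +.
So a flow like
[...]+[...] - f + [...] - g - h + [...] - i + [...]
would need explicit parentheses with >>:
((([...]+[...] >> f) + [...] >> g >> h) + [...] >> i) + [...]
I first tried to use '<<' and '>>' but '+' and '-' are much better.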
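To see the grouping Python actually applies, here is a hypothetical `Node` class that records how each expression is parenthesized instead of computing anything. It only illustrates standard operator precedence and is not part of the stream library.

```python
class Node:
    """Records how Python groups an expression instead of evaluating it."""

    def __init__(self, text: str):
        self.text = text

    def _combine(self, op: str, other: "Node") -> "Node":
        return Node(f"({self.text} {op} {other.text})")

    def __add__(self, other: "Node") -> "Node":
        return self._combine("+", other)

    def __sub__(self, other: "Node") -> "Node":
        return self._combine("-", other)

    def __rshift__(self, other: "Node") -> "Node":
        return self._combine(">>", other)

    def __or__(self, other: "Node") -> "Node":
        return self._combine("|", other)


a, b, c, f, g = (Node(n) for n in "abcfg")

print((a + b - f + c - g).text)   # ((((a + b) - f) + c) - g)
print((a + b >> f + c >> g).text)  # (((a + b) >> (f + c)) >> g)
print((a - f | b).text)           # ((a - f) | b)
```

The last line also shows why `... - pop | val('double')` works without parentheses: `|` binds more loosely than `-`, so the whole `-` chain is evaluated before the join.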

> "push" and "pop" are poor choices of words. Push does not actually push
> its input onto the stack, which would leave the input stream empty. It
> makes a copy. You explain what they do:

Why should push move and not copy? In asm and OpenGL they copy, for 

> "Flows can be saved (push) and restored (pop)"
> so why not just use SAVE and RESTORE as your functions? Or if they're too
> verbose, STO and RCL, or my preference, store and recall.

Because that's not what they do.
push and pop actually push and pop, i.e. they can be nested and work as 

> [1,2,3,4] >> store >> by(2) >> 'double' >> recall & val('double')
> I'm still not happy with & for the join operator. I think that the use of
> + for concatenate and & for join is just one of those arbitrary choices
> that the user will have to learn. Although I'm tempted to try using a
> colon instead.
> [1,2,3]:[4,5,6]
> would be a flow with two streams.

I can't see a way to overload ':' in Python. There are also technical 

> I don't like the syntax for defining and using names. Here's a random
> thought:
> [1,2,3,4] >> store >> by(2) >> @double >> recall & double
> Use @name to store to a name, and the name alone to retrieve from it. But
> I haven't given this too much thought, so it too might suck.

The problem, again, is Python's limited support for defining DSLs.
Going further would mean interpreting command strings, and I was trying
to avoid writing an interpreter on top of an interpreter.

> Some other problems with your DSL:
>> A flow can be transformed:
>>     [1,2] - f <=> [f(1),f(2)]
> but that's not consistently true. For instance:
> [1,2] - push <=/=> [push(1), push(2)]

push is a special function (a keyword). It's clear what it does. It's 
just an exception to the general rule.

> So the reader needs to know all the semantics of the particular function
> f before being able to reason about the flow.

No, he only has to know the special functions. Those are practically keywords.

>> Some functions are special and almost any function can be made special:
>>     [1,2,3,4,5] - filter(isprime) <=> [2,3,5]
>>     [[],(1,2),[3,4,5]] - flatten <=> [1,2,3,4,5]
> You say that as if it were a good thing.

It is, because it's never implicit. For instance, isprime is a filter. 
flatten is a special builtin function (a keyword).
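One way to keep "special" functions explicit is to mark them with a wrapper type, so a transformation is element-wise unless the function is visibly tagged as operating on the whole stream. This is a hypothetical sketch: the thread doesn't show how the library marks special functions. `Special` and `transform` are made-up names, and `filter_` carries a trailing underscore only to avoid shadowing Python's builtin `filter`.

```python
from typing import Callable, Iterable, List


class Special:
    """Hypothetical marker: wraps a function that receives the whole stream."""

    def __init__(self, func: Callable[[list], Iterable]):
        self.func = func


def filter_(pred: Callable) -> Special:
    # filter(isprime) keeps only the items for which pred is true.
    return Special(lambda stream: [x for x in stream if pred(x)])


# flatten removes one level of nesting from the stream.
flatten = Special(lambda stream: [x for sub in stream for x in sub])


def transform(stream: Iterable, f) -> List:
    """stream - f: element-wise unless f is explicitly marked as special."""
    items = list(stream)
    if isinstance(f, Special):
        return list(f.func(items))
    return [f(x) for x in items]


def isprime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


print(transform([1, 2, 3, 4, 5], filter_(isprime)))  # [2, 3, 5]
print(transform([[], (1, 2), [3, 4, 5]], flatten))   # [1, 2, 3, 4, 5]
print(transform([1, 2], lambda x: x + 1))            # [2, 3]
```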

