General Purpose Pipeline library?
Bob Gailer
bgailer at gmail.com
Mon Nov 20 11:23:41 EST 2017
On Nov 20, 2017 10:50 AM, "Jason" <jasonhihn at gmail.com> wrote:
>
> A pipeline can be described as a sequence of functions applied to an
> input, with each subsequent function receiving the output of the
> preceding function:
>
> out = f6(f5(f4(f3(f2(f1(in))))))
>
> However this isn't very readable and does not support conditionals.
>
> TensorFlow has tensor-focused pipelines:
> fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
>                              scope='fc1')
> fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu,
>                              scope='fc2')
> out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
>
> I have some code which allows me to mimic this, but with an implied
> parameter.
>
> from functools import partial, reduce
>
> def executePipeline(steps, collection_funcs=(map, filter, reduce)):
>     results = None
>     for func, params in steps:
>         if func in collection_funcs:
>             # Bind the extra arguments, then apply over the prior results.
>             print(func, params[0])
>             results = func(partial(params[0], *params[1:]), results)
>         else:
>             print(func)
>             if results is None:
>                 # First step: there is nothing to feed in yet.
>                 results = func(*params)
>             else:
>                 # Prior results are always appended as the last argument.
>                 results = func(*(params + (results,)))
>     return results
>
> executePipeline([
>     (read_rows, (in_file,)),
>     (map, (lower_row, field)),
>     (stash_rows, ('stashed_file',)),
>     (map, (lemmatize_row, field)),
>     (vectorize_rows, (field, min_count,)),
>     (evaluate_rows, (weights, None)),
>     (recombine_rows, ('stashed_file',)),
>     (write_rows, (out_file,)),
> ])
>
> Which gets me close, but I can't control where the rows get passed in.
> In the above code, they are always the last parameter.
>
> I feel like I'm reinventing a wheel here. I was wondering if there's
> already something that exists?
IBM has long had a program called Pipelines, which runs on IBM
mainframes. It does what you want.
A number of attempts have been made to create cross-platform versions of
this marvelous program.
A long time ago I started, but never completed, an open source Python
version. If you are interested in taking a look at it, let me know.
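
On the narrower problem in the quoted message (the previous result always
landing as the last parameter), one possible approach is to mark the slot
explicitly with a placeholder. The following is only a minimal sketch, not
an existing library: the names PREV, run_pipeline, scale and keep_above
are hypothetical and were made up for this illustration.

```python
# Sentinel marking where the previous step's result should be inserted.
# (Hypothetical name, not taken from the original post.)
PREV = object()


def run_pipeline(steps, initial=None):
    """Run (func, args) steps in order, replacing PREV with the prior result."""
    result = initial
    for func, args in steps:
        # Substitute every PREV placeholder with the result so far.
        call_args = [result if arg is PREV else arg for arg in args]
        result = func(*call_args)
    return result


# Two toy steps that expect the rows in different argument positions.
def scale(factor, rows):
    return [factor * r for r in rows]


def keep_above(rows, threshold):
    return [r for r in rows if r > threshold]


total = run_pipeline([
    (list, (range(1, 6),)),      # first step takes no prior result
    (scale, (10, PREV)),         # rows passed as the second argument
    (keep_above, (PREV, 25)),    # rows passed as the first argument
    (sum, (PREV,)),
])
print(total)
```

Because each step names the slot itself, map and filter need no special
casing here: a step such as (map, (some_func, PREV)) would simply place
the prior result as the iterable.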
More information about the Python-list
mailing list