General Purpose Pipeline library?
Jason
jasonhihn at gmail.com
Mon Nov 20 10:48:48 EST 2017
A pipeline can be described as a sequence of functions applied to an input, with each function receiving the output of the one before it:
out = f6(f5(f4(f3(f2(f1(in))))))
However, this isn't very readable and does not support conditionals.
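For illustration, the nested call can be flattened into a left-to-right list with functools.reduce. This is only a rough sketch: `pipeline` and the three step functions are made-up names for the example, not taken from any library.

```python
from functools import reduce

def pipeline(*funcs):
    """Return a function that applies funcs to a value, left to right."""
    def run(value):
        # Feed the output of each function into the next one.
        return reduce(lambda acc, f: f(acc), funcs, value)
    return run

# Hypothetical stand-in steps, for illustration only.
def strip(s):
    return s.strip()

def lower(s):
    return s.lower()

def split_words(s):
    return s.split()

clean = pipeline(strip, lower, split_words)
result = clean("  Hello World  ")
print(result)
```

This reads in execution order, but every step still takes exactly one argument and there is still no room for conditionals.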
TensorFlow has tensor-focused pipelines:
fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
I have some code which allows me to mimic this, but with an implied parameter.
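from __future__ import print_function  # print() works on Python 2 and 3
import functools
from functools import reduce  # builtin on Python 2, lives in functools on Python 3

def executePipeline(steps, collection_funcs=(map, filter, reduce)):
    results = None
    for step in steps:
        func = step[0]
        params = step[1]
        if func in collection_funcs:
            # Bind the extra parameters to the row function, then apply it
            # across the previous results.
            print(func, params[0])
            results = func(functools.partial(params[0], *params[1:]), results)
        else:
            print(func)
            if results is None:
                # First step: there is nothing to pass in yet.
                results = func(*params)
            else:
                # Later steps: the previous results go in as the last argument.
                results = func(*(params + (results,)))
    return results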
executePipeline([
    (read_rows, (in_file,)),
    (map, (lower_row, field)),
    (stash_rows, ('stashed_file',)),
    (map, (lemmatize_row, field)),
    (vectorize_rows, (field, min_count)),
    (evaluate_rows, (weights, None)),
    (recombine_rows, ('stashed_file',)),
    (write_rows, (out_file,)),
])
This gets me close, but I can't control where the rows get passed in. In the above code, they are always the last parameter.
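One way this might be handled is with a placeholder object that marks where the previous result should go. The following is only a sketch under assumptions: `ROWS`, `execute_pipeline` and the three toy step functions are hypothetical names, and the special handling of map/filter/reduce is left out to keep it short.

```python
ROWS = object()  # hypothetical sentinel: "put the previous result here"

def execute_pipeline(steps):
    """Run (func, params) steps in order, threading the result through."""
    results = None
    for func, params in steps:
        if any(p is ROWS for p in params):
            # Explicit placement: swap the sentinel for the previous result.
            args = tuple(results if p is ROWS else p for p in params)
        elif results is None:
            # First step: nothing to pass in yet.
            args = params
        else:
            # Default: previous result is appended as the last argument.
            args = params + (results,)
        results = func(*args)
    return results

# Toy steps, for illustration only.
def read_rows(n):
    return list(range(n))

def scale(rows, factor):
    return [r * factor for r in rows]

def add_offset(offset, rows):
    return [r + offset for r in rows]

out = execute_pipeline([
    (read_rows, (3,)),
    (scale, (ROWS, 10)),   # rows passed as the first argument
    (add_offset, (1,)),    # no sentinel: rows appended last
])
print(out)
```

The identity check (`is ROWS`) rather than `==` is deliberate, so that parameters with unusual equality behaviour are not mistaken for the placeholder.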
I feel like I'm reinventing a wheel here. I was wondering if there's already something that exists?