General Purpose Pipeline library?
Friedrich Rentsch
anthra.norell at bluewin.ch
Wed Nov 22 05:42:20 EST 2017
On 11/22/2017 10:54 AM, Friedrich Rentsch wrote:
>
>
> On 11/21/2017 03:26 PM, Jason wrote:
>> On Monday, November 20, 2017 at 10:49:01 AM UTC-5, Jason wrote:
>>> a pipeline can be described as a sequence of functions that are
>>> applied to an input with each subsequent function getting the output
>>> of the preceding function:
>>>
>>> out = f6(f5(f4(f3(f2(f1(in))))))
>>>
>>> However this isn't very readable and does not support conditionals.
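The nested call above can be flattened into a left-to-right list of steps. What follows is only a minimal Python 3 sketch, not an existing library: the helper names `pipeline` and `when` are made up for illustration, and `when` shows one possible way to add a conditional step.

```python
from functools import reduce


def pipeline(*funcs):
    """Return a function that applies funcs in order, left to right."""
    def run(value):
        # Feed the output of each function into the next one.
        return reduce(lambda acc, func: func(acc), funcs, value)
    return run


def when(predicate, func):
    """Hypothetical helper: apply func only if predicate(value) is true."""
    def step(value):
        return func(value) if predicate(value) else value
    return step


# Illustrative steps built from str methods; any one-argument callables work.
clean = pipeline(
    str.strip,
    when(lambda s: s.isupper(), str.lower),  # conditional step
    str.split,
)

print(clean("  HELLO WORLD "))
print(clean(" Hello World"))
```

Each step takes exactly one argument here, so extra parameters would have to be bound first, for example with `functools.partial`.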
>>>
>>> Tensorflow has tensor-focused pipelines:
>>>     fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
>>>                                  scope='fc1')
>>>     fc2 = layers.fully_connected(fc1, 256,
>>>                                  activation_fn=tf.nn.relu, scope='fc2')
>>>     out = layers.fully_connected(fc2, 10, activation_fn=None,
>>>                                  scope='out')
>>>
>>> I have some code which allows me to mimic this, but with an implied
>>> parameter.
>>>
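>>> import functools
>>>
>>> def executePipeline(steps, collection_funcs = [map, filter, reduce]):
>>>     results = None
>>>     for step in steps:
>>>         func = step[0]
>>>         params = step[1]
>>>         if func in collection_funcs:
>>>             # map/filter/reduce: bind the extra parameters, then
>>>             # apply the bound function to the previous results
>>>             print func, params[0]
>>>             results = func(functools.partial(params[0],
>>>                                              *params[1:]), results)
>>>         else:
>>>             print func
>>>             if results is None:
>>>                 # first step: nothing to pass along yet
>>>                 results = func(*params)
>>>             else:
>>>                 # previous results always go in as the last argument
>>>                 results = func(*(params+(results,)))
>>>     return results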
>>>
>>> executePipeline( [
>>>     (read_rows, (in_file,)),
>>>     (map, (lower_row, field)),
>>>     (stash_rows, ('stashed_file', )),
>>>     (map, (lemmatize_row, field)),
>>>     (vectorize_rows, (field, min_count,)),
>>>     (evaluate_rows, (weights, None)),
>>>     (recombine_rows, ('stashed_file', )),
>>>     (write_rows, (out_file,))
>>>     ]
>>> )
>>>
>>> Which gets me close, but I can't control where rows gets passed in.
>>> In the above code, it is always the last parameter.
>>>
>>> I feel like I'm reinventing a wheel here. I was wondering if
>>> there's already something that exists?
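One way to control where the rows are passed in is to mark the position with a placeholder. This is only a sketch in Python 3 under assumptions: the sentinel `ROWS`, the function `execute_pipeline` and the three toy steps are invented for illustration and are not part of the code quoted above. Steps without a placeholder keep the original behaviour, with the previous result appended as the last argument.

```python
ROWS = object()  # hypothetical sentinel: "put the previous result here"


def execute_pipeline(steps):
    """Run (func, params) steps, substituting ROWS with the prior result."""
    result = None
    for index, (func, params) in enumerate(steps):
        if any(p is ROWS for p in params):
            # Explicit position: replace every placeholder with the result.
            args = tuple(result if p is ROWS else p for p in params)
        elif index == 0:
            # First step: there is no previous result to pass along.
            args = tuple(params)
        else:
            # Default, as in the original: result becomes the last argument.
            args = tuple(params) + (result,)
        result = func(*args)
    return result


# Toy stand-ins for real steps (assumed, for demonstration only).
def read_rows(path):
    return [{"text": "Hello"}, {"text": "World"}]


def lower_rows(rows, field):
    return [dict(row, **{field: row[field].lower()}) for row in rows]


def join_field(sep, rows, field):
    return sep.join(row[field] for row in rows)


out = execute_pipeline([
    (read_rows, ("in.csv",)),
    (lower_rows, (ROWS, "text")),       # rows as first argument
    (join_field, ("-", ROWS, "text")),  # rows in the middle
])
print(out)
```

The identity check (`p is ROWS`) keeps the placeholder from being confused with any ordinary parameter value, including `None`.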
>> Why do I want this? Because I'm tired of writing code that is locked
>> away in a bespoke function. I'd have an army of functions all
>> slightly different in functionality. I require flexibility in
>> defining pipelines, and I don't want a custom pipeline to require any
>> low-level coding. I just want to feed a sequence of functions to a
>> script and have it process it. A middle ground between the shell |
>> operator and bespoke python code. Sure, I could write many binaries
>> bound by shell, but there are some things done far easier in python
>> because of its extensive libraries and it can exist throughout the
>> execution of the pipeline whereas any temporary persistence has to
>> be through environment variables or files.
>>
>> Well after examining your feedback, it looks like Grapevine has 99%
>> of the concepts that I wanted to invent, even if the | operator seems
>> a bit clunky. I personally prefer the fluent interface convention.
>> But this should work.
>>
>> Kamaelia could also work, but it seems a little bit more grandiose.
>>
>>
>> Thanks everyone who chimed in!
>
> This looks very much like what I have been working on of late: a
> generic processing paradigm based on chainable building blocks. I call
> them Workshops, because the base class can be thought of as a workshop
> that takes some raw material, processes it and delivers the product
> (to the next in line). Your example might look something like this:
>
> >>> import workshops as WS
>
> >>> Vectorizer = WS.Chain (
>         WS.File_Reader (),          # WS provides
>         WS.Map (lower_row),         # WS provides (wrapped builtin)
>         Row_Stasher (),             # You provide
>         WS.Map (lemmatize_row),     # WS provides
>         Row_Vectorizer (),          # Yours
>         Row_Evaluator (),           # Yours
>         Row_Recombiner (),
>         WS.File_Writer (),
>         _name = 'Vectorizer'
>     )
>
> Parameters are process-control settings that travel through a
> subscription-based mailing system separate from the payload pipe.
>
> >>> # Set all parameters that control the entire run.
> >>> Vectorizer.post (min_count = ..., )
> >>> # Addressed, not meant for File_Reader
> >>> Vectorizer.post ("File_Writer", file_name = 'output_file_name')
>
> Run
>
> >>> # File_Writer returns 0 if the Chain completes successfully.
> >>> Vectorizer ('input_file_name')
> 0
>
> If you would provide a list of your functions (input, output,
> parameters) I'd be happy to show a functioning solution. Writing a
> Shop follows a simple standard pattern: Naming the subscriptions, if
> any, and writing a single method that reads the subscribed parameters,
> if any, then takes payload, processes it and returns the product.
>
> I intend to share the system, provided there's an interest. I'd
> have to tidy it up quite a bit, though, before daring to release it.
>
> There's a lot more to it . . .
>
> Frederic
>
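The pattern described in the quoted message (named subscriptions, a single processing method, and parameters delivered separately from the payload) could be sketched roughly as below. This is not the Workshops code, which does not appear in this thread. Every name here (`Shop`, `Chain`, `receive`, `post`, `process`, `Lower`, `MinLength`) is hypothetical, and the sketch assumes Python 3.

```python
class Shop:
    """Hypothetical base class: one processing step in a chain."""

    subscriptions = ()  # names of the parameters this step wants to receive

    def __init__(self):
        self.params = {}

    def receive(self, **settings):
        # Keep only the settings this step has subscribed to.
        for name, value in settings.items():
            if name in self.subscriptions:
                self.params[name] = value

    def process(self, payload):
        raise NotImplementedError


class Chain:
    """Hypothetical chain: passes the payload through each shop in turn."""

    def __init__(self, *shops):
        self.shops = shops

    def post(self, *addressees, **settings):
        # No addressee: broadcast to all shops. Otherwise deliver only to
        # shops whose class name was given.
        for shop in self.shops:
            if not addressees or type(shop).__name__ in addressees:
                shop.receive(**settings)

    def __call__(self, payload):
        for shop in self.shops:
            payload = shop.process(payload)
        return payload


class Lower(Shop):
    def process(self, payload):
        return [word.lower() for word in payload]


class MinLength(Shop):
    subscriptions = ("min_length",)

    def process(self, payload):
        limit = self.params.get("min_length", 0)
        return [word for word in payload if len(word) >= limit]


chain = Chain(Lower(), MinLength())
chain.post(min_length=3)  # parameters travel apart from the payload
result = chain(["An", "Apple", "A", "Day"])
print(result)
```

The point of the split is that `process` only ever sees the payload, so the position of the payload argument never has to be negotiated per step.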
I'm sorry, I made a mistake with the "From" item. My address is
obviously not "python-list". It is "anthra.norell at bluewin.ch".
Frederic