fixing an horrific formatted csv file.
F.R.
anthra.norell at bluewin.ch
Wed Jul 2 11:51:05 EDT 2014
On 07/02/2014 11:13 AM, flebber wrote:
>>>>> TM = TX.Table_Maker (headings =
>> ('Meeting','Date','Race','Number','Name','Trainer','Location'))
>>>>> TM (race_table (your_csv_text)).write ()
> Where do I find TX? Found this mention in the list, was it available in pip by any name?
> https://mail.python.org/pipermail/python-list/2014-February/667464.html
>
> Sayth
I'd have to make it available. I proposed it some time ago and received
a couple of suggestions in return. It is a modular transformation
framework written entirely in Python (2.7). It consists essentially of a
base class "Transformer" that handles input and output in such a way
that Transformer objects can be chained. It saved me from drowning in a
horrible and growing tangle of hacks. Finding something usable I had
previously done took time. Understanding how it worked took more time,
and adapting it took still more time, so that writing yet another hack
from scratch was faster.
A number of hacks I could quickly wrap into a Transformer object
and so could start building a library of standard Transformers. The
Table_Maker is one of them. The table making code is quite bad. It
suffers from feature overload. I would clean it up for distribution.
I'd be happy to distribute the base class and a few standard
Transformers, such as the ones I use every day (File Reader, File Writer,
DB Run Command, DB Write, Table Maker, PDF To Text, Text To Lines, Lines
To Text, Sort, Sort And Unique, etc.). Writing one's own Transformers is a
breeze. Testing too, because a Transformer keeps its input and output
and, in line with the system's design philosophy, does only its own
single thing.
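
To make the idea concrete, here is a minimal sketch of what such a base class could look like. It is not the TX code: the class layout, the attribute names "input" and "output" and the two example Transformers are assumptions made purely for illustration. It only shows the principle described above, namely that each object does one thing and keeps both what went in and what came out, which is what makes chaining and testing easy.

```python
class Transformer(object):
    """Hypothetical base class: stores its input, runs transform(), stores its output."""
    name = 'Transformer'

    def __init__(self):
        self.input = None
        self.output = None

    def transform(self, data):
        # Subclasses override this with their single task.
        return data

    def __call__(self, data=None):
        # Called with data: run and remember. Called without: return what was kept.
        if data is not None:
            self.input = data
            self.output = self.transform(data)
        return self.output


class TextToLines(Transformer):
    name = 'Text To Lines'

    def transform(self, data):
        return data.splitlines()


class SortAndUnique(Transformer):
    name = 'Sort And Unique'

    def transform(self, data):
        return sorted(set(data))


to_lines = TextToLines()
sort_unique = SortAndUnique()

# Chaining by hand: the output of one object is the input of the next.
result = sort_unique(to_lines('b\na\nb'))
print(result)

# Each object still holds its own input and output for inspection.
print(to_lines.output)
```

Because every step retains its data, a test can look at any intermediate result without rerunning the steps before it.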
A Chain is a list of Transformers that run in sequence. It is
itself derived from Transformer and is a functional equivalent. So
Chains nest. Fixing a Chain that nothing comes out of is a
straightforward matter too. It will still have run up to the failing
element. Chain.show () reveals the culprit as the first one to have no
output.
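
The following sketch illustrates the Chain idea under the same caveat: it is not the TX implementation. The stand-in base class, the convention that a failing step returns None, and the methods show() and culprit() as written here are assumptions for illustration only.

```python
class Transformer(object):
    """Hypothetical stand-in for the base class; keeps input and output."""
    name = 'Transformer'

    def __init__(self):
        self.input = None
        self.output = None

    def transform(self, data):
        return data

    def __call__(self, data=None):
        if data is not None:
            self.input = data
            self.output = self.transform(data)
        return self.output


class Chain(Transformer):
    """Runs its links in sequence; being a Transformer itself, it nests."""
    name = 'Chain'

    def __init__(self, *links):
        Transformer.__init__(self)
        self.links = list(links)

    def __getitem__(self, index):
        return self.links[index]

    def transform(self, data):
        for link in self.links:
            link.output = None      # forget results of earlier runs
        for link in self.links:
            data = link(data)
            if data is None:        # this link produced nothing: stop here
                break
        return data

    def show(self):
        """One status line per link, in running order."""
        return ['Chain[%d] - %s: %s' % (
                    i, link.name,
                    'ok' if link.output is not None else 'no output')
                for i, link in enumerate(self.links)]

    def culprit(self):
        """First link without output, descending into nested Chains."""
        for link in self.links:
            if link.output is None:
                return link.culprit() if isinstance(link, Chain) else link
        return None


class Strip(Transformer):
    name = 'Strip'
    def transform(self, data):
        return data.strip()

class Upper(Transformer):
    name = 'Upper'
    def transform(self, data):
        return data.upper()

class Broken(Transformer):
    name = 'Broken'
    def transform(self, data):
        return None                 # simulates a step that fails

class Split(Transformer):
    name = 'Split'
    def transform(self, data):
        return data.split()


pipeline = Chain(Chain(Strip(), Upper()), Broken(), Split())
result = pipeline('  a b  ')
for line in pipeline.show():
    print(line)
print(pipeline.culprit().name)
```

The nested Chain in position 0 ran to completion and still holds its result, so pipeline[0]() can be inspected even though the pipeline as a whole produced nothing.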
I am not up to date on distributing and would depend on qualified
help on that.
Frederic
--------------------------------------------------------------------------------
A brief overview
The TX solution to your race table would be (TX is the name of the module):
class Race_Table (TX.Transformer):
    '''
    In: CSV text
    Out: Tabular data (2-dimensional list)
    '''
    name = 'Race_Table'
    @TX.setup  # Checks timestamps to prevent needless reruns in the absence of new input
    def transform (self):
        for line in self.Input.data:
            # See my post
        self.Output.take (output_table)
Example file to file:
>>> Race_Schedule_F2F = TX.Chain (TX.File_Reader (), Race_Table (),
...     TX.List_To_CSV (delimiter = ';'), TX.File_Writer (terminal = out_file_name))
>>> Race_Schedule_F2F (input_file_name)  # Does it all!
Example web to database:
>>> Race_Schedule_WWW2DB = TX.Chain (TX.WWW_Reader (),
...     Race_Schedule_HTML_Reader (), Race_Table (),
...     TX.DB_Writer (table_name = 'horses'))
>>> Race_Schedule_WWW2DB (url)  # Does it all! You'd have to write the Race_Schedule_HTML_Reader
Verify your table:
>>> Table_Viewer = TX.Chain (TX.Table_Maker (), TX.Table_Writer ())
>>> Race_Schedule_WWW2DB.show_tree () # See which one should display
Chain
Chain[0] - WWW Reader
Chain[1] - Race_Schedule_HTML_Reader
Chain[2] - Race_Table
Chain[3] - DB Writer
>>> print Table_Viewer (Race_Schedule_WWW2DB[2]())  # All Transformers keep their data
(Display of table)
Verify database:
>>> print Table_Viewer (TX.DB_Reader (table_name = 'horses')())
(Display of database table)