table (ascii text) layout recognition
James Stroud
jstroud at mbi.ucla.edu
Wed Sep 13 01:52:00 EDT 2006
vbfoobar at gmail.com wrote:
> Hello,
>
> I am looking for Python code to process
> tables that are in ASCII text. The code must
> determine where the columns (fields) are.
> The tables in my application vary,
> but their columns are not very hard
> for a human to locate, because even
> when ignoring the meaning of the words,
> our eyes see vertical alignments.
>
> Here is a sample table (must be viewed
> with fixed-width font to see alignments):
> =================================
>
> 44544 ipod apple black 102
> GFGFHHF-12 unknown thing bizar brick mortar tbc
> 45fjk do not know + is less biac
> disk seagate 250GB 130
> 5G_gff tbd tbd
> gjgh88hgg media record a and b 12
> hjj foo bar hop zip
> hg uy oi hj uuu ii a qqq ccc v ZZZ Ughj
> qdsd zert nope nope
>
> =================================
>
> I want Python code that builds a representation
> of this table (for example a list of lists, where each list
> represents a table line, each element of the list
> being a field value).
>
> Any hints?
> thanks
>
I have to catch a bus, but quickly: the algorithm is to code each non-space
character as one and each space as zero, then 'or' down each character
column. A zero in the result marks a position that is blank in every row,
which is very likely a gap between columns. Code tomorrow if no one
else posts.
Must run...
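
A rough sketch of the idea above, for anyone who wants something to try
in the meantime. It assumes Python 3, tabs expanded to spaces, and a table
in which every pair of adjacent columns is separated by at least one
character position that is blank in all rows. The names column_spans and
split_table are made up for illustration; this is not necessarily the code
that was promised for later.

```python
def column_spans(lines):
    """Return (start, end) character spans that contain text in some row.

    Each character is coded as occupied (non-space) or empty (space), and
    the codes are OR-ed down every character position. Runs of occupied
    positions are taken to be columns; the zeros between them are gaps.
    """
    width = max((len(line) for line in lines), default=0)
    padded = [line.ljust(width) for line in lines]
    # OR down each character position: True if any row has text there.
    occupied = [any(row[i] != " " for row in padded) for i in range(width)]

    spans = []
    start = None
    for i, flag in enumerate(occupied):
        if flag and start is None:
            start = i                    # a column begins
        elif not flag and start is not None:
            spans.append((start, i))     # a column ends at the first gap
            start = None
    if start is not None:
        spans.append((start, width))     # last column runs to the edge
    return spans


def split_table(lines):
    """Return the table as a list of rows, each a list of field strings."""
    # Assumption: blank lines are separators, not rows, so they are dropped.
    rows = [l.rstrip("\n").expandtabs() for l in lines if l.strip()]
    spans = column_spans(rows)
    # Slicing past the end of a short row yields "", i.e. an empty field.
    return [[row[a:b].strip() for a, b in spans] for row in rows]


# Hypothetical sample, not the table from the original post.
sample = [
    "44544      ipod     apple   102",
    "disk       seagate  250GB   130",
    "hjj        foo bar  hop     zip",
]
for fields in split_table(sample):
    print(fields)
```

A multi-word field such as "foo bar" survives because its inner space is
not blank in every row. The sketch will wrongly merge two columns if some
row has text spanning the gap between them, and will wrongly split a
column if all of its rows happen to share a space at the same position, so
treat the zeros as "high probability" gaps rather than certain ones.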
--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
http://www.jamesstroud.com/