Data Manipulation - Rows to Columns
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Tue Feb 5 22:52:08 EST 2008
On Wed, 06 Feb 2008 00:54:49 -0200, Tess <testone at gmail.com> wrote:
> I have a text file with marked up data that I need to convert into a
> text tab separated file.
>
> The structure of the input file is listed below (see file 1) and the
> desired output file is below as well (see file 2).
>
> I am a complete novice with python and would appreciate any tips you
> may be able to provide.
>
>
> file 1:
> <item>TABLE</table>
> <color>black</color>
> <color>blue</color>
> <color>red</color>
> <item>CHAIR</table>
> <color>yellow</color>
> <color>black</color>
> <color>red</color>
> <item>TABLE</table>
> <color>white</color>
> <color>gray</color>
> <color>pink</color>
Are you sure it says <item>...</table>?
Are there ALWAYS three colors per item, as in your example? If this is the
case, just read groups of 4 lines and ignore the tags.
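A minimal sketch of that fixed-size approach, assuming Python 3 and that every item really is followed by exactly three colors. The helper names `strip_tags` and `rows_of_four` are made up for illustration, and the sample lines stand in for the contents of the input file:

```python
import re

TAG = re.compile(r"<[^>]*>")  # matches any tag, opening or closing

def strip_tags(line):
    """Return the text of a line with every <...> tag removed."""
    return TAG.sub("", line).strip()

def rows_of_four(lines):
    """Group lines into rows of 4: one item followed by three colors."""
    cleaned = [strip_tags(line) for line in lines if line.strip()]
    if len(cleaned) % 4:
        raise ValueError("expected a multiple of 4 non-blank lines")
    return [cleaned[i:i + 4] for i in range(0, len(cleaned), 4)]

# Stand-in for the lines of the input file.
sample = """\
<item>TABLE</table>
<color>black</color>
<color>blue</color>
<color>red</color>
<item>CHAIR</table>
<color>yellow</color>
<color>black</color>
<color>red</color>
""".splitlines()

rows = rows_of_four(sample)
```

Because the tags are simply discarded, this does not care whether the closing tag says </item> or </table>, but it breaks as soon as an item has more or fewer than three colors.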
> file 2 (tab separated):
> TABLE black blue red
> CHAIR yellow black red
> TABLE white gray pink
The best way to produce this output is using the csv module:
http://docs.python.org/lib/module-csv.html
So we need a list of rows, each row being a list of column values. A simple
way of building such a structure from the input file would be:
rows = []
row = None
for line in open('file1.txt'):
    line = line.strip()  # remove leading and trailing whitespace
    if line.startswith('<item>'):
        if row: rows.append(row)  # store the previous item, if any
        j = line.index("</")
        item = line[6:j]  # text between '<item>' and the closing tag
        row = [item]
    elif line.startswith('<color>'):
        j = line.index("</")
        color = line[7:j]  # text between '<color>' and the closing tag
        row.append(color)
    else:
        raise ValueError("can't understand line: %r" % line)
if row: rows.append(row)  # store the last item
This allows for a variable number of "color" lines per item. Once the
`rows` list is built, we only have to create a csv writer for the right
dialect ('excel-tab' looks promising) and feed the rows to it:
import csv
# Python 3: text mode with newline=''; on Python 2 use open('file2.txt', 'wb')
fout = open('file2.txt', 'w', newline='')
writer = csv.writer(fout, dialect='excel-tab')
writer.writerows(rows)
fout.close()
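To check what the writer produces without touching the disk, the same step can be sketched against an in-memory buffer. This assumes Python 3; `rows_to_tsv` and the sample rows are hypothetical names and data used only for illustration:

```python
import csv
import io

def rows_to_tsv(rows):
    """Serialize a list of rows to tab-separated text using the csv module."""
    buffer = io.StringIO(newline="")  # newline="" leaves the csv line endings untouched
    writer = csv.writer(buffer, dialect="excel-tab")
    writer.writerows(rows)
    return buffer.getvalue()

# Rows may have different lengths, e.g. a variable number of colors per item.
sample_rows = [
    ["TABLE", "black", "blue", "red"],
    ["CHAIR", "yellow"],
]
text = rows_to_tsv(sample_rows)

# Reading the text back with the same dialect recovers the original rows.
round_trip = list(csv.reader(io.StringIO(text, newline=""), dialect="excel-tab"))
```

Note that the 'excel-tab' dialect ends each row with "\r\n", which is why the output file should be opened in a mode that does not translate line endings.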
That's all folks!
--
Gabriel Genellina