data parsing

Sat Feb 24 16:52:03 EST 2001

import re
s="""
Joe|25|30|49|40|
|28|39|71||
|30|29|||
Malcolm|43|60|56||
|28|37|||
Amy||70|45||
|40|30||40
|40||30||
"""
# Drop the newlines so all records run together in one string.
s = s.replace("\n", "")
# Assumes each record starts with a name and otherwise holds only digits and "|".
# \w+ matches the name; [^a-zA-Z]+ then takes every following character
# up to (but not including) the next letter, i.e. the start of the next name.
# This should return your records as I understand them.
res1 = re.findall(r"\w+[^a-zA-Z]+", s)
# Split each record on "|" and join the pieces back together with ";".
# List comprehensions only work with Python 2.0 or higher.
res2 = [";".join(i.split("|")) for i in res1]
for i in res2:
    print(i)

######### output
# Not the same as you requested because I don't understand your request.
Joe;25;30;49;40;;28;39;71;;;30;29;;;
Malcolm;43;60;56;;;28;37;;;
Amy;;70;45;;;40;30;;40;40;;30;;
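
If the request means clustering each column across a person's lines, the
sketch below may be closer. It is only a guess at the intent. The names
cluster_records and SAMPLE are made up for this example, it is written for
Python 3, and it assumes a single trailing "|" on a line is a terminator
rather than an extra empty field. It keeps every empty field, so its output
does not exactly match the sample in the question, which drops some empty
fields and keeps others.

```python
SAMPLE = """\
Joe|25|30|49|40|
|28|39|71||
|30|29|||
Malcolm|43|60|56||
|28|37|||
Amy||70|45||
|40|30||40
|40||30||
"""


def cluster_records(text, sep="|", inner=";"):
    """Group continuation lines under the preceding named line and
    join the values of each column with `inner`."""
    groups = []  # list of (name, rows); each row is a list of field strings
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.endswith(sep):
            line = line[:-len(sep)]  # assumption: one trailing sep is a terminator
        fields = line.split(sep)
        name, values = fields[0], fields[1:]
        if name:
            groups.append((name, [values]))  # a filled first field starts a record
        elif groups:
            groups[-1][1].append(values)  # continuation of the current record
    result = []
    for name, rows in groups:
        width = max(len(row) for row in rows)
        # Pad short rows so zip() does not silently drop columns.
        padded = [row + [""] * (width - len(row)) for row in rows]
        columns = [inner.join(col) for col in zip(*padded)]
        result.append(sep.join([name] + columns))
    return result


for record in cluster_records(SAMPLE):
    print(record)
```

With the data from the question this should print:

Joe|25;28;30|30;39;29|49;71;|40;;
Malcolm|43;28|60;37|56;|;
Amy|;40;40|70;30;|45;;30|;40;

Because every position is kept, the n-th value in each field always comes
from the n-th line of the record, which makes decoding the empty fields
later straightforward.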

--Darrell

"Gnanasekaran Thoppae" <gnana at mips.biochem.mpg.de> wrote in message
news:mailman.982950184.14469.python-list at python.org...
> Hi,
>
> I have some data in a file 'test', which contains:
>
> Joe|25|30|49|40|
> |28|39|71||
> |30|29|||
> Malcolm|43|60|56||
> |28|37|||
> Amy||70|45||
> |40|30||40
> |40||30||
>
> These are basically multi-line records: each continuation line
> belongs to the nearest preceding line that starts with a filled field.
>
> This part belongs to 'Joe':
>
> Joe|25|30|49|40|
> |28|39|71||
> |30|29|||
>
> This to 'Malcolm'
> Malcolm|43|60|56||
> |28|37|||
>
> and the rest to Amy.
>
> I want to parse this data and format it in this way:
>
> Joe|25;28;30|30;39;29|49;71|40|
> Malcolm|43;28|60;37|56||
> Amy|40;40|70;30|45;;30|;40;|
>
> Basically speaking, I am trying to cluster multi-record
> data into one field, with the values separated by the delimiter ';'.
> If a field is empty, an empty ';' slot will make it possible later on
> to decode it as the empty field ''.
>
> Thanks.
>
> -gnana