Formatting Results so that They Can be Nicely Imported into a Spreadsheet.

Sun Aug 5 14:33:35 EDT 2007

Mensanator at aol.com wrote:
> In a message dated 8/4/2007 11:50:05 PM Central Daylight Time,
> ricaraoz at gmail.com writes:
> 
>     mensanator at aol.com wrote:
>     > On Aug 4, 6:35 pm, SMERSH009 <SMERSH0... at gmail.com> wrote:
>     >> Hi All.
>     >> Let's say I have some badly formatted text called doc:
>     >>
>     >> doc = """
>     >> friendid
>     >> Female
>     >>
>     >>                             23 years old
>     >>
>     >>                             Los Gatos
>     >>
>     >>                             United States
>     >> friendid
>     >> Male
>     >>
>     >>                             24 years old
>     >>
>     >>                             San Francisco, California
>     >>
>     >>                             United States
>     >> """
>     >>
>     >> How would I get these results to be displayed in a format similar to:
>     >> friendid;Female;23 years old;Los Gatos;United States
>     >> friendid;Male;24 years old;San Francisco, California;United States
>     >>
>     >> The latter is a lot easier to organize and can be quickly imported
>     >> into Excel's column format.
>     >>
>     >> Thanks Much,
>     >> Sam
>     >
>     > d = doc.split('\n')
>     >
>     > f = [i.split() for i in d if i]
>     >
>     > g = [' '.join(i) for i in f]
>     >
>     > rec = []
>     > temprec = []
>     > for i in g:
>     >     if i:
>     >         if i == 'friendid':
>     >             rec.append(temprec)
>     >             temprec = [i]
>     >         else:
>     >             temprec.append(i)
>     > rec.append(temprec)
>     >
>     > output = [';'.join(i) for i in rec if i]
>     >
>     > for i in output: print(i)
>     >
>     > ##    friendid;Female;23 years old;Los Gatos;United States
>     > ##    friendid;Male;24 years old;San Francisco, California;United States
>     >
> 
>     Also :
> 
>     docList = [ i.strip() for i in doc.split('\n') if i.strip()]
> 
>     lines = [i for i in range(len(docList)) if docList[i] == 'friendid'] + [len(docList)]
> 
>     docOut = ''
>     for k in [docList[lines[j]:lines[j+1]] for j in range(len(lines)-1)]:
>         docOut += '\n' + ';'.join(k)
> 
>     docOut = docOut[1:]    # Get rid of initial '\n'
> 
> Aren't you making an unwarranted assumption here?
> That doc ALWAYS starts with EXACTLY one blank line?

I guess you are referring to:
docList = [ i.strip() for i in doc.split('\n') if i.strip()]

Actually, no. "doc.split('\n')" splits it into one list member per line,
whether there are blank lines or not. Then "i.strip()" removes the
redundant whitespace, and "if i.strip()" drops the blank lines. Try it
in your shell and play with "doc" and you'll see (I tried it without
the first blank line and with a lot of blank lines, and it's OK).

>  
> That's why I didn't use a list comprehension in that
> one section, to cover the possibility of any number
> (including none) of blank lines.

It works. I didn't test it thoroughly, but I did test the cases you
suggest.
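To repeat that check in one go, here is a Python 3 port of the index-slicing approach using the thread's sample data. The name to_records is hypothetical, and enumerate/zip stand in for the xrange indexing; the loop compares the output for zero, one and three leading blank lines:

```python
# Python 3 port of the index-slicing approach; to_records is a
# hypothetical name used only for this sketch.
def to_records(doc, marker='friendid'):
    lines = [s.strip() for s in doc.split('\n') if s.strip()]
    # Index of every marker line, plus an end sentinel so the last
    # record is sliced up to the end of the list.
    starts = [i for i, s in enumerate(lines) if s == marker] + [len(lines)]
    # Pair each start with the next one and join each slice with ';'.
    # Lines before the first marker fall outside every slice.
    return [';'.join(lines[a:b]) for a, b in zip(starts, starts[1:])]


body = """friendid
Female

                            23 years old

                            Los Gatos

                            United States
friendid
Male

                            24 years old

                            San Francisco, California

                            United States
"""

expected = [
    'friendid;Female;23 years old;Los Gatos;United States',
    'friendid;Male;24 years old;San Francisco, California;United States',
]

# Same data with no, one, and several leading blank lines.
for doc in (body, '\n' + body, '\n\n\n' + body):
    print(to_records(doc) == expected)  # True for each variant
```

As in the original, any lines that appear before the first 'friendid' are not placed in any record.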

>  
> By blindly coding [1:] you run the risk of data loss,
> and that's a poor example for the OP.

It is not "blindly". I guarantee the first character will ALWAYS be a
'\n', because I'm putting it there with "docOut += '\n' + ';'.join(k)"
(note the '\n' added at the beginning of each pass through the loop), so
I need to strip that leading '\n', which is not wanted.
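For comparison, a small Python 3 sketch (the two records are hard-coded here as an assumption) showing that '\n'.join() places the newline only between records, so it produces the same string as the prepend-then-slice loop without needing [1:]:

```python
# The two records from the thread, already split into fields.
records = [
    ['friendid', 'Female', '23 years old', 'Los Gatos', 'United States'],
    ['friendid', 'Male', '24 years old', 'San Francisco, California', 'United States'],
]

# Prepend-then-slice: add '\n' before every record, then drop the first character.
doc_out = ''
for rec in records:
    doc_out += '\n' + ';'.join(rec)
doc_out = doc_out[1:]

# str.join inserts the separator only between items, so nothing needs stripping.
joined = '\n'.join(';'.join(rec) for rec in records)

print(joined == doc_out)  # True
```

Both forms also give an empty string when there are no records, since ''[1:] is ''.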

Cheers

Ricardo