Formatting Results so that They Can be Nicely Imported into a Spreadsheet.

SMERSH009 SMERSH009X at gmail.com
Sun Aug 5 05:06:54 EDT 2007


On Aug 4, 8:25 pm, "mensana... at aol.com" <mensana... at aol.com> wrote:
> On Aug 4, 9:21?pm, "Jim Langston" <tazmas... at rocketmail.com> wrote:
>
>
>
> > <mensana... at aol.com> wrote in message
>
> >news:1186278638.931477.39760 at z24g2000prh.googlegroups.com...
>
> > > On Aug 4, 6:35?pm, SMERSH009 <SMERSH0... at gmail.com> wrote:
> > >> Hi All.
> > >> Let's say I have some badly formatted text called doc:
>
> > >> doc=
> > >> """
> > >> friendid
> > >> Female
>
> > >>                             23 years old
>
> > >>                             Los Gatos
>
> > >>                             United States
> > >> friendid
> > >> Male
>
> > >>                             24 years old
>
> > >>                             San Francisco, California
>
> > >>                             United States
> > >> """
>
> > >> How would I get these results to be displayed in a format similar to:
> > >> friendid;Female;23 years old;Los Gatos;United States
> > >> friendid;Male; 24 years old;San Francisco, California;United States
>
> > >> The latter is a lot easier to organize and can be quickly imported
> > >> into Excel's column format.
>
> > >> Thanks Much,
> > >> Sam
>
> > > d = doc.split('\n')
>
> > > f = [i.split() for i in d if i]
>
> > > g = [' '.join(i) for i in f]
>
> > > rec = []
> > > temprec = []
> > > for i in g:
> > >    if i:
> > >        if i == 'friendid':
> > >            rec.append(temprec)
> > >            temprec = [i]
> > >        else:
> > >            temprec.append(i)
> > > rec.append(temprec)
>
> > > output = [';'.join(i) for i in rec if i]
>
> > > for i in output: print i
>
> > > ##    friendid;Female;23 years old;Los Gatos;United States
> > > ##    friendid;Male;24 years old;San Francisco, California;United States
>
> > also, I would suggest you use CSV format.
>
> Well, the OP asked for a specific format. One is not
> always at liberty to change it.
>
> > CSV stands for "Comma Seperated
> > Variable" and Excel can load such a sheet directly.
>
> And Excel can load the shown format directly also,
> just specify the delimiter.
>
>
>
> > Instead of seperating using ; seperate using ,  Of course, this provides a
> > problem when there is a , in a string.
>
> Which explains the popularity of using tabs as delimiters.
> The data deliverable specification I use at work
> uses the pipe character | which never appears as data
> in this particular application.
>
> > Resolution is to quote the string.
>
> Which makes the file bigger and isn't necessary
> when tabs and pipes are used as delimiters.
>
> > Being such, you can just go ahead and quote all strings.  So you would want
> > the output to be:
>
> > "friendid","Female","23 years old","Los Gatos","United States"
> > "friendid","Male","24 years old","San Francisco, California","United States"
>
> Which I would do if I had a specification that
> demanded it or was making files for others. For my
> own use, I wouldn't bother as it's unnecessary work.
>
>
>
> > Numbers should not be quoted if you wish to treat them as numeric and not
> > text.
>
> A good reason not to use quotes at all. Besides which,
> Excel can handle that also.

Thanks for all your posts guys.
mensanator's was the most helpful, and I only ended up needing to use
a few lines from that code.
The only question that remains for me--and this is just for my
knowledge-- what does the "if i" mean in this code snippet?
f = [i.split() for i in d if i]
How is it helpful to leave a dangling "if i"? Why not just f =
[i.split() for i in d]?

And yes John, this was indeed a "homework question." It was for my
daughter's preschool. You are going to help her ace her beginner
Python class! (No, this was not a homework question).




More information about the Python-list mailing list