[Tutor] splits and pops
bob gailer
bgailer at gmail.com
Sat Jul 12 15:44:06 CEST 2008
Eric Abrahamsen wrote:
> I have a horribly stupid text parsing problem that is driving me
> crazy, and making me think my Python skills have a long, long way to
> go...
>
> What I've got is a poorly-thought-out SQL dump, in the form of a text
> file, where each record is separated by a newline, and each field in
> each record is separated by a tab. BUT, and this is what sinks me,
> there are also newlines within some of the fields. Newlines are not
> 'safe' – they could appear anywhere – but tabs are 'safe' – they only
> appear as field delimiters.
>
> There are nine fields per record. All I can think to do is read the
> file in as a string, then split on tabs. That gives me a list where
> every eighth item is a string like this: u'last-field\nfirst-field'.
> Now I want to iterate through the list of strings, taking every eighth
> item, splitting it on '\n', and replacing it with the two resulting
> strings. Then I'll have the proper flat list where every nine list
> items constitute one complete record, and I'm good to go from there.
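To make the layout concrete, here is a small sketch with made-up data. The names `records`, `fullstring` and `splitlist` and the field values are illustrative only. It builds a three-record dump in the format described and shows where the "last-field, newline, first-field" items land after splitting on tabs:

```python
# Hypothetical sample dump: three nine-field records. Fields are joined
# with tabs and records with newlines; one field deliberately contains
# an embedded newline, as the real data can.
records = [["%s%d" % (c, i) for i in range(1, 10)] for c in "abc"]
records[1][3] = "b4\nstill b4"

fullstring = "\n".join("\t".join(r) for r in records)
splitlist = fullstring.split("\t")

# Two record boundaries were swallowed, so 3 * 9 - 2 items remain.
print(len(splitlist))        # 25
# The items straddling a record boundary sit at indices 8 and 16.
print(repr(splitlist[8]))    # 'a9\nb1'
print(repr(splitlist[16]))   # 'b9\nc1'
# A field with its own newline looks just like a boundary item.
print(repr(splitlist[11]))   # 'b4\nstill b4'
```

Index 11 also contains a newline, so splitting every item on '\n' would be wrong; only the boundary positions should be split.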
>
> I've been fooling around with variations on the following (assuming
> splitlist = fullstring.split('\t')):
>
> for x in xrange(8, sys.maxint, 8):
>     try:
>         splitlist[x:x] = splitlist.pop(x).split('\n')
>     except IndexError:
>         break
>
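A quick way to see what goes wrong is to run the loop on more than two records. The sketch below is a Python 3 port of the loop above (`xrange` becomes `range`, `sys.maxint` becomes `sys.maxsize`); the wrapper name `original_loop` and the sample data are hypothetical.

```python
import sys


def original_loop(splitlist):
    # Python 3 port of the quoted loop; the step of 8 is unchanged.
    for x in range(8, sys.maxsize, 8):
        try:
            # Replace the item at x with the pieces of its split.
            splitlist[x:x] = splitlist.pop(x).split("\n")
        except IndexError:
            # pop() ran past the end of the list: stop.
            break
    return splitlist


# Hypothetical three-record dump with no embedded newlines.
dump = "\n".join("\t".join("%s%d" % (c, i) for i in range(1, 10)) for c in "abc")
result = original_loop(dump.split("\t"))

print(len(result))          # 26, but three records need 27 items
print("b9\nc1" in result)   # True: the second boundary was never split
```

The first splice replaces one item with two, so everything after it moves one place to the right. The second boundary item is then at index 17, but the loop looks at 16 (`b8`) and later 24 (`c8`), "splitting" strings that contain no newline and leaving the real boundary untouched.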
> The first line correctly steps over all the list items that need to be
> split, but I can't come up with a line that correctly replaces those
> list items with the two strings I want. Either the cycle goes off and
> splits the wrong strings, or I get nested list items, which is not
> what I want. Can someone please point me in the right direction here?
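One way to keep the in-place approach, sketched under the assumption that the record-separating newline is the first newline in each boundary item (that is, the last field of a record never contains a newline itself) and that the dump has no trailing newline: after k splices the list has grown by k, so the k-th boundary (counting from 0) sits at 8 + 9k, and the loop should advance by nine rather than eight. The function name `split_boundaries` is hypothetical.

```python
def split_boundaries(splitlist):
    """Split each boundary item of a tab-split dump in place.

    A boundary item is 'last-field' + newline + 'first-field'.
    Assumes nine fields per record, no trailing newline at the end of
    the dump, and that a record's last field has no newline of its own.
    """
    x = 8
    # The final item is the last record's ninth field; there is no
    # record after it, so stop before reaching it.
    while x < len(splitlist) - 1:
        # Split on the first newline only: one item becomes two,
        # and the list grows by one.
        splitlist[x:x + 1] = splitlist[x].split("\n", 1)
        # 8 fields to the next boundary, plus 1 for the inserted item.
        x += 9
    return splitlist


dump = "\n".join("\t".join("%s%d" % (c, i) for i in range(1, 10)) for c in "abc")
flat = split_boundaries(dump.split("\t"))
records = [flat[i:i + 9] for i in range(0, len(flat), 9)]
print(len(records))   # 3
print(records[1])     # ['b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b7', 'b8', 'b9']
```

If I am reading the original loop correctly, changing its step from 8 to 9 (`xrange(8, sys.maxint, 9)`) has the same effect, since the final iteration merely pops and reinserts the last field. Where a boundary item holds more than one newline, tabs alone cannot tell which newline ends the record, so that case needs some other rule.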
I tried a simple case with fullstring =
"11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
Your spec is a little vague: "each field in each record is separated by a
tab". I assumed that to mean "fields in each record are separated by tabs".
The result was ['11', '12', '13', '\n14', '15', '16', '17', '18', '19',
'21', '22', '23', '24', '25', '26', '27', '28', '29'],
which is what I expected.
Give us an example of text for which it does not work.
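Bob's test string holds only two records, so there is a single boundary to split and shifting indices never come into play. As a cross-check, here is a sketch of a different technique that builds a new list instead of splicing the old one, then groups it into nine-field records. The function `parse_dump` is hypothetical; it assumes the dump has no trailing newline and that the record separator is the first newline in each boundary item.

```python
def parse_dump(fullstring, nfields=9):
    """Return a list of records, each a list of nfields strings."""
    items = fullstring.split("\t")
    last = len(items) - 1
    flat = []
    for i, item in enumerate(items):
        # Boundary items sit every (nfields - 1) positions, starting at
        # nfields - 1; the final item has no record after it.
        if i and i % (nfields - 1) == 0 and i != last:
            flat.extend(item.split("\n", 1))
        else:
            flat.append(item)
    if len(flat) % nfields:
        raise ValueError("expected a multiple of %d fields, got %d"
                         % (nfields, len(flat)))
    # Chop the flat list into consecutive nfields-long records.
    return [flat[j:j + nfields] for j in range(0, len(flat), nfields)]


bob = "11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
for record in parse_dump(bob):
    print(record)
```

Because it never modifies the list it is iterating over, there are no shifting indices to track, and the field-count check at the end gives a cheap sanity check on the input.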
--
Bob Gailer
919-636-4239 Chapel Hill, NC