[Tutor] splits and pops

Eric Abrahamsen eric at ericabrahamsen.net
Sat Jul 12 14:55:02 CEST 2008


I have a horribly stupid text parsing problem that is driving me  
crazy, and making me think my Python skills have a long, long way to  
go...

What I've got is a poorly-thought-out SQL dump, in the form of a text  
file, where each record is separated by a newline, and each field in  
each record is separated by a tab. BUT, and this is what sinks me,  
there are also newlines within some of the fields. Newlines are not  
'safe' – they could appear anywhere – but tabs are 'safe' – they only  
appear as field delimiters.

There are nine fields per record. All I can think to do is read the  
file in as a string, then split on tabs. That gives me a list where  
every eighth item is a string like this: u'last-field\nfirst-field'.  
Now I want to iterate through the list of strings, taking every eighth  
item, splitting it on '\n', and replacing it with the two resulting  
strings. Then I'll have the proper flat list where every nine list  
items constitute one complete record, and I'm good to go from there.
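
A minimal sketch of that plan, building a new list rather than editing the old one in place. The names split_records and fields_per_record are made up for illustration. It assumes the first field of a record (an id, say) never contains a newline, so the last newline in a merged item is taken as the record separator, and that the file ends with at most one trailing newline. Neither assumption is confirmed by the dump itself.

```python
def split_records(fullstring, fields_per_record=9):
    """Return a list of records, each a list of fields_per_record strings."""
    # Assumption: a single trailing newline only terminates the last record.
    if fullstring.endswith('\n'):
        fullstring = fullstring[:-1]
    pieces = fullstring.split('\t')
    last = len(pieces) - 1
    flat = []
    for i, piece in enumerate(pieces):
        # In the tab-split list, items 8, 16, 24, ... hold the last field of
        # one record and the first field of the next. The very last item is
        # only the final record's last field, so it is left alone.
        if 0 < i < last and i % (fields_per_record - 1) == 0:
            # Assumption: the first field has no newline, so split on the
            # last newline only.
            flat.extend(piece.rsplit('\n', 1))
        else:
            flat.append(piece)
    if len(flat) % fields_per_record:
        raise ValueError('field count is not a multiple of %d' % fields_per_record)
    # Group the flat list into chunks of nine.
    return [flat[i:i + fields_per_record]
            for i in range(0, len(flat), fields_per_record)]
```

Called as split_records(fullstring), this would give one nine-item list per record, with newlines inside the other fields left untouched.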

I've been fooling around with variations on the following (assuming  
splitlist = fullstring.split('\t')):

for x in xrange(8, sys.maxint, 8):
     try:
         splitlist[x:x] = splitlist.pop(x).split('\n')
     except IndexError:
         break

The first line correctly steps over all the list items that need to be  
split, but I can't come up with a line that correctly replaces those  
list items with the two strings I want. Either the cycle goes off and  
splits the wrong strings, or I get nested list items, which is not  
what I want. Can someone please point me in the right direction here?
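
One likely cause of the drift, sketched under the same assumptions as the description above: every replacement makes the list one item longer, so after the first fix at index 8 the next merged item sits at index 17, then 26, and so on. The step between fixes is therefore 9, not 8. The function name split_merged_in_place is hypothetical, and splitting on the last newline only assumes the first field of a record never contains one.

```python
def split_merged_in_place(splitlist, fields_per_record=9):
    """Replace each merged 'last-field\\nfirst-field' item with two items."""
    x = fields_per_record - 1  # the first merged item is at index 8
    # Stop before the final item: it is only the last record's last field.
    while x < len(splitlist) - 1:
        parts = splitlist[x].rsplit('\n', 1)
        if len(parts) != 2:
            raise ValueError('no record separator in item %d' % x)
        # Assigning to a one-item slice swaps that item for both strings,
        # which avoids nesting a list inside the list.
        splitlist[x:x + 1] = parts
        # The list just grew by one, so the next merged item is 9 further on.
        x += fields_per_record
    return splitlist
```

Checking against the list length also does away with the sys.maxint bound and the IndexError handler.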

Thanks,
Eric

