[Tutor] splits and pops

bob gailer bgailer at gmail.com
Sat Jul 12 15:44:06 CEST 2008


Eric Abrahamsen wrote:
> I have a horribly stupid text parsing problem that is driving me 
> crazy, and making me think my Python skills have a long, long way to 
> go...
>
> What I've got is a poorly-thought-out SQL dump, in the form of a text 
> file, where each record is separated by a newline, and each field in 
> each record is separated by a tab. BUT, and this is what sinks me, 
> there are also newlines within some of the fields. Newlines are not 
> 'safe' – they could appear anywhere – but tabs are 'safe' – they only 
> appear as field delimiters.
>
> There are nine fields per record. All I can think to do is read the 
> file in as a string, then split on tabs. That gives me a list where 
> every eighth item is a string like this: u'last-field\nfirst-field'. 
> Now I want to iterate through the list of strings, taking every eighth 
> item, splitting it on '\n', and replacing it with the two resulting 
> strings. Then I'll have the proper flat list where every nine list 
> items constitutes one complete record, and I'm good to go from there.
>
> I've been fooling around with variations on the following (assuming 
> splitlist = fullstring.split('\t')):
>
> for x in xrange(8, sys.maxint, 8):
>     try:
>         splitlist[x:x] = splitlist.pop(x).split('\n')
>     except IndexError:
>         break
>
> The first line correctly steps over all the list items that need to be 
> split, but I can't come up with a line that correctly replaces those 
> list items with the two strings I want. Either the cycle goes off and 
> splits the wrong strings, or I get nested list items, which is not 
> what I want. Can someone please point me in the right direction here? 
I tried a simple case with fullstring = 
"11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
Your spec is a little vague: "each field in each record is separated by a 
tab". I assumed that to mean "fields in each record are separated by tabs".
The result was ['11', '12', '13', '\n14', '15', '16', '17', '18', '19', 
'21', '22', '23', '24', '25', '26', '27', '28', '29']
which is what I expected.
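
For anyone who wants to repeat that test, here is a minimal sketch of it. It assumes Python 3, so range and sys.maxsize stand in for the xrange and sys.maxint used in the quoted loop; everything else is taken from the message above.

```python
import sys

# Test string from the reply: two records of nine tab-separated fields,
# with one newline embedded in the fourth field of the first record.
fullstring = "11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
splitlist = fullstring.split('\t')

# Same loop as in the quoted message, ported to Python 3 names.
for x in range(8, sys.maxsize, 8):
    try:
        # The right-hand side runs first: pop the item, split it on
        # newlines, then insert the pieces back at the same position.
        splitlist[x:x] = splitlist.pop(x).split('\n')
    except IndexError:
        break

print(splitlist)
```

With only two records there is a single record boundary (index 8), so this input cannot show whether later boundaries are handled correctly.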

Give us an example of text for which it does not work.
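
One input that seems likely to misbehave is any dump with three or more records. Each split makes the list one item longer, so the next joined item sits nine positions further on, not eight. The sketch below illustrates this under some assumptions: it is Python 3, the helper name flatten_fields and its step parameter are made up for illustration, and each boundary item is assumed to contain exactly one newline (that is, no newlines inside the last field of one record or the first field of the next).

```python
def flatten_fields(fullstring, step):
    """Split on tabs, then split the item at index 8 and at every
    `step` positions after it on newlines, in place.
    (Hypothetical helper, not code from the original thread.)"""
    splitlist = fullstring.split('\t')
    x = 8
    while x < len(splitlist):
        # pop() and split() run before the slice assignment inserts
        # the resulting pieces back at position x.
        splitlist[x:x] = splitlist.pop(x).split('\n')
        x += step
    return splitlist


# Three made-up records of nine fields each: a1..a9, b1..b9, c1..c9.
records = [[f"{r}{i}" for i in range(1, 10)] for r in "abc"]
fullstring = "\n".join("\t".join(rec) for rec in records)

drifted = flatten_fields(fullstring, 8)  # step used in the quoted loop
flat = flatten_fields(fullstring, 9)     # step that allows for the growth

print('b9\nc1' in drifted)  # the second boundary is never split
print(len(flat))            # 27 items, nine per record
```

With a step of 8 the second boundary item, 'b9\nc1', is stepped over and left unsplit; with a step of 9 the list comes out flat.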


-- 
Bob Gailer
919-636-4239 Chapel Hill, NC


