[Tutor] splits and pops

Sat Jul 12 18:19:57 CEST 2008

Hi Eric,

I tried the following, and it seems to work:

fullstring = """l1r1	ll1r2	l1r3	l1
r4	l1r5
l2r1	l2r3	l3
r3	l2r4	l2r5
l3r1	l3r2	l3r3	l3r4	l3r5
"""
# This should be a string like yours: "\t"-separated columns,
# "\n"-separated rows, with "\n" inside some columns.

rowlength = 5
# for you it would be 9, but I was lazy when I wrote the string

prefetch = ""
lines = []
i = 0
for tab in fullstring.split("\t"):
	if i < rowlength-1:	#i.e. working on all but the last column
		# offtopic: is the last comment correct English?
		prefetch += tab + "\t" # +"\t" because split removes the tab
		i += 1
	else: # last column
		prefetch += tab[:tab.find("\n")]
		lines.append(prefetch)
		# start the next row with its first column: skip the "\n" (+1) and
		# re-add the tab that split removed
		prefetch = tab[(tab.find("\n")+1):] + "\t"
		i = 1 #since we already added the first column

# End

After that, "print lines" produces the following output:
['l1r1\tll1r2\tl1r3\tl1\nr4\tl1r5', 'l2r1\tl2r3\tl3\nr3\tl2r4\tl2r5',
'l3r1\tl3r2\tl3r3\tl3r4\tl3r5']
So you've got a list of the lines. Instead of strings you could also use
lists: make prefetch a list and append each column to it instead of
concatenating the column and a tab.
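
The list variant could look roughly like this. It is only a sketch: the
function name rows_as_lists is made up for illustration, and it uses
str.partition (Python 2.5 and later) instead of find, so a final row without
a trailing "\n" is kept whole.

```python
def rows_as_lists(fullstring, rowlength):
    """Return a list of rows, each row being a list of column strings."""
    rows = []
    row = []
    for item in fullstring.split("\t"):
        if len(row) < rowlength - 1:
            # Any column but the last one: keep it as it is.
            row.append(item)
        else:
            # Last column of this row plus first column of the next one,
            # divided at the first "\n" (same assumption as the loop above).
            last, _, first = item.partition("\n")
            row.append(last)
            rows.append(row)
            row = [first]
    return rows
```

Called with the test string and rowlength 5, this should give three lists of
five columns each, with the "\n" inside a column left untouched.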

However, I assumed that the first linebreak after the last tab of a row is
the one that separates it from the next row. If that's not the case, I think
you have to check for multiple linebreaks and, if there are any, choose
manually which one to split on.
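
To find the places that need such a manual decision, something like the
following might do. Again just a sketch under my own assumptions: the name
ambiguous_boundaries is invented, and it reports every boundary item that
does not contain exactly one "\n" (so a missing linebreak is flagged as
well as an extra one).

```python
def ambiguous_boundaries(fullstring, rowlength):
    """Return (row_number, item) pairs whose row boundary is not clear-cut."""
    items = fullstring.split("\t")
    step = rowlength - 1
    ambiguous = []
    # In the tab-split list every step-th item holds "last\nfirst".
    # The very last item ends the file, so it is left out of the range.
    for row_number, index in enumerate(range(step, len(items) - 1, step)):
        item = items[index]
        if item.count("\n") != 1:
            ambiguous.append((row_number, item))
    return ambiguous
```

For the test string above this returns an empty list, since every boundary
item contains exactly one linebreak.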

Hope this helps,

Marcel

> I have a horribly stupid text parsing problem that is driving me crazy,  
> and making me think my Python skills have a long, long way to go...
>
> What I've got is a poorly-thought-out SQL dump, in the form of a text  
> file, where each record is separated by a newline, and each field in  
> each record is separated by a tab. BUT, and this is what sinks me, there  
> are also newlines within some of the fields. Newlines are not 'safe' –  
> they could appear anywhere – but tabs are 'safe' – they only appear as  
> field delimiters.
>
> There are nine fields per record. All I can think to do is read the file  
> in as a string, then split on tabs. That gives me a list where every  
> eighth item is a string like this: u'last-field\nfirst-field'. Now I  
> want to iterate through the list of strings, taking every eighth item,  
> splitting it on '\n', and replacing it with the two resulting strings.  
> Then I'll have the proper flat list where every nine list items  
> constitutes one complete record, and I'm good to go from there.
>
> I've been fooling around with variations on the following (assuming  
> splitlist = fullstring.split('\t')):
>
> for x in xrange(8, sys.maxint, 8):
>      try:
>          splitlist[x:x] = splitlist.pop(x).split('\n')
>      except IndexError:
>          break
>
> The first line correctly steps over all the list items that need to be  
> split, but I can't come up with a line that correctly replaces those  
> list items with the two strings I want. Either the cycle goes off and  
> splits the wrong strings, or I get nested list items, which is not what  
> I want. Can someone please point me in the right direction here?
>
> Thanks,
> Eric
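
P.S. About your own loop: as far as I can see, the cycle goes off because
each split makes the list one item longer, so the next combined item is
nfields positions further on, not nfields - 1 (9 instead of 8 for you). A
sketch of the in-place version, assuming the first linebreak in a combined
item is the row separator; split_records and nfields are names I made up:

```python
def split_records(fullstring, nfields):
    """Split tab-delimited text into records of nfields fields each."""
    # Drop one trailing newline so the last record gets no empty extra field.
    if fullstring.endswith("\n"):
        fullstring = fullstring[:-1]
    items = fullstring.split("\t")
    x = nfields - 1  # index of the first "last-field\nfirst-field" item
    while x < len(items) - 1:
        # Slice assignment replaces the one combined item with its two
        # halves, so no nested lists appear.
        items[x:x + 1] = items[x].split("\n", 1)
        x += nfields  # the list just grew by one item
    # Group the flat list into records of nfields items.
    return [items[i:i + nfields] for i in range(0, len(items), nfields)]
```

If a combined item has no linebreak at all, the records after it will be
misaligned, so it is probably worth checking that every record really ends
up with nfields items.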