[Tutor] splits and pops
deaddy at gmx.de
Sat Jul 12 18:19:57 CEST 2008
I tried the following and it seems to work:
fullstring = """l1r1 ll1r2 l1r3 l1
l2r1 l2r3 l3
r3 l2r4 l2r5
l3r1 l3r2 l3r3 l3r4 l3r5
"""
# This should be a string like yours: "\t"-separated columns,
# "\n"-separated rows, with "\n" in some columns.
rowlength = 5
# for you it would be 9, but I was lazy when I wrote the string
prefetch = ""
lines = []
i = 0
for tab in fullstring.split("\t"):
    if i < rowlength-1: # i.e. working on all but the last column
        # offtopic: is the last comment correct English?
        prefetch += tab + "\t" # +"\t" because split removes the tab
        i += 1
    else: # last column
        prefetch += tab[:tab.find("\n")]
        lines.append(prefetch) # the row is complete
        # adding the first column without the "\n"
        prefetch = tab[(tab.find("\n")+1):] + "\t"
        i = 1 # since we already added the first column
After that, "print lines" shows one string per row, so you've got a list
of the lines. Instead of strings you could also use lists, by making
prefetch a list and, instead of adding the tabs, appending each column to it.
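The list variant could look roughly like the sketch below. The function name split_rows and the three-column sample string are made up for illustration, and the sketch keeps the assumption that the first linebreak in the chunk after the last tab of a row is the row boundary.

```python
def split_rows(fullstring, rowlength):
    """Return a list of rows, each row being a list of column strings."""
    rows = []
    row = []
    for chunk in fullstring.split("\t"):
        if len(row) < rowlength - 1:
            # Still collecting all but the last column of this row.
            row.append(chunk)
        else:
            # chunk holds "last column" + "\n" + "first column of next row".
            last, sep, first = chunk.partition("\n")
            row.append(last)
            rows.append(row)
            # Start the next row only if a linebreak was actually found.
            row = [first] if sep else []
    return rows


# Hypothetical sample: 3 columns, with a linebreak inside one middle field.
sample = "a1\ta2\ta3\nb1\tb2 line one\nb2 line two\tb3\n"
for row in split_rows(sample, 3):
    print(row)
```

Using partition instead of find avoids cutting off the last character when the final row has no trailing linebreak.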
However, I assumed that a new row is separated by the first linebreak.
If that's not the case, I think that you have to check for multiple
linebreaks and, if there are any, choose manually which one to select.
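A check for multiple linebreaks might be sketched like this. The name find_ambiguous_boundaries is hypothetical, and the sketch assumes rowlength is at least 2 and that every row has exactly rowlength columns, so that the chunks containing a row boundary sit at every (rowlength-1)-th position after splitting on tabs.

```python
def find_ambiguous_boundaries(fullstring, rowlength):
    """Return (row number, chunk) pairs where the row break is unclear."""
    chunks = fullstring.split("\t")
    step = rowlength - 1
    ambiguous = []
    # The very last chunk is skipped: no further row follows it.
    for index in range(step, len(chunks) - 1, step):
        if chunks[index].count("\n") > 1:
            # More than one linebreak: any of them could be the row break.
            ambiguous.append((index // step, chunks[index]))
    return ambiguous


# Hypothetical sample: the last column of row 1 contains its own linebreak.
tricky = "a1\ta2\ta3 top\na3 bottom\nb1\tb2\tb3\n"
print(find_ambiguous_boundaries(tricky, 3))
```

The row number is 1-based and names the row whose last column is in the reported chunk, so you know where to look in the dump.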
Hope this helps,
> I have a horribly stupid text parsing problem that is driving me crazy,
> and making me think my Python skills have a long, long way to go...
> What I've got is a poorly-thought-out SQL dump, in the form of a text
> file, where each record is separated by a newline, and each field in
> each record is separated by a tab. BUT, and this is what sinks me, there
> are also newlines within some of the fields. Newlines are not 'safe' –
> they could appear anywhere – but tabs are 'safe' – they only appear as
> field delimiters.
> There are nine fields per record. All I can think to do is read the file
> in as a string, then split on tabs. That gives me a list where every
> eighth item is a string like this: u'last-field\nfirst-field'. Now I
> want to iterate through the list of strings, taking every eighth item,
> splitting it on '\n', and replacing it with the two resulting strings.
> Then I'll have the proper flat list where every nine list items
> constitutes one complete record, and I'm good to go from there.
> I've been fooling around with variations on the following (assuming
> splitlist = fullstring.split('\t')):
> for x in xrange(8, sys.maxint, 8):
>     try:
>         splitlist[x:x] = splitlist.pop(x).split('\n')
>     except IndexError:
>         break
> The first line correctly steps over all the list items that need to be
> split, but I can't come up with a line that correctly replaces those
> list items with the two strings I want. Either the cycle goes off and
> splits the wrong strings, or I get nested list items, which is not what
> I want. Can someone please point me in the right direction here?
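For what it's worth, the slice-assignment idea can probably be made to work too. Each replacement makes the list one item longer, so the step has to be the number of fields per record (9 for you) rather than 8. Below is a rough sketch; flatten_fields and the sample data are made up, and it again assumes the first linebreak in each boundary item is the record break.

```python
def flatten_fields(fullstring, fields_per_record):
    """Split on tabs, then split each boundary item at its first linebreak."""
    # Drop one trailing linebreak so the last record does not yield "".
    if fullstring.endswith("\n"):
        fullstring = fullstring[:-1]
    splitlist = fullstring.split("\t")
    x = fields_per_record - 1
    while x < len(splitlist):
        # Replace the single item at x by its one or two parts.
        splitlist[x:x + 1] = splitlist[x].split("\n", 1)
        # The list just grew by one, so step by the full record length.
        x += fields_per_record
    return splitlist


# Hypothetical sample: 3 fields per record, linebreak inside a middle field.
flat = flatten_fields("a1\ta2\ta3\nb1\tb2 x\ny\tb3\n", 3)
records = [flat[i:i + 3] for i in range(0, len(flat), 3)]
print(records)
```

Assigning to splitlist[x:x + 1] replaces the item in place, which avoids both the nested lists and the pop.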