[Tutor] splits and pops
Marcel Wunderlich
deaddy at gmx.de
Sat Jul 12 18:19:57 CEST 2008
Hi Eric,
I tried the following and it seems to work:
fullstring = """l1r1\tll1r2\tl1r3\tl1
r4\tl1r5
l2r1\tl2r3\tl3
r3\tl2r4\tl2r5
l3r1\tl3r2\tl3r3\tl3r4\tl3r5
"""
# This should be a string like yours: "\t"-separated columns,
# "\n"-separated rows, with "\n" in some columns.
rowlength = 5
# for you it would be 9, but I was lazy when I wrote the string
prefetch = ""
lines = []
i = 0
for tab in fullstring.split("\t"):
    if i < rowlength - 1:  # i.e. working on all but the last column
        # offtopic: is the last comment correct English?
        prefetch += tab + "\t"  # + "\t" because split removes the tab
        i += 1
    else:  # last column, which also carries the next row's first column
        newline = tab.find("\n")
        prefetch += tab[:newline]
        lines.append(prefetch)
        # start the next row with its first column, skipping the "\n"
        prefetch = tab[newline + 1:] + "\t"
        i = 1  # since we already added the first column
# End
After that, "print lines" produces the following output:
['l1r1\tll1r2\tl1r3\tl1\nr4\tl1r5', 'l2r1\tl2r3\tl3\nr3\tl2r4\tl2r5',
'l3r1\tl3r2\tl3r3\tl3r4\tl3r5']
So you've got a list of the lines. Instead of strings you could also use
lists: make prefetch a list and append each column to it instead of
concatenating the column plus a tab.
However, I assumed that a new row starts at the first linebreak in that
combined last/first field. If that's not the case, I think you have to
check for multiple linebreaks and, where there are several, choose
manually which one to split on.
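One way to find the places that need a manual decision is to look at every
field that straddles two rows and count its linebreaks. A rough sketch (the
function name is invented, and it assumes the string starts at the first
column of a record):

```python
def ambiguous_boundaries(fullstring, rowlength):
    """Return (index, field) pairs for row-straddling fields that contain
    more than one linebreak, so the split point cannot be guessed."""
    fields = fullstring.split("\t")
    step = rowlength - 1  # every row contributes rowlength - 1 tabs
    suspects = []
    for index in range(step, len(fields), step):
        if fields[index].count("\n") > 1:
            suspects.append((index, fields[index]))
    return suspects


sample = "a1\ta2\ta3\nb1\tb2\tb3\nstill b3\nc1\tc2\tc3\n"
suspects = ambiguous_boundaries(sample, 3)
```

Here the field at index 4 is reported, because "b3\nstill b3\nc1" could
end the row after either linebreak.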
Hope this helps,
Marcel
> I have a horribly stupid text parsing problem that is driving me crazy,
> and making me think my Python skills have a long, long way to go...
>
> What I've got is a poorly-thought-out SQL dump, in the form of a text
> file, where each record is separated by a newline, and each field in
> each record is separated by a tab. BUT, and this is what sinks me, there
> are also newlines within some of the fields. Newlines are not 'safe' –
> they could appear anywhere – but tabs are 'safe' – they only appear as
> field delimiters.
>
> There are nine fields per record. All I can think to do is read the file
> in as a string, then split on tabs. That gives me a list where every
> eighth item is a string like this: u'last-field\nfirst-field'. Now I
> want to iterate through the list of strings, taking every eighth item,
> splitting it on '\n', and replacing it with the two resulting strings.
> Then I'll have the proper flat list where every nine list items
> constitutes one complete record, and I'm good to go from there.
>
> I've been fooling around with variations on the following (assuming
> splitlist = fullstring.split('\t')):
>
> for x in xrange(8, sys.maxint, 8):
>     try:
>         splitlist[x:x] = splitlist.pop(x).split('\n')
>     except IndexError:
>         break
>
> The first line correctly steps over all the list items that need to be
> split, but I can't come up with a line that correctly replaces those
> list items with the two strings I want. Either the cycle goes off and
> splits the wrong strings, or I get nested list items, which is not what
> I want. Can someone please point me in the right direction here?
>
> Thanks,
> Eric
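For what it's worth, the pop-and-splice line in your loop does replace one
item with the pieces of the split; what goes wrong is the step. Each
replacement makes the list one item longer, so the next combined item is
nine positions further on, not eight. A sketch of that idea, again assuming
the row ends at the first linebreak of the combined item (the function name
is invented, and a while loop on the current list length stands in for
xrange, sys.maxint and the IndexError check):

```python
def flatten_fields(fullstring, fields_per_record):
    """Return a flat list of fields in which every fields_per_record
    consecutive items form one record."""
    splitlist = fullstring.split("\t")
    x = fields_per_record - 1  # index of the first combined item
    while x < len(splitlist):
        # Split only at the first linebreak: last field, then first field.
        splitlist[x:x + 1] = splitlist[x].split("\n", 1)
        # The list grew by one, so step by a whole record.
        x += fields_per_record
    return splitlist


sample = "a1\ta2\ta3\nb1\tb2\tb3\nc1\tc2\tc3"
flat = flatten_fields(sample, 3)
```

If the dump ends with a linebreak, the last split leaves one empty string
at the end of the list, which you would want to drop.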