text file reformatting
cbrown at cbrownsystems.com
Mon Nov 1 12:50:16 EDT 2010
On Nov 1, 1:58 am, iwawi <iwawi... at gmail.com> wrote:
> On 1 Nov, 09:59, "cbr... at cbrownsystems.com"
> <cbr... at cbrownsystems.com> wrote:
> > On Oct 31, 11:46 pm, iwawi <iwawi... at gmail.com> wrote:
>
> > > On 31 Oct, 21:48, Tim Chase <python.l... at tim.thechases.com> wrote:
>
> > > > > PRJ01001 4 00100END
> > > > > PRJ01002 3 00110END
>
> > > > > I would like to pick only some columns to a new file and put them to a
> > > > > certain places (to match previous data) - definition file (def.csv)
> > > > > could be something like this:
>
> > > > > VARIABLE ; FIELD STARTS ; FIELD SIZE ; NEW PLACE IN NEW DATA FILE
> > > > > ProjID   ; 1            ; 5          ; 1
> > > > > CaseID   ; 6            ; 3          ; 10
> > > > > UselessV ; 10           ; 1          ;
> > > > > Zipcode  ; 12           ; 5          ; 15
>
> > > > > So the new datafile should look like this:
>
> > > > > PRJ01 001 00100END
> > > > > PRJ01 002 00110END
>
> > > > How flexible is the def.csv format? The difficulty I see with
> > > > your def.csv format is that it leaves undefined gaps (presumably
> > > > to be filled in with spaces) and that you also have a blank "new
> > > > place in new file" value. If instead, you could specify the
> > > > width to which you want to pad it and omit variables you don't
> > > > want in the output, ordering the variables in the same order you
> > > > want them in the output:
>
> > > > Variable; Start; Size; Width
> > > > ProjID; 1; 5; 10
> > > > CaseID; 6; 3; 10
> > > > Zipcode; 12; 5; 5
> > > > End; 16; 3; 3
>
> > > > (note that I lazily use the same method to copy the END from the
> > > > source to the destination, rather than coding specially for it)
> > > > you could do something like this (untested)
>
> > > > import csv
> > > > f = file('def.csv', 'rb')
> > > > f.next() # discard the header row
> > > > r = csv.reader(f, delimiter=';')
> > > > fields = [
> > > >     (varname, slice(int(start), int(start)+int(size)), width)
> > > >     for varname, start, size, width
> > > >     in r
> > > >     ]
> > > > f.close()
> > > > out = file('out.txt', 'w')
> > > > try:
> > > >     for row in file('data.txt'):
> > > >         for varname, slc, width in fields:
> > > >             out.write(row[slc].ljust(width))
> > > >         out.write('\n')
> > > > finally:
> > > >     out.close()
>
> > > > Hope that's fairly easy to follow and makes sense. There might
> > > > be some fence-posting errors (particularly your use of "1" as the
> > > > initial offset, while python uses "0" as the initial offset for
> > > > strings)
>
> > > > If you can't modify the def.csv format, then things are a bit
> > > > more complex and I'd almost be tempted to write a script to try
> > > > and convert your existing def.csv format into something simpler
> > > > to process like what I describe.
>
> > > > -tkc
>
> > > Hi,
>
> > > Thanks for your reply.
>
> > > Def.csv could be modified so that every line has the same structure:
> > > variable name, field start, field size and new place and would be
> > > separated with semicolons as you mentioned.
>
> > > I tried your script (which seems quite logical) but I get this
>
> > > Traceback (most recent call last):
> > > File "testing.py", line 16, in <module>
> > > out.write (row[slc].ljust(width))
> > > TypeError: an integer is required
>
> > > Yes - you said it was untested, but I can't figure out how to
> > > proceed...
>
> > The line
>
> > (varname, slice(int(start), int(start)+int(size)), width)
>
> > should instead be
>
> > (varname, slice(int(start), int(start)+int(size)), int(width))
>
> > although you give an example where there is no width - what does that
> > imply? In the above case, it will throw an exception.
>
> > Anyway, I think you'll find there's something a bit off in the output
> > loop with the parameter passed to ljust() as well. The value given in
> > your csv seems to be the absolute position, but as it's implemented by
> > Tim, it acts as the relative position.
>
> > Given Tim's parsing into the list fields, I have a feeling that what
> > you really want instead of
>
> > for varname, slc, width in fields:
> >     out.write(row[slc].ljust(width))
> > out.write('\n')
>
> > is to have
>
> > s = ''
> > for varname, slc, width in fields:
> >     s += " "*(width - len(s)) + row[slc]
> > out.write(s+'\n')
>
> > And if that is what you want, then you will surely want to globally
> > replace the name 'width' with for example 'start_column', because then
> > it all makes sense :).
>
> > Cheers - Chas
>
> Yes, it's meant to be the absolute column position in a new file like
> you said.
>
> I used your changes to the csv-reading because it seems more flexible,
> but the end of the code is still not working. Here's where I stand now:
>
> import re
>
> parse_columns = re.compile(r'\s*;\s*')
>
> f = file('def.csv', 'rb')
> f.readline() # discard the header row
> r = (parse_columns.split(line.strip()) for line in f)
> fields = [
> (varname, slice(int(start), int(start)+int(size), int(width) if width
> else 0))
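There's a misplaced parenthesis: as written, the "int(width) if width
else 0" expression is passed to slice() as its third (step) argument,
so each entry is a 2-tuple, and unpacking it into three names in your
output loop fails. Replace the above line with:
(varname, slice(int(start), int(start)+int(size)), int(width) if width else 0)
which yields a 3-tuple.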
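To see the difference between the two versions in isolation, here is a small sketch. It is written for Python 3 (the code in this thread is Python 2), and the sample values for start, size and width are just made up for illustration:

```python
# Illustrative values only; in the real script these come from a def.csv row.
start, size, width = '6', '3', '10'

# Misplaced parenthesis: the conditional expression ends up inside slice()
# as its third (step) argument, so the tuple has only two items.
wrong = ('CaseID', slice(int(start), int(start) + int(size), int(width) if width else 0))

# Corrected: slice() is closed after the stop value, so the column number
# becomes the third item of the tuple.
right = ('CaseID', slice(int(start), int(start) + int(size)), int(width) if width else 0)

print(len(wrong), wrong[1].step)
print(len(right), right[1].step)

# Unpacking the 2-tuple into three names raises ValueError.
try:
    varname, slc, column = wrong
except ValueError as exc:
    print('unpacking failed:', exc)
```

The wording of the ValueError differs between Python 2 and Python 3, but the cause is the same.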
Cheers - Chas
> for varname, start, size, width in r
> ]
> f.close()
> print fields
>
> out = file('out.txt', 'w')
>
> try:
>     for row in file('data.txt'):
>         s = ' '
>         for varname, slc, width in fields:
>             s += " "*(width - len(s)) + row[slc]
>         out.write(s+'\n')
> finally:
>     out.close()
>
> When executed, I get this:
> File "toimi.py", line 20, in <module>
> for varname, slc, width in fields:
> ValueError: need more than 2 values to unpack
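For what it's worth, here is a minimal sketch of how the whole conversion might look once the pieces discussed in this thread are put together. It is only one possible reading of the requirements, with these assumptions: it targets Python 3 rather than the Python 2 used above; both the field start and the new place in def.csv are 1-based column numbers; rows with an empty new place are left out of the output; and the names parse_definitions and reformat_row are hypothetical helpers, not anything from the earlier posts.

```python
def parse_definitions(lines):
    """Parse 'name ; start ; size ; new_column' rows (header already skipped).

    Assumes start and new_column are 1-based; rows whose new_column is
    empty are dropped from the output.
    """
    fields = []
    for line in lines:
        parts = [part.strip() for part in line.split(';')]
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        name, start, size, new_column = parts
        if not new_column:
            continue  # variable is not wanted in the new file
        begin = int(start) - 1  # convert to Python's 0-based offsets
        fields.append((name, slice(begin, begin + int(size)), int(new_column) - 1))
    return fields


def reformat_row(row, fields):
    """Place each selected field at its absolute column in the output line."""
    row = row.rstrip('\n')
    out = ''
    for _name, slc, column in fields:
        # Pad with spaces up to the target column, then append the field.
        out = out.ljust(column) + row[slc]
    return out


# Definition rows modelled on the def.csv example quoted earlier.
definition_rows = [
    "ProjID   ; 1  ; 5 ; 1",
    "CaseID   ; 6  ; 3 ; 10",
    "UselessV ; 10 ; 1 ;",
    "Zipcode  ; 12 ; 5 ; 15",
]
fields = parse_definitions(definition_rows)
print(repr(reformat_row("PRJ01001 4 00100END", fields)))
```

Note that the sketch starts from an empty string rather than a single space, so the first field lands in column 1. The trailing END marker is not copied unless def.csv gets a row for it.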