Python Equivalent for dd & fold
pdpi
pdpinheiro at gmail.com
Thu Jul 16 12:02:25 EDT 2009
On Jul 16, 3:12 pm, seldan24 <selda... at gmail.com> wrote:
> On Jul 15, 1:48 pm, Emile van Sebille <em... at fenx.com> wrote:
>
> > On 7/15/2009 10:23 AM MRAB said...
>
> > >> On Jul 15, 12:47 pm, Michiel Overtoom <mot... at xs4all.nl> wrote:
> > >>> seldan24 wrote:
> > >>>> what can I use as the equivalent for the Unix 'fold' command?
> > >>> def fold(s,len):
> > >>>     while s:
> > >>>         print s[:len]
> > >>>         s=s[len:]
>
> > <snip>
> > > You might still need to tweak the above code as regards how line endings
> > > are handled.
>
> > You might also want to tweak it if the strings are _really_ long to
> > simply slice out the substrings as opposed to reassigning the balance to
> > a newly created s on each iteration.
>
> > Emile
>
> Thanks for all of the help. I'm almost there. I have it working now,
> but the 'fold' piece is very slow. When I use the 'fold' command in
> shell it is almost instantaneous. I was able to do the EBCDIC->ASCII
> conversion using the decode method in the built-in str type. I didn't
> have to import the codecs module. I just decoded the data to cp037
> which works fine.
>
> So now, I'm left with a large file, consisting of one extremely long
> line of ASCII data that needs to be sliced up into 35 character
> lines. I did the following, which works but takes a very long time:
>
> f = open(ascii_file, 'w')
> while ascii_data:
>     f.write(ascii_data[:len])
>     ascii_data = ascii_data[len:]
> f.close()
>
> I know that Emile suggested that I can slice out the substrings rather
> than do the gradual trimming of the string variable as is being done
> by moving around the length. So, I'm going to give that a try... I'm
> a bit confused by what that means, am guessing that slice can break up
> a string based on characters; will research. Thanks for the help thus
> far. I'll post again when all is working fine.
Assuming your rather large text file is 1 meg long, you have 1 million
characters in there. 1000000/35 = ~29k lines. The size of the remaining
string decreases linearly, so its average size is (1000000 + 0) / 2, or
500k. All said and done, you're allocating and copying a 500K string on
average, not once but 29 thousand times, which comes to roughly 14 GB of
copying in total. That's where your slowdown resides.
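
What Emile suggested could look roughly like the sketch below: step an
index through the string and slice out 35 characters at a time, so that
each slice copies only one line's worth of data and the big string is
never rebuilt. The names fold and write_folded are made up for
illustration, the width of 35 comes from your description, and I'm
assuming you want a newline after each chunk (your loop doesn't write
one). It's written so it should run on Python 3 as well.

```python
def fold(data, width=35):
    """Yield successive chunks of `data`, each at most `width` characters long."""
    # Step an index through the string instead of shrinking the string itself.
    # Each slice copies only `width` characters, so total work is linear in len(data).
    for start in range(0, len(data), width):
        yield data[start:start + width]


def write_folded(data, out, width=35):
    """Write `data` to the file-like object `out`, one chunk per line."""
    # Assumption: every chunk, including the last one, ends with a newline.
    for chunk in fold(data, width):
        out.write(chunk + "\n")
```

With your variables that would be something like write_folded(ascii_data, f)
between the open() and the f.close().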