Python Equivalent for dd & fold

pdpi pdpinheiro at gmail.com
Thu Jul 16 12:02:25 EDT 2009


On Jul 16, 3:12 pm, seldan24 <selda... at gmail.com> wrote:
> On Jul 15, 1:48 pm, Emile van Sebille <em... at fenx.com> wrote:
>
> > On 7/15/2009 10:23 AM MRAB said...
>
> > >> On Jul 15, 12:47 pm, Michiel Overtoom <mot... at xs4all.nl> wrote:
> > >>> seldan24 wrote:
> > >>>> what can I use as the equivalent for the Unix 'fold' command?
> > >>> def fold(s,len):
> > >>>      while s:
> > >>>          print s[:len]
> > >>>          s=s[len:]
>
> > <snip>
> > > You might still need to tweak the above code as regards how line endings
> > > are handled.
>
> > You might also want to tweak it if the strings are _really_ long to
> > simply slice out the substrings as opposed to reassigning the balance to
> > a newly created s on each iteration.
>
> > Emile
>
> Thanks for all of the help.  I'm almost there.  I have it working now,
> but the 'fold' piece is very slow.  When I use the 'fold' command in
> shell it is almost instantaneous.  I was able to do the EBCDIC->ASCII
> conversion using the decode method in the built-in str type.  I didn't
> have to import the codecs module.  I just decoded the data to cp037
> which works fine.
>
> So now, I'm left with a large file, consisting of one extremely long
> line of ASCII data that needs to be sliced up into 35 character
> lines.  I did the following, which works but takes a very long time:
>
> f = open(ascii_file, 'w')
> while ascii_data:
>     f.write(ascii_data[:len])
>     ascii_data = ascii_data[len:]
> f.close()
>
> I know that Emile suggested that I can slice out the substrings rather
> than do the gradual trimming of the string variable as is being done
> by moving around the length.  So, I'm going to give that a try... I'm
> a bit confused by what that means, am guessing that slice can break up
> a string based on characters; will research.  Thanks for the help thus
> far.  I'll post again when all is working fine.
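(A quick aside on the decode step you mention: in Python 2 that presumably
boils down to a one-liner along these lines, where ebcdic_data is just a
placeholder name for the raw bytes read from the input file:

ascii_data = ebcdic_data.decode('cp037').encode('ascii')

decode('cp037') gives you a unicode string, and the trailing encode('ascii')
only works because, as you say, the decoded text happens to be plain ASCII.)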

Assuming your rather large text file is 1 meg long, you have 1 million
characters in there. 1000000/35 = ~29k lines. The size of the remaining
string decreases linearly, so its average size is (1000000 + 0)/2, or
500k. All said and done, you're allocating and copying a 500K string,
not once but 29 thousand times. That's where your slowdown resides.
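
Something along the lines of what Emile suggested should fix that: step
through the original string by index and slice out each 35-character chunk
directly, so the remainder is never copied. A rough sketch, reusing the
ascii_data and ascii_file names from your snippet (untested):

width = 35
f = open(ascii_file, 'w')
# Each slice copies only 35 characters; the big string itself is never
# reallocated, so the work grows linearly with the file size.
for i in xrange(0, len(ascii_data), width):
    f.write(ascii_data[i:i + width] + '\n')
f.close()

(Note that this also writes a newline after each chunk, which your loop
above doesn't.) That should finish in roughly the time it takes to write
the output file, instead of the quadratic time the trim-the-remainder
loop needs.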


