[Tutor] removing extra whitespace from all *.txt files in a directory

Randolph MacKenzie rmack@eznet.net
Tue, 23 Jan 2001 13:04:52 -0500


# X2space.py is a Python module for removing extra whitespace from all
*.txt or *.html files in a directory. I explained what was needed, and
Fredrik Lundh and Greg Jorgenson independently submitted these
solutions:



--------------------------------------------------------------------------------

Fredrik Lundh: You probably want the "inplace" flag to fileinput.input
instead:

Optional in-place filtering: if the keyword argument inplace=1 is
passed to input() or to the FileInput constructor, the file is moved to
a backup file and standard output is directed to the input file. This
makes it possible to write a filter that rewrites its input file in
place.

import glob, fileinput, os, string, sys

target = "E:\\text\\"

# Fredrik Lundh's Python code:

# get a list of files (pointed at target so that the cleanup
# at the end removes the backups created here)
files = glob.glob(target + "*.txt")
if not files:
    sys.exit("nothing to convert")

# process all lines in those files
for line in fileinput.input(files, inplace=1, backup=".bak"):
    print string.join(string.split(line))

# End of Fredrik Lundh's excellent Python code

# The original files are saved as .bak and may be kept or deleted.

os.system('del ' + target + '*.bak')
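
For anyone running a current Python 3, here is a rough sketch of the same
in-place filter. It assumes Python 3.2 or later, where the functions of the
string module are available as str methods and fileinput.input() can be used
in a with statement. The names collapse_whitespace, squeeze_files and
remove_backups are made up for this illustration and are not part of
Fredrik's code.

```python
import fileinput
import glob
import os


def collapse_whitespace(line):
    """Replace every run of whitespace with one space and trim both ends."""
    return " ".join(line.split())


def squeeze_files(pattern, backup=".bak"):
    """Rewrite every file matching pattern in place; return the file list."""
    files = glob.glob(pattern)
    if not files:
        return files
    # With inplace=True, print() writes into the file being read, and the
    # original is kept next to it under the given backup extension.
    with fileinput.input(files, inplace=True, backup=backup) as lines:
        for line in lines:
            print(collapse_whitespace(line))
    return files


def remove_backups(files, backup=".bak"):
    """Delete the backup copies left behind by squeeze_files()."""
    for name in files:
        if os.path.exists(name + backup):
            os.remove(name + backup)
```

Calling squeeze_files(r"E:\text\*.txt") and then remove_backups() on the
returned list would mirror the script above. Removing the backups with
os.remove instead of a shell "del" keeps the cleanup independent of the
operating system.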


--------------------------------------------------------------------------------

# Greg Jorgenson: tempfile.TemporaryFile() creates a new temporary file
and opens it for you; the value you get back is a file object ready for
writing, so you don't need to open it. But since you need the temporary
file name so you can rename it, you need to use the mktemp() function
instead:

import glob, fileinput, os, tempfile, string

files = glob.glob(r'e:\text\*.txt')

for filename in files:
    tempname = tempfile.mktemp()
    temp = open(tempname, 'w')
    for s in fileinput.input(filename):
        # split/join collapses the whitespace but also drops the newline,
        # so put it back to keep one output line per input line
        temp.write(string.join(string.split(s)) + '\n')
    temp.close()
    os.remove(filename)
    os.rename(tempname, filename)
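
A possible present-day variant of the temporary-file approach, offered only
as a sketch: tempfile.mktemp() has since been deprecated, and
tempfile.NamedTemporaryFile(delete=False) gives both an open file object and
a name that survives closing. This assumes Python 3.3 or later for
os.replace. The function names rewrite_with_tempfile and rewrite_all are
invented for this example and are not Greg's.

```python
import glob
import os
import tempfile


def rewrite_with_tempfile(filename):
    """Collapse whitespace in filename via a temporary file, then swap it in."""
    directory = os.path.dirname(os.path.abspath(filename))
    # delete=False keeps the file on disk after it is closed, so it can be
    # renamed; creating it next to the original keeps both on one drive.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as temp:
        with open(filename) as source:
            for line in source:
                temp.write(" ".join(line.split()) + "\n")
    # os.replace overwrites the destination, so no separate os.remove is needed.
    os.replace(temp.name, filename)


def rewrite_all(pattern):
    """Apply rewrite_with_tempfile to every file matching pattern."""
    for filename in glob.glob(pattern):
        rewrite_with_tempfile(filename)
```

With this arrangement the original is only replaced once the cleaned copy has
been written completely, so an error part-way through leaves the original
file untouched.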


--------------------------------------------------------------------------------

Further development: the script needs to accept parameters for the
directories and file types to process.
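
One way the parameter passing could go, as a sketch only: the standard
argparse module (Python 3.2 or later) collects directories and extensions
from the command line. The option name --ext and the functions parse_args
and matching_files are assumptions made for this example; nothing in the
solutions above prescribes them.

```python
import argparse
import glob
import os


def parse_args(argv=None):
    """Parse directories and file extensions from the command line."""
    parser = argparse.ArgumentParser(
        description="Collapse extra whitespace in matching files.")
    parser.add_argument("directories", nargs="+",
                        help="directories to process")
    parser.add_argument("--ext", action="append", default=None,
                        help="file extension to match; may be repeated")
    args = parser.parse_args(argv)
    if args.ext is None:
        # hypothetical default: plain text files only
        args.ext = ["txt"]
    return args


def matching_files(directories, extensions):
    """Return the sorted paths in directories that have one of the extensions."""
    found = []
    for directory in directories:
        for ext in extensions:
            found.extend(glob.glob(os.path.join(directory, "*." + ext)))
    return sorted(found)
```

A call such as parse_args(["E:\\text", "--ext", "txt", "--ext", "html"])
followed by matching_files(args.directories, args.ext) would yield the file
list to hand to either of the two rewriting approaches.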

