[Tutor] Creating one file out of all the files in a directory
Josep M. Fontana
josep.m.fontana at gmail.com
Sun Nov 14 18:27:41 CET 2010
Hi Evert,
Again, thanks a lot. Too bad you and Kushal don't live close. I would
like to invite you to a beer or a coffee or something.
<snip>
>>
>> ----------------- code--------
>> import os
>> import glob
>> path = '/Volumes/DATA/MyPath'
>> os.chdir(path)
>> file_names = glob.glob('*.txt')
>
> You don't use file_names any further. Depending on whether you want files from subdirectories or not, you can use the os.walk below or file_names.
You are totally right. Since I had used glob in another script before,
I tried to use it in this one as well before I tried with os.walk()
and then I left it there forgetting that with os.walk() it was no
longer necessary.
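
To make the difference between the two approaches concrete, here is a minimal sketch. The function names are made up for illustration and are not part of either script: glob.glob() only looks at one directory, while os.walk() also descends into subdirectories.

```python
import glob
import os


def txt_files_flat(path):
    """Return the .txt files directly inside path (no subdirectories)."""
    return sorted(glob.glob(os.path.join(path, '*.txt')))


def txt_files_recursive(path):
    """Return the .txt files in path and in every subdirectory below it."""
    found = []
    for subdir, dirs, files in os.walk(path):
        for name in files:
            if name.endswith('.txt'):
                # os.walk yields bare file names, so join them with subdir
                found.append(os.path.join(subdir, name))
    return sorted(found)
```

If all the files sit in a single directory, both functions should return the same list, so only one of the two is needed.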
<snip>
>> for subdir, dirs, files in os.walk(path):
>>     for file in files:
>>         f = open(file, 'r')
>>         text = f.readlines()
>
> Since you don't care about lines in your files, but just the entire file contents, you could also simply use
> data = f.read()
Yes. Thanks. This is much better. I knew this from having read a
chapter on working with files somewhere but I wound up using
.readlines() because in my search for a solution I saw some piece of
code that used it and got the results I wanted.
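
A small sketch of the difference, using a throwaway sample file created just for the comparison (the file and its contents are assumptions for illustration only):

```python
import os
import tempfile

# Write a small two-line sample file to compare the two calls on.
fd, sample = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('first line\nsecond line\n')

with open(sample, 'r') as f:
    data = f.read()        # one string holding the whole file

with open(sample, 'r') as f:
    text = f.readlines()   # a list with one string per line

os.remove(sample)          # clean up the sample file
```

Here data can be passed to write() directly, while text has to be joined first with ''.join(text).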
<snip>
> So close ;-).
> What you're missing is the next write statement:
> f.write(data)
>
> (or
> f.write(''.join(text))
> which shows why read() is nicer in this case: readlines() returns a list, not just a single string).
>
>> f.close()
>
> But actually, you can open and close the output file outside the entire loop; just name it differently (eg, before the first loop,
> outfile = open('outputfile', 'w')
OK, I ask you (or anybody reading this) the same question I asked
Kushal: why is it better to open the output file outside the entire
loop? I understand why it should be closed outside the loop but if you
see the code I came up with after your comments, I open the output
file inside the loop and the script still works perfectly well. Is
there any good reason to do it differently from the way I did it?
Here's what I did:
-----------------
import os
path = '/Volumes/myPath'
os.chdir(path)
for subdir, dirs, files in os.walk(path):
    for filename in files:
        if filename != '.DS_Store':
            with open(filename, 'r') as f:  # see tomboy note 'with statement'
                data = f.read()
            with open('/Volumes/myPath2/output.txt', 'a') as output_file:
                output_file.write('\n\n<file name="' + filename + '">\n\n')
                output_file.write(data)
                output_file.write('\n\n</file>\n\n')
-----------------
I came up with this way of doing it because I was trying to follow your
advice of using the 'with' statement and this was the first way that I
could think of to implement it. Since in the little test that I ran it
worked, I left it like that but I would like to know whether there is
a more elegant way to implement this so that I learn good habits.
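
For comparison, here is a sketch of what opening the output file once, outside the loop, might look like with the 'with' statement. The function name concatenate and its parameters are hypothetical and chosen only for illustration, and the sketch assumes the output file lives outside the directory being walked.

```python
import os


def concatenate(source_dir, output_path, skip=('.DS_Store',)):
    """Write every file under source_dir into output_path, wrapped in <file> tags."""
    # Open the output file once, before the loop; 'w' starts from an empty file.
    with open(output_path, 'w') as output_file:
        for subdir, dirs, files in os.walk(source_dir):
            for filename in sorted(files):
                if filename in skip:
                    continue
                # Join with subdir so files in subdirectories are found too.
                with open(os.path.join(subdir, filename), 'r') as f:
                    data = f.read()
                output_file.write('\n\n<file name="' + filename + '">\n\n')
                output_file.write(data)
                output_file.write('\n\n</file>\n\n')
```

It would be called as concatenate('/Volumes/myPath', '/Volumes/myPath2/output.txt'). Two practical differences from the version above: the output file is opened a single time instead of once per input file, and because it is opened in 'w' mode rather than 'a', running the script a second time replaces the old output instead of appending a second copy to it.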
> In this case, though, there's one thing to watch out for: glob or os.walk will pick up your newly created (empty) file, so you should either put the all-containing file in a different directory (best practice) or insert an if-statement to check whether file[name] != 'outputfile'
You'll have seen that I opted for the best practice, but I still used
an if statement, modelled on your file[name] != 'outputfile' check, in
order to solve some problems I was having with a hidden file created
by Mac OS X
(.DS_Store). The output file contained some strange characters at the
beginning and it took me a while to figure out that this was caused by
the fact that the loop read the contents of the .DS_Store file.
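
A slightly more general variant of that check, as a sketch only: instead of naming .DS_Store explicitly, skip every file whose name starts with a dot. The helper name visible_files is made up for illustration, and skipping all dot-files is an assumption that may not suit every directory.

```python
def visible_files(files):
    """Drop names that start with a dot, such as .DS_Store."""
    return [name for name in files if not name.startswith('.')]
```

Inside the os.walk() loop this would be used as "for filename in visible_files(files):".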
> Finally, depending on the version of Python you're using, there are nice things you can do with the 'with' statement, which has an incredible advantage in case of file I/O errors (since you're not checking for any read errors).
> See eg http://effbot.org/zone/python-with-statement.htm (bottom part for example) or Google around.
Great advice. I took the opportunity to learn about the 'with'
statement and it will be very helpful in the future.
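
As far as I understand it, for files the 'with' statement behaves roughly like a try/finally that always closes the file. A minimal sketch of the two forms side by side (the function names are hypothetical):

```python
def read_with(path):
    """Read a file; the with block closes it even if read() raises."""
    with open(path, 'r') as f:
        return f.read()


def read_try_finally(path):
    """The same thing spelled out by hand with try/finally."""
    f = open(path, 'r')
    try:
        return f.read()
    finally:
        f.close()
```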
Until someone comes up with virtual drinks that are realistic enough
to be enjoyable, I can only offer you my thanks for your time.
Josep M.