[Tutor] Creating one file out of all the files in a directory
evert.rol at gmail.com
Sun Nov 14 18:42:18 CET 2010
> Again, thanks a lot. Too bad you and Kushal don't live close. I would
> like to invite you for a beer or a coffee or something.
Thanks for the offer. Some time in the far, far future, perhaps ;-).
>> So close ;-).
>> What you're missing is the next write statement:
>> That shows why read() is nicer in this case: readlines() returns a list, not a single string.
>> But actually, you can open and close the output file outside the entire loop; just name it differently, e.g., before the first loop:
>>     outfile = open('outputfile', 'w')
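The read()/readlines() difference can be seen in a few lines. This is a minimal Python 3 sketch. The sample file and its temporary location exist only for illustration:

```python
import os
import tempfile

# Create a small sample file in a throwaway directory (illustration only).
tmpdir = tempfile.mkdtemp()
sample = os.path.join(tmpdir, 'sample.txt')
with open(sample, 'w') as f:
    f.write('first line\nsecond line\n')

# read() returns one string, which can be passed straight to write().
with open(sample) as f:
    data = f.read()

# readlines() returns a list of lines; write() rejects a list,
# so you would need writelines() (or ''.join(lines)) instead.
with open(sample) as f:
    lines = f.readlines()

print(type(data).__name__, repr(data))
print(type(lines).__name__, lines)
```

Because read() already gives a single string, the copy step is just `output_file.write(data)`.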
> OK, I ask you (or anybody reading this) the same question I asked
> Kushal: why is it better to open the output file outside the entire
> loop. I understand why it should be closed outside the loop but if you
> see the code I came up with after your comments, I open the output
> file inside the loop and the script still works perfectly well. Is
> there any good reason to do it differently from the way I did it?
Your code will work fine. My reason (and there may be others) for doing it the other way is just to avoid a marginal bit of overhead: opening and closing the output file once for every input file costs some extra time. That's why I would have moved the with statement for output_file outside both loops. Though, as said, I expect the overhead is generally very small.
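Here is roughly what "open the output file outside both loops" could look like. This is a sketch only. The function name `concatenate` and its parameters are hypothetical, and the header format is copied from the code quoted below. It uses mode 'w' instead of 'a', so rerunning the script starts from an empty output file instead of appending a second copy:

```python
import os

def concatenate(src_dir, out_path):
    """Write every file under src_dir into out_path, each preceded by a header.

    The output file is opened once, outside both loops, instead of once per
    input file. Mode 'w' truncates it, so a rerun does not append duplicates.
    """
    with open(out_path, 'w') as output_file:
        for subdir, dirs, files in os.walk(src_dir):
            for filename in files:
                if filename == '.DS_Store':  # skip the Mac OS X metadata file
                    continue
                # os.walk yields bare file names, so join them with subdir.
                with open(os.path.join(subdir, filename)) as f:
                    data = f.read()
                output_file.write('\n\n<file name="' + filename + '">\n\n')
                output_file.write(data)
```

It would be called as `concatenate('/Volumes/myPath', '/Volumes/myPath2/output.txt')`, with the output outside the directory being walked. Note that os.walk does not guarantee any particular file order.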
> Here's what I did:
> import os
> path = '/Volumes/myPath'
> for subdir, dirs, files in os.walk(path):
>     for filename in files:
>         if filename != '.DS_Store':
>             # os.walk yields bare file names, so join them with subdir
>             with open(os.path.join(subdir, filename), 'r') as f: #see tomboy note 'with statement'
>                 data = f.read()
>             with open('/Volumes/myPath2/output.txt', 'a') as output_file:
>                 output_file.write('\n\n<file name="' + filename + '">\n\n')
>                 output_file.write(data)
> I came up with this way of doing it because I was trying to follow your
> advice of using the 'with' statement and this was the first way that I
> could think of to implement it. Since in the little test that I ran it
> worked, I left it like that but I would like to know whether there is
> a more elegant way to implement this so that I learn good habits.
This is fine: 'with open(<filename>, <mode>) as <stream>:' is, afaik, the standard way to open a file in Python nowadays.
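The main thing the with statement buys you is that the file is closed automatically when the block ends, even if an exception is raised inside it. This minimal Python 3 illustration uses a made-up file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with open(path, 'w') as stream:  # 'w' is the mode: create/truncate for writing
    stream.write('hello\n')
    print(stream.closed)  # False: still open inside the block

print(stream.closed)  # True: closed automatically when the block ends
```

This is roughly equivalent to calling open(), then doing the work inside try/finally with stream.close() in the finally clause, but shorter and harder to get wrong.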
>> In this case, though, there's one thing to watch out for: glob or os.walk will pick up your newly created (empty) file, so you should either put the all-containing file in a different directory (best practice) or insert an if-statement to check whether file[name] != 'outputfile'.
> You'll have seen that I opted for the best practice but I still used
> an if statement with file[name] != 'outputfile' in order to solve some
> problems I was having with a hidden file created by Mac OSX
> (.DS_Store). The output file contained some strange characters at the
> beginning and it took me a while to figure out that this was caused by
> the fact that the loop read the contents of the .DS_Store file.
Yes, there are few ways to exclude particular filenames other than an if statement.
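You can make that if statement a bit more general, though. This sketch uses a hypothetical helper, `should_skip`. It skips every hidden file (any name starting with '.', which covers .DS_Store) and also the output file itself. The output file is compared by absolute path, so it is still caught if it ends up inside the walked directory:

```python
import os

def should_skip(path, output_path):
    """Return True for files that should not be copied into the output.

    Hidden files (names starting with '.', e.g. Mac OS X's .DS_Store) are
    skipped, as is the output file itself, compared by absolute path.
    """
    name = os.path.basename(path)
    if name.startswith('.'):
        return True
    return os.path.abspath(path) == os.path.abspath(output_path)
```

Inside the loop this would be used as `full = os.path.join(subdir, filename)` followed by `if should_skip(full, out_path): continue`. If you use glob instead of os.walk, a pattern like '*' does not match names starting with a dot, so .DS_Store is already left out there.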