A smarter shutil.copytree ?
Hi,

shutil.copytree is very convenient for making recursive copies, but os.walk has to be used every time some filtering is needed, for instance to avoid copying some files.

The code pattern with os.walk is pretty verbose:

--------------------- copying a source folder to a target folder, except the pyc/pyo files
import os
import shutil
from os.path import join

os.mkdir(target)
for root, dirs, filenames in os.walk(source):
    # map the current source directory onto the target tree
    root_target = root.replace(source, target, 1)
    for dir_ in dirs:
        target_dir = join(root_target, dir_)
        if os.path.exists(target_dir):
            continue
        os.mkdir(target_dir)
    for filename in filenames:
        filename_source = join(root, filename)
        filename_target = join(root_target, filename)
        # splitext() returns (root, ext); compare the extension only
        if (os.path.exists(filename_target) or
                os.path.splitext(filename)[1] in ('.pyc', '.pyo')):
            continue
        shutil.copyfile(filename_source, filename_target)
--------------------

If we could provide a callable argument to shutil.copytree, this would simplify the code a lot:

--------------------- copying a source to a target, except the pyc/pyo files
def filtering(source, target):
    return os.path.splitext(filename)[1] not in ('.pyc', '.pyo')

shutil.copytree(source, target, filter_=filtering)
---------------------

This is a very common pattern in my code, and I think this could be an interesting enhancement to shutil. So if people think it is a good idea, I can work on a patch and submit it to the tracker.

Regards

Tarek

--
Tarek Ziadé | Association AfPy | www.afpy.org
Blog FR | http://programmation-python.org
Blog EN | http://tarekziade.wordpress.com/
Oops, made a mistake in my example:

def filtering(source_path, target_path):
    return os.path.splitext(source_path)[1] not in ('.pyc', '.pyo')

shutil.copytree(source, target, filter_=filtering)
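To make the proposal concrete, here is a minimal sketch of how a copy routine with such a filter callable could behave. It is an illustration only, not the patch that was later submitted: `copytree_filtered` is a hypothetical helper name, and the `filter_` parameter is the one proposed in the message above, which shutil.copytree does not accept.

```python
import os
import shutil


def copytree_filtered(source, target, filter_=None):
    """Recursively copy source to target (hypothetical helper).

    filter_(source_path, target_path) returns True to copy an entry
    and False to skip it; skipped directories are not descended into.
    """
    os.makedirs(target)
    for name in os.listdir(source):
        source_path = os.path.join(source, name)
        target_path = os.path.join(target, name)
        if filter_ is not None and not filter_(source_path, target_path):
            continue
        if os.path.isdir(source_path):
            copytree_filtered(source_path, target_path, filter_)
        else:
            shutil.copy2(source_path, target_path)


def filtering(source_path, target_path):
    # Keep everything except compiled Python files.
    return os.path.splitext(source_path)[1] not in ('.pyc', '.pyo')
```

With these two definitions, `copytree_filtered(source, target, filter_=filtering)` would copy the tree while leaving out the .pyc and .pyo files.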
Sounds like a neat little feature. Looking forward to it. Maybe the most useful use case would be to provide glob-style patterns for skipping files or directories (and their contents).

--Guido

On Thu, Apr 17, 2008 at 9:52 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
Hi,
shutil.copytree is very convenient for making recursive copies, but os.walk has to be used every time some filtering is needed, for instance to avoid copying some files.
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
On 17/04/2008, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
This is a very common pattern in my code, and I think this could be an interesting enhancement to shutil. So if people think it is a good idea, I can work on a patch and submit it to the tracker.
I also think this is a good idea; I recently was forced to copy-paste and modify shutil.copytree into a project because of this limitation.
-- Gustavo J. A. M. Carneiro INESC Porto, Telecommunications and Multimedia Unit "The universe is always one step beyond logic." -- Frank Herbert
On Thu, Apr 17, 2008 at 7:06 PM, Guido van Rossum <guido@python.org> wrote:
Sounds like a neat little feature. Looking forward to it. Maybe the most useful use case would be to provide glob-style patterns for skipping files or directories (and their contents).
Alright, I will work on it that way. Thanks for the advice.

Tarek
I have submitted a patch for review here: http://bugs.python.org/issue2663

glob-style patterns or a callable (for complex cases) can be provided to filter out files or directories.

Regards

Tarek
On Sun, Apr 20, 2008 at 4:15 PM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
I have submitted a patch for review here: http://bugs.python.org/issue2663
glob-style patterns or a callable (for complex cases) can be provided to filter out files or directories.
I'm not a big fan of the sequence-or-callable argument. Why not just make it a callable argument, and supply a utility function so that you can write something like::

    exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2')
    shutil.copytree(src_dir, dst_dir, exclude=exclude_func)

?

Steve
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy
On Sun, 20 Apr 2008, Steven Bethard wrote:
I'm not a big fan of the sequence-or-callable argument. Why not just make it a callable argument, and supply a utility function so that you can write something like::
    exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2')
    shutil.copytree(src_dir, dst_dir, exclude=exclude_func)
Even if a glob pattern filter is considered useful enough to be worth special-casing, the glob capability should also be exposed via something like your excluding_patterns constructor and additionally as a function that can be called by another function intended for use as a callable argument.

If it is not, then doing something like "files matching these glob patterns except for those matching this non-glob-expressible condition and also those files matching this second non-glob-expressible condition" becomes painful because the glob part essentially needs to be re-implemented.

Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist
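The kind of composition Isaac describes can be sketched as follows. This assumes the API as it appears in the standard library's shutil module (an `ignore` callable receiving a directory and the list of names in it, plus the `shutil.ignore_patterns` factory), which is not necessarily the state of the patch at this point in the thread. The wrapper name `ignore_patterns_and` and its `extra_conditions` argument are hypothetical, and it shows only one possible way to combine the conditions (a union).

```python
import os
import shutil


def ignore_patterns_and(patterns, *extra_conditions):
    """Combine glob patterns with arbitrary per-path predicates.

    Hypothetical helper: returns a callable usable as the ignore
    argument of shutil.copytree. A name is ignored if it matches one
    of the glob patterns or if any extra condition returns True for
    its full path.
    """
    glob_ignore = shutil.ignore_patterns(*patterns)

    def _ignore(directory, names):
        # Reuse the glob matching instead of re-implementing it.
        ignored = set(glob_ignore(directory, names))
        for name in names:
            path = os.path.join(directory, name)
            if any(condition(path) for condition in extra_conditions):
                ignored.add(name)
        return ignored

    return _ignore
```

Because the glob part is itself an ordinary callable, a custom function can call it and then add or remove names as needed, which is the point of exposing it separately.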
The pattern matching uses the src_dir to call glob.glob(), which returns the list of files to be excluded. That's why I added it within the copytree() function.

To make excluding_patterns work, it could be coded like this::

    def excluding_patterns(*patterns):
        def _excluding_patterns(filepath):
            exclude_files = []
            dir_ = os.path.dirname(filepath)
            for pattern in patterns:
                pattern = os.path.join(dir_, pattern)
                exclude_files.extend(glob.glob(pattern))
            return filepath in exclude_files
        return _excluding_patterns

But I can see some performance issues, as the glob function will be called within the loop to test each file or folder::

    def copytree(src, dst, exclude):
        ...
        for name in names:
            srcname = os.path.join(src, name)
            if exclude(srcname):
                continue
            ...
        ...

Calling it once at the beginning of the `copytree` function would then be better for performance, but means that the callable has to return a list of matching files instead of the match result itself::

    def excluding_patterns(*patterns):
        def _excluding_patterns(path):
            exclude_files = []
            for pattern in patterns:
                # path is the source directory being copied
                pattern = os.path.join(path, pattern)
                exclude_files.extend(glob.glob(pattern))
            return exclude_files
        return _excluding_patterns

Then in copytree::

    def copytree(src, dst, exclude):
        ...
        excluded = exclude(src)
        ...
        for name in names:
            srcname = os.path.join(src, name)
            if srcname in excluded:
                continue
            ...
        ...

But this means that people who want to implement their own callable will have to provide a function that returns a list of excluded files, so they won't be free to implement what they want.

We could have two parameters, one for the glob-style sequence and one for the callable, to be able to use them at the appropriate places in the function, but I think this would make the function signature rather heavy::

    def copytree(src, dst, exclude_patterns=None, exclude_function=None):
        ...

That's why I would be in favor of a sequence-or-callable argument, even if I admit that it is not the prettiest way to present an argument.
Regards

Tarek

On Mon, Apr 21, 2008 at 2:38 AM, Isaac Morland <ijmorlan@cs.uwaterloo.ca> wrote:
Even if a glob pattern filter is considered useful enough to be worth special-casing, the glob capability should also be exposed via something like your excluding_patterns constructor and additionally as a function that can be called by another function intended for use as a callable argument.
If it is not, then doing something like "files matching these glob patterns except for those matching this non-glob-expressible condition and also those files matching this second non-glob-expressible condition" becomes painful because the glob part essentially needs to be re-implemented.
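One way around the performance concern Tarek raises (calling glob.glob() once per file) is to match the names already listed for a directory in memory with the fnmatch module, so that no extra filesystem access is needed. The sketch below is an assumption about how that could look, not the code from the patch: it borrows the name `excluding_patterns` from Steven's suggestion, and the `(directory, names)` signature of the returned callable is hypothetical.

```python
import fnmatch


def excluding_patterns(*patterns):
    """Return a callable that selects the names matching any pattern.

    The returned function takes the directory being copied and the
    list of names it contains, and returns the set of names to skip.
    Matching is done on the names alone, without touching the disk.
    """
    def _excluded_names(directory, names):
        excluded = set()
        for pattern in patterns:
            excluded.update(fnmatch.filter(names, pattern))
        return excluded

    return _excluded_names
```

A copy routine could then call the callable once per directory and skip every name in the returned set, which keeps the cost proportional to the number of entries rather than to the number of glob calls.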
On Sun, Apr 20, 2008 at 5:25 PM, Steven Bethard <steven.bethard@gmail.com> wrote:
I'm not a big fan of the sequence-or-callable argument. Why not just make it a callable argument, and supply a utility function so that you can write something like::
    exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2')
    shutil.copytree(src_dir, dst_dir, exclude=exclude_func)
?
Agreed. Type testing is fraught with problems.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On Mon, Apr 21, 2008 at 2:25 AM, Steven Bethard <steven.bethard@gmail.com> wrote:
I'm not a big fan of the sequence-or-callable argument. Why not just make it a callable argument, and supply a utility function so that you can write something like::
    exclude_func = shutil.excluding_patterns('*.tmp', 'test_dir2')
    shutil.copytree(src_dir, dst_dir, exclude=exclude_func)
?
I made another draft based on a single callable argument to try out: http://bugs.python.org/file10073/shutil.copytree.filtering.patch

The callable takes the src directory + its content as a list, and returns the files eligible for exclusion.

That makes me wonder, like Alexander said on the bug tracker: in the glob-style patterns callable, do we want to deal with absolute paths?

Tarek
On Tue, Apr 22, 2008 at 1:56 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
I made another draft based on a single callable argument to try out: http://bugs.python.org/file10073/shutil.copytree.filtering.patch
The callable takes the src directory + its content as a list, and returns the files eligible for exclusion.
FWIW, that looks better to me.
That makes me wonder, like Alexander said on the bug tracker: In the glob-style patterns callable, do we want to deal with absolute paths ?
I think that it would be okay to document that shutil.ignore_patterns() only accepts patterns matching individual filenames (not complex paths). If someone needs to do something with absolute paths, then they can write their own 'ignore' function, right?

Steve
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy
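A custom 'ignore' function for absolute paths, as Steve suggests, might look like the sketch below. It assumes the `ignore` parameter of shutil.copytree as found in the standard library, where the callable receives the directory being visited and the names it contains and returns the names to skip. The factory name `ignore_absolute_paths` is hypothetical.

```python
import os
import shutil


def ignore_absolute_paths(*paths):
    """Return an ignore callable that skips the given paths.

    Hypothetical helper: each path is normalised with
    os.path.abspath() and compared with the absolute path of every
    entry that copytree visits.
    """
    excluded = {os.path.abspath(path) for path in paths}

    def _ignore(directory, names):
        return {
            name for name in names
            if os.path.abspath(os.path.join(directory, name)) in excluded
        }

    return _ignore
```

It would be used as `shutil.copytree(src_dir, dst_dir, ignore=ignore_absolute_paths(some_path))`, leaving the glob-only behaviour of shutil.ignore_patterns() untouched.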
On Tue, Apr 22, 2008 at 7:04 PM, Steven Bethard <steven.bethard@gmail.com> wrote:
I think that it would be okay to document that shutil.ignore_patterns() only accepts patterns matching individual filenames (not complex paths). If someone needs to do something with absolute paths, then they can write their own 'ignore' function, right?
Yes. The patch has been changed and corrected by a few people (thanks), and so has the doc: http://bugs.python.org/issue2663

So I guess it can be reviewed by a committer at this stage.

Regards

Tarek
participants (6)
- Greg Ewing
- Guido van Rossum
- Gustavo Carneiro
- Isaac Morland
- Steven Bethard
- Tarek Ziadé