Why are my files in in my list - os module used with sys argv
Sayth Renshaw
flebber.crue at gmail.com
Tue Apr 19 09:21:22 EDT 2016
On Tuesday, 19 April 2016 18:17:02 UTC+10, Peter Otten wrote:
> Steven D'Aprano wrote:
>
> > On Tue, 19 Apr 2016 09:44 am, Sayth Renshaw wrote:
> >
> >> Hi
> >>
> >> Why would it be that my files are not being found in this script?
> >
> > You are calling the script with:
> >
> > python jqxml.py samples *.xml
> >
> > This does not do what you think it does: under Linux shells, the glob
> > *.xml will be expanded by the shell. Fortunately, in your case, you have
> > no files in the current directory matching the glob *.xml, so it is not
> > expanded and the arguments your script receives are:
> >
> >
> > "python jqxml.py" # not used
> >
> > "samples" # dir
> >
> > "*.xml" # mask
> >
> >
> > You then call:
> >
> > fileResult = filter(lambda x: x.endswith(mask), files)
> >
> > which looks for file names which end with a literal string (asterisk, dot,
> > x, m, l) in that order. You have no files that match that string.
> >
> > At the shell prompt, enter this:
> >
> > touch samples/junk\*.xml
> >
> > and run the script again, and you should see that it now matches one file.
> >
> > Instead, what you should do is:
> >
> >
> > (1) Use the glob module:
> >
> > https://docs.python.org/2/library/glob.html
> > https://docs.python.org/3/library/glob.html
> >
> > https://pymotw.com/2/glob/
> > https://pymotw.com/3/glob/
> >
> >
> > (2) When calling the script, avoid the shell expanding wildcards by
> > escaping them or quoting them:
> >
> > python jqxml.py samples "*.xml"
>
> (3) *Use* the expansion mechanism provided by the shell instead of fighting
> it:
>
> $ python jqxml.py samples/*.xml
>
> This requires that you change your script
>
> from pyquery import PyQuery as pq
> import pandas as pd
> import sys
>
> fileResult = sys.argv[1:]
>
> if not fileResult:
>     print("no files specified")
>     sys.exit(1)
>
> for file in fileResult:
>     print(file)
>
> for items in fileResult:
>     try:
>         d = pq(filename=items)
>     except FileNotFoundError as e:
>         print(e)
>         continue
>     res = d('nomination')
>     # you could move the attrs definition before the loop
>     attrs = ('id', 'horse')
>     # probably a bug: you are overwriting data on every iteration
>     data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
>
> I think this is the most natural approach if you are willing to accept the
> quirk that the script tries to process the file 'samples/*.xml' if the
> samples directory doesn't contain any files with the .xml suffix. Common
> shell tools work that way:
>
> $ ls samples/*.xml
> samples/1.xml samples/2.xml samples/3.xml
> $ ls samples/*.XML
> ls: cannot access samples/*.XML: No such file or directory
>
> Unrelated: instead of working with sys.argv directly you could use argparse
> which is part of the standard library. The code to get at least one file is
>
> import argparse
>
> parser = argparse.ArgumentParser()
> parser.add_argument("files", nargs="+")
> args = parser.parse_args()
>
> print(args.files)
>
> Note that this doesn't fix the shell expansion oddity.
Hi
Thanks for the insight. After doing a little reading I found this post, which uses both argparse and glob and attempts to cover both Windows and bash expansion of wildcards: http://breathmintsforpenguins.blogspot.com.au/2013/09/python-crossplatform-handling-of.html

import argparse
from glob import glob

def main(file_names):
    print(file_names)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("file_names", nargs='*')
    # nargs='*' tells it to combine all positional arguments into a single list
    args = parser.parse_args()
    file_names = list()
    # go through all of the arguments and replace ones with wildcards with the expansion
    # note: glob only returns names of existing files, so an argument
    # that matches nothing (wildcard or not) is dropped
    for arg in args.file_names:
        file_names += glob(arg)
    main(file_names)

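
One thing to watch with that loop: glob() returns an empty list when nothing matches, so a mistyped file name would vanish silently. Below is a minimal sketch of one way around that, keeping the argument unchanged when nothing matches so that the later open attempt can report it. The names expand_args and main are my own (hypothetical), not from the blog post.

```python
import argparse
from glob import glob


def expand_args(patterns):
    """Expand wildcard patterns; keep a pattern unchanged if nothing matches."""
    file_names = []
    for pattern in patterns:
        # sorted() gives a stable order; glob() itself promises none
        matches = sorted(glob(pattern))
        # glob() returns [] when nothing matches, so fall back to the
        # literal argument, the way an unexpanded shell glob would arrive
        file_names.extend(matches if matches else [pattern])
    return file_names


def main(argv=None):
    """Parse positional file arguments and return the expanded list."""
    parser = argparse.ArgumentParser()
    parser.add_argument("file_names", nargs="*")
    args = parser.parse_args(argv)
    return expand_args(args.file_names)
```

Calling main() with no argument reads sys.argv, so the same function should work from the command line on both bash (already expanded) and Windows (not expanded).
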
Also, although it is way beyond my needs for such a tiny script, there is Click, which I think is the Flask developers' Python CLI creation package, based on optparse: http://click.pocoo.org/5/why/#why-not-argparse
> # probably a bug: you are overwriting data on every iteration
> data = [[res.eq(i).attr(x) for x in attrs] for i in range(len(res))]
Thanks for picking this up. I will have to append to it on each iteration for each attribute.
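
A minimal sketch of that accumulation, to show the idea of creating the list once and appending inside the loop. To keep it self-contained it uses the standard library's xml.etree.ElementTree rather than pyquery, and the function name collect_rows is hypothetical. It assumes the files contain nomination elements carrying id and horse attributes, as in the script above.

```python
import xml.etree.ElementTree as ET


def collect_rows(paths, attrs=("id", "horse")):
    """Return one row per <nomination> element across all files."""
    data = []  # created once, before the loop, so nothing is overwritten
    for path in paths:
        try:
            tree = ET.parse(path)
        except (OSError, ET.ParseError) as e:
            # report unreadable or malformed files and move on
            print(e)
            continue
        # iter() walks the whole tree, wherever the elements are nested
        for node in tree.iter("nomination"):
            data.append([node.get(x) for x in attrs])
    return data
```

The resulting list of rows could then be handed to pandas, for example pd.DataFrame(rows, columns=["id", "horse"]).
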
Thank You
Sayth