[Tutor] Find files without __doc__ strings
David
david at abbottdavid.com
Sun May 17 17:37:25 CEST 2009
spir wrote:
> On Sat, 16 May 2009 21:46:02 -0400,
> David <david at abbottdavid.com> wrote:
>
>> I am doing an exercise in Wesley Chun's book. Find files in the standard
>> library modules that have doc strings. Then find the ones that don't,
>> "the shame list". I came up with this to find the ones with doc strings:
>> #!/usr/bin/python
>> import os
>> import glob
>> import fileinput
>> import re
>>
>> pypath = "/usr/lib/python2.6/"
>> fnames = glob.glob(os.path.join(pypath, '*.py'))
>>
>> def read_doc():
>>     pattern = re.compile('"""*\w')
>>     for line in fileinput.input(fnames):
>>         if pattern.match(line):
>>             print 'Doc String Found: ', fileinput.filename(), line
>>
>> read_doc()
>
> It seems to me that your approach is moderately wrong ;-)
>
>> There must have been an easier way :)
>
> Not sure. As I see it the problem is slightly more complicated. A module docstring is a string literal, usually triple-quoted, placed before any code. But it must be closed, too.
> You'll have to skip blank and comment lines, then check whether the rest matches a docstring. It could be done with a single complicated pattern, but you could also go for it step by step.
> Say I have a file 'dummysource.py' with the following text:
> ==============
> # !/usr/bin/env python
> # coding: utf8
>
> # comment
> # ''' """
>
> ''' foo module
> doc
> '''
> def foofunc():
>     ''' foofuncdoc '''
>     pass
> ==============
>
> Then, the following doc-testing code
> ==============
> import re
> doc = re.compile(r'(""".+?""")|(\'\'\'.+?\'\'\')', re.DOTALL)
>
> def checkDoc(sourceFileName):
>     sourceFile = open(sourceFileName, 'r')
>     # move until first 'code' line
>     while True:
>         line = sourceFile.readline()
>         if not line:
>             # end of file: only blank and comment lines, so no docstring
>             sourceFile.close()
>             return False
>         strip_line = line.strip()
>         print "|%s|" % strip_line
>         if (strip_line != '') and (not strip_line.startswith('#')):
>             break
>     # check doc (keep last line read!)
>     source = line + sourceFile.read()
>     sourceFile.close()
>     result = doc.match(source)
>     if result is not None:
>         print "*** %s *******" % sourceFileName
>         print result.group()
>         return True
>     else:
>         return False
>
> # pass the file name; checkDoc opens the file itself
> print checkDoc("dummysource.py")
> ==============
>
> will output:
>
> ==============
> |# !/usr/bin/env python|
> |# coding: utf8|
> ||
> |# comment|
> |# ''' """|
> ||
> |''' foo module|
> *** dummysource.py *******
> ''' foo module
> doc
> '''
> True
> ==============
>
> It's just for illustration; you can probably make things simpler or find a better way.
>
>> Now I have a problem: I cannot figure out how to compare the fnames
>> with the result of fileinput.filename() and get a list of any that don't
>> have doc strings.
>
> You can use a func like the above one to filter out (or in) files that answer yes/no to the test.
> I would start with a list of all files, and just populate 2 new lists for "shame" and "fame" files ;-) according to the result of the test.
>
> You could use list comprehension syntax, too:
> fameFileNames = [fileName for fileName in fileNames if checkDoc(fileName)]
> But if you do this for shame files too, then every file gets tested twice.
>
>> thanks
>
> Denis
> ------
> la vita e estrany
>
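One way to read Denis's suggestion of filling two lists in a single pass, so that no file is tested twice, is the minimal sketch below. The names partition and fake_check and the sample file names are made up for illustration; in practice the check would be something like checkDoc().

```python
def partition(names, has_doc):
    """Split names into (fame, shame) with one has_doc() call per name."""
    fame, shame = [], []
    for name in names:
        # Each name is tested exactly once, unlike two list comprehensions.
        if has_doc(name):
            fame.append(name)
        else:
            shame.append(name)
    return fame, shame


# Hypothetical stand-in for a real per-file check such as checkDoc().
documented = set(["os.py", "re.py"])
calls = []

def fake_check(name):
    calls.append(name)  # record every call to show nothing is tested twice
    return name in documented

fame, shame = partition(["os.py", "this.py", "re.py"], fake_check)
print(fame)
print(shame)
```

Because the result is two new lists, the original list of names is never modified while it is being looped over.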
Thanks Denis,
This seems to work OK:
#!/usr/bin/python
import os
import glob
import fileinput
import re

pypath = "/usr/lib/python2.6/"
fnames = glob.glob(os.path.join(pypath, '*.py'))
fnames.sort()

def shame_list():
    # a line that starts with a triple quote counts as a doc string
    pattern = re.compile(r'(^""")|(^\'\'\')', re.DOTALL)
    goodFiles = set()
    for line in fileinput.input(fnames):
        if pattern.match(line):
            goodFiles.add(fileinput.filename())
    # build a new list rather than removing from fnames while looping over it
    shame = [item for item in fnames if item not in goodFiles]
    print 'Shame List: \n', shame

shame_list()
<returns>
Shame List:
['/usr/lib/python2.6/__phello__.foo.py',
'/usr/lib/python2.6/collections.py', '/usr/lib/python2.6/md5.py',
'/usr/lib/python2.6/pydoc_topics.py', '/usr/lib/python2.6/sha.py',
'/usr/lib/python2.6/struct.py', '/usr/lib/python2.6/this.py']
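
As an alternative to scanning lines with a regular expression, the standard library's ast module can parse a source text and report whether the module itself starts with a docstring. The sketch below is only an illustration and not what was run above: has_module_docstring and the sample sources are hypothetical, and it assumes the interpreter running the check can parse the files being checked (ast.parse raises SyntaxError otherwise).

```python
import ast

def has_module_docstring(source):
    """Return True if the module source text starts with a docstring."""
    tree = ast.parse(source)
    # get_docstring() returns None when the first statement is not a string.
    return ast.get_docstring(tree) is not None


# Sample sources, modelled loosely on the dummysource.py example.
with_doc = "\n".join([
    "# comment",
    "# ''' \"\"\"",
    "",
    "''' foo module",
    "doc",
    "'''",
    "def foofunc():",
    "    ''' foofuncdoc '''",
    "    pass",
    "",
])
only_func_doc = "\n".join([
    "import os",
    "def foofunc():",
    "    ''' foofuncdoc '''",
    "    pass",
    "",
])
single_quoted = "'one-line doc'\nx = 1\n"

print(has_module_docstring(with_doc))
print(has_module_docstring(only_func_doc))
print(has_module_docstring(single_quoted))
```

For a file on disk, the text would first be read with open(path).read(). A check like this ignores triple quotes inside comments and function bodies, and it also accepts a docstring written with single quotes.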
--
Powered by Gentoo GNU/Linux
http://linuxcrazy.com