[Tutor] Find files without __doc__ strings

David david at abbottdavid.com
Sun May 17 17:37:25 CEST 2009


spir wrote:
> On Sat, 16 May 2009 21:46:02 -0400,
> David <david at abbottdavid.com> wrote:
> 
>> I am doing an exercise in Wesley Chun's book: find the files in the
>> standard library modules that have doc strings, then find the ones that
>> don't, "the shame list". I came up with this to find the ones that do:
>> #!/usr/bin/python
>> import os
>> import glob
>> import fileinput
>> import re
>>
>> pypath = "/usr/lib/python2.6/"
>> fnames = glob.glob(os.path.join(pypath, '*.py'))
>>
>> def read_doc():
>>      pattern = re.compile('"""*\w')
>>      for line in fileinput.input(fnames):
>>          if pattern.match(line):
>>              print 'Doc String Found: ', fileinput.filename(), line
>>
>> read_doc()
> 
> It seems to me that your approach is moderately wrong ;-)
> 
>> There must have been an easier way :)
> 
> Not sure. As I see it, the problem is slightly more complicated. A module doc is a string (usually triple-quoted) placed before any code. But it must be closed, too.
> You'll have to skip blank and comment lines, then check whether the rest starts with a docstring. It could be done with a single complicated pattern, but you could also go for it step by step.
> Say I have a file 'dummysource.py' with the following text:
> ==============
> # !/usr/bin/env python
> # coding: utf8
> 
> # comment
> # ''' """
> 
> ''' foo module
> 	doc
> 	'''
> def foofunc():
> 	''' foofuncdoc '''
> 	pass
> ==============
> 
> Then, the following doc-testing code
> ==============
> import re
> doc = re.compile(r'(""".+?""")|(\'\'\'.+?\'\'\')', re.DOTALL)
> 
> def checkDoc(sourceFileName):
>     sourceFile = open(sourceFileName, 'r')
>     # move until first 'code' line
>     while True:
>         line = sourceFile.readline()
>         if line == '':
>             # end of file: only blank and comment lines, so no doc
>             sourceFile.close()
>             return False
>         strip_line = line.strip()
>         print "|%s|" % strip_line
>         if (strip_line != '') and (not strip_line.startswith('#')):
>             break
>     # check doc (keep last line read!)
>     source = line + sourceFile.read()
>     sourceFile.close()
>     result = doc.match(source)
>     if result is not None:
>         print "*** %s *******" % sourceFileName
>         print result.group()
>         return True
>     else:
>         return False
> 
> # pass the file name, not an already opened file object
> print checkDoc("dummysource.py")
> ==============
> 
> will output:
> 
> ==============
> |# !/usr/bin/env python|
> |# coding: utf8|
> ||
> |# comment|
> |# ''' """|
> ||
> |''' foo module|
> *** dummysource.py *******
> ''' foo module
> 	doc
> 	'''
> True
> ==============
> 
> It's just for illustration; you can probably make things simpler or find a better way.
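
One possible "better way", offered only as a sketch: let Python's own parser decide instead of a regular expression. The standard `ast` module has `ast.get_docstring()`, which returns the docstring of a parsed module or `None`. The function name `has_module_docstring` is made up for this example, it is written for Python 3 rather than the Python 2.6 used in this thread, and treating an unparsable file as "no doc" is an assumption.

```python
import ast


def has_module_docstring(path):
    """Return True if the file at `path` starts with a module docstring."""
    # Read bytes so that ast.parse can honour a "# coding:" line itself.
    with open(path, "rb") as source_file:
        source = source_file.read()
    try:
        tree = ast.parse(source, filename=path)
    except SyntaxError:
        # Assumption: a file that does not parse is reported as undocumented.
        return False
    # get_docstring() only looks at the first statement of the module,
    # so blank lines and comments before it are skipped automatically.
    return ast.get_docstring(tree) is not None
```

Because the parser does the work, comment lines that happen to contain quotes and docstrings that belong to functions are not mistaken for the module doc.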
> 
>> Now I have a problem: I cannot figure out how to compare the fnames
>> with the result of fileinput.filename() and get a list of any that don't
>> have doc strings.
> 
> You can use a func like the above one to filter out (or in) files that answer yes/no to the test.
> I would start with a list of all files, and just populate 2 new lists for "shame" and "fame" files ;-) according to the result of the test.
> 
> You could use list comprehension syntax, too:
>     fameFileNames = [fileName for fileName in fileNames if checkDoc(fileName)]
> But if you do this for shame files too, then every file gets tested twice.
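
A minimal sketch of the two-list idea, so that each file is tested exactly once. The name `split_fame_shame` and the `has_doc` parameter are invented for illustration; `has_doc` stands for any function that takes a file name and answers True or False, such as a doc test like the one discussed in this thread. The sketch is written so that it also runs on Python 3.

```python
def split_fame_shame(file_names, has_doc):
    """Split file_names into (fame, shame) lists with one test per file."""
    fame, shame = [], []
    for name in file_names:
        # has_doc is called once per file; the answer picks the target list.
        if has_doc(name):
            fame.append(name)
        else:
            shame.append(name)
    return fame, shame
```

It would be called as `fame, shame = split_fame_shame(fileNames, checkDoc)`, and both lists keep the order of the input list.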
> 
>> thanks
> 
> Denis
> ------
> la vita e estrany
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 
> 
Thanks Denis,
This seems to work OK;
#!/usr/bin/python
import os
import glob
import fileinput
import re

pypath = "/usr/lib/python2.6/"
fnames = glob.glob(os.path.join(pypath, '*.py'))
fnames.sort()
goodFiles = set()

def shame_list():
     # a line that starts with a triple quote counts as a doc string
     pattern = re.compile(r'"""|\'\'\'')
     for line in fileinput.input(fnames):
         if pattern.match(line):
             goodFiles.add(fileinput.filename())
     # build the result once, after the scan, instead of removing items
     # from fnames while looping over it
     shame = [name for name in fnames if name not in goodFiles]
     print 'Shame List: \n', shame
shame_list()

<returns>

Shame List:
['/usr/lib/python2.6/__phello__.foo.py', 
'/usr/lib/python2.6/collections.py', '/usr/lib/python2.6/md5.py', 
'/usr/lib/python2.6/pydoc_topics.py', '/usr/lib/python2.6/sha.py', 
'/usr/lib/python2.6/struct.py', '/usr/lib/python2.6/this.py']
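
The scripts in this thread hard-code `/usr/lib/python2.6/`, which only fits one particular installation. As a hedged sketch, the directory can instead be asked of the running interpreter through `sysconfig.get_paths()`, which is available from Python 2.7 and 3.2 on (so not in the 2.6 used here). The function name `stdlib_sources` is made up for this example.

```python
import glob
import os
import sysconfig


def stdlib_sources():
    """Return the sorted top-level .py files of the running interpreter's
    standard library directory."""
    # "stdlib" is the key under which sysconfig reports the directory
    # holding the pure-Python standard library modules.
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    return sorted(glob.glob(os.path.join(stdlib_dir, "*.py")))
```

The returned list can stand in for `fnames` in the scripts above. Like the original glob, it only covers the top-level directory and does not descend into packages.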


-- 
Powered by Gentoo GNU/Linux
http://linuxcrazy.com

