[Python-Dev] Typo.pl scan of Python 2.5 source code

Johnny Lee typo_pl at hotmail.com
Sat Sep 23 00:46:32 CEST 2006

Hello,

My name is Johnny Lee. I have developed a *ahem* Perl script which scans C/C++ source files for typos.

I ran the typo.pl script on the released Python 2.5 source code. The scan took about two minutes and produced ~340 typos. After I spent about 13 minutes weeding out the obvious false positives, 149 typos remained.

One of the pros/cons of the script is that it doesn't need to be integrated into the build process to work. It just searches for files with typical C/C++ source code file extensions and scans them. The downside is that if a source file is not included in the build process, then the script is scanning an irrelevant file. Unless you aid the script via some parameters, it will scan all the code, even code inside #ifdefs that wouldn't normally be compiled.

You can access the list of typos from <http://www.geocities.com/typopl/typoscan.htm>

The Perl 1999 paper can be read at <http://www.geocities.com/typopl/index.htm>

I've mapped the Python memory-related calls PyMem_Malloc, PyMem_Realloc, etc. to the same behaviour as the C std library malloc, realloc, etc., since Include\pymem.h seems to map them to those calls. If that assumption is not valid, then you can ignore typos that involve those PyMem_XXX calls.

The Python 2.5 typos can be classified into 7 types.

1) if (X = 0)
Assignment within an if statement. Typically a false positive, but sometimes it catches something valid.
In Python's case, the one typo is:
    if (status = ERROR_MORE_DATA)
but the previous code statement returns an error code into the status variable, so the assignment overwrites that result. A comparison (==) was presumably intended.

2) realloc overwrite src if NULL, e.g. p = realloc(p, new_size);
If realloc() fails, it will return NULL. If you assign the return value to the same variable you passed into realloc, then you've overwritten the variable and possibly leaked the memory that the variable pointed to.

3) if (CreateFileMapping == IHV)
On Win32, the CreateFileMapping() API will return NULL on failure, not INVALID_HANDLE_VALUE (IHV). The Python code does not check for NULL though.

4) if ((X!=0) || (X!=1))
The problem with code of this type is that the condition is always true: X cannot equal both values at once, so at least one of the two comparisons always succeeds.
In the Python case, we have in a large if statement:
    quotetabs && ((data[in]!='\t')||(data[in]!=' '))
Now if data[in] == '\t', then it will fail the first comparison but it will pass the second comparison.
Typically you want "&&", not "||".

5) using API result w/no check
There are several APIs that should be checked for success before using the returned ptrs/cookies, e.g. malloc, realloc, and fopen among others.

6) XX;;
Just being pedantic here. Two semicolons in a row. The second one is extraneous.

7) extraneous test for non-NULL ptr
Several memory calls that free memory accept NULL ptrs. So testing for NULL before calling them is redundant and wastes code space.
Now some codepaths may be time-critical, but probably not all, and smaller code usually helps.

If you have any questions or comments, feel free to email. I hope this scan is useful.

Thanks for your time,
J
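
For illustration only, here is a minimal C sketch of typo types 2 and 4 described above. The function names (grow_buffer, flagged_test, neither_tab_nor_space) are hypothetical helpers made up for this example; they are not taken from the Python source, and the sketch does not claim to be the right fix for any particular Python file.

```c
#include <assert.h>
#include <stdlib.h>

/* Type 2: grow a buffer without losing the old pointer on failure.
 * grow_buffer is a hypothetical helper, not a Python or libc API.
 * new_size is assumed to be nonzero.
 * Returns 0 on success, -1 on failure; on failure *buf is unchanged. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp;

    assert(buf != NULL);
    tmp = realloc(*buf, new_size);  /* keep the result in a temporary */
    if (tmp == NULL)
        return -1;                  /* *buf is still valid; caller can free it */
    *buf = tmp;
    return 0;
}

/* Type 4: the flagged form. No character equals both '\t' and ' ',
 * so at least one comparison succeeds and the result is always 1. */
int flagged_test(char c)
{
    return (c != '\t') || (c != ' ');
}

/* "Neither a tab nor a space" needs && instead of ||. */
int neither_tab_nor_space(char c)
{
    return (c != '\t') && (c != ' ');
}
```

A caller would write something like "if (grow_buffer(&p, n) != 0) { free(p); return error; }", so the original block is released rather than leaked when realloc fails.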
