[issue2694] msilib file names check too strict ?

Mark Mc Mahon report at bugs.python.org
Sat Mar 26 13:24:19 CET 2011


Mark Mc Mahon <mtnbikingmark at gmail.com> added the comment:

How about the following patch and tests...

Per: http://msdn.microsoft.com/en-us/library/aa369212(v=vs.85).aspx
"""The Identifier data type is a text string. Identifiers may contain the
ASCII characters A-Z (a-z), digits, underscores (_), or periods (.). However, every identifier must begin with either a letter or an underscore."""

So, according to the spec, colons are NOT allowed. Editing some entries in the File table of an MSI (using Orca from the MSI SDK) and running validation confirms this.

All of the following were flagged as errors:
'KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]', 'TortoisePlinkEXE]', 'Hg.Cämd'
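
For illustration, the rule quoted from the spec can be written as a small check. This is only a sketch: the name is_valid_identifier is made up here and is not part of msilib or of the attached patch. It encodes just the quoted rule (ASCII letters, digits, underscores and periods, starting with a letter or an underscore).

```python
import re

# Pattern taken from the quoted Identifier rule: the first character is an
# ASCII letter or underscore, the rest are ASCII letters, digits, "_" or ".".
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.]*")


def is_valid_identifier(name):
    """Return True if name satisfies the quoted Identifier rule.

    Hypothetical helper for illustration only; not an msilib function.
    """
    return _IDENTIFIER_RE.fullmatch(name) is not None


# The names that Orca validation flagged as errors above.
flagged = ['KDiff3EXE;"ASDF@#$', 'chmFile-', 'pdfFile(', 'hgbook]',
           'TortoisePlinkEXE]', 'Hg.Cämd']
for name in flagged:
    print(repr(name), is_valid_identifier(name))
```

Under this check every name in the list above is rejected, as is any name containing a colon.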

I also did some speed testing (in case either the regex or the non-regex approach turned out to be slow):
Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from timeit import timeit
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit("re.sub(r'[^a-zA-Z_\.]', '_', 'somefilename.txt')", setup = "import re")
4.434621757767205
>>> setup = 'import string\nidentifier_chars = string.ascii_letters + string.digits + "._"\ntmp_str = []'
>>> timeit('"".join([c if c in identifier_chars else "_" for c in "somefilename.txt"])', setup)
3.3757537425069906
>>>
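
As a rough sketch of how the non-regex approach timed above could be turned into a sanitizing function: the name sanitize_identifier and the choice to prefix an underscore when the first character is a digit or a period are assumptions made for this example, not necessarily what the attached patch does. Note also that the regex in the timing above leaves digits out of its character class, so it would replace digits too; the version below keeps them, as the spec allows.

```python
import string

# Characters the quoted spec allows anywhere in an identifier.
IDENTIFIER_CHARS = string.ascii_letters + string.digits + "._"


def sanitize_identifier(name):
    """Map an arbitrary string to one that fits the Identifier rule.

    Hypothetical helper for illustration only; not an msilib function.
    """
    # Replace every character outside the allowed set with an underscore.
    cleaned = "".join(c if c in IDENTIFIER_CHARS else "_" for c in name)
    # An identifier must begin with a letter or an underscore, so prefix
    # one if the result is empty or starts with a digit or a period.
    if not cleaned or cleaned[0] in string.digits + ".":
        cleaned = "_" + cleaned
    return cleaned


for name in ["somefilename.txt", "chmFile-", "Hg.Cämd", "3rdparty.dll"]:
    print(repr(name), "->", repr(sanitize_identifier(name)))
```

A name that is already valid, such as somefilename.txt, passes through unchanged.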

----------
keywords: +patch
nosy: +markm
Added file: http://bugs.python.org/file21408/make_id_fix_and_test.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue2694>
_______________________________________

