[It may be worth creating a patch; I think most of these comments
would be better on the bug-tracker.]
(1) In a few cases, it looked like you were changing parameter names
between "files" and "filenames". This might break code that was
calling it with keyword arguments -- as I typically would for this
type of function.
(1a) If you are going to change the .sig, you might as well do it
right, and make the default be "knownfiles" rather than the empty
tuple.
(2) The comment about why inited was set true at the beginning of the
function instead of the end should probably be kept, or at least
reworded.
(3) Default values:
(3a) Why the list of known files going back to Apache 1.2, in that
order? Is there any risk in using too *new* of a MimeTypes file?
I would assume that the goal is to pick up whatever changes the user
has made locally, but in that case, it still makes sense to have the
newest file be the last one read, in case Apache has made bugfixes.
(3b) Also, this would improve cross-platform consistency; if I read
that correctly, the Apache files will override the python defaults on
unix or a mac, but not on windows. That will change the results on
the majority of items in _common_types. (application vs text, whether
to put an x- in front of the word pict.)
(3c) rtf is listed in non-standard, but
http://www.iana.org/assignments/media-types/ does define it. (Though
whether to guess application vs text is not defined, and python
chooses differently from apache.)
(3d) jpg is listed as non-standard. It turns out that this is just
for the inverse mapping, where image/jpg is non-standard (for
image/jpeg) but that is worth a comment. (see #5)
(3e) In _types_map, the lines marked duplicates are duplicate keys,
not duplicate values; it would be more clear to also comment out the
(first) line itself, instead of just marking it a duplicate. (Or
better yet, to mention that it is just being added for the inverse
mapping, if that is the case.)
(4) Why bother to lazyinit? Is there any sane usecase for a
MimeTypes that hasn't been inited?
I see value in not reading the default files, but none in not reading
at least the files that were asked for. I could see value in only
partial initialization if there were several long steps, but right
now, initialization is all-or-nothing.
If the thing is useless without an init, then it makes sense to just
get done it immediately and skip the later checks; anyone who could
have actually saved time should just remove the import.
-jJ
John Machin wrote:
> Hi Matthew,
>
> Your post in c.l.py about your re rewrite didn't mention where to report
> bugs etc so I dug this address out of Google Groups ...
>
> Environment: Python 2.6.2, Windows XP SP3, your latest (29 July) regex
> from the Python bugtracker.
>
> Problem is repeated calls of e.g. compiled_pattern.search(some_text) --
> Task Manager performance panel shows increasing memory usage with regex
> but not with re. It appears to be cumulative i.e. changing to another
> pattern or text doesn't release memory.
>
> Example:
>
> 8<-- regex_timer.py
> import sys
> import time
> if sys.platform == 'win32':
> timer = time.clock
> else:
> timer = time.time
> module = __import__(sys.argv[1])
> count = int(sys.argv[2])
> pattern = sys.argv[3]
> expected = sys.argv[4]
> text = 80 * '~' + 'qwerty'
> rx = module.compile(pattern)
> t0 = timer()
> for i in xrange(count):
> assert rx.search(text).group(0) == expected
> t1 = timer()
> print "%d iterations in %.6f seconds" % (count, t1 - t0)
> 8<---
>
> Here are the results of running this (plus observed difference between
> peak memory usage and base memory usage):
>
> dos-prompt>\python26\python regex_timer.py regex 1000000 "~" "~"
> 1000000 iterations in 3.811500 seconds [60 Mb]
>
> dos-prompt>\python26\python regex_timer.py regex 2000000 "~" "~"
> 2000000 iterations in 7.581335 seconds [128 Mb]
>
> dos-prompt>\python26\python regex_timer.py re 2000000 "~" "~"
> 2000000 iterations in 2.549738 seconds [3 Mb]
>
> This happens on a variety of patterns: "w", "wert", "[a-z]+", "[a-z]+t",
> ...
>
Thanks for that, John. I've should've kept an eye on the Task Manager!
:-) Now fixed.
It's surprising how much time and effort is needed just to manage the
memory!
Please review this, I'm worried that there are cases where convertitem() is
returning a string that really should be overridden by the argument "help
string". However, I'm worried that this change will get rid of useful
messages (via the format "; help string"), when there otherwise wouldn't
be.
PyArg_ParseTuple when handling a string format "s", raises a TypeError
when the passed string contains a NUL. However, the TypeError doesn't
contain useful information.
For example:
syslog.syslog('hello\0there')
TypeError: [priority,] message string
This seems to be a thinko in Python/getargs.c at line 331:
msg = convertitem(PyTuple_GET_ITEM(args, i), &format, p_va,
flags, levels, msgbuf,
sizeof(msgbuf), &freelist);
if (msg) {
seterror(i+1, msg, levels, fname, message); <<< Line 331
return cleanreturn(0, freelist);
}
This also applies to Python 3 trunk in line 390.
I think that's supposed to be "msg" instead of "message" in the last
argument. If I change it, I get:
>>> import syslog; syslog.syslog('hello\0there')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: must be string without null bytes, not str
I think it's safe to change "message" to "msg" because:
"message" is the "help" string "[priority,] message string",
but "msg" contains much more useful information.
"msg" is in the "fall back if the last argument doesn't contain useful
information" argument position, but "msg" is never NULL.
"message" only is NULL if the format string doesn't contain ";".
In every case I can find in the code, convertitem() is returning a
much more useful string than the help string.
Or perhaps it should do something like:
if (msg) {
seterror(i+1, msg, levels, fname, '%s (%s)' % ( msg, message ));
Pardon my mixed C+Python, but you get the idea.
Thoughts?
Thanks,
Sean
--
[...] Premature optimization is the root of all evil.
-- Donald Knuth
Sean Reifschneider, Member of Technical Staff <jafo(a)tummy.com>
tummy.com, ltd. - Linux Consulting since 1995: Ask me about High Availability