At 01:29 PM 11/6/2005 -0800, Guido van Rossum wrote:
>On 11/6/05, Phillip J. Eby <pje(a)telecommunity.com> wrote:
> > At 12:58 PM 11/6/2005 -0800, Guido van Rossum wrote:
> > >The main way this breaks down is when comparing objects of different
> > >types. While most comparisons typically are defined in terms of
> > >comparisons on simpler or contained objects, two objects of different
> > >types that happen to have the same "key" shouldn't necessarily be
> > >considered equal.
> >
> > When I use this pattern, I often just include the object's type in the
> > key. (I call it the 'hashcmp' value, but otherwise it's the same pattern.)
>
>But how do you make that work with subclassing? (I'm guessing your
>answer is that you don't. :-)
By either changing the subclass __init__ to initialize it with a different
hashcmp value, or by redefining the method that computes it.
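A minimal sketch of the pattern described above, for illustration only. The class names (Point, LabeledPoint) and the method name _hashcmp() are hypothetical; only the term 'hashcmp' comes from this thread. The key includes the object's type, so objects of unrelated types with the same fields do not compare equal, and a subclass that wants to compare equal to its base redefines the method that computes the key.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def _hashcmp(self):
        # Include the type in the key so that different types with the
        # same field values are not considered equal.
        return (type(self), self.x, self.y)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._hashcmp() == other._hashcmp()

    def __hash__(self):
        return hash(self._hashcmp())


class LabeledPoint(Point):
    """Subclass that deliberately compares (and hashes) like a plain Point."""

    def __init__(self, x, y, label):
        super().__init__(x, y)
        self.label = label

    def _hashcmp(self):
        # Redefine the key computation: use the base type, ignore the label.
        return (Point, self.x, self.y)
```

A subclass that does not override _hashcmp() picks up type(self) in its key and so never compares equal to a plain Point, which is the default the type-in-key approach gives you.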
Patch / Bug Summary
___________________
Patches : 372 open ( -7) / 2980 closed (+12) / 3352 total ( +5)
Bugs : 908 open ( -2) / 5395 closed (+11) / 6303 total ( +9)
RFE : 200 open ( +0) / 191 closed ( +0) / 391 total ( +0)
New / Reopened Patches
______________________
CodeContext - Improved text indentation (2005-11-21)
http://python.org/sf/1362975 opened by Tal Einat
test_cmd_line expecting English error messages (2005-11-23)
CLOSED http://python.org/sf/1364545 opened by A.B., Khalid
Add reference for en/decode error types (2005-11-23)
CLOSED http://python.org/sf/1364946 opened by Wummel
[PATCH] mmap fails on AMD64 (2005-11-24)
http://python.org/sf/1365916 opened by Joe Wreschnig
Patches Closed
______________
zlib.crc32 doesn't handle 0xffffffff seed (2005-11-07)
http://python.org/sf/1350573 closed by akuchling
xml.dom.minidom.Node.replaceChild(obj, x, x) removes child x (2005-01-01)
http://python.org/sf/1094164 closed by akuchling
Patch for (Doc) #1255218 (2005-10-17)
http://python.org/sf/1328526 closed by birkenfeld
Patch for (Doc) #1261659 (2005-10-17)
http://python.org/sf/1328566 closed by birkenfeld
Patch for (Doc) #1357604 (2005-11-18)
http://python.org/sf/1359879 closed by birkenfeld
CallTip Modifications (2005-05-11)
http://python.org/sf/1200038 closed by kbk
ensure lock is released if exception is raised (2005-10-05)
http://python.org/sf/1314396 closed by bcannon
test_cmd_line expecting English error messages (2005-11-23)
http://python.org/sf/1364545 closed by doerwalter
ToolTip.py: fix main() function (2005-10-06)
http://python.org/sf/1315161 closed by kbk
Add reference for en/decode error types (2005-11-23)
http://python.org/sf/1364946 closed by doerwalter
solaris 10 should not define _XOPEN_SOURCE_EXTENDED (2005-06-27)
http://python.org/sf/1227966 closed by loewis
Solaris 10 fails to compile complexobject.c [FIX incl.] (2005-02-05)
http://python.org/sf/1116722 closed by loewis
New / Reopened Bugs
___________________
textwrap.dedent() expands tabs (2005-11-19)
http://python.org/sf/1361643 opened by Steven Bethard
Text.edit_modified() doesn't work (2005-11-20)
http://python.org/sf/1362475 opened by Ron Provost
Problem with tapedevices and the tarfile module (2005-11-21)
http://python.org/sf/1362587 opened by Henrik
spawnlp is missing (2005-11-21)
http://python.org/sf/1363104 opened by Greg MacDonald
A possible thinko in the description of os.chmod (2005-11-22)
CLOSED http://python.org/sf/1363712 opened by Evgeny Roubinchtein
urllib cannot open data: urls (2005-11-25)
CLOSED http://python.org/sf/1365984 opened by Warren Butler
Bug bz2.BZ2File(...).seek(0,2) (2005-11-25)
http://python.org/sf/1366000 opened by STINNER Victor
inoorrect documentation for optparse (2005-11-25)
http://python.org/sf/1366250 opened by Michael Dunn
SRE engine do not release the GIL (2005-11-25)
http://python.org/sf/1366311 opened by Eric Noyau
inspect.getdoc fails on objs that use property for __doc__ (2005-11-26)
http://python.org/sf/1367183 opened by Drew Perttula
Bugs Closed
___________
A possible thinko in the description of os.chmod (2005-11-22)
http://python.org/sf/1363712 closed by birkenfeld
docs need to discuss // and __future__.division (2001-08-08)
http://python.org/sf/449093 closed by akuchling
Prefer configured browser over Mozilla and friends (2005-11-17)
http://python.org/sf/1359150 closed by birkenfeld
Incorrect documentation of raw unidaq string literals (2005-11-17)
http://python.org/sf/1359053 closed by birkenfeld
"appropriately decorated" is undefined in MultiFile.push doc (2005-08-09)
http://python.org/sf/1255218 closed by birkenfeld
Tutorial doesn't cover * and ** function calls (2005-08-17)
http://python.org/sf/1261659 closed by birkenfeld
os.path.makedirs DOES handle UNC paths (2005-11-15)
http://python.org/sf/1357604 closed by birkenfeld
Exec Inside A Function (2005-04-06)
http://python.org/sf/1177811 closed by birkenfeld
Py_BuildValue k format units don't work with big values (2005-09-04)
http://python.org/sf/1281408 closed by birkenfeld
urllib cannot open data: urls (2005-11-25)
http://python.org/sf/1365984 closed by birkenfeld
imaplib: parsing INTERNALDATE (2003-03-06)
http://python.org/sf/698706 closed by birkenfeld
There's still more clean up work to go, but the current AST is
hopefully much closer to the behaviour before it was checked in.
There are still a few small memory leaks.
After running the test suite, the total references were around 380k
(down from over 1,000k). I'm not sure exactly what the total refs
were just before AST was checked in, but I believe it was over 340k.
So there are likely some more ref leaks that should be investigated.
It would be good to know the exact number before AST was checked in
and now, minus any new tests.
There is one memory reference error in test_coding:
Invalid read of size 1
at 0x41304E: tok_nextc (tokenizer.c:876)
by 0x413874: PyTokenizer_Get (tokenizer.c:1099)
by 0x411962: parsetok (parsetok.c:124)
by 0x498D1F: PyParser_ASTFromFile (pythonrun.c:1292)
by 0x48D79A: load_source_module (import.c:777)
by 0x48E90F: load_module (import.c:1665)
by 0x48ED61: import_submodule (import.c:2259)
by 0x48EF60: load_next (import.c:2079)
by 0x48F44D: import_module_ex (import.c:1921)
by 0x48F715: PyImport_ImportModuleEx (import.c:1955)
by 0x46D090: builtin___import__ (bltinmodule.c:44)
Address 0x1863E8F6 is 2 bytes before a block of size 8192 free'd
at 0x11B1BA8A: free (vg_replace_malloc.c:235)
by 0x4127DB: decoding_fgets (tokenizer.c:167)
by 0x412F1F: tok_nextc (tokenizer.c:823)
by 0x413874: PyTokenizer_Get (tokenizer.c:1099)
by 0x411962: parsetok (parsetok.c:124)
by 0x498D1F: PyParser_ASTFromFile (pythonrun.c:1292)
by 0x48D79A: load_source_module (import.c:777)
by 0x48E90F: load_module (import.c:1665)
by 0x48ED61: import_submodule (import.c:2259)
by 0x48EF60: load_next (import.c:2079)
by 0x48F44D: import_module_ex (import.c:1921)
by 0x48F715: PyImport_ImportModuleEx (import.c:1955)
by 0x46D090: builtin___import__ (bltinmodule.c:44)
I had a patch for this somewhere; I'll try to find it. However, it
only fixed this exact error, and there was another path that could
still be problematic.
Most of the memory leaks show up when we are forking in:
test_fork1
test_pty
test_subprocess
Here's what I have so far. There are probably some more. It would be
great if someone could try to find and fix these leaks.
n
--
16 bytes in 1 blocks are definitely lost in loss record 25 of 599
at 0x11B1AF13: malloc (vg_replace_malloc.c:149)
by 0x4CA102: alias (Python-ast.c:1066)
by 0x4CD918: alias_for_import_name (ast.c:2199)
by 0x4D0C4E: ast_for_stmt (ast.c:2244)
by 0x4D15E3: PyAST_FromNode (ast.c:234)
by 0x499078: Py_CompileStringFlags (pythonrun.c:1275)
by 0x46D6DF: builtin_compile (bltinmodule.c:457)
56 bytes in 1 blocks are definitely lost in loss record 87 of 599
at 0x11B1AF13: malloc (vg_replace_malloc.c:149)
by 0x4C9C92: Name (Python-ast.c:860)
by 0x4CE4BA: ast_for_expr (ast.c:1222)
by 0x4D1021: ast_for_stmt (ast.c:1900)
by 0x4D15E3: PyAST_FromNode (ast.c:234)
by 0x499078: Py_CompileStringFlags (pythonrun.c:1275)
by 0x46D6DF: builtin_compile (bltinmodule.c:457)
112 bytes in 2 blocks are definitely lost in loss record 198 of 674
at 0x11B1AF13: malloc (vg_replace_malloc.c:149)
by 0x4C9C92: Name (Python-ast.c:860)
by 0x4CE4BA: ast_for_expr (ast.c:1222)
by 0x4D1021: ast_for_stmt (ast.c:1900)
by 0x4D16D5: PyAST_FromNode (ast.c:275)
by 0x499078: Py_CompileStringFlags (pythonrun.c:1275)
by 0x46D6DF: builtin_compile (bltinmodule.c:457)
56 bytes in 1 blocks are definitely lost in loss record 89 of 599
at 0x11B1AF13: malloc (vg_replace_malloc.c:149)
by 0x4C9C92: Name (Python-ast.c:860)
by 0x4CF3AF: ast_for_arguments (ast.c:650)
by 0x4D1BFF: ast_for_funcdef (ast.c:830)
by 0x4D15E3: PyAST_FromNode (ast.c:234)
by 0x499161: PyRun_StringFlags (pythonrun.c:1275)
by 0x47B1B2: PyEval_EvalFrameEx (ceval.c:4221)
by 0x47CCCC: PyEval_EvalCodeEx (ceval.c:2739)
by 0x47ABCC: PyEval_EvalFrameEx (ceval.c:3657)
by 0x47CCCC: PyEval_EvalCodeEx (ceval.c:2739)
by 0x4C27F8: function_call (funcobject.c:550)
112 bytes in 2 blocks are definitely lost in loss record 189 of 651
at 0x11B1AF13: malloc (vg_replace_malloc.c:149)
by 0x4C9C92: Name (Python-ast.c:860)
by 0x4CE4BA: ast_for_expr (ast.c:1222)
by 0x4D02F7: ast_for_stmt (ast.c:2028)
by 0x4D16D5: PyAST_FromNode (ast.c:275)
by 0x499078: Py_CompileStringFlags (pythonrun.c:1275)
by 0x46D6DF: builtin_compile (bltinmodule.c:457)
56 bytes in 1 blocks are definitely lost in loss record 118 of 651
at 0x11B1AF13: malloc (vg_replace_malloc.c:149)
by 0x4C9A41: Num (Python-ast.c:751)
by 0x4CE578: ast_for_expr (ast.c:1237)
by 0x4CF4ED: ast_for_arguments (ast.c:629)
by 0x4D1BFF: ast_for_funcdef (ast.c:830)
by 0x4D15E3: PyAST_FromNode (ast.c:234)
by 0x499161: PyRun_StringFlags (pythonrun.c:1275)
by 0x47B1B2: PyEval_EvalFrameEx (ceval.c:4221)
by 0x47CCCC: PyEval_EvalCodeEx (ceval.c:2739)
by 0x47ABCC: PyEval_EvalFrameEx (ceval.c:3657)
by 0x47CCCC: PyEval_EvalCodeEx (ceval.c:2739)
by 0x4C27F8: function_call (funcobject.c:550)
112 (56 direct, 56 indirect) bytes in 1 blocks are definitely lost in loss record 185 of 651
at 0x11B1AF13: malloc (vg_replace_malloc.c:149)
by 0x4C97CA: GeneratorExp (Python-ast.c:648)
by 0x4CEE4F: ast_for_expr (ast.c:1251)
by 0x4D1021: ast_for_stmt (ast.c:1900)
by 0x4D16D5: PyAST_FromNode (ast.c:275)
by 0x499078: Py_CompileStringFlags (pythonrun.c:1275)
by 0x46D6DF: builtin_compile (bltinmodule.c:457)
1024 bytes in 1 blocks are definitely lost in loss record 441 of 651
at 0x11B1AF13: malloc (vg_replace_malloc.c:149)
by 0x43F8C4: PyObject_Malloc (obmalloc.c:500)
by 0x4B808F: PyNode_AddChild (node.c:95)
by 0x4B8386: PyParser_AddToken (parser.c:126)
by 0x411944: parsetok (parsetok.c:165)
by 0x499062: Py_CompileStringFlags (pythonrun.c:1271)
by 0x46D6DF: builtin_compile (bltinmodule.c:457)
Jim Jewett wrote:
>Do you have the code that caused problems?
>
>
Yes. I was able to reproduce his trouble and was trying to debug it.
>The things I would check first are
>
>(1) Is he allocating (peak usage) a type (such as integers) that
>never gets returned to the free pool, in case you need more of that
>same type?
>
>
No, I don't think so.
>(2) Is he allocating new _types_, which I think don't get properly
>collected.
>
>
Bingo. Yes, definitely allocating new _types_ (an awful lot of them...)
--- that's what the "array scalars" are: new types created in C. If
they don't get properly collected then that would definitely have
created the problem. It would seem this should be advertised when
telling people to use PyObject_New for allocating new memory for an object.
>(3) Is there something in his code that keeps a live reference, or at
>least a spotty memory usage so that the memory can't be cleanly
>released?
>
>
>
No, that's where I thought the problem was, at first. I spent a lot of
time tracking down references. What finally convinced me it was the
Python memory manager was when I re-wrote the tp_alloc functions of the
new types to use the system malloc instead of PyObject_Malloc. As
soon as I did this the problems disappeared and memory stayed constant.
Thanks for your comments,
-Travis
While running regrtest with -R to find reference leaks I found a usage
issue. When a codec is registered it is stored in the interpreter
state and cannot be removed. Since it is stored as a list, if you
repeatedly add the same search function, you will get duplicates in the
list and they can't be removed. This shows up as a reference leak
(which it really isn't) in test_unicode with this code modified from
test_codecs_errors:
import codecs

def search_function(encoding):
    def encode1(input, errors="strict"):
        return 42
    return (encode1, None, None, None)

codecs.register(search_function)
###
Should the search function be added to the search path if it is
already in there? I don't understand a benefit of having duplicate
search functions.
Should users have access to the search path (through a
codecs.unregister())? If so, should it search from the end of the
list to the beginning to remove an item? That way the last entry
would be removed rather than the first.
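The two behaviours asked about above can be sketched with a stand-in list instead of the interpreter's real search path. Everything here is hypothetical: the SearchPath class and its register/unregister/lookup methods are illustrative names, not the codecs module's API, and the sketch assumes that "same search function" means the same object (identity).

```python
class SearchPath:
    """Toy stand-in for the interpreter's list of codec search functions."""

    def __init__(self):
        self._functions = []

    def register(self, search_function):
        if not callable(search_function):
            raise TypeError("argument must be callable")
        # Skip functions that are already present, so registering the
        # same function repeatedly does not grow the list.
        if not any(f is search_function for f in self._functions):
            self._functions.append(search_function)

    def unregister(self, search_function):
        # Scan from the end of the list so the most recently added
        # entry is the one removed. Returns True if something was removed.
        for i in range(len(self._functions) - 1, -1, -1):
            if self._functions[i] is search_function:
                del self._functions[i]
                return True
        return False

    def lookup(self, encoding):
        # Ask each search function in registration order.
        for f in self._functions:
            result = f(encoding)
            if result is not None:
                return result
        raise LookupError("unknown encoding: %s" % encoding)
```

With duplicates refused at registration time, the end-to-start scan in unregister() only matters if duplicates are allowed after all; it is kept here to show the ordering the question describes.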
n
On Thursday 24 November, Donovan Baarda wrote:
> I don't know if this will help, but in my experience compiling re's
> often takes longer than matching them... are you sure that it's the
> match and not a compile that is taking a long time? Are you using
> pre-compiled re's or are you dynamically generating strings and using
> them?
It's definitely matching time. The res are all pre-compiled.
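For anyone wanting to check the same thing on their own patterns, here is a rough sketch that times compilation and matching separately. The helper name, the pattern and the input text are made up for illustration; the real patterns from this thread are not shown anywhere in it.

```python
import re
import time


def time_compile_and_match(pattern, text):
    """Return (compile_seconds, match_seconds, match_or_None)."""
    # Clear the re module's internal cache so the compile step is
    # actually measured rather than served from the cache.
    re.purge()
    t0 = time.perf_counter()
    compiled = re.compile(pattern)
    t1 = time.perf_counter()
    match = compiled.search(text)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, match


# Illustrative input only: a long run of non-matching text, then a match.
compile_s, match_s, m = time_compile_and_match(r"(\d+)-(\d+)",
                                               "a" * 10000 + "12-34")
print("compile: %.6fs  match: %.6fs" % (compile_s, match_s))
```

A single timing like this is noisy; repeating the measurement a few times gives a better picture of which step dominates.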
[...]
> > A quick look at the code in _sre.c suggests that for most of the time,
> > no Python objects are being manipulated, so the interpreter lock could
> > be released. Has anyone tried to do that?
>
> probably not... not many people would have several-minutes-to-match
> re's.
>
> I suspect it would be do-able... I suggest you put together a patch and
> submit it on SF...
The thing that scares me about doing that is that there might be
single-threadedness assumptions in the code that I don't spot. It's the
kind of thing where a patch could appear to work fine, but then
mysteriously fail due to some occasional race condition. Does anyone
know if there is any global state in _sre that would prevent it
being re-entered, or know for certain that there isn't?
Cheers,
Duncan.
--
-- Duncan Grisby --
-- duncan(a)grisby.org --
-- http://www.grisby.org --
I know (thanks to Google) that much has been said in the past about the
Python Memory Manager. My purpose in posting is simply to give a
use-case example of how the current memory manager (in Python 2.4.X) can
be problematic in scientific/engineering code.
Scipy core is a replacement for Numeric. One of the things scipy core
does is define a new Python scalar object for every data type that an
array can have (currently 21). This has many advantages and is made
feasible by the ability of Python to subtype in C. These scalars all
inherit from the standard Python types where there is a correspondence.
More to the point, however, these scalar objects were allocated using
the standard PyObject_New and PyObject_Del functions which of course use
the Python memory manager. One user ported his (long-running) code to
the new scipy core and found much to his dismay that what used to
consume around 100MB now completely dominated his machine consuming up
to 2GB of memory after only a few iterations. After searching many
hours for memory leaks in scipy core (not a bad exercise anyway as some
were found), the real problem was tracked to the fact that his code
ended up creating and destroying many of these new array scalars.
The Python memory manager was not reusing memory (even though
PyObject_Del was being called). I don't know enough about the memory
manager to understand why that was happening. However, changing the
allocation from PyObject_New to malloc and from PyObject_Del to free,
fixed the problems this user was seeing. Now the code runs for a long
time consuming only around 100MB at-a-time.
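As a rough way to watch this kind of create-and-destroy churn from Python code, the sketch below uses the tracemalloc module. Note the assumptions: tracemalloc only exists in much later Python versions (3.4 and up), not in the 2.4 series discussed here; the Scalar class is a hypothetical stand-in for an array scalar; and tracemalloc reports memory the interpreter has handed out to objects, not the process size seen by the operating system, so it cannot by itself show an allocator holding on to freed memory.

```python
import tracemalloc


class Scalar:
    """Hypothetical stand-in for a small, short-lived scalar object."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


def churn(iterations, batch):
    """Create and drop `batch` objects per iteration.

    Returns (current_bytes, peak_bytes) as reported by tracemalloc.
    """
    tracemalloc.start()
    try:
        for _ in range(iterations):
            objs = [Scalar(i) for i in range(batch)]
            del objs  # every object in the batch becomes garbage here
        return tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()


current, peak = churn(iterations=5, batch=10000)
print("current: %d bytes, peak: %d bytes" % (current, peak))
```

If objects are being released properly, the current figure after the loop should sit well below the peak; comparing that against the process size reported by the operating system is what would reveal memory the allocator keeps for itself.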
Thus, all of the objects in scipy core now use system malloc and system
free for their memory needs. Perhaps this is unfortunate, but it was
the only solution I could see in the short term.
In the long term, what is the status of plans to re-work the Python
Memory manager to free memory that it acquires (or improve the detection
of already freed memory locations)? I see from other postings that this
has been a problem for other people as well. Also, is there a
recommended way for dealing with this problem other than using system
malloc and system free (or I suppose writing your own specialized memory
manager)?
Thanks for any feedback,
-Travis Oliphant