[Taking this to email. Carrying out discussions via the SF bug tracker is too cumbersome.]
Comment By: Tim Peters (tim_one)
> > I had to change _PyWeakref_ClearRef() since it was also
> > clearing the weakref list of the trash object.
> That was really its *purpose*. If a trash weakref with a
> callback isn't removed from the referent's list of weakrefs,
> then the callback will trigger when PyObject_ClearWeakRefs()
> is invoked on the referent. The purpose of
> _PyWeakref_ClearRef() was to ensure that the callback never
But it's okay if the callback triggers, as long as the callback
doesn't reference trash.
> > Now it just sets wr_object to Py_None.
> That won't stop the callback from triggering. It also means
> (see earlier comment) that PyObject_ClearWeakRefs() will
> never remove the weakref from the list either, although I'm
> not sure that does real harm.
I'm trying to figure out PyObject_ClearWeakRefs() right now.
> > I also made some serious simplifications to gcmodule by
> > just treating trash weakref objects with callbacks the same
> > as objects with __del__ methods (i.e. move them to the
> > finalizers list and then do the transitive closure of that set).
> Does that mean they can end up in gc.garbage too? If so, I
> don't think that's sellable.
I think so. That can be easily changed though. What we can't do is
invoke those callbacks.
> I finally decided to stop postponing doing this. I'll be running a bug day on
> November 7. I'll be in our office with some coworkers: any Dutch Pythoneers
> welcome to join us for some 'real life' Python hacking!
Just a thought: should we consider doing a "patch review day" someday instead
of a bug day? At a time like this we might be dividing patches into categories
like "apply for 2.4", "apply for 2.5", "reject", "needs work", and
-- Michael Chermside
The first beta is out, so the trunk is unfrozen, and available for checkins.
Now that we're in beta, we shouldn't see any new features or behaviour-changing
fixes going into the trunk, unless it's been seen and agreed
to on python-dev.
I currently plan for a second beta in either 2 or 3 weeks, and then,
assuming all goes well, a release candidate a couple of weeks after
(apologies for the pause in getting the release email out - an
unexpectedly hectic weekend popped up without notice)
I quite agree that the documentation for the logging package can be improved. As others have said in response to your post, you can definitely help by indicating more specifically where you find the documentation lacking, with a patch if possible. For example, the length prefix for pickles that your initial post asked about is documented in the docstring for SocketHandler.makePickle():
"Pickles the record in binary format with a length prefix, and returns it ready for transmission across the socket."
I agree that the documentation does not mention specifically that the length is encoded as four bytes, or that it is packed using struct.pack(), or exactly how you unpack it. I also agree that more examples would be helpful. I will endeavour to improve the situation insofar as time allows. Patches and specific suggestions from users, especially new users like you, will be a definite help to me.
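To make the wire format concrete: SocketHandler.makePickle() prefixes the pickled LogRecord dict with a 4-byte big-endian length packed via struct.pack(">L", ...). The sketch below is a simplified illustration of both ends (it skips the exc_info/args handling the real method does):

```python
import logging
import pickle
import struct

def make_pickle(record):
    # Simplified version of what SocketHandler.makePickle() produces:
    # a 4-byte big-endian length prefix, then the pickled record dict.
    data = pickle.dumps(record.__dict__, 1)
    return struct.pack(">L", len(data)) + data

def read_record(payload):
    # Receiver side: strip the 4-byte prefix before unpickling.
    (length,) = struct.unpack(">L", payload[:4])
    return logging.makeLogRecord(pickle.loads(payload[4:4 + length]))

record = logging.LogRecord("demo", logging.INFO, __file__, 1,
                           "hello", None, None)
payload = make_pickle(record)
print(read_record(payload).getMessage())  # hello
```

This is why a receiver that calls pickle.loads(data[4:]) works: the first four bytes are the length, not part of the pickle.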
Tim Peters wrote:
> Anyone have a bright idea? It's remarkable how long we've managed to go
> without noticing that everything is disastrously broken here <0.9 wink>.
Sure. Clearing cyclic trash can call Python code. If there
are weakrefs to any of the cyclic trash, then those weakrefs can
be used to resurrect the objects. Therefore, *before* clearing cyclic
trash, we need to remove any weakrefs. If any of the weakrefs
being removed have callbacks, then we need to save the callbacks
and call them *after* all of the weakrefs have been cleared.
Jim Fulton mailto:firstname.lastname@example.org Python Powered!
CTO (540) 361-1714 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org
Neal Becker writes:
> There is only a single example in the logging module documentation.
> add more examples!
You are completely right... I am sure that those docs need to be improved.
Would you be willing to write up some improved docs? If so, submit them
to the patch manager on SourceForge, and we'll incorporate them into the
next possible release (probably Python 2.5). If not, then it may have to
wait until someone else has time to address it.
-- Michael Chermside
I've noticed a potential bug and wanted to see if anyone else can
replicate it, or has also seen it. When creating a socket with
getaddrinfo, with the address family set to AF_UNSPEC and the host set to
localhost, the socket binds to IPv4 instead of IPv6. When using the FQDN of
the machine it binds to IPv6. I've tested the same code on two different
machines; the one running Python 2.3.3 works as it should, but the
Python 2.3.4 machine doesn't.
Anyone seen this? Can you replicate the error? Any help would be
appreciated. Oh yeah, I'm running on FC2 on the 2.3.4 machine and redhat
8 on the 2.3.3 machine.
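One way to narrow this down is to look at what getaddrinfo() itself returns for each hostname, since most binding code simply uses the first entry in the list. A small sketch (port number is arbitrary):

```python
import socket

def candidate_families(host, port=9999):
    # List the address families getaddrinfo() offers for this host.
    # The ordering matters: callers usually bind to the first result,
    # so it decides whether you end up on IPv4 or IPv6.
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                               socket.SOCK_STREAM, 0, socket.AI_PASSIVE)
    return [family.name for family, *_ in infos]

print(candidate_families("localhost"))
# Compare with candidate_families(socket.getfqdn()) to see whether the
# ordering differs between the two names, as described above.
```

If the two machines' resolvers (or /etc/hosts entries for localhost) differ, the returned ordering will differ too, independent of the Python version.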
There is only a single example in the logging module documentation. Please
add more examples!
I'm trying to use DatagramHandler. The documentation on this is not
adequate. There is a confusing mention of makeLogRecord. No example.
I finally stumbled across this:
data, address = sock.recvfrom (8192)
rec = logging.makeLogRecord (pickle.loads(data[4:]))
I can't find _anywhere_ a mention that you need to leave off the first 4
bytes in order for unpickle to work.
Last December, we had a short thread discussing the integration of
PySQLite into Python 2.4. At the time, I was against inclusion,
because I thought PySQLite was not ripe for it, mostly because I
thought the API was not stable.
Now, I have started writing a new PySQLite module, which has the
following key features:
- Uses iterator-style SQLite 3.x API: sqlite3_prepare(), sqlite3_step()
etc. This way, it is possible to use prepared statements, and for
large resultsets, it requires less memory, because the whole
resultset isn't fetched into memory at once any longer.
- Completely incompatible with the SQLite 0.x/1.x API: I'm free to
create a much better API now.
- "In the face of ambiguity, refuse the temptation to guess." -
PySQLite 1.x tries to "guess" which Python type to convert to. It's
pretty good at it, because it queries the column type information.
This works for, I'd say, at least 90% of all cases. But as soon as
you use anything fancy like functions, aggregates or expressions in
SQL, the _typeless_ nature of SQLite breaks through and it will tell
us nothing about the declared column type (of course, because the
data is not coming from a database column).
So I decided to change the default behaviour and make PySQLite
typeless by default, too. Everything will be returned as a Unicode
string (the default might be user-configurable per connection).
Unless, unless of course the user explicitly activates the
"guess-mode" ;-) But to do so, she must read the docs then she will
be aware of the fact that it only works in 90 % of all cases.
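For comparison, the module that eventually shipped in the stdlib as sqlite3 took the same route: by default a value comes back with SQLite's own storage type, and declared-column-type conversion is opt-in. A sketch using the modern sqlite3 (the DATE converter shown is one of its built-in ones):

```python
import datetime
import sqlite3

# Default behaviour: the value comes back as whatever SQLite stored
# (text, in this case), ignoring the declared column type.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (d DATE)")
con.execute("INSERT INTO t VALUES ('2004-11-07')")
(value,) = con.execute("SELECT d FROM t").fetchone()
print(type(value))  # <class 'str'>

# Opt-in "guess mode": honour the declared column type via converters.
con2 = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con2.execute("CREATE TABLE t (d DATE)")
con2.execute("INSERT INTO t VALUES ('2004-11-07')")
(value2,) = con2.execute("SELECT d FROM t").fetchone()
print(type(value2))  # <class 'datetime.date'>
```

The failure mode described above is visible here too: an expression like SELECT d || '' has no declared type, so no converter can fire and the result is a plain string regardless of the mode.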
So why am I bothering you about this?
I think that a simple embedded relational database would be a good
thing to have in Python by default. And as Python 2.5 won't happen
anytime soon, there's plenty of time for developing it, getting it
stable, and integrating it.
Especially those of you that have used PySQLite in the past, do you
have any suggestions that would make the rewrite a better candidate
for inclusion into Python?
One problem I see is that even the new PySQLite will grow and try to
wrap much of the SQLite API that is not directly related to the
DB-API. If such a thing is too complicated/big for the standard
library, then maybe it would be better to produce a much simpler
PySQLite, especially for the Python standard library that leaves all
the fancy stuff out. My codename would be "embsql".
So, what would you like to see? "import sqlite", "import embsql", or
I know that this has been discussed a bit in the past, but I was hoping
that some Python gurus could shed some light on this issue, and maybe
let me know if there are any plans for solving this problem. I know a
hack that might work, but there must be a better way to solve this problem.
The short version of the problem is that obmalloc.c never frees memory.
This is a great strategy if the application runs for a short time then
quits, or if it has fairly constant memory usage. However, applications
with very dynamic memory needs and that run for a long time do not
perform well because Python hangs on to the peak amount of memory
required, even if that memory is only required for a tiny fraction of
the run time. With my application, I have a Python process which occupies
1 GB of RAM for ~20 hours, even though it only uses that 1 GB for about
5 minutes. This is a problem that needs to be addressed, as it
negatively impacts the performance of Python when manipulating very
large data sets. In fact, I found a mailing list post where the poster
was looking for a workaround for this issue, but I can't find it now.
Some posts to various lists have stated that this is not a real
problem because virtual memory takes care of it. This is fair if you
are talking about a couple megabytes. In my case, I'm talking about
~700 MB of wasted RAM, which is a problem. First, this is wasting space
which could be used for disk cache, which would improve the performance
of my system. Second, when the system decides to swap out the pages
that haven't been used for a while, they are dirty and must be written
to swap. If Python ever wants to use them again, they will be brought
in from swap. This is much worse than informing the system that the
pages can be discarded, and allocating them again later. In fact, the
other native object types (ints, lists) seem to realize that holding on
to a huge amount of memory indefinitely is a bad strategy, because they
explicitly limit the size of their free lists. So why is this not a
good idea for other types?
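The block-level side of this behaviour can be probed from Python itself with sys.getallocatedblocks(), which was added later (Python 3.4). A sketch: blocks are returned to obmalloc promptly when objects die; whether their arenas are ever returned to the OS is the separate question raised here.

```python
import gc
import sys

# Measure pymalloc's live block count before, during, and after a
# burst of small-object allocation.
base = sys.getallocatedblocks()
data = [object() for _ in range(100_000)]
peak = sys.getallocatedblocks()
del data
gc.collect()
after = sys.getallocatedblocks()
print(peak - base, after - base)  # large spike, then back near zero
```

The block count drops back near the baseline, showing the objects' memory went back to the allocator; it says nothing about whether the arenas holding those blocks were handed back to the operating system, which is exactly the gap described above.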
Does anyone else see this as a problem? Does anyone think this is not a problem?
- Python's memory allocator should occasionally free memory if the
memory usage has been relatively constant, and has been well below the
amount of memory allocated. This will incur additional overhead to free
the memory, and additional overhead to reallocate it if the memory is
needed again quickly. However, it will make Python co-operate nicely
with other processes, and a clever implementation should be able to
reduce the overhead.
- I do not completely understand Python's memory allocator, but from
what I see, it will not easily support this.
I've been playing with the fact that the "collect" function in the gc
module already gets called occasionally. Whenever it gets called for a
level 2 collection, I've hacked it to call a cleanup function in
obmalloc.c. This function goes through the free pool list, reorganizes
it to decrease memory fragmentation and decides based on metrics
collected from the last run if it should free some memory. It currently
works fine, except that it will permit the arena vector to grow
indefinitely, which is also bad for a long running process. It is also
bad because these cleanups are relatively slow as they touch every free
page that is currently allocated, so I'm trying to figure out a way to
integrate them more cleanly into the allocator itself.
This also requires that nothing call the allocation functions while
this is happening. I believe that this is reasonable, considering that
it is getting called from the cyclical garbage collector, but I don't
know enough about Python internals to figure that out.
Eventually, I hope to do some benchmarks and figure out if this is
actually a reasonable strategy. However, I was hoping to get some
feedback before I waste too much time on this.
Evan Jones: http://evanjones.ca/
"Computers are useless. They can only give answers" - Pablo Picasso