I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
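A rough illustration of the "permanent" idea, as a pure-Python model rather than anything in CPython: the `Obj` class, the `IMMORTAL` sentinel, and the `incref`/`decref` helpers are all hypothetical names invented for this sketch.

```python
IMMORTAL = -1  # hypothetical sentinel value marking a permanent object


class Obj:
    """Toy stand-in for an object header with a refcount field."""

    def __init__(self, permanent=False):
        self.refcount = IMMORTAL if permanent else 1


def incref(obj):
    if obj.refcount == IMMORTAL:
        return  # no-op: nothing is written, so nothing needs locking
    obj.refcount += 1


def decref(obj):
    """Return True if the object just became garbage."""
    if obj.refcount == IMMORTAL:
        return False  # permanent objects never go away
    obj.refcount -= 1
    return obj.refcount == 0
```

The point of the early return is that a permanent object's memory is only ever read, never written.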
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts would need to be checked to see if they
are zero. When the owning thread decrements the local
refcount to zero, it can check the global refcount
without locking, since a zero will never be written
to it until there truly are no non-local references
remaining.
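The scheme above might be modelled like this. It is only a sketch in Python, assuming a per-object lock for the global count; `Obj`, `incref`, and `decref` are hypothetical names, and the model ignores the memory-ordering questions a real C implementation would have to settle.

```python
import threading


class Obj:
    """Toy object header with the two extra fields described above."""

    def __init__(self):
        self.owner = threading.get_ident()  # "owning thread id"
        self.local_refs = 1                 # only the owner touches this
        self.global_refs = 0                # only touched under self.lock
        self.lock = threading.Lock()


def incref(obj):
    if obj.owner == threading.get_ident():
        obj.local_refs += 1                 # owner: no locking
    else:
        with obj.lock:                      # anyone else: lock the object
            obj.global_refs += 1


def decref(obj):
    """Return True if both counts are zero after the decrement."""
    if obj.owner == threading.get_ident():
        obj.local_refs -= 1
        # Unlocked read of the global count, as the text proposes.
        return obj.local_refs == 0 and obj.global_refs == 0
    with obj.lock:
        obj.global_refs -= 1
        return obj.global_refs == 0 and obj.local_refs == 0
```

Temporaries that never leave their creating thread only ever take the first branch, so they never touch the lock.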
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
Hello!
Python's int type has an optional argument base which allows people to
specify a base for the conversion of a string to an integer.
>>> int('101', 2)
5
>>> int('a', 16)
10
I've sometimes missed a way to reverse the process. How would you like
an optional second argument to str() that takes an int from 2 to 36?
>>> str(5, 2)
'101'
>>> str(10, 16)
'a'
I know it's not a killer feature but it feels right to have a
complement. How do you like the idea?
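For comparison, the conversion can be written today as an ordinary function. This is a minimal sketch; `to_base` and `DIGITS` are hypothetical names, not an existing builtin or stdlib API.

```python
import string

# Digits for bases up to 36: '0'-'9' followed by 'a'-'z'.
DIGITS = string.digits + string.ascii_lowercase


def to_base(n, base):
    """Return the representation of integer n in the given base (2..36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be in range 2..36")
    if n == 0:
        return "0"
    sign = "-" if n < 0 else ""
    n = abs(n)
    out = []
    while n:
        n, remainder = divmod(n, base)
        out.append(DIGITS[remainder])
    return sign + "".join(reversed(out))
```

With this helper, to_base(5, 2) gives '101' and to_base(10, 16) gives 'a', matching the proposed str(5, 2) and str(10, 16).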
Christian
I was surprised to find that there is no "monitor construct"
implementation for the stdlib. I wrote one that I feel is pretty
Pythonic, and I'd like it to be torn apart :)
It's fairly well documented -- for most standard monitor use cases you
can simply inherit the Monitor class and use the Monitor-specific
condition variables. Hopefully the accompanying examples will also help
to clarify the usage. They're somewhat silly, but get the idea across.
As an aside, because this seems to come up in every conversation I've
had about monitors in Python: if I'm not mistaken, monitors are useful
with or without the GIL :)
Monitors are nifty tools that make complex synchronization problems
somewhat simpler (though more serialized). So far as I understand it,
the GIL provides single-bytecode atomicity, and monitor methods are
rarely single instructions. Plus, Jython and IronPython don't have a
GIL, so I would argue that monitors can still be valuable to "Python the
language" even if you won't allow that they can be valuable to "Python
the standard implementation".
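To make the discussion concrete, here is one way a monitor could look on top of the threading module. It is not the implementation announced above, only a sketch under assumptions: the `Monitor` base class, the `entry` decorator, and the `BoundedCounter` example are hypothetical names.

```python
import functools
import threading


class Monitor:
    """Base class: one re-entrant lock guards every entry method."""

    def __init__(self):
        self._lock = threading.RLock()

    def condition(self):
        # Condition variable tied to the monitor's own lock.
        return threading.Condition(self._lock)


def entry(method):
    """Run the method with the monitor lock held."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


class BoundedCounter(Monitor):
    def __init__(self, limit):
        super().__init__()
        self._limit = limit
        self._value = 0
        self._not_full = self.condition()

    @entry
    def increment(self):
        while self._value >= self._limit:
            self._not_full.wait()   # releases the monitor lock while waiting
        self._value += 1

    @entry
    def decrement(self):
        self._value -= 1
        self._not_full.notify()

    @entry
    def value(self):
        return self._value
```

Each method body here spans many bytecodes, which is why the lock is still needed with a GIL.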
Looking forward to hearing everybody's opinions.
Cheers,
Chris
Hi.
I modified xml.sax.saxutils.XMLGenerator, _xmlplus.sax.saxutils.XMLGenerator, and
_xmlplus.sax.saxutils.LexicalXMLGenerator so that they no longer produce those
ugly close tags for empty elements, but use the short form instead. So instead of
<empty></empty> you get just <empty/>. :)
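For readers on a later Python: from version 3.2 onward, the standard XMLGenerator accepts a short_empty_elements keyword with this effect. The sketch below uses that later API, not the patch offered here, and `render_empty` is a hypothetical helper name.

```python
import io
from xml.sax.saxutils import XMLGenerator


def render_empty(short):
    """Serialize a single empty element, with or without the short form."""
    out = io.StringIO()
    gen = XMLGenerator(out, encoding="utf-8", short_empty_elements=short)
    gen.startElement("empty", {})
    gen.endElement("empty")
    return out.getvalue()


print(render_empty(False))  # long form with a separate close tag
print(render_empty(True))   # short self-closing form
```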
I used the version of saxutils.py that ships with Python 2.5.1.
Where should I send such patches?
Download here:
http://twoday.tuwien.ac.at/pub/files/python-xml-sax-saxutils (ZIP, 10 KB)
-panzi
GIL-slayers take note. Here are two papers about concurrent reference counting:
An On-The-Fly Reference Counting Garbage Collector for Java
Levanoni and Petrank, 2001.
http://www.cs.technion.ac.il/~erez/Papers/refcount.ps
Efficient On-the-Fly Cycle Collection
Paz, Bacon, Kolodner, Petrank, and Rajan, 2005.
http://www.cs.technion.ac.il/%7Eerez/Papers/CycleCollection.ps
"On-the-fly" means the algorithm is neither "fully concurrent" nor
"stop the world". Rather, each thread pauses occasionally to do some
work. Instead of a GIL, you have a lock that covers this periodic
bookkeeping.
The details are awfully complex, but there may be insights worth
gleaning regardless.
Also -- I wrote some stuff at:
http://wiki.python.org/moin/GlobalInterpreterLock
in the hopes that future "Kill GIL" discussions can start from a
better-informed base.
-j
When writing decorators, especially ones that need arguments
other than the function to be wrapped, the code often gets rather ugly...
def dec(a, b, foo=bar):
    def inner(func):
        def something(*a, **k):
            ...stuff...
            return func(*a, **k)
        return something
    return inner
Perhaps we could allow functions to be defined with multiple argument
lists, basically partially applying the function until all of them
are filled. (Sort of like currying, but sort of not.)
def dec(a, b, foo=bar)(func)(*a, **k):
    ...stuff...
    return func(*a, **k)
So, calling `dec` will fill the first argument list and return a
callable, which when called will fill the second argument list and
return a third callable, which will be the fully-decorated function.
Basically, exactly as it looks -- def func(a)(b)(c) is called as
func(1)(2)(3). Except, obviously, you can partially apply it by only
calling the first one or two or however many. I'm not sure how this
would look internally, but I imagine each successive call would
return an object something like a partial.
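The calling behaviour can be approximated in current Python without new syntax. This is only a sketch: `multi` is a hypothetical helper that hands the function one (args, kwargs) pair per argument list, which is clumsier than the proposed def form.

```python
import functools


def multi(n):
    """Hypothetical helper: collect n successive argument lists, then call."""
    def wrap(func):
        def collect(groups):
            def call(*args, **kwargs):
                new = groups + [(args, kwargs)]  # never mutate earlier stages
                if len(new) == n:
                    return func(*new)            # all lists filled: run body
                return collect(new)              # otherwise, a partial stage
            return call
        return functools.wraps(func)(collect([]))
    return wrap


calls = []  # records what the decorator saw, for illustration


@multi(3)
def dec(config, target, call):
    (a, b), options = config   # first argument list
    (func,), _ = target        # second argument list: the wrapped function
    args, kwargs = call        # third argument list: the actual call
    calls.append((a, b, options.get("foo"), args))
    return func(*args, **kwargs)


@dec(1, 2, foo="bar")
def add(x, y):
    return x + y
```

Because each stage builds a new list, dec(1, 2, foo="bar") can be reused to decorate several functions, much like a partial.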
I expect that the main argument against this will be that it is not a
common enough idiom to warrant adding syntax. Perhaps; I don't know.
The decorator pattern is very useful (and not only in the @blah
function decorator sense -- also the future class decorators, WSGI
middleware, etc.), and I do think it makes their definitions quite a
bit nicer and easier to read. Any thoughts?
I'll apologize in advance for this one since I suspect a lot of people
have hit this.
The current implementation of raw strings doesn't allow a trailing
backslash in the string.
Why don't raw strings in Python work more like C# @"..." strings? That
would allow a trailing backslash, and you could still get a single
quote character by writing two consecutive quote characters.
f=r'c:\src\f' # This is ok and gives you what you want
f=r'c:\src\f\' # Compilation error. String is not terminated.
f=r'''c:\src\f\''' # This doesn't work either and causes a
                   # compilation error.
f=r'Here''s another mistake' # This doesn't do what you would think.
                             # You get 'Heres another mistake'
f=r'''Here's another mistake''' # This works but being able to use raw
                                # strings for this would be nice.
f='c:\\src\\f\\' # this works but is ugly
I just don't understand the rationale for the current implementation.
I thought the intention of raw strings was to allow for backslashes in
the string. The current implementation does a bad job at it. Any
chance this could be changed with a backward compatibility option?
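For reference, the current behaviour and a common workaround look like this. The variable names are only for illustration.

```python
# A raw string cannot end in an odd number of backslashes, so the final
# one is supplied by an ordinary literal (adjacent literals are joined).
path = r'c:\src\f' '\\'

# Inside a raw string, backslash-quote keeps BOTH characters.
escaped = r'\''

# Two adjacent literals are concatenated, so the doubled quote vanishes.
joined = r'Here''s another mistake'

print(path)
print(len(escaped))
print(joined)
```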
[This should be on python-ideas, so I'm replying to there instead of python-dev]
On 10/1/07, Justin Tulloss <tulloss2(a)uiuc.edu> wrote:
> Hello,
>
> I've been doing some tests on removing the GIL, and it's becoming clear that
> some basic changes to the garbage collector may be needed in order for this
> to happen efficiently. Reference counting as it stands today is not very
> scalable.
>
> I've been looking into a few options, and I'm leaning towards
> implementing IBM's Recycler GC (
> http://www.research.ibm.com/people/d/dfb/recycler-publications.html
> ) since it is very similar to what is in place now from the users'
> perspective. However, I haven't been around the list long enough to really
> understand the feeling in the community on GC in the future of the
> interpreter. It seems that a full GC might have a lot of benefits in terms
> of performance and scalability, and I think that the current gc module is of
> the mark-and-sweep variety. Is the trend going to be to move away from
> reference counting and towards the mark-and-sweep implementation that
> currently exists, or is reference counting a firmly ingrained tradition?
Refcounting is fairly firmly ingrained in CPython, but there are
conservative GCs for C that mostly work, and other implementations
aren't so restricted.
The problem with Python is that it produces a *lot* of garbage.
Pystone on my box creates around a million objects per second, which
would fill up available RAM in about 10 seconds if nothing were freed.
Not only do you need to collect often enough to not fill up the RAM,
but for *good* performance you need to collect often enough to keep
your L1 cache hot. That would
seem to demand a generational GC at least.
You might as well assume it'll be more expensive than refcounting[1].
The real advantage would be in scalability. Concurrent, parallel GCs
are an active field of research though. If you're really interested
you should research conservative GCs aimed at C in general, and only
minimally interact with CPython (such as to disable the custom
allocators.)
A good starting point is The Memory Management Reference (although
it looks like it hasn't been updated in the last few years). If some
of my terms are unfamiliar to you, go start reading. ;)
http://www.memorymanagement.org/
[1] This statement is only in the context of CPython, of course.
There are certainly many situations where a tracing GC performs
better.
--
Adam Olsen, aka Rhamphoryncus
(Clark: I don't want to discuss this offline. On the list it goes.)
Quote doubling isn't a viable option for Python -- I don't believe
it's sane to have both backslashes and quote doubling as escape
mechanisms.
Of course in C# the trailing \ is the main use case -- after all it's
a Microsoft product.
While for some Windows users this may be a nuisance, I don't think
they are in the majority amongst Python users.
--Guido
On 10/1/07, Clark Maurer <cmaurer(a)slickedit.com> wrote:
> I'll do what I can to sway you.
>
> Guido, please....pretty please with sugar on top (hehehe). I've been
> spoiled with languages which do better raw string. I designed Slick-C
> which I've been using for years and it does not have this problem. I
> always type Windows paths using the Slick-C equivalent of raw strings.
> My implementation came from REXX (I'm no spring chicken- I think you've
> heard of it too :-). Now that C# has gone with this implementation, that
> pretty much makes it the modern way to do this. C# has a lot more clout
> than Slick-C. Yes, these style strings are intended for regexes AND
> Windows paths. Just think, you can get rid of one FAQ. It's easier to
> document. The down side is that there will be some backward
> compatibility issues. I will admit, compared to other issues this is a
> small one.
>
> Trailing backslash isn't the only problem but it's the bigger one. Two
> quotes should be one single quote. Otherwise, specifying both quote
> characters in regexes is an issue. Yes, I've done this in regular
> expressions. This change could cause some backward compatibility
> problems as well.
>
> Clark
> -----Original Message-----
> From: python-ideas-bounces(a)python.org
> [mailto:python-ideas-bounces@python.org] On Behalf Of Guido van Rossum
> Sent: Monday, October 01, 2007 3:54 PM
> To: Steven Bethard
> Cc: Python-Ideas
> Subject: Re: [Python-ideas] raw strings
>
> On 10/1/07, Steven Bethard <steven.bethard(a)gmail.com> wrote:
> > > On 10/1/07, Steven Bethard <steven.bethard(a)gmail.com> wrote:
> > > > On 10/1/07, Clark Maurer <cmaurer(a)slickedit.com> wrote:
> > > > > The current implementation doesn't allow for a trailing
> > > > > backslash in the string.
> > > >
> > > > I believe that will change in Python 3.0.
> > > >
> > > > The discussion is here:
> > > > http://mail.python.org/pipermail/python-3000/2007-May/007684.html
> > > >
> > > > And Ron Adam's current patch is here:
> > > > http://bugs.python.org/issue1720390
> >
> > On 10/1/07, Guido van Rossum <guido(a)python.org> wrote:
> > > I'm still against actually. That's why the patch hasn't been
> > > applied yet.
> >
> > Sorry, my mistake. I read the thread as being somewhat in support of
> > the change.
>
> I admit I've been wobbling a lot on this.
>
> > Anyway, to the OP, if you want to make this happen, you should help
> > Ron out with his patch. (Code has a much better chance of convincing
> > Guido than anything else does.)
>
> Not in this case. It's more the philosophical distinction -- are raw
> strings meant primarily to hold regexes or Windows pathnames? These
> two use cases have opposite requirements for trailing backslash
> treatment. I know the original use case that caused them to be added
> to the language is regexes, and that's still the only one I use on a
> regular basis.
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
[xposted to python-ideas, reply-to python-ideas, leaving python-dev in
to correct misinformation]
On Tue, Oct 02, 2007, Greg Ewing wrote:
>
> The cyclic GC kicks in when memory is running low.
Not at all. The sole and only basis for GC is the number of allocations
compared to the number of de-allocations. See
http://docs.python.org/lib/module-gc.html
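The counters in question can be inspected from Python through the gc module. A small sketch follows; `gc_snapshot` is a hypothetical helper name, and the actual numbers vary by Python version and program.

```python
import gc


def gc_snapshot():
    """Return the collector's thresholds and current counts."""
    return {
        # (threshold0, threshold1, threshold2): generation 0 is collected
        # once allocations minus deallocations exceed threshold0.
        "thresholds": gc.get_threshold(),
        # Current per-generation counts that are compared to the thresholds.
        "counts": gc.get_count(),
        "enabled": gc.isenabled(),
    }


print(gc_snapshot())
```

Nothing in these numbers refers to free memory; gc.set_threshold() changes how often collection runs.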
--
Aahz (aahz(a)pythoncraft.com) <*> http://www.pythoncraft.com/
The best way to get information on Usenet is not to ask a question, but
to post the wrong information.