Extending language syntax

Hi all, I've just joined, so I hope I won't make you waste your time with questions that have already been covered. Sorry if my suggestion represents a lot of work (certainly too much), and for my lack of knowledge of the internal implementation, but I hope opening a discussion can help. I would like to suggest introducing a new statement that provides a way to extend the language syntax.

Abstract: While reading this article about the history of None <http://python-history.blogspot.fr/2013/11/story-of-none-true-false.html> I realised I was not alone in wondering about some aspects of the language. Some of these are:
- constants and immutability
- null objects and "None is a singleton"
- readability of code

*Constant:* It's a convention, no problem with that. My question is more whether there is a memory overhead in having fixed values that behave as variables.

*Null Object:* NoneType is not overridable. Wouldn't it make sense to have null objects that are instances of NoneType? Keywords can't be assigned, so "None is a singleton". Wouldn't it be simpler to have a "keyword" type and be able to assign a keyword to a variable? With an object hierarchy like keyword <- ReservedKeyword, I find it would make clearer why some keywords are reserved, without changing the actual behaviour.

*Readability of code:* I have the shouldDsl <http://www.should-dsl.info/> use case in mind. The introduction illustrates my thought: "*The goal of Should-DSL is to write should expectations in Python as clear and readable as possible, using an "almost" natural language (limited by some Python language's constraints).*" I find ShouldDsl disturbing, as the aim is laudable but the implementation remains a hack of the language. I see the influence of the functional paradigm in this syntax and agree with the fact that it is becoming a "must have". I love Python syntax and find it has very few limits, but maybe it can be improved by reducing some constraints. One benefit of an extensible syntax is that the language can evolve with user usage. I see it as an open laboratory where developers experiment with new keywords like "should", implemented in their favourite language, and when a keyword is widely accepted by the community it can finally be integrated into the core.

*A solution*: I was first thinking of a statement like "Instance" to declare an immutable object with a single instance (like None), but I then considered how great it would be to be able to extend the syntax. I thought the keyword "Keyword", which exists in Clojure for example <http://clojure.org/data_structures#Data%20Structures-Keywords>, could do the job. A first implementation could be:

    Keyword <name>(<keyword, literal or object>):
        field = <something>  # fixed value: self.field = "something" will raise an error

        def behaviour(self, *args, **kwargs):
            # do something

        def __get__(self, left, right, block_content):
            # what to do when accessing the keyword

Inline implementation:

    Keyword <name>(<keyword, literal or object>)

*Examples of use:*

*Constant*:

    Keyword Pi(3.14159)

*Null Object* and immutable objects:

    class Person(object):
        def __init__(self, name):
            self.name = name

    JohnDoe = Person(name="anonymous")
    Keyword Anonymous(None, JohnDoe)

    Anonymous.name = "JohnDoe"  # raises an error
    assert isinstance(Anonymous, type(None))  # is true

As Anonymous is immutable, it is also thread safe. The mechanism that consists of freezing an object can be applied in other situations. An interesting approach could be to define keyword identity through a hash of their value, like value objects.
*Should keyword*:

    Keyword ShouldEqual(keyword):
        def __get__(self, left, right, block):
            if left != right:
                raise AssertionError()

### weird examples:

*Multiline Lambdas*:

    Keyword MultilineLambda(keyword):
        def __get__(self, left, right, block_content):
            def multiline_lambda(*args, **kwargs):
                # compare "right" (tuple) with args and raise an exception if not valid
                self.assert_args_valid(right, args, kwargs)
                # "run" is a shortcut because I don't know which type is the most
                # appropriate for the block content
                return self.run(block_content, args, kwargs)
            return multiline_lambda

    divide = MultilineLambda a, b:
        if b == 0:
            raise MyDivideByZeroError()
        return a/b

*Reimplementing else*:

    Keyword otherwise(keyword):
        def __get__(self, left, right, block):
            if left:  # left is the result of the "if" block
                return
            self.run(block)

    if something:
        # do something
    otherwise:
        # other

While this example is weird, it seems to me that it makes it easier to figure out what the language does. Thanks to all for keeping Python open and giving us the opportunity to freely discuss its future implementation. Have a nice day, Grégory

I could go through this point by point, but I think I can head it all off by explaining what I think is a fundamental misconception that's misleading you.

In Python, variables aren't "memory locations" where values live, they're just names that are bound to values that live somewhere on their own. Rebinding a variable to a different value doesn't mutate anything. You can't make variables refer to other variables, and making two variables refer to the same value doesn't have the same referential transparency, thread safety, etc. issues as making a variable refer to another variable. For example, in the following code:

    me = Person('Andrew Barnert')
    andrew = me
    me = Person('Dr. Sam Beckett')

... nothing has been mutated. In particular, the andrew variable is unchanged; it's still referring to the same Person('Andrew Barnert') object as before, not the new one. If I make the Person type mutable, and call a mutating method on it (or do so implicitly, by setting an attribute or a keyed or indexed member), then of course I can change the value, which will be visible to all variables referring to that value. But assignment does not do that.

So, you already have almost everything you want. You can create new immutable types and global singleton values of those types. Or create Enums. Or just create singleton objects whose type doesn't matter, which are immutable and compared by identity, just by calling object(). All of this is trivial. Taking one of your examples:

    Pi = 3.14159

... gives you everything you wanted, except for the fact that you can accidentally or maliciously rebind it to a new value. It's already an immutable, thread-safe constant. You can even make a variable name un-rebindable with a module import hook that produces a custom globals, if you really want to, although I can't imagine that ever being worth doing. The only additional thing a keyword gives you is that the compiler can prevent you from rebinding the name to a different value, at compile time rather than run time. That's it.

Meanwhile, the disadvantage of allowing new keywords to be defined for immutable constants is huge. You could no longer compile, or even parse, any code without checking each token against the current runtime environment. That makes the parser slower and more complicated, introduces a dependency that makes it hard to keep the components separate, makes pyc files and marshal/pickle and so on useless, makes it much harder for code tools to use the ast module, etc. All for a very tiny benefit.

Sent from a random iPhone

On Nov 11, 2013, at 3:20, Gregory Salvan <apieum@gmail.com> wrote:
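[To make the options above concrete, here is a minimal sketch, not from the original mails, of what already works today: a plain module-level constant, an identity-compared sentinel made with object(), and an Enum. The names PI, MISSING, Color and lookup are made up for illustration.]

    from enum import Enum

    # A module-level "constant": just a name bound to an immutable float.
    # The name can be rebound, but the float itself can never change.
    PI = 3.14159

    # A sentinel singleton compared by identity; its concrete type doesn't matter.
    MISSING = object()

    # An Enum gives a small family of named, immutable, interned singletons.
    class Color(Enum):
        RED = 1
        GREEN = 2

    def lookup(d, key, default=MISSING):
        value = d.get(key, MISSING)
        if value is MISSING:          # identity check, just like "x is None"
            return default
        return value

    assert lookup({}, 'x', 0) == 0
    assert Color.RED is Color(1)      # enum members behave as singletons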

Thank you. You're almost right, but there is a misunderstanding. I suggested generating value objects by freezing object states and having object identities defined by their values instead of their memory allocation, so:

    pi1 = keyword(3.14)
    pi2 = keyword(3.14)
    assert pi1 is pi2  # is true

and:

    you = Person('Andrew Barnert')
    andrew = keyword(you)
    andrew.name = 'Dr. Sam Beckett'  # will raise an error

It is a value object, not a singleton, because pi1 and pi2 are not the same instance. This matters when doing concurrency, but this is not the point, as you can cheat with "repr" or "hash". I would focus mainly on how to extend the language syntax and permit writing code like:

    'abc' should_equal 'abc'

Value objects seem to me necessary to avoid side effects. I missed your point about marshal/pickle and will dig into it to understand. I hope in future I can be more relevant. Sorry for the inconvenience, thanks for taking the time to answer, all the best.

2013/11/11 Andrew Barnert <abarnert@yahoo.com>
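[A rough sketch of the closest thing today's Python offers for this, my own illustration rather than part of the proposal: an interning factory that caches one frozen wrapper per value, so equal values hand back the identical object. Note it gets "pi1 is pi2" by interning, not by overriding "is", and only works for hashable values. The keyword()/_Frozen names are made up.]

    class _Frozen:
        __slots__ = ('value',)

        def __init__(self, value):
            object.__setattr__(self, 'value', value)   # bypass our own __setattr__

        def __setattr__(self, name, val):
            raise AttributeError("frozen keyword objects are immutable")

        def __repr__(self):
            return 'keyword(%r)' % (self.value,)

    _cache = {}

    def keyword(value):
        """Return the single frozen wrapper for this (hashable) value."""
        try:
            return _cache[value]
        except KeyError:
            return _cache.setdefault(value, _Frozen(value))

    pi1 = keyword(3.14)
    pi2 = keyword(3.14)
    assert pi1 is pi2          # same interned object
    # pi1.value = 3.0          # would raise AttributeError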

On Tue, Nov 12, 2013 at 4:03 PM, Gregory Salvan <apieum@gmail.com> wrote:
I can see some potential in this if there were a way to say "This will never change, don't refcount it or GC-check it"; that might improve performance across a fork (or across threads), but it'd take a lot of language support. Effectively, you would be forfeiting the usual GC memory saving "this isn't needed, get rid of it" and fixing it in memory someplace. The question would be: Is the saving of not writing to that memory (updating refcounts, or marking for a mark/sweep GC, or whatever the equivalent is for each Python implementation) worth the complexity of checking every object to see if it's a frozen one? ChrisA

On Nov 11, 2013, at 21:27, Chris Angelico <rosuav@gmail.com> wrote:
Is there any implementation (like one of the PyPy sub projects) that uses refcounting, with interlocked increments if two interpreter threads are live but plain adds otherwise? In such an implementation, I think the cost of checking a second flag to avoid the interlocked increment would, at least on many platforms (including x86, x86_64, and arm9), be comparatively very cheap, and if used widely could provide big benefits. I believe CPython and standard PyPy just use plain adds under the GIL, and Jython and IronPython leave all the gc up to the underlying VM, so it would probably be a lot harder to get enough benefit there without a lot more effort. Also, as I said in my previous message, I don't think this "permanent value" idea is in any way dependent on any of the other stuff suggested, and in fact would work better without them. (For example, being able to have multiple separate copies of the permanent object that act as if they're identical, and can't be distinguished at the Python level.)

On 12 Nov 2013 20:03, "Andrew Barnert" <abarnert@yahoo.com> wrote:
On Nov 11, 2013, at 21:27, Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Nov 12, 2013 at 4:03 PM, Gregory Salvan <apieum@gmail.com> wrote:

... live but plain adds otherwise? In such an implementation, I think the cost of checking a second flag to avoid the interlocked increment would, at least on many platforms (including x86, x86_64, and arm9), be comparatively very cheap, and if used widely could provide big benefits.

PyParallel uses some neat tricks to skip almost all memory management in worker threads. Currently Windows only though, and one reader's "neat trick" may be another's "awful hack" :)

Cheers,
Nick.
I believe CPython and standard PyPy just use plain adds under the GIL, and Jython and IronPython leave all the gc up to the underlying VM, so it would probably be a lot harder to get enough benefit there without a lot more effort.

Also, as I said in my previous message, I don't think this "permanent value" idea is in any way dependent on any of the other stuff suggested, and in fact would work better without them. (For example, being able to have multiple separate copies of the permanent object that act as if they're identical, and can't be distinguished at the Python level.)

2013/11/12 Andrew Barnert <abarnert@yahoo.com>:
Is there any implementation (like one of the PyPy sub projects) that uses refcounting, with interlocked increments if two interpreter threads are live but plain adds otherwise? In such an implementation, I think the cost of checking a second flag to avoid the interlocked increment would, at least on many platforms (including x86, x86_64, and arm9), be comparatively very cheap, and if used widely could provide big benefits.
How would you do this in a thread-safe way without atomic operations, or at least memory barriers? cf

On Nov 12, 2013, at 2:37, Charles-François Natali <cf.natali@gmail.com> wrote:
The whole point of a "permanent" flag would be that it's only set at object creation time and never modified, and the object never gets cleaned up. That means you can just check the flag, without an atomic operation, and if it's set you can skip the (slow/cache-killing) atomic increment or decrement.

to have object identities defined on their values instead of their memory allocation
But the whole point of identity is to be their memory allocation! There's already equality if you want to compare on values.

I'm not sure what you really want, and I suspect you're also somewhat uncertain. Do you want multiline lambdas, by-name variables, custom blocks, interned objects, infix operators? Other things? It's a lot of distinct feature requests to ask for, and it would be good to get them cleared up in everyone's minds.

If you want interning for arbitrary expressions, MacroPy lets you do that already <https://github.com/lihaoyi/macropy#interned> in your own code. It interns on a per-declaration basis rather than on a per-value basis, because the task of evaluating an arbitrary expression at macro expansion time is icky. You can pull some other neat tricks with it (e.g. classes whose equality is by default defined by value <https://github.com/lihaoyi/macropy#case-classes>), but you are limited to Python's grammar and parser, so no infix-method-operators and such. Still, you can trigger macro expansion easily with should_equal['abc', 'abc'] and do whatever "compile"-time substitution you want.

On Tue, Nov 12, 2013 at 9:56 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
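[As a plain-Python illustration of "equality defined by value", written by hand rather than via the MacroPy API; the Point class is made up. Note that "is" still means identity, which is the point being made above.]

    class Point:
        """Value-style class: equality and hashing depend only on the fields."""
        __slots__ = ('x', 'y')

        def __init__(self, x, y):
            self.x = x
            self.y = y

        def __eq__(self, other):
            return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

        def __hash__(self):
            return hash((self.x, self.y))

    assert Point(1, 2) == Point(1, 2)      # equal by value...
    assert Point(1, 2) is not Point(1, 2)  # ...but still distinct objects; "is" is untouched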

I've opened a gdoc to share our visions and be able to collaborate on this subject. It is open, everybody can contribute; I'll be happy if it allows us to find a good solution. https://docs.google.com/document/d/15IPMNzUnK9nd_j7B6wdo7gAn2US52qa8MjGcfQea... 2013/11/12 Haoyi Li <haoyi.sg@gmail.com>

2013/11/12 Andrew Barnert <abarnert@yahoo.com>:
The whole point of a "permanent" flag would be that it's only set at object creation time and never modified, and the object never gets cleaned up.
Of course. I probably wasn't clear, but I was actually referring to this part:
Is there any implementation (like one of the PyPy sub projects) that uses refcounting, with interlocked increments if two interpreter threads are live but plain adds otherwise?
I'm not sure how one would do this without using locking/memory barriers (hence a large performance overhead).

Regarding this question, as noted by Nick, you might want to have a look at PyParallel: apparently, it uses a form of region-based memory allocation, per thread (using mprotect to catch references to main-thread owned objects). But last time I checked with Trent, reference counting/garbage collection was completely disabled in worker threads, which means that if you allocate and free many objects, you'll run out of memory (having an infinite amount of memory certainly makes garbage collection easier :-) ). Cheers, cf

On Nov 12, 2013, at 10:25, Charles-François Natali <cf.natali@gmail.com> wrote:
A global "multithreading" flag. When you start a thread, it sets the flag in the parent thread, before starting the new thread. Both are guaranteed to see the True value. If there are any other threads, the value was already True. You probably don't ever need it to go back to False--not many programs are multithreaded for a while and then single-threaded again. But if you need this, note that it's never a problem to see a spurious True, it just means you do an atomic read when you didn't need to. But also, you only need to check whether you're the last thread while returning from join, and there's no way anyone could have created another thread between the OS-level join and the end of Thread.join.

2013/11/12 Andrew Barnert <abarnert@yahoo.com>:
That would probably work. The only remaining question is whether it'll actually yield a performance gain: as soon as you have more than one thread in the interpreter (even if it's idle), you'll end up doing atomic incref/decref all over the place, which will just kill performance (and will likely degrade with the number of threads because of increased contention to e.g. lock the memory bus). Atomic refcount just doesn't scale... cf

On Nov 12, 2013, at 14:42, Charles-François Natali <cf.natali@gmail.com> wrote:
The point is that if you're _already_ doing atomic refcounts, or the equivalent (like CPython, which does refcounts under the GIL so they're implicitly atomic, which is even less parallel), you can avoid many of those refcounts.

On Wed, Nov 13, 2013 at 9:42 AM, Charles-François Natali <cf.natali@gmail.com> wrote:
There was a proposal put together in Pike involving multiple arenas, some of which were thread-local and some global. If you have one arena for each thread, which handles refcounted objects without thread-safety, and another pool of frozen objects that don't need to be refcounted at all (at the cost of not ever removing them from memory), then the only objects that need atomic incref/decref are the ones that are actually (potentially) shared. There just needs to be some mechanism for moving an object from the thread-local arena to the shared one, which would be a relatively uncommon operation. ChrisA

Sorry Chris Angelico and Andrew Barnert, I don't have enough knowledge of Python implementations to correctly answer your questions. I just wanted to share the idea, in order to see if it was interesting before I dig in that direction. I thought a token could act like a macro, replacing at compile time things like:

    'abc' should_equal 'abc'

by:

    should_equal.__get__('abc', 'abc', None)

Then I saw assert_equal as a "keyword" like "if", and it seemed obvious to have "assert if is if" raise an assertion error. Finally, authorising should_equal.__get__ to behave differently depending on its state seemed dangerous. As I need immutable objects with an identity given by their values, I deduced these were value objects. This might be confusing. Thanks for your enlightenment.

2013/11/12 Andrew Barnert <abarnert@yahoo.com>
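[For reference, a "should"-style expectation can already be approximated within Python's existing grammar by piping values through an object that overloads the | operator; this is roughly the kind of trick should-dsl relies on. A minimal sketch, with the Matcher class and should_equal name made up here:]

    class Matcher:
        """Infix-ish matcher: use as   left |should_equal| right   (both pipes needed)."""
        def __init__(self, left=None):
            self.left = left

        def __ror__(self, left):             # handles: left | should_equal
            return Matcher(left)

        def __or__(self, right):             # handles: (bound matcher) | right
            if self.left != right:
                raise AssertionError('%r != %r' % (self.left, right))
            return True

    should_equal = Matcher()

    'abc' |should_equal| 'abc'               # passes
    # 'abc' |should_equal| 'abd'             # would raise AssertionError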

On Tue, Nov 12, 2013 at 9:00 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
The only Python I actually work with is CPython, so I can't say what does or doesn't exist. But plausibly, it's possible to have a one-bit flag that says "this is permanent, don't GC it", and then it won't get its refcount updated (in CPython), or won't get marked (in a mark/sweep GC), or whatever. Then, if you have an entire page of frozen objects, and you fork() using the standard Linux (and other) semantics of copy-on-write, you would never need to write to that page.

The reason this might require that the objects be immutable is this: when an object is marked as frozen, everything it references can also automatically be marked frozen. (By definition, they'll always be in use too.) That would only work, though, if the set of objects thus referenced doesn't change.

    PI = complex(3.14159)
    sys.freeze(PI)  # should freeze the float value 3.14159
    PI.real = 3.141592653589793  # awkward

I can imagine that this might potentially offer some *huge* benefits in a system that does a lot of forking (maybe a web server?), but all I have to go on is utter and total speculation. And the cost of checking "Is this frozen? No, update its refcount" everywhere means that there's a penalty even if fork() is never used. So actually, this might work out best as a "special-build" Python, and maybe only as a toy/experiment.

ChrisA

On Nov 12, 2013, at 6:52, Chris Angelico <rosuav@gmail.com> wrote:
Yes, that's a reasonable extension to the permanent-marking idea. But I'm not sure immutability is as necessary as you think.

For the fork issue, assuming PI is a C object or a slots object, it couldn't be on the never-written page if you went this way, but it would still benefit from not being refcounted. If you had a permanent complex(3.14159) value, but 3.14159 itself weren't permanent, it would just be a normal float with one refcount that rarely gets copied anywhere. If it does get copied, it gets (atomically) incref'd like normal, but so what? If you, as the app developer, know that for whatever reason this will be much more common in your app than in the usual case, you can always create the permanent float first, then create the permanent complex out of that value. If you do that, and then later mutate PI to point to a different float, the new float can be permanent or normal and it all works.

If you're forking, then you'd want PI to be immutable (again assuming it's a C or slots object); if you're threading and just want the refcount skipping, you'd want a way to get just that.

On Nov 11, 2013, at 21:03, Gregory Salvan <apieum@gmail.com> wrote:
So effectively you want to add interning for arbitrary types. So... why? You won't get any direct performance benefits from being able to use is instead of == here--comparing two floats is as fast as comparing two pointers on most platforms. And why would you want users to compare with is instead of ==? It means everyone who uses pi has to know that it's a keyword, even though he gets no real benefit from doing so. There might be some indirect benefits in that we could skip refcounting on objects that are guaranteed to live forever. But you could get the same benefits with a less drastic change--basically, just a way to declare an object as "permanent", without changing its semantics in any other way. That sub-idea might be worth exploring.
That's already easy today. For example, you can declare name as a @property with no setter, or inherit from namedtuple.
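[A quick sketch of both options just mentioned, a read-only @property and a namedtuple; the Person/FrozenPerson names are my own example:]

    from collections import namedtuple

    class Person:
        def __init__(self, name):
            self._name = name

        @property
        def name(self):              # read-only: no setter is defined
            return self._name

    p = Person('anonymous')
    # p.name = 'JohnDoe'             # would raise AttributeError

    # Or: a namedtuple gives an immutable record with named fields.
    FrozenPerson = namedtuple('FrozenPerson', ['name'])
    fp = FrozenPerson(name='anonymous')
    # fp.name = 'JohnDoe'            # would raise AttributeError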
It is value object not singleton because pi1 and pi2 are not of the same instance.
Why not? In fact, that seems to lose most of the benefits of interning. Effectively you've just created normal objects that can override the is operator, which I think is a bad idea.
I'm not sure how this fits in with the rest of your idea at all. That requires changing the parser at runtime, which has nothing to do with objects being permanent or their is operators being overridable to mean equality or anything else. Meanwhile, you should take a look at how parsing and compiling works in Python (the dev guide has a great section on it). Modules are parsed and compiled, and then the bytecode is run at import time. If that code can change the grammar, how can we use compiled bytecode at all? Without an implementation that compiles and interprets statement by statement on the fly, it seems like this would be very difficult. You may want to look at MacroPy, which allows making (less dramatic) changes to the language syntax via import hooks.
Value object seems to me necessary to avoid side effects.
I don't think I understand why, if by "value object" you mean "object with a custom is operator". If you just mean "immutable object", that sounds more reasonable (although I'm still not sure I get it), but again, we already have those. A float, for example, is already immutable today.

participants (6)
- Andrew Barnert
- Charles-François Natali
- Chris Angelico
- Gregory Salvan
- Haoyi Li
- Nick Coghlan