Here is a bit of an idea that I first came up with some years ago. Guido's response at the time was "sounds reasonable as long as we don't slow the normal case down".

To cut a long story short, I would like eval and exec to be capable of working with arbitrary mapping objects rather than only dictionaries. The general idea is that I can provide a class with mapping semantics, and pass this to exec/eval.

This would give us 2 seriously cool features (that I want <wink>), should anyone decide to write code that enables them:

* Case insensitive namespaces. This would be very cool for COM, and as far as I know would please the Alice people. May open up more embedding opportunities that are lost if people feel strongly about this issue.

* Dynamic name lookups. At the moment, dynamic attribute lookups are simple, but dynamic name lookups are hard. If I execute code "print foo", foo _must_ pre-exist in the namespace. There is no reasonable way I can come up with to fetch "foo" as it is requested (raising the NameError if necessary). This would also be very cool for some of the COM work - particularly Active Scripting.

Of course, these would not be enabled by standard Python, but would allow people to create flexible execution environments that suit their purpose.

Any comments on this? Is it a dumb idea? Anyone have a feel for how deep these changes would cut? It's not something I want to happen soon, but it does seem a reasonable mechanism that can provide very flexible solutions...

Mark.
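A rough sketch of the kind of thing being asked for, in modern Python terms (purely illustrative, not the patch under discussion: today's exec() already accepts an arbitrary mapping object as its locals argument, while globals must still be a real dict):

    # Illustrative only: a mapping with case-insensitive, on-demand name
    # lookup, handed to exec() as the locals namespace.
    class FlexibleNamespace:
        def __init__(self, **initial):
            self._data = {k.lower(): v for k, v in initial.items()}

        def __getitem__(self, name):
            key = name.lower()
            if key in self._data:
                return self._data[key]      # case-insensitive hit
            # Raising KeyError lets exec() fall back to globals/builtins,
            # which is how names like 'print' are still found.
            raise KeyError(name)

        def __setitem__(self, name, value):
            self._data[name.lower()] = value

        def __delitem__(self, name):
            del self._data[name.lower()]

    ns = FlexibleNamespace(spam=42)
    exec("print(Spam)", {}, ns)     # 'Spam' resolves to 'spam' -> prints 42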
To cut a long story short, I would like eval and exec to be capable of working with arbitrary mapping objects rather than only dictionaries. The general idea is that I can provide a class with mapping semantics, and pass this to exec/eval.
I agree that this would be seriously cool. It will definitely be in Python 2.0; it's already in JPython. Quite a while ago, Ian Castleden sent me patches for 1.5.1 to do this. It was a lot of code and I was indeed worried about slowing things down too much (and also about checking that he got all the endcases right). Ian did benchmarks and found that the new code was consistently slower, by 3-6%. Perhaps for Python 1.6 this will be acceptable (the "Python IS slow" thread notwithstanding :-); or perhaps someone can have a look at it and squeeze some more time out of it? I'll gladly forward the patches. --Guido van Rossum (home page: http://www.python.org/~guido/)
To cut a long story short, I would like eval and exec to be capable of working with arbitrary mapping objects rather than only dictionaries. The general idea is that I can provide a class with mapping semantics, and pass this to exec/eval.
I agree that this would be seriously cool. It will definitely be in Python 2.0; it's already in JPython.
Seriously cool is an understatement. --david
Ian did benchmarks and found that the new code was consistently slower, by 3-6%. Perhaps for Python 1.6 this will be acceptable (the "Python IS slow" thread notwithstanding :-); or perhaps someone can
As long as we get someone working on a consistent 3-6% speedup in other areas this won't be a problem :-) I will attach it to my long list - but I'm going to put all my efforts here into Unicode first, and probably the threading stuff second :-) Mark.
Guido van Rossum wrote: ...
Ian did benchmarks and found that the new code was consistently slower, by 3-6%. Perhaps for Python 1.6 this will be acceptable (the "Python IS slow" thread notwithstanding :-); or perhaps someone can have a look at it and squeeze some more time out of it? I'll gladly forward the patches.
I'd really like to look into that. Also I wouldn't worry too much about speed, since this is such a cool feature. It might even be a speedup in some cases which otherwise would need more complex handling. May I have a look? ciao - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home
From: Christian Tismer <tismer@appliedbiometrics.com>
I'd really like to look into that. Also I wouldn't worry too much about speed, since this is such a cool feature. It might even be a speedup in some cases which otherwise would need more complex handling.
May I have a look?
Sure! (I've forwarded Christian the files per separate mail.) I'm also interested in your opinion on how well thought-out and robust the patches are -- I've never found the time to do a good close reading of them. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
From: Christian Tismer <tismer@appliedbiometrics.com> May I have a look?
Sure!
(I've forwarded Christian the files per separate mail.)
Thanks a lot!
I'm also interested in your opinion on how well thought-out and robust the patches are -- I've never found the time to do a good close reading of them.
Yes, it is quite long and not so very easy to digest. At first glance, most of the changes are replacements of dict access with mapping access where we pay for the extra indirection. It depends on how often this feature will really replace dicts. If dicts are more common, I'd like to figure out how much code bloat extra special casing for dicts would give. Puha - thanks - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home
Guido wrote:
From: Christian Tismer <tismer@appliedbiometrics.com>
I'd really like to look into that. Also I wouldn't worry too much about speed,
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
since this is such a cool feature. It might even be a speedup in some cases which otherwise would need more complex handling.
May I have a look?
Sure!
(I've forwarded Christian the files per separate mail.)
I don't know who you sent that to, but it couldn't possibly have been Christian! - Gordon
Gordon McMillan wrote:
Guido wrote:
From: Christian Tismer <tismer@appliedbiometrics.com>
I'd really like to look into that. Also I wouldn't worry too much about speed,
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
since this is such a cool feature. It might even be a speedup in some cases which otherwise would need more complex handling.
May I have a look?
Sure!
(I've forwarded Christian the files per separate mail.)
I don't know who you sent that to, but it couldn't possibly have been Christian!
:-) :-) Truly not, if I hadn't other things in mind. I know that I will not get more than 30-40 percent by compiling the interpreter away, so I will not even spend time on a register machine right now. Will also not follow the ideas of P2C any longer; it doesn't pay off. Instead, if I can manage to create something like static binding snapshots, then I could resolve many of the lookups and internal method indirections for time-critical applications. For all the rest, Python is pretty much fast enough. I've begun to write a special platform-dependent version which allows me to do all the things which can't go into the dist. For instance, I just saved some 10 percent by dynamically patching some of the ParseTuple and BuildObject calls out of the core code (ahem). I hope Python will stay as clean as it is, allowing me to try all kinds of tricks for one machine class. ciao - chris.speedy -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home
Christian Tismer writes:
Instead, If I can manage to create something like static binding snapshots, then I could resolve many of the lookups and internal method indirections, for time critical applications.
The usual assumption is that the lookups are what takes time. Now, are we sure of that assumption? I'd expect the lookup code to be like:

1) Get hash of name
2) Retrieve object from dictionary
3) Do something with the object.

Now, since string objects cache their hash value, 1) should usually be just "if (obj->cached_hash_value!=-1) return obj->cached_hash_value"; a comparison and a return. Step 2) should be very fast, barring a bad hash function. So perhaps most of the time is spent in 3), creating new local dicts, stack frames, and what not. (Yes, I know that doing this on every reference to an object is part of the problem.) I also wonder about the costs of all the Py_BuildValue and Py_ParseTuple calls going on under the hood. A performance improvement project would definitely be a good idea for 1.6, and a good sub-topic for python-dev. Incidentally, what's the verdict on python-dev archives: public or not? -- A.M. Kuchling http://starship.python.net/crew/amk/ Despair says little, and is patient. -- From SANDMAN: "Season of Mists", episode 0
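Andrew's hunch is easy to poke at from pure Python with a toy measurement (timeit is used here purely for illustration; the numbers are machine-dependent): the dictionary lookup itself is cheap compared with the call/frame machinery around it.

    # Toy comparison: a bare dict lookup vs. calling an empty function,
    # i.e. the frame/call overhead suspected of dominating.
    import timeit

    lookup = timeit.timeit("d['spam']", setup="d = {'spam': 1}")
    call = timeit.timeit("f()", setup="def f(): pass")

    print("dict lookup, 10**6 iterations: %.3f s" % lookup)
    print("empty call,  10**6 iterations: %.3f s" % call)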
"Andrew M. Kuchling" wrote:
Christian Tismer writes:
Instead, If I can manage to create something like static binding snapshots, then I could resolve many of the lookups and internal method indirections, for time critical applications.
The usual assumption is that the lookups are what takes time. Now, are we sure of that assumption? I'd expect the lookup code to be like:
1) Get hash of name
2) Retrieve object from dictionary
3) Do something with the object.
... Right, but when you become more restrictive, you can do lots, lots more. If I also allow the type of an object to be fixed, I can go under the hood of the Python API, so you could add some points 4) 5) 6) here. But I shouldn't have talked before I can show something. And at the moment, I don't restrict myself to writing clean code, but try very machine-specific things which I don't believe should go into Python at all.
I also wonder about the costs of all the Py_BuildValue and Py_ParseTuple calls going on under the hood. A performance improvement project would definitely be a good idea for 1.6, and a good sub-topic for python-dev.
I wasn't sure, so I first wrote a module which does statistics on that, and I found a lot of these calls. Some are even involved in ceval's code, where things like "(OOOO)" are parsed over and over again. Now I think this is ok in most cases. But in some places these are very bad: when your builtin function does very little work. Take len(): it uses ParseTuple, which calls a function to find out its object, then calls a function to get the object, then takes the len, and builds the result. But since everything has a length, checking for a non-NULL pointer and grabbing the len produces even shorter code. This applies to many places as well. But right now, I don't change the core, but build hacky little extension modules which squeak such things out at runtime. No big deal yet, but maybe it could pay off to introduce format strings which make them more suitable to be used in macro substitutions. Sorry if I was too elaborate, this is so much fun :-) ciao - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home
[Andrew M. Kuchling]
... A performance improvement project would definitely be a good idea for 1.6, and a good sub-topic for python-dev.
To the extent that optimization requires uglification, optimization got pushed beyond Guido's comfort zone back around 1.4 -- little has made it in since then. Not griping; I'm just trying to avoid enduring the same discussions for the third to twelfth times <wink>. Anywho, on the theory that a sweeping speedup patch has no chance of making it in regardless, how about focusing on one subsystem? In my experience, the speed issue Python gets beat up the most for is the relative slowness of function calls. It would be very good if eval_code2 somehow or other could manage to invoke a Python function without all the hair of a recursive C call, and I believe Guido intends to move in that direction for Python2 anyway. This would be a good time to start exploring that seriously. inspirationally y'rs - tim
Guido van Rossum wrote:
From: Christian Tismer <tismer@appliedbiometrics.com>
I'd really like to look into that. Also I wouldn't worry too much about speed, since this is such a cool feature. It might even be a speedup in some cases which otherwise would need more complex handling.
May I have a look?
Sure!
(I've forwarded Christian the files per separate mail.)
I'm also interested in your opinion on how well thought-out and robust the patches are -- I've never found the time to do a good close reading of them.
Coming back from the stackless task, which is finished now, I popped this task from my stack. I had a look and it seems well thought out and robust so far. To make a more trustable claim, I would need to build and test it. Is this still of interest, or should I drop it? The follow-ups in this thread indicated that the opinions about flexible namespaces were quite mixed. So, should I spend time on building and testing, or better save it? chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home
Mark Hammond wrote:
Here is a bit of an idea that I first came up with some years ago. Guido's response at the time was "sounds reasonable as long as we dont slow the normal case down".
To cut a long story short, I would like eval and exec to be capable of working with arbitrary mapping objects rather than only dictionaries. The general idea is that I can provide a class with mapping semantics, and pass this to exec/eval.
This involves a whole lot of changes: not only in the Python core, but also in extensions that rely on having real dictionaries available. Since you put out two objectives, I'd like to propose a little different approach...

1. Have eval/exec accept any mapping object as input

2. Make those two copy the content of the mapping object into real dictionaries

3. Provide a hook into the dictionary implementation that can be used to redirect KeyErrors and use that redirection to forward the request to the original mapping objects
This would give us 2 seriously cool features (that I want <wink>), should anyone decide to write code that enables them:
* Case insensitive namespaces. This would be very cool for COM, and as far as I know would please the Alice people. May open up more embedding opportunities that are lost if people feel strongly about this issue.
This is covered by 1 and 2.
* Dynamic name lookups. At the moment, dynamic attribute lookups are simple, but dynamic name lookups are hard. If I execute code "print foo", foo _must_ pre-exist in the namespace. There is no reasonable way I can some up with so that I can fetch "foo" as it is requested (raising the NameError if necessary). This would also be very cool for some of the COM work - particularly Active Scripting.
This is something for 3. I guess it wouldn't cause any significant slow-down and can be implemented with much less code than the "change all PyDict_GetItem to PyObject_GetItem" thingie. The real thing could then be done for 2.0 where PyDict_Check() would presumably not rely on an address but some kind of inheritance scheme indicating that the object is in fact a dictionary. Cheers, -- Marc-Andre Lemburg Y2000: 245 days left --------------------------------------------------------------------- : Python Pages >>> http://starship.skyport.net/~lemburg/ : ---------------------------------------------------------
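For concreteness, a minimal Python-level sketch of steps 1 and 2 (assuming the source mapping is finite and copyable); step 3, the KeyError-redirection hook, would have to live inside the dictionary implementation itself and cannot be shown from pure Python:

    # Minimal sketch: accept any mapping, snapshot it into a real dict for
    # execution, then write the changes back to the original mapping.
    def exec_in_mapping(source, mapping):
        namespace = dict(mapping)        # step 2: copy into a real dictionary
        exec(source, namespace)          # step 1: caller may pass any mapping
        for key, value in namespace.items():
            if key != "__builtins__":    # skip what exec() itself injects
                mapping[key] = value

    settings = {"spam": 1}
    exec_in_mapping("eggs = spam + 1", settings)
    print(settings["eggs"])              # 2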
Since you put out to objectives, I'd like to propose a little different approach...
1. Have eval/exec accept any mapping object as input
2. Make those two copy the content of the mapping object into real dictionaries
3. Provide a hook into the dictionary implementation that can be used to redirect KeyErrors and use that redirection to forward the request to the original mapping objects
Interesting counterproposal. I'm not sure whether any of the proposals on the table really do what's needed for e.g. case-insensitive namespace handling. I can see how all of the proposals so far allow case-insensitive reference name handling in the global namespace, but don't we also need to hook into the local-namespace creation process to allow case-insensitivity to work throughout? --david
I'm not sure whether any of the proposals on the table really do what's needed for e.g. case-insensitive namespace handling. I can see how all of the proposals so far allow case-insensitive reference name handling in the global namespace, but don't we also need to hook into the local-namespace creation process to allow case-insensitivity to work throughout?
Why not? I pictured case-insensitive namespaces working so that they retain the case of the first assignment, but all lookups would be case-insensitive.

Oh - right! Python itself would need changing to support this. I suppose that faced with code such as:

    def func():
        if spam:
            Spam=1

Python would generate code that refers to "spam" as a global, and "Spam" as a local. Is this why you feel it won't work?

Mark.
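Mark's suspicion is easy to confirm from the compiler's own output (the dis module is used here purely as an illustration): the assigned name is tagged as a function local at compile time, while the merely-read name becomes a global lookup, so no run-time namespace object ever sees both spellings as one name.

    # The compile-time tagging Mark describes: 'Spam' is assigned, so the
    # compiler treats it as a local; 'spam' is only read, so it is compiled
    # as a global lookup.
    import dis

    def func():
        if spam:
            Spam = 1

    dis.dis(func)
    # ...
    #   LOAD_GLOBAL  spam    <- read-only name: global lookup
    #   STORE_FAST   Spam    <- assigned name: function local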
On Sun, 2 May 1999, Mark Hammond wrote:
I'm not sure whether any of the proposals on the table really do what's needed for e.g. case-insensitive namespace handling. I can see how all of the proposals so far allow case-insensitive reference name handling in the global namespace, but don't we also need to hook into the local-namespace creation process to allow case-insensitivity to work throughout?
Why not? I pictured case insensitive namespaces working so that they retain the case of the first assignment, but all lookups would be case-insensitive.
Ohh - right! Python itself would need changing to support this. I suppose that faced with code such as:
def func():
    if spam:
        Spam=1
Python would generate code that refers to "spam" as a global, and "Spam" as a local.
Is this why you feel it won't work?
I hadn't thought of that, to be truthful, but I think it's more generic. [FWIW, I never much cared for the tag-variables-at-compile-time optimization in CPython, and wouldn't miss it if it were lost.] The point is that if I eval or exec code which calls a function specifying some strange mapping as the namespaces (global and current-local) I presumably want to also specify how local namespaces work for the function calls within that code snippet. That means that somehow Python has to know what kind of namespace to use for local environments, and not use the standard dictionary. Maybe we can simply have it use a '.clear()'ed .__copy__ of the specified environment. exec 'foo()' in globals(), mylocals would then call foo and within foo, the local env't would be mylocals.__copy__.clear(). Anyway, something for those-with-the-patches to keep in mind. --david
David Ascher wrote: [Marc:>
Since you put out to objectives, I'd like to propose a little different approach...
1. Have eval/exec accept any mapping object as input
2. Make those two copy the content of the mapping object into real dictionaries
3. Provide a hook into the dictionary implementation that can be used to redirect KeyErrors and use that redirection to forward the request to the original mapping objects
I don't think that this proposal would give so much new value. Since a mapping can also be implemented in arbitrary ways, say by functions, a mapping is not necessarily finite and might not be changeable into a dict. [David:>
Interesting counterproposal. I'm not sure whether any of the proposals on the table really do what's needed for e.g. case-insensitive namespace handling. I can see how all of the proposals so far allow case-insensitive reference name handling in the global namespace, but don't we also need to hook into the local-namespace creation process to allow case-insensitivity to work throughout?
Case-independent namespaces seem to be a minor point, nice to have for interfacing to other products, but then, in a function, I see no benefit in changing the semantics of function locals? The lookup of foreign symbols would always be through a mapping object. If you take COM for instance, your access to a COM wrapper for an arbitrary object would be through properties of this object. After assignment to a local function variable, why should we support case-insensitivity at all? I would think mapping objects would be a great simplification of lazy imports in COM, where we would like to avoid importing really huge namespaces in one big slurp. Also the wrapper code could be made quite a lot easier and faster without so much getattr/setattr trapping. Does btw. anybody really want to see case-insensitivity in Python programs? I'm quite happy with it as it is, and I would even force the user to always use the same case style after he has touched an external property once. Example for Excel: You may write "xl.workbooks" in lowercase, but then you have to stay with it. This would keep Python source clean for, say, PyLint. my 0.02 Euro - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net 10553 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home
Christian Tismer wrote:
David Ascher wrote: [Marc:>
Since you put out the objectives, I'd like to propose a little different approach...
1. Have eval/exec accept any mapping object as input
2. Make those two copy the content of the mapping object into real dictionaries
3. Provide a hook into the dictionary implementation that can be used to redirect KeyErrors and use that redirection to forward the request to the original mapping objects
I don't think that this proposal would give so much new value. Since a mapping can also be implemented in arbitrary ways, say by functions, a mapping is not necessarily finite and might not be changeable into a dict.
[Disclaimer: I'm not really keen on having the possibility of letting code execute in arbitrary namespace objects... it would make code optimizations even less manageable.] You can easily support infinite mappings by wrapping the function into an object which returns an empty list for .items() and then use the hook mentioned in 3 to redirect the lookup to that function. The proposal allows one to use such a proxy to simulate any kind of mapping -- it works much like the __getattr__ hook provided for instances.
[David:>
Interesting counterproposal. I'm not sure whether any of the proposals on the table really do what's needed for e.g. case-insensitive namespace handling. I can see how all of the proposals so far allow case-insensitive reference name handling in the global namespace, but don't we also need to hook into the local-namespace creation process to allow case-insensitivity to work throughout?
Case-independent namespaces seem to be a minor point, nice to have for interfacing to other products, but then, in a function, I see no benefit in changing the semantics of function locals? The lookup of foreign symbols would always be through a mapping object. If you take COM for instance, your access to a COM wrapper for an arbitrary object would be through properties of this object. After assignment to a local function variable, why should we support case-insensitivity at all?
I would think mapping objects would be a great simplification of lazy imports in COM, where we would like to avoid importing really huge namespaces in one big slurp. Also the wrapper code could be made quite a lot easier and faster without so much getattr/setattr trapping.
What do lazy imports have to do with case [in]sensitive namespaces? Anyway, how about a simple lazy import mechanism in the standard distribution, i.e. why not make all imports lazy? Since modules are first class objects this should be easy to implement...
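A rough sketch of the lazy-import idea mentioned here (the class name and the use of importlib are illustrative only): a proxy object that defers the real import until the first attribute access.

    # Purely illustrative: the module is only really imported on first
    # attribute access; until then only the cheap proxy exists.
    import importlib

    class LazyModule:
        def __init__(self, name):
            self._name = name
            self._module = None

        def __getattr__(self, attr):
            # Called only for attributes not found on the proxy itself.
            if self._module is None:
                self._module = importlib.import_module(self._name)
            return getattr(self._module, attr)

    json = LazyModule("json")        # nothing imported yet
    print(json.dumps({"spam": 1}))   # the real import happens here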
Does btw. anybody really want to see case-insensitivity in Python programs? I'm quite happy with it as it is, and I would even force the user to always use the same case style after he has touched an external property once. Example for Excel: You may write "xl.workbooks" in lowercase, but then you have to stay with it. This would keep Python source clean for, say, PyLint.
"No" and "me too" ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Y2000: Y2000: 243 days left Business: http://www.lemburg.com/ Python Pages: http://starship.python.net/crew/lemburg/
[Marc]
[Disclaimer: I'm not really keen on having the possibility of letting code execute in arbitrary namespace objects... it would make code optimizations even less manageable.]
Good point - although surely that would simply mean (certain) optimisations can't be performed for code executing in that environment? How to detect this at "optimization time" may be a little difficult :-) However, this is the primary purpose of this thread - to work out _if_ it is a good idea, as much as working out _how_ to do it :-)
The proposal allows one to use such a proxy to simulate any kind of mapping -- it works much like the __getattr__ hook provided for instances.
My only problem with Marc's proposal is that there already _is_ an established mapping protocol, and this doesn't use it; instead it invents a new one with the benefit being potentially less code breakage. And without attempting to sound flippant, I wonder how many extension modules will be affected? Module init code certainly assumes the module __dict__ is a dictionary, but none of my code assumes anything about other namespaces. Marc's extensions may be a special case, as AFAIK they inject objects into other dictionaries (ie, new builtins?). Again, not trying to downplay this too much, but if it is only a problem for Marc's more esoteric extensions, I don't feel that should hold up an otherwise solid proposal. [Chris, I think?]
Case-independent namespaces seem to be a minor point, nice to have for interfacing to other products, but then, in a function, I see no benefit in changing the semantics of function locals? The lookup of foreign symbols would
I disagree here. Consider Alice, and similar projects, where an (arguably misplaced, but nonetheless) requirement is that the embedded language be case-insensitive. Period. The Alice people are somewhat special in that they had the resources to change the interpreter's guts. Most people won't, and will look for a different language to embed. Of course, I agree with you for the specific cases you are talking about - COM, Active Scripting etc. Indeed, everything I would use this for would prefer to keep the local function semantics identical.
Does btw. anybody really want to see case-insensitivity in Python programs? I'm quite happy with it as it is, and I would even force the user to always use the same case style after he has touched an external property once. Example for Excel: You may write "xl.workbooks" in lowercase, but then you have to stay with it. This would keep Python source clean for, say, PyLint.
"No" and "me too" ;-)
I think we are missing the point a little. If we focus on COM, we may come up with a different answer. Indeed, if we are to focus on COM integration with Python, there are other areas I would prefer to start with :-) IMO, we should attempt to come up with a more flexible namespace mechanism that is in the style of Python, and will not noticeably slow down Python. Then COM etc can take advantage of it - much in the same way that Python's existing namespace model existed pre-COM, and COM had to take advantage of what it could! Of course, a key indicator of the likely success is how well COM _can_ take advantage of it, and how much Alice could have taken advantage of it - I can't think of any other yardsticks? Mark.
Mark Hammond wrote:
[Marc]
[Disclaimer: I'm not really keen on having the possibility of letting code execute in arbitrary namespace objects... it would make code optimizations even less manageable.]
Good point - although surely that would simply mean (certain) optimisations can't be performed for code executing in that environment? How to detect this at "optimization time" may be a little difficult :-)
However, this is the primary purpose of this thread - to work out _if_ it is a good idea, as much as working out _how_ to do it :-)
The proposal allows one to use such a proxy to simulate any kind of mapping -- it works much like the __getattr__ hook provided for instances.
My only problem with Marc's proposal is that there already _is_ an established mapping protocol, and this doesn't use it; instead it invents a new one with the benefit being potentially less code breakage.
...and that's the key point: you get the intended features and the core code will not have to be changed in significant ways. Basically, I think these kinds of core extensions should be done in generic ways, e.g. by letting the eval/exec machinery accept subclasses of dictionaries, rather than trying to raise the abstraction level used and slowing things down in general just to be able to use the feature on very few occasions.
And without attempting to sound flippant, I wonder how many extension modules will be affected? Module init code certainly assumes the module __dict__ is a dictionary, but none of my code assumes anything about other namespaces. Marc's extensions may be a special case, as AFAIK they inject objects into other dictionaries (ie, new builtins?). Again, not trying to downplay this too much, but if it is only a problem for Marc's more esoteric extensions, I don't feel that should hold up an otherwise solid proposal.
My mxTools extension does the assignment in Python, so it wouldn't be affected. The others only do the usual modinit() stuff. Before going any further on this thread we may have to ponder a little more on the objectives that we have. If it's only case-insensitive lookups then I guess a simple compile-time switch exchanging the implementations of the string hash and compare functions would do the trick. If we're after doing wild things like lookups across networks, then a more specific approach is needed. So what is it that we want in 1.6?
[Chris, I think?]
Case-independent namespaces seem to be a minor point, nice to have for interfacing to other products, but then, in a function, I see no benefit in changing the semantics of function locals? The lookup of foreign symbols would
I disagree here. Consider Alice, and similar projects, where an (arguably misplaced, but nonetheless) requirement is that the embedded language be case-insensitive. Period. The Alice people are somewhat special in that they had the resources to change the interpreter's guts. Most people won't, and will look for a different language to embed.
Of course, I agree with you for the specific cases you are talking - COM, Active Scripting etc. Indeed, everything I would use this for would prefer to keep the local function semantics identical.
As I understand the needs in COM and AS, you are talking about object attributes, right? Making these case-insensitive is a job for a proxy or a __getattr__ hack.
Does btw. anybody really want to see case-insensitivity in Python programs? I'm quite happy with it as it is, and I would even force the user to always use the same case style after he has touched an external property once. Example for Excel: You may write "xl.workbooks" in lowercase, but then you have to stay with it. This would keep Python source clean for, say, PyLint.
"No" and "me too" ;-)
I think we are missing the point a little. If we focus on COM, we may come up with a different answer. Indeed, if we are to focus on COM integration with Python, there are other areas I would prefer to start with :-)
IMO, we should attempt to come up with a more flexible namespace mechanism that is in the style of Python, and will not noticeably slow down Python. Then COM etc can take advantage of it - much in the same way that Python's existing namespace model existed pre-COM, and COM had to take advantage of what it could!
Of course, a key indicator of the likely success is how well COM _can_ take advantage of it, and how much Alice could have taken advantage of it - I can't think of any other yardsticks?
-- Marc-Andre Lemburg ______________________________________________________________________ Y2000: Y2000: 242 days left Business: http://www.lemburg.com/ Python Pages: http://starship.python.net/crew/lemburg/
scriptics is positioning tcl as a perl killer: http://www.scriptics.com/scripting/perl.html

afaict, unicode and event handling are the two main thingies missing from python 1.5.

-- unicode: is on its way.

-- event handling: asyncore/asynchat provides an awesome framework for event-driven socket programming. however, Python still lacks good cross-platform support for event-driven access to files and pipes. are threads good enough, or would it be cool to have something similar to Tcl's fileevent stuff in Python?

-- regexps: has anyone compared the new unicode-aware regexp package in Tcl with pcre? comments?

</F>

btw, the rebol folks have reached 2.0: http://www.rebol.com/ maybe 1.6 should be renamed to Python 6.0?
Fredrik Lundh writes:
-- regexps: has anyone compared the new unicode-aware regexp package in Tcl with pcre?
I looked at it a bit when Tcl 8.1 was in beta; it derives from Henry Spencer's 1998-vintage code, which seems to try to do a lot of optimization and analysis. It may even compile DFAs instead of NFAs when possible, though it's hard for me to be sure. This might give it a substantial speed advantage over engines that do less analysis, but I haven't benchmarked it. The code is easy to read, but difficult to understand because the theory underlying the analysis isn't explained in the comments; one feels there should be an accompanying paper to explain how everything works, and it's why I'm not sure if it really is producing DFAs for some expressions.

Tcl seems to represent everything as UTF-8 internally, so there's only one regex engine. The code is scattered over more files:

    amarok generic>ls re*.[ch]
    regc_color.c   regc_locale.c  regcustom.h   regerrs.h    regfree.c
    regc_cvec.c    regc_nfa.c     rege_dfa.c    regex.h      regfronts.c
    regc_lex.c     regcomp.c      regerror.c    regexec.c    regguts.h
    amarok generic>wc -l re*.[ch]
         742 regc_color.c
         170 regc_cvec.c
        1010 regc_lex.c
         781 regc_locale.c
        1528 regc_nfa.c
        2124 regcomp.c
          85 regcustom.h
         627 rege_dfa.c
          82 regerror.c
          18 regerrs.h
         308 regex.h
         952 regexec.c
          25 regfree.c
          56 regfronts.c
         388 regguts.h
        8896 total
    amarok generic>

This would be an issue for using it with Python, since all these files would wind up scattered around the Modules directory. For comparison, pypcre.c is around 4700 lines of code.

-- A.M. Kuchling http://starship.python.net/crew/amk/ Things need not have happened to be true. Tales and dreams are the shadow-truths that will endure when mere facts are dust and ashes, and forgot. -- Neil Gaiman, _Sandman_ #19: _A Midsummer Night's Dream_
I looked at it a bit when Tcl 8.1 was in beta; it derives from Henry Spencer's 1998-vintage code, which seems to try to do a lot of optimization and analysis. It may even compile DFAs instead of NFAs when possible, though it's hard for me to be sure. This might give it a substantial speed advantage over engines that do less analysis, but I haven't benchmarked it. The code is easy to read, but difficult to understand because the theory underlying the analysis isn't explained in the comments; one feels there should be an accompanying paper to explain how everything works, and it's why I'm not sure if it really is producing DFAs for some expressions.
Tcl seems to represent everything as UTF-8 internally, so there's only one regex engine.
Hmm... I looked when Tcl 8.1 was in alpha, and I *think* that at that point the regex engine was compiled twice, once for 8-bit chars and once for 16-bit chars. But this may have changed. I've noticed that Perl is taking the same position (everything is UTF-8 internally). On the other hand, Java distinguishes 16-bit chars from 8-bit bytes. Python is currently in the Java camp. This might be a good time to make sure that we're still convinced that this is the right thing to do!
The code is scattered over more files:
amarok generic>ls re*.[ch]
regc_color.c   regc_locale.c  regcustom.h   regerrs.h    regfree.c
regc_cvec.c    regc_nfa.c     rege_dfa.c    regex.h      regfronts.c
regc_lex.c     regcomp.c      regerror.c    regexec.c    regguts.h
amarok generic>wc -l re*.[ch]
     742 regc_color.c
     170 regc_cvec.c
    1010 regc_lex.c
     781 regc_locale.c
    1528 regc_nfa.c
    2124 regcomp.c
      85 regcustom.h
     627 rege_dfa.c
      82 regerror.c
      18 regerrs.h
     308 regex.h
     952 regexec.c
      25 regfree.c
      56 regfronts.c
     388 regguts.h
    8896 total
amarok generic>
This would be an issue for using it with Python, since all these files would wind up scattered around the Modules directory. For comparison, pypcre.c is around 4700 lines of code.
I'm sure that if it's good code, we'll find a way. Perhaps a more interesting question is whether it is Perl5 compatible. I contacted Henry Spencer at the time and he was willing to let us use his code. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum writes:
Hmm... I looked when Tcl 8.1 was in alpha, and I *think* that at that point the regex engine was compiled twice, once for 8-bit chars and once for 16-bit chars. But this may have changed.
It doesn't seem to currently; the code in tclRegexp.c looks like this:

    /*
     * Remember the UTF-8 string so Tcl_RegExpRange() can convert the
     * matches from character to byte offsets.
     */
    regexpPtr->string = string;

    Tcl_DStringInit(&stringBuffer);
    uniString = Tcl_UtfToUniCharDString(string, -1, &stringBuffer);
    numChars = Tcl_DStringLength(&stringBuffer) / sizeof(Tcl_UniChar);

    /*
     * Perform the regexp match.
     */
    result = TclRegExpExecUniChar(interp, re, uniString, numChars, -1,
            ((string > start) ? REG_NOTBOL : 0));

ISTR the Spencer engine does, however, define a small and large representation for NFAs and have two versions of the engine, one for each representation. Perhaps that's what you're thinking of.
I've noticed that Perl is taking the same position (everything is UTF-8 internally). On the other hand, Java distinguishes 16-bit chars from 8-bit bytes. Python is currently in the Java camp. This might be a good time to make sure that we're still convinced that this is the right thing to do!
I don't know. There's certainly the fundamental dichotomy that strings are sometimes used to represent characters, where changing encodings on input and output is reasonable, and sometimes used to hold chunks of binary data, where any changes are incorrect. Perhaps Paul Prescod is right, and we should try to get some other data type (array.array()) for holding binary data, as distinct from strings.
I'm sure that if it's good code, we'll find a way. Perhaps a more interesting question is whether it is Perl5 compatible. I contacted Henry Spencer at the time and he was willing to let us use his code.
Mostly Perl-compatible, though it doesn't look like the 5.005 features are there, and I haven't checked for every single 5.004 feature. Adding missing features might be problematic, because I don't really understand what the code is doing at a high level. Also, is there a user community for this code? Do any other projects use it? Philip Hazel has been quite helpful with PCRE, an important thing when making modifications to the code. Should I make a point of looking at what using the Spencer engine would entail? It might not be too difficult (an evening or two, maybe?) to write a re.py that sat on top of the Spencer code; that would at least let us do some benchmarking. -- A.M. Kuchling http://starship.python.net/crew/amk/ In Einstein's theory of relativity the observer is a man who sets out in quest of truth armed with a measuring-rod. In quantum theory he sets out with a sieve. -- Sir Arthur Eddington
Should I make a point of looking at what using the Spencer engine would entail? It might not be too difficult (an evening or two, maybe?) to write a re.py that sat on top of the Spencer code; that would at least let us do some benchmarking.
Surely this would be more helpful than weeks of speculative emails -- go for it! --Guido van Rossum (home page: http://www.python.org/~guido/)
talking about regexps, here's another thing that would be quite nice to have in 1.6 (available from the Python level, that is). or is it already in there somewhere? </F>

...

http://www.dejanews.com/[ST_rn=qs]/getdoc.xp?AN=464362873

Tcl 8.1b3 Request:
Generated by Scriptics' bug entry form at
Submitted by: Frederic BONNET
OperatingSystem: Windows 98
CustomShell: Applied patch to the regexp engine (the exec part)
Synopsis: regexp improvements
DesiredBehavior:
    As previously requested by Don Libes:
    > I see no way for Tcl_RegExpExec to indicate "could match" meaning
    > "could match if more characters arrive that were suitable for a
    > match". This is required for a class of applications involving
    > matching on a stream required by Expect's interact command. Henry
    > assured me that this facility would be in the engine (I'm not the only
    > one that needs it). Note that it is not sufficient to add one more
    > return value to Tcl_RegExpExec (i.e., 2) because one needs to know
    > both if something matches now and can match later. I recommend
    > another argument (canMatch *int) be added to Tcl_RegExpExec.
/patch info follows/ ...
[Guido & Andrew on Tcl's new regexp code]
I'm sure that if it's good code, we'll find a way. Perhaps a more interesting question is whether it is Perl5 compatible. I contacted Henry Spencer at the time and he was willing to let us use his code.
Haven't looked at the code, but did read the manpage just now: http://www.scriptics.com/man/tcl8.1/TclCmd/regexp.htm

WRT Perl5 compatibility, it sez:

    Incompatibilities of note include `\b', `\B', the lack of special
    treatment for a trailing newline, the addition of complemented bracket
    expressions to the things affected by newline-sensitive matching, the
    restrictions on parentheses and back references in lookahead constraints,
    and the longest/shortest-match (rather than first-match) matching
    semantics.

So some gratuitous differences, and maybe a killer: Guido hasn't had many kind words to say about "longest" (aka POSIX) matching semantics.

An example from the page:

    (week|wee)(night|knights) matches all ten characters of `weeknights'

which means it matched 'wee' and 'knights'; Python/Perl match 'week' and 'night'. It's the *natural* semantics if Andrew's suspicion that it's compiling a DFA is correct; indeed, it's a pain to get that behavior any other way!

otoh-it's-potentially-very-much-faster-ly y'rs - tim
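Python's current engine shows the first-match behaviour Tim describes (a POSIX longest-match engine would report the full ten-character match instead):

    import re
    # Python/Perl semantics: the first alternative that lets the rest of the
    # pattern succeed wins, so only nine of the ten characters are consumed.
    m = re.match(r"(week|wee)(night|knights)", "weeknights")
    print(m.groups())    # ('week', 'night')
    print(m.group(0))    # 'weeknight' -- not the full 'weeknights'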
[Tim]
... It's the *natural* semantics if Andrew's suspicion that it's compiling a DFA is correct ...
More from the man page:

    AREs report the longest/shortest match for the RE, rather than the first
    found in a specified search order. This may affect some RREs which were
    written in the expectation that the first match would be reported. (The
    careful crafting of RREs to optimize the search order for fast matching
    is obsolete (AREs examine all possible matches in parallel, and their
    performance is largely insensitive to their complexity) but cases where
    the search order was exploited to deliberately find a match which was
    not the longest/shortest will need rewriting.)

Nails it, yes? Now, in 10 seconds, try to remember a regexp where this really matters <wink>. Note in passing that IDLE's colorizer regexp *needs* to search for triple-quoted strings before single-quoted ones, else the P/P semantics would consider """ to be an empty single-quoted string followed by a double quote. This isn't a case where it matters in a bad way, though! The "longest" rule picks the correct alternative regardless of the order in which they're written. at-least-in-that-specific-regex<0.1-wink>-ly y'rs - tim
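A stripped-down illustration of the colorizer point (the real IDLE pattern is more involved; this only shows why alternative order matters under first-match semantics):

    import re
    text = '"""docstring"""'
    # Single-quoted alternative listed first: it wins and reports an empty
    # string ('""') instead of the triple-quoted literal.
    print(re.match(r'"[^"]*"|"""[^"]*"""', text).group(0))   # '""'
    # Triple-quoted alternative listed first: the whole literal is matched.
    print(re.match(r'"""[^"]*"""|"[^"]*"', text).group(0))   # '"""docstring"""'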
[Tim]
So some gratuitous differences, and maybe a killer: Guido hasn't had many kind words to say about "longest" (aka POSIX) matching semantics.
An example from the page:
(week|wee)(night|knights) matches all ten characters of `weeknights'
which means it matched 'wee' and 'knights'; Python/Perl match 'week' and 'night'.
It's the *natural* semantics if Andrew's suspicion that it's compiling a DFA is correct; indeed, it's a pain to get that behavior any other way!
Possibly contradicting what I once said about DFAs (I have no idea what I said any more :-): I think we shouldn't be hung up about the subtleties of DFA vs. NFA; for most people, the Perl-compatibility simply means that they can use the same metacharacters. My guess is that people don't so much translate long Perl regexps to Python but simply transport their (always incomplete -- Larry Wall *wants* it that way :-) knowledge of Perl regexps to Python. My meta-guess is that this is also Henry Spencer's and John Ousterhout's guess. As for Larry Wall, I guess he really doesn't care :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum writes:
Possibly contradicting what I once said about DFAs (I have no idea what I said any more :-): I think we shouldn't be hung up about the subtleties of DFA vs. NFA; for most people, the Perl-compatibility simply means that they can use the same metacharacters. My guess is
I don't like slipping in such a change to the semantics with no visible change to the module name or interface. On the other hand, if it's not NFA-based, then it can provide POSIX semantics without danger of taking exponential time to determine the longest match. BTW, there's an interesting reference, I assume to this code, in _Mastering Regular Expressions_; Spencer is quoted on page 121 as saying it's "at worst quadratic in text size.". Anyway, we can let it slide until a Python interface gets written. -- A.M. Kuchling http://starship.python.net/crew/amk/ In the black shadow of the Baba Yaga babies screamed and mothers miscarried; milk soured and men went mad. -- In SANDMAN #38: "The Hunt"
BTW, there's an interesting reference, I assume to this code, in _Mastering Regular Expressions_; Spencer is quoted on page 121 as saying it's "at worst quadratic in text size.".
Not sure if that was the same code -- this is *new* code, not Spencer's old code. I think Friedl's book is older than the current code. --Guido van Rossum (home page: http://www.python.org/~guido/)
I've consistently found that the best way to kill a thread is to rename it accurately <wink>.

Agree w/ Guido that few people really care about the differing semantics. Agree w/ Andrew that it's bad to pull a semantic switcheroo at this stage anyway: code will definitely break. Like

    \b(?: (?P<keyword>and|if|else|...)
        | (?P<identifier>[a-zA-Z_]\w*)
       )\b

The (special)|(general) idiom relies on left-to-right match-and-out searching of alternatives to do its job correctly. Not to mention that \b is not a word-boundary assertion in the new pkg (talk about pointlessly irritating differences! at least this one could be easily hidden via brainless preprocessing).

Over the long run, moving to a DFA locks Python out of the directions Perl is *moving*, namely embedding all sorts of runtime gimmicks in regexps that exploit knowing the "state of the match so far". DFAs don't work that way. I don't mind losing those possibilities, because I think the regexp sublanguage is strained beyond its limits already. But that's a decision with Big Consequences, so deserves some thought.

I'd definitely like the (sometimes dramatically) increased speed a DFA can offer (btw, this code appears to use a lazily-generated DFA, to avoid the exponential *compile*-time a straightforward DFA implementation can suffer -- the code is very complex and lacks any high-level internal docs, so we better hope Henry stays in love with it <0.5 wink>).
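For the record, the (special)|(general) idiom above really does lean on left-to-right alternation; with Python's current engine the keyword branch wins precisely because it is tried first, a guarantee a longest-match engine does not give in the same way (the pattern below is a trimmed-down version for illustration):

    import re
    # Keyword alternative first, catch-all identifier second.
    pat = re.compile(r"\b(?:(?P<keyword>and|if|else)|(?P<identifier>[a-zA-Z_]\w*))\b")

    print(pat.match("if").lastgroup)     # 'keyword'    -- first alternative wins
    print(pat.match("iffy").lastgroup)   # 'identifier' -- \b forces backtracking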
... My guess is that people don't so much translate long Perl regexp's to Python but simply transport their (always incomplete -- Larry Wall *wants* it that way :-) knowledge of Perl regexps to Python.
This is directly proportional to the number of feeble CGI programmers Python attracts <wink>. The good news is that they wouldn't know an NFA from a DFA if Larry bit Henry on the ass ...
My meta-guess is that this is also Henry Spencer's and John Ousterhout's guess.
I think Spencer strongly favors DFA semantics regardless of fashion, and Ousterhout is a pragmatist. So I trust JO's judgment more <0.9 wink>.
As for Larry Wall, I guess he really doesn't care :-)
I expect he cares a lot! Because a DFA would prevent Perl from going even more insane in its present direction. About the age of the code, postings to comp.lang.tcl have Henry saying he was working on the alpha version intensely as recently as December ('98). A few complaints about the alpha release trickled in, about regexp compile speed and regexp matching speed in specific cases. Perhaps paradoxically, the latter were about especially simple regexps with long fixed substrings (where this mountain of sophisticated machinery is likely to get beat cold by an NFA with some fixed-substring lookahead smarts -- which latter Henry intended to graft into this pkg too). [Andrew]
BTW, there's an interesting reference, I assume to this code, in _Mastering Regular Expressions_; Spencer is quoted on page 121 as saying it's "at worst quadratic in text size.".
[Guido]
Not sure if that was the same code -- this is *new* code, not Spencer's old code. I think Friedl's book is older than the current code.
I expect this is an invariant, though: it's not natural for a DFA to know where subexpression matches begin and end, and there's a pile of xxx_dissect functions in regexec.c that use what strongly appear to be worst-case quadratic-time algorithms for figuring that out after it's known that the overall expression has *a* match. Expect too, but don't know, that only pathological cases are actually expensive. Question: has this package been released in any other context, or is it unique to Tcl? I searched in vain for an announcement (let alone code) from Henry, or any discussion of this code outside the Tcl world. whatever-happens-i-vote-we-let-them-debug-it<wink>-ly y'rs - tim
On Wed, 5 May 1999, Tim Peters wrote:
... Question: has this package been released in any other context, or is it unique to Tcl? I searched in vain for an announcement (let alone code) from Henry, or any discussion of this code outside the Tcl world.
Apache uses it. However, the Apache guys have considered the possibility of updating the thing. I gather that they have a pretty old snapshot. Another guy mentioned PCRE and I pointed out that Python uses it for its regex support. In other words, if Apache *does* update the code, then it may be that Apache will drop the HS engine in favor of PCRE. Cheers, -g -- Greg Stein, http://www.lyra.org/
[Tim]
Question: has this package [Tcl's 8.1 regexp support] been released in any other context, or is it unique to Tcl? I searched in vain for an announcement (let alone code) from Henry, or any discussion of this code outside the Tcl world.
[Greg Stein]
Apache uses it.
However, the Apache guys have considered the possibility of updating the thing. I gather that they have a pretty old snapshot. Another guy mentioned PCRE and I pointed out that Python uses it for its regex support. In other words, if Apache *does* update the code, then it may be that Apache will drop the HS engine in favor of PCRE.
Hmm. I just downloaded the Apache 1.3.4 source to check on this, and it appears to be using a lightly massaged version of Spencer's old (circa '92-'94) just-POSIX regexp package. Henry has been distributing regexp pkgs for a loooong time <wink>. The Tcl 8.1 regexp pkg is much hairier. If the Apache folk want to switch in order to get the Perl regexp syntax extensions, this Tcl version is worth looking at too. If they want to switch for some other reason, it would be good to know what that is! The base pkg Apache uses is easily available all over the web; the pkg Tcl 8.1 is using I haven't found anywhere except in the Tcl download (which is why I'm wondering about it -- so far, it doesn't appear to be distributed by Spencer himself, in a non-Tcl-customized form). looks-like-an-entirely-new-pkg-to-me-ly y'rs - tim
"TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> Over the long run, moving to a DFA locks Python out of the
TP> directions Perl is *moving*, namely embedding all sorts of
TP> runtime gimmicks in regexps that exploit knowing the "state of
TP> the match so far".  DFAs don't work that way.  I don't mind
TP> losing those possibilities, because I think the regexp
TP> sublanguage is strained beyond its limits already.  But that's
TP> a decision with Big Consequences, so deserves some thought.

I know zip about the internals of the various regexp packages. But as far as the Python level interface, would it be feasible to support both as underlying regexp engines underneath re.py? The idea would be that you'd add an extra flag (re.PERL / re.TCL ? re.DFA / re.NFA ? re.POSIX / re.USEFUL ? :-) that would select the engine and compiler. Then all the rest of the magic happens behind the scenes, with appropriate exceptions thrown if there are syntax mismatches in the regexp that can't be worked around by preprocessors, etc.

Or would that be more confusing than yet another different regexp module?

-Barry
[Tim notes that moving to a DFA regexp engine would rule out some future aping of Perl mistakes <wink>] [Barry "The Great Compromiser" Warsaw]
I know zip about the internals of the various regexp packages. But as far as the Python level interface, would it be feasible to support both as underlying regexp engines underneath re.py? The idea would be that you'd add an extra flag (re.PERL / re.TCL ? re.DFA / re.NFA ? re.POSIX / re.USEFUL ? :-) that would select the engine and compiler. Then all the rest of the magic happens behind the scenes, with appropriate exceptions thrown if there are syntax mismatches in the regexp that can't be worked around by preprocessors, etc.
Or would that be more confusing than yet another different regexp module?
It depends some on what percentage of the Python distribution Guido wants to devote to regexp code <0.6 wink>; the Tcl pkg would be the largest block of code in Modules/, where regexp packages already consume more than anything else. It's a lot of delicate, difficult code. Someone would need to step up and champion each alternative package. I haven't asked Andrew lately, but I'd bet half a buck the thrill of supporting pcre has waned. If there were competing packages, your suggested interface is fine. I just doubt the Python developers will support more than one (Andrew may still be young, but he can't possibly still be naive enough to sign up for two of these nightmares <wink>). i'm-so-old-i-never-signed-up-for-one-ly y'rs - tim
participants (11)
- Andrew M. Kuchling
- Barry A. Warsaw
- Christian Tismer
- David Ascher
- Fredrik Lundh
- Gordon McMillan
- Greg Stein
- Guido van Rossum
- M.-A. Lemburg
- Mark Hammond
- Tim Peters