From jan.kanis at phil.uu.nl Fri Jun 1 10:20:02 2007 From: jan.kanis at phil.uu.nl (Jan Kanis) Date: Fri, 01 Jun 2007 10:20:02 +0200 Subject: [Python-ideas] About list comprehension syntax In-Reply-To: <8C1BDF74-1DAB-4F64-A28E-16788C48AA95@marooned.org.uk> References: <8C1BDF74-1DAB-4F64-A28E-16788C48AA95@marooned.org.uk> Message-ID: On Wed, 30 May 2007 12:41:51 +0200, Arnaud Delobelle wrote: > (3') [x in L if p(x)] I like the idea as well. It doesn't look ambiguous to me. 'if' can only appear as a statement or in conjunction with an 'else', so this expression can't mean anything else imo. Jan From g.brandl at gmx.net Fri Jun 1 11:34:12 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 01 Jun 2007 11:34:12 +0200 Subject: [Python-ideas] About list comprehension syntax In-Reply-To: References: <8C1BDF74-1DAB-4F64-A28E-16788C48AA95@marooned.org.uk> Message-ID: Jan Kanis schrieb: > On Wed, 30 May 2007 12:41:51 +0200, Arnaud Delobelle > wrote: > >> (3') [x in L if p(x)] > > I like the idea as well. It doesn't look ambiguous to me. 'if' can only > appear as a statement or in conjunction with an 'else', so this expression > can't mean anything else imo. Tell that to the LL(1) parser ;) Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
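A note on the parser remark above, as a minimal sketch rather than anything posted in the thread: in existing syntax, `[x in L` is already the start of a valid list display whose single element is the membership test `x in L`, so a parser with one token of lookahead cannot tell at `in` which construct it is reading. The names `L`, `p`, `membership` and `filtered` are illustrative only.

```python
L = [1, 2, 3, 4]


def p(x):
    """Illustrative predicate: keep even numbers."""
    return x % 2 == 0


x = 2

# Existing meaning of "[x in L]": a one-element list holding a bool.
membership = [x in L]

# Existing spelling of the filter that proposal (3') would abbreviate.
filtered = [x for x in L if p(x)]

print(membership, filtered)
```

Both forms begin with the same tokens, which is presumably the ambiguity the LL(1) comment points at.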
From tony at PageDNA.com Fri Jun 1 18:19:22 2007
From: tony at PageDNA.com (Tony Lownds)
Date: Fri, 1 Jun 2007 09:19:22 -0700
Subject: [Python-ideas] Attribute Docstrings and Annotations
In-Reply-To: <6b4de4d80705310615j5cf712fdldb5e685d2c4a4a7a@mail.gmail.com>
References: <6b4de4d80705310615j5cf712fdldb5e685d2c4a4a7a@mail.gmail.com>
Message-ID: <8BDB8EC5-43F3-4251-8096-06818BA279B5@PageDNA.com>

On May 31, 2007, at 6:15 AM, Ali Sabil wrote:

> Hello all,
>
> I was looking into function annotations, that are to be introduced
> in Python3k, and I found this old message sent to this mailing
> list: http://mail.python.org/pipermail/python-ideas/2007-January/000037.html
>
> I would like to restart the discussion of attribute annotation,
> because in my opinion it can be a very powerful feature. I
> personally think about using it for SOAP message serialization, or
> any kind of XML serialization, the idea would be to annotate
> various object attributes to be either marshaled as XML Elements or
> XML Attributes of the current Node that reflects the Object.
>

Hi Ali,

I'm afraid the PEP deadline has passed; restarting the discussion would not change that.

Your use case sounds interesting. I know there are a lot of other use cases for attribute meta-data. However there are lots of ways to achieve attribute meta data storage.

    # Via naming convention
    class A:
        attr = 1
        f_attr = int

    # Via decorators
    class A:
        attr = annotated_value(1, int)

    # Via a separate class
    class A:
        class meta:
            attr = int
        attr = 1

    # Compare to attribute annotation syntax
    class A:
        attr: int = 1

Each of the existing idioms I have seen or used has issues, mostly minor. I suspect that because the existing ways don't have glaring deficiencies, even if it were before the PEP deadline, a renewed push on the proposal would have encountered a lot more resistance.

I still think the syntax is elegant in symmetry and provides a unified and improved idiom for a common use case.
-Tony

From showell30 at yahoo.com Wed Jun 6 05:23:38 2007
From: showell30 at yahoo.com (Steve Howell)
Date: Tue, 5 Jun 2007 20:23:38 -0700 (PDT)
Subject: [Python-ideas] alphabets, spelling, grammar, vocabulary, and the evolution of English
Message-ID: <524793.44655.qm@web33512.mail.mud.yahoo.com>

This is a linguistic reflection inspired by PEP 3131.

English is a language that has undergone a major transformation in the last 200 to 300 years. It used to be spoken mostly on one particular island across the channel from France. Now it's spoken worldwide. Two of the larger populations of English speakers, residents of the UK and residents of the US, live an ocean away from each other.

Unlike Python, English never had a PEP process. It naturally evolved. But like Python, English has been promoted, for various reasons, as a worldwide language, mostly successfully.

English is also like Python in the sense that it had a mostly fresh start during certain colonizations, but it still had backward compatibility issues.

Some observations:

1) US and UK residents can mostly converse with each other.

2) American English has diverged from British English in vocabulary, although many of the differing words are esoteric, or are inherently culturally incompatible, or have synonyms recognized on both sides of the ocean, or are idiomatic expressions.

3) American English differs from British grammar only in pretty non-fundamental areas. American English, despite 200 years of evolution away from its parent, preserves subject-verb-object ordering. Adjectives almost always precede nouns. Differences come down to subtle things like how you deal with collective nouns, etc.

4) Some words are spelled differently between American English and British English, but the spellings are generally mutually understood by all speakers.
(Even on the same side of the ocean, spelling can be ambiguous in English, so variant spellings often arise [more often than, say, in Spanish].)

5) American English and British English still have the exact same alphabet. A to Z.

Are there analogies here to be drawn to Python? Thoughts?

On AmE and BrE:

http://en.wikipedia.org/wiki/American_and_British_English_differences

"America and England are two nations divided by a common language."

From eyal.lotem at gmail.com Sun Jun 10 06:27:20 2007
From: eyal.lotem at gmail.com (Eyal Lotem)
Date: Sun, 10 Jun 2007 07:27:20 +0300
Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys
Message-ID:

I believe that it is possible to add a useful feature to dicts, that will enable some interesting performance features in the future.

Dict lookups are in general supposed to be O(1) and they are indeed very fast. However, there are several drawbacks:

A. O(1) is not always achieved, collisions may still occur

B. Some lookups use static "literal" keys, and could benefit from accessing a dict item directly (very fast O(1) _worst_ case, without even a function call when done from the C side).

C. B is especially true when dicts are used to hold attributes - in that case literal attribute names, in conjunction with psyco-like type specialization, can help avoid many dict lookups. I won't delve into this in this mail - but this is actually the use-case that I'm after optimizing.

There is a possible implementation that can allow a dict to export items with minimal impact on its other performance.

Create a new type of PyObject, a PyDictItemObject that will contain a key, value pair. (This will NOT exist for each hash table entry, but only for specifically exported items).
Add a bitmap to dicts that has a single bit for every hash table entry. If the entry is marked in the bitmap, then its PyObject* "value" is not a reference to the value object, but to a PyDictItemObject.

A new dict method, "PyDict_ExportItem", takes a single argument, the key. It creates a PyDictItemObject, assigns the dict's key to it, and marks that hash-table entry in the bitmap. If PyDict_ExportItem is called when the item is already exported, it will return another reference to the same PyDictItemObject. The value in PyDictItemObject will initially be set to NULL (which means "key not mapped"). Both PyDictItemObject and PyDict_ExportItem should probably not be exported to the Python side, but PyDictItemObject should probably be a PyObject for refcounting purposes.

All the dict methods to get/set values, once they have found the correct entry, check the bitmap to see if the entry is marked, and if it is, access the value in the PyDictItemObject instead of the value itself. In addition, if that value is NULL, it represents the key not actually being in the dict (__getitem__ can raise a KeyError there, for example, and __setitem__ can simply use Py_XDECREF and overwrite it with the new value).

As an alternative to the bitmap, the hash table entry can contain another boolean int -- I am not sure which is preferable in terms of cache-locality, but the bitmap is definitely cheaper, space-wise.

This would allow dict users to create an exported item for a key once, and then access that key in real O(1) without function calls. As mentioned before, this can also serve in the future, as the basis for avoiding dict lookups for attribute searches.
From jcarlson at uci.edu Sun Jun 10 10:00:20 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 10 Jun 2007 01:00:20 -0700
Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys
In-Reply-To:
References:
Message-ID: <20070610003904.7907.JCARLSON@uci.edu>

"Eyal Lotem" wrote:
>
> I believe that it is possible to add a useful feature to dicts, that
> will enable some interesting performance features in the future.
>
> Dict lookups are in general supposed to be O(1) and they are indeed very fast.
> However, there are several drawbacks:
> A. O(1) is not always achieved, collisions may still occur

Note that O(1) is constant, not necessarily 1. Assuming that the hash function that Python uses is decent (it seems to work well), then as per the load factor of 2/3, then you get an expected number of probes = 1 + (2/3) + (2/3)^2 + (2/3)^3 + ..., which sums to 3.

Now, if you have contents that are specifically designed to screw the hash function, then you are going to get poor performance. But this is the case for any particular hash function; there exist inputs that force it to perform poorly.

Given the above load factor, if 3 expected probes is too many, you can use d.update(d) to double the size of the dictionary, forcing the load factor to be 1/3 or less, for an expected number of probes = 1.5 .

> B. Some lookups use static "literal" keys, and could benefit from
> accessing a dict item directly (very fast O(1) _worst_ case, without
> even a function call when done from the C side).

[snip]

What you are proposing is to add a level of indirection between some pointer in the dictionary to some special PyDictItem object. This will slow Python execution when such a thing is used. Why?
The extra level of indirection requires another pointer following, as well as a necessary check on the bitmap you are proposing, never mind the additional memory overhead of having the item (generally doubling the size of dictionary objects that use such 'exported' items).

You don't mention what algorithm/structure will allow for the accessing of your dictionary items in O(1) time, only that after you have this bitmap and dictionary item thingy that it will be O(1) time (where dictionaries weren't already fast enough natively). I don't believe you have fully thought through this, but feel free to post C or Python source that describes your algorithm in detail to prove me wrong.

You should note that Python's dictionary implementation has been tuned to work *quite well* for the object attribute/namespace case, and I would be quite surprised if anyone managed to improve upon Raymond's work (without writing platform-specific assembly).

- Josiah

From eyal.lotem at gmail.com Sun Jun 10 11:30:42 2007
From: eyal.lotem at gmail.com (Eyal Lotem)
Date: Sun, 10 Jun 2007 12:30:42 +0300
Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys
In-Reply-To: <20070610003904.7907.JCARLSON@uci.edu>
References: <20070610003904.7907.JCARLSON@uci.edu>
Message-ID:

On 6/10/07, Josiah Carlson wrote:
>
> "Eyal Lotem" wrote:
> >
> > I believe that it is possible to add a useful feature to dicts, that
> > will enable some interesting performance features in the future.
> >
> > Dict lookups are in general supposed to be O(1) and they are indeed very fast.
> > However, there are several drawbacks:
> > A. O(1) is not always achieved, collisions may still occur
>
> Note that O(1) is constant, not necessarily 1. Assuming that the hash
> function that Python uses is decent (it seems to work well), then as per
> the load factor of 2/3, then you get an expected number of probes = 1 +
> (2/3) + (2/3)^2 + (2/3)^3 + ..., which sums to 3.
> > Now, if you have contents that are specifically designed to screw the > hash function, then you are going to get poor performance. But this is > the case for any particular hash function; there exists inputs that > force it to perform poorly. Ofcourse, though it is an interesting anecdote because it won't screw the lookups in the solution I'm describing. > Given the above load factor, if 3 expected probes is too many, you can > use d.update(d) to double the size of the dictionary, forcing the load > factor to be 1/3 or less, for an expected number of probes = 1.5 . > > > > B. Some lookups use static "literal" keys, and could benefit from > > accessing a dict item directly (very fast O(1) _worst_ case, without > > even a function call when done from the C side). > > [snip] > > What you are proposing is to add a level of indirection between some > pointer in the dictionary to some special PyDictItem object. This will > slow Python execution when such a thing is used. Why? The extra level > of indirection requires another pointer following, as well as a > necessary check on the bitmap you are proposing, nevermind the > additional memory overhead of having the item (generally doubling the > size of dictionary objects that use such 'exported' items). > > You don't mention what algorithm/structure will allow for the accessing > of your dictionary items in O(1) time, only that after you have this > bitmap and dictioanry item thingy that it will be O(1) time (where > dictionaries weren't already fast enough natively). I don't believe you > have fully thought through this, but feel free to post C or Python > source that describes your algorithm in detail to prove me wrong. Only access of exported items is O(1) time (when accessed via your PyDictItem_obj->value), other items must be accessed normally and they take just as much time (or as I explained and you reiterated, a tad longer, as it requires a bitmap check and in the case of exported items another dereference). 
> You should note that Python's dictionary implementation has been tuned > to work *quite well* for the object attribute/namespace case, and I > would be quite surprised if anyone managed to improve upon Raymond's > work (without writing platform-specific assembly). Ofcourse - the idea is not to improve dict's performance with the normal way it is accessed, but to change the way it is accessed for the specific use-case of accessing static values in a static dict - which can be faster than even a fast dict lookup. The dict lookups in globals, builtins are all looking for literal static keys in a literal static dict. In this specific case, it is better to outdo the existing dict performance, by adding a special way to access such static keys in dicts - which insignificantly slows down access to the dict, but significantly speeds up this very common use pattern. Attribute lookups in the class dict are all literal/static key lookups in a static dict (though in order for a code object to know that it is a static dict, a psyco-like system is required. If such a system is used, all of those dict lookups can be made faster as well). From jcarlson at uci.edu Sun Jun 10 15:39:57 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 10 Jun 2007 06:39:57 -0700 Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys In-Reply-To: References: <20070610003904.7907.JCARLSON@uci.edu> Message-ID: <20070610061127.790A.JCARLSON@uci.edu> "Eyal Lotem" wrote: > On 6/10/07, Josiah Carlson wrote: > > "Eyal Lotem" wrote: > > > B. Some lookups use static "literal" keys, and could benefit from > > > accessing a dict item directly (very fast O(1) _worst_ case, without > > > even a function call when done from the C side). > > > > [snip] > > > > What you are proposing is to add a level of indirection between some > > pointer in the dictionary to some special PyDictItem object. This will > > slow Python execution when such a thing is used. Why? 
The extra level > > of indirection requires another pointer following, as well as a > > necessary check on the bitmap you are proposing, nevermind the > > additional memory overhead of having the item (generally doubling the > > size of dictionary objects that use such 'exported' items). > > > > You don't mention what algorithm/structure will allow for the accessing > > of your dictionary items in O(1) time, only that after you have this > > bitmap and dictioanry item thingy that it will be O(1) time (where > > dictionaries weren't already fast enough natively). I don't believe you > > have fully thought through this, but feel free to post C or Python > > source that describes your algorithm in detail to prove me wrong. > > Only access of exported items is O(1) time (when accessed via your > PyDictItem_obj->value), other items must be accessed normally and they > take just as much time (or as I explained and you reiterated, a tad > longer, as it requires a bitmap check and in the case of exported > items another dereference). But you still don't explain *how* these exported keys are going to be accessed. Walk me through the steps required to improve access times in the following case: def foo(obj): return obj.foo > > You should note that Python's dictionary implementation has been tuned > > to work *quite well* for the object attribute/namespace case, and I > > would be quite surprised if anyone managed to improve upon Raymond's > > work (without writing platform-specific assembly). > > Ofcourse - the idea is not to improve dict's performance with the > normal way it is accessed, but to change the way it is accessed for > the specific use-case of accessing static values in a static dict - > which can be faster than even a fast dict lookup. If I have a dictionary X, and X has exported keys, then whenever I access exported values in the dictionary via X[key], your proposed indirection will necessarily be slower than the current implementation. 
> The dict lookups in globals, builtins are all looking for literal
> static keys in a literal static dict. In this specific case, it is
> better to outdo the existing dict performance, by adding a special way
> to access such static keys in dicts - which insignificantly slows down
> access to the dict, but significantly speeds up this very common use
> pattern.

Please benchmark before you claim "insignificant" performance degradation in the general case. I claim that adding a level of indirection and the accessing of a bit array (which in C is technically a char array with every bit getting a full char, unless you use masks and shifts, which will be slower still) is necessarily slower than the access characteristics of current dictionaries. We can see this as a combination of an increase in the number of operations necessary to do arbitrary dictionary lookups, increased cache overhead of those lookups, as well as the delay and cache overhead of accessing the bit array.

> Attribute lookups in the class dict are all literal/static key lookups
> in a static dict (though in order for a code object to know that it is
> a static dict, a psyco-like system is required. If such a system is
> used, all of those dict lookups can be made faster as well).

No, attribute lookups are not all literal/static key lookups. See getattr/setattr/delattr, operations on cls.__dict__, obj.__dict__, etc.

From what I can gather (please describe the algorithm now that I have asked twice), the only place where there exists improvement potential is in the case of global lookups in a module. That is to say, if a function/method in module foo is accessing some global variable bar, the compiler can replace LOAD_GLOBAL/STORE_GLOBAL/DEL_GLOBAL with an opcode to access a special PyDictItem object that sits in the function/method cell variables, rather than having to look in the globals dictionary (that is attached to every function and method).
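For reference, a minimal sketch (not part of the original message) of where the opcode under discussion shows up. It assumes a current CPython 3 interpreter, where `dis.get_instructions` is available; the names `bar` and `uses_global` are illustrative, and the exact instruction list varies between versions.

```python
import dis

bar = 42  # a module-level global, as in the "global variable bar" case


def uses_global():
    # The compiler cannot resolve "bar" statically, so it emits a
    # LOAD_GLOBAL that consults the globals (then builtins) dict at run time.
    return bar


opnames = [ins.opname for ins in dis.get_instructions(uses_global)]
print(opnames)
```

Each such LOAD_GLOBAL is a lookup with a literal key in a fixed dict, which is the case both posters agree could in principle be replaced by direct access to an exported item.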
- Josiah From eyal.lotem at gmail.com Mon Jun 11 03:43:28 2007 From: eyal.lotem at gmail.com (Eyal Lotem) Date: Mon, 11 Jun 2007 04:43:28 +0300 Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys In-Reply-To: <20070610061127.790A.JCARLSON@uci.edu> References: <20070610003904.7907.JCARLSON@uci.edu> <20070610061127.790A.JCARLSON@uci.edu> Message-ID: On 6/10/07, Josiah Carlson wrote: > > "Eyal Lotem" wrote: > > On 6/10/07, Josiah Carlson wrote: > > > "Eyal Lotem" wrote: > > > > B. Some lookups use static "literal" keys, and could benefit from > > > > accessing a dict item directly (very fast O(1) _worst_ case, without > > > > even a function call when done from the C side). > > > > > > [snip] > > > > > > What you are proposing is to add a level of indirection between some > > > pointer in the dictionary to some special PyDictItem object. This will > > > slow Python execution when such a thing is used. Why? The extra level > > > of indirection requires another pointer following, as well as a > > > necessary check on the bitmap you are proposing, nevermind the > > > additional memory overhead of having the item (generally doubling the > > > size of dictionary objects that use such 'exported' items). > > > > > > You don't mention what algorithm/structure will allow for the accessing > > > of your dictionary items in O(1) time, only that after you have this > > > bitmap and dictioanry item thingy that it will be O(1) time (where > > > dictionaries weren't already fast enough natively). I don't believe you > > > have fully thought through this, but feel free to post C or Python > > > source that describes your algorithm in detail to prove me wrong. 
> > > > Only access of exported items is O(1) time (when accessed via your > > PyDictItem_obj->value), other items must be accessed normally and they > > take just as much time (or as I explained and you reiterated, a tad > > longer, as it requires a bitmap check and in the case of exported > > items another dereference). > > But you still don't explain *how* these exported keys are going to be > accessed. Walk me through the steps required to improve access times in > the following case: > > def foo(obj): > return obj.foo > > I think you missed what I said - I said that the functionality should probably not be exported to Python - as Python has little to gain from it (it would have to getattr a C method just to request the exported item -- which will nullify the speed benefit). It is the C code which can suddenly do direct access to access the exported dict items - not Python code. > > > You should note that Python's dictionary implementation has been tuned > > > to work *quite well* for the object attribute/namespace case, and I > > > would be quite surprised if anyone managed to improve upon Raymond's > > > work (without writing platform-specific assembly). > > > > Ofcourse - the idea is not to improve dict's performance with the > > normal way it is accessed, but to change the way it is accessed for > > the specific use-case of accessing static values in a static dict - > > which can be faster than even a fast dict lookup. > > If I have a dictionary X, and X has exported keys, then whenever I > access exported values in the dictionary via X[key], your proposed > indirection will necessarily be slower than the current implementation. That is true, I acknowledged that. It is even true also that access to X[key] even when key is not exported is slower. When I have a few spare moments, I'll try and benchmark how much slower it is. > > The dict lookups in globals, builtins are all looking for literal > > static keys in a literal static dict. 
In this specific case, it is
> > better to outdo the existing dict performance, by adding a special way
> > to access such static keys in dicts - which insignificantly slows down
> > access to the dict, but significantly speeds up this very common use
> > pattern.
>
> Please benchmark before you claim "insignificant" performance
> degradation in the general case. I claim that adding a level of
> indirection and the accessing of a bit array (which in C is technically
> a char array with every bit getting a full char, unless you use masks
> and shifts, which will be slower still) is necessarily slower than the
> access characteristics of current dictionaries. We can see this as a
> combination of an increase in the number of operations necessary to do
> arbitrary dictionary lookups, increased cache overhead of those lookups,
> as well as the delay and cache overhead of accessing the bit array.

You are right - we disagree there, but until I benchmark all words are moot.

> > Attribute lookups in the class dict are all literal/static key lookups
> > in a static dict (though in order for a code object to know that it is
> > a static dict, a psyco-like system is required. If such a system is
> > used, all of those dict lookups can be made faster as well).
>
> No, attribute lookups are not all literal/static key lookups. See
> getattr/setattr/delattr, operations on cls.__dict__, obj.__dict__, etc.

I may have oversimplified a bit for the sake of explaining. I was referring to the operations that are taken by LOAD_ATTR, as an example.

Let's analyze the LOAD_ATTR bytecode instruction:
* Calls PyObject_GetAttr for the requested attribute name on the request object.
* PyObject_GetAttr redirects it to the type's tp_getattr[o].
* When tp_getattr[o] is not overridden, this calls PyObject_GenericGetAttr.
* PyObject_GenericGetAttr first looks for a method descriptor in dicts of every class in the entire __mro__. If it found a getter/setter descriptor, it uses that.
If it didn't, it tries the instance dict, and then uses the class descriptor/attr.

I believe this implementation to be very wasteful (specifically the last step) and I have posted a separate post about this in python-dev.

There is work being done to create an mro cache for types, which would allow converting the mro lookup to a single lookup in most cases. I believe that this mro cache should become a single dict object inside each type object, which holds a merge (according to mro order) of all the dicts in its mro. If this modification is done, then PyObject_GenericGetAttr can become a lookup in the instance dict (which, btw, can also disappear when __slots__ is used in the class), and a lookup in the mro cache dict.

If this is the case, then LOAD_ATTR, which is most often used with literal names, can (if the type of the object being accessed is known [via a psyco-like system]) become a regular lookup on the instance dict, and a "static lookup" on the class mro cache dict (which would use an exported dict item). If the psyco-like system can even create code objects which are not only specific to one type, but to a specific instance, even the instance lookup of the literal attribute name can be converted to a "static lookup" in the instance's dict.

Since virtually all LOAD_ATTR's are using literal strings, virtually all of the class-side "dynamic lookups" can be converted to "static lookups". Since a "static lookup" costs a dereference and a conditional, and a dynamic lookup entails at least 4 C function calls (including C stack setup/unwinds), a few C assignments and C conditionals, I believe it is likely that this will pay off as a serious improvement in Python's performance, when combined with a psyco-like system (not an architecture-dependent one).

> From what I can gather (please describe the algorithm now that I have
> asked twice), the only place where there exists improvement potential is
> in the case of global lookups in a module.
That is to say, if a
> function/method in module foo is accessing some global variable bar, the
> compiler can replace LOAD_GLOBAL/STORE_GLOBAL/DEL_GLOBAL with an opcode
> to access a special PyDictItem object that sits in the function/method
> cell variables, rather than having to look in the globals dictionary
> (that is attached to every function and method).

As I explained above, there is room for improvement in normal attribute lookups; however, that improvement requires a psyco-like system in order to be able to deduce which dicts are going to be accessed by the GetAttr mechanism and then use static lookups to access them directly.

With access to globals and builtins, this optimization is indeed easier, and your description is correct. I can be a little more specific:

* Each code object can contain "co_globals_dict_items" and "co_builtins_dict_items" attributes that refer to the exported-dict-items for that literal name in both the globals and builtins dict.

* Each access to a literal name in the globals/builtins namespace, at the compilation stage, will request the globals dict and builtins dict to create an exported item for that literal name. This exported item will be put into the co_globals_dict_items/co_builtins_dict_items in the code object.

* LOAD_GLOBAL will not be used when literal names are accessed. Instead, a new bytecode instruction "LOAD_LITERAL_GLOBAL" will be used, with an index to the "co_globals_dict_items/co_builtins_dict_items" tuples in the code object.
* LOAD_LITERAL_GLOBAL will use the index to find the PyExportedDictItem in those tuples, and look like (a bit more verbose naming for clarity):

    case LOAD_LITERAL_GLOBAL:
        exported_dict_item = GETITEM(co->co_globals_dict_items, oparg);
        x = exported_dict_item->value;
        if(NULL == x) {
            exported_dict_item = GETITEM(co->co_builtins_dict_items, oparg);
            x = exported_dict_item->value;
            if(NULL == x) {
                format_exc_check_arg(PyExc_NameError, MSG,
                                     GETITEM(co->co_globals_names, oparg));
                break;
            }
        }
        Py_INCREF(x);
        PUSH(x);
        continue;

I hope that with these explanations and some code snippets my intentions are more clear.

> - Josiah
>
>

From jcarlson at uci.edu Mon Jun 11 07:56:40 2007
From: jcarlson at uci.edu (Josiah Carlson)
Date: Sun, 10 Jun 2007 22:56:40 -0700
Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys
In-Reply-To:
References: <20070610061127.790A.JCARLSON@uci.edu>
Message-ID: <20070610205256.7919.JCARLSON@uci.edu>

"Eyal Lotem" wrote:
> On 6/10/07, Josiah Carlson wrote:
> > "Eyal Lotem" wrote:
> > > On 6/10/07, Josiah Carlson wrote:
> > > Only access of exported items is O(1) time (when accessed via your
> > > PyDictItem_obj->value), other items must be accessed normally and they
> > > take just as much time (or as I explained and you reiterated, a tad
> > > longer, as it requires a bitmap check and in the case of exported
> > > items another dereference).
> >
> > But you still don't explain *how* these exported keys are going to be
> > accessed. Walk me through the steps required to improve access times in
> > the following case:
> >
> > def foo(obj):
> >     return obj.foo
> >
> >
> I think you missed what I said - I said that the functionality should
> probably not be exported to Python - as Python has little to gain from
> it (it would have to getattr a C method just to request the exported
> item -- which will nullify the speed benefit).
> > It is the C code which can suddenly do direct access to access the > exported dict items - not Python code. Maybe my exposure to C extensions is limited, but I haven't seen a whole lot of C doing the equivalent of obj.attrname outside of the Python standard library. And even then, it's not "I'm going to access attribute Y of object X a million times", it's "I'm going to access some attributes on some objects". The only exception that I've seen happen really at all is when someone converts their pure Python library that interacts with other libraries into Pyrex. But even then, repeated access is generally uncommon except in wxPython uses; wx. (which I've never seen converted to Pyrex), and in those cases, repeated access is generally rare. I'm curious as to what kind of C code you are seeing in which these cached lookups will help in a substantial way. > > If I have a dictionary X, and X has exported keys, then whenever I > > access exported values in the dictionary via X[key], your proposed > > indirection will necessarily be slower than the current implementation. > > That is true, I acknowledged that. It is even true also that access to > X[key] even when key is not exported is slower. When I have a few > spare moments, I'll try and benchmark how much slower it is. I await your benchmarks. > > > Attribute lookups in the class dict are all literal/static key lookups > > > in a static dict (though in order for a code object to know that it is > > > a static dict, a psyco-like system is required. If such a system is > > > used, all of those dict lookups can be made faster as well). > > > > No, attribute lookups are not all literal/static key lookups. See > > getattr/setattr/delattr, operations on cls.__dict__, obj.__dict__, etc. > > I may have oversimplified a bit for the sake of explaining. I was > referring to the operations that are taken by LOAD_ATTR, as an > example. 
> Lets analyze the LOAD_ATTR bytecode instruction: > * Calls PyObject_GetAttr for the requested attribute name on the > request object. > * PyObject_GetAttr redirects it to the type's tp_getattr[o]. > * When tp_getattr[o] is not overridden, this calls PyObject_GenericGetAttr. > * PyObject_GenericGetAttr first looks for a method descriptor in > dicts of every class in the entire __mro__. If it found a > getter/setter descriptor, it uses that. If it didn't, it tries the > instance dict, and then uses the class descriptor/attr. > > I believe this implementation to be very wasteful (specifically the > last step) and I have posted a separate post about this in python-dev. Due to the lack of support on the issue in python-dev (it would break currently existing code, and the time for Python 3.0 PEPs is past), I doubt you are going to get any changes in this area unless the resulting semantics are unchanged. > Since a "static lookup" costs a dereference and a conditional, and a > dynamic lookup entails at least 4 C function calls (including C stack > setup/unwinds), a few C assignments and C conditionals, I believe it > is likely that this will pay off as a serious improvement in Python's > performance, when combined with a psyco-like system (not an > architecture-dependent ones). It's really only useful if you are accessing fixed attributes of a fixed object many times. The only case I can think of where this kind of thing would be useful (sufficient accesses to make a positive difference) is in the case of module globals, but in that case, we can merely change how module globals are implemented (more or less like self.__dict__ = ... in the module's __init__ method). > > From what I can gather (please describe the algorithm now that I have > > asked twice), the only place where there exists improvement potential is > > in the case of global lookups in a module. 
That is to say, if a > > function/method in module foo is accessing some global variable bar, the > > compiler can replace LOAD_GLOBAL/STORE_GLOBAL/DEL_GLOBAL with an opcode > > to access a special PyDictItem object that sits in the function/method > > cell variables, rather than having to look in the globals dictionary > > (that is attached to every function and method). > > As I explained above, there is room for improvement in normal > attribute lookups, however that improvement requires a psyco-like > system in order to be able to deduce which dicts are going to be > accessed by the GetAttr mechanism and then using static lookups to > access them directly. Insights into a practical method of such optimizations are not leaping forth from my brain (aside from using a probabilistic tracking mechanism to minimize overhead), though my experience with JIT compilers is limited. Maybe someone else has a good method (though I suspect that this particular problem is hard enough to make it not practical to make it into Python). > With access to globals and builtins, this optimization is indeed > easier, and your description is correct, I can be a little more > specific: > * Each code object can contain a "co_globals_dict_items" and > "co_builtins_dict_items" attributes that refer to the > exported-dict-items for that literal name in both the globals and > builtins dict. > > * Each access to a literal name in the globals/builtins namespace, at > the compilation stage, will request the globals dict and builtins dict > to create an exported item for that literal name. This exported item > will be put into the co_globals_dict_items/co_builtins_dict_items in > the code object. > > * LOAD_GLOBAL will not be used when literal name are accessed. > Instead, a new bytecode instruction "LOAD_LITERAL_GLOBAL" with an > index to the "co_globals_dict_items/co_builtins_dict_items" tuples in > the code object. You may want to change the name. 
"Literal" implies a constant, like 1 or "hello", as in 'x = "hello"'. LOAD_GLOBAL_FAST would seem to make more sense to me, considering that is what it intends to do. - Josiah From eyal.lotem at gmail.com Mon Jun 11 08:18:19 2007 From: eyal.lotem at gmail.com (Eyal Lotem) Date: Mon, 11 Jun 2007 09:18:19 +0300 Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys In-Reply-To: <20070610205256.7919.JCARLSON@uci.edu> References: <20070610061127.790A.JCARLSON@uci.edu> <20070610205256.7919.JCARLSON@uci.edu> Message-ID: On 6/11/07, Josiah Carlson wrote: > > "Eyal Lotem" wrote: > > On 6/10/07, Josiah Carlson wrote: > > > "Eyal Lotem" wrote: > > > > On 6/10/07, Josiah Carlson wrote: > > > > Only access of exported items is O(1) time (when accessed via your > > > > PyDictItem_obj->value), other items must be accessed normally and they > > > > take just as much time (or as I explained and you reiterated, a tad > > > > longer, as it requires a bitmap check and in the case of exported > > > > items another dereference). > > > > > > But you still don't explain *how* these exported keys are going to be > > > accessed. Walk me through the steps required to improve access times in > > > the following case: > > > > > > def foo(obj): > > > return obj.foo > > > > > > > > I think you missed what I said - I said that the functionality should > > probably not be exported to Python - as Python has little to gain from > > it (it would have to getattr a C method just to request the exported > > item -- which will nullify the speed benefit). > > > > It is the C code which can suddenly do direct access to access the > > exported dict items - not Python code. > > Maybe my exposure to C extensions is limited, but I haven't seen a whole > lot of C doing the equivalent of obj.attrname outside of the Python > standard library. 
And even then, it's not "I'm going to access attribute > Y of object X a million times", it's "I'm going to access some > attributes on some objects". The only exception that I've seen happen > really at all is when someone converts their pure Python library that > interacts with other libraries into Pyrex. But even then, repeated > access is generally uncommon except in wxPython uses; wx. > (which I've never seen converted to Pyrex), and in those cases, repeated > access is generally rare. > > I'm curious as to what kind of C code you are seeing in which these > cached lookups will help in a substantial way. While extensions are an optimization target, the main target is global/builtin/attribute accessing code. > > > If I have a dictionary X, and X has exported keys, then whenever I > > > access exported values in the dictionary via X[key], your proposed > > > indirection will necessarily be slower than the current implementation. > > > > That is true, I acknowledged that. It is even true also that access to > > X[key] even when key is not exported is slower. When I have a few > > spare moments, I'll try and benchmark how much slower it is. > > I await your benchmarks. I have started work on this. I am still struggling to understand some nuances of dict's implementation in order to be able to make such a change. > > > > Attribute lookups in the class dict are all literal/static key lookups > > > > in a static dict (though in order for a code object to know that it is > > > > a static dict, a psyco-like system is required. If such a system is > > > > used, all of those dict lookups can be made faster as well). > > > > > > No, attribute lookups are not all literal/static key lookups. See > > > getattr/setattr/delattr, operations on cls.__dict__, obj.__dict__, etc. > > > > I may have oversimplified a bit for the sake of explaining. I was > > referring to the operations that are taken by LOAD_ATTR, as an > > example. 
> > Lets analyze the LOAD_ATTR bytecode instruction: > > * Calls PyObject_GetAttr for the requested attribute name on the > > request object. > > * PyObject_GetAttr redirects it to the type's tp_getattr[o]. > > * When tp_getattr[o] is not overridden, this calls PyObject_GenericGetAttr. > > * PyObject_GenericGetAttr first looks for a method descriptor in > > dicts of every class in the entire __mro__. If it found a > > getter/setter descriptor, it uses that. If it didn't, it tries the > > instance dict, and then uses the class descriptor/attr. > > > > I believe this implementation to be very wasteful (specifically the > > last step) and I have posted a separate post about this in python-dev. > > Due to the lack of support on the issue in python-dev (it would break > currently existing code, and the time for Python 3.0 PEPs is past), I > doubt you are going to get any changes in this area unless the resulting > semantics are unchanged. Well, I personally find those semantics (involving the question of whether the class attribute has a __set__ or not) to be "inelegant", at best, but since I realized that the optimization I am proposing is orthogonal to that change, I have lost interest there. > > Since a "static lookup" costs a dereference and a conditional, and a > > dynamic lookup entails at least 4 C function calls (including C stack > > setup/unwinds), a few C assignments and C conditionals, I believe it > > is likely that this will pay off as a serious improvement in Python's > > performance, when combined with a psyco-like system (not an > > architecture-dependent ones). > > It's really only useful if you are accessing fixed attributes of a fixed > object many times. The only case I can think of where this kind of > thing would be useful (sufficient accesses to make a positive difference) > is in the case of module globals, but in that case, we can merely change > how module globals are implemented (more or less like self.__dict__ = ... 
> in the module's __init__ method). That's not true. As I explained, getattr accesses the types's mro dicts as well. So even if you are accessing a lot of different instances, and those have a shared (fixed) type, you can speed up the type-side dict lookup (even if you still pay for a whole instance-side lookup). Also, "fixed-object" access can occur when you have a small number of objects whose attributes are looked up many times. In such a case, a psyco-like system can create a specialized code object specifically for _instances_ (not just for types), each code object using "static lookups" on the instance's dict as well, and not just on the class's dict. > > > From what I can gather (please describe the algorithm now that I have > > > asked twice), the only place where there exists improvement potential is > > > in the case of global lookups in a module. That is to say, if a > > > function/method in module foo is accessing some global variable bar, the > > > compiler can replace LOAD_GLOBAL/STORE_GLOBAL/DEL_GLOBAL with an opcode > > > to access a special PyDictItem object that sits in the function/method > > > cell variables, rather than having to look in the globals dictionary > > > (that is attached to every function and method). > > > > As I explained above, there is room for improvement in normal > > attribute lookups, however that improvement requires a psyco-like > > system in order to be able to deduce which dicts are going to be > > accessed by the GetAttr mechanism and then using static lookups to > > access them directly. > > Insights into a practical method of such optimizations are not leaping > forth from my brain (aside from using a probabilistic tracking mechanism > to minimize overhead), though my experience with JIT compilers is > limited. Maybe someone else has a good method (though I suspect that > this particular problem is hard enough to make it not practical to make > it into Python). 
Implementing a psyco-like system in CPython is indeed not an easy task. But it is possible. The simple idea is that you create specialized code objects that are specific to an instance or to a type in the code object when the code object is first run with that instance or type, and use an exact-type check to invoke the right code object. The specialized code object can use "static lookups" in dicts, and perhaps even avoid using obj->ob_type->slotname (instead use slotname directly, as its already specific to a type). > > With access to globals and builtins, this optimization is indeed > > easier, and your description is correct, I can be a little more > > specific: > > * Each code object can contain a "co_globals_dict_items" and > > "co_builtins_dict_items" attributes that refer to the > > exported-dict-items for that literal name in both the globals and > > builtins dict. > > > > * Each access to a literal name in the globals/builtins namespace, at > > the compilation stage, will request the globals dict and builtins dict > > to create an exported item for that literal name. This exported item > > will be put into the co_globals_dict_items/co_builtins_dict_items in > > the code object. > > > > * LOAD_GLOBAL will not be used when literal name are accessed. > > Instead, a new bytecode instruction "LOAD_LITERAL_GLOBAL" with an > > index to the "co_globals_dict_items/co_builtins_dict_items" tuples in > > the code object. > > You may want to change the name. "Literal" implies a constant, like 1 > or "hello", as in 'x = "hello"'. LOAD_GLOBAL_FAST would seem to make > more sense to me, considering that is what it intends to do. Well, LOAD_GLOBAL_FAST can only be used when the string that's being looked up is known at the code-object creation time, which means that the attribute name was indeed literal. 
Eyal From jimjjewett at gmail.com Mon Jun 11 17:04:22 2007 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 11 Jun 2007 11:04:22 -0400 Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys In-Reply-To: References: <20070610061127.790A.JCARLSON@uci.edu> <20070610205256.7919.JCARLSON@uci.edu> Message-ID: Eyal, Have you taken a look at Andrea Griffini's patch, http://python.org/sf/1616125 -jJ From jcarlson at uci.edu Mon Jun 11 18:45:41 2007 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 11 Jun 2007 09:45:41 -0700 Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys In-Reply-To: References: <20070610205256.7919.JCARLSON@uci.edu> Message-ID: <20070611092553.791E.JCARLSON@uci.edu> "Eyal Lotem" wrote: > On 6/11/07, Josiah Carlson wrote: > > "Eyal Lotem" wrote: > > > On 6/10/07, Josiah Carlson wrote: > > > > "Eyal Lotem" wrote: > > > > > On 6/10/07, Josiah Carlson wrote: > > > > > Only access of exported items is O(1) time (when accessed via your > > > > > PyDictItem_obj->value), other items must be accessed normally and they > > > > > take just as much time (or as I explained and you reiterated, a tad > > > > > longer, as it requires a bitmap check and in the case of exported > > > > > items another dereference). > > > > > > > > But you still don't explain *how* these exported keys are going to be > > > > accessed. Walk me through the steps required to improve access times in > > > > the following case: > > > > > > > > def foo(obj): > > > > return obj.foo > > > > > > > > > > > I think you missed what I said - I said that the functionality should > > > probably not be exported to Python - as Python has little to gain from > > > it (it would have to getattr a C method just to request the exported > > > item -- which will nullify the speed benefit). > > > > > > It is the C code which can suddenly do direct access to access the > > > exported dict items - not Python code. 
[snip] > While extensions are an optimization target, the main target is > global/builtin/attribute accessing code. Or really, module globals and __builtin__ accessing. Arbitrary attribute access is one of those "things most commonly done in Python". But just for the sake of future readers of this thread, could you explicitly enumerate *which* things you intend to speed up with this work. > > > Since a "static lookup" costs a dereference and a conditional, and a > > > dynamic lookup entails at least 4 C function calls (including C stack > > > setup/unwinds), a few C assignments and C conditionals, I believe it > > > is likely that this will pay off as a serious improvement in Python's > > > performance, when combined with a psyco-like system (not an > > > architecture-dependent ones). > > > > It's really only useful if you are accessing fixed attributes of a fixed > > object many times. The only case I can think of where this kind of > > thing would be useful (sufficient accesses to make a positive difference) > > is in the case of module globals, but in that case, we can merely change > > how module globals are implemented (more or less like self.__dict__ = ... > > in the module's __init__ method). > > That's not true. > > As I explained, getattr accesses the types's mro dicts as well. So > even if you are accessing a lot of different instances, and those have > a shared (fixed) type, you can speed up the type-side dict lookup > (even if you still pay for a whole instance-side lookup). Also, That's MRO caching, which you have already stated is orthogonal to this particular proposal. > "fixed-object" access can occur when you have a small number of > objects whose attributes are looked up many times. In such a case, a > psyco-like system can create a specialized code object specifically > for _instances_ (not just for types), each code object using "static > lookups" on the instance's dict as well, and not just on the class's > dict. 
If you re-read my last posting, which you quoted above and I re-quote, you can easily replace 'fixed attributes of a fixed object' with 'fixed attributes of a small set of fixed objects' and get what you say. Aside from module globals, when is this seen? > > You may want to change the name. "Literal" implies a constant, like 1 > > or "hello", as in 'x = "hello"'. LOAD_GLOBAL_FAST would seem to make > > more sense to me, considering that is what it intends to do. > > Well, LOAD_GLOBAL_FAST can only be used when the string that's being > looked up is known at the code-object creation time, which means that > the attribute name was indeed literal. A literal is a value. A name/identifier is a reference. In: a = "hello" ... "hello" is a literal. In: hello = 1 ... hello is a name/identifier. In: b.hello = 1 ... hello is a named attribute of an object named/identified by b. - Josiah From eyal.lotem at gmail.com Tue Jun 12 14:25:47 2007 From: eyal.lotem at gmail.com (Eyal Lotem) Date: Tue, 12 Jun 2007 15:25:47 +0300 Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys In-Reply-To: References: <20070610061127.790A.JCARLSON@uci.edu> <20070610205256.7919.JCARLSON@uci.edu> Message-ID: That seems interesting. My patch should have the same speed-up effect (assuming it has no serious consequences on the performance of dicts in general) for constant read-only globals/builtins, but it should also equally speed up global writes and reads of globals/builtins that constantly change. Thanks for the reference, it is encouraging as to what I can expect from the speedup of my patch. 
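The exported-item idea behind the patch can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not the patch itself: the names ExportedItem, ExportingDict and load_global_fast are hypothetical, every key is stored in a shared slot (the real proposal keeps a normal hash table and exports only requested keys), and a sentinel object stands in for the C NULL pointer. It shows why writes and reads of constantly changing globals stay fast: a write updates the shared slot in place, so a reader holding the slot never goes stale.

```python
_MISSING = object()  # stands in for a NULL value pointer in C


class ExportedItem:
    """A shared slot for one key; holds _MISSING while the key is absent."""
    __slots__ = ("value",)

    def __init__(self):
        self.value = _MISSING


class ExportingDict:
    """Toy mapping that hands out one stable slot per key (hypothetical)."""

    def __init__(self):
        self._items = {}

    def export(self, key):
        # Create the slot on first request; later requests share it.
        return self._items.setdefault(key, ExportedItem())

    def __setitem__(self, key, value):
        # A write goes through the same slot that readers hold.
        self.export(key).value = value

    def __delitem__(self, key):
        item = self._items.get(key)
        if item is None or item.value is _MISSING:
            raise KeyError(key)
        item.value = _MISSING


def load_global_fast(name, globals_item, builtins_item):
    """Mirror of the opcode logic: try globals, then builtins, else NameError."""
    value = globals_item.value
    if value is _MISSING:
        value = builtins_item.value
        if value is _MISSING:
            raise NameError("name %r is not defined" % name)
    return value
```

A code object would call export() once per name at creation time and keep the two slots; each later access is then two attribute reads at most, with no hashing.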
On 6/11/07, Jim Jewett wrote: > Eyal, > > Have you taken a look at Andrea Griffini's patch, > http://python.org/sf/1616125 > > -jJ > From eyal.lotem at gmail.com Tue Jun 12 14:56:07 2007 From: eyal.lotem at gmail.com (Eyal Lotem) Date: Tue, 12 Jun 2007 15:56:07 +0300 Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys In-Reply-To: <20070611092553.791E.JCARLSON@uci.edu> References: <20070610205256.7919.JCARLSON@uci.edu> <20070611092553.791E.JCARLSON@uci.edu> Message-ID: On 6/11/07, Josiah Carlson wrote: > > "Eyal Lotem" wrote: > > On 6/11/07, Josiah Carlson wrote: > > > "Eyal Lotem" wrote: > > > > On 6/10/07, Josiah Carlson wrote: > > > > > "Eyal Lotem" wrote: > > > > > > On 6/10/07, Josiah Carlson wrote: > > > > > > Only access of exported items is O(1) time (when accessed via your > > > > > > PyDictItem_obj->value), other items must be accessed normally and they > > > > > > take just as much time (or as I explained and you reiterated, a tad > > > > > > longer, as it requires a bitmap check and in the case of exported > > > > > > items another dereference). > > > > > > > > > > But you still don't explain *how* these exported keys are going to be > > > > > accessed. Walk me through the steps required to improve access times in > > > > > the following case: > > > > > > > > > > def foo(obj): > > > > > return obj.foo > > > > > > > > > > > > > > I think you missed what I said - I said that the functionality should > > > > probably not be exported to Python - as Python has little to gain from > > > > it (it would have to getattr a C method just to request the exported > > > > item -- which will nullify the speed benefit). > > > > > > > > It is the C code which can suddenly do direct access to access the > > > > exported dict items - not Python code. > [snip] > > While extensions are an optimization target, the main target is > > global/builtin/attribute accessing code. > > Or really, module globals and __builtin__ accessing. 
Arbitrary
> attribute access is one of those "things most commonly done in Python".
> But just for the sake of future readers of this thread, could you
> explicitly enumerate *which* things you intend to speed up with this
> work.

As for optimizing globals/builtins, this will be the effect of my patch:

    global x
    x = 5 # Direct access write instead of dict write
    x # Direct access read
    globals()['x'] = 5 # Same speed as before.
    globals()['x'] # Same speed as before.
    min # Two direct access reads, instead of 2 dict reads.

As for attribute access in classes, the speedup I can gain depends on
a psyco-like system. Let's assume that we have a system that utilizes
a new TYPE_FORK opcode that jumps to use different code according to a
map of types, for example, if we have:

    def f(x, y):
        x.hello()

Then we will create a TYPE_FORK opcode before x.hello() that takes 'x'
as an argument, and a map of types (initially empty). When the exact
type of 'x' isn't in the map, then the rest of the code in the code
object after TYPE_FORK will have a specialized version created for x's
current type [only if that type doesn't override tp_getattr/o], and
inserted in the map.

The specialized version of the code will contain, instead of a
LOAD_ATTR for the string "hello", a FAST_LOAD_ATTR for the string
"hello" (associated with the direct-access dict item in the mro dict
(if there's no mro cache, I actually have a problem here, because I
don't know which dicts I need to export dict items from - and worse,
that list may change with time. The simplest solution is to use an
exported item from an mro cache dict)).

FAST_LOAD_ATTR will not call PyObject_GetAttr, but instead use the
exported dict items to find the descriptor/classattr using direct
access. If it found a descriptor with __get__/__set__, it will return
its get-call. Otherwise, it will do a normal expensive lookup on the
instance dict (for "hello") (unless __slots__ is defined in which case
there is no instance dict).
If it found that, it will return that. Otherwise, it will return the
descriptor's get-call if it has one or the descriptor itself as a
classattr.

In other words, I am reimplementing PyObject_GenericGetAttr here, but
for mro-side lookups, using my direct lookup.

The result is:

    class X(object):
        def method(self, arg):
            self.x = arg # One direct-lookup+dict lookup instead of two dict lookups
            self.other_method() # One direct-lookup+dict lookup instead of two dict lookups

    class Y(object):
        __slots__ = ["x"]
        def method(self, arg):
            self.x = arg # One direct-lookup instead of one dict lookup
            self.other_method() # One direct-lookup instead of one dict lookup

A direct lookup is significantly cheaper than a dict lookup (as
optimized as dict is, it still involves C callstack setups/unwinds,
more conditionals, assignments, potential collisions and far more
instructions).

Therefore, with the combination of a psyco-like system I can eliminate
one of two dict lookup costs, and with the combination of __slots__ as
well, I can eliminate one of one dict lookup costs.

> > > > Since a "static lookup" costs a dereference and a conditional, and a
> > > > dynamic lookup entails at least 4 C function calls (including C stack
> > > > setup/unwinds), a few C assignments and C conditionals, I believe it
> > > > is likely that this will pay off as a serious improvement in Python's
> > > > performance, when combined with a psyco-like system (not an
> > > > architecture-dependent ones).
> > >
> > > It's really only useful if you are accessing fixed attributes of a fixed
> > > object many times. The only case I can think of where this kind of
> > > thing would be useful (sufficient accesses to make a positive difference)
> > > is in the case of module globals, but in that case, we can merely change
> > > how module globals are implemented (more or less like self.__dict__ = ...
> > > in the module's __init__ method).
> >
> > That's not true.
> >
> > As I explained, getattr accesses the types's mro dicts as well. So
> > even if you are accessing a lot of different instances, and those have
> > a shared (fixed) type, you can speed up the type-side dict lookup
> > (even if you still pay for a whole instance-side lookup). Also,
>
> That's MRO caching, which you have already stated is orthogonal to this
> particular proposal.

I may have made a mistake before - it's not completely orthogonal, as
an MRO cache dict which can export items for direct access in psyco'd
code is a clean and simple solution, while the lack of an MRO cache
means that finding which class-side dicts to take exported items from
may be a difficult problem which may involve a cache of its own.

> > "fixed-object" access can occur when you have a small number of
> > objects whose attributes are looked up many times. In such a case, a
> > psyco-like system can create a specialized code object specifically
> > for _instances_ (not just for types), each code object using "static
> > lookups" on the instance's dict as well, and not just on the class's
> > dict.
>
> If you re-read my last posting, which you quoted above and I re-quote,
> you can easily replace 'fixed attributes of a fixed object' with 'fixed
> attributes of a small set of fixed objects' and get what you say. Aside
> from module globals, when is this seen?

It's seen when many calls are made on singletons.
It's seen when an inner loop is not memory-intensive but is
computationally intensive - which would translate to having an
instance calling methods on itself or other instances.
You may only have 100 instances relevant in your inner loop which is
running many millions of times.
In such a case, you will benefit greatly if for every code object in
the methods of every instance, you create specialized code for every
one of the types it is called with (say, 3 types per code object), so
you take perhaps a factor of 3 of space for code objects (which I
believe are not a significant portion of memory consumption in
Python), but your performance will involve _no_ dict access at all for
attribute lookups, and instead will just compare instance pointers and
then use direct access.

> > > You may want to change the name. "Literal" implies a constant, like 1
> > > or "hello", as in 'x = "hello"'. LOAD_GLOBAL_FAST would seem to make
> > > more sense to me, considering that is what it intends to do.
> >
> > Well, LOAD_GLOBAL_FAST can only be used when the string that's being
> > looked up is known at the code-object creation time, which means that
> > the attribute name was indeed literal.
>
> A literal is a value. A name/identifier is a reference.
>
> In:
>     a = "hello"
> ... "hello" is a literal.
>
> In:
>     hello = 1
> ... hello is a name/identifier.
>
> In:
>     b.hello = 1
> ... hello is a named attribute of an object named/identified by b.

Then I agree, the use of the word literal here is inappropriate,
constant/static may be more appropriate.

Eyal

From matt-python at theory.org Mon Jun 18 06:35:52 2007
From: matt-python at theory.org (Matt Chisholm)
Date: Sun, 17 Jun 2007 21:35:52 -0700
Subject: [Python-ideas] labeled break/continue
Message-ID: <20070618043552.GA28584@theory.org>

Hi. I was wondering if there had ever been an official decision on the
idea of adding labeled break and continue functionality to Python.
I've found a few places where the idea has come up, in the context of named code blocks: http://groups.google.com/group/comp.lang.python/browse_thread/thread/a696624c92b91181/065b1dbc13ec2807?lnk=gst&q=labeled+break&rnum=1#065b1dbc13ec2807 and in the context of discussing do/while loops and assignments in conditionals: http://groups.google.com/group/comp.lang.python/browse_thread/thread/6da848f762c9cf58/979ca3cd42633b52?lnk=gst&q=labeled+break&rnum=3#979ca3cd42633b52 Both of those discussions just kind of petered out or changed direction without any conclusion. There's also this Python 2.6 which has a similar syntax (although different semantics) to one of the syntaxes proposed in the first discussion above: http://sourceforge.net/tracker/index.php?func=detail&aid=1714448&group_id=5470&atid=355470 I would be willing to help make a case and then write a PEP for labeled break and continue, as long as the community or the BDFL hasn't already decided against it. -matt P.S. My apologies about cross posting; python-ideas seems like a better place to post this, but PEP 1 says to post to python-list. From aahz at pythoncraft.com Mon Jun 18 07:11:00 2007 From: aahz at pythoncraft.com (Aahz) Date: Sun, 17 Jun 2007 22:11:00 -0700 Subject: [Python-ideas] labeled break/continue In-Reply-To: <20070618043552.GA28584@theory.org> References: <20070618043552.GA28584@theory.org> Message-ID: <20070618051059.GA27027@panix.com> On Sun, Jun 17, 2007, Matt Chisholm wrote: > > I would be willing to help make a case and then write a PEP for > labeled break and continue, as long as the community or the BDFL > hasn't already decided against it. You would do the community a service to write the PEP even if the BDFL already vetoed it -- PEPs are valuable documentation of *why* an idea is rejected. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "as long as we like the same operating system, things are cool." 
--piranha

From eyal.lotem at gmail.com Tue Jun 19 17:13:10 2007
From: eyal.lotem at gmail.com (Eyal Lotem)
Date: Tue, 19 Jun 2007 18:13:10 +0300
Subject: [Python-ideas] Accelerated attr lookups
Message-ID:

Hi, I have attached a patch at:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1739789&group_id=5470

A common optimization tip for Python code is to use locals rather than
globals. This converts dictionary lookups of (interned) strings to
tuple indexing. I have created a patch that achieves this speed
benefit "automatically" for all globals and builtins, by adding a
feature to dictobjects.

Additionally, the idea of this patch is that it puts down the
necessary infrastructure to also allow virtually all attribute
accesses to be accelerated in the same way (with some extra work, of
course).

I have already suggested this before but I got the impression that the
spirit of the replies was "talk is cheap, show us the
code/benchmarks". So I wrote some code.

Getting the changes to work was not easy, and required learning about
the nuances of dictobjects, their re-entrancy issues, etc. These
changes do slow down dictobjects, but it seems that this slowdown is
more than offset by the speed increase of builtins/globals access.

Benchmarks:

A set of benchmarks that repeatedly perform:
A. Global reads
B. Global writes
C. Builtin reads
with little overhead (just repeatedly issuing global/builtin access
bytecodes, many times per loop iteration to minimize the loop
overhead), yield a 30% time decrease (~42% speed increase).

Regression tests take ~62 seconds (of user+sys time) with Python2.6 trunk
Regression tests take ~65 seconds (of user+sys time) with the patch
Regression tests are about 4.5% slower.
(Though the regression tests probably spend their running time on a
lot more code than other programs do, so they spend more time
instantiating function objects and less time executing them, which
makes them a poor benchmark.)

pystone seems to be improved by about 5%.
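A micro-benchmark of the kind described above could look roughly like the following sketch. It is not the benchmark code behind the numbers quoted here; the function names, the module-level variable G and the repetition counts are all assumptions chosen for illustration, and it is written for a current Python 3. Each loop body issues several global or builtin accesses per iteration so that the loop overhead matters less.

```python
import timeit

G = 0  # module-level global read and written by the benchmark functions


def global_reads():
    # Ten global reads per iteration to dilute the loop overhead.
    for _ in range(1000):
        G; G; G; G; G; G; G; G; G; G


def global_writes():
    global G
    # Five global writes per iteration.
    for _ in range(1000):
        G = 1; G = 1; G = 1; G = 1; G = 1


def builtin_reads():
    # Ten builtin reads per iteration (each misses globals, then hits builtins).
    for _ in range(1000):
        len; len; len; len; len; len; len; len; len; len


def run(repeat=3, number=100):
    """Return the best wall-clock time per benchmark, in seconds."""
    results = {}
    for func in (global_reads, global_writes, builtin_reads):
        timer = timeit.Timer(func)
        results[func.__name__] = min(timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    for name, seconds in run().items():
        print(name, seconds)
```

Running the same script under a patched and an unpatched interpreter and comparing the three timings would give a comparison in the spirit of the figures above.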
My conclusions:

The LOAD_GLOBAL/STORE_GLOBAL opcodes are considerably faster. Dict accesses, or perhaps the general extra activity around them, seem to be only insignificantly slower, or at least the slowdown roughly cancels out against the speed benefits in the regression tests.

The next step I am going to try is to replace the PyObject_GetAttr call with code that:

* Calls PyObject_GetAttr only if GenericGetAttr is not the object's handler, so as to still allow the behaviour to be modified.
* Otherwise, remembers, for each attribute-accessing opcode, the last type from which the attribute was accessed. A single pointer comparison can check whether the attribute access is using the same type. If it is, it can use a stored exported key from the type dictionary [or from an mro cache dictionary for that type, if that is added], rather than a dict lookup.

If it yields the same speed benefit, it could make attribute access opcodes up to 42% faster as well, when used on the same types (which is probably the common case, particularly in inner loops).

In combination with __slots__, this will make it possible to eliminate all dict lookups for most instance-side accesses as well.

P.S: I discovered a lot of code duplication (and "went along" and duplicated my code in the same spirit), but was wondering whether a patch would be accepted that used C's preprocessor heavily to prevent code duplication in CPython's code, and trusted the "inline" keyword to avoid having thousands of lines in the same function (ceval.c's opcode switch).
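The per-opcode type cache described in the message above can be modelled in pure Python. What follows is only a rough sketch under stated assumptions, not the patch's C implementation: the names `AttrSite`, `_mro_lookup` and `_uses_generic_lookup` are hypothetical, the code targets current Python 3, and it caches the looked-up class attribute itself rather than an exported dict item, so it does not notice later changes to the class.

```python
_MISSING = object()


def _mro_lookup(tp, name):
    """Ordinary class-side lookup: walk the MRO and check each class dict."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def _uses_generic_lookup(tp):
    """True when the type does not customise attribute access."""
    return (tp.__getattribute__ is object.__getattribute__
            and not hasattr(tp, "__getattr__"))


class AttrSite:
    """Hypothetical model of one attribute-accessing opcode with a one-entry type cache."""

    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_descr = _MISSING
        self.hits = 0
        self.misses = 0

    def load(self, obj):
        tp = type(obj)
        if not _uses_generic_lookup(tp):
            # Custom handler: fall back to the normal protocol.
            return getattr(obj, self.name)
        if tp is self.cached_type:
            self.hits += 1      # one identity comparison, no MRO walk
        else:
            self.misses += 1    # first time this type is seen at this site
            self.cached_type = tp
            self.cached_descr = _mro_lookup(tp, self.name)
        descr = self.cached_descr
        getter = getattr(type(descr), "__get__", None)
        is_data = getter is not None and (
            hasattr(type(descr), "__set__") or hasattr(type(descr), "__delete__"))
        if is_data:
            return getter(descr, obj, tp)       # data descriptors win
        inst_dict = getattr(obj, "__dict__", None)
        if inst_dict is not None and self.name in inst_dict:
            return inst_dict[self.name]         # instance-side dict lookup remains
        if getter is not None:
            return getter(descr, obj, tp)       # e.g. a function becomes a bound method
        if descr is not _MISSING:
            return descr                        # plain class attribute
        raise AttributeError(self.name)


class Point:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x

    def double(self):
        return 2 * self.x


site = AttrSite("double")
results = [site.load(Point(i))() for i in range(5)]
print(results, site.hits, site.misses)
```

Only the first access at the site walks the MRO; the remaining four are served after a single type comparison. Because `Point` uses `__slots__`, there is no instance dict to consult either, which is the combination the message is aiming for.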
From brett at python.org Tue Jun 19 20:21:39 2007 From: brett at python.org (Brett Cannon) Date: Tue, 19 Jun 2007 11:21:39 -0700 Subject: [Python-ideas] Accelerated attr lookups In-Reply-To: References: Message-ID: On 6/19/07, Eyal Lotem wrote: [SNIP] > P.S: I discovered a lot of code duplication (and "went along" and > duplicated my code in the same spirit), but was wondering if a patch > that utilized C's preprocessor heavily to prevent code duplication in > CPython's code, and trusting the "inline" keyword to prevent thousands > of lines in the same function (ceval.c's opcode switch) would be > accepted. Preprocessor stuff is fine. But 'inline' is not a valid keyword in C89 so that will not be accepted. -Brett From showell30 at yahoo.com Wed Jun 20 03:35:13 2007 From: showell30 at yahoo.com (Steve Howell) Date: Tue, 19 Jun 2007 18:35:13 -0700 (PDT) Subject: [Python-ideas] English builtins for Python Message-ID: <261237.44526.qm@web33514.mail.mud.yahoo.com> I somewhat tongue-in-cheekly propose to make the first seven most common English words all integral part of the Python language (three already are): source: http://www.world-english.org/english500.htm 1 the: Singletons: the class Logger: # ... 2 of inheritance: class Square of Shape:: # pass 3 to printing: print('hello world') to sys.stdout 4 and already a keyword 5 a introspection: if object is a dict: # ... 6 in already a keyword 7 is already a keyword Then it gets tougher: 8 it 9 you 10 that Top 500 words that are already keywords/builtins/conventions in Python: 27 or 49 each 55 if 189 try 198 self 251 open 254 next Top 500 words that are already keywords in some languages: 25 this 52 do 68 long Top 500 words that should NEVER be keywords: 78 could 81 did 180 men 252 seem 435 oh Words that seem like they'd be part of a programming language, but maybe a bad idea: 74 has 82 my 120 every 148 too ____________________________________________________________________________________ Looking for a deal? 
Find great prices on flights and hotels with Yahoo! FareChase. http://farechase.yahoo.com/ From eduardo.padoan at gmail.com Wed Jun 20 04:04:07 2007 From: eduardo.padoan at gmail.com (Eduardo "EdCrypt" O. Padoan) Date: Tue, 19 Jun 2007 23:04:07 -0300 Subject: [Python-ideas] English builtins for Python In-Reply-To: <261237.44526.qm@web33514.mail.mud.yahoo.com> References: <261237.44526.qm@web33514.mail.mud.yahoo.com> Message-ID: > Top 500 words that are already > keywords/builtins/conventions in Python: > > 27 or > 49 each > 55 if > 189 try > 198 self > 251 open > 254 next Also: 13 for 16 with 17 as 26 from -- EduardoOPadoan (eopadoan->altavix::com) Bookmarks: http://del.icio.us/edcrypt From fdrake at acm.org Wed Jun 20 04:13:54 2007 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 19 Jun 2007 22:13:54 -0400 Subject: [Python-ideas] English builtins for Python In-Reply-To: <261237.44526.qm@web33514.mail.mud.yahoo.com> References: <261237.44526.qm@web33514.mail.mud.yahoo.com> Message-ID: <200706192213.55011.fdrake@acm.org> On Tuesday 19 June 2007, Steve Howell wrote: > Words that seem like they'd be part of a programming > language, but maybe a bad idea: ... > 82 my Perl uses this one. ;-) -Fred -- Fred L. Drake, Jr. From showell30 at yahoo.com Wed Jun 20 04:19:46 2007 From: showell30 at yahoo.com (Steve Howell) Date: Tue, 19 Jun 2007 19:19:46 -0700 (PDT) Subject: [Python-ideas] English builtins for Python In-Reply-To: <200706192213.55011.fdrake@acm.org> Message-ID: <582659.55607.qm@web33509.mail.mud.yahoo.com> --- "Fred L. Drake, Jr." wrote: > On Tuesday 19 June 2007, Steve Howell wrote: > > Words that seem like they'd be part of a > programming > > language, but maybe a bad idea: > ... > > 82 my > > Perl uses this one. ;-) > Yes indeed! It uses "our" as well: http://perldoc.perl.org/functions/our.html I don't remember whether Perl has "theirs," "yours," "his," "hers," etc. 
From adam at atlas.st Wed Jun 20 04:21:33 2007
From: adam at atlas.st (Adam Atlas)
Date: Tue, 19 Jun 2007 22:21:33 -0400
Subject: [Python-ideas] English builtins for Python
In-Reply-To: <261237.44526.qm@web33514.mail.mud.yahoo.com>
References: <261237.44526.qm@web33514.mail.mud.yahoo.com>
Message-ID:

On 19 Jun 2007, at 21.35, Steve Howell wrote:
> Top 500 words that should NEVER be keywords:
>
> 78 could
> 81 did
> 180 men
> 252 seem
> 435 oh

I'm having fun thinking about the possibilities of these.

"could" could be a keyword if we had a magical Nondeterministic Turing Machine. You could have a "could" block (with a suite) followed by at least one (but unlimited) "else" blocks. And when that code was encountered, it would automatically choose the right one. :P

From fdrake at acm.org Wed Jun 20 05:00:52 2007
From: fdrake at acm.org (Fred L. Drake, Jr.)
Date: Tue, 19 Jun 2007 23:00:52 -0400
Subject: [Python-ideas] English builtins for Python
In-Reply-To: <582659.55607.qm@web33509.mail.mud.yahoo.com>
References: <582659.55607.qm@web33509.mail.mud.yahoo.com>
Message-ID: <200706192300.52257.fdrake@acm.org>

On Tuesday 19 June 2007, Steve Howell wrote:
> Yes indeed! It uses "our" as well:
>
> http://perldoc.perl.org/functions/our.html

Eeeewww....

> I don't remember whether Perl has "theirs," "yours,"
> "his," "hers," etc.

I don't even want to know....

-Fred

--
Fred L. Drake, Jr.
From eyal.lotem at gmail.com Wed Jun 20 11:50:21 2007 From: eyal.lotem at gmail.com (Eyal Lotem) Date: Wed, 20 Jun 2007 12:50:21 +0300 Subject: [Python-ideas] Exporting dict Items for direct lookups of specific keys In-Reply-To: References: <20070610205256.7919.JCARLSON@uci.edu> <20070611092553.791E.JCARLSON@uci.edu> Message-ID: I have created a new thread on Python Ideas to discuss this. I have also wrote some code and attached a patch. I did not eventually have to use a bitmap in dicts, but could abuse the top hash bit instead: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1739789&group_id=5470 pystones and other benchmarks seem to accelerate by about 5%. Other specific benchmarks built to measure the speed increase of the globals/builtins keywords measure a 42% speedup. Regression tests are less than 5% slower, but I assume that if adding this acceleration to attr lookups as well, they will be accelerated too. Eyal On 6/12/07, Eyal Lotem wrote: > On 6/11/07, Josiah Carlson wrote: > > > > "Eyal Lotem" wrote: > > > On 6/11/07, Josiah Carlson wrote: > > > > "Eyal Lotem" wrote: > > > > > On 6/10/07, Josiah Carlson wrote: > > > > > > "Eyal Lotem" wrote: > > > > > > > On 6/10/07, Josiah Carlson wrote: > > > > > > > Only access of exported items is O(1) time (when accessed via your > > > > > > > PyDictItem_obj->value), other items must be accessed normally and they > > > > > > > take just as much time (or as I explained and you reiterated, a tad > > > > > > > longer, as it requires a bitmap check and in the case of exported > > > > > > > items another dereference). > > > > > > > > > > > > But you still don't explain *how* these exported keys are going to be > > > > > > accessed. 
Walk me through the steps required to improve access times in > > > > > > the following case: > > > > > > > > > > > > def foo(obj): > > > > > > return obj.foo > > > > > > > > > > > > > > > > > I think you missed what I said - I said that the functionality should > > > > > probably not be exported to Python - as Python has little to gain from > > > > > it (it would have to getattr a C method just to request the exported > > > > > item -- which will nullify the speed benefit). > > > > > > > > > > It is the C code which can suddenly do direct access to access the > > > > > exported dict items - not Python code. > > [snip] > > > While extensions are an optimization target, the main target is > > > global/builtin/attribute accessing code. > > > > Or really, module globals and __builtin__ accessing. Arbitrary > > attribute access is one of those "things most commonly done in Python". > > But just for the sake of future readers of this thread, could you > > explicitly enumerate *which* things you intend to speed up with this > > work. > > As for optimizing globals/builtins, this will be the effect of my patch: > > global x > x = 5 # Direct access write instead of dict write > x # Direct access read > globals()['x'] = 5 # Same speed as before. > globals()['x'] # Same speed as before. > min # Two direct access reads, instead of 2 dict reads. > > As for attribute access in classes, the speedup I can gain depends on > a psyco-like system. Lets assume that we have a system that utilizes a > new TYPE_FORK opcode that jumps to use different code according to a > map of types, for example, if we have: > > def f(x, y): > x.hello() > > Then we will create a TYPE_FORK opcode before x.hello() that takes 'x' > as an argument, and a map of types (initially empty). 
When the exact > type of 'x' isn't in the map, then the rest of the code in the code > object after TYPE_FORK will have a specialized version created for x's > current type [only if that type doesn't override tp_getattr/o], and > inserted in the map. > The specialized version of the code will contain, instead of a > LOAD_ATTR for the string "hello", a FAST_LOAD_ATTR for the string > "hello" (associated with the direct-access dict item in the mro dict > (if there's no mro cache, I actually have a problem here, because I > don't know which dicts I need to export dict items from - and worse, > that list may change with time. The simplest solution is to use an > exported item from an mro cache dict)). > > FAST_LOAD_ATTR will not call PyObject_GetAttr, but instead use the > exported dict items to find the descriptor/classattr using direct > access. > If it found a descriptor with __get__/__set__, it will return its get-call. > Otherwise, it will do a normal expensive lookup on the instance dict > (for "hello") (unless __slots__ is defined in which case there is no > instance dict). > If it found that, it will return that. > Otherwise, it will return the descriptor's get-call if it has one or > the descriptor itself as a classattr. > > In other words, I am reimplementing PyObject_GenericGetAttr here, but > for mro-side lookups, using my direct lookup. 
> The result is: > > class X(object): > def method(self, arg): > self.x = arg # One direct-lookup+dict lookup instead of two dict lookups > self.other_method() # One direct-lookup+dict lookup instead of two > dict lookups > class Y(object): > __slots__ = ["x"] > def method(self, arg): > self.x = arg # One direct-lookup instead of one dict lookup > self.other_method() # One direct-lookup instead of one dict lookup > > A direct lookup is significantly cheaper than a dict lookup (as > optimized as dict is, it still involves C callstack setups/unwinds, > more conditionals, assignments, potential collisions and far more > instructions). > Therefore, with the combination of a psyco-like system I can eliminate > one of two dict lookup costs, and with the combination of __slots__ as > well, I can eliminate one of one dict lookup costs. > > > > > > Since a "static lookup" costs a dereference and a conditional, and a > > > > > dynamic lookup entails at least 4 C function calls (including C stack > > > > > setup/unwinds), a few C assignments and C conditionals, I believe it > > > > > is likely that this will pay off as a serious improvement in Python's > > > > > performance, when combined with a psyco-like system (not an > > > > > architecture-dependent ones). > > > > > > > > It's really only useful if you are accessing fixed attributes of a fixed > > > > object many times. The only case I can think of where this kind of > > > > thing would be useful (sufficient accesses to make a positive difference) > > > > is in the case of module globals, but in that case, we can merely change > > > > how module globals are implemented (more or less like self.__dict__ = ... > > > > in the module's __init__ method). > > > > > > That's not true. > > > > > > As I explained, getattr accesses the types's mro dicts as well. 
So > > > even if you are accessing a lot of different instances, and those have > > > a shared (fixed) type, you can speed up the type-side dict lookup > > > (even if you still pay for a whole instance-side lookup). Also, > > > > That's MRO caching, which you have already stated is orthogonal to this > > particular proposal. > I may have made a mistake before - its not completely orthogonal as an > MRO cache dict which can export items for direct access in psyco'd > code is a clean and simple solution, while the lack of an MRO cache > means that finding which class-side dicts to take exported items from > may be a difficult problem which may involve a cache of its own. > > > > "fixed-object" access can occur when you have a small number of > > > objects whose attributes are looked up many times. In such a case, a > > > psyco-like system can create a specialized code object specifically > > > for _instances_ (not just for types), each code object using "static > > > lookups" on the instance's dict as well, and not just on the class's > > > dict. > > > > If you re-read my last posting, which you quoted above and I re-quote, > > you can easily replace 'fixed attributes of a fixed object' with 'fixed > > attributes of a small set of fixed objects' and get what you say. Aside > > from module globals, when is this seen? > Its seen when many calls are made on singletons. > Its seen when an inner loop is not memory-intensive but is > computationally intensive - which would translate to having an > instance calling methods on itself or other instances. > You may only have 100 instances relevant in your inner loop which is > running many millions of times. 
In such a case, you will benefit > greatly if for every code object in the methods of every instance, you > create specialized code for every one of the types it is called with > (say, 3 types per code object), so you take perhaps a factor of 3 of > space for code objects (which I believe are not a significant portion > of memory consumption in Python), but your performance will involve > _no_ dict access at all for attribute lookups, and instead will just > compare instance pointers and then use direct access. > > > > > You may want to change the name. "Literal" implies a constant, like 1 > > > > or "hello", as in 'x = "hello"'. LOAD_GLOBAL_FAST would seem to make > > > > more sense to me, considering that is what it intends to do. > > > > > > Well, LOAD_GLOBAL_FAST can only be used when the string that's being > > > looked up is known at the code-object creation time, which means that > > > the attribute name was indeed literal. > > > > A literal is a value. A name/identifier is a reference. > > > > In: > > a = "hello" > > ... "hello" is a literal. > > > > In: > > hello = 1 > > ... hello is a name/identifier. > > > > In: > > b.hello = 1 > > ... hello is a named attribute of an object named/identified by b. > Then I agree, the use of the word literal here is inappropriate, > constant/static may be more appropriate. 
> > Eyal > From grosser.meister.morti at gmx.net Wed Jun 20 15:55:42 2007 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Wed, 20 Jun 2007 15:55:42 +0200 Subject: [Python-ideas] English builtins for Python In-Reply-To: <261237.44526.qm@web33514.mail.mud.yahoo.com> References: <261237.44526.qm@web33514.mail.mud.yahoo.com> Message-ID: <467931DE.7060101@gmx.net> Steve Howell schrieb: > I somewhat tongue-in-cheekly propose to make the first > seven most common English words all integral part of > the Python language (three already are): > > source: http://www.world-english.org/english500.htm > > 1 the: > Singletons: > > the class Logger: > # ... > > 2 of > inheritance: > > class Square of Shape:: > # pass > > 3 to > printing: > > print('hello world') to sys.stdout > > 4 and > already a keyword > > 5 a > introspection: > > if object is a dict: > # ... > is "a" the keyword or is it "is a"? > 6 in > already a keyword > > 7 is > already a keyword > > Then it gets tougher: > > 8 it > 9 you > 10 that > > Top 500 words that are already > keywords/builtins/conventions in Python: > > 27 or > 49 each each is a keyword? I don't think so. else is a keyword. > 55 if > 189 try > 198 self self is not a keyword, its a convention. > 251 open open is not a keyword, its a other name for the class "file". > 254 next > next is not a keyword. its the name of a method of an iterator. there are a lot of more methods in python which are single words. why don't list them, too? 
;) > Top 500 words that are already keywords in some > languages: > > 25 this > 52 do > 68 long > > Top 500 words that should NEVER be keywords: > > 78 could > 81 did > 180 men > 252 seem > 435 oh > > Words that seem like they'd be part of a programming > language, but maybe a bad idea: > > 74 has > 82 my > 120 every > 148 too > From showell30 at yahoo.com Wed Jun 20 16:19:07 2007 From: showell30 at yahoo.com (Steve Howell) Date: Wed, 20 Jun 2007 07:19:07 -0700 (PDT) Subject: [Python-ideas] English builtins for Python In-Reply-To: <467931DE.7060101@gmx.net> Message-ID: <446970.84289.qm@web33502.mail.mud.yahoo.com> --- Mathias Panzenb?ck wrote: > > Top 500 words that are already > > keywords/builtins/conventions in Python: > > > > 27 or > > 49 each > > each is a keyword? I don't think so. else is a > keyword. > I take back "each." > > 55 if > > 189 try > > 198 self > > self is not a keyword, its a convention. > You're not reading very carefully. Scroll up, I said "keywords/builtins/conventions." > > next is not a keyword. its the name of a method of > an iterator. > there are a lot of more methods in python which are > single words. > why don't list them, too? ;) > 63 write 102 new 163 read 245 close Having listed those, it's actually kind of striking how few "common" English words have general enough meaning to be "common" programming words. http://www.world-english.org/english500.htm ____________________________________________________________________________________ Finding fabulous fares is fun. Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains. 
http://farechase.yahoo.com/promo-generic-14795097 From jason.orendorff at gmail.com Wed Jun 20 22:21:19 2007 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Wed, 20 Jun 2007 13:21:19 -0700 Subject: [Python-ideas] English builtins for Python In-Reply-To: <261237.44526.qm@web33514.mail.mud.yahoo.com> References: <261237.44526.qm@web33514.mail.mud.yahoo.com> Message-ID: On 6/19/07, Steve Howell wrote: > I somewhat tongue-in-cheekly propose to make the first > seven most common English words all integral part of > the Python language (three already are): > > [...] This made me smile. Incidentally, Inform 7 has noun phrases where the words {a, an, the, any, all, most, least} have meanings; it can also be made to understand adjectives, nouns, and prepositions. So you can say things like: let x = the most annoying person in the location now all programmers are enlightened Not appropriate for Python, though. Inform 7: http://inform-fiction.org/I7/Inform%207.html Me, previously, on noun phrases: http://groups.google.com/group/rec.arts.int-fiction/msg/6bdc4b103cafe98f -j From greg.ewing at canterbury.ac.nz Thu Jun 21 05:01:22 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 21 Jun 2007 15:01:22 +1200 Subject: [Python-ideas] English builtins for Python In-Reply-To: <467931DE.7060101@gmx.net> References: <261237.44526.qm@web33514.mail.mud.yahoo.com> <467931DE.7060101@gmx.net> Message-ID: <4679EA02.9070703@canterbury.ac.nz> Mathias Panzenb?ck wrote: > is "a" the keyword or is it "is a"? To be practical, it would have to be a pseudo-keyword that was only recognised after "is". But it would be nicely perverse to make it a true keyword. :-) BTW, something like this actually happens in ALAN (a language for writing interactive fiction, aka adventure games) where you can't use "a" as part of the player-usable name of an object, because the command parser treats it as an indefinite article. 
So your courtroom drama can't have an object that the player refers to as "Exhibit A". "Exhibit B" is fine, though. :-) (This was true in Alan 2 at least -- Alan 3 might have improved matters.) -- Greg From greg.ewing at canterbury.ac.nz Thu Jun 21 05:25:50 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 21 Jun 2007 15:25:50 +1200 Subject: [Python-ideas] English builtins for Python In-Reply-To: References: <261237.44526.qm@web33514.mail.mud.yahoo.com> Message-ID: <4679EFBE.7000608@canterbury.ac.nz> Jason Orendorff wrote: > Incidentally, Inform 7 has noun phrases > where the words {a, an, the, any, all, most, least} have > meanings; Although it seems that I7 doesn't really have any reserved words in the usual sense. Whether a given word has a special meaning seems to he highly context-dependent, which makes forming a mental model of the grammar rather challenging... -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.) | greg.ewing at canterbury.ac.nz +--------------------------------------+ From showell30 at yahoo.com Thu Jun 21 11:36:05 2007 From: showell30 at yahoo.com (Steve Howell) Date: Thu, 21 Jun 2007 02:36:05 -0700 (PDT) Subject: [Python-ideas] English builtins for Python In-Reply-To: Message-ID: <934492.38703.qm@web33506.mail.mud.yahoo.com> --- Jason Orendorff wrote: > > Me, previously, on noun phrases: > http://groups.google.com/group/rec.arts.int-fiction/msg/6bdc4b103cafe98f > In that post you said: ''' Noun phrases are the product of a linguistic evolution that was basically finished thousands of years before we were born. The syntax isn't merely interesting. It's the best concrete syntax for the abstract concept it represents. Period. You're not going to beat it. (Well, arguably. My assumption here is that evolution is smarter than engineers, which I think ought not be controversial.) ''' Yes indeed. 
I was experiencing a similar thought process as I was writing my post. Although you pick out noun phrases as a sort of inevitable evolutionary outcome of human speech optimized on some valid dimension, I was picking up on a different figure of speech, the article. It's amazing how uncommon "the" and "a" are in mainstream programming languages. I'm not saying they should be (although I think there's some argument to using "the" for singletons), I just find it curious that they shouldn't be, and there's been enough evolution on programming languages (albeit a small amount of time compared to natural languages) to suggest that articles (as builtins) are just somehow *wrong* in programming languages. Yet they're so incredible popular in "natural" languages.

From showell30 at yahoo.com Thu Jun 21 12:09:28 2007
From: showell30 at yahoo.com (Steve Howell)
Date: Thu, 21 Jun 2007 03:09:28 -0700 (PDT)
Subject: [Python-ideas] English builtins for Python
In-Reply-To: <4679EA02.9070703@canterbury.ac.nz>
Message-ID: <514302.32164.qm@web33513.mail.mud.yahoo.com>

--- Greg Ewing wrote:
> Mathias Panzenböck wrote:
>
> > is "a" the keyword or is it "is a"?
>
> To be practical, it would have to be a
> pseudo-keyword
> that was only recognised after "is".
>
> But it would be nicely perverse to make it a true
> keyword. :-)
>

A similarly perverse thought is to make "um" a keyword and allow it, um, almost anywhere it could possibly make sense. Which I think is almost everywhere.

If the Python interpreter failed on a line with "um" in it, perhaps it would be extra aggressive in providing traceback information. :)

From arno at marooned.org.uk Thu Jun 21 19:41:26 2007
From: arno at marooned.org.uk (Arnaud Delobelle)
Date: Thu, 21 Jun 2007 18:41:26 +0100 (BST)
Subject: [Python-ideas] English builtins for Python
In-Reply-To: <934492.38703.qm@web33506.mail.mud.yahoo.com>
References: <934492.38703.qm@web33506.mail.mud.yahoo.com>
Message-ID: <59471.80.195.169.49.1182447686.squirrel@webmail.marooned.org.uk>

On Thu, June 21, 2007 10:36 am, Steve Howell wrote:
[...]
> It's amazing how uncommon "the"
> and "a" are in mainstream programming languages. I'm
> not saying they should be (although I think there's
> some argument to using "the" for singletons), I just
> find it curious that they shouldn't be, and there's
> been enough evolution on programming languages (albeit
> a small amount of time compared to natural languages)
> to suggest that articles (as builtins) are just
> somehow *wrong* in programming languages. Yet they're
> so incredible popular in "natural" languages.

from random import randrange

ABIGNUMBER = 100000 # or should it be THEBIGNUMBER?

class ArticleError(Exception): pass

def the(s):
    try:
        s = iter(s)
        ret = s.next()
        for el in s: raise ArticleError
        return ret
    except (StopIteration, TypeError):
        raise ArticleError("'the' argument must be a singleton iterable")

def a(s):
    try:
        s = iter(s)
        for i, el in enumerate(s):
            if not randrange(i+1): ret = el
            if i == ABIGNUMBER: return ret
        return ret
    except (NameError, TypeError):
        raise ArticleError("'a' argument must be a non-empty iterable")

an = a # for convenience :)

--------------------- How to use 'a(n)' and 'the' -------

>>> a ('python')
'y'
>>> a ('python')
'n'
>>> an (x for x in range(10) if x+x==x*x)
0
>>> an (x for x in range(10) if x+x==x*x)
0
>>> an (x for x in range(10) if x+x==x*x)
2
>>> the (x for x in range(10) if x+x==x*x)
Traceback (most recent call last):
  ...
__main__.ArticleError: 'the' argument must be a singleton iterable
>>> the (x for x in range(1, 10) if x+x==x*x)
2
>>> from itertools import count
>>> a (count())
81294
>>> a (count())
41746
>>> from itertools import repeat
>>> a (repeat('spam'))
'spam'
>>>

--
Arnaud

From showell30 at yahoo.com Thu Jun 21 20:14:16 2007
From: showell30 at yahoo.com (Steve Howell)
Date: Thu, 21 Jun 2007 11:14:16 -0700 (PDT)
Subject: [Python-ideas] English builtins for Python
In-Reply-To: <59471.80.195.169.49.1182447686.squirrel@webmail.marooned.org.uk>
Message-ID: <409399.74206.qm@web33512.mail.mud.yahoo.com>

--- Arnaud Delobelle wrote:
>
> from random import randrange
>
> ABIGNUMBER = 100000 # or should it be THEBIGNUMBER?
>
> class ArticleError(Exception): pass
>
> def the(s):
>     try:
>         s = iter(s)
>         ret = s.next()
>         for el in s: raise ArticleError
>         return ret
>     except (StopIteration, TypeError):
>         raise ArticleError("'the' argument must be a singleton iterable")
>
> def a(s):
>     try:
>         s = iter(s)
>         for i, el in enumerate(s):
>             if not randrange(i+1): ret = el
>             if i == ABIGNUMBER: return ret
>         return ret
>     except (NameError, TypeError):
>         raise ArticleError("'a' argument must be a non-empty iterable")

Good stuff, I like it. Not sure I would actually use it, but it's a good brainstorm... :)

> an = a # for convenience :)
>

Of course!

From carroll at tjc.com Fri Jun 22 00:40:42 2007
From: carroll at tjc.com (Terry Carroll)
Date: Thu, 21 Jun 2007 15:40:42 -0700 (PDT)
Subject: [Python-ideas] English builtins for Python
In-Reply-To: <514302.32164.qm@web33513.mail.mud.yahoo.com>
Message-ID:

On Thu, 21 Jun 2007, Steve Howell wrote:
> A similarly perverse thought is to make "um" a keyword
> and allow it, um, almost anywhere it could possibly
> make sense. Which I think is almost everywhere.

I propose we make "um" a keyword that is synonymous with \-NEWLINE to indicate that an incomplete statement is being continued.

From greg.ewing at canterbury.ac.nz Fri Jun 22 03:12:55 2007
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 22 Jun 2007 13:12:55 +1200
Subject: [Python-ideas] English builtins for Python
In-Reply-To: <934492.38703.qm@web33506.mail.mud.yahoo.com>
References: <934492.38703.qm@web33506.mail.mud.yahoo.com>
Message-ID: <467B2217.3040307@canterbury.ac.nz>

Steve Howell wrote:
> suggest that articles (as builtins) are just
> somehow *wrong* in programming languages. Yet they're
> so incredible popular in "natural" languages.

Popularity of a given feature in natural languages doesn't necessarily imply optimality -- it could just be a result of those languages having a common ancestor.

Linguists have concluded that most of the languages in use today, even ones that seem very different, can be traced back to a single ancestral language.

--
Greg

From showell30 at yahoo.com Fri Jun 22 04:15:50 2007
From: showell30 at yahoo.com (Steve Howell)
Date: Thu, 21 Jun 2007 19:15:50 -0700 (PDT)
Subject: [Python-ideas] English builtins for Python
In-Reply-To: <467B2217.3040307@canterbury.ac.nz>
Message-ID: <834813.52946.qm@web33501.mail.mud.yahoo.com>

--- Greg Ewing wrote:
> Steve Howell wrote:
> > suggest that articles (as builtins) are just
> > somehow *wrong* in programming languages.
Yet > they're > > so incredible popular in "natural" languages. > > Popularity of a given feature in natural languages > doesn't necessarily imply optimality -- it could > just be a result of those languages having a common > ancestor. > Sure, but languages do evolve. Two examples: 1) The more popular words tend to get shortened over time. 2) Languages tend to appropriate only the most expressive words or phrases from other languages. > Linguists have concluded that most of the languages > in use today, even ones that seem very different, > can be traced back to a single ancestral language. > Same for programming languages to a certain degree. For example, Python has some stuff from C that would arguably not be there if it had been designed in a vacuum, but it traces some of its roots to C. ____________________________________________________________________________________Ready for the edge of your seat? Check out tonight's top picks on Yahoo! TV. http://tv.yahoo.com/ From rasky at develer.com Thu Jun 28 22:14:37 2007 From: rasky at develer.com (Giovanni Bajo) Date: Thu, 28 Jun 2007 22:14:37 +0200 Subject: [Python-ideas] Accelerated attr lookups In-Reply-To: References: Message-ID: On 19/06/2007 17.13, Eyal Lotem wrote: > Hi, I have attached a patch at: > https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1739789&group_id=5470 > > A common optimization tip for Python code is to use locals rather than > globals. This converts dictionary lookups of (interned) strings to > tuple indexing. I have created a patch that achieves this speed > benefit "automatically" for all globals and builtins, by adding a > feature to dictobjects. > > Additionally, the idea of this patch is that it puts down the > necessary infrastructure to also allow virtually all attribute > accesses to also be > accelerated in the same way (with some extra work, of course). 
>
> I have already suggested this before but I got the impression that the
> spirit of the replies was "talk is cheap, show us the
> code/benchmarks". So I wrote some code.
>
> Getting the changes to work was not easy, and required learning about
> the nuances of dictobject's, their re-entrancy issues, etc. These
> changes do slow down dictobjects, but it seems that this slowdown is
> more than offset by the speed increase of builtins/globals access.

How does it compare with this patch:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616125&group_id=5470
?
--
Giovanni Bajo

From eyal.lotem at gmail.com Thu Jun 28 23:41:58 2007
From: eyal.lotem at gmail.com (Eyal Lotem)
Date: Fri, 29 Jun 2007 00:41:58 +0300
Subject: [Python-ideas] Accelerated attr lookups
In-Reply-To:
References:
Message-ID:

I haven't compared benchmarks, but I strongly suspect that my patch is
not as fast for "real" programs, because in real programs
globals/builtins are almost exclusively read and almost never written
to. His patch accelerates global reads by as much as mine does, without
making function object creation as expensive, and the overhead it adds
to dicts is probably smaller. My patch also accelerates writes, but as
I said, that will normally go nearly unnoticed.

My patch is not the end, but a means to an end. If the purpose is only
accelerating globals/builtins access, then the patch you linked to is
simpler and better.

The purpose I aim for, however, is to later use the same technique to
also accelerate access to the dicts in the type or even in the
instance, by specializing them in function objects. This will make it
possible to get rid of almost all attribute lookups in dicts. Combined
with the use of __slots__ in all classes, no dict lookups will be
needed for attributes at all, except in "getattr" calls. Combined with
an mro cache, this should put Python very close to C in terms of
attribute access speed (simple direct access).

Eyal
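
The optimization tip that opens this thread, and the __slots__ point in
the reply above, can be sketched in a few lines of Python. This is only
a minimal illustration under stated assumptions, not code from either
patch: the names count_global, count_local, _len and Point are made up
for the example, and the comments describe how CPython behaves.

```python
def count_global(items):
    # `len` is not a local here, so it is resolved by name at run time:
    # conceptually a lookup in the module globals, then in builtins.
    total = 0
    for item in items:
        total += len(item)
    return total


def count_local(items, _len=len):
    # Hypothetical variant using the "locals instead of globals" tip:
    # `_len` is bound once as a default argument, so inside the loop it
    # is an ordinary local variable rather than a name looked up in a dict.
    total = 0
    for item in items:
        total += _len(item)
    return total


class Point:
    # With __slots__, instances get no per-instance __dict__; `x` and `y`
    # are stored in fixed slots on the object instead.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


# The compiled code objects record which names are globals and which are locals.
print(count_global.__code__.co_names)    # includes 'len'
print(count_local.__code__.co_varnames)  # includes '_len'
```

Running dis.dis on the two functions should show the same split at the
bytecode level: the first loads len with LOAD_GLOBAL, the second loads
_len as a fast local. What the thread proposes is getting that effect
automatically, without rewriting code by hand as done here.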