Mailman 3 September 2006 - Python-Dev

Re: [Python-Dev] Caching float(0.0)
by Kristján V. Jónsson Sept. 29, 2006

Well gentlemen, I did gather some stats on the frequency of PyFloat_FromDouble(). Out of the first 1000 distinct floats allocated, we get this frequency distribution once our server has started up (v is the value, c is the allocation count):

  stats  [1000]  std::vector<entry,std::allocator<entry> >

  [0]   v=0.00000000000000000    c=410612
  [1]   v=1.0000000000000000     c=107838
  [2]   v=0.75000000000000000    c=25487
  [3]   v=5.0000000000000000     c=22557
  [4]   v=10000.000000000000     c=18530
  [5]   v=-1.0000000000000000    c=14950
  [6]   v=2.0000000000000000     c=14460
  [7]   v=1500.0000000000000     c=13470
  [8]   v=100.00000000000000     c=11913
  [9]   v=0.50000000000000000    c=11497
  [10]  v=3.0000000000000000     c=9833
  [11]  v=20.000000000000000     c=9019
  [12]  v=0.90000000000000002    c=8954
  [13]  v=10.000000000000000     c=8377
  [14]  v=4.0000000000000000     c=7890
  [15]  v=0.050000000000000003   c=7732
  [16]  v=1000.0000000000000     c=7456
  [17]  v=0.40000000000000002    c=7427
  [18]  v=-100.00000000000000    c=7071
  [19]  v=5000.0000000000000     c=6851
  [20]  v=1000000.0000000000     c=6503
  [21]  v=0.070000000000000007   c=6071

(here I omit the rest).

In addition, my shared 0.0 double has some 200000 references at this point. 0.0 is very, very common. The same can be said about all the integers up to 5.0, as well as -1.0. I think I will add a simple cache for these values for Eve, something like:

    int i = (int) fval;
    if ((double)i == fval && i >= -1 && i < 6) {
        /* assuming table[0] holds -1.0, shift the index by one */
        Py_INCREF(table[i + 1]);
        return table[i + 1];
    }

Cheers,
Kristján

> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com(a)python.org
> On Behalf Of Kristján V. Jónsson
> Sent: 29 September 2006 15:18
> To: Fredrik Lundh; python-dev(a)python.org
> Subject: Re: [Python-Dev] Caching float(0.0)
>
> Acting on this excellent advice, I have patched in a reuse
> for -1.0, 0.0 and 1.0 for EVE Online. We use vectors and
> stuff a lot, and 0.0 is very, very common. I'll report on
> the refcount of this for you shortly.
>
> K
>
> > -----Original Message-----
> > From: python-dev-bounces+kristjan=ccpgames.com(a)python.org
> > On Behalf Of Fredrik Lundh
> > Sent: 29 September 2006 15:11
> > To: python-dev(a)python.org
> > Subject: Re: [Python-Dev] Caching float(0.0)
> >
> > Nick Craig-Wood wrote:
> >
> > > Is there any reason why float() shouldn't cache the value of 0.0
> > > since it is by far and away the most common value?
> >
> > says who ?
> >
> > (I just checked the program I'm working on, and my analysis tells me
> > that the most common floating point value in that program is 121.216,
> > which occurs 32 times. from what I can tell, 0.0 isn't used at all.)
> >
> > </F>

Re: [Python-Dev] Caching float(0.0)
by Kristján V. Jónsson Sept. 29, 2006

Acting on this excellent advice, I have patched in a reuse for -1.0, 0.0 and 1.0 for EVE Online. We use vectors and stuff a lot, and 0.0 is very, very common. I'll report on the refcount of this for you shortly.

K

> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com(a)python.org
> On Behalf Of Fredrik Lundh
> Sent: 29 September 2006 15:11
> To: python-dev(a)python.org
> Subject: …

os.unlink() closes file?
by Neal Becker Sept. 29, 2006

It seems (I haven't looked at the source) that os.unlink() will close the file? If so, please make this optional. It breaks the Unix idiom for making a temporary file. (Yes, I know there is a tempfile module, but I need some behavior it doesn't implement, so I want to do it myself.)
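
For readers unfamiliar with the idiom being referred to, it can be sketched as follows: create the file, keep the descriptor, and remove the name so the data disappears once the descriptor is closed. This is a minimal illustration assuming a POSIX system (on Windows, unlinking an open file normally fails); `open_unlinked` and the scratch file name are made up for the example, and `tempfile` is used only to find a writable directory.

```python
import os
import tempfile


def open_unlinked(path):
    """Create path exclusively, remove its name, and return the open file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        # On POSIX the name goes away but the open descriptor stays valid.
        os.unlink(path)
    except OSError:
        os.close(fd)
        raise
    return os.fdopen(fd, "w+b")


scratch_path = os.path.join(tempfile.gettempdir(),
                            "scratch-%d.tmp" % os.getpid())
scratch = open_unlinked(scratch_path)
scratch.write(b"still usable")
scratch.seek(0)
contents = scratch.read()
name_still_exists = os.path.exists(scratch_path)
scratch.close()
```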

weakref enhancements
by tomer filiba Sept. 29, 2006

i'd like to suggest adding weak attributes and weak methods to the std weakref module.

weakattrs are weakly-referenced attributes. when the value they reference is no longer strongly-referenced by something else, the weakattrs "nullify" themselves.

weakmethod is a method decorator, like classmethod et al, that returns "weakly bound" methods. weakmethod's im_self is a weakref.proxy to `self`, which means the mere method will not keep the entire instance alive. instead, you'll get a …

Re: [Python-Dev] difficulty of implementing phase 2 of PEP 302 in Python source
by Phillip J. Eby Sept. 28, 2006

At 11:25 AM 9/28/2006 -0700, Brett Cannon wrote:
>I will think about it, but I am still trying to get the original question
>of how bad the C code is compared to rewriting import in Python from
>people. =)

I would say that the C code is *delicate*, not necessarily bad. In most ways, it's rather straightforward; it's actually the requirements that are complex. :)

A Python implementation, however, would be a good idea to have around for PyPy, Py3K, and other versions of Python, …

Re: [Python-Dev] difficulty of implementing phase 2 of PEP 302 in Python source
by Phillip J. Eby Sept. 28, 2006

At 05:26 PM 9/27/2006 -0700, Brett Cannon wrote:
>Ah, OK. So for importing 'email', the zipimporter would call the .pyc
>importer and it would ask the zipimporter, "can you get me email.pyc?" and
>if it said no it would move on to asking the .py importer for email.py, etc.

Yes, exactly.

>That's fine. Just thinking about how the current situation sucks for NFS
>but how caching just isn't done. But obviously this could change.

Well, with this design, you can have a CachingFilesystemImporter as your storage mechanism to speed things up.

>> >>Of course, to fully support .pyc timestamp checking and writeback, you'd
>> >>need some sort of "stat" or "getmtime" feature on the parent importer, as
>> >>well as perhaps an optional "save_data" method. These would be extensions
>> >>to PEP 302, but welcome ones.
>> >
>> >Could pass the string representing the location of where the string came
>> >from. That would allow for the required stat calls for .pyc files as
>> >needed without having to implement methods just for this one use case.
>>
>>Huh? In order to know if a .pyc is up to date, you need the st_mtime of
>>the .py file. That can't be done in the parent importer without giving it
>>format knowledge, which goes against the point of the exercise.
>
>Sorry, thought .pyc files based whether they needed to be recompiled based
>on the stat info on the .py and .pyc file, not on data stored from within
>the .pyc.

It's not just that (although I believe it's also the case that there is a timestamp inside .pyc), it's that to do the check in the parent importer, the parent importer would have to know that there is such a thing as .py-and-.pyc. The whole point of this design is that the parent importer doesn't have to know *anything* about filename extensions OR how those files are formatted internally. In this scheme, adding more child importers is sufficient to add all the special handling needed for .py/.pyc-style schemes.

Of course, for maximum flexibility, you might want get_stream() and get_file() methods optionally available, since a .so loader really needs a file, and .pyc might want to read in two stages. But the child importers can be defensively coded so as to be able to live with only a parent.get_data(), if necessary, and do the enhanced behaviors only if stat() or get_stream() or write_data() etc. attributes are available on the parent. If we get some standards for these additional attributes, we can document them as standard PEP 302 extensions.

The format importer mechanism might want to have something like 'sys.import_formats' as a list of importer classes (or factories). Parent (storage) importer classes would then create instances to use. If you add a new format importer to sys.import_formats, you would of course need to clear sys.path_importer_cache, so that the individual importers are rebuilt on the next import, and thus they will create new child importer chains.

Yeah, that pretty much ought to do it.
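
The "defensively coded" child importer described above might look roughly like the sketch below. Everything here is hypothetical: `MemoryStore`, `TimestampedStore` and `choose_file` are invented for illustration, `getmtime()` is one of the optional parent extensions floated in the thread rather than an existing PEP 302 method, and comparing the two modification times is a simplification of the real .pyc freshness check, which the message suggests also involves a timestamp stored inside the .pyc.

```python
class MemoryStore:
    """Hypothetical parent importer that offers get_data() only."""

    def __init__(self, files):
        self.files = dict(files)

    def get_data(self, path):
        try:
            return self.files[path]
        except KeyError:
            raise IOError(path)


class TimestampedStore(MemoryStore):
    """Hypothetical parent that also offers the optional getmtime()."""

    def __init__(self, files, mtimes):
        super().__init__(files)
        self.mtimes = dict(mtimes)

    def getmtime(self, path):
        try:
            return self.mtimes[path]
        except KeyError:
            raise IOError(path)


def _mtime(parent, path):
    """Return the parent's mtime for path, or None if it cannot tell us."""
    getmtime = getattr(parent, "getmtime", None)
    if getmtime is None:
        return None
    try:
        return getmtime(path)
    except IOError:
        return None


def _has(parent, path):
    """Probe for a file using nothing but get_data()."""
    try:
        parent.get_data(path)
    except IOError:
        return False
    return True


def choose_file(parent, name):
    """Pick which file a .py/.pyc child importer would load, or None."""
    py, pyc = name + ".py", name + ".pyc"
    py_time, pyc_time = _mtime(parent, py), _mtime(parent, pyc)
    if py_time is not None and pyc_time is not None:
        # Enhanced behavior: the parent can say which file is newer.
        return pyc if pyc_time >= py_time else py
    # Fallback: freshness cannot be checked, so prefer the source.
    for path in (py, pyc):
        if _has(parent, path):
            return path
    return None


files = {"email.py": "x = 1\n", "email.pyc": b"compiled"}
plain = MemoryStore(files)
fresh = TimestampedStore(files, {"email.py": 100, "email.pyc": 200})
stale = TimestampedStore(files, {"email.py": 300, "email.pyc": 200})
```

The parent never learns that .py and .pyc are related; all of that knowledge stays in `choose_file`, which is the separation the message argues for.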

AST structure and maintenance branches
by Anthony Baxter Sept. 28, 2006

I'd like to propose that the AST format returned by passing PyCF_ONLY_AST to compile() get the same guarantee in maintenance branches as the bytecode format - that is, unless it's absolutely necessary, we'll keep it the same. Otherwise anyone trying to write tools to manipulate the AST is in for a massive world of hurt. Anyone have any problems with this, or can it be added to PEP 6? Anthony
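
As a reminder of what is being asked for, the snippet below obtains an AST by passing PyCF_ONLY_AST to compile() and then inspects it by node class name, which is exactly the kind of tool code that breaks if the node layout changes in a maintenance release. It is written against a current Python 3, where the flag is exposed by the `ast` module (a module that postdates this message); the source string and variable names are illustrative only.

```python
import ast

source = "total = price * 2\n"

# With this flag, compile() returns the tree instead of a code object.
tree = compile(source, "<example>", "exec", ast.PyCF_ONLY_AST)

# A tool that dispatches on node class names depends on the AST format.
node_types = [type(node).__name__ for node in ast.walk(tree)]
```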

Collecting 2.4.4 fixes
by A.M. Kuchling Sept. 28, 2006

I've put some candidate fixes and listed some tasks at <http://wiki.python.org/moin/Python24Fixes>. --amk

Re: [Python-Dev] difficulty of implementing phase 2 of PEP 302 in Python source
by Phillip J. Eby Sept. 28, 2006

At 04:11 PM 9/27/2006 -0700, Brett Cannon wrote:
>On 9/27/06, Phillip J. Eby <pje(a)telecommunity.com> wrote:
>>At 02:11 PM 9/27/2006 -0700, Brett Cannon wrote:
>> >But it has been suggested here that the import machinery be rewritten in
>> >Python. Now I have never touched the import code since it has always had
>> >the reputation of being less than friendly to work with. I am asking for
>> >opinions from people who have worked with the import machinery before if
>> >it is so bad that it is worth trying to re-implement the import semantics
>> >in pure Python or if in the name of time to just work with the C
>> >code. Basically I will end up breaking up built-in, .py, .pyc, and
>> >extension modules into individual importers and then have a chaining class
>> >to act as a combined .pyc/.py combination importer (this will also make
>> >writing out to .pyc files an optional step of the .py import).
>>
>>The problem you would run into here would be supporting zip imports.
>
>I have not looked at zipimport so I don't know the exact issue in terms of
>how it hooks into the import machinery. But a C level API will most
>likely be needed.

I was actually assuming you planned to reimplement that in Python as well, and hence the need for the storage/format separation.

>> It
>>would probably be more useful to have a mapping of file types to "format
>>handlers", because then a filesystem importer or zip importer would then be
>>able to work with any .py/.pyc/.pyo/whatever formats, along with any new
>>ones that are invented, without reinventing the wheel.
>
>So you are saying the zipimporter would then pull out of the zip file the
>individual file to import and pass that to the format-specific importer?

No, I'm saying that the zipimporter would simply call the format importers in sequence, as in your original concept. However, these importers would call *back* to the zipimporter to ask if the file they are looking for is there.

>>Thus, whether it's file import, zip import, web import, or whatever, the
>>same handlers would be reusable, and when people invent new extensions like
>>.ptl, .kid, etc., they can just register format handlers instead.
>
>So a separation of data store from data interpretation for importation. My
>only worry is a possible explosion of checks for the various data
>types. If you are using the file data store and had .py, .pyc, .so,
>module.so, .ptl, and .kid registered that might suck in terms of
>performance hit.

Look at it this way: the parent importer can always pull a directory listing once and cache it for the duration of its calls to the child importers. In practice, however, I suspect that the stat calls will be faster. In the case of a zipimport parent, the zip directory is already cached.

Also, keep in mind that most imports will likely occur *before* any special additional types get registered, so the hits will be minimal. And the more of sys.path is taken up by zip files, the less of a hit it will be for each query.

> And I am assuming for a web import that it would decide based on the
> extension of the resulting web address?

No - you'd effectively end up doing a web hit for each possible extension. Which would suck, but that's what caching is for. Realistically, you wouldn't want to do web-based imports without some disk-based caching anyway.

> And checking for the various types might not work well for other data
> store types. Guess you would need a way to register with the data store
> exactly what types of data interpretation you might want to check.

No, you just need a method on the parent importer like get_data().

>Other option is to just have the data store do its magic and somehow know
>what kind of data interpretation is needed for the string returned (e.g.,
>a database data store might implicitly only store .py code and thus know
>that it will only return a string of source). Then that string and the
>supposed file extension is passed to the next step of creating a module
>from that data string.

Again, all that's way more complex than you need; you can do the same thing by just raising IOError from get_data() when asked for something that's not a .py.

>>Format handlers could of course be based on the PEP 302 protocol, and
>>simply accept a "parent importer" with a get_data() method. So, let's say
>>you have a PyImporter:
>>
>>     class PyImporter:
>>         def __init__(self, parent_importer):
>>             self.parent = parent_importer
>>
>>         def find_module(self, fullname):
>>             path = fullname.split('.')[-1]+'.py'
>>             try:
>>                 source = self.parent.get_data(path)
>>             except IOError:
>>                 return None
>>             else:
>>                 return PySourceLoader(source)
>>
>>See what I mean? The importers and loaders thus don't have to do direct
>>filesystem operations.
>
>I think so. Basically you want more of a way to stack imports so that the
>basic importers are just passed the string of what it is supposed to load
>from. Other importers higher in the chain can handle getting that string.

No, they're full importers; they're not passed "a string". The only difference between this and your original idea of an importer chain is that I'm saying the chained format-specific importers need to know who their "parent" importer (the data store) is, so they can be data-store independent. Everything else can be done with that, and perhaps a few extra parent importer methods for stat, save, etc.

>>Of course, to fully support .pyc timestamp checking and writeback, you'd
>>need some sort of "stat" or "getmtime" feature on the parent importer, as
>>well as perhaps an optional "save_data" method. These would be extensions
>>to PEP 302, but welcome ones.
>
>Could pass the string representing the location of where the string came
>from. That would allow for the required stat calls for .pyc files as
>needed without having to implement methods just for this one use case.

Huh? In order to know if a .pyc is up to date, you need the st_mtime of the .py file. That can't be done in the parent importer without giving it format knowledge, which goes against the point of the exercise. Thus, something like stat() and save() methods need to be available on the parent, if it can support them.

>>Anyway, based on my previous work with pkg_resource, pkgutil, zipimport,
>>import.c, etc. I would say this is how I'd want to structure a
>>reimplementation of the core system. And if it were for Py3K, I'd probably
>>treat sys.path and all the import hooks associated with it as a single
>>meta-importer on sys.meta_path -- listed after a meta-importer for handling
>>frozen and built-in modules. (I.e., the meta-importer that uses sys.path
>>and its path hooks would be last on sys.meta_path.)
>
>Ah, interesting idea! Could even go as far as removing sys.path and just
>making it an attribute of the base importer if you really wanted to make
>it just meta_path for imports.

Perhaps, but then that just means you have to have a new variable for 'sys.path_importer' or some such, just to get at it. (i.e., code won't be able to assume it's always the last item in sys.meta_path). So this seems wasteful and changing things just for the sake of change, vs. just keeping the other PEP 302 sys variables. I just think the *implementation* of them can move to sys.meta_path, as that simplifies the main __import__ function down to just calling meta_path importers in sequence, modulo some package issues.

One other rather tricky matter is that the sys.path meta-importer has to deal with package __path__ management... and actually, meta_path importers are supposed to receive a copy of sys.path... ugh. Well, it was a nice idea, but I guess you can't actually implement sys.path using a meta_path importer. :( For Py3K, we could drop the path argument to find_module() and manage it, but it can't be done and still allow current meta_path hooks to work right.

>>In other words, sys.meta_path is really the only critical import hook from
>>the raw interpreter's point of view. sys.path, however, (along with
>>sys.path_hooks and sys.path_importer_cache) is critical from the
>>perspective of users, applications, etc., as there has to be some way to
>>get things onto Python's path in the first place.
>
>Yeah, I think I get it. I don't know how much it simplifies things for
>users but I think it might make it easier for alternative import writers.

That was the idea, yes. :)
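
The call-back arrangement discussed in this reply, where format importers are tried in sequence and each asks the data store for its own file name, can be sketched as below. None of these names come from the thread or from PEP 302: `SourceOnlyStore`, `FormatImporter` and `find_in_chain` are hypothetical, and the tuple returned in place of a loader is only a stand-in. The store models the "database that only holds .py source" case by raising IOError for anything else, as the reply suggests.

```python
class SourceOnlyStore:
    """Hypothetical data store that only holds Python source text."""

    def __init__(self, rows):
        self.rows = dict(rows)   # module name -> source text
        self.requests = []       # every path the children asked for

    def get_data(self, path):
        self.requests.append(path)
        name, _, extension = path.rpartition(".")
        if extension != "py" or name not in self.rows:
            raise IOError(path)
        return self.rows[name]


class FormatImporter:
    """Hypothetical child importer that handles one file extension."""

    def __init__(self, parent, extension):
        self.parent = parent
        self.extension = extension

    def find_module(self, fullname):
        path = fullname.split(".")[-1] + self.extension
        try:
            data = self.parent.get_data(path)
        except IOError:
            return None
        # Stand-in for a real loader object.
        return (self.extension, data)


def find_in_chain(parent, extensions, fullname):
    """Try each format importer in order; return the first hit or None."""
    for extension in extensions:
        found = FormatImporter(parent, extension).find_module(fullname)
        if found is not None:
            return found
    return None


store = SourceOnlyStore({"email": "x = 1\n"})
result = find_in_chain(store, [".pyc", ".py"], "email")
```

The `requests` list makes the cost visible: one probe per registered extension until something is found, which is the "explosion of checks" concern and the reason caching comes up in the reply.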

Re: [Python-Dev] difficulty of implementing phase 2 of PEP 302 in Python source
by Phillip J. Eby Sept. 27, 2006

At 02:11 PM 9/27/2006 -0700, Brett Cannon wrote:
>But it has been suggested here that the import machinery be rewritten in
>Python. Now I have never touched the import code since it has always had
>the reputation of being less than friendly to work with. I am asking for
>opinions from people who have worked with the import machinery before if
>it is so bad that it is worth trying to re-implement the import semantics
>in pure Python or if in the name of time to just work with the C
>code. Basically I will end up breaking up built-in, .py, .pyc, and
>extension modules into individual importers and then have a chaining class
>to act as a combined .pyc/.py combination importer (this will also make
>writing out to .pyc files an optional step of the .py import).

The problem you would run into here would be supporting zip imports. It would probably be more useful to have a mapping of file types to "format handlers", because then a filesystem importer or zip importer would then be able to work with any .py/.pyc/.pyo/whatever formats, along with any new ones that are invented, without reinventing the wheel.

Thus, whether it's file import, zip import, web import, or whatever, the same handlers would be reusable, and when people invent new extensions like .ptl, .kid, etc., they can just register format handlers instead.

Format handlers could of course be based on the PEP 302 protocol, and simply accept a "parent importer" with a get_data() method. So, let's say you have a PyImporter:

     class PyImporter:
         def __init__(self, parent_importer):
             self.parent = parent_importer

         def find_module(self, fullname):
             path = fullname.split('.')[-1]+'.py'
             try:
                 source = self.parent.get_data(path)
             except IOError:
                 return None
             else:
                 return PySourceLoader(source)

See what I mean? The importers and loaders thus don't have to do direct filesystem operations.

Of course, to fully support .pyc timestamp checking and writeback, you'd need some sort of "stat" or "getmtime" feature on the parent importer, as well as perhaps an optional "save_data" method. These would be extensions to PEP 302, but welcome ones.

Anyway, based on my previous work with pkg_resource, pkgutil, zipimport, import.c, etc. I would say this is how I'd want to structure a reimplementation of the core system. And if it were for Py3K, I'd probably treat sys.path and all the import hooks associated with it as a single meta-importer on sys.meta_path -- listed after a meta-importer for handling frozen and built-in modules. (I.e., the meta-importer that uses sys.path and its path hooks would be last on sys.meta_path.)

In other words, sys.meta_path is really the only critical import hook from the raw interpreter's point of view. sys.path, however, (along with sys.path_hooks and sys.path_importer_cache) is critical from the perspective of users, applications, etc., as there has to be some way to get things onto Python's path in the first place.
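
To see the PyImporter idea run end to end, the sketch below fills in the two pieces the message leaves undefined. `DictStore` (a parent importer backed by a dict) and this `PySourceLoader` are assumptions made up for the example; the loader takes an extra filename argument that the message's version does not, and it skips the sys.modules bookkeeping that a real PEP 302 loader is expected to do. The objects are called directly rather than registered as import hooks, so the example does not depend on whether the running interpreter still honors find_module().

```python
import types


class DictStore:
    """Hypothetical parent importer: storage keyed by relative path."""

    def __init__(self, files):
        self.files = dict(files)

    def get_data(self, path):
        try:
            return self.files[path]
        except KeyError:
            raise IOError(path)


class PySourceLoader:
    """Hypothetical loader that builds a module object from source text."""

    def __init__(self, source, filename):
        self.source = source
        self.filename = filename

    def load_module(self, fullname):
        module = types.ModuleType(fullname)
        module.__file__ = self.filename
        module.__loader__ = self
        # Run the source in the new module's namespace.
        exec(compile(self.source, self.filename, "exec"), module.__dict__)
        return module


class PyImporter:
    """Format importer from the message, asking its parent for the data."""

    def __init__(self, parent_importer):
        self.parent = parent_importer

    def find_module(self, fullname):
        path = fullname.split(".")[-1] + ".py"
        try:
            source = self.parent.get_data(path)
        except IOError:
            return None
        return PySourceLoader(source, path)


store = DictStore({"greeting.py": "def hello():\n    return 'hi'\n"})
importer = PyImporter(store)
loader = importer.find_module("greeting")
module = loader.load_module("greeting")
```

Swapping `DictStore` for a zip-backed or filesystem-backed parent would leave `PyImporter` untouched, which is the reuse the message is arguing for.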
