Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from?
Now that there is a package mechanism (are package mechanisms?) like zipimporter that bundles source code into a single file, should the notion of a "file" location be adjusted to include the package and/or importer? Is there a standard API or routine which can extract this information given a code object?

A use case I am thinking of here is a stack trace or a debugger, or a tool which wants to show in great detail information from a code object, possibly via a frame. For example, does this code come from a zipped egg? And if so, which one?

For concreteness, here is what I did and here's what I saw. Select one of the zipimporter eggs at http://code.google.com/p/pytracer and install it. I did this on GNU/Linux with Python 2.5, and I looked at the co_filename of one of the methods:
>>> import tracer
>>> tracer.__dict__['size'].func_code.co_filename
'build/bdist.linux-i686/egg/tracer.py'
But there is no file called "build/bdist.linux-i686/egg/tracer.py" in the filesystem. Instead there is a member "tracer.py" inside /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg. It's possible I caused this egg to get built incorrectly, or that setuptools has a bug which entered that misleading information. However, shouldn't there be a standard way to untangle package location, loader, and member inside the package?

As best as I can tell, PEP 302, which discusses importer hooks, suggests a standard way to get file data. But it doesn't address a standard way to get container package and/or loader information. Also, I'm not sure there *is* a standard print-string way to show a member inside a package. zipimporter may insert co_filename strings like:

/usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py

but the trouble with this is that it means file routines have to scan the path and notice, say, that /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg is a *file*, not a directory. And a file stat/reading routine needs to understand what kind of packager that is in order to get tracer.py information. (Are there any file routines in place for doing this?)

Thanks.
2008/12/23 Rocky Bernstein
Now that there is a package mechanism (are package mechanisms?) like zipimporter that bundles source code into a single file, should the notion of a "file" location be adjusted to include the package and/or importer?
Check PEP 302 (http://www.python.org/dev/peps/pep-0302/) specifically the get_source (optional) method. It's not exactly what you describe, but it may help. Please note that it's optional - if you loaded the code from a zipfile containing only bytecode files, there is no source to get, so you have to be prepared for that case. But if the source is available, this should give you a way of getting to it. Paul.
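For concreteness, checking for that optional loader method might look something like the following sketch (`source_for_module` is a made-up helper name, not anything in the stdlib; the fallback behavior is exactly the "be prepared for that case" Paul describes):

```python
def source_for_module(module):
    """Try to fetch source text via the module's PEP 302 loader.

    get_source() is optional, so return None when the module has no
    __loader__ or the loader does not implement the method (e.g. a
    zipfile containing only bytecode files).
    """
    loader = getattr(module, '__loader__', None)
    if loader is None or not hasattr(loader, 'get_source'):
        return None
    return loader.get_source(module.__name__)
```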
Paul Moore writes:
2008/12/23 Rocky Bernstein
: Now that there is a package mechanism (are package mechanisms?) like zipimporter that bundles source code into a single file, should the notion of a "file" location be adjusted to include the package and/or importer?
Check PEP 302 (http://www.python.org/dev/peps/pep-0302/) specifically the get_source (optional) method.
Yes, that's one of the things I was thinking of when I wrote: "As best as I can tell, PEP 302, which discusses importer hooks, suggests a standard way to get file data." And by "suggests" I was implying that yes, I know this is optional.
It's not exactly what you describe, but it may help.
Yes, it's not exactly what is desired.
Please note that it's optional - if you loaded the code from a zipfile containing only bytecode files, there is no source to get, so you have to be prepared for that case. But if the source is available, this should give you a way of getting to it.
What is wanted is a uniform way to get and describe a file location from a code object that takes into account that the file might be a member of an archive. Are there even guidelines for saying what string goes into a code object's co_filename? Clearly it should be related to the source code that generated the code, and there are various conventions that seem to exist when the code comes from an "eval" or an "exec". But empirically it seems as though there's some variation. It could be an absolute filename or a filename with no root directory specified. (But is it possible to have things like "." and ".."?)

And in the case of a member of a package, what happens? Should it be just the member without the package? Or should it include the package name, like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py? Or be unspecified? If left unspecified, as I gather it is now, it makes it more important to have some sort of common routine to be able to pick out the archive part in a filesystem from the member name inside the archive.
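The lack of any enforced convention is easy to demonstrate: compile() stores whatever filename string it is given, with no validation (the paths below are invented for illustration):

```python
# co_filename is simply whatever string was passed to compile();
# nothing checks that it names a real file.
code = compile("x = 1", "<string>", "exec")
print(code.co_filename)            # <string>

# A relative or even nonexistent path is stored just as readily:
code2 = compile("x = 1", "no/such/dir/example.py", "exec")
print(code2.co_filename)           # no/such/dir/example.py
```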
Paul.
Rocky Bernstein wrote:
As best as I can tell, PEP 302, which discusses importer hooks, suggests a standard way to get file data. But it doesn't address a standard way to get container package and/or loader information.
If a "filename" may not be an actual filename, but instead a pseudo-filename created based on the __file__ attribute of a Python module, then there are a few mechanisms for accessing it:

1. Use the package/module name and the relative path from that location, then use pkgutil.get_data to retrieve it. This has the advantage of correctly handling the case where no __loader__ attribute is present (or it is None), which can happen for standard filesystem imports. However, it only works in Python 2.6 and above (since get_data() is a new addition to pkgutil).

2. Implement your own version of pkgutil.get_data - more work, but it is the only way to get something along those lines that works for versions prior to Python 2.6.

3. Do what a number of standard library APIs (e.g. linecache) that accept filenames do, and also accept an optional "module globals" argument. If the globals argument is passed in and contains a "__loader__" entry, use the appropriate loader method when processing the "filename" that was passed in.
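Mechanism 1 is close to a one-liner; a minimal sketch wrapping it (the helper name and the choice of which exceptions to swallow are mine, not part of the pkgutil API):

```python
import pkgutil

def read_package_file(package, resource):
    """Fetch a resource relative to a package via pkgutil.get_data
    (Python 2.6+). Returns the file contents as bytes, or None if
    the package cannot be imported or the data cannot be read."""
    try:
        return pkgutil.get_data(package, resource)
    except (IOError, OSError, ImportError):
        return None
```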
Also I'm not sure there *is* a standard print string way to show member inside a package. zipimporter may insert co_filename strings like:
/usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py
but the trouble with this is that it means file routines have to scan the path and notice say that /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg is a *file*, not a directory. And a file stat/reading routine needs to understand what kind of packager that is in order to get tracer.py information.
(Are there any file routines in place for doing this?)
Finding a loader given only a pseudo-filename and no module is actually possible in the specific case of zipimport, but is still pretty obscure at this point in time:

1. Scan sys.path looking for an entry that matches the start of the pseudo-filename (remembering to use os.path.normpath).

2. Once such a path entry has been found, use PEP 302 to find the associated importer object (the undocumented pkgutil.get_importer function does exactly that - although, as with any undocumented feature, the promises of API compatibility across major version changes aren't as strong as they would be for an officially documented and supported interface).

3. Hope that the importer is one like zipimport that allows get_data() to be invoked directly on the importer object, rather than only providing it on a separate loader object after the module has been loaded. If it needs a real loader instead of just the importer, then you're back to the original problem of needing a module or package name (or globals dictionary) in addition to the pseudo-filename.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
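Those three steps might be sketched as follows - a rough illustration only, since (per Nick's caveats) get_importer is undocumented and get_data() may simply not be there:

```python
import os
import sys
import pkgutil

def data_for_pseudo_filename(pseudo_filename):
    """Steps 1-3 above: find a sys.path entry that prefixes the
    pseudo-filename, get its PEP 302 importer, and try get_data()
    on the remaining relative path. Returns None on any failure."""
    target = os.path.normpath(pseudo_filename)
    for entry in sys.path:
        prefix = os.path.normpath(entry)
        # Step 1: does this path entry prefix the pseudo-filename?
        if not target.startswith(prefix + os.sep):
            continue
        # Step 2: undocumented, but does exactly what we need.
        importer = pkgutil.get_importer(entry)
        if importer is None or not hasattr(importer, 'get_data'):
            continue
        # Step 3: hope get_data() works directly on the importer.
        member = target[len(prefix) + 1:]
        try:
            return importer.get_data(member)
        except (IOError, OSError):
            continue
    return None
```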
2008/12/23
What is wanted is a uniform way to get and describe a file location from a code object that takes into account that the file might be a member of an archive.
But a code object may not have come from a file. Ignoring the interactive prompt (not because it's unimportant, just because people have a tendency to assume it's the only special case :-)) you need to consider code loaded via a PEP302 importer from (say) a sqlite database, or code created using compile(), or possibly even more esoteric means. So I'm not sure your request is clearly specified.
Are there even guidelines for saying what string goes into a code object's co_filename? Clearly it should be related to the source code that generated the code, and there are various conventions that seem to exist when the code comes from an "eval" or an "exec".
I'm not aware of guidelines - the documentation for compile() says "The filename argument should give the file from which the code was read; pass some recognizable value if it wasn't read from a file ('<string>' is commonly used)" which is pretty non-committal.
But empirically it seems as though there's some variation. It could be an absolute file or a file with no root directory specified. (But is it possible to have things like "." and ".."?). And in the case of a member of a package what happens? Should it be just the member without the package? Or should it include the package name like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py ?
Or be unspecified? If left unspecified as I gather it is now, it makes it more important to have some sort of common routine to be able to pick out the archive part in a filesystem from the member name inside the archive.
I think you need to be clear on *why* you want to know this information. Once it's clear what you're trying to achieve, it will be easier to say what the options are.

It sounds like you're trying to propose a stronger convention, to be enforced in the future. (At least, your suggestion of producing stack traces implies that you want stack trace code not to have to deal with the current situation.)

When PEP 302 was being developed, we were looking at similar issues. That's why I pointed you at get_source() - it was the best we could do with all the various conflicting requirements, and the fact that it's optional is because we had to cater for cases where there simply wasn't a meaningful answer. Frankly, backward compatibility requirements kill a lot of the options here.

Maybe what you want is a *pair* of linked conventions:

- co_filename (or a replacement) returns a (notionally opaque, but in practice a filename for file-based cases) token representing "the file or other object the code came from"

- xxx.get_source_code(token) is a function (I don't know where; xxx is a placeholder for some "suitable" module) which, given such a token, returns the source, or None if there's no viable concept of "the source".

Or maybe you want a (possibly separate) attribute of a code object, which holds a string containing a human-readable (but quite possibly not machine-parseable) value representing the "place the code came from" - co_filename is essentially this at the moment, and maybe your complaint is merely that you don't find its contents sufficiently human-readable in the case of the zipimport module (in which case you might want to search some of the archives for the discussions on the constraints imposed on zipimport, because objects on sys.path must be strings and cannot be arbitrary objects...).

I'm sorry if this is a little rambling.
I can appreciate that there's some sort of issue that you see here, but I don't yet see any practical way of changing things that would help. And as always, there's backward compatibility to consider - existing code isn't going to change, so new code has to be prepared to handle that. I hope this is of some help, Paul.
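The pair of linked conventions Paul floats might look something like the sketch below. This is entirely hypothetical - no such function exists anywhere in the stdlib, and a real version would have to dispatch on token type (zip member, database row, eval string, ...); only the degenerate file-based case is shown:

```python
import os

def get_source_code(token):
    """Hypothetical resolver for the opaque-token convention above.
    For the common file-based case the token is just a filename;
    for anything else this sketch reports 'no viable source'."""
    if isinstance(token, str) and os.path.isfile(token):
        f = open(token)
        try:
            return f.read()
        finally:
            f.close()
    return None
```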
2008/12/23 Nick Coghlan
Finding a loader given only a pseudo-filename and no module is actually possible in the specific case of zipimport, but is still pretty obscure at this point in time:
1. Scan sys.path looking for an entry that matches the start of the pseudo-filename (remembering to use os.path.normpath).
2. Once such a path entry has been found, use PEP 302 to find the associated importer object (the undocumented pkgutil.get_importer function does exactly that - although, as with any undocumented feature, the promises of API compatibility across major version changes aren't as strong as they would be for an officially documented and supported interface).
3. Hope that the importer is one like zipimport that allows get_data() to be invoked directly on the importer object, rather than only providing it on a separate loader object after the module has been loaded. If it needs a real loader instead of just the importer, then you're back to the original problem of needing a module or package name (or globals dictionary) in addition to the pseudo filename.
There were lots of proposals tossed around on python-dev at the time PEP 302 was being developed, which might have made all this easier. Most, if not all, were killed by backward compatibility requirements.

I have some hopes that when Brett completes his "import in Python" work, that will add sufficient flexibility to allow people to experiment with all of this machinery, and ultimately maybe move forward with a more modular import mechanism. But the timescales for Brett's changes won't be until at least Python 3.1, and it'll be a release or two after that before any significant change can be eased in in a compatible manner. That's going to take a lot of energy on someone's part.

Paul.

PS One of these days, I'm going to write an insanely useful importer which takes the least-convenient option wherever PEP 302 allows flexibility. It'll be adopted by everyone because it's so great, and all the software that currently makes unwarranted assumptions about importers will break and get fixed to support it because otherwise its users will rebel, and we'll live in a paradise where everything follows the specs to the letter. Oh, yes, and I'm going to win the lottery every week for the next month :-)

PPS Seriously, setuptools and the adoption of eggs has pushed a lot of code to be much more careful about unwarranted assumptions that code lives in the filesystem. That's an incredibly good thing, and very hard to do right (witness the setuptools "zip_safe" parameter which acts as a get-out clause). Much kudos to setuptools for getting as far as it has.
Paul Moore writes:
2008/12/23
: What is wanted is a uniform way to get and describe a file location from a code object that takes into account that the file might be a member of an archive.
But a code object may not have come from a file.
Right. That's why I mentioned, for example, "eval" and "exec", which you cite below. So remove the "file" in what is cited above and replace it with: "a uniform way to get information (not necessarily just the source text) about the location/origin of code from a code object."
Ignoring the interactive prompt (not because it's unimportant, just because people have a tendency to assume it's the only special case :-)) you need to consider code loaded via a PEP302 importer from (say) a sqlite database, or code created using compile(), or possibly even more esoteric means.
So I'm not sure your request is clearly specified.
Is the above any more clear?
Are there even guidelines for saying what string goes into a code object's co_filename? Clearly it should be related to the source code that generated the code, and there are various conventions that seem to exist when the code comes from an "eval" or an "exec".
I'm not aware of guidelines - the documentation for compile() says "The filename argument should give the file from which the code was read; pass some recognizable value if it wasn't read from a file ('<string>' is commonly used)" which is pretty non-committal.
But empirically it seems as though there's some variation. It could be an absolute file or a file with no root directory specified. (But is it possible to have things like "." and ".."?). And in the case of a member of a package what happens? Should it be just the member without the package? Or should it include the package name like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py ?
Or be unspecified? If left unspecified as I gather it is now, it makes it more important to have some sort of common routine to be able to pick out the archive part in a filesystem from the member name inside the archive.
I think you need to be clear on *why* you want to know this information. Once it's clear what you're trying to achieve, it will be easier to say what the options are.
This is what I wrote originally (slightly modified): A use case I am thinking of here is a stack trace or a debugger, or a tool which wants to show in great detail information from a code object, obtained possibly via a frame object.

I find it kind of sucky to see "<string>" in a traceback, as opposed to the text (or a prefix of the text) of the actual string that was passed. Or something that has been referred to as a "pseudo-file", like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py, when it is really member foo/bar.py of the zipped egg /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg. (As a separate issue, it seems that zipimporter file locations inside setuptools may have a problem.)

Inside a debugger or an IDE, it is conceivable a person might want loader and module information, and if the code is part of an archive file, then member information. (If part of an eval string, then the eval string.)
It sounds like you're trying to propose a stronger convention, to be enforced in the future.
Well, I wasn't sure if there was one. But I gather from what you write, there isn't. :-) Yes, I would suggest a stronger convention. Or a more up-front statement that none is desired/forthcoming.
(At least, your suggestion of producing stack traces implies that you want stack trace code not to have to deal with the current situation). When PEP 302 was being developed, we were looking at similar issues. That's why I pointed you at get_source() - it was the best we could do with all the various conflicting requirements, and the fact that it's optional is because we had to cater for cases where there simply wasn't a meaningful answer. Frankly, backward compatibility requirements kill a lot of the options here.
Maybe what you want is a *pair* of linked conventions:
- co_filename (or a replacement) returns a (notionally opaque, but in practice a filename for file-based cases) token representing "the file or other object the code came from"
This would be nice.
- xxx.get_source_code(token) is a function (I don't know where, xxx is a placeholder for some "suitable" module) which, given such a token, returns the source, or None if there's no viable concept of "the source".
There always is a viable concept of a source: it's whatever was done to get the code. For example, if it was via an eval, then the source was the eval function and a string; the same for exec. If it's via database access, well, then that, plus some summary info about what's known about it.
Or maybe you want a (possibly separate) attribute of a code object, which holds a string containing a human-readable (but quite possibly not machine-parseable) value representing the "place the code came from" - co_filename is essentially this at the moment, and maybe your complaint is merely that you don't find its contents sufficiently human-readable in the case of the zipimport module (in which case you might want to search some of the archives for the discussions on the constraints imposed on zipimport, because objects on sys.path must be strings and cannot be arbitrary objects...)
There are two problems. One is displaying location information in an unambiguous way -- the pseudo-file above is ambiguous, and so is <string>, since there's no guarantee from the OS that a file won't be named that. The second problem is programmatically getting information, as a debugger or an IDE might, so that the information can be conveyed back to a user who might want to inspect surrounding source code or modules.
I'm sorry if this is a little rambling. I can appreciate that there's some sort of issue that you see here, but I don't yet see any practical way of changing things that would help. And as always, there's backward compatibility to consider - existing code isn't going to change, so new code has to be prepared to handle that.
I hope this is of some help,
Yes, thanks. At least I now have a clearer idea of the state of where things stand.
Paul.
Nick Coghlan writes:
3. Do what a number of standard library APIs (e.g. linecache) that accept filenames do and also accept an optional "module globals" argument.
Actually, I did this and committed a change (to pydb) before posting any of these queries. ;-)

If "a number of standard library APIs" are doing the *same* thing, then shouldn't this be exposed as a common routine? If, on the other hand, by "a number" you mean "one", as in linecache -- 1 *is* a number too! -- then perhaps the relevant code that is buried inside updatecache() should be exposed on its own. (As a side benefit, that code can be tested separately too!)

Should I file a feature request for this?
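For reference, linecache already exposes the optional-globals pattern from option 3 directly; a small sketch of using it (`line_at` is my own wrapper name, not a stdlib function):

```python
import linecache

def line_at(filename, lineno, module_globals=None):
    """Fetch one source line. Passing the module's globals lets
    linecache fall back to __loader__.get_source() when 'filename'
    is really a pseudo-filename inside an archive."""
    line = linecache.getline(filename, lineno, module_globals)
    return line if line else None
```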
2008/12/23 R. Bernstein
A use case I am thinking of here is a stack trace or a debugger, or a tool which wants to show in great detail information from a code object, obtained possibly via a frame object.
Thanks for the clarifications. I see what you're after much better now.
I find it kind of sucky to see in a traceback: "<string>" as opposed to the text (or prefix of the text) of the actual string that was passed. Or something that has been referred to as a "pseudo-file" like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py when it is really member foo/bar.py of zipped egg /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg.
Fair comment. That points to a "human readable" type of string. It's not available at the moment, but I guess it could be. But see below.
- xxx.get_source_code(token) is a function (I don't know where, xxx is a placeholder for some "suitable" module) which, given such a token, returns the source, or None if there's no viable concept of "the source".
There always is a viable concept of a source: it's whatever was done to get the code. For example, if it was via an eval, then the source was the eval function and a string; the same for exec. If it's via database access, well, then that, plus some summary info about what's known about it.
Hmm, "source" colloquially, yes - "bytecode loaded from ....\xxx.pyc", for example. But not "source" in the sense of "source code". Some applications run with only bytecode shipped, no source code available at all.
There are two problems. One is displaying location information in an unambiguous way -- the pseudo-file above is ambiguous, and so is <string>, since there's no guarantee from the OS that a file won't be named that. The second problem is programmatically getting information, as a debugger or an IDE might, so that the information can be conveyed back to a user who might want to inspect surrounding source code or modules.
This is more than you were asking for above. The first problem is addressed with a "human readable" (narrative) description, as above. The second, however, requires machine-readable access to source code (if it exists). That's what the loader get_source() call does for you. But you have to be prepared for the fact that it may not be possible to get source code, and decide what you want to happen in that case.
I hope this is of some help,
Yes, thanks. At least I now have a clearer idea of the state of where things stand.
Good. Sorry it's not better news :-) Paul
On Tue, Dec 23, 2008 at 08:00, Paul Moore
2008/12/23 Nick Coghlan
: Finding a loader given only a pseudo-filename and no module is actually possible in the specific case of zipimport, but is still pretty obscure at this point in time:
1. Scan sys.path looking for an entry that matches the start of the pseudo-filename (remembering to use os.path.normpath).
2. Once such a path entry has been found, use PEP 302 to find the associated importer object (the undocumented pkgutil.get_importer function does exactly that - although, as with any undocumented feature, the promises of API compatibility across major version changes aren't as strong as they would be for an officially documented and supported interface).
3. Hope that the importer is one like zipimport that allows get_data() to be invoked directly on the importer object, rather than only providing it on a separate loader object after the module has been loaded. If it needs a real loader instead of just the importer, then you're back to the original problem of needing a module or package name (or globals dictionary) in addition to the pseudo filename.
There were lots of proposals tossed around on python-dev at the time PEP 302 was being developed, which might have made all this easier. Most, if not all, were killed by backward compatibility requirements.
I have some hopes that when Brett completes his "import in Python" work, that will add sufficient flexibility to allow people to experiment with all of this machinery, and ultimately maybe move forward with a more modular import mechanism.
I have actually made a good amount of progress as of late. It's a New Year's resolution to get importlib done, but I am actually aiming for before January 1 (sans the damn compile() problem I am having). This goal does ignore everything but a compatible __import__, though.
But the timescales for Brett's changes won't be until at least Python 3.1, and it'll be a release or two after that before any significant change can be eased in in a compatible manner.
I suspect that any import work will be a Pending/DeprecationWarning deal, so 3.3 would be the first version that could have any real changes as the default.
That's going to take a lot of energy on someone's part.
That would be me. =)

After importlib is finished I have a couple of PEPs planned, plus properly documenting how the import machinery works in the language spec. And I suspect this will lead to some discussions about things, e.g. requirements on the format of __file__ and __path__ with regard to when they point inside an archive, etc.

-Brett
R. Bernstein wrote:
Nick Coghlan writes:
3. Do what a number of standard library APIs (e.g. linecache) that accept filenames do and also accept an optional "module globals" argument.
Actually, I did this and committed a change (to pydb) before posting any of these queries. ;-)
If "a number of standard library APIs" are doing the *same* thing, then shouldn't this be exposed as a common routine?
If on the other hand, by "a number" you mean "one" as in linecache -- 1 *is* a number too! -- then perhaps the relevant code that is buried inside the "updatecache" should be exposed on its own. (As a side benefit that code can be tested separately too!)
Should I file a feature request for this?
The reason for my slightly odd phrasing is that all of the examples I was originally going to mention (traceback, pdb, doctest, inspect) actually all end up calling linecache to do the heavy lifting.

So it is possible that linecache.getlines() actually *is* the common routine you're looking for - it just needs to be added to the documentation and the __all__ attribute for linecache to be officially supported. Currently, only the single-line getline() function is documented and exposed via __all__, but I don't see any reason for that restriction - linecache.getlines() has been there with a stable API since at least Python 2.5.

For cases where you have an appropriate Python object (i.e. a module, function, method, class, traceback, frame or code object) rather than a pseudo-filename, inspect.getsource() actually jumps through a lot of hoops to try to find the actual source code for that object - in those cases, using the appropriate inspect function is generally a much better idea than trying to interpret __file__ yourself.

Cheers,
Nick.
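A sketch of leaning on inspect rather than interpreting __file__ yourself (`describe_source` is a made-up helper; the exception handling reflects that inspect raises when no Python source exists, e.g. for builtins):

```python
import inspect

def describe_source(obj):
    """Best-effort (filename, source) lookup for a module, class,
    function, frame, traceback or code object. Either element may
    be None - e.g. builtins have no Python source at all."""
    try:
        filename = inspect.getsourcefile(obj)
    except TypeError:
        filename = None
    try:
        source = inspect.getsource(obj)
    except (IOError, OSError, TypeError):
        source = None
    return filename, source
```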
Nick Coghlan writes:
R. Bernstein wrote:
Nick Coghlan writes:
3. Do what a number of standard library APIs (e.g. linecache) that accept filenames do and also accept an optional "module globals" argument.
Actually, I did this and committed a change (to pydb) before posting any of these queries. ;-)
If "a number of standard library APIs" are doing the *same* thing, then shouldn't this be exposed as a common routine?
If on the other hand, by "a number" you mean "one" as in linecache -- 1 *is* a number too! -- then perhaps the relevant code that is buried inside the "updatecache" should be exposed on its own. (As a side benefit that code can be tested separately too!)
Should I file a feature request for this?
The reason for my slightly odd phrasing is that all of the examples I was originally going to mention (traceback, pdb, doctest, inspect) actually all end up calling linecache to do the heavy lifting.
So it is possible that linecache.getlines() actually *is* the common routine you're looking for
I never asked about getting the text lines for the source code, no matter how many times people suggest that as an alternative. :-)

Instead, I was asking about a common way to get information about the source location for, say, a frame or traceback object (which might include package name and type), and I suggest that there should be a more unambiguous way to display this information than seems to be in use at present.

Part of the work to retrieve or display that information has to do some of the same things that are done inside linecache.updatecache() *before* it retrieves the lines of the source code (when possible). And possibly it includes parts of what's done in pieces of the inspect module.
- it just needs to be added to the documentation and the __all__ attribute for linecache to be officially supported. Currently, only the single line getline() function is documented and exposed via __all__, but I don't see any reason for that restriction - linecache.getlines() has been there with a stable API since at least Python 2.5.
For cases where you have an appropriate Python object (i.e. a module, function, method, class, traceback, frame or code object) rather than a pseudo-filename, then inspect.getsource() actually jumps through a lot of hoops to try to find the actual source code for that object - in those cases, using the appropriate inspect function is generally a much better idea than trying to interpret __file__ yourself.
Cheers, Nick.
Thanks for the information. I will keep in mind those inspect routines. They probably will be helpful for another problem I had been wondering about -- how one can determine whether there is no code associated with a given line and file. (In other words, an invalid location for a debugger line breakpoint, say because the line is part of a comment or the interior line of a string that spans many lines.)
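That "is this a breakable line?" question can be approximated from the line-number table of a code object; a sketch using the dis module (my own helper - note it must recurse into nested code objects, since each function body carries its own table):

```python
import dis

def lines_with_code(code):
    """Return the set of line numbers at which bytecode actually
    starts in this code object and any nested ones. Comment lines
    and the interior of multi-line strings won't appear, making
    them invalid breakpoint locations."""
    lines = set()
    for _, lineno in dis.findlinestarts(code):
        if lineno is not None:
            lines.add(lineno)
    # co_consts holds nested code objects (functions, classes, ...)
    for const in code.co_consts:
        if hasattr(const, 'co_code'):
            lines |= lines_with_code(const)
    return lines
```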
R. Bernstein wrote:
Nick Coghlan writes:
R. Bernstein wrote:
Nick Coghlan writes:
3. Do what a number of standard library APIs (e.g. linecache) that accept filenames do and also accept an optional "module globals" argument.
Actually, I did this and committed a change (to pydb) before posting any of these queries. ;-)
If "a number of standard library APIs" are doing the *same* thing, then shouldn't this be exposed as a common routine?
If on the other hand, by "a number" you mean "one" as in linecache -- 1 *is* a number too! -- then perhaps the relevant code that is buried inside the "updatecache" should be exposed on its own. (As a side benefit that code can be tested separately too!)
Should I file a feature request for this?
The reason for my slightly odd phrasing is that all of the examples I was originally going to mention (traceback, pdb, doctest, inspect) actually all end up calling linecache to do the heavy lifting.
So it is possible that linecache.getlines() actually *is* the common routine you're looking for
I never asked about getting the text lines for the source code, no matter how many times people suggest that as an alternative. :-)
Instead, I was asking about a common way to get information about the source location for say a frame or traceback object (which might include package name and type) and suggest that there should be a more unambiguous way to display this information than seems to be in use at present.
I agree. Since PEP 302, many parts of Python are rather too file-centric for my liking. I noted almost four years ago, for example, that the interpreter assumes that the os module will be imported from filestore in order to set the prefix. This issue appears to have received no attention since, and I'm certainly not the one with the best skills or knowledge to solve this problem. http://bugs.python.org/issue1116520
Part of the work to retrieve or display that information has to do some of the same things that linecache.updatecache() does *before* it retrieves the lines of the source code (when possible). And possibly parts of it overlap with what's done in pieces of the inspect module.
- it just needs to be added to the documentation and the __all__ attribute for linecache to be officially supported. Currently, only the single-line getline() function is documented and exposed via __all__, but I don't see any reason for that restriction - linecache.getlines() has been there with a stable API since at least Python 2.5.
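As a concrete sketch of how that works: passing the module's globals along with the filename lets linecache fall back to the module's PEP 302 loader when the co_filename is a pseudo-path that doesn't exist on disk. (Here `json` merely stands in for any pure-Python module, zipped or not.)

```python
import linecache
import json  # stands in for any pure-Python module, zipped or not

# With only the filename, linecache can read source that exists on disk.
# Passing the module's globals as well lets it find the module's
# __loader__ (PEP 302) and call get_source() when the co_filename is a
# pseudo-path like 'build/bdist.linux-i686/egg/tracer.py'.
lines = linecache.getlines(json.__file__, json.__dict__)
print(len(lines) > 0)
```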
For cases where you have an appropriate Python object (i.e. a module, function, method, class, traceback, frame or code object) rather than a pseudo-filename, then inspect.getsource() actually jumps through a lot of hoops to try to find the actual source code for that object - in those cases, using the appropriate inspect function is generally a much better idea than trying to interpret __file__ yourself.
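A small sketch of those hoops, using a stdlib function as the target object:

```python
import inspect
import json

# inspect.getsource() locates the defining file (possibly via the
# module's PEP 302 loader) and returns the text of the definition.
src = inspect.getsource(json.dumps)
print(src.lstrip().startswith("def dumps"))

# inspect.getmodule() recovers the containing module from the object.
print(inspect.getmodule(json.dumps).__name__)
```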
Cheers, Nick.
Thanks for the information. I will keep in mind those inspect routines.
They will probably be helpful for another problem I had been wondering about -- how one can determine whether there is any code associated with a given line and file. (In other words, an invalid location for a debugger line breakpoint, such as when the line is part of a comment or an interior line of a string that spans many lines.)
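One way to answer that question from a code object alone is to collect the line numbers that actually carry bytecode, via dis.findlinestarts(). This is only a sketch; valid_breakpoint_lines is a hypothetical helper name, not a stdlib API.

```python
import dis

def valid_breakpoint_lines(code):
    """Line numbers in `code` (and nested code objects) that carry bytecode.

    Lines missing from this set -- comments, blank lines, the interior
    lines of a multi-line string -- are not meaningful places to set a
    debugger line breakpoint.
    """
    lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # recurse into nested function bodies
            lines |= valid_breakpoint_lines(const)
    return lines

code = compile("x = 1\n# a comment\ny = 2\n", "<demo>", "exec")
lines = valid_breakpoint_lines(code)
print(2 not in lines)  # the comment line carries no bytecode
```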
Looks like the start of some necessary attention to this issue. The inspect module might indeed offer the right facilities. I'm still wondering what we do about the various prefix settings in an environment where there are no filestore imports at all. In the event I can assist, feel free to rope me in.
regards
Steve
-- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
Paul Moore writes:
2008/12/23 R. Bernstein: A use case I am thinking of here is in a stack trace or a debugger, or a tool which wants to show, in great detail, information from a code object, possibly obtained via a frame object.
Thanks for the clarifications. I see what you're after much better now.
I find it kind of sucky to see in a traceback: "<string>" as opposed to the text (or prefix of the text) of the actual string that was passed. Or something that has been referred to as a "pseudo-file" like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py when it is really member foo/bar.py of zipped egg /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg.
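For what it's worth, the loader attached to such a module does keep the archive and the member separate, at least for zipimport. A sketch that builds a throwaway zip to demonstrate (tracer_demo is a made-up module name):

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip holding one module, then import from it.
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "demo.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("tracer_demo.py", "def size():\n    return 1\n")

sys.path.insert(0, zip_path)
importlib.invalidate_caches()
import tracer_demo

# The PEP 302 loader keeps archive and member distinct, even though
# __file__ joins them into one ambiguous-looking pseudo-path.
loader = tracer_demo.__loader__    # a zipimport.zipimporter instance
print(loader.archive)              # the .zip *file* on disk
print(tracer_demo.__file__)        # archive + member, joined
print("def size" in loader.get_source("tracer_demo"))
```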
Fair comment. That points to a "human readable" type of string. It's not available at the moment, but I guess it could be.
But see below.
- xxx.get_source_code(token) is a function (I don't know where, xxx is a placeholder for some "suitable" module) which, given such a token, returns the source, or None if there's no viable concept of "the source".
There always is a viable concept of a source. It's whatever was done to get the code. For example, if it was via an eval then the source was the eval function and a string, same for exec. If it's via database access, well that then and some summary info about what's known about that.
Hmm, "source" colloquially, yes "bytecode loaded from ....\xxx.pyc", for example. But not "source" in the sense of "source code". Some applications run with only bytecode shipped, no source code available at all.
There are two problems. One is displaying location information in an unambiguous way -- the pseudo-file above is ambiguous, and so is <string>, since there's no guarantee from the OS that a real file can't have that name. The second problem is programmatically getting information, as a debugger or an IDE might do, so that the information can be conveyed back to a user who might want to inspect surrounding source code or modules.
This is more than you were asking for above.
The first problem is addressed with a "human readable" (narrative) description, as above.
The second, however, requires machine-readable access to source code (if it exists). That's what the loader get_source() call does for you. But you have to be prepared for the fact that it may not be possible to get source code, and decide what you want to happen in that case.
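A defensive pattern for that "be prepared" step might look like the following sketch (get_source_safely is a hypothetical helper, not a stdlib API):

```python
def get_source_safely(module):
    """Try the module's PEP 302 loader first; fall back to __file__.

    Returns the source text, or None when no source is recoverable
    (e.g. a bytecode-only or built-in module).  This is a hypothetical
    helper, not a stdlib API.
    """
    loader = getattr(module, "__loader__", None)
    if loader is not None and hasattr(loader, "get_source"):
        try:
            source = loader.get_source(module.__name__)
        except ImportError:
            source = None
        if source is not None:
            return source
    try:  # the module may simply be a plain file on disk
        with open(module.__file__) as f:
            return f.read()
    except (AttributeError, IOError):
        return None

import json
import sys
print(get_source_safely(json) is not None)  # pure Python: source available
print(get_source_safely(sys))               # built-in: no source -> None
```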
I'm missing your point here. When one uses information from a traceback, or is in a debugger, or is in an IDE, it is assumed that in order to use the information given you'll need access to the source code. And IDEs and debuggers have had to deal with the fact that source code may not be available from day one, even before there was zipimporter.
In order to get the strings of source text that linecache.getlines() gives, it has to prowl around for other information, possibly looking for a loader along the protocol defined in PEP 302 and/or others. And it's that information that a debugger, IDE, or some tool of that ilk might need.
Many IDEs and debuggers nowadays open a socket and pass information back and forth over it. An obvious advantage is that it means you can debug remotely. But in order for this to work, some information is generally passed back and forth regarding the location of the source text. In the Java world and Eclipse, for example, it is possible for the jar to be in a different location than on the machine you are debugging on. And probably too often that jar isn't the same one. So it is helpful in this kind of scenario to break a location out into the name of the jar and the member inside the jar, perhaps along with some information about that jar.
It is possible that instead of passing around locations, debuggers and such tools use get_source() instead, because that's what Python has to offer. :-) I jest here, but honestly I've been surprised that there is no IDE that I know of that in fact works this way. The machine running the code clearly may have more accurate access to the source than a front-end IDE. Undeterred by the harsh facts of reality, I have hope that someday there *might* be an IDE that has provision for this.
So in a Ruby debugger (ruby-debug) one can request checksum information on the files the debugger thinks are loaded, in order to facilitate checking that the source an IDE might be showing in fact matches the source for the part of the code currently under investigation.
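The same idea is easy to sketch in Python on top of linecache (source_checksum is a hypothetical helper mirroring the ruby-debug feature, not an existing API):

```python
import hashlib
import linecache

def source_checksum(filename, module_globals=None):
    """SHA-1 of whatever source text linecache can recover, or None.

    A remote IDE could compare this against a hash of its local copy to
    detect that the file it is displaying does not match what is
    actually running.  (Hypothetical helper, not a stdlib API.)
    """
    lines = linecache.getlines(filename, module_globals)
    if not lines:
        return None
    return hashlib.sha1("".join(lines).encode("utf-8")).hexdigest()

import json
print(source_checksum(json.__file__, json.__dict__))
```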
I hope this is of some help,
Yes, thanks. At least I now have a clearer idea of the state of where things stand.
Good. Sorry it's not better news :-)
Paul
participants (7)
- Brett Cannon
- Nick Coghlan
- Paul Moore
- R. Bernstein
- Rocky Bernstein
- rocky@gnu.org
- Steve Holden