
I have some feedback on PEP376, both the PEP itself and the pkgutil code. I'll start with comments on the PEP.
* I don't like the definition of a project: this seems to define what distutils calls a distribution, and that is not necessarily an application.
* The description of the status quo is not entirely accurate w.r.t. setuptools; you describe only one of the possible ways setuptools can install a project (pip seems to use another one of setuptools' ways of installing a project, and setuptools can also install projects inside of zipfiles in multiple ways).
* Regarding the RECORD file: it is not a "CSV-like" file, it is a real CSV file. I'd specify the exact options for the 'csv' module that will be used, rather than writing that the default options are used and then explaining what those are.
* Should the PEP specify the encoding of text-files? PEP314 doesn't seem to specify the encoding of PKG-INFO files, which can cause problems when a field contains data that isn't ASCII.
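A minimal sketch of what "specify the exact options" could look like with the standard csv module. The option values and the (path, digest, size) row layout are assumptions made for illustration, not what the PEP prescribes:

```python
import csv
import io

# Hypothetical dialect settings; the PEP would be the place to pin these down.
RECORD_CSV_OPTIONS = dict(
    delimiter=",",
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,
    lineterminator="\n",
)


def write_record(rows, stream):
    """Write rows to a text stream using the explicit options above."""
    writer = csv.writer(stream, **RECORD_CSV_OPTIONS)
    for row in rows:
        writer.writerow(row)


def read_record(stream):
    """Read rows back with the same options (the reader ignores lineterminator)."""
    return [tuple(row) for row in csv.reader(stream, **RECORD_CSV_OPTIONS)]


# Illustrative rows: (path, digest, size). The second path contains a comma,
# so it only round-trips because quoting is specified.
rows = [("pkg/__init__.py", "abc123", "120"), ("pkg/odd,name.py", "def456", "45")]
buffer = io.StringIO()
write_record(rows, buffer)
buffer.seek(0)
round_tripped = read_record(buffer)
```

Spelling the options out this way means a reader and a writer implemented separately cannot silently disagree about quoting or line endings.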
All of these are minor issues. The following are imho more serious.
* The PEP doesn't describe how this PEP interacts with PEP302. That is, how should the "egg-info" machinery work when a project is not installed on the filesystem but in a zipfile. I'm not primarily interested in the ".egg" zipfiles that setuptools uses, but I am worried about how this will affect tools like py2exe and py2app that bundle the python code used by an application into a zipfile.
* Why is there a "paths" argument for the global functions (such as "get_file_users")? The description claims the functions will use sys.path and hence it is not necessary to have an argument to specify the path.
* There is no API to list the files in the egg-info directory. You could build one yourself on top of the get_installed_files method, but it should IMO be part of the public API.
And now on to the implementation:
* EggInfo.get_file: the implementation seems to want to load a file from the .egg-info directory, the PEP itself is unclear about what this method is intended to do.
If get_file should open a file in the egg-info directory I'd raise an exception if the path argument specifies an absolute path.
I wonder if it wouldn't be better to have a function that returns the contents of the file instead of one that returns a file-like object. Especially when thinking of PEP302 integration.
Wouldn't it be better to have two separate methods instead of a "binary" argument? In 3.x file-like objects behave slightly differently w.r.t. binary and text streams ("bytes" vs. "str" as the result of read).
* EggInfo.get_installed_files: if 'local' is True the yielded paths are made absolute w.r.t. the egg-info directory rather than the directory containing the egg-info directory.
* The global functions seem to maintain and modify global state, wouldn't this cause problems if I specify different values of the path arguments in different pieces of code?
Ronald
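The get_file suggestions above (reject absolute paths, return contents rather than a file-like object, and split text from binary access) might be sketched as follows. The function names and the egg_info_dir parameter are hypothetical and are not taken from the pkgutil prototype:

```python
import os


def _resolve(egg_info_dir, relative_path):
    # Refuse absolute paths so the call always stays relative to the
    # .egg-info directory. Note: this sketch does not guard against "..".
    if os.path.isabs(relative_path):
        raise ValueError("path must be relative to the .egg-info directory")
    return os.path.join(egg_info_dir, relative_path)


def read_egg_info_bytes(egg_info_dir, relative_path):
    """Return the raw contents of a file in the .egg-info directory."""
    with open(_resolve(egg_info_dir, relative_path), "rb") as f:
        return f.read()


def read_egg_info_text(egg_info_dir, relative_path, encoding="utf-8"):
    """Return the decoded contents; the default encoding is an assumption."""
    return read_egg_info_bytes(egg_info_dir, relative_path).decode(encoding)
```

Returning contents instead of an open file keeps the interface close to what a PEP 302 loader can offer through get_data, which is one reason this shape may be easier to support for zipped installs.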

2009/6/7 Ronald Oussoren ronaldoussoren@mac.com:
I have some feedback on PEP376, both the PEP itself and the pkgutil code. I'll start with comments on the PEP.
Thanks for this detailed feedback.
- I don't like the definition of a project: this seems to define what distutils calls a distribution, and that is not necessarily an application.
Fixed.
- The description of the status quo is not entirely accurate w.r.t. setuptools; you describe only one of the possible ways setuptools can install a project (pip seems to use another one of setuptools' ways of installing a project, and setuptools can also install projects inside of zipfiles in multiple ways).
The goal is to keep only one way to install projects. I've added more details in the PEP.
- Regarding the RECORD file: it is not a "CSV-like" file, it is a real CSV file. I'd specify the exact options for the 'csv' module that will be used, rather than writing that the default options are used and then explaining what those are.
Changed.
- Should the PEP specify the encoding of text-files? PEP314 doesn't seem to specify the encoding of PKG-INFO files, which can cause problems when a field contains data that isn't ASCII.
The encoding used is utf-8 since 2.6. I think we should rather update PEP 314, and mention it in the upcoming PEP 345 as well.
All of these are minor issues. The following are imho more serious.
- The PEP doesn't describe how this PEP interacts with PEP302. That is, how should the "egg-info" machinery work when a project is not installed on the filesystem but in a zipfile. I'm not primarily interested in the ".egg" zipfiles that setuptools uses, but I am worried about how this will affect tools like py2exe and py2app that bundle the python code used by an application into a zipfile.
I need to work on this.
- Why is there a "paths" argument for the global functions (such as "get_file_users")? The description claims the functions will use sys.path and hence it is not necessary to have an argument to specify the path.
That's because you can use these apis to work on an arbitrary list of paths in a packaging tool. I have changed the description accordingly.
- There is no API to list the files in the egg-info directory. You could build one yourself on top of the get_installed_files method, but it should IMO be part of the public API.
Added.
And now on to the implementation:
- EggInfo.get_file: the implementation seems to want to load a file from the .egg-info directory, the PEP itself is unclear about what this method is intended to do.
Added more details. I've also renamed some APIs to make things clearer (get_file -> get_egg_info_file). I am also thinking about renaming EggInfo to a better name.
If get_file should open a file in the egg-info directory I'd raise an exception if the path argument specifies an absolute path.
Added.
I wonder if it wouldn't be better to have a function that returns the contents of the file instead of one that returns a file-like object. Especially when thinking of PEP302 integration.
Why?
Wouldn't it be better to have two separate methods instead of a "binary" argument? In 3.x file-like objects behave slightly differently w.r.t. binary and text streams ("bytes" vs. "str" as the result of read).
Will work on that.
- EggInfo.get_installed_files: if 'local' is True the yielded paths are made absolute w.r.t. the egg-info directory rather than the directory containing the egg-info directory.
Yes that was a bug. Fixed in the prototype.
- The global functions seem to maintain and modify global state, wouldn't this cause problems if I specify different values of the path arguments in different pieces of code?
The cache just prevents re-reading a directory's contents. Do you have a scenario in mind of a possible problem? The only problem I can see is when a project is uninstalled by a program while these APIs are used by another program.
By the way, I'm thinking about adding something else for that part: A singleton-like class for EggInfoDirectory. That would make sure there's only one instance for each path.
Tarek
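One way the "singleton-like class" idea could look, as a rough sketch: instances are cached per normalized path, so asking twice for the same directory yields the same object. The class body is a hypothetical stand-in and not the prototype's EggInfoDirectory:

```python
import os


class EggInfoDirectory:
    """Hypothetical stand-in: at most one shared instance per directory path."""

    _instances = {}

    def __new__(cls, path):
        # Normalize so that "site-packages" and "./site-packages" share a key.
        key = os.path.normcase(os.path.abspath(path))
        instance = cls._instances.get(key)
        if instance is None:
            instance = super().__new__(cls)
            instance.path = key
            cls._instances[key] = instance
        return instance


a = EggInfoDirectory("site-packages")
b = EggInfoDirectory(os.path.join(".", "site-packages"))
c = EggInfoDirectory("other-dir")
```

Here a and b are the same object while c is a different one. The cache in this sketch is never invalidated, so a directory changed on disk by another program would still need separate handling.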

On 8 Jun, 2009, at 2:58, Tarek Ziadé wrote:
2009/6/7 Ronald Oussoren ronaldoussoren@mac.com:
- The global functions seem to maintain and modify global state, wouldn't this cause problems if I specify different values of the path arguments in different pieces of code?
The cache just prevents re-reading a directory's contents. Do you have a scenario in mind of a possible problem? The only problem I can see is when a project is uninstalled by a program while these APIs are used by another program.
I guess the problem only occurs if "egg_infos" is part of the public interface. What if module A calls get_egg_infos(), then module B calls get_egg_infos(somePath), then module A uses "egg_infos" assuming that it still refers to the same set of paths.
There's also the issue of API complexity. The global functions don't add a lot to using an EggInfoDirectories object when you are working with a different set of paths than sys.path.
Ronald
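The module A / module B scenario can be reproduced with a small sketch. The names egg_infos and get_egg_infos come from the discussion; the function bodies are placeholders (they only record the paths) and the keyed variant is one possible alternative, not something the prototype is known to do:

```python
import sys

# Pattern with the hazard: one module-level name rebound by every call.
egg_infos = None


def get_egg_infos_global(paths=None):
    global egg_infos
    egg_infos = tuple(sys.path if paths is None else paths)
    return egg_infos


# Alternative: cache results keyed by the path tuple, so a call with one
# set of paths never changes what another caller sees.
_cache = {}


def get_egg_infos_keyed(paths=None):
    key = tuple(sys.path if paths is None else paths)
    if key not in _cache:
        _cache[key] = key  # placeholder for actually scanning the directories
    return _cache[key]


get_egg_infos_global(["/a"])   # "module A"
get_egg_infos_global(["/b"])   # "module B"
seen_by_a = egg_infos          # module A now sees module B's paths

result_a = get_egg_infos_keyed(["/a"])
get_egg_infos_keyed(["/b"])
still_a = get_egg_infos_keyed(["/a"])
```

With the global variant, seen_by_a ends up referring to ("/b",); with the keyed variant, still_a is the same result module A originally received.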

2009/6/8 Ronald Oussoren ronaldoussoren@mac.com:
I guess the problem only occurs if "egg_infos" is part of the public interface. What if module A calls get_egg_infos(), then module B calls get_egg_infos(somePath), then module A uses "egg_infos" assuming that it still refers to the same set of paths.
Vaguely off-topic, but is it only me that hates the abbreviation "infos"? It can only realistically be an abbreviation for the word "informations" - and there's no such word :-(
Paul.

On Mon, Jun 8, 2009 at 5:11 PM, Paul Moore p.f.moore@gmail.com wrote:
2009/6/8 Ronald Oussoren ronaldoussoren@mac.com:
I guess the problem only occurs if "egg_infos" is part of the public interface. What if module A calls get_egg_infos(), then module B calls get_egg_infos(somePath), then module A uses "egg_infos" assuming that it still refers to the same set of paths.
Vaguely off-topic, but is it only me that hates the abbreviation "infos"? It can only realistically be an abbreviation for the word "informations" - and there's no such word :-(
Mmm, that's my frenglish, Sir ;) (If you see other weird stuff please let me know)
So I guess "get_egg_info" is the shortest, proper form?
Paul.

On 8 Jun, 2009, at 8:16, Tarek Ziadé wrote:
On Mon, Jun 8, 2009 at 5:11 PM, Paul Moore p.f.moore@gmail.com wrote:
2009/6/8 Ronald Oussoren ronaldoussoren@mac.com:
I guess the problem only occurs if "egg_infos" is part of the public interface. What if module A calls get_egg_infos(), then module B calls get_egg_infos(somePath), then module A uses "egg_infos" assuming that it still refers to the same set of paths.
Vaguely off-topic, but is it only me that hates the abbreviation "infos"? It can only realistically be an abbreviation for the word "informations" - and there's no such word :-(
Mmm, that's my frenglish, Sir ;) (If you see other weird stuff please let me know)
So I guess "get_egg_info" is the shortest, proper form ?
The thingy we're getting is called an "EggInfo", which would IMHO mean that "get_egg_infos" is technically the correct name for something that returns a list of them. I'd interpret the singular form as a function that returns one EggInfo object.
Disclaimer: I'm not a native English speaker
Ronald
Paul.
-- Tarek Ziadé | http://ziade.org

On Mon, Jun 8, 2009 at 5:23 PM, Ronald Oussoren ronaldoussoren@mac.com wrote:
On 8 Jun, 2009, at 8:16, Tarek Ziadé wrote:
On Mon, Jun 8, 2009 at 5:11 PM, Paul Moore p.f.moore@gmail.com wrote:
2009/6/8 Ronald Oussoren ronaldoussoren@mac.com:
I guess the problem only occurs if "egg_infos" is part of the public interface. What if module A calls get_egg_infos(), then module B calls get_egg_infos(somePath), then module A uses "egg_infos" assuming that it still refers to the same set of paths.
Vaguely off-topic, but is it only me that hates the abbreviation "infos"? It can only realistically be an abbreviation for the word "informations" - and there's no such word :-(
Mmm, that's my frenglish, Sir ;) (If you see other weird stuff please let me know)
So I guess "get_egg_info" is the shortest, proper form ?
The thingy we're getting is called an "EggInfo", which would IMHO mean that "get_egg_infos" is technically the correct name for something that returns a list of them. I'd interpret the singular form as a function that returns one EggInfo object.
I have a problem now with the EggInfo name, because of the repetition:
EggInfo.get_egg_info_file
I think EggInfo could be renamed to something better, like DistInfo maybe (not PkgInfo, because it handles more than PKG-INFO).
Tarek

Ronald Oussoren ronaldoussoren@mac.com writes:
On 8 Jun, 2009, at 8:16, Tarek Ziadé wrote:
On Mon, Jun 8, 2009 at 5:11 PM, Paul Moore p.f.moore@gmail.com wrote:
Vaguely off-topic, but is it only me that hates the abbreviation "infos"? It can only realistically be an abbreviation for the word "informations" - and there's no such word :-(
It's not only you; it's bad English, which harms communication perceptibly.
Mmm, that's my frenglish, Sir ;) (If you see other weird stuff please let me know)
So I guess "get_egg_info" is the shortest, proper form ?
The thingy we're getting is called an "EggInfo", which would IMHO mean that "get_egg_infos" is technically the correct name for something that returns a list of them. I'd interpret the singular form as a function that returns one EggInfo object.
Disclaimer: I'm not a native English speaker
The singular form is correct for uncountable nouns, such as substances or concepts. Nouns like “equipment” or “information” are uncountable, so do not get pluralised: we don't specify the *number*, but rather the *amount* of that noun. URL:http://esl.about.com/od/grammarforbeginners/a/g_cucount.htm
Now, in this case, the term is being used to name the type of a Python object, and Python objects certainly *are* countable :-) I would want to name the container or collection: perhaps ‘get_egg_info_list’, but someone who knows the full purpose of that function can probably improve that.

2009/6/8 Tarek Ziadé ziade.tarek@gmail.com:
Mmm, that's my frenglish, Sir ;) (If you see other weird stuff please let me know)
Well, it's better than my French, so who am I to complain? (As everyone knows, the English speak French by USING ENGLISH WORDS, BUT SHOUTING... :-))
So I guess "get_egg_info" is the shortest, proper form ?
That would be fine, yes. (Though I have a sneaking suspicion that I saw one place which had both _info and _infos, with the _infos form being the way of getting multiple copies of the things that _info gets one of. That's harder to deal with - I'll have a look and see if I can find the place I mean.)
[...]
2009/6/8 Ronald Oussoren ronaldoussoren@mac.com:
The thingy we're getting is called an "EggInfo", which would IMHO mean that "get_egg_infos" is technically the correct name for something that returns a list of them. I'd interpret the singular form as a function that returns one EggInfo object.
That's the place I mean. But I'm not sure I like the idea of calling it an "EggInfo". I'll see if I can think of a better name (but not being familiar with the domain, I'm not sure I'll be able to). If it has to be an "EggInfo", then you shouldn't break it with underscores - so you have
get_egginfo - get an EggInfo object from wherever
get_egginfos - get a list of EggInfo objects
(and make sure that the invented term EggInfo is defined somewhere before it gets used here).
Paul.

2009/6/8 Paul Moore p.f.moore@gmail.com:
That's the place I mean. But I'm not sure I like the idea of calling it an "EggInfo". I'll see if I can think of a better name (but not being familiar with the domain, I'm not sure I'll be able to).
Hmm.
Reading the PEP, the EggInfo class represents the contents of the .egg-info directory. OK, but .egg-info is a directory containing information about the egg. The class name implies that directory contains an "egg info" (ie, an "egg information"). But that's not grammatically correct - you don't have "an information".
So the question is, what is a good (singular noun) word for the collection of data in the .egg-info directory?
I don't know enough of the concepts/terminology for this stuff to be able to suggest anything, though.
Paul.

On Mon, Jun 8, 2009 at 3:04 PM, Ronald Oussoren ronaldoussoren@mac.com wrote:
On 8 Jun, 2009, at 2:58, Tarek Ziadé wrote:
2009/6/7 Ronald Oussoren ronaldoussoren@mac.com:
- The global functions seem to maintain and modify global state, wouldn't this cause problems if I specify different values of the path arguments in different pieces of code?
The cache just prevents re-reading a directory's contents. Do you have a scenario in mind of a possible problem? The only problem I can see is when a project is uninstalled by a program while these APIs are used by another program.
I guess the problem only occurs if "egg_infos" is part of the public interface. What if module A calls get_egg_infos(), then module B calls get_egg_infos(somePath), then module A uses "egg_infos" assuming that it still refers to the same set of paths.
I'll work on that.
There's also the issue of API complexity. The global functions don't add a lot to using an EggInfoDirectories object when you are working with a different set of paths than sys.path.
We have created collections to make it easy to use from a third party application that wants to work with the directories, and make it easy to bypass the cache.
These global functions are just shortcuts for the most common usage and not really intended to do custom work. Maybe the paths argument could be completely removed so they are specialized for sys.path.
Ronald

On 09-06-08 02:58 AM, Tarek Ziadé wrote:
- Should the PEP specify the encoding of text-files? PEP314 doesn't seem to specify the encoding of PKG-INFO files, which can cause problems when a field contains data that isn't ASCII.
The encoding used is utf-8 since 2.6. I think we should rather update PEP 314, and mention it in the upcoming PEP 345 as well.
For the python-wifi-0.3.1 package, I noticed that PKG-INFO uses 'latin1' encoding.
grep Author python_wifi.egg-info/PKG-INFO | head -n 1
Author: R�man Joost
grep Author python_wifi.egg-info/PKG-INFO | hexdump -C | head -n 1
00000000 41 75 74 68 6f 72 3a 20 52 f3 6d 61 6e 20 4a 6f |Author: R.man Jo|
Note: latin1 uses "f3", while utf-8 uses "c3 b3".
The reason for not using utf-8 in PKG-INFO is perhaps the presence of "# -*- coding: latin1 -*-" in setup.py. Cf. PEP 263.
Indeed, even 'python setup.py --author' sends latin1 encoded bytes (not utf-8).
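The byte-level difference noted above can be checked directly in Python 3. This is only an illustration of the two encodings using the author name from the hexdump; it does not touch distutils:

```python
# "ó" is U+00F3; spelled as an escape so this snippet does not depend on
# the encoding of the file it is saved in.
name = "R\u00f3man Joost"

latin1_bytes = name.encode("latin-1")  # one byte for "ó": f3
utf8_bytes = name.encode("utf-8")      # two bytes for "ó": c3 b3

# The latin-1 bytes are not valid UTF-8, which is why a tool that assumes
# utf-8 cannot display them.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
```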

On 8 Jun, 2009, at 10:36, Sridhar Ratnakumar wrote:
On 09-06-08 02:58 AM, Tarek Ziadé wrote:
- Should the PEP specify the encoding of text-files? PEP314 doesn't seem to specify the encoding of PKG-INFO files, which can cause problems when a field contains data that isn't ASCII.
The encoding used is utf-8 since 2.6. I think we should rather update PEP 314, and mention it in the upcoming PEP 345 as well.
For the python-wifi-0.3.1 package, I noticed that PKG-INFO uses 'latin1' encoding.
Not quite. My best guess is that it doesn't use any encoding, it just dumps the bytes in the string you specified in the PKG-INFO file (at least for python2, I haven't checked what distutils does in python3).
Ronald

On 09-06-08 01:48 PM, Ronald Oussoren wrote:
On 8 Jun, 2009, at 10:36, Sridhar Ratnakumar wrote:
On 09-06-08 02:58 AM, Tarek Ziadé wrote:
- Should the PEP specify the encoding of text-files? PEP314 doesn't seem to specify the encoding of PKG-INFO files, which can cause problems when a field contains data that isn't ASCII.
The encoding used is utf-8 since 2.6. I think we should rather update PEP 314, and mention it in the upcoming PEP 345 as well.
For the python-wifi-0.3.1 package, I noticed that PKG-INFO uses 'latin1' encoding.
Not quite.
In this particular package, yes.. but not generally of course.
My best guess is that it doesn't use any encoding, it just dumps the bytes in the string you specified in the PKG-INFO file (at least for python2, I haven't checked what distutils does in python3).
Correct; that is what I thought. Tarek said "The encoding used is utf-8 since 2.6" .. for which I provided this counter-example (python-wifi-0.3.1).

On Tue, Jun 9, 2009 at 12:18 AM, Sridhar Ratnakumar sridharr@activestate.com wrote:
My best guess is that it doesn't use any encoding, it just dumps the bytes in the string you specified in the PKG-INFO file (at least for python2, I haven't checked what distutils does in python3).
Correct; that is what I thought. Tarek said "The encoding used is utf-8 since 2.6" .. for which I provided this counter-example (python-wifi-0.3.1).
In Python 2 >= 2.6, distutils uses dist.PKG_INFO_ENCODING (which is 'utf-8') as the encoding when it writes the PKG-INFO file.
That is:
    if isinstance(value, unicode):
        value = value.encode(PKG_INFO_ENCODING)
    else:
        value = str(value)
    file.write('%s: %s\n' % (name, value))
In Python 3 it just writes the fields like they are provided.
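For comparison, a Python 3 writer that always applies an explicit encoding might look like the sketch below. This is a hypothetical helper, not what distutils does in Python 3; only the 'utf-8' value is taken from the message above:

```python
PKG_INFO_ENCODING = "utf-8"  # same value as quoted for distutils in 2.6+


def format_field(name, value):
    """Return one 'Name: value' line as bytes in the declared encoding."""
    return ("%s: %s\n" % (name, value)).encode(PKG_INFO_ENCODING)


line = format_field("Author", "R\u00f3man Joost")
```

Because str is always Unicode in Python 3, encoding at the point of writing removes the dependence on whatever coding declaration setup.py happened to use.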
participants (5): Ben Finney, Paul Moore, Ronald Oussoren, Sridhar Ratnakumar, Tarek Ziadé