On Mon, Jun 22, 2009 at 4:59 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
  - the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo`
   generated files will not have a hash.

Why the exception for pyc and pyo files?

As in PEP 262, since they are produced automatically from a py file, checking the py file is enough
to decide if the file has changed.



 - `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages` so the file
   paths are relative to it.

Is it a general rule? That is, the paths in RECORD are always relative to its
grandparent directory?


no, they can be located anywhere on the system. But when the paths are located in the
same directory where the .egg-info directory is located, a relative path is used. (see the section before this example)

I'll add an example that contains files located elswhere in the system (like a script and a data file)

 


 The RECORD format
 -----------------

 The `RECORD` file is a CSV file, composed of records, one line per
 installed file. The ``csv`` module is used to read the file, with
 the `excel` dialect, which uses these options to read the file:

 - field delimiter : `,`
 - quoting char :  `"`.
 - line terminator : `\r\n`

Wouldn't it be better to use the native line terminator on the current
platform? (someone might want to edit or at least view the file)

Good idea, I'll change that,
 


What is the character encoding for non-ASCII filenames? UTF-8?

Yes, there's a constant in Distutils, called PKG_INFO_ENCODING that will be used for the file generation.

 

Are the RECORD file's contents somehow part of the DistributionMetadata?


The DistributionMetadata correspond to the fields defined in PEP 345, e.g. written in the PKG-INFO file,
which is mentioned in the RECORD file.

We are reworking PEP 345 as well, to add some fields. What did you have in mind ?
 


 - ``DistributionDirectories``: manages ``EggInfoDirectory`` instances.

What is an EggInfoDirectory ?

Typo (old name), fixing this..
 


A plural class name looks strange (I think it's the first time I see one in
the CPython codebase). How about another name? (DistributionPool,
DistributionMap, WorkingSet etc.).

Sure, WorkingSet is nice, it's the name used in setuptools,
 


 - ``get_egginfo_file(path, binary=False)`` -> file object

    Returns a file located under the `.egg-info` directory.

    Returns a ``file`` instance for the file pointed by ``path``.

Is it always a file instance or just a file-like object? (zipped distributions
come to mind). Is it opened read-only?


It's in read-only mode, either "r" either "rb" and in case of a zip file,
it returns a file-like object using zipfile.ZipFile.open.



 - ``owner(path)`` -> ``Distribution`` instance or None

   If ``path`` is used by only one ``Distribution`` instance, returns it.
   Otherwise returns None.

This is a bit confusing. If it returns None, it doesn't distinguish between the
case where several Distributions refer to the path, and the case where no
distributions refer to it, does it?

The idea of this API is to find out of a distribution "owns" a file, e.g. is the only distribution
that uses it, so it can be safely removed. 


Is there any reason to have this method while file_users(path) already exists?

Its just a helper for uninstallers, but file_users() could probably be enough,
I can remove "owns" if people find it confusing,



 A new class called ``DistributionDirectories`` is created. It's a collection of
 ``DistributionDirectory`` and ``ZippedDistributionDirectory`` instances.
 The constructor takes one optional argument ``use_cache`` set to ``True`` by
 default.

You forgot to describe the constructor's signature and what it does exactly.

I'll add that,
 

 ``EggInfoDirectories`` also provides the following methods besides the ones
 from ``dict``::

What is EggInfoDirectories?

This is a DistributionDirectories, one more instance I forgot to rename, I'll fix that
 


 - ``append(path)``

   Creates an ``DistributionDirectory`` (or ``ZippedDistributionDirectory``)
   instance for ``path`` and adds it in the mapping.

 - ``load(paths)``

   Creates and adds ``DistributionDirectory`` (or
   ``ZippedDistributionDirectory``) instances corresponding to ``paths``.

Why are these methods named completely differently although they do almost the
same thing? Besides, append() makes it look like ordering matters. Does it?
(for a naming suggestion, e.g. load(path) and load_all(paths). Or,
even simpler, load(*paths) or load(paths))

Right, I'll fix that,

 

 - ``get_file_users(path)`` -> Iterator of ``Distribution`` (or
   ``ZippedDistribution``) instances.

This method is named file_users in another class. Perhaps the naming should be
consistent?

Right, it used to be get_*, that's a typo. I'll fix it,
 


 All these functions use the same global instance of ``DistributionDirectories``
 to use the cache.

Is the global instance available to users?

No I didn't made it available to avoid concurrency problems,


     >>> for path, hash, size in dist.get_installed_files()::
     ...     print '%s %s %d %s' % (path, hash, size)

There's one too many "%s" here.

Fixing it too,

 


Thanks for your work!


Thanks for the feedback, I'll commit the fixes asap.

Tarek