[Python-Dev] [GSoC] Porting on RPM3

David Malcolm dmalcolm at redhat.com
Tue Mar 22 02:06:22 CET 2011


[CCing Panu Matilainen, the maintainer of rpm (or at least of rpm 4.*,
which is what all major distributions are using, AIUI)]

On Mon, 2011-03-21 at 10:50 +0100, "Martin v. Löwis" wrote:
> Am 21.03.2011 07:37, schrieb Prashant Kumar:
> > Hello,
> >     My name is Prashant Kumar and I've worked on porting a few Python
> > libraries (distutils2, configobj) and I've been looking at the ideas
> > list for GSoC for a project related to porting.
> > 
> >     I came across [1]  and found it interesting. It mentions that some

Hi Prashant!  Thanks for the interest.

Panu: [1] is http://wiki.python.org/moin/RPMOnPython3 , a Google Summer
of Code proposal to work on the Python 3 bindings to RPM.

> > of the work has already been done; I would like to look at the code
> > repository for the same, could someone provide me the link for the
> > same?

> Not so much the code but the person who did the porting. This was Dave
> Malcolm (CC'ed); please get in touch with him. Please familiarize
> yourself with the existing Python bindings (in the latest RPM 4 release
> from rpm.org). You'll notice that this already has Python 3 support;
> not sure whether that's the most recent code, though.

Panu Matilainen also worked on the Python 3 port of the librpm Python
bindings.

For the rpm source code, see: http://rpm.org/wiki/GetSource (the Python
bindings are in a subdirectory of the main source tree).

My initial patchbomb landed on the mailing list here:
  http://lists.rpm.org/pipermail/rpm-maint/2009-October/002528.html
and Panu committed and fixed up the patches around then.

My understanding is that the current status is that the bindings work,
but all values that were formerly exposed to Python 2 as "str" are now
exposed to Python 3 as "bytes", which would require changing all
consumers of the code.
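
To illustrate why consumers would need changing, here is a minimal
sketch. The "header" dict below is a made-up stand-in for values coming
out of the bindings, not the actual rpm API; it only shows how code
written against "str" misbehaves once the values are "bytes":

```python
# Hypothetical stand-in for header values as exposed under Python 3.
# This is NOT the real rpm bindings API, just bytes values in a dict.
header = {"name": b"bash", "version": b"4.2"}

# A Python 2 era comparison against a str literal silently stops matching.
matches_str = header["name"] == "bash"      # False under Python 3
matches_bytes = header["name"] == b"bash"   # True

# String formatting embeds the bytes repr instead of the text.
label_wrong = "%s-%s" % (header["name"], header["version"])

# Consumers have to decode explicitly (the encoding is an assumption here).
label_right = "%s-%s" % (
    header["name"].decode("utf-8"),
    header["version"].decode("utf-8"),
)

print(matches_str, matches_bytes)
print(label_wrong)
print(label_right)
```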

I believe Panu has also been working on a rewrite of the Python
bindings, since the existing code is a little messy.

Panu, am I remembering this correctly?

The idea is that these types are fundamentally string-like, but
unfortunately rpm has always been a bit loose in its interpretation of
the encoding of byte values in package files and package databases.
There are millions of rpm files out there, and millions of rpm
databases, and all of these are in _some_ encoding.  I have seen
specfiles in which parts of the file were encoded in UTF-8 and other
parts were encoded in Latin-1 (this broke one of my python scripts
horribly).

Martin and I discussed this last week at PyCon.  I believe the proposal
that we came up with was:
  - try to interpret bytes as UTF-8, using the "surrogateescape" error
handler (PEP 383), so that if decoding fails, we can at least preserve
the exact bytes and round-trip them
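
A minimal sketch of that round-trip, using only the standard codec
machinery.  The byte string is invented for illustration (valid UTF-8
followed by a stray Latin-1 byte), not taken from a real package:

```python
# Made-up value: "café" in UTF-8, then "café" with a Latin-1 0xE9 byte.
raw = b"caf\xc3\xa9 / caf\xe9 au lait"

# Strict decoding rejects the stray byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each undecodable byte 0xNN becomes the lone
# surrogate U+DCNN instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = text.encode("utf-8", errors="surrogateescape")

print(strict_ok)
print(ascii(text))
print(roundtrip == raw)
```

Note that the resulting str contains lone surrogates, so a plain
text.encode("utf-8") without the handler would raise UnicodeEncodeError;
consumers that write the value back out need to use the same handler.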

Ultimately, this does mean trying to impose some kind of encoding
standard on rpm files and rpm databases, which I think would be a Good
Thing, but is perhaps something of scope creep compared to what the
proposal at [1] says.  See e.g. http://rpm.org/ticket/30

Other ideas that occur:
  - does rpmlint check for encoding yet?
  - what to do e.g. about canonicalization?  What happens if one rpm
provides a feature named "café" (where the "é" is U+00E9) and another rpm
requires a feature named "café" (where the "é" is U+0065 LATIN SMALL
LETTER E + U+0301 COMBINING ACUTE ACCENT)?  IIRC we ruled that rpms in
Fedora had to have ASCII names, and I'm guessing this applies to
metadata, but we do allow UTF-8 filenames within package payloads
(again, IIRC)
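
The canonicalization question can be made concrete with the standard
"unicodedata" module.  This is only a sketch of the mismatch and of one
possible remedy (normalizing both sides to NFC); nothing here reflects
what rpm itself does when comparing names:

```python
import unicodedata

# The two spellings of "café" from the example above.
provided = "caf\u00e9"    # precomposed U+00E9
required = "cafe\u0301"   # U+0065 followed by U+0301

# They render the same but compare unequal, as str and as UTF-8 bytes.
equal_as_str = provided == required
equal_as_bytes = provided.encode("utf-8") == required.encode("utf-8")

# Normalizing both sides to the same form (NFC here) makes them match.
equal_after_nfc = (
    unicodedata.normalize("NFC", provided)
    == unicodedata.normalize("NFC", required)
)

print(equal_as_str, equal_as_bytes, equal_after_nfc)
```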

I should mention that I'm drowning in email, and am more likely to see
email on which I am directly listed in the "To" or "CC" field.

Alas, it's also worth mentioning that there was a hostile fork of rpm,
at rpm5.org, and that the "#rpm" channel on Freenode relates to that
fork.  I would advise not bothering with the rpm5 code; my
understanding is that all major Linux distributions that use rpm use the
rpm 4.* code hosted at rpm.org, not the rpm5 fork (and I have no
personal interest in a GSoC project to work on Python 3 support there).
I double-checked that fork in its CVS repository, and it does not yet
have any of the Python 3 support.

Hope this is helpful
Dave


