[Tutor] help with regexps/filename parsing

Kent Johnson kent37 at tds.net
Mon Jan 31 13:54:33 CET 2005


This works:

names = [
'XFree86-ISO8859-15-75dpi-fonts-4.3.0-78.EL.i386.rpm', #        (Note the EL embedded in name)
'xfig-3.2.3d-12.i386.rpm', #        (standard naming)
'rhel-ig-ppc-multi-zh_tw-3-4.noarch.rpm',
'perl-DateManip-5.42a-0.rhel3.noarch.rpm',
'openoffice.org-style-gnome-1.1.0-16.9.EL.i386.rpm',
]

import re

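pattern = r'''
     (?P<base>.+)             # everything in front of the version
     -(?P<version>[\w.]+)     # hyphens are not allowed inside the version
     -(?P<release>[\w.]+)     # hyphens are not allowed inside the release
     \.(?P<arch>\w+)          # last dot-separated field before .rpm
     \.rpm
'''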

patternRe = re.compile(pattern, re.VERBOSE)

for name in names:
    m = patternRe.search(name)
    if m:
        # group() with several names returns a tuple of the captured fields
        print(m.group('base', 'version', 'release', 'arch'))
    else:
        print('No match:', name)


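I figured this out by working from right to left:
- the name always ends with .rpm
- everything back to the next . is the arch
- everything back to the first (rightmost) - is the release
- everything back to the next - is the version
- everything that is left is the base name

For names like these the regex arrives at the same split: [\w.]+ cannot match a -, so version and release have to be the last two hyphen-separated pieces, and base is left with everything in front of them.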
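
The right-to-left steps above can also be followed literally, without a regex. This is only a sketch: split_rpm_name is a made-up helper name, not something from the post or from any RPM library, and it assumes every name ends in .rpm and has at least two hyphens in front of the arch.

```python
def split_rpm_name(filename):
    """Split an RPM file name into (base, version, release, arch).

    Hypothetical helper that mirrors the right-to-left steps one by one.
    """
    if not filename.endswith('.rpm'):
        raise ValueError('not an .rpm file name: %r' % filename)
    stem = filename[:-len('.rpm')]            # drop the trailing .rpm
    rest, _, arch = stem.rpartition('.')      # back to the next . is the arch
    rest, _, release = rest.rpartition('-')   # back to the rightmost - is the release
    base, _, version = rest.rpartition('-')   # back to the next - is the version
    # rpartition returns an empty head when the separator is missing,
    # so an empty field means the name did not have the expected layout.
    if not (base and version and release and arch):
        raise ValueError('unexpected file name layout: %r' % filename)
    return base, version, release, arch


print(split_rpm_name('perl-DateManip-5.42a-0.rhel3.noarch.rpm'))
# ('perl-DateManip', '5.42a', '0.rhel3', 'noarch')
```

str.rpartition splits at the last occurrence of the separator, which is exactly the "everything back to the next ..." step, so each line of the function corresponds to one bullet.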

Note the release for perl-DateManip-5.42a-0.rhel3.noarch.rpm is 0.rhel3, not 0 as you gave it.
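
One way to see that difference is to run the pattern against the versions and releases listed in the quoted message below and print whatever disagrees. This is a rough check, not part of the original code: find_mismatches and the expected table are names made up for the example, and the version of the openoffice.org package is taken as 1.1.0, which is what the file name contains.

```python
import re

# Same pattern as above, compiled with re.VERBOSE.
pattern_re = re.compile(r'''
    (?P<base>.+)
    -(?P<version>[\w.]+)
    -(?P<release>[\w.]+)
    \.(?P<arch>\w+)
    \.rpm
''', re.VERBOSE)

# (file name, version, release) as given in the quoted message.
expected = [
    ('XFree86-ISO8859-15-75dpi-fonts-4.3.0-78.EL.i386.rpm', '4.3.0', '78.EL'),
    ('xfig-3.2.3d-12.i386.rpm', '3.2.3d', '12'),
    ('rhel-ig-ppc-multi-zh_tw-3-4.noarch.rpm', '3', '4'),
    ('perl-DateManip-5.42a-0.rhel3.noarch.rpm', '5.42a', '0'),
    ('openoffice.org-style-gnome-1.1.0-16.9.EL.i386.rpm', '1.1.0', '16.9.EL'),
]


def find_mismatches(rows):
    """Return (name, field, wanted, got) for every field that differs."""
    mismatches = []
    for name, version, release in rows:
        m = pattern_re.search(name)
        if m is None:
            mismatches.append((name, 'match', 'a match', None))
            continue
        for field, wanted in (('version', version), ('release', release)):
            got = m.group(field)
            if got != wanted:
                mismatches.append((name, field, wanted, got))
    return mismatches


for row in find_mismatches(expected):
    print(row)
# ('perl-DateManip-5.42a-0.rhel3.noarch.rpm', 'release', '0', '0.rhel3')
```

Only the perl-DateManip release shows up, which is the one case where the list and the file name disagree.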

Kent

Scott W wrote:
> Slight correction which I realized after sending, see below for 
> version/release separation, which I should have seen but blame lack of 
> sleep ;-)
> 
> Scott W wrote:
> 
>> Hey all.
>>
>> I've got an issue that's been driving me a bit nuts.  I'm sure it 
>> _can_ be done with a regexp, although I'm missing a piece needed to 
>> tie it together to work for all cases.
>>
>> I need to parse out a list of RPMs in this case, but it seems the RPM 
>> naming convention has changed, as there are files I'll need to parse 
>> that are NOT in the normal name-version-release.arch.rpm format.
>>
>> I need to be able to grab the 'basename' for each file, as well as the 
>> version and arch, although these can be done separately.  The problem 
>> can be shown by the following list of filenames:
>>
>> XFree86-ISO8859-15-75dpi-fonts-4.3.0-78.EL.i386.rpm        (Note the EL embedded in name)
>> xfig-3.2.3d-12.i386.rpm        (standard naming)
>> rhel-ig-ppc-multi-zh_tw-3-4.noarch.rpm
>> perl-DateManip-5.42a-0.rhel3.noarch.rpm
>> openoffice.org-style-gnome-1.1.0-16.9.EL.i386.rpm
>>
>> Those should represent the set of variations now possible.  I can 
>> handle most, but not all of the cases...any suggestions that would 
>> cover all of the above allowing the extraction of:
>> basename- in this case:
>>     XFree86-ISO8859-15-75dpi-fonts,
>>     xfig,
>>     rhel-ig-ppc-multi-zh_tw,
>>     perl-DateManip,
>>     openoffice.org-style-gnome
>>
>> version:
>>     4.3.0-78.EL    (yes, including the .EL unfortunately, although I'd 
>> be OK without it and munging it onto the end if needed)
>>     3.2.3d-12
>>     3-4
>>     5.42a-0
>>     1.1.0-16.9.EL
> 
> 
> corrected versions:
>     4.3.0
>     3.2.3d
>     3
>     5.42a
>     1.1.0
> 
> (new) releases:
>     78.EL
>     12
>     4
>     0
>     16.9.EL
> 
> 
>> arches:
>>     i386,
>>     i386,
>>     noarch,
>>     noarch,
>>     i386
>> respectively.
>>
>> Any help greatly appreciated, as I've been beating myself up on this 
>> one for a bit.
>>
>> Thanks,
>>
>> Scott
>>
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> http://mail.python.org/mailman/listinfo/tutor
>>