[Tutor] help with regexps/filename parsing
Kent Johnson
kent37 at tds.net
Mon Jan 31 13:54:33 CET 2005
This works:
names = [
    'XFree86-ISO8859-15-75dpi-fonts-4.3.0-78.EL.i386.rpm',  # (Note the EL embedded in name)
    'xfig-3.2.3d-12.i386.rpm',  # (standard naming)
    'rhel-ig-ppc-multi-zh_tw-3-4.noarch.rpm',
    'perl-DateManip-5.42a-0.rhel3.noarch.rpm',
    'openoffice.org-style-gnome-1.1.0-16.9.EL.i386.rpm',
]

import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments
pattern = r'''
    (?P<base>.+)            # base name: everything before the version
    -(?P<version>[\w.]+)    # version: word characters and dots, no hyphens
    -(?P<release>[\w.]+)    # release: may contain dots, e.g. 78.EL
    \.(?P<arch>\w+)         # arch: last dot-separated field before .rpm
    \.rpm
'''
patternRe = re.compile(pattern, re.VERBOSE)

for name in names:
    m = patternRe.search(name)
    if m:
        print(m.group('base', 'version', 'release', 'arch'))
    else:
        print('No match:', name)
I figured this out by working from right to left:
- always ends with .rpm
- everything back to the next . is the arch
- everything back to the rightmost - is the release
- everything back to the next - is the version
- everything that is left is the base name
Note that the release for perl-DateManip-5.42a-0.rhel3.noarch.rpm is 0.rhel3, not 0 as you gave it.
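The right-to-left steps above can also be sketched without a regex, by peeling fields off the end of the name with str.rpartition. This is only an illustration of the same reasoning, not a replacement for the pattern: the function name split_rpm_name is made up for this example, and unlike the regex it does not restrict which characters may appear in the version or release.

```python
def split_rpm_name(filename):
    """Return (base, version, release, arch), or None if the name does not fit."""
    # Step 1: the name always ends with .rpm
    stem, dot, ext = filename.rpartition('.')
    if not dot or ext != 'rpm':
        return None
    # Step 2: everything back to the next '.' is the arch
    stem, dot, arch = stem.rpartition('.')
    # Step 3: everything back to the rightmost '-' is the release
    stem, dash1, release = stem.rpartition('-')
    # Step 4: everything back to the next '-' is the version;
    # whatever is left is the base name
    base, dash2, version = stem.rpartition('-')
    # rpartition returns empty strings when the separator is missing,
    # so any empty piece means the name did not have all four fields
    if not (dot and dash1 and dash2 and base and version and release and arch):
        return None
    return base, version, release, arch


print(split_rpm_name('perl-DateManip-5.42a-0.rhel3.noarch.rpm'))
```

For the perl-DateManip name this gives the same split as the regex, with 0.rhel3 as the release.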
Kent
Scott W wrote:
> Slight correction which I realized after sending, see below for
> version/release separation, which I should have seen but blame lack of
> sleep ;-)
>
> Scott W wrote:
>
>> Hey all.
>>
>> I've got an issue that's been driving me a bit nuts. I'm sure it
>> _can_ be done with a regexp, although I'm missing a piece needed to
>> tie it together to work for all cases.
>>
>> I need to parse out a list of RPMs in this case, but it seems the RPM
>> naming convention has changed, as there are files I'll need to parse
>> that are NOT in the normal name-version-release.arch.rpm format.
>>
>> I need to be able to grab the 'basename' for each file, as well as the
>> version and arch, although these can be done separately. The problem
>> can be shown by the following list of filenames:
>>
>> XFree86-ISO8859-15-75dpi-fonts-4.3.0-78.EL.i386.rpm (Note the
>> EL embedded in name)
>> xfig-3.2.3d-12.i386.rpm (standard naming)
>> rhel-ig-ppc-multi-zh_tw-3-4.noarch.rpm
>> perl-DateManip-5.42a-0.rhel3.noarch.rpm
>> openoffice.org-style-gnome-1.1.0-16.9.EL.i386.rpm
>>
>> Those should represent the set of variations now possible. I can
>> handle most, but not all of the cases...any suggestions that would
>> cover all of the above allowing the extraction of:
>> basename- in this case:
>> XFree86-ISO8859-15-75dpi-fonts,
>> xfig,
>> rhel-ig-ppc-multi-zh_tw,
>> perl-DateManip,
>> openoffice.org-style-gnome
>>
>> version:
>> 4.3.0-78.EL (yes, including the .EL unfortunately, although I'd
>> be OK without it and munging it on end if needed)
>> 3.2.3d-12
>> 3-4
>> 5.42a-0
>> 1.1.0-16.9.EL
>
>
> corrected versions:
> 4.3.0
> 3.2.3d
> 3
> 5.42a
> 1.1.0
>
> (new) releases:
> 78.EL
> 12
> 4
> 0
> 16.9.EL
>
>
>> arches:
>> i386,
>> i386,
>> noarch,
>> noarch,
>> i386
>> respectively.
>>
>> Any help greatly appreciated, as I've been beating myself up on this
>> one for a bit.
>>
>> Thanks,
>>
>> Scott
>>
>> _______________________________________________
>> Tutor maillist - Tutor at python.org
>> http://mail.python.org/mailman/listinfo/tutor
>>