Regular expression to capture model numbers

Piet van Oostrum piet at cs.uu.nl
Thu Apr 23 11:29:45 EDT 2009


>>>>> John Machin <sjmachin at lexicon.net> (JM) wrote:

>JM> On Apr 23, 8:01 am, krishnaposti... at gmail.com wrote:

>>> Requirements:
>>>   The text must contain a combination of numbers, alphabets and hyphen
>>> with at least two of the three elements present.

>JM> Unfortunately(?), regular expressions can't express complicated
>JM> conditions like that.

Yes, they can but it is not pretty.

The pattern must start with a letter, a digit or a hyphen. 

If it starts with a letter, for example, there must be at least a hyphen
or a digit somewhere. So let us concentrate on the first one of these
that occurs in the string. Then the preceding things are only letters
and after it can be any combination of letters, digits and hyphens. So
the pattern for this is (when we write L for letters, and d for digits):

L+[-d][-Ld]*.

Similarly for strings starting with a digit and with a hyphen. Now
replacing L with [A-Za-z] and d with [0-9] or \d and factoring out the
[-Ld]* which is common to all 3 cases you get:

(?:[A-Za-z]+[-0-9]|[0-9]+[-A-Za-z]|-+[0-9A-Za-z])[-0-9A-Za-z]*

>>> obj = re.compile(r'(?:[A-Za-z]+[-0-9]|[0-9]+[-A-Za-z]|-+[0-9A-Za-z])[-0-9A-Za-z]*')
>>> re.findall(obj, 'TestThis;1234;Test123AB-x')
['Test123AB-x']

Or you can use re.I and mention only one case of letters:

obj = re.compile(r'(?:[a-z]+[-0-9]|[0-9]+[-a-z]|-+[0-9a-z])[-0-9a-z]*', re.I)
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet at vanoostrum.org



More information about the Python-list mailing list