Regular expression to capture model numbers

John Machin sjmachin at lexicon.net
Thu Apr 23 20:10:50 EDT 2009


On Apr 24, 1:29 am, Piet van Oostrum <p... at cs.uu.nl> wrote:
> >>>>> John Machin <sjmac... at lexicon.net> (JM) wrote:
> >JM> On Apr 23, 8:01 am, krishnaposti... at gmail.com wrote:
> >>> Requirements:
> >>>   The text must contain a combination of numbers, alphabets and hyphen
> >>> with at least two of the three elements present.
> >JM> Unfortunately(?), regular expressions can't express complicated
> >JM> conditions like that.
>
> Yes, they can but it is not pretty.
>
> The pattern must start with a letter, a digit or a hyphen.
>
> If it starts with a letter, for example, there must be at least a hyphen
> or a digit somewhere. So let us concentrate on the first one of these
> that occurs in the string. Then the preceding things are only letters
> and after it can be any combination of letters, digits and hyphens. So
> the pattern for this is (when we write L for letters, and d for digits):
>
> L+[-d][-Ld]*.
>
> Similarly for strings starting with a digit and with a hyphen. Now
> replacing L with [A-Za-z] and d with [0-9] or \d and factoring out the
> [-Ld]* which is common to all 3 cases you get:
>
> (?:[A-Za-z]+[-0-9]|[0-9]+[-A-Za-z]|-+[0-9A-Za-z])[-0-9A-Za-z]*
>
> >>> obj = re.compile(r'(?:[A-Za-z]+[-0-9]|[0-9]+[-A-Za-z]|-+[0-9A-Za-z])[-0-9A-Za-z]*')
> >>> re.findall(obj, 'TestThis;1234;Test123AB-x')
>
> ['Test123AB-x']
>
> Or you can use re.I and mention only one case of letters:
>
> obj = re.compile(r'(?:[a-z]+[-0-9]|[0-9]+[-a-z]|-+[0-9a-z])[-0-9a-z]*', re.I)

Understandable and maintainable, I don't think. Suppose that instead
the first character is limited to being alphabetic. You have to go
through the whole process of elaborating the possibilities again, and I
don't consider that process qualifies as "express[ing] complicated
conditions like that".
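
For comparison, here is a minimal sketch of one alternative: find candidates
with a loose pattern, then check the "at least two of the three" condition in
ordinary Python. The names CANDIDATE, model_numbers and first_char_alpha are
made up for illustration and appear nowhere in the thread, and the handling of
the alphabetic-first-character variant is only one possible reading of that
requirement.

```python
import re

# Loose pattern: any maximal run of letters, digits and hyphens.
CANDIDATE = re.compile(r'[-0-9A-Za-z]+')

def model_numbers(text, first_char_alpha=False):
    """Return runs containing at least two of: letter, digit, hyphen.

    first_char_alpha is a hypothetical switch for the variant in which
    the first character must be alphabetic.
    """
    results = []
    for token in CANDIDATE.findall(text):
        kinds = (
            any(c.isalpha() for c in token),  # has a letter
            any(c.isdigit() for c in token),  # has a digit
            '-' in token,                     # has a hyphen
        )
        if sum(kinds) < 2:
            continue
        if first_char_alpha and not token[0].isalpha():
            continue
        results.append(token)
    return results

print(model_numbers('TestThis;1234;Test123AB-x'))
```

On the sample string from the quoted post this prints ['Test123AB-x'], the
same result as the single-regex version. Under this sketch, changing the
first-character rule touches one line rather than every alternation branch.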
