Regular expression guaranteed to fail

Des Small des.small at bristol.ac.uk
Fri Aug 20 12:35:18 CEST 2004


I want to use sets and regular expressions to implement some
linguistic ideas.  Representing sounds by symbols, and properties
(coronal or velar articulation; voicedness) by sets of symbols with
those properties, it is natural to then map these sets, and
intersections of them, to regular expressions to apply to strings.

The question is, what regular expression should correspond to the
empty set?  I've provisionally gone with "(?!.*)", i.e., a negative
look-ahead on a pattern which matches anything (the empty string
included), so that the assertion can never succeed.  Is there an
established idiom for this, and is that it?  And if there isn't, does
this seem reasonable?
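
A quick check of the claim that "(?!.*)" never matches, as a minimal
sketch for a current Python 3 and its standard re module.  The names
NEVER, NEVER_SHORT and both are made up for illustration, and the
shorter "(?!)" spelling is an assumption of this sketch, not an idiom
taken from the post:

```python
import re

# Pattern from the post: ".*" can always match (at worst the empty
# string), so the look-ahead always succeeds and its negation always fails.
NEVER = re.compile(r"(?!.*)")

# Assumption of this sketch: an empty negative look-ahead fails the same way.
NEVER_SHORT = re.compile(r"(?!)")

samples = ["", "K", "N D T", "line one\nline two"]
for text in samples:
    assert NEVER.search(text) is None
    assert NEVER_SHORT.search(text) is None

# Inside an alternation, the always-failing branch contributes nothing.
both = re.compile(r"(?:(?!.*))|G")
print(both.findall("K G N"))  # ['G']
```

So the pattern behaves like an identity element for "|", which is what
one would want from the regexp of an empty set.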

Example code:

"""
import sets

def str2set(s): return sets.Set(s.split())

cor = str2set("N D T") # Coronal articulation
vel = str2set("K G") # Velar articulation
voi = str2set("N D G") # Voiced

def set2re(s):
    if s: return "|".join(s)  # alternation of the symbols in the set
    else: return "(?!.*)"     # empty set: a regexp that never matches
"""

So we can get a regexp (string) that matches symbols corresponding to
coronal and voiced sounds:
"""
>>> set2re(cor & voi)
=> 'D|N'
"""
But nothing can be (in this model at least) velar and coronal:
"""
>>> cor & vel
=> Set([])
"""
and this maps to the Regexp Which Matches Nothing:
"""
>>> set2re(cor & vel)
=> '(?!.*)'
"""
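
For comparison, here is roughly how the same mapping might look when
applied to an actual string, as a sketch only.  It assumes Python 3
with the built-in set type instead of the sets module, sorts the
symbols so the output is deterministic, and escapes them with
re.escape; find_symbols is a hypothetical helper that is not part of
the code above:

```python
import re

def str2set(s):
    return set(s.split())

cor = str2set("N D T")  # Coronal articulation
vel = str2set("K G")    # Velar articulation
voi = str2set("N D G")  # Voiced

def set2re(symbols):
    # Non-empty set: alternation of its (escaped) symbols in sorted order.
    if symbols:
        return "|".join(re.escape(e) for e in sorted(symbols))
    # Empty set: the pattern that never matches.
    return "(?!.*)"

def find_symbols(symbols, text):
    # Hypothetical helper: every occurrence in text of a symbol from the set.
    return re.findall(set2re(symbols), text)

print(set2re(cor & voi))                     # D|N
print(find_symbols(cor & voi, "K N D T G"))  # ['N', 'D']
print(find_symbols(cor & vel, "K N D T G"))  # []
```

With the empty intersection the search simply finds nothing, so the
calling code needs no special case.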

This seems quite elegant to me, but there is such a fine line between
elegance and utter weirdness and I'd be glad to know which side other
persons think I'm on here.

Des
-- 
"[T]he structural trend in linguistics which took root with the
International Congresses of the twenties and early thirties [...] had
close and effective connections with phenomenology in its Husserlian
and Hegelian versions." -- Roman Jakobson


