Regular expression for matching IPA characters in Unicode?

Michael Hoffman
Mon Oct 11 15:08:16 CEST 2004

Mickel Grönroos wrote:

> Which is the best way of checking that a given unicode string only 
> contains IPA characters, e.g. characters in the range \u0250-\u02AF?

Well, I'll give you an example that only accepts characters in the 
range [\u0250, \u02AF], but those are just the IPA *Extensions* block. 
You would also need to include the Basic Latin and Greek characters 
that IPA uses from other blocks.
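
If you do want a regular expression, here is a rough sketch of what a 
character class spanning several blocks could look like. It is written 
for Python 3 (re.fullmatch needs 3.4 or later), the name looks_like_ipa 
is made up for this illustration, and the character class is an assumed, 
deliberately incomplete sample rather than a full IPA inventory.

```python
import re

# Illustrative and incomplete character class (an assumption, not the
# full IPA inventory):
#   a-z                  Basic Latin lowercase letters
#   \u0250-\u02AF        the IPA Extensions block
#   \u03B2\u03B8\u03C7   Greek beta, theta and chi, which IPA borrows
IPA_RE = re.compile("[a-z\u0250-\u02AF\u03B2\u03B8\u03C7]+")

def looks_like_ipa(text):
    """Return True if text is non-empty and every character is in the class."""
    return IPA_RE.fullmatch(text) is not None

print(looks_like_ipa("aeiou"))          # Basic Latin vowels
print(looks_like_ipa("\u0260\u02af"))   # two IPA Extensions characters
print(looks_like_ipa("\u03b8\u026an"))  # Greek theta + U+026A + n
print(looks_like_ipa("ABC"))            # capitals are not in the class
```

Because the pattern uses +, an empty string does not match. Extend the 
class with whatever other ranges you decide you need.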


And why do you want to do this anyway?

This example uses all(), one of the recipes from the itertools 
documentation, which tells you whether a predicate is true for every 
item in an iterable. The predicate here is whether the item is 
contained in IPA_CHARS, which you can expand...


import itertools
from sets import Set # set() is a built-in in 2.4

IPA_CHARS = Set(map(unichr, xrange(0x250, 0x2b0)))

def all(seq, pred=bool):
    "Returns True if pred(x) is True for every element in the iterable"
    return False not in itertools.imap(pred, seq)

def is_ipa(iterable):
    return all(iterable, IPA_CHARS.__contains__)

print is_ipa(u"aeiou") # this is valid IPA, but not in the extensions block
print is_ipa(u"\u0260\u02af") # valid IPA in the extensions block
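
For anyone reading this on Python 3, the same idea can be sketched more 
briefly, since all() and set() are built-ins there and chr() takes the 
place of unichr(). This is only a port of the example above under that 
assumption, still limited to the extensions block.

```python
# Python 3 sketch: same check, using the built-in all() and set()
IPA_CHARS = set(map(chr, range(0x250, 0x2B0)))

def is_ipa(text):
    """Return True if every character of text is in IPA_CHARS."""
    return all(ch in IPA_CHARS for ch in text)

print(is_ipa("aeiou"))         # False: Basic Latin, outside the extensions block
print(is_ipa("\u0260\u02af"))  # True: both characters are in the extensions block
```

An equivalent test is set(text) <= IPA_CHARS, which asks whether the 
characters of the string form a subset of the allowed set.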


Michael Hoffman
