Regular expression for matching IPA characters in Unicode?

Michael Hoffman m.h.3.9.1.without.dots.at.cam.ac.uk at example.com
Mon Oct 11 15:08:16 CEST 2004


Mickel Grönroos wrote:

> Which is the best way of checking that a given unicode string only 
> contains IPA characters, e.g. characters in the range \u0250-\u02AF?

Well, I'll give you an example that only includes characters in the 
range [\u0250, \u02AF], but those are just the IPA *Extensions*. You also 
need to include Basic Latin and Greek characters from other blocks.

See: http://www.unicode.org/charts/PDF/U0250.pdf
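
As a rough sketch of what widening the set might look like (written in 
Python 3 syntax; the extra characters are a small illustrative sample, 
not a complete IPA inventory, and the names IPA_SAMPLE_CHARS and 
is_ipa_sample are made up for this sketch):

```python
# Sketch only: a deliberately incomplete sample of IPA characters drawn
# from several Unicode blocks. All names here are hypothetical.
IPA_SAMPLE_CHARS = (
    {chr(cp) for cp in range(0x0250, 0x02B0)}  # IPA Extensions block
    | set("abcdefghijklmnopqrstuvwxyz")        # Basic Latin lowercase letters
    | set("\u03b2\u03b8\u03c7")                # Greek beta, theta, chi
    | set("\u00e6\u00f0\u00f8")                # ae ligature, eth, o with stroke
)

def is_ipa_sample(text):
    """Return True if every character of text is in IPA_SAMPLE_CHARS."""
    return all(ch in IPA_SAMPLE_CHARS for ch in text)

print(is_ipa_sample("aeiou"))         # plain Latin vowels
print(is_ipa_sample("\u03b8\u0259"))  # theta followed by schwa
print(is_ipa_sample("A1"))            # uppercase letter and a digit
```

Which characters belong in the set depends on what you count as IPA, so 
check the sample ranges against the charts before relying on them.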

And why do you want to do this anyway?

This example uses the all() recipe from the itertools documentation, 
which tells you whether a predicate is true for every item in an 
iterable. The predicate here is whether the item is contained in 
IPA_CHARS, which you can expand...

=====

import itertools
from sets import Set # set() is a built-in in 2.4

IPA_CHARS = Set(map(unichr, xrange(0x250, 0x2b0)))

def all(seq, pred=bool):
    "Returns True if pred(x) is True for every element in the iterable"
    # http://www.python.org/doc/current/lib/itertools-example.html
    return False not in itertools.imap(pred, seq)

def is_ipa(iterable):
    return all(iterable, IPA_CHARS.__contains__)

print is_ipa(u"aeiou") # this is valid IPA, but not in the extensions block
print is_ipa(u"\u0260\u02af") # valid IPA in the extensions block

====output===

False
True
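
Since the subject line asks about a regular expression: the same check 
can probably also be written with the re module. The sketch below 
assumes Python 3 (where re.fullmatch exists and str is Unicode), and 
IPA_EXT_RE and is_ipa_ext are names invented for the sketch. Like 
IPA_CHARS above, the character class only covers the IPA Extensions 
block and would need more ranges for a fuller check.

```python
import re

# Character class covering only the IPA Extensions block, U+0250..U+02AF.
# Hypothetical names; add further ranges inside the brackets as needed.
IPA_EXT_RE = re.compile("[\u0250-\u02af]+")

def is_ipa_ext(text):
    """Return True if text is non-empty and lies entirely in the block."""
    return IPA_EXT_RE.fullmatch(text) is not None

print(is_ipa_ext("aeiou"))         # outside the extensions block
print(is_ipa_ext("\u0260\u02af"))  # inside the extensions block
```

One difference from the set-based version: with "+" an empty string is 
rejected, whereas is_ipa(u"") is True. Use "*" instead of "+" if you 
want the two to agree.
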
-- 
Michael Hoffman


