question of regular expression
serge at zolshar.ru
Fri Nov 16 20:02:20 CET 2001
> I am now using org.apache.regexp for regular expression in Java.
> I want to search the unicode characters (i am afraid the others can't
> understand the language of the unicode, so I still try to use English
> for my example, assume all the following words are unicode)
> I want to search the words with the following possibilities:
> AB or C or E or ST
> how do I write the expression?
> I have tried this:
> (AB|C|E|ST), however, it also matches when there is B and T only....
The pattern is correct.
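
A minimal sketch of why the alternation itself should behave, assuming a
JDK that ships java.util.regex (that is my substitution for illustration,
not org.apache.regexp, whose API is not shown here). The class and method
names are hypothetical, and Latin letters stand in for the real text as in
the question:

```java
import java.util.regex.Pattern;

class AlternationCheck {
    // Same alternation as in the question.
    static final Pattern WORDS = Pattern.compile("(AB|C|E|ST)");

    // True if any one of the alternatives occurs somewhere in the input.
    static boolean found(String input) {
        return WORDS.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"AB", "C", "E", "ST", "B", "T", "xSTx"};
        for (String s : samples) {
            System.out.println(s + " -> " + found(s));
        }
    }
}
```

With this engine, "B" and "T" on their own are not found, while "AB", "C",
"E", "ST" and any string containing one of them are.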
> (while only AB or ST should be allowed to match)......
The two possible culprits are:
- bad Unicode handling by the jakarta-regexp library;
- bad Unicode handling by you (there are TONS of problems with
Unicode input and output, trust me; localization is a much more
difficult thing than it may seem).
I have run your test with Russian characters (Cyrillic Unicode block)
and four different regex libraries -
jakarta-regexp, jakarta-oro, jregex, regex4j - and all worked as
expected. So the second case seems more probable to me. Provide more