question of regular expression
skeptic
serge at zolshar.ru
Fri Nov 16 14:02:20 EST 2001
Hello, Stephen!
> I am now using org.apache.regexp for regular expressions in Java.
> I want to search for Unicode characters (I am afraid others can't
> read the language of my Unicode text, so I will use English letters
> in my example; assume all the following words are Unicode)
> e.g.
> I want to search for any of the following words:
>
> AB or C or E or ST
>
> How do I write the expression?
>
> I have tried this:
>
> (AB|C|E|ST); however, it also matches when there is only B and T...
The pattern is correct.
> (while only AB or ST should be allowed to match)...
>
> thx!
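To check that the pattern itself behaves as intended, here is a minimal sketch using the standard java.util.regex package (part of the JDK since 1.4). It is swapped in for org.apache.regexp, so it tests the pattern's logic, not jakarta-regexp's Unicode handling. The class and method names are hypothetical, and ASCII letters stand in for the real words, as in the question. It shows that a lone B or T is never matched, and also the difference between find(), which looks for a match anywhere in the input, and matches(), which requires the whole input to be one of the words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not org.apache.regexp.
class AlternationDemo {
    // The pattern from the question: one alternative per allowed word.
    static final Pattern WORDS = Pattern.compile("(AB|C|E|ST)");

    // Collects every substring that find() locates, scanning left to right.
    static List<String> findAll(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = WORDS.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    // True only if the entire input is exactly one of the alternatives.
    static boolean isExactWord(String input) {
        return WORDS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(findAll("ABxST"));   // [AB, ST]
        System.out.println(findAll("BxT"));     // [] : lone B and T never match
        System.out.println(isExactWord("AB"));  // true
        System.out.println(isExactWord("B"));   // false
        System.out.println(isExactWord("xAB")); // false: matches() needs the whole string
    }
}
```

If a lone B or T appears to match in a real program, the strings reaching the regex engine are probably not the characters you think they are.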
There are two possible culprits:
- bad Unicode handling by the jakarta-regexp library;
- bad Unicode handling on your side (there are TONS of problems with
Unicode input and output, trust me; localization is much more
difficult than it may seem).
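As one illustration of the second culprit, here is a minimal sketch of a common input mistake, assuming the text arrives as UTF-8 bytes: decoding those bytes with the wrong charset (ISO-8859-1 here) silently yields different characters, so a correct pattern no longer finds its match. The class name and sample letters are hypothetical, java.util.regex stands in for org.apache.regexp, and this is not claimed to reproduce the exact symptom from the question. The pattern is written with `\u0410`-style escapes, which keeps it independent of the encoding the source file was saved in (another frequent trap).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical demo class; uses java.util.regex, not org.apache.regexp.
class CharsetPitfall {
    // Cyrillic A followed by Cyrillic BE (U+0410, U+0411), written as Unicode
    // escapes so the pattern does not depend on the source file's encoding.
    static final Pattern AB = Pattern.compile("\u0410\u0411");

    // Decodes the raw bytes with the given charset, then searches the text.
    static boolean containsAB(byte[] raw, Charset cs) {
        String text = new String(raw, cs);
        return AB.matcher(text).find();
    }

    public static void main(String[] args) {
        // The bytes a UTF-8 file or socket would deliver for the two letters.
        byte[] utf8 = "\u0410\u0411".getBytes(StandardCharsets.UTF_8);
        // Correct charset: the decoded text contains U+0410 U+0411.
        System.out.println(containsAB(utf8, StandardCharsets.UTF_8));      // true
        // Wrong charset: each of the 4 bytes becomes its own Latin-1 character.
        System.out.println(containsAB(utf8, StandardCharsets.ISO_8859_1)); // false
    }
}
```

The general rule: always name the charset explicitly when turning bytes into strings (and back), rather than relying on the platform default.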
I have run your test with Russian characters (the Cyrillic Unicode block)
and four different regex libraries (jakarta-regexp, jakarta-oro, jregex,
regex4j), and all of them worked as expected. So the second case seems
more probable to me. Please provide more details.
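For anyone who wants to repeat the Cyrillic test, here is a minimal sketch with java.util.regex only; the four libraries named above are not exercised. The mapping of the placeholder words to Cyrillic letters is a hypothetical choice: AB becomes U+0410 U+0411, C becomes U+0412, E becomes U+0413, and ST becomes U+0414 U+0415.

```java
import java.util.regex.Pattern;

// Hypothetical re-creation of the Cyrillic test using java.util.regex.
class CyrillicAlternation {
    // Stand-ins (Cyrillic A, BE, VE, GHE, DE, IE):
    // AB -> U+0410 U+0411, C -> U+0412, E -> U+0413, ST -> U+0414 U+0415.
    static final Pattern WORDS =
        Pattern.compile("(\u0410\u0411|\u0412|\u0413|\u0414\u0415)");

    // True if any of the four words occurs somewhere in the input.
    static boolean containsWord(String input) {
        return WORDS.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("\u0410\u0411"));  // true  (AB)
        System.out.println(containsWord("\u0414\u0415"));  // true  (ST)
        System.out.println(containsWord("\u0411"));        // false (B alone)
        System.out.println(containsWord("\u0415"));        // false (T alone)
        System.out.println(containsWord("\u0411 \u0415")); // false (only B and T)
    }
}
```

If this sketch behaves correctly on your machine but your program does not, compare the actual code points of your pattern and input (for example by printing `Integer.toHexString(s.charAt(i))` for each character).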
Regards