Need a Regular expression to remove a char for Unicode text
Sybren Stuvel
sybrenUSE at YOURthirdtower.com.imagination
Fri Oct 13 07:41:18 EDT 2006
శ్రీనివాస enlightened us with:
> Can anyone tell me how I can remove a character from Unicode
> text? కల్<200c>&హార is a Telugu word in Unicode. Here I want to
> remove '&' but not replace it with a zero-width character. One more
> thing: if there is whitespace before and after the '&' character,
> the text should be kept as it is.
So basically, you want to match <200c>& and replace it with <200c>,
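but only if it's not surrounded by whitespace, right?
r"(?<!\s)\u200c&(?!\s)" should match on Python 3.3 or later (on Python 2,
write it as a ur"..." literal). Note the \u200c: \x takes exactly two hex
digits, so \x200c would mean a space followed by "0c". I'm sure you'll be
able to take it from there.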
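
For what it's worth, here is a minimal sketch of how that pattern could be used with re.sub. It assumes Python 3.3 or later, where the re module accepts \u escapes in patterns. The names ZWNJ, pattern and strip_ampersand are only illustrative. The ZWNJ is captured in a group so the replacement can put it back and drop just the '&'. Note that the two lookarounds leave the text alone if there is whitespace on either side, not only on both.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Match "&" directly after a ZWNJ, unless whitespace sits on either side.
# The lookbehind checks the character before the ZWNJ; the lookahead
# checks the character after the "&".
pattern = re.compile(r"(?<!\s)(\u200c)&(?!\s)")


def strip_ampersand(text):
    """Drop the '&' but keep the ZWNJ that precedes it (hypothetical helper)."""
    return pattern.sub(r"\1", text)


word = "కల్" + ZWNJ + "&హార"
cleaned = strip_ampersand(word)

# Whitespace next to the sequence means the text is left untouched.
spaced = "a " + ZWNJ + "& b"
untouched = strip_ampersand(spaced)
```

Here cleaned still contains the ZWNJ but no '&', while untouched is identical to spaced.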
Sybren
--
Sybren Stüvel
Stüvel IT - http://www.stuvel.eu/