help needed with regex and unicode
Pradnyesh Sawant
spradml at gmail.com
Tue Mar 4 00:19:54 EST 2008
Hi all,
I have a file that contains Chinese characters, and I just want to find
all the places where these characters occur.
The following script doesn't seem to work :(
**********************************************************************
import re

class RemCh(object):
    def __init__(self, fName):
        self.pattern = re.compile(r'[\u2F00-\u2FDF]+')
        fp = open(fName, 'r')
        content = fp.read()
        s = re.search('[\u2F00-\u2fdf]', content, re.U)
        if s:
            print s.group(0)

if __name__ == '__main__':
    rc = RemCh('/home/pradnyesh/removeChinese/delFolder.php')
**********************************************************************
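A possible fix, sketched under a few assumptions: the file is UTF-8, and "Chinese" is approximated by the CJK Unified Ideographs block (U+4E00 to U+9FFF), which leaves out the extension blocks and CJK punctuation. The range in the script above, U+2F00 to U+2FDF, is the Kangxi Radicals block, so it would not match ordinary Chinese text even if everything else were right. The sketch also decodes the file before matching (a plain open() in Python 2 returns bytes), uses a u'' literal because re itself only learned to expand \u escapes in Python 3.3, and uses finditer rather than re.search, which stops at the first match. The names CHINESE_RUN, find_chinese and find_chinese_in_file are hypothetical.

```python
import io
import re

# Assumption: "Chinese" = CJK Unified Ideographs, U+4E00..U+9FFF.
# The u'' literal expands the \u escapes before re sees the pattern,
# which matters on Python 2, where re does not understand \u itself.
CHINESE_RUN = re.compile(u'[\u4e00-\u9fff]+')


def find_chinese(text):
    """Yield (line_no, column, run) for every run of CJK ideographs.

    line_no is 1-based, column is 0-based.
    """
    for line_no, line in enumerate(text.splitlines(), 1):
        # finditer reports every match; re.search stops at the first one.
        for m in CHINESE_RUN.finditer(line):
            yield line_no, m.start(), m.group(0)


def find_chinese_in_file(path, encoding='utf-8'):
    """Decode the file first (UTF-8 is an assumption) so the regex sees
    characters rather than the raw bytes of their encoding."""
    with io.open(path, 'r', encoding=encoding) as fp:
        return list(find_chinese(fp.read()))
```

Calling find_chinese_in_file('/home/pradnyesh/removeChinese/delFolder.php') would return a list of (line, column, text) tuples, one per run of ideographs; if the file uses another encoding (GBK, Big5, ...), pass it as the encoding argument.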
The PHP file's content is something like the following:
**********************************************************************
// Check if the folder still has subscribed blogs
$subCount = function1($param1, $param2);
if ($subCount > 0) {
    $errors['summary'] = 'æÂï½ æ½å¤æ¤Ã¥Ã¯Â«Ã¥Ã©Ã©Â§Ã§Â²Ã¨';
    $errorMessage = 'æÂï½ æ½å¤æ¤Ã¥Ã¯Â«Ã¥Ã©Ã©Â§Ã§Â²Ã¨';
}
if (empty($errors)) {
    $ret = function2($blog_res, $yuid, $fid);
    if ($ret >= 0) {
        $saveFalg = TRUE;
    } else {
        error_log("ERROR:: ret: $ret, function1($param1, $param2)");
        $errors['summary'] = "æÂï½ æ½å¤æ¤Ã¨Â±Ã¥Ã£
        $errorMessage = "æÂï½ æ½å¤æ¤Ã¨Â±Ã¥Ã£
    }
}
**********************************************************************
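The garbled strings in the sample look like UTF-8 encoded text whose bytes were displayed as Latin-1 (this may be an artifact of the mail archive rather than of the file itself). The sketch below, using a hypothetical sample word 文件夹 ("folder"), illustrates why the file has to be decoded with the right codec before matching, and how mojibake of this particular kind can be reversed. The range and the sample are assumptions, not taken from the real file.

```python
import re

# Assumption: "Chinese" = CJK Unified Ideographs, U+4E00..U+9FFF.
CHINESE_RUN = re.compile(u'[\u4e00-\u9fff]+')

# Hypothetical sample text: the three ideographs for "folder".
original = u'\u6587\u4ef6\u5939'
raw = original.encode('utf-8')  # the bytes a UTF-8 file would contain

# Wrong codec: each 3-byte UTF-8 sequence becomes 3 Latin-1 characters
# (the first being a letter such as U+00E6 or U+00E4), none of them ideographs.
wrong = raw.decode('latin-1')

# Right codec: the ideographs come back and the regex can see them.
right = raw.decode('utf-8')

# Latin-1 maps every byte to exactly one character, so mojibake produced
# this way can be undone by re-encoding as Latin-1 and decoding as UTF-8.
repaired = wrong.encode('latin-1').decode('utf-8')

print(CHINESE_RUN.findall(wrong))
print(CHINESE_RUN.findall(right))
print(repaired == original)
```

The repair step only works when the bytes survived intact; strings that were truncated or had bytes replaced along the way cannot be restored this way.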
--
warm regards,
Pradnyesh Sawant
--
Luck is the residue of good design. --Anon
More information about the Python-list mailing list