help needed with regex and unicode

Pradnyesh Sawant spradml at gmail.com
Tue Mar 4 00:19:54 EST 2008


Hi all,
I have a file which contains Chinese characters. I just want to find
all the places where these Chinese characters occur.

The following script doesn't seem to work :(

**********************************************************************
import re

class RemCh(object):
    def __init__(self, fName):
        self.pattern = re.compile(r'[\u2F00-\u2FDF]+')
        fp = open(fName, 'r')
        content = fp.read()
        s = re.search('[\u2F00-\u2fdf]', content, re.U)
        if s:
            print s.group(0)
if __name__ == '__main__':
    rc = RemCh('/home/pradnyesh/removeChinese/delFolder.php')
**********************************************************************
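
A possible reason for the failure, sketched under assumptions: in Python 2 a plain or raw byte-string literal does not interpret \u escapes, and fp.read() returns bytes rather than decoded text, so the character class never sees real code points. The range U+2F00 to U+2FDF is also only the Kangxi Radicals block; ordinary Chinese text mostly falls in CJK Unified Ideographs, U+4E00 to U+9FFF. The sketch below assumes the file is UTF-8 encoded (the post does not say), and the name find_cjk_runs is hypothetical.

```python
from __future__ import print_function, unicode_literals

import io
import re

# Assumption: CJK Unified Ideographs (U+4E00-U+9FFF) is the block of interest.
# The pattern is a unicode string, so the \u escapes become real code points.
CJK_RUN = re.compile('[\u4e00-\u9fff]+')


def find_cjk_runs(path, encoding='utf-8'):
    """Return every run of CJK ideographs found in the file at `path`."""
    # io.open decodes bytes to text on both Python 2 and Python 3.
    # Assumption: the file really is in `encoding`; a wrong guess raises
    # UnicodeDecodeError or yields no matches.
    with io.open(path, 'r', encoding=encoding) as fp:
        content = fp.read()
    return CJK_RUN.findall(content)
```

Calling find_cjk_runs on the PHP file path would return a list of unicode strings, one per contiguous run, which can then be printed or counted.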

The PHP file content is something like the following:

**********************************************************************
    // Check if the folder still has subscribed blogs
    $subCount = function1($param1, $param2);
    if ($subCount > 0) {
        $errors['summary'] = 'æ­ï½ æ½å¤此åï«åéé§ç²è';
        $errorMessage  = 'æ­ï½ æ½å¤此åï«åéé§ç²è';
    }

    if (empty($errors)) {
        $ret = function2($blog_res, $yuid, $fid);
        if ($ret >= 0) {
            $saveFalg = TRUE;
        } else {
            error_log("ERROR:: ret: $ret, function1($param1, $param2)");
            $errors['summary'] = "æ­ï½ æ½å¤此è±åã
            $errorMessage  = "æ­ï½ æ½å¤此è±åã
        }
    }
**********************************************************************
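
Since the goal is to find where the characters occur, one hedged option is to report line and column for every run of non-ASCII text. The sketch assumes the text has already been decoded to unicode, and it treats anything outside 7-bit ASCII as a hit, which is broader than "Chinese" and would also flag mis-encoded strings like the ones in the sample above. The name locate_non_ascii is hypothetical.

```python
from __future__ import print_function, unicode_literals

import re

# Assumption: any character outside 7-bit ASCII counts as a hit, which avoids
# having to choose a specific CJK block.
NON_ASCII_RUN = re.compile(r'[^\x00-\x7f]+')


def locate_non_ascii(text):
    """Return (line_number, column, run) tuples for decoded text.

    Both line_number and column are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), 1):
        for match in NON_ASCII_RUN.finditer(line):
            hits.append((line_no, match.start() + 1, match.group(0)))
    return hits


if __name__ == '__main__':
    # Hypothetical sample input, not taken from the PHP file.
    sample = "$a = 1;\n$e = '\u6b64\u5904';\n"
    for line_no, col, run in locate_non_ascii(sample):
        print(line_no, col, repr(run))
```

Reporting positions rather than just the matched text makes it easier to jump to each string in an editor.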

-- 
warm regards,
Pradnyesh Sawant
--
Luck is the residue of good design. --Anon