How to use Unicode regexes?
rhys tucker
rhystucker at rhystucker.fsnet.co.uk
Fri Jul 27 16:14:56 EDT 2001
Could somebody show me how to do Unicode regexes? I'm trying to write a strings-like utility for Windows, so I want to match ASCII and Unicode characters in a binary file. Do I need
one regex pattern, since ASCII and Unicode are similar for ASCII text characters, or are two regex patterns needed, since the characters have different byte sizes?
The documentation suggests that I need to use the \w pattern to match Unicode and set UNICODE. I'm not sure what to set or how to set it.
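For reference, a minimal sketch of how the UNICODE flag is usually passed, written for a current Python 3 interpreter (an assumption; the script in this message is Python 2). The flag goes in as the second argument to re.compile and changes what \w means on text that has already been decoded. It does not by itself deal with byte encodings in a binary file. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text: "naive" and "cafe" with accented letters.
text = "na\u00efve caf\u00e9 123"

# With re.UNICODE, \w also matches non-ASCII letters. In Python 3 this is
# already the default for str patterns, so the flag is redundant there.
unicode_words = re.compile(r"\w+", re.UNICODE).findall(text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters
# split the words.
ascii_words = re.compile(r"\w+", re.ASCII).findall(text)

print(unicode_words)
print(ascii_words)
```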
This is what I've done so far. It matches (some?) ASCII characters but misses the Unicode strings.
#!/usr/bin/env python
# strings program
import sys
from re import compile, findall
f = open(sys.argv[1])
fl = f.read()
patt = compile("[\032-\176\000]{4,}")
matches = findall(patt, fl)
for match in matches:
    print match
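One possible approach, sketched here for Python 3 under a few assumptions: the "Unicode" strings in Windows binaries are taken to be UTF-16-LE text limited to the ASCII range (each printable byte followed by a zero byte), and the file is read in binary mode. Under those assumptions two byte patterns are used, one per encoding. The helper name extract_strings is hypothetical. Note also that the octal escape \032 in the pattern above is decimal 26, not the space character (octal \040), which may be why only some ASCII strings match.

```python
import re

# Runs of at least 4 printable ASCII bytes (space 0x20 through tilde 0x7E).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Runs of at least 4 UTF-16-LE code units in the ASCII range:
# a printable byte followed by a zero byte. (Assumed encoding.)
UTF16LE_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def extract_strings(data):
    """Return printable strings found in a bytes object, in file order.

    Hypothetical helper for illustration, not a library function.
    """
    found = []
    for m in ASCII_RUN.finditer(data):
        found.append((m.start(), m.group().decode("ascii")))
    for m in UTF16LE_RUN.finditer(data):
        found.append((m.start(), m.group().decode("utf-16-le")))
    found.sort()  # order by byte offset
    return [text for _, text in found]


# Made-up sample: junk bytes, an ASCII string, then a UTF-16-LE string.
sample = (b"\x01\x02hello\x00\xff"
          + "wide text".encode("utf-16-le")
          + b"\x00\x03ab")
strings_found = extract_strings(sample)
print(strings_found)
```

On a real file the data would come from something like open(sys.argv[1], "rb").read(). Binary mode matters on Windows, where text mode can translate line endings and stop reading at a Ctrl-Z byte.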
Thanks to those people who answered my earlier question.
rhys