
Hi,

Python 3 has two string prefixes: r"" for raw strings and b"" for bytes. So if you want to create a regex based on bytes, as far as I can tell you have to do something like this:

    FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
    # or
    FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")

I think it would be much nicer if one could write:

    FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")
    # or
    FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")

I _slightly_ prefer rb"" to br"" but either would be great :-)

Why would you want a bytes regex? In my case I am reading PostScript files and PostScript .pfa font files so that I can embed the latter into the former. But I don't know what encoding these files use beyond the fact that it is ASCII or some ASCII superset like Latin1. So in true Python style I don't assume: instead I read the files as bytes and do all my processing using bytes, at no point decoding, since I only ever insert ASCII characters.

I don't think this is a rare example: with Python 3's clean separation between strings & bytes (a major advance IMO), I think there will often be cases where all the processing is done using bytes.

-- 
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
C++, Python, Qt, PyQt - training and consultancy
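
For reference, a minimal sketch of the bytes-only workflow described above. It assumes Python 3.3 or later, where both the br"" and rb"" prefixes are accepted. The sample data and the helper name find_font_name are hypothetical, made up for illustration rather than taken from a real .pfa file.

```python
import re

# Three spellings of the same bytes pattern from the post.
encoded = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
escaped = re.compile(b"/FontName\\s+/(\\S+)")
raw_bytes = re.compile(rb"/FontName\s+/(\S+)")  # br"..." is equivalent

# All three compile from identical pattern bytes.
assert encoded.pattern == escaped.pattern == raw_bytes.pattern

# Hypothetical font-file fragment. The 0xE9 byte is not ASCII, so the
# encoding is ambiguous, but the data is never decoded.
sample = b"%!PS-AdobeFont-1.0\n% caf\xe9\n/FontName /Helvetica-Bold def\n"


def find_font_name(data):
    """Return the font name as bytes, or None if no /FontName entry exists."""
    match = raw_bytes.search(data)
    return match.group(1) if match else None


font_name = find_font_name(sample)
```

Because the pattern and the data are both bytes, the match result is bytes as well, so it can be written straight back into a PostScript file without choosing an encoding.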