Hi,
Python 3 has two string prefixes: r"" for raw strings and b"" for bytes.
So if you want to create a regex based on bytes, as far as I can tell, you have to do something like this:
FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii")) # or FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")
I think it would be much nicer if one could write:
FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)") # or FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")
I _slightly_ prefer rb"" to br"" but either would be great:-)
Why would you want a bytes regex?
In my case I am reading PostScript files and PostScript .pfa font files so that I can embed the latter into the former. But I don't know what encoding these files use, beyond the fact that it is ASCII or some ASCII superset like Latin-1. So, in true Python style, I don't assume: instead I read the files as bytes and do all my processing using bytes, at no point decoding, since I only ever insert ASCII characters. I don't think this is a rare example: with Python 3's clean separation between strings & bytes (a major advance IMO), I think there will often be cases where all the processing is done using bytes.
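To make the use case concrete, here is a minimal sketch of bytes-only processing using the .encode() workaround shown earlier. The find_font_name helper and the sample font data are made up for illustration; a real script would read the data with open(path, "rb").

```python
import re

# Bytes pattern built with the .encode() workaround from above.
FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))


def find_font_name(data):
    """Return the font name found in raw font data (as bytes), or None.

    Hypothetical helper, named here only for illustration.
    """
    match = FONTNAME_RE.search(data)
    return match.group(1) if match else None


# Made-up sample standing in for the contents of a .pfa file.
sample = b"%!PS-AdobeFont-1.0: Example 001.000\n/FontName /Example-Regular def\n"

print(find_font_name(sample))  # prints b'Example-Regular'
```

Nothing is ever decoded: the pattern, the input and the captured group are all bytes, so the file's actual encoding never has to be guessed.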