[Python-ideas] Maybe allow br"" or rb"" e.g., for bytes regexes in Py3?
Mark Summerfield
mark at qtrac.eu
Tue Jun 29 10:20:56 CEST 2010
Hi,
Python 3 has two string prefixes: r"" for raw strings and b"" for bytes.
So if you want to create a regex based on bytes, as far as I can tell you
have to do something like this:
FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
# or
FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")
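As a quick sanity check, here is a minimal sketch showing that the two workarounds yield the same pattern bytes and match the same input. The sample line and the font name in it are invented for illustration only:

```python
import re

# Workaround 1: raw str literal, encoded to bytes afterwards.
pattern_a = r"/FontName\s+/(\S+)".encode("ascii")
# Workaround 2: bytes literal with every backslash doubled.
pattern_b = b"/FontName\\s+/(\\S+)"

# Both spellings should produce identical bytes.
same = pattern_a == pattern_b

FONTNAME_RE = re.compile(pattern_a)

# Hypothetical line such as might appear in a .pfa file (assumption).
sample = b"/FontName /Hypothetical-Regular def"
match = FONTNAME_RE.search(sample)
font_name = match.group(1)  # a bytes object, since the pattern is bytes
```

Note that a bytes pattern can only be applied to bytes input, and the groups it returns are bytes too.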
I think it would be much nicer if one could write:
FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")
# or
FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")
I _slightly_ prefer rb"" to br"", but either would be great :-)
Why would you want a bytes regex?
In my case I am reading PostScript files and PostScript .pfa font files
so that I can embed the latter into the former. But I don't know what
encoding these files use beyond the fact that it is ASCII or some ASCII
superset like Latin-1. So, in true Python style, I don't assume: instead I
read the files as bytes and do all my processing using bytes, at no
point decoding, since I only ever insert ASCII characters. I don't think
this is a rare example: with Python 3's clean separation between strings
& bytes (a major advance IMO), I think there will often be cases where
all the processing is done using bytes.
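To make the bytes-only workflow concrete, here is a rough sketch under stated assumptions: the helper name find_font_name, the sample file contents, and the font name are all hypothetical, and a temporary file stands in for a real .pfa file. The data contains a non-ASCII byte (0xE9) to show that nothing needs decoding:

```python
import os
import re
import tempfile

# Bytes regex built with the encode() workaround.
FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))


def find_font_name(path):
    """Return the font name as bytes, or None. The file is never decoded.

    Hypothetical helper, for illustration only.
    """
    with open(path, "rb") as f:  # binary mode: read() returns bytes
        data = f.read()
    match = FONTNAME_RE.search(data)
    return match.group(1) if match else None


# Invented sample data; 0xE9 is a non-ASCII byte whose encoding
# (Latin-1 or otherwise) we never need to know.
sample = b"%!PS-AdobeFont-1.0\n% caf\xe9\n/FontName /Hypothetical-Regular def\n"

with tempfile.NamedTemporaryFile(suffix=".pfa", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

try:
    name = find_font_name(tmp_path)
finally:
    os.remove(tmp_path)  # clean up the temporary file
```

Because only ASCII bytes are ever matched or inserted, the unknown encoding of the rest of the file never comes into play.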
--
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
C++, Python, Qt, PyQt - training and consultancy
"Advanced Qt Programming" - ISBN 0321635906
http://www.qtrac.eu/aqpbook.html
I ordered a Dell netbook with Ubuntu...
I got no OS, no apology, no solution, & no refund (so far)
http://www.qtrac.eu/dont-buy-dell.html