[Python-ideas] Maybe allow br"" or rb"" e.g., for bytes regexes in Py3?

June 29, 2010

Hi,

Python 3 has two string prefixes: r"" for raw strings and b"" for bytes.

So if you want to create a regex based on bytes, as far as I can tell
you have to do something like this:

    import re

    FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
    # or
    FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")
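
As a quick sanity check that these two spellings really do build the
same pattern, here is a minimal sketch. The variable names and the
sample line are made up for illustration; they are not taken from a
real font file.

```python
import re

# Two ways to spell the same bytes pattern without a combined prefix.
pattern_encoded = r"/FontName\s+/(\S+)".encode("ascii")
pattern_escaped = b"/FontName\\s+/(\\S+)"

# Both spellings should yield identical bytes objects.
same_pattern = pattern_encoded == pattern_escaped

FONTNAME_RE = re.compile(pattern_escaped)

# Hypothetical sample line (an assumption, not real font data).
sample = b"/FontName /Helvetica-Bold def"
match = FONTNAME_RE.search(sample)
font_name = match.group(1) if match else None
```

Note that a bytes pattern only matches bytes input, so the captured
group comes back as bytes too.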

I think it would be much nicer if one could write:

    FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")
    # or
    FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")

I _slightly_ prefer rb"" to br"", but either would be great :-)

Why would you want a bytes regex?

In my case I am reading PostScript files and PostScript .pfa font files
so that I can embed the latter into the former. But I don't know what
encoding these files use beyond the fact that it is ASCII or some ASCII
superset such as Latin-1. So in true Python style I don't guess: instead
I read the files as bytes and do all my processing using bytes, never
decoding at any point, since I only ever insert ASCII characters. I
don't think this is a rare case: with Python 3's clean separation
between strings and bytes (a major advance IMO), I think there will
often be cases where all the processing is done using bytes.
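
To make the bytes-only approach concrete, here is a rough sketch of the
kind of processing I mean. The helper names (find_font_name,
add_comment), the inserted comment line, and the sample data are all
assumptions for illustration, not my actual code. In real use the data
would come from a file opened in binary mode.

```python
import re

# Bytes pattern, written with doubled backslashes.
FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")


def find_font_name(data: bytes):
    """Return the font name as bytes, or None if no /FontName entry is found."""
    match = FONTNAME_RE.search(data)
    return match.group(1) if match else None


def add_comment(data: bytes, font_name: bytes) -> bytes:
    """Prepend an ASCII-only comment line; the original bytes are left untouched."""
    return b"% embedded font: " + font_name + b"\n" + data


# Made-up font data containing a non-ASCII byte (0xE9) in an unknown encoding.
pfa_data = (
    b"%!PS-AdobeFont-1.0\n"
    b"/FullName (Caf\xe9 Sans) def\n"
    b"/FontName /CafeSans def\n"
)

name = find_font_name(pfa_data)
result = add_comment(pfa_data, name)
```

Because nothing is ever decoded, the 0xE9 byte passes through unchanged
whatever encoding the file happens to use.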

-- 
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
    C++, Python, Qt, PyQt - training and consultancy
        "Advanced Qt Programming" - ISBN 0321635906
            http://www.qtrac.eu/aqpbook.html