New subject: Maybe allow br"" or rb"" e.g., for bytes regexes in Py3?

29 Jun 2010

      Hi,

Python 3 has two string prefixes: r"" for raw strings and b"" for bytes.

So if you want to create a regex based on bytes, as far as I can tell you
have to do something like this:

    FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
    # or
    FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")
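
A quick check that the two spellings above describe the same pattern; the
names pattern_a, pattern_b and same are only illustrative:

```python
import re

# Workaround 1: write a raw str literal, then encode it to bytes.
pattern_a = r"/FontName\s+/(\S+)".encode("ascii")

# Workaround 2: write a bytes literal and double every backslash.
pattern_b = b"/FontName\\s+/(\\S+)"

# Both spellings yield the same bytes object, so both compile
# to the same bytes regex.
same = pattern_a == pattern_b
print(same)
```

Both are bytes patterns, so the compiled regex can only be used to search
bytes, not str.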

I think it would be much nicer if one could write:

    FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")
    # or
    FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")

I _slightly_ prefer rb"" to br"", but either would be great :-)

Why would you want a bytes regex?

In my case I am reading PostScript files and PostScript .pfa font files
so that I can embed the latter into the former. But I don't know what
encoding these files use beyond the fact that it is ASCII or some ASCII
superset like Latin-1. So in true Python style I don't assume: instead I
read the files as bytes and do all my processing using bytes, at no
point decoding since I only ever insert ASCII characters. I don't think
this is a rare example: with Python 3's clean separation between strings
& bytes (a major advance IMO), I think there will often be cases where
all the processing is done using bytes.
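
A minimal sketch of that bytes-only workflow, assuming a made-up font
fragment: the sample data, the name "ExampleSans" and the comment line
being built are hypothetical and not taken from a real .pfa file.

```python
import re

# Bytes pattern built with the encode() workaround shown earlier.
FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))

# Hypothetical file contents. The \xe9 byte stands in for non-ASCII
# data whose encoding is unknown; it is never decoded.
sample = b"%!PS-AdobeFont-1.0\n% caf\xe9\n/FontName /ExampleSans def\n"

# Searching bytes with a bytes regex gives bytes groups back.
match = FONTNAME_RE.search(sample)
font_name = match.group(1) if match else None

# Only ASCII bytes are added around the extracted name, so no
# decode/encode step is needed at any point.
comment = b"% embedded font: " + font_name + b"\n"
print(comment)
```

In real use the data would come from a file opened in binary mode, e.g.
open(path, "rb").read(), where path is whatever file is being processed.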

-- 
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
    C++, Python, Qt, PyQt - training and consultancy
        "Advanced Qt Programming" - ISBN 0321635906
            http://www.qtrac.eu/aqpbook.html

                I ordered a Dell netbook with Ubuntu...
       I got no OS, no apology, no solution, & no refund (so far)
               http://www.qtrac.eu/dont-buy-dell.html
