Maybe allow br"" or rb"" e.g., for bytes regexes in Py3?

Hi,

Python 3 has two string prefixes: r"" for raw strings and b"" for bytes. So if you want to create a regex based on bytes, as far as I can tell you have to do something like this:

    FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
    # or
    FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")

I think it would be much nicer if one could write:

    FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")
    # or
    FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")

I _slightly_ prefer rb"" to br"" but either would be great :-)

Why would you want a bytes regex? In my case I am reading PostScript files and PostScript .pfa font files so that I can embed the latter into the former. But I don't know what encoding these files use beyond the fact that it is ASCII or some ASCII superset like Latin1. So in true Python style I don't assume: instead I read the files as bytes and do all my processing using bytes, at no point decoding, since I only ever insert ASCII characters.

I don't think this is a rare example: with Python 3's clean separation between strings & bytes (a major advance IMO), I think there will often be cases where all the processing is done using bytes.

--
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
C++, Python, Qt, PyQt - training and consultancy
"Advanced Qt Programming" - ISBN 0321635906
http://www.qtrac.eu/aqpbook.html
I ordered a Dell netbook with Ubuntu... I got no OS, no apology, no solution, & no refund (so far)
http://www.qtrac.eu/dont-buy-dell.html
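For readers who want to try the bytes regex from this message, here is a minimal sketch. It assumes a Python 3 interpreter that accepts the br"" prefix; the `sample` bytes are a made-up fragment for illustration, not taken from a real .pfa file.

```python
import re

# Raw bytes literal: backslashes reach the regex engine unchanged,
# and the pattern matches bytes rather than str.
FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")

# The two workarounds from the message build the same pattern.
VIA_ENCODE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
VIA_ESCAPES = re.compile(b"/FontName\\s+/(\\S+)")

# Hypothetical font-file fragment, kept as bytes and never decoded.
sample = b"%!PS-AdobeFont-1.0\n/FontName /Helvetica-Bold def\n"

match = FONTNAME_RE.search(sample)
font_name = match.group(1) if match else None
print(font_name)  # b'Helvetica-Bold'
```

The captured group is itself a bytes object, so it can be inserted into other bytes data without any decoding step.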

On Tue, Jun 29, 2010 at 6:20 PM, Mark Summerfield <mark@qtrac.eu> wrote:
According to my local build, we already picked 'br':

    Python 3.2a0 (py3k:81943, Jun 12 2010, 22:02:56)
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.

I installed the system python3 to confirm that this isn't new:

    Python 3.1.2 (r312:79147, Apr 15 2010, 15:35:48)
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> br"\t"
    b'\\t'

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

You're right, so I've raised it as a doc bug: http://bugs.python.org/issue9114

On 2010-06-29, Nick Coghlan wrote:

--
Mark Summerfield

On 6/29/2010 10:04 PM, MRAB wrote:
Even though most say or think 'raw unicode' rather than 'unicode raw'. But ur and br strike me as logically correct.

In both Py2 and Py3, string literals are str literals. The r prefix disables most of the cooking of the literal. The u and b prefixes are effectively abbreviations for unicode() and bytes() calls on, I presume, the buffer part of a partially formed str object. In other words, br'abc' has the same effect as bytes(r'abc') but is easier to write and, I presume, faster to compute.

It is easy for people who only use ascii chars in Python code to forget that Python3 code is now actually a sequence of unicode chars rather than of (extended) ascii chars.

--
Terry Jan Reedy
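The equivalence described above can be checked directly. This is a small sketch rather than a statement about how the compiler builds the literal. One caveat: in Python 3, bytes() applied to a str requires an explicit encoding, so the runnable spelling is an encode() call or bytes(..., "ascii").

```python
# The r prefix keeps the backslash, so the raw bytes literal holds two
# bytes (backslash and 't'), while the cooked literal is a single tab.
raw_bytes = br"\t"
cooked_bytes = b"\t"

# A raw bytes literal equals the raw str literal encoded as ASCII.
same_as_encode = raw_bytes == r"\t".encode("ascii")
same_as_bytes_call = br"abc" == bytes(r"abc", "ascii")

# Without an encoding argument, Python 3 refuses to convert str to bytes.
try:
    bytes(r"abc")
except TypeError:
    needs_encoding = True
else:
    needs_encoding = False

print(len(raw_bytes), len(cooked_bytes), same_as_encode, needs_encoding)
```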

Mark Summerfield writes:
Python 3 has two string prefixes r"" for raw strings and b"" for bytes.
And you *can* combine them, but it needs to be in the right order (although I'm not sure that's intentional):

    steve@uwakimon ~ $ python3.1
    Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
    [GCC 4.3.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.

Watch out for that time machine!
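Which prefix orders a given interpreter accepts can be probed without risking a SyntaxError in the script itself. The sketch below uses a hypothetical helper, `accepts_prefix`, that compiles a tiny literal from source text. On the 3.1 builds shown in this thread only br is expected to work; Python 3.3 and later also accept rb.

```python
def accepts_prefix(prefix):
    """Return True if this interpreter parses prefix + '"\\t"' as a literal."""
    source = prefix + r'"\t"'
    try:
        # compile() only parses the text; nothing is executed here.
        compile(source, "<prefix-check>", "eval")
    except SyntaxError:
        return False
    return True


for candidate in ("br", "rb", "ub"):
    print(candidate, accepts_prefix(candidate))
```

Compiling from a string keeps the probe portable: writing rb"..." directly in the file would stop an older interpreter from loading the script at all.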

participants (7)
- Greg Ewing
- Guido van Rossum
- Mark Summerfield
- MRAB
- Nick Coghlan
- Stephen J. Turnbull
- Terry Reedy