Maybe allow br"" or rb"" e.g., for bytes regexes in Py3?
Hi,

Python 3 has two string prefixes: r"" for raw strings and b"" for bytes. So if you want to create a regex based on bytes, as far as I can tell you have to do something like this:

    FONTNAME_RE = re.compile(r"/FontName\s+/(\S+)".encode("ascii"))
    # or
    FONTNAME_RE = re.compile(b"/FontName\\s+/(\\S+)")

I think it would be much nicer if one could write:

    FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")
    # or
    FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")

I _slightly_ prefer rb"" to br"" but either would be great :-)

Why would you want a bytes regex? In my case I am reading PostScript files and PostScript .pfa font files so that I can embed the latter into the former. But I don't know what encoding these files use beyond the fact that it is ASCII or some ASCII superset like Latin1. So in true Python style I don't assume: instead I read the files as bytes and do all my processing using bytes, at no point decoding since I only ever insert ASCII characters.

I don't think this is a rare example: with Python 3's clean separation between strings & bytes (a major advance IMO), I think there will often be cases where all the processing is done using bytes.

--
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
C++, Python, Qt, PyQt - training and consultancy
"Advanced Qt Programming" - ISBN 0321635906
http://www.qtrac.eu/aqpbook.html
I ordered a Dell netbook with Ubuntu... I got no OS, no apology, no solution, & no refund (so far)
http://www.qtrac.eu/dont-buy-dell.html
On Tue, Jun 29, 2010 at 6:20 PM, Mark Summerfield <mark@qtrac.eu> wrote:
    FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")
    # or
    FONTNAME_RE = re.compile(rb"/FontName\s+/(\S+)")
I _slightly_ prefer rb"" to br"" but either would be great:-)
According to my local build, we already picked 'br':

Python 3.2a0 (py3k:81943, Jun 12 2010, 22:02:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "\t"
'\t'
>>> r"\t"
'\\t'
>>> b"\t"
b'\t'
>>> br"\t"
b'\\t'

I installed the system python3 to confirm that this isn't new:

Python 3.1.2 (r312:79147, Apr 15 2010, 15:35:48)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> br"\t"
b'\\t'
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
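As a rough illustration of the point above, here is a minimal sketch of the bytes regex from the original post written with the br"" prefix that Nick's sessions show is already accepted. The sample data is a made-up fragment, not taken from a real font file, and the variable names other than FONTNAME_RE are assumptions for the example.

```python
import re

# Raw bytes literal: the backslashes reach the regex engine untouched.
FONTNAME_RE = re.compile(br"/FontName\s+/(\S+)")

# Hypothetical fragment of a .pfa font file, kept as bytes throughout.
sample = b"%!PS-AdobeFont-1.0\n/FontName /Helvetica-Bold def\n"

match = FONTNAME_RE.search(sample)
font_name = match.group(1) if match else None
print(font_name)  # b'Helvetica-Bold'

# The two workarounds from the original post build the same pattern bytes.
encoded = r"/FontName\s+/(\S+)".encode("ascii")
escaped = b"/FontName\\s+/(\\S+)"
print(encoded == escaped == FONTNAME_RE.pattern)  # True
```

Because the pattern is bytes, the data being searched must be bytes as well; mixing a bytes pattern with a str subject raises TypeError.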
You're right, so I've raised it as a doc bug: http://bugs.python.org/issue9114
-- Mark Summerfield, Qtrac Ltd, www.qtrac.eu
On Tue, Jun 29, 2010 at 6:07 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Nick Coghlan wrote:
According to my local build, we already picked 'br':
Wouldn't "raw bytes" sound better than "bytes raw"? Or do the Dutch say it differently? :-)
I can pronounce "brrrrr" but I can't say "rrrrrb". :-) -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
I can pronounce "brrrrr" but I can't say "rrrrrb". :-)
And, of course, Python 2 has 'ur', but not 'ru'.
On 6/29/2010 10:04 PM, MRAB wrote:
And, of course, Python 2 has 'ur', but not 'ru'.
Even though most say or think 'raw unicode' rather than 'unicode raw'. But ur and br strike me as logically correct. In both Py2 and Py3, string literals are str literals. The r prefix disables most of the cooking of the literal. The u and b prefixes are effectively abbreviations for unicode() and bytes() calls on, I presume, the buffer part of a partially formed str object. In other words, br'abc' has the same effect as bytes(r'abc') but is easier to write and, I presume, faster to compute.

It is easy for people who only use ascii chars in Python code to forget that Python3 code is now actually a sequence of unicode chars rather than of (extended) ascii chars.

-- Terry Jan Reedy
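The relationship Terry describes can be checked with a small sketch. One assumption is made loudly here: in Python 3, bytes() applied to a str needs an explicit encoding, so the closest runnable form of bytes(r'abc') is bytes(r'abc', 'ascii'). The variable names are illustrative only.

```python
# The r prefix leaves the backslash alone, so this str has 4 characters.
raw_text = r"a\tc"
print(len(raw_text))  # 4

# br"" gives the same characters as ASCII bytes.
raw_bytes = br"a\tc"

# Assumption: an explicit encoding stands in for the bare bytes(r'...') call.
same_as_call = raw_bytes == bytes(raw_text, "ascii")
print(same_as_call)  # True

# Without r, the escape is processed and \t becomes a single tab byte.
cooked_bytes = b"a\tc"
print(len(cooked_bytes))  # 3
```

So the r prefix governs escape processing and the b prefix governs the resulting type, which fits the reading that the two prefixes act independently.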
Mark Summerfield writes:
Python 3 has two string prefixes r"" for raw strings and b"" for bytes.
And you *can* combine them, but it needs to be in the right order (although I'm not sure that's intentional):

steve@uwakimon ~ $ python3.1
Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> rb"a\rc"
  File "<stdin>", line 1
    rb"a\rc"
           ^
SyntaxError: invalid syntax
>>> br"abc"
b'abc'
>>> br"a\rc"
b'a\\rc'
Watch out for that time machine!
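Which prefix orders are accepted depends on the interpreter version: the sessions above show 3.1 rejecting rb"", while Python 3.3 and later accept it as a synonym for br"". A small sketch for checking whichever interpreter is at hand follows; the helper name prefix_accepted is made up for this example.

```python
def prefix_accepted(prefix):
    """Return True if this interpreter parses prefix + a quoted literal."""
    source = prefix + '"a\\rc"'  # e.g. the text br"a\rc"
    try:
        compile(source, "<prefix-check>", "eval")
    except SyntaxError:
        return False
    return True

# Report the result for each candidate prefix on the running interpreter.
for prefix in ("r", "b", "br", "rb"):
    print(prefix, prefix_accepted(prefix))
```

Using compile() rather than typing the literal directly keeps the check itself importable on interpreters that reject one of the spellings.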
participants (7)
- Greg Ewing
- Guido van Rossum
- Mark Summerfield
- MRAB
- Nick Coghlan
- Stephen J. Turnbull
- Terry Reedy