[Python-ideas] RFC: bytestring as a str representation [was: a new bytestring type?]
INADA Naoki
songofacandy at gmail.com
Wed Jan 8 14:10:42 CET 2014
You're right.
As I said in my previous mail, I had not considered using surrogateescape.
But surrogateescape is not a silver bullet.
Decoding with ascii and encoding with the target encoding is only valid when
the target encoding is ASCII compatible.
In [29]: bindata = b'abc'
In [30]: bindata = bindata.decode('ascii', 'surrogateescape')
In [31]: text = 'abc'
In [32]: query = 'SET textcolumn=%s bincolumn=%s' % ("'" + text + "'", "'" + bindata + "'")
In [33]: query.encode('utf16', 'surrogateescape')
Out[33]: b"\xff\xfeS\x00E\x00T\x00 \x00t\x00e\x00x\x00t\x00c\x00o\x00l\x00u\x00m\x00n\x00=\x00'\x00a\x00b\x00c\x00'\x00 \x00b\x00i\x00n\x00c\x00o\x00l\x00u\x00m\x00n\x00=\x00'\x00a\x00b\x00c\x00'\x00"
Fortunately, I can't use utf16 as the client encoding with MySQL.
mysql> SET NAMES utf16;
ERROR 1231 (42000): Variable 'character_set_client' can't be set to the value of 'utf16'
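
The point about ASCII-compatible encodings can be checked with a small
Python 3 sketch. The helper name `roundtrip` is made up for illustration; it
decodes bytes with ascii/surrogateescape and re-encodes them with a target
encoding. The bytes come back unchanged for UTF-8, but not for UTF-16.

```python
def roundtrip(data, encoding):
    """Decode bytes as ASCII with surrogateescape, then re-encode.

    `roundtrip` is a hypothetical helper used only for this illustration.
    """
    text = data.decode('ascii', 'surrogateescape')
    return text.encode(encoding, 'surrogateescape')


# ASCII-compatible target encoding: arbitrary bytes survive unchanged.
blob = b"abc\xff\x00"
print(roundtrip(blob, 'utf-8') == blob)

# UTF-16 is not ASCII compatible: even plain ASCII bytes are widened
# to two bytes per character, so the binary data is no longer intact.
print(roundtrip(b"abc", 'utf-16-le'))
```

So the decode/encode trick only preserves binary data when the connection
encoding keeps ASCII bytes as single, identical bytes.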
On Wed, Jan 8, 2014 at 9:11 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> >>>>> INADA Naoki writes:
>
> > I'd like to share my experience of suffering because bytes has no
> > %-format.
> > MySQL-python is the most widely used DB-API 2.0 driver for MySQL.
> > MySQL-python uses the 'format' paramstyle.
>
> > The MySQL protocol is basically encoded text, but it may contain arbitrary
> > (escaped) binary.
> > Here is a simplified example that constructs real SQL from an SQL format
> > and arguments. (Works only on Python 2.7)
>
> '>' quotes are omitted for clarity and comments deleted.
>
> def escape_string(s):
>     return s.replace("'", "''")
>
> def convert(x):
>     if isinstance(x, unicode):
>         x = x.encode('utf-8')
>     if isinstance(x, str):
>         x = "'" + escape_string(x) + "'"
>     else:
>         x = str(x)
>     return x
>
> def build_query(query, *args):
>     if isinstance(query, unicode):
>         query = query.encode('utf-8')
>     return query % tuple(map(convert, args))
>
> textdata = b"hello"
> bindata = b"abc\xff\x00"
> query = "UPDATE table SET textcol=%s bincol=%s"
>
> print build_query(query, textdata, bindata)
>
> > I can't port this to Python 3.
>
> Why not? The obvious translation is
>
> # This is Python 3!!
> def escape_string(s):
>     return s.replace("'", "''")
>
> def convert(x):
>     if isinstance(x, bytes):
>         x = escape_string(x.decode('ascii', errors='surrogateescape'))
>         x = "'" + x + "'"
>     else:
>         x = str(x)
>     return x
>
> def build_query(query, *args):
>     query = query % tuple(map(convert, args))
>     return query.encode('utf-8', errors='surrogateescape')
>
> textdata = "hello"
> bindata = b"abc\xff\x00"
> query = "UPDATE table SET textcol=%s bincol=%s"
>
> print(build_query(query, textdata, bindata))
>
> The main issue I can think of that you might have with this is that there will
> need to be conversions to and from 16-bit representations, which take
> up unnecessary space for bindata, and are relatively slow for bindata.
> But it seems to me that these are second-order costs compared to the
> other work an adapter needs to do. What am I missing?
>
> With the proposed 'ascii-compatible' representation, if you have to
> handle many MB of binary or textdata with non-ASCII characters,
>
> def convert(x):
>     if isinstance(x, str):
>         x = x.encode('utf-8').decode('ascii-compatible')
>     elif isinstance(x, bytes):
>         x = escape_string(x.decode('ascii-compatible'))
>         x = "'" + x + "'"
>     else:
>         x = str(x) # like 42
>     return x
>
> def build_query(query, *args):
>     query = convert(query) % tuple(map(convert, args))
>     return query.encode('utf-8', errors='surrogateescape')
>
> ensures that the '%' format operator is always dealing with 8-bit
> representations only. There might be a conversion from 16-bit to
> 8-bit for str, but there will be no conversions from 8-bit to 16-bit
> representations. I don't know if that makes '%' itself faster, but
> it might.
>
>
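
For reference, the quoted surrogateescape translation can be exercised on
Python 3 as below. This is only a sketch: the functions are copied from the
quoted message, while the sample arguments (both passed as bytes so that both
get quoted) and the name `result` are my own choices for the check.

```python
def escape_string(s):
    # Double single quotes, as in the quoted example.
    return s.replace("'", "''")


def convert(x):
    if isinstance(x, bytes):
        # Smuggle non-ASCII bytes through str as lone surrogates.
        x = escape_string(x.decode('ascii', errors='surrogateescape'))
        x = "'" + x + "'"
    else:
        x = str(x)
    return x


def build_query(query, *args):
    query = query % tuple(map(convert, args))
    # surrogateescape turns the lone surrogates back into the raw bytes.
    return query.encode('utf-8', errors='surrogateescape')


# Sample call chosen for illustration; both arguments are bytes.
result = build_query("UPDATE table SET textcol=%s bincol=%s",
                     b"hello", b"abc\xff\x00")
print(result)
```

The raw `\xff\x00` bytes reach the output untouched because UTF-8 is ASCII
compatible; this is the case where the approach is valid.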
--
INADA Naoki <songofacandy at gmail.com>