[Python-Dev] PEP 460 reboot

Glenn Linderman v+python at g.nevcal.com
Tue Jan 14 07:37:17 CET 2014


On 1/13/2014 9:25 PM, Nick Coghlan wrote:
> since this observation makes it clear that there's *no* coherent way
> to offer a pure binary interpolation API - the only general purpose
> combination mechanism for segments of binary data that can avoid
> making assumptions about the encodings of metacharacters is simple
> concatenation.
That's almost true, and I'm glad that you, Guido, and all of us can
understand that the currently defined Python 2 and Python 3 formatting
syntaxes contain an inherent ASCII assumption, just like many internet
protocols. The bitter fight is over :)

However, your statement above isn't 100% accurate, so just for the
pedantry of it, I'll point out why. A mechanism could be defined in
which the "format string" contains only format specifications, and any
other text is considered an error. The format string could have an
explicit or a defined encoding, so there would be no need to make an
assumption about its encoding. And since it would contain no text other
than format specifications, it would serve only as a rule-book on how to
interpret the parameters, contributing no text of its own to the result.

This wouldn't solve the problem at hand, though, which is to provide a 
nice migration path from Python 2 to Python 3 for code that uses 
ASCII-based format strings that do contribute text as well as include 
parameter data.

Whether such a technique would be more useful than simple concatenation 
(or complex concatenation such as join) remains to be seen, and possibly 
discussed, if anyone is interested, but it probably would belong on 
python-ideas, since it would not address an immediate porting issue.

Assuming an ASCII-in-bytes format string (but with no contributed text 
to the result) one could write something like

b"%{koi7}s%{00}v%{big5}d%{00}v%{ShiftJIS}s%{0000}v%b" / (
    cyrillic, len( blob ), japanese, blob )

So the encodings to be applied to each of the input parameters could be 
explicitly specified.

The %{00}v items are literal bytes interpolated into the output; they
are written in the format string in ASCII as hex, two characters per
byte. The example puts a null byte or two between items in the output.
Note that the number would use Chinese digits in the big5 encoding,
though I don't know whether the Chinese use their own digits or ASCII
ones these days, or what base they use. (I guess it was the Babylonians
who used base 60, from which our timekeeping and angular measures were
derived.)
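
To make the idea concrete, here is a minimal sketch of such a mechanism as
a plain Python 3 function. Everything in it is an assumption for
illustration: the name interpolate, the exact grammar (%{enc}s, %{enc}d,
%{hex}v and %b), and the choice to render numbers as ordinary decimal
digits in the named encoding rather than native digits. It also uses codec
names that Python ships with; as far as I know there is no "koi7" codec,
so koi8_r stands in for it.

```python
import re

# Hypothetical grammar: %{enc}s, %{enc}d, %{hex}v, or a bare %b.
_SPEC = re.compile(rb"%(?:\{([^}]*)\}([sdv])|(b))")


def interpolate(fmt, args):
    """Build bytes from a format string that contributes no text itself."""
    out = []
    used = 0   # how many arguments have been consumed so far
    pos = 0
    while pos < len(fmt):
        m = _SPEC.match(fmt, pos)
        if m is None:
            # Anything that is not a specifier is an error by design.
            raise ValueError("non-specifier text at offset %d" % pos)
        param, kind, raw = m.group(1), m.group(2), m.group(3)
        if kind == b"v":
            # Literal bytes, written as hex; consumes no argument.
            out.append(bytes.fromhex(param.decode("ascii")))
        else:
            if used >= len(args):
                raise TypeError("not enough arguments for format string")
            value = args[used]
            used += 1
            if raw:                      # %b: bytes copied through untouched
                out.append(bytes(value))
            elif kind == b"s":           # str encoded as requested
                out.append(value.encode(param.decode("ascii")))
            else:                        # %d: assumed plain decimal digits
                out.append(str(int(value)).encode(param.decode("ascii")))
        pos = m.end()
    if used != len(args):
        raise TypeError("not all arguments converted")
    return b"".join(out)


blob = b"\xff\xfe\x01"
cyrillic = "\u0434\u0430"     # two Cyrillic letters
japanese = "\u65e5\u672c"     # two kanji
result = interpolate(
    b"%{koi8_r}s%{00}v%{big5}d%{00}v%{shift_jis}s%{0000}v%b",
    (cyrillic, len(blob), japanese, blob),
)
```

A format string such as b"hello %b" is rejected with ValueError, which is
the point: the format string is only a rule-book, so its own encoding never
leaks into the result.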

So there _could be_ a coherent way to offer an interpolation mechanism 
that is pure binary, and allows selection of encoding of str data, if 
and as needed.  One specifier could even be an encoding to apply to any 
format specifiers that don't include an encoding, so in the typical case 
of dealing with a single language output, the appropriate encoding could 
be set at the beginning of the format specification and overridden by 
particular specifiers if need be. But while there _could be_ such an 
interpolation mechanism, it isn't compatible with Python 2, and the jury 
hasn't decided whether such a thing is sufficiently more useful than 
concatenation to be worth implementing.  A different operator might be 
required, or the whole thing could be a function instead of an operator, 
with a similar format specification, or one more like the minilanguage
used with format in Python 3.
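
As a rough illustration of the default-encoding idea in function form,
here is a second small sketch. The name bformat and the %{enc}e specifier
for "set the default encoding" are inventions for this example, not
anything proposed in PEP 460; %s uses the current default, %{enc}s
overrides it for one item, and %{hex}v inserts literal bytes.

```python
import re

# Hypothetical grammar: %{enc}e sets a default, %s / %{enc}s encode a str,
# %{hex}v inserts literal bytes. Nothing else is allowed.
_SPEC = re.compile(rb"%(?:\{([^}]*)\})?([esv])")


def bformat(fmt, *args):
    """Function form of the sketch, with a settable default encoding."""
    default = None
    out = []
    used = 0
    pos = 0
    while pos < len(fmt):
        m = _SPEC.match(fmt, pos)
        if m is None:
            raise ValueError("non-specifier text at offset %d" % pos)
        param = m.group(1).decode("ascii") if m.group(1) is not None else None
        kind = m.group(2)
        if kind == b"e":
            if not param:
                raise ValueError("%e needs an encoding name")
            default = param
        elif kind == b"v":
            if param is None:
                raise ValueError("%v needs hex digits")
            out.append(bytes.fromhex(param))
        else:
            # An explicit encoding on the specifier wins over the default.
            encoding = param or default
            if encoding is None:
                raise ValueError("no encoding given and no default set")
            if used >= len(args):
                raise TypeError("not enough arguments for format string")
            out.append(args[used].encode(encoding))
            used += 1
        pos = m.end()
    if used != len(args):
        raise TypeError("not all arguments converted")
    return b"".join(out)


# Shift JIS is set once up front; only the last item overrides it.
packed = bformat(
    b"%{shift_jis}e%s%{00}v%s%{00}v%{utf-8}s",
    "\u65e5\u672c", "\u6771\u4eac", "ok",
)
```

Whether this reads better than b"".join(...) over explicitly encoded pieces
is exactly the open question.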