[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Fri Jan 10 18:17:02 CET 2014

(Sorry if this messes up the thread order; it is meant as a reply to the
original RFC.)

Dear list,

newbie here. After much hesitation I decided to put forward a use case
which bothers me about the current proposal. Disclaimer: I happen to be
writing a library which is directly affected by this.

As you may know, PDF is a byte-oriented format, and an integer or
floating-point number is written into it as plain ASCII digits, for example
"100" or "1.23".

However, the proposal drops "%d", "%f" and "%x" formats and the suggested
workaround for writing down a number is to use ".encode('ascii')", which I
think has two problems:

One is that it needs to construct one additional object per formatting
operation compared to Python 2; it is not uncommon for a PDF file to
contain millions of numbers.

The second problem is that, in my eyes, it is very counter-intuitive to
require the use of str only to get formatting on bytes. Consider the case
where a large bytes object is created out of many smaller bytes objects. If
I wanted to format just one part, I would have to use str for it instead.
For example:

    content = b''.join([
        b'header',
        b'some dictionary structure',
        b'part 1 abc',
        ('part 2 %.3f' % number).encode('ascii'),
        b'trailer'])

In the case of PDF, the embedding of an image into PDF looks like:

    10 0 obj
      << /Type /XObject
         /Width 100
         /Height 100
         /Alternates 15 0 R
         /Length 2167
      >>
    stream
    ...binary image data...
    endstream
    endobj

Because of the binary image data, it makes sense to store such a structure
as bytes. On the other hand, there may well be another "obj" which contains
the coordinates of Bezier paths:

    11 0 obj
    ...
    stream
    0.5 0.1 0.2 RG
    300 300 m
    300 400 400 400 400 300 c
    b
    endstream
    endobj
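
To make the mix of "binary" and "text" concrete, here is a rough sketch of
how such an object could be assembled under the current proposal. The names
ascii_num and make_stream_object are made up for illustration, and the
dictionary is reduced to /Length only; this is not a complete PDF writer.

```python
def ascii_num(value):
    # Hypothetical helper: format through str, then encode to ASCII.
    # This creates the extra intermediate object for every number.
    return ('%g' % value).encode('ascii')


def make_stream_object(obj_num, operations):
    # operations is a list of (operands, operator) pairs, where operands
    # are numbers and operator is already a bytes object such as b'RG'.
    lines = [
        b' '.join([ascii_num(v) for v in operands] + [operator])
        for operands, operator in operations
    ]
    payload = b'\n'.join(lines)
    # Even the header needs the str round trip to write two integers.
    header = ('%d 0 obj\n<< /Length %d >>\nstream\n'
              % (obj_num, len(payload))).encode('ascii')
    return header + payload + b'\nendstream\nendobj\n'


obj = make_stream_object(11, [
    ((0.5, 0.1, 0.2), b'RG'),
    ((300, 300), b'm'),
    ((300, 400, 400, 400, 400, 300), b'c'),
    ((), b'b'),
])
```

Every number in the stream goes through str and back, even though the
result never leaves the bytes domain.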

To summarize, there are cases which mix "binary" and "text" and, in my
opinion, dropping the bytes-formatting of numbers makes it more complicated
than it was. I would appreciate any explanation on how:

    b'%.1f %.1f %.1f RG' % (r, g, b)

is more confusing than:

    b'%s %s %s RG' % tuple(map(lambda x: (u'%.1f' % x).encode('ascii'),
                               (r, g, b)))
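
For the simple case where the whole line is ASCII text, the workaround can
of course be hidden behind a small wrapper. The sketch below assumes a
hypothetical helper called fmt_bytes; it is not part of the proposal or of
any library.

```python
def fmt_bytes(template, *values):
    # template is a str; the formatting happens entirely on str and the
    # result is encoded once at the end.
    return (template % values).encode('ascii')


r, g, b = 0.5, 0.1, 0.2
line = fmt_bytes('%.1f %.1f %.1f RG', r, g, b)
```

This only helps while no binary data has to be interpolated into the same
template; as soon as one operand is a bytes object, the str detour no longer
works and the pieces have to be joined by hand again.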

A similar situation exists for HTTP ("Content-Length: 123") and ASCII STL
("vertex 1.0 0.0 0.0").
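
As an illustration, here is how those two lines might be produced with the
suggested workaround. The function names are invented for this sketch and
the line endings are my assumption about typical usage.

```python
def content_length_header(body):
    # body is the bytes payload of an HTTP message.
    return ('Content-Length: %d\r\n' % len(body)).encode('ascii')


def stl_vertex(x, y, z):
    # One vertex line of an ASCII STL file.
    return ('vertex %.1f %.1f %.1f\n' % (x, y, z)).encode('ascii')


header = content_length_header(b'x' * 123)
vertex = stl_vertex(1, 0, 0)
```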

Thanks and have a nice day,

Juraj Sukop

PS: In case the proposal ends up not including number formatting, it would
be nice for it to list a set of guidelines or examples on how to proceed
with porting Python 2 formats to Python 3.