[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Sat Jan 11 05:14:25 CET 2014

On 11Jan2014 00:43, Juraj Sukop <juraj.sukop at gmail.com> wrote:
> On Fri, Jan 10, 2014 at 11:12 PM, Victor Stinner
> <victor.stinner at gmail.com> wrote:
> > Why not build "10 0 obj ... stream" and "endstream endobj" in
> > Unicode and then encode to ASCII? Example:
> >
> > data = b''.join((
> >   ("%d %d obj ... stream" % (10, 0)).encode('ascii'),
> >   binary_image_data,
> >   ("endstream endobj").encode('ascii'),
> > ))
> 
> The key is "encode to ASCII", which means that the result is bytes. Then
> there is this "11 0 obj", which should also be bytes. But it has no
> "binary_image_data", only lots of numbers waiting to be converted somehow
> to bytes. I already mentioned the problems with ".encode('ascii')", but it
> does not stop there. Numbers may appear not only inside "streams" but almost
> anywhere: the header contains the PDF version, an image has to have a "width"
> and "height", and at the end of the PDF there is a structure containing the
> offsets of all of the objects in the file. Basically, having to
> ".encode('ascii')" every possible number is not exactly simple or pretty.

Hi Juraj,

Might I suggest a helper function (outside the PEP scope) instead
of arguing for support for %f et al?

Thus:

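  def bytify(things, encoding='ascii'):
    # Yield each item as bytes: bytes pass through untouched, anything
    # else is converted with str() and then encoded.
    for thing in things:
      if isinstance(thing, bytes):
        yield thing
      else:
        yield str(thing).encode(encoding)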

Then one's embedding in PDF might become, more readably:

  data = b' '.join( bytify( [ 10, 0, 'obj', binary_image_data, ... ] ) )

Of course, bytify might be augmented with whatever encoding facilities
might suit your needs.
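
For instance, here is a minimal sketch of one such augmentation, since %f
was the sticking point. The float_format parameter, the bytearray handling,
and the sample values are my own assumptions for illustration, not anything
proposed in the PEP:

```python
def bytify(things, encoding='ascii', float_format='%.2f'):
    """Yield each item of `things` as bytes.

    bytes/bytearray pass through; floats are rendered with `float_format`
    (a hypothetical parameter); everything else goes through str().
    """
    for thing in things:
        if isinstance(thing, (bytes, bytearray)):
            yield bytes(thing)
        elif isinstance(thing, float):
            # Format as text first, then encode the result.
            yield (float_format % thing).encode(encoding)
        else:
            yield str(thing).encode(encoding)


# Stand-in for real image bytes; any binary payload would do.
binary_image_data = b'\x89\x00\xff'

header = b' '.join(bytify([10, 0, 'obj']))
scale = b' '.join(bytify([0.5, 0, 0, 0.5, 0, 0, 'cm']))
data = b'\n'.join([header, b'stream', binary_image_data,
                   b'endstream', b'endobj'])
```

Here header comes out as b'10 0 obj' and scale as b'0.50 0 0 0.50 0 0 cm'.
The binary payload never passes through str(), so it is not mangled.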

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

We tend to overestimate the short-term impact of technological change and
underestimate its long-term impact.     - Amara's Law