[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Mariano Reingart reingart at gmail.com
Sat Jan 11 23:13:39 CET 2014


On Fri, Jan 10, 2014 at 9:13 PM, Juraj Sukop <juraj.sukop at gmail.com> wrote:

>
> On Sat, Jan 11, 2014 at 12:49 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
>> Also, when you say you've never encountered UTF-16 text in PDFs, it
>> sounds like those people who've never encountered any non-ASCII data in
>> their programs.
>
>
> Let me clarify: one does not think in terms of "writing text in Unicode"
> in PDF. Instead, one records the sequence of "character codes" which
> correspond to "glyphs", or the glyph IDs directly. That's because one
> Unicode character may have more than one glyph, and several characters
> can be shown as one glyph.
>

AFAIK (and just for the record), a PDF can contain both Latin-1 and UTF-16
text (and other encodings too), depending on the font used:

/Encoding /WinAnsiEncoding (mostly Latin-1, "standard" fonts)
/Encoding /Identity-H (generally UTF-16, for "embedded" Unicode TrueType
fonts)

For example, in PyFPDF (a PHP library ported to Python), the following code
writes out text that may be in either of two encodings:

s = sprintf("BT %.2f %.2f Td (%s) Tj ET", x*self.k, (self.h-y)*self.k, txt)

https://code.google.com/p/pyfpdf/source/browse/fpdf/fpdf.py#602

In Python 2, txt is just a str, but in Python 3, handling everything as a
Latin-1 string obviously doesn't work for TTF fonts in this case.
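
To make that concrete, here is a minimal Python 3 sketch (not the PyFPDF
code) of how the sprintf line above could keep the numeric operands as
ASCII while encoding txt per font. The names escape_pdf_string, text_op
and unicode_font are hypothetical, the escaping is deliberately minimal,
and using Latin-1 for WinAnsiEncoding and UTF-16BE for Identity-H is a
simplification of what the font's encoding really requires:

```python
def escape_pdf_string(raw: bytes) -> bytes:
    """Escape the bytes that are special inside a PDF literal string."""
    # Backslash must be handled first so later escapes are not doubled.
    return (raw.replace(b"\\", b"\\\\")
               .replace(b"(", b"\\(")
               .replace(b")", b"\\)")
               .replace(b"\r", b"\\r"))


def text_op(x: float, y: float, txt: str, unicode_font: bool) -> bytes:
    """Build a 'BT ... Tj ET' operator sequence as bytes."""
    # Assumption: UTF-16BE for /Identity-H fonts, Latin-1 otherwise.
    encoding = "utf-16-be" if unicode_font else "latin-1"
    payload = escape_pdf_string(txt.encode(encoding))
    # The operands are plain ASCII; only the string payload varies.
    head = ("BT %.2f %.2f Td (" % (x, y)).encode("ascii")
    return head + payload + b") Tj ET"
```

The escaping is applied after encoding, since a UTF-16 payload can contain
the byte values of "(", ")" or "\" inside otherwise unrelated characters.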

Best regards

Mariano Reingart
http://www.sistemasagiles.com.ar
http://reingart.blogspot.com