[Tutor] Unicode question

Wed Sep 12 14:00:30 CEST 2007

Dear Kent,

thanks for your response.
It is clear now.

> As a mnemonic I think of Unicode as pure unencoded data. (This is *not*
> accurate, it is a memory aid!) Then it's easy to remember that decode()
> removes encoding == convert to Unicode, encode() adds encoding ==
> convert from Unicode.

So I had to convert the cp852-encoded text file into Unicode, which can be
done with
page.decode('cp852')
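
As a minimal sketch of the decode()/encode() direction, here is the same
round trip written for Python 3, where file contents read in binary mode
are bytes. The sample byte string is made up for illustration; only the
codec name 'cp852' comes from the script in this message.

```python
# Python 3 sketch: decode() goes towards Unicode, encode() goes away from it.
raw = b"sz\x94veg"             # hypothetical sample; 0x94 is "ö" in cp852

text = raw.decode("cp852")     # bytes -> Unicode text
back = text.encode("cp852")    # Unicode text -> cp852 bytes again

print(text)
print(back == raw)
```

In Python 2, as used in the script below, the same decode() call is made
on a plain str object instead of bytes.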

There was another problem with o with double acute and O with double
acute: they are missing from the font files, so the script below replaces
them with o and O with diaeresis before decoding.
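
The byte values involved can be checked with a short Python 3 sketch. It
assumes the standard cp852 code page, where 0x8a and 0x8b are the
double-acute letters and 0x99 and 0x94 are the diaeresis letters; the
sample bytes are made up for illustration.

```python
# Python 3 sketch: swap the double-acute bytes for diaeresis bytes
# while the data is still cp852-encoded, then decode.
raw = b"\x8a \x8b"                      # hypothetical sample: "Ő ő" in cp852

original = raw.decode("cp852")          # what the file really contains
patched = raw.replace(b"\x8a", b"\x99").replace(b"\x8b", b"\x94")
fallback = patched.decode("cp852")      # what ends up in the PDF

print(original)
print(fallback)
```

The substitution changes the text, so it is only a workaround for fonts
that lack the two glyphs.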

It works well now.

from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib import pagesizes
from reportlab.lib.units import cm

import copy

PAGE_HEIGHT = defaultPageSize[1]

styles = getSampleStyleSheet()
InvStyle = copy.deepcopy(styles["Normal"])
InvStyle.fontSize = 8
InvStyle.leading = 9
InvStyle.fontName = 'Courier'
InvLineNum = 92

im = Image("bimbambumm.bmp", width=100, height=35)
im.hAlign = 'RIGHT'

def MakePdfInvoice(InvoiceNum, pages):
    PdfInv = []
    for page in pages:
        PdfInv.append(im)
        PdfInv.append(Preformatted(page, InvStyle))
        PdfInv.append(PageBreak())
    PdfInv = PdfInv[:-1]

    doc = SimpleDocTemplate(InvoiceNum)
    doc.topMargin = 1*cm
    doc.bottomMargin = 0
    doc.leftMargin = 0
    doc.rightMargin = 0
    doc.build(PdfInv)

def BreakIntoPages(content):
    while len(content) > InvLineNum:
        page = content[:InvLineNum]
        content = content[InvLineNum:]
        yield page
    else:
        yield content

if __name__ == '__main__':
    content = open('invoice01_0707.txt').readlines()
    content = [line.replace('\x8a','\x99').replace('\x8b','\x94')
               for line in content]
    pages = []
    for page in BreakIntoPages(content):
        page = ''.join(page)
        pages.append(page.decode('cp852'))
    MakePdfInvoice('test.pdf', pages)
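
The paging step in BreakIntoPages can also be sketched with slicing in
fixed steps. This is an alternative illustration, not the code used above;
the names break_into_pages and lines_per_page are hypothetical.

```python
# Sketch: split a list of lines into pages of at most lines_per_page lines.
def break_into_pages(lines, lines_per_page):
    for start in range(0, len(lines), lines_per_page):
        yield lines[start:start + lines_per_page]

# 200 dummy lines at 92 lines per page give two full pages and a short one.
sample = ["line %d\n" % n for n in range(200)]
page_lengths = [len(page) for page in break_into_pages(sample, 92)]
print(page_lengths)
```

One difference from BreakIntoPages: for empty input this version yields no
page at all, while the while/else version yields one empty page.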

Kent Johnson <kent37 at tds.net> wrote on 2007.09.11 15:49:24:

> János Juhász wrote:
> > Dear All,
> >
> > I would like to convert my DOS txt file into pdf with reportlab.
> > The file can be seen correctly in Central European (DOS) encoding in
> > Explorer.
> >
> > My winxp uses cp852 as default codepage.
> >
> > When I open the txt file in notepad and set OEM/DOS script for terminal
> > fonts, it shows the file correctly.
> >
> > I tried to convert the file in the following way:
> >

> Use decode() here, not encode().
> decode() goes towards Unicode
> encode() goes away from Unicode

> As a mnemonic I think of Unicode as pure unencoded data. (This is *not*
> accurate, it is a memory aid!) Then it's easy to remember that decode()
> removes encoding == convert to Unicode, encode() adds encoding ==
> convert from Unicode.

> >     MakePdfInvoice('test.pdf', page)
> >
> > But it raised exception:
> > ordinal not in range(128)

> When you call encode on a string (instead of a unicode object) the
> string is first decoded to Unicode using ascii encoding. This usually
> fails.

> Kent
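
The "ordinal not in range(128)" message quoted above can be reproduced
directly. This Python 3 sketch performs explicitly the ascii decode that
Python 2 does implicitly when encode() is called on a plain string; the
sample byte is made up for illustration.

```python
# Python 3 sketch: decoding a non-ASCII byte with the ascii codec fails.
raw = b"\x8a"                  # hypothetical sample; any byte >= 0x80 will do

try:
    raw.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)         # ends with "ordinal not in range(128)"

print(message)
```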