[Tutor] Unicode question
János Juhász
janos.juhasz at VELUX.com
Wed Sep 12 14:00:30 CEST 2007
Dear Kent,
thanks for your response.
It is clear now.
> As a mnemonic I think of Unicode as pure unencoded data. (This is *not*
> accurate, it is a memory aid!) Then it's easy to remember that decode()
> removes encoding == convert to Unicode, encode() adds encoding ==
> convert from Unicode.
So I had to convert the cp852-encoded text file to Unicode, which can be done with
page.decode('cp852')
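A minimal sketch of that direction rule, written so that it runs unchanged on Python 2.6+ and Python 3. The sample bytes and the variable names are made up for illustration; only the 'cp852' codec name comes from the script below.

```python
# Bytes as they would sit in a cp852-encoded file: "J", a-acute, "nos".
raw = b'J\xa0nos'

# decode(): bytes in a known encoding -> Unicode text.
text = raw.decode('cp852')

# encode(): Unicode text -> bytes in whatever encoding is requested.
as_utf8 = text.encode('utf-8')
round_trip = text.encode('cp852')
```

Here round_trip equals raw again, while as_utf8 holds the same text as different bytes.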
There was another problem with o with double acute and O with double acute,
as they were missing from the font files, so the script replaces them with
o and O with diaeresis.
It works well now.
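The substitution can be sketched on its own, roughly as follows. The byte values are the ones used in the script below; the helper name replace_missing_glyphs and the SUBSTITUTES table are hypothetical names chosen for this illustration.

```python
# cp852 byte values, as used in the script: the double-acute letters are
# swapped for the diaeresis letters that the font does contain.
SUBSTITUTES = {
    b'\x8a': b'\x99',  # O with double acute -> O with diaeresis
    b'\x8b': b'\x94',  # o with double acute -> o with diaeresis
}

def replace_missing_glyphs(raw):
    """Return raw cp852 bytes with the missing letters substituted."""
    for missing, fallback in SUBSTITUTES.items():
        raw = raw.replace(missing, fallback)
    return raw

sample = b'\x8a\x8b'
fixed = replace_missing_glyphs(sample)
```

Doing the replacement on the raw bytes before decoding, as the script does, keeps the byte values tied to cp852.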
from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib import pagesizes
from reportlab.lib.units import cm
PAGE_HEIGHT=defaultPageSize[1]
import copy
styles = getSampleStyleSheet()
InvStyle = copy.deepcopy(styles["Normal"])
InvStyle.fontSize = 8
InvStyle.leading = 9
InvStyle.fontName = 'Courier'
InvLineNum = 92
im = Image("bimbambumm.bmp", width=100, height=35)
im.hAlign = 'RIGHT'
def MakePdfInvoice(InvoiceNum, pages):
    PdfInv = []
    for page in pages:
        PdfInv.append(im)
        PdfInv.append(Preformatted(page, InvStyle))
        PdfInv.append(PageBreak())
    PdfInv = PdfInv[:-1]    # drop the trailing PageBreak
    doc = SimpleDocTemplate(InvoiceNum)
    doc.topMargin = 1*cm
    doc.bottomMargin = 0
    doc.leftMargin = 0
    doc.rightMargin = 0
    doc.build(PdfInv)
def BreakIntoPages(content):
    while len(content) > InvLineNum:
        page = content[:InvLineNum]
        content = content[InvLineNum:]
        yield page
    else:
        yield content
if __name__ == '__main__':
    content = open('invoice01_0707.txt').readlines()
    # o/O with double acute are missing from the font: use o/O with diaeresis
    content = [line.replace('\x8a','\x99').replace('\x8b','\x94')
               for line in content]
    pages = []
    for page in BreakIntoPages(content):
        page = ''.join(page)
        pages.append(page.decode('cp852'))
    MakePdfInvoice('test.pdf', pages)
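For comparison, the page splitting could also be sketched with slicing over a range. This is an alternative to BreakIntoPages, not the code used above; the name break_into_pages and its lines_per_page parameter are hypothetical.

```python
def break_into_pages(lines, lines_per_page):
    """Yield consecutive slices of at most lines_per_page lines."""
    for start in range(0, len(lines), lines_per_page):
        yield lines[start:start + lines_per_page]

# Five short "lines" split into pages of two lines each.
demo_pages = list(break_into_pages(['a', 'b', 'c', 'd', 'e'], 2))
```

One difference from BreakIntoPages: for an empty input this version yields nothing, whereas the while/else version yields one empty page.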
Kent Johnson <kent37 at tds.net> wrote on 2007.09.11 15:49:24:
> János Juhász wrote:
> > Dear All,
> >
> > I would like to convert my DOS txt file into pdf with reportlab.
> > The file can be seen correctly in Central European (DOS) encoding in
> > Explorer.
> >
> > My winxp uses cp852 as default codepage.
> >
> > When I open the txt file in notepad and set OEM/DOS script for terminal
> > fonts, it shows the file correctly.
> >
> > I tried to convert the file in the following way:
> >
> Use decode() here, not encode().
> decode() goes towards Unicode
> encode() goes away from Unicode
> As a mnemonic I think of Unicode as pure unencoded data. (This is *not*
> accurate, it is a memory aid!) Then it's easy to remember that decode()
> removes encoding == convert to Unicode, encode() adds encoding ==
> convert from Unicode.
> > MakePdfInvoice('test.pdf', page)
> >
> > But it raised exception:
> > ordinal not in range(128)
> When you call encode on a string (instead of a unicode object) the
> string is first decoded to Unicode using ascii encoding. This usually
> fails.
> Kent
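The hidden step Kent describes can be made explicit in a small sketch. In Python 2, str.encode() performs the ascii decode implicitly; here it is written out so the same lines also run on Python 3, where byte strings have no encode() method. The sample byte and the variable names are made up for illustration.

```python
# One non-ASCII byte, as found in a cp852 file.
raw = b'\x8b'

try:
    # This is the step Python 2 performs behind the scenes before encoding.
    raw.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)
```

The message ends with the same "ordinal not in range(128)" text as the exception quoted above.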