ascii txt to LaTeX

marco marco.rossini at gmx.ch
Wed Apr 23 23:40:34 CEST 2003


i wrote a python program that converts word-wrapped(!) ascii text into 
LaTeX format. it doesn't do much: it finds(!) paragraph breaks and marks 
them with \n\n, replaces quotation marks with LaTeX quotes, and replaces 
umlauts with LaTeX macros. it handles the routine stuff: why do boring 
things myself if my computer can do them for me?
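
the paragraph detection is the only step that involves guessing. a minimal sketch of the idea, written for Python 3 (the script below is Python 2); the name `unwrap` is hypothetical and not part of the script. a line is taken to end a paragraph if the first word of the next line would still have fit on it, or if the next line is blank:

```python
def unwrap(lines):
    """Join word-wrapped lines, guessing paragraph breaks from line lengths."""
    lines = [line.strip() for line in lines]
    # the longest line is taken as the wrap width
    width = max((len(line) for line in lines), default=0)
    out = []
    for cur, nxt in zip(lines, lines[1:]):
        if not cur:
            continue  # blank lines contribute nothing themselves
        first_word = nxt.split(" ", 1)[0]
        room_left = width - len(cur)
        # if the next line's first word (plus a space) would have fit,
        # the line was cut short on purpose: paragraph break
        if not nxt or len(first_word) + 1 <= room_left:
            out.append(cur + "\n\n")
        else:
            out.append(cur + " ")
    if lines:
        out.append(lines[-1])  # the last line has no successor
    return "".join(out)
```

the guess can go wrong when a paragraph's last line happens to fill the whole width, so the result is worth a quick look.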

feel free to use it (GPL). i would also appreciate critique of my code.

someone has probably written such a program (or a better one) before, but 
i don't care: it was easy for me to do and it's all _I_ need.

        marco

#!/usr/bin/python

# Converts a regular word-wrapped ascii text into formatted LaTeX:
# detects paragraphs, replaces quotation marks, replaces umlauts

# Program written by Marco Rossini <marco.rossini at gmx.ch>
# Copyright: GPL

from sys import argv, exit
from string import find, join, strip, replace, whitespace, punctuation

if len(argv) != 2: exit("txt2latex: Argument error!")

try: f = open(argv[1],"r")
except IOError: exit("txt2latex: Cannot open file!")


# GET THE MAXIMAL NUMBER OF CHARACTERS PER LINE
array = f.readlines()
linelength = 0
for i in range(len(array)):
    array[i] = strip(array[i])
    if len(array[i]) > linelength: linelength = len(array[i])

# GUESS IF IT'S A PARAGRAPH BREAK
# a line ends a paragraph if the first word of the next line would
# still have fit on it, or if the next line is empty
for i in range(len(array)-1):
    nleft = linelength - len(array[i])
    nright = find(array[i+1]," ")
    if nright == -1: nright = len(array[i+1])
    # if it is, append \n\n to the line, else a space
    if nright+1 <= nleft or len(array[i+1]) == 0:
        if len(array[i]) > 0: array[i] += '\n\n'
    else:
        array[i] += ' '

# the lines are joined, the text is NOT word wrapped anymore
text = join(array,"")

# Replace quotation marks with LaTeX quotes
i = 0
while i < len(text):
    # neighbouring characters; the ends of the text count as whitespace
    if i > 0: before = text[i-1]
    else: before = ' '
    if i+1 < len(text): after = text[i+1]
    else: after = ' '
    # ... before a word
    if text[i] == '"' and before in whitespace:
        text = text[:i] + "``" + text[i+1:]
        i += 1
    # ... before a word (single)
    elif text[i] == "'" and before in whitespace:
        text = text[:i] + "`" + text[i+1:]
    # ... after a word, followed by whitespace or punctuation
    elif text[i] == '"' and (after in whitespace or after in punctuation):
        text = text[:i] + "''" + text[i+1:]
        i += 1
    i += 1

# Replacements for umlauts. Modify them as you like.
text = replace(text,"ä","\\\"{a}")
text = replace(text,"Ä","\\\"{A}")
text = replace(text,"ö","\\\"{o}")
text = replace(text,"Ö","\\\"{O}")
text = replace(text,"ü","\\\"{u}")
text = replace(text,"Ü","\\\"{U}")

# handle output
print text
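
for readers on Python 3, the quote and umlaut steps might be sketched with regular expressions and a translation table instead of the character loop. this is a different technique from the script above, not a port of it: the function names are hypothetical, and the closing-quote rule ("not followed by a word character") only approximates the script's "followed by whitespace or punctuation" test.

```python
import re

# umlaut -> LaTeX macro, same mapping as the script above
UMLAUTS = str.maketrans({
    "ä": '\\"{a}', "Ä": '\\"{A}',
    "ö": '\\"{o}', "Ö": '\\"{O}',
    "ü": '\\"{u}', "Ü": '\\"{U}',
})

def latex_quotes(text):
    # opening quotes: at the start of the text or right after whitespace
    text = re.sub(r'(?<!\S)"', "``", text)
    text = re.sub(r"(?<!\S)'", "`", text)
    # closing double quotes: not followed by a word character
    text = re.sub(r'"(?!\w)', "''", text)
    return text

def latex_umlauts(text):
    return text.translate(UMLAUTS)

def to_latex(text):
    # quotes first: the umlaut macros introduce new '"' characters
    # that must not be mistaken for quotation marks
    return latex_umlauts(latex_quotes(text))
```

the order in `to_latex` mirrors the script: the quotes are converted before the umlaut macros add their own `"` characters.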




