newbie - HTML character codes
Roberto Bonvallet
Roberto.Bonvallet at cern.ch
Wed Dec 13 09:11:43 EST 2006
ardief wrote:
[...]
> And I want the HTML char codes to turn into their equivalent plain
> text. I've looked at the newsgroup archives, the cookbook, the web in
> general and can't manage to sort it out. I thought doing something like
> this -
>
> file = open('filename', 'r')
It's not a good idea to use 'file' as a variable name, since you are
shadowing the builtin type of the same name.
> ofile = open('otherfile', 'w')
>
> done = 0
>
> while not done:
>     line = file.readline()
>     if 'THE END' in line:
>         done = 1
>     elif '—' in line:
>         line.replace('—', '--')
The replace method doesn't modify the 'line' string, it returns a new string.
>         ofile.write(line)
>     else:
>         ofile.write(line)
This should work (untested):
    infile = open('filename', 'r')
    outfile = open('otherfile', 'w')
    for line in infile:
        outfile.write(line.replace('—', '--'))
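To make the point about replace concrete, here is a small sketch in Python 3 syntax. It assumes the entity appears in the file as the literal text '&mdash;'; the function name convert_dashes and the in-memory sample are made up for illustration. It shows that the return value has to be captured, and that the loop can still stop at the 'THE END' marker as the original attempt did:

```python
import io

def convert_dashes(lines, marker='THE END'):
    """Yield lines with '&mdash;' replaced by '--', stopping at marker.

    Hypothetical helper: the name and the default marker are assumptions.
    """
    for line in lines:
        if marker in line:
            break  # like the original loop, the marker line is not written
        yield line.replace('&mdash;', '--')

text = 'a&mdash;b'
text.replace('&mdash;', '--')            # result discarded: text is unchanged
unchanged = text
changed = text.replace('&mdash;', '--')  # result kept in a new name

# An in-memory stand-in for the input file.
sample = io.StringIO('one&mdash;two\nplain\nTHE END\nignored\n')
result = list(convert_dashes(sample))
```

With real files, the same generator could be fed an open file object and its output passed to outfile.writelines().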
But I think the best approach is to use an existing application or library
that solves the problem. recode(1) can easily convert to and from HTML
entities:
    recode html..utf-8 filename
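If staying within Python is preferable, newer versions (3.4 and later) include html.unescape in the standard library, which decodes named and numeric entities in one call. This is not the tool suggested above, only a sketch of the same "use an existing library" idea; the helper name unescape_lines is invented for illustration:

```python
import html

def unescape_lines(lines):
    """Return the lines with HTML entities decoded to plain characters.

    Hypothetical helper built on the standard library's html.unescape.
    """
    return [html.unescape(line) for line in lines]

# Named (&mdash;, &amp;) and numeric (&#8212;) entities are both handled.
decoded = unescape_lines(['a &mdash; b\n', 'x &amp; y &#8212; z\n'])
```

Unlike the replace approach, this yields a real dash character rather than '--', so a further replace would still be needed if ASCII output is the goal.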
Best regards.
--
Roberto Bonvallet