piping input to an external script
tim.arnold at sas.com
Tue May 12 18:46:47 CEST 2009
"Dave Angel" <davea at ieee.org> wrote in message
news:mailman.25.1242113076.8015.python-list at python.org...
> Tim Arnold wrote:
>> Hi, I have some html files that I want to validate by using an external
>> script 'validate'. The html files need a doctype header attached before
>> validation. The files are in utf8 encoding. My code:
>> import os,sys
>> import codecs,subprocess
>> HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
>> filename = 'mytest.html'
>> fd = codecs.open(filename,'rb',encoding='utf8')
>> s = HEADER + fd.read()
>> p = subprocess.Popen(['validate'],
>>                      stdin=subprocess.PIPE, stdout=subprocess.PIPE)
>> validate = p.communicate(unicode(s,encoding='utf8'))
>> print validate
>> I get lots of lines like this:
>> Error at line 1, character 66:\tillegal character number 0
>> etc etc.
>> But if I run 'cat mytest.html | validate' in a terminal, I get reasonable
>> output. My subprocess code must be wrong, but I could use some help
>> seeing what the problem is.
>> Python 2.5.1, FreeBSD 6
> The usual rule in debugging: split the problem into two parts, and test
> each one separately, starting with the one you think most likely to be the
> problem.
> In this case the obvious place to split is with the data you're passing to
> the communicate call. I expect it's already wrong, long before you hand
> it to the subprocess. So write it to a file instead, and inspect it with
> a binary file viewer. And of course test it manually with your validate
> program. Is validate really expecting a Unicode stream on stdin?
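Dave's suggestion (capture the exact data, then look at it byte by byte) can be sketched as follows. This is a Python 3 illustration rather than the Python 2.5 code from the thread, and the helper name dump_payload and the temporary file location are hypothetical. Counting zero bytes is a quick first check here, since the validator complained about "illegal character number 0".

```python
import os
import tempfile


def dump_payload(payload: bytes, path: str, preview: int = 16) -> int:
    """Write the exact bytes meant for the subprocess to `path`, print a
    hex preview of the first `preview` bytes, and return the NUL count."""
    with open(path, "wb") as f:
        f.write(payload)
    # A rough stand-in for a binary file viewer: two hex digits per byte.
    print(" ".join(f"{b:02x}" for b in payload[:preview]))
    return payload.count(b"\x00")


if __name__ == "__main__":
    header = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
    text = header + "<p>caf\u00e9</p>\n"
    # Hypothetical location for the dump file.
    path = os.path.join(tempfile.gettempdir(), "payload.bin")
    print(dump_payload(text.encode("utf-8"), path))      # 0: no NUL bytes
    print(dump_payload(text.encode("utf-16-le"), path))  # many NUL bytes
```

UTF-8 only produces a zero byte where the text itself contains U+0000, while wide encodings such as UTF-16 or UTF-32 put zero bytes next to most ASCII characters. A nonzero count therefore suggests the data was not plain UTF-8 (or ASCII) bytes by the time it reached the child.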
Good advice from everyone. The example was simpler than my actual situation,
but it did show the problem. Dave's final question was the right one: I
needed to pass the HTML content as an encoded byte string (str), not a
unicode object:
import codecs, subprocess
HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
filename = 'mytest.html'
fd = codecs.open(filename,'rb',encoding='utf8')
s = HEADER + fd.read().encode('utf8') # encode back to UTF-8 bytes <- made the difference
fd.close()
p = subprocess.Popen(['validate'],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE)
validate = p.communicate(s)
print validate
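For readers on Python 3, where str is text and bytes is the byte string, the same fix might look like the sketch below: decode the file, prepend the header, and encode the whole document to UTF-8 before handing it to the child's stdin via subprocess.run. The real 'validate' script isn't available here, so the sketch substitutes a hypothetical stand-in (a tiny Python child that reports how many bytes and NUL bytes it received); in practice you would pass ['validate'] as cmd. The function name validate_html is likewise illustrative.

```python
import subprocess
import sys

HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'

# Hypothetical stand-in for the external 'validate' script: it reads raw
# bytes from stdin and prints their length and the number of NUL bytes.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; d = sys.stdin.buffer.read(); print(len(d), d.count(b'\\x00'))",
]


def validate_html(body: str, cmd=STAND_IN) -> str:
    """Prepend the doctype header, encode to UTF-8 bytes, and pipe the
    result to `cmd`; return whatever the command wrote to stdout."""
    payload = (HEADER + body).encode("utf-8")  # bytes, not text: the key step
    result = subprocess.run(
        cmd,
        input=payload,           # run() writes this to the child's stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,  # captured so error chatter doesn't interleave
        check=False,             # a validator may exit nonzero on bad input
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(validate_html("<p>caf\u00e9</p>\n"))
```

Since validation only needs bytes, another option is to skip the decode/encode round trip and concatenate HEADER.encode('utf-8') with the file's raw bytes from open(filename, 'rb').read(), which is closer to what 'cat mytest.html | validate' does. Decoding first, as in the thread, has the advantage of failing early if a file is not actually valid UTF-8.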