piping input to an external script
davea at ieee.org
Tue May 12 09:24:18 CEST 2009
Tim Arnold wrote:
> Hi, I have some html files that I want to validate by using an external
> script 'validate'. The html files need a doctype header attached before
> validation. The files are in utf8 encoding. My code:
> import os,sys
> import codecs,subprocess
> HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
> filename = 'mytest.html'
> fd = codecs.open(filename,'rb',encoding='utf8')
> s = HEADER + fd.read()
> p = subprocess.Popen(['validate'],
> validate = p.communicate(unicode(s,encoding='utf8'))
> print validate
> I get lots of lines like this:
> Error at line 1, character 66:\tillegal character number 0
> etc etc.
> But I can give the command in a terminal 'cat mytest.html | validate' and
> get reasonable output. My subprocess code must be wrong, but I could use
> some help to see what the problem is.
> python2.5.1, freebsd6
The usual rule in debugging: split the problem into two parts, and test
each one separately, starting with the one you think most likely to be
at fault.

In this case the obvious place to split is with the data you're passing
to the communicate call. I expect it's already wrong, long before you
hand it to the subprocess. So write it to a file instead, and inspect
it with a binary file viewer. And of course test it manually with your
validate program. Is validate really expecting a Unicode stream on stdin?
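
A rough sketch of that split, in case it helps. It is written for a current Python 3 rather than the 2.5 in the question, and the names build_payload, hex_preview and pipe_to are made up for illustration. The stdin/stdout/stderr pipe arguments are my assumption, since the Popen call in the quoted code is cut off. The idea is to turn the text into explicit UTF-8 bytes first, look at those bytes, and only then hand them to the child process. NUL bytes in the preview would suggest a wide encoding such as UTF-16, which might explain "illegal character number 0", but that is only a guess.

```python
import subprocess

HEADER = u'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'


def build_payload(body_text, encoding='utf-8'):
    """Prepend the doctype to already-decoded text and encode it to bytes."""
    return (HEADER + body_text).encode(encoding)


def hex_preview(data, count=32):
    """Return the first `count` bytes as space-separated hex pairs."""
    return ' '.join('%02x' % b for b in bytearray(data[:count]))


def pipe_to(command, data):
    """Feed raw bytes to `command` on stdin; return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(command,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate(data)
    return proc.returncode, out, err


if __name__ == '__main__':
    # Stand-in text; in the real case this would be the decoded file contents.
    payload = build_payload(u'<html><body><p>caf\u00e9</p></body></html>')
    # Part one: inspect the data before any subprocess is involved.
    print(hex_preview(payload))
    # Part two (needs the external script on PATH, so left commented out):
    # print(pipe_to(['validate'], payload))
```

Writing payload to a file opened in 'wb' mode and running 'cat thatfile | validate' by hand tests the second half with no Python involved at all.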