piping input to an external script

Tue May 12 03:24:18 EDT 2009

Tim Arnold wrote:
> Hi, I have some html files that I want to validate by using an external 
> script 'validate'. The html files need a doctype header attached before 
> validation. The files are in utf8 encoding. My code:
> ---------------
> import os,sys
> import codecs,subprocess
> HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
>
> filename  = 'mytest.html'
> fd = codecs.open(filename,'rb',encoding='utf8')
> s = HEADER + fd.read()
> fd.close()
>
> p = subprocess.Popen(['validate'],
>                     stdin=subprocess.PIPE,
>                     stdout=subprocess.PIPE,
>                     stderr=subprocess.STDOUT)
> validate = p.communicate(unicode(s,encoding='utf8'))
> print validate
> ---------------
>
> I get lots of lines like this:
> Error at line 1, character 66:\tillegal character number 0
> etc etc.
>
> But I can give the command in a terminal 'cat mytest.html | validate' and 
> get reasonable output. My subprocess code must be wrong, but I could use 
> some help to see what the problem is.
>
> python2.5.1, freebsd6
> thanks,
> --Tim
The usual rule in debugging: split the problem into two parts and test
each one separately, starting with the one you think is most likely to be
the culprit.

In this case the obvious place to split is at the data you're passing
to the communicate() call. I expect it's already wrong long before you
hand it to the subprocess. So write it to a file instead and inspect
it with a binary file viewer. And of course test that file manually with
your validate program. Is validate really expecting a Unicode stream on
stdin? A pipe carries bytes, and 'cat mytest.html | validate' works
because cat passes the file's UTF-8 bytes through unchanged.
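A minimal sketch of that split. It is written in Python 3, where text and bytes are distinct types, rather than the 2.5 used above; in 2.5 the corresponding step would be passing s.encode('utf8') to communicate(). The helper names build_payload, dump_for_inspection and run_validator are hypothetical, and the sketch assumes validate wants UTF-8 bytes on stdin, which is what it receives from cat.

```python
import subprocess

HEADER = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'


def build_payload(filename):
    """Return the header plus the file's contents as UTF-8 encoded bytes."""
    # Opening with an explicit encoding already yields decoded text (str),
    # so there is nothing left to decode: encode exactly once, just before
    # the data goes into the pipe.
    with open(filename, encoding='utf-8') as f:
        text = HEADER + f.read()
    return text.encode('utf-8')


def dump_for_inspection(payload, path):
    """Write the exact bytes destined for stdin to `path`, so they can be
    checked in a hex viewer or fed by hand to the validator.

    Returns True if the payload contains no NUL bytes. ASCII text encoded
    as UTF-16 or UTF-32 contains NULs, which would match the
    'illegal character number 0' errors.
    """
    with open(path, 'wb') as f:
        f.write(payload)
    return b'\x00' not in payload


def run_validator(payload, cmd):
    """Pipe `payload` (bytes) to `cmd`; return its stdout+stderr as bytes."""
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate(payload)  # stderr is merged into stdout
    return out


# Hypothetical usage ('mytest.payload' is an arbitrary dump file name):
#   payload = build_payload('mytest.html')
#   dump_for_inspection(payload, 'mytest.payload')  # then: validate < mytest.payload
#   print(run_validator(payload, ['validate']).decode('utf-8', 'replace'))
```

Keeping the three steps separate follows the advice above: if the dumped file already looks wrong, the subprocess call is not the problem.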