[Python-Dev] Re: privacy in log files?

M.-A. Lemburg mal@lemburg.com
Wed, 19 Feb 2003 11:17:28 +0100


Guido van Rossum wrote:
>>Guido van Rossum wrote:
>>
>>>I found this comment in Parser/tokenizer.c:
>>>
>>>		/* We don't use PyErr_WarnExplicit() here because
>>>		   printing the line in question to e.g. a log file
>>>		   could result in sensitive information being
>>>		   exposed. */
>>>
>>>I didn't see a SF reference there or in the CVS checkin comment, so
>>>I'm stumped.  What's the use case? 
> 
> 
> [MAL]
> 
>>I have gotten a lot of emails from various people about the
>>new source code encoding feature and the warning that is
>>generated for code lines which have non-ASCII characters
>>in them if the file doesn't have a coding header.
> 
> Is the idea that non-ASCII characters are likely to be used in
> passwords?

Not necessarily, but the line could have a non-ASCII comment causing
the warning.

>>Many of these people mentioned that webserver logs (for CGI
>>scripts) would get flooded with these warnings and that there
>>is a potential security breach here if a source line is
>>being copied into these logs. It is rather common that
>>these logs are world readable, so passwords and other sensitive
>>information could easily escape the script's source code,
>>e.g. login information for databases.
> 
> I can interpret world-readable in two ways.  On Unix, it traditionally
> means that anybody with a login name can read it.  Since Apache
> typically runs as user nobody, CGI scripts have to be world-readable
> as well.

They only have to belong to group nobody or nogroup (depending on the
distribution) and be group-readable. World-readable is not needed.

> So I'm still not convinced.  Or are there sites that
> actually publish their log files on the web?  What would the point of
> that be?  I'd be surprised if there wasn't a lot of other
> privacy-sensitive data in such log files, and the complainers should
> complain about the public logs rather than focusing on Python trying
> to issue a useful error message.

True, but why stir up more noise? The whole idea in itself
has already caused endless discussions.

The message now prints the file name and the line number. I think
that's good enough.
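The difference between printing only file name and line number and
printing the offending line can be sketched at the Python level with
the standard warnings.formatwarning() function, which looks the source
line up via linecache unless a line is passed explicitly. This is a
minimal sketch assuming a modern Python 3; it illustrates the idea and
is not the tokenizer.c code. The script contents, the password and the
message text are made-up assumptions for illustration only.

```python
import os
import tempfile
import warnings

# Hypothetical CGI script: line 2 holds a made-up password and a
# non-ASCII comment, but the file has no coding header.
SOURCE = "print('Content-Type: text/plain')\nDB_PASSWORD = 'hunter2'  # s\u00e4kret\n"
MESSAGE = "Non-ASCII character in source, but no encoding declared"  # illustrative text

with tempfile.NamedTemporaryFile("w", suffix=".py", encoding="utf-8", delete=False) as f:
    f.write(SOURCE)
    path = f.name

try:
    # Default: formatwarning() fetches line 2 via linecache and appends it,
    # so the password ends up in whatever log receives the warning.
    leaky = warnings.formatwarning(MESSAGE, DeprecationWarning, path, 2)
    # line="" suppresses the lookup: only file name and line number remain.
    safe = warnings.formatwarning(MESSAGE, DeprecationWarning, path, 2, line="")
finally:
    os.remove(path)

print(leaky, end="")
print(safe, end="")
```

The first output carries the source line (and its non-ASCII bytes) into
the log; the second names only the location, which is all a maintainer
needs to find and fix the file.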

BTW, there's also another reason not to print the source code line:
since we know it contains non-ASCII data, it would clutter up the
log file, possibly making it useless to other programs reading
it. The same is true for interactive terminal sessions, which could
start to behave in strange ways after printing what they take to be
control characters.

-- 
Marc-Andre Lemburg
eGenix.com
