Virtual Host handling in listinfo.py
In what follows I am referring to code in the file Mailman/Cgi/listinfo.py in the 2.0beta6 release of Mailman. I'm working with Apache/1.3.12 (Unix) which may influence your judgement about my arguments.
Sorry if what follows is too long but I found it useful to fully analyse my own thinking on the topic. Red face for me if the issue is well known to you all.
The code I'm concerned with is in the function FormatListinfoOverview. It deals with the situation where mm_cfg.VIRTUAL_HOST_OVERVIEW is true, and computes which advertised mail lists should be returned in response to the URI /mailman/listinfo/. The relevant bits of the code are as follows:
def FormatListinfoOverview(error=None):
    ...
    <snip>
    ...
    http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
    port = os.environ.get('SERVER_PORT')
    # strip off the port if there is one
    if port and http_host[-len(port)-1:] == ':'+port:
        http_host = http_host[:-len(port)-1]
    if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host:
        host_name = http_host
    else:
        host_name = mm_cfg.DEFAULT_HOST_NAME
    ...
    <snip>
    ...
    for n in names:
        if mlist.advertised:
            if mm_cfg.VIRTUAL_HOST_OVERVIEW and \
                   http_host and \
                   string.find(http_host, mlist.web_page_url) == -1 and \
                   string.find(mlist.web_page_url, http_host) == -1:
                # List is for different identity of this host - skip it.
                continue
            else:
                advertised.append(mlist)
    ...
    <snip>
There is a flaw in this code in the way that it strips the port number from the http_host variable but I'll come on to that below.
As best I can judge, the purpose of considering the value of the environment variable HTTP_HOST (if available), instead of just using the SERVER_NAME value, is to try to deduce a virtual host's server name in circumstances where the web server has not. For instance:
Typically, the VirtualHost directives in httpd.conf will have been defined using fully qualified domain names (FQDNs), for example:
NameVirtualHost 192.168.1.1
<VirtualHost bert.my.co.uk>
ServerName bert.my.co.uk
</VirtualHost>
<VirtualHost fred.my.co.uk>
ServerName fred.my.co.uk
</VirtualHost>
The virtual hosts will have associated ServerName directives whose values are used to set SERVER_NAME.
If a user on the local network uses a URL that does not fully qualify the server's domain name, e.g. http://fred/mailman/listinfo/, then the web server cannot match the request to the fred.my.co.uk VirtualHost directive, and SERVER_NAME will not be set to fred.my.co.uk but to some other value depending on the type and order of the VirtualHost directives in httpd.conf (bert.my.co.uk in this example).
In these circumstances, the cunning code above ignores the SERVER_NAME value and instead matches the value fred from HTTP_HOST against each list's web_page_url.
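To make the matching concrete, the advertised-list test in FormatListinfoOverview can be pulled out into a small standalone function. This is a sketch for a modern Python 3 interpreter, not the 2.0beta6 code itself: the name is_listed_for_host is hypothetical, str.find stands in for the old string.find, and the web_page_url values are assumed examples.

```python
def is_listed_for_host(http_host, web_page_url, virtual_host_overview=True):
    """Return True if an advertised list should appear in the overview.

    Mirrors the two-way substring test in FormatListinfoOverview: a list
    is skipped only when neither string contains the other.
    """
    if not (virtual_host_overview and http_host):
        # No virtual-host filtering: every advertised list is shown.
        return True
    return (web_page_url.find(http_host) != -1 or
            http_host.find(web_page_url) != -1)


# 'fred' from the Host header is a substring of fred's (assumed) list URL.
print(is_listed_for_host('fred', 'http://fred.my.co.uk/mailman/'))  # True
# A port-forwarded Host header matches nothing.
print(is_listed_for_host('localhost:8081', 'http://fred.my.co.uk/mailman/'))  # False
```

Because the test is a plain substring match, is_listed_for_host('fred', 'http://alfred.my.co.uk/mailman/') is also True.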
I do not think this trick in the FormatListinfoOverview function is the right way to overcome this problem. If you want both partial and fully qualified domain names to map to a virtual host, then two VirtualHost directives should be used in httpd.conf, for example:
NameVirtualHost 192.168.1.1
<VirtualHost bert.my.co.uk>
ServerName bert.my.co.uk
</VirtualHost>
<VirtualHost fred.my.co.uk>
ServerName fred.my.co.uk
</VirtualHost>
<VirtualHost fred>
ServerName fred.my.co.uk
</VirtualHost>
By doing this, both the web server and listinfo.py reach the same conclusion by the same route. My point is that, in principle, this problem is generic to the way the web server operates and should be solved by setting up the correct virtual host definitions in httpd.conf, not by second-guessing the virtual host setup in listinfo.py. listinfo.py should consider only the value of SERVER_NAME and not even look at HTTP_HOST. My proposed changes to the FormatListinfoOverview function are:
cut here-vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
diff -r -c2 mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py mailman-2.0beta6.listinfo1/Mailman/Cgi/listinfo.py
*** mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py Wed Aug 2 00:10:41 2000
--- mailman-2.0beta6.listinfo1/Mailman/Cgi/listinfo.py Fri Oct 6 14:22:44 2000
***************
*** 59,73 ****
      "Present a general welcome and itemize the (public) lists for this host."
  
!     # XXX We need a portable way to determine the host by which we are being
!     # visited! An absolute URL would do...
!     http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
!     port = os.environ.get('SERVER_PORT')
!     # strip off the port if there is one
!     if port and http_host[-len(port)-1:] == ':'+port:
!         http_host = http_host[:-len(port)-1]
!     if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host:
!         host_name = http_host
!     else:
!         host_name = mm_cfg.DEFAULT_HOST_NAME
  
      doc = Document()
--- 59,63 ----
      "Present a general welcome and itemize the (public) lists for this host."
  
!     http_host = host_name = os.environ.get('SERVER_NAME')
  
      doc = Document()
cut here-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Even if you do not agree with my first conclusion, please read on.]
Which brings me to the problem that started me looking at this whole issue: what happens when SSH port forwarding is used to make the browser-to-web-server connection? For example:
I'm dialling in to my favorite ISP from home using my laptop. In order to then connect to our internal-access-only Mailman web server through our corporate firewall, I have to use SSH's port forwarding.
I set up port forwarding so that any connection to local port 8081 on my laptop is forwarded to fred.my.co.uk:80 by the firewall machine.
I give my browser the URL http://localhost:8081/mailman/listinfo/.
The HTTP request is forwarded satisfactorily to fred.my.co.uk:80 with the URI being /mailman/listinfo/.
Because mm_cfg.VIRTUAL_HOST_OVERVIEW is true, listinfo.py proceeds to tell me there are no advertised mail lists on host localhost:8081. Well, I knew that.
This is because of two flaws in the FormatListinfoOverview function:
In trying to remove the port number from the end of the HTTP_HOST string, the code assumes that the port number there is the same as the SERVER_PORT value, and in particular that it has the same length. In my example this assumption is wrong: the port number at the end of HTTP_HOST is 4 characters long ('8081') while SERVER_PORT is only 2 characters long ('80').
Even with this first flaw corrected, the code still fails to recognise the situation, because it analyses HTTP_HOST and extracts the value 'localhost'. But the IP address of this name (normally 127.0.0.1) does not even match the value of the SERVER_ADDR environment variable, which is a dead giveaway.
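The first flaw is easy to reproduce outside Mailman. The Python 3 sketch below copies the stock suffix test into a function and compares it with splitting on ':', the approach the second patch in this message takes; both function names are hypothetical.

```python
def strip_port_stock(http_host, port):
    """Port stripping as written in the stock 2.0beta6 code.

    A ':<port>' suffix is removed only when it equals ':' + SERVER_PORT,
    so a port the browser used but the server did not (8081) survives.
    """
    if port and http_host[-len(port)-1:] == ':' + port:
        http_host = http_host[:-len(port)-1]
    return http_host


def strip_port_split(http_host):
    """Drop everything from the first ':' on, whatever the port number."""
    return http_host.split(':')[0] if http_host else http_host


# Values seen by the CGI script in the SSH port-forwarding example.
print(strip_port_stock('localhost:8081', '80'))  # localhost:8081
print(strip_port_split('localhost:8081'))        # localhost
```

Splitting fixes only the first flaw: the stripped value 'localhost' still has to be rejected, which is what the SERVER_ADDR comparison is for.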
This problem disappears if virtual host definitions in httpd.conf are used instead of trickery involving HTTP_HOST in listinfo.py.
So does a similar problem, which occurs if the server's explicit IP address is used in the URL given to the browser instead of its domain name, assuming no IP-based virtual host has been defined in httpd.conf to map the IP address to an acceptable ServerName.
OK, so suppose you do not agree with my contention that listinfo.py should ignore HTTP_HOST, because doing so might break a bunch of existing Mailman installations. In that case, the following changes to the FormatListinfoOverview function avoid my problems. The position here is that if:
either - The IP address that HTTP_HOST resolves to doesn't match SERVER_ADDR.
or - The URL contains the server's IP address rather than its name.
then the code behaves as if the browser didn't supply an HTTP Host header:
cut here-vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
diff -r -c2 mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py mailman-2.0beta6.listinfo2/Mailman/Cgi/listinfo.py
*** mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py Wed Aug 2 00:10:41 2000
--- mailman-2.0beta6.listinfo2/Mailman/Cgi/listinfo.py Fri Oct 6 15:06:31 2000
***************
*** 22,25 ****
--- 22,26 ----
  import os
  import string
+ import socket
  
  from Mailman import mm_cfg
***************
*** 62,72 ****
      # visited! An absolute URL would do...
      http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
-     port = os.environ.get('SERVER_PORT')
      # strip off the port if there is one
!     if port and http_host[-len(port)-1:] == ':'+port:
!         http_host = http_host[:-len(port)-1]
!     if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host:
          host_name = http_host
      else:
          host_name = mm_cfg.DEFAULT_HOST_NAME
  
--- 63,77 ----
      # visited! An absolute URL would do...
      http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
      # strip off the port if there is one
!     if http_host:
!         http_host = string.split(http_host, ':')[0]
!     host_ip = socket.gethostbyname(http_host)
!     server_ip = os.environ.get('SERVER_ADDR')
!     if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host and \
!            host_ip == server_ip and \
!            host_ip != http_host:
          host_name = http_host
      else:
+         http_host = None
          host_name = mm_cfg.DEFAULT_HOST_NAME
  
cut here-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
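For anyone who wants to try the rules in this second patch without a web server, here is a sketch of the same decision written as a pure Python 3 function. The names choose_host_name and fake_resolve and the host table are hypothetical. Unlike the patch, the sketch skips the lookup for an empty host name and catches lookup failures, since socket.gethostbyname raises an exception for names it cannot resolve.

```python
import socket

# Hypothetical name-to-address table used in place of real DNS lookups.
_FAKE_HOSTS = {
    'fred': '192.168.1.1',
    'fred.my.co.uk': '192.168.1.1',
    'localhost': '127.0.0.1',
    '192.168.1.1': '192.168.1.1',
}


def fake_resolve(name):
    """Stand-in for socket.gethostbyname() backed by _FAKE_HOSTS."""
    try:
        return _FAKE_HOSTS[name]
    except KeyError:
        raise socket.gaierror('unknown host: ' + name) from None


def choose_host_name(environ, default_host_name, virtual_host_overview=True,
                     resolve=socket.gethostbyname):
    """Return (http_host, host_name) following the rules of the second patch.

    HTTP_HOST is trusted only when it resolves to SERVER_ADDR and is a name
    rather than a bare IP address.  Otherwise the default host name is used
    and http_host is None, which switches off per-host list filtering.
    """
    http_host = environ.get('HTTP_HOST', environ.get('SERVER_NAME'))
    if http_host:
        # Drop any ':port' suffix, whatever the port number is.
        http_host = http_host.split(':')[0]
    host_ip = None
    if http_host:
        try:
            host_ip = resolve(http_host)
        except OSError:
            # Unresolvable name: treat it like a missing Host header.
            host_ip = None
    server_ip = environ.get('SERVER_ADDR')
    if (virtual_host_overview and http_host and host_ip is not None
            and host_ip == server_ip and host_ip != http_host):
        return http_host, http_host
    return None, default_host_name
```

With the fake table, a Host header of 'fred' on a server whose SERVER_ADDR is 192.168.1.1 yields ('fred', 'fred'), while 'localhost:8081' (the SSH case) and a bare '192.168.1.1' both fall back to the default host name.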
I have not yet posted either of the patches above to SourceForge. I would appreciate a sanity check on my thinking, and any rebuttal of my arguments or constructive comments. RSVP
Richard Barrett, PostPoint 30, e-mail:r.barrett@ftel.co.uk
Fujitsu Telecommunications Europe Ltd, tel: (44) 121 717 6337
Solihull Parkway, Birmingham Business Park, B37 7YU, England
"Democracy is two wolves and a lamb voting on what to have for lunch.
Liberty is a well armed lamb contesting the vote."
Benjamin Franklin, 1759