How to parse HTTP time header?

Philip Semanchuk philip at semanchuk.com
Sat Nov 7 23:19:11 EST 2009


On Nov 7, 2009, at 10:56 PM, Kevin Ar18 wrote:

>
>>> Basically, I'm wondering if it is part of the standard library
>>> somewhere before I code my own.
>>>
>>> Page 20 of RFC2616 (HTTP) describes the format(s) for the time
>>> header.  It wouldn't be too difficult for me to code up a solution
>>> for the 3 standard formats, but what gets me is the little note
>>> about how some servers may still send badly formatted time headers. :(
>>> So, I'm curious if this has already been done in the standard Python
>>> library?
>>
>> The parsedate() function in the rfc822 module does this and claims to
>> be tolerant of slightly malformed dates, but that module is deprecated
>> as of Python 2.5 in favor of the email module, which hopefully has an
>> equivalent function.
> Thanks, I'll give 'em a look. :)



Sorry, my mistake -- RFC 2616 (HTTP) != RFC 2822 (email). I'm not sure
if there's something in the standard library for parsing RFC 2616 dates.

When I faced the problem of parsing HTTP dates, I wrote my own
function. It was for an application that was deliberately unforgiving
of invalid input, so my code makes no allowances for malformed dates.
FWIW, it parsed over 1 million dates without encountering any that
raised an error.

Here it is, written in a time when I obviously didn't have total  
respect for PEP 8.

import calendar
import time

ASCTIME_FORMAT = "%a %b %d %H:%M:%S %Y"
RFC_850_FORMAT = "%A, %d-%b-%y %H:%M:%S GMT"
RFC_1123_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"

def HttpDateToFloat(HttpDateString):
    # Per RFC 2616 section 3.3, HTTP dates can come in three flavors --
    # Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123
    # Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
    # Sun Nov 6 08:49:37 1994 ; ANSI C's asctime() format
    # Note: %a, %A and %b are locale-dependent in time.strptime().
    if not HttpDateString.endswith("GMT"):
        # asctime() format is the only one without a trailing "GMT"
        date = time.strptime(HttpDateString, ASCTIME_FORMAT)
    else:
        if "-" in HttpDateString:
            # RFC 850 format
            date = time.strptime(HttpDateString, RFC_850_FORMAT)
        else:
            # RFC 822/1123
            date = time.strptime(HttpDateString, RFC_1123_FORMAT)

    # Interpret the parsed struct_time as UTC; returns seconds since epoch
    return calendar.timegm(date)
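
As a quick check of the approach above, here is a minimal, self-contained
sketch that tries each of the three RFC 2616 formats in turn and runs the
RFC's own example dates through it. The names HTTP_DATE_FORMATS and
http_date_to_timestamp are made up for this illustration, and returning
None on failure is an assumption of the sketch; the function above raises
ValueError instead.

```python
import calendar
import time

# The three layouts listed in RFC 2616 section 3.3, tried in order.
# (Hypothetical names, for illustration only.)
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime()
)

def http_date_to_timestamp(value):
    """Return seconds since the epoch (UTC), or None if no format matches."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            # strptime() raises ValueError when the string does not fit fmt
            parsed = time.strptime(value.strip(), fmt)
        except ValueError:
            continue
        # timegm() treats the struct_time as UTC, unlike time.mktime()
        return calendar.timegm(parsed)
    return None

samples = (
    "Sun, 06 Nov 1994 08:49:37 GMT",
    "Sunday, 06-Nov-94 08:49:37 GMT",
    "Sun Nov  6 08:49:37 1994",
)
for sample in samples:
    print(sample, "->", http_date_to_timestamp(sample))
```

All three samples describe the same instant, so they should print the same
timestamp. Like the original, this relies on the C/English locale for the
day and month names.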



bye
Philip


