How to parse HTTP time header?
Philip Semanchuk
philip at semanchuk.com
Sat Nov 7 23:19:11 EST 2009
On Nov 7, 2009, at 10:56 PM, Kevin Ar18 wrote:
>
>>> Basically, I'm wondering if it is part of the standard library
>>> somewhere before I code my own.
>>>
>>> Page 20 of RFC2616 (HTTP) describes the format(s) for the time
>>> header. It wouldn't be too difficult for me to code up a solution
>>> for the 3 standard formats, but what gets me is the little note
>>> about how some servers may still send badly formatted time headers. :(
>>> So, I'm curious if this has already been done in the standard Python
>>> library?
>>
>> The parsedate() function in the rfc822 module does this and claims to
>> be tolerant of slightly malformed dates, but that module is deprecated
>> as of Python 2.5 in favor of the email module which hopefully has an
>> equivalent function.
> Thanks, I'll give 'em a look. :)
Sorry, my mistake -- 2616 != 2822. I'm not sure if there's something
in the standard library for parsing RFC 2616 dates.
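One standard-library candidate that may be worth testing is the
email.utils pair parsedate_tz() and mktime_tz(). The sketch below is
only that: the helper name http_date_to_epoch is made up for
illustration, and it is exercised on the RFC 1123 form alone. How it
copes with the other two RFC 2616 formats, or with malformed headers,
is an assumption you would need to verify yourself.

```python
from email.utils import mktime_tz, parsedate_tz


def http_date_to_epoch(value):
    """Hypothetical helper: convert a date string to seconds since the epoch.

    Relies on email.utils, which targets RFC 2822 dates rather than
    RFC 2616 ones, so treat it as a starting point for experiments.
    """
    parsed = parsedate_tz(value)  # 10-tuple, or None if unparseable
    if parsed is None:
        raise ValueError("unparseable date: %r" % (value,))
    # mktime_tz() applies the parsed UTC offset (0 for "GMT").
    return mktime_tz(parsed)


print(http_date_to_epoch("Sun, 06 Nov 1994 08:49:37 GMT"))
```

Unlike time.strptime(), parsedate_tz() signals failure by returning
None instead of raising, hence the explicit check.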
When I faced the problem of parsing HTTP dates, I wrote my own
function. It lived in an application that was deliberately unforgiving
of invalid input, so the code makes no allowances for malformed dates.
FWIW, it parsed over 1 million dates without encountering any that
raised an error.
Here it is, written in a time when I obviously didn't have total
respect for PEP 8.
import calendar
import time

ASCTIME_FORMAT = "%a %b %d %H:%M:%S %Y"
RFC_850_FORMAT = "%A, %d-%b-%y %H:%M:%S GMT"
RFC_1123_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"

def HttpDateToFloat(HttpDateString):
    # Per RFC 2616 section 3.3, HTTP dates can come in three flavors --
    #    Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
    #    Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
    #    Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format
    if not HttpDateString.endswith("GMT"):
        # Only the asctime() format lacks the trailing "GMT"
        date = time.strptime(HttpDateString, ASCTIME_FORMAT)
    elif "-" in HttpDateString:
        # RFC 850 format
        date = time.strptime(HttpDateString, RFC_850_FORMAT)
    else:
        # RFC 822/1123
        date = time.strptime(HttpDateString, RFC_1123_FORMAT)

    # Interpret the parsed struct_time as UTC; returns seconds since the epoch
    return calendar.timegm(date)
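As a quick sanity check, here is a sketch that runs the same
three-way dispatch over the three sample dates from RFC 2616. The
function is restated so the snippet runs on its own; the snake_case
name and the SAMPLES list are mine, added for illustration. It assumes
an English (C) locale, since strptime() matches day and month names
against the current locale.

```python
import calendar
import time

ASCTIME_FORMAT = "%a %b %d %H:%M:%S %Y"
RFC_850_FORMAT = "%A, %d-%b-%y %H:%M:%S GMT"
RFC_1123_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"


def http_date_to_float(http_date_string):
    # Same logic as HttpDateToFloat: pick a format, parse, treat as UTC.
    if not http_date_string.endswith("GMT"):
        fmt = ASCTIME_FORMAT
    elif "-" in http_date_string:
        fmt = RFC_850_FORMAT
    else:
        fmt = RFC_1123_FORMAT
    return calendar.timegm(time.strptime(http_date_string, fmt))


# The three example dates given in RFC 2616; all name the same instant.
SAMPLES = [
    "Sun, 06 Nov 1994 08:49:37 GMT",
    "Sunday, 06-Nov-94 08:49:37 GMT",
    "Sun Nov  6 08:49:37 1994",
]

for sample in SAMPLES:
    print(sample, "->", http_date_to_float(sample))
```

All three samples should produce the same number. Anything that fits
none of the formats makes strptime() raise ValueError, which is the
unforgiving behaviour described above.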
bye
Philip