[docs] python3.4 cookiejar can't get cookies

1067511899 at qq.com 1067511899 at qq.com
Tue Jan 17 03:47:20 EST 2017


    Sorry for my poor English; I am Chinese.

    When I access http://hd.chinatax.gov.cn/guoshui/main.jsp, Firefox receives a cookie, which you can see in the devtools, but cookiejar does not get it.

    The full description is posted at: https://www.reddit.com/r/learnpython/comments/5o9olm/about_cookiesheaderssomething_already_drive_me_mad/?st=iy16b0iq&sh=a5a7e960

   After debugging into the code, I found the problem:

   Python reads the headers from the web server successfully, but a problem occurs when they are parsed into an email.message.Message (policy=compat32).

   If the headers contain something like "Cache-Control : No-cache", that is, one or more blanks next to the ':', then line 227 of feedparser.py goes wrong,

because of lines 35-37:
# RFC 2822 $3.6.8 Optional fields.  ftext is %d33-57 / %d59-126, Any character
# except controls, SP, and ":".
headerRE = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])')
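
A minimal check of this behaviour, using only the pattern quoted above. The two sample header lines are made up for illustration; the second one has a blank before the ':'.

```python
import re

# Pattern copied from the feedparser.py lines quoted above.
headerRE = re.compile(r'^(From |[\041-\071\073-\176]*:|[\t ])')

strict = "Cache-Control: No-cache"    # no blank before the ':'
loose = "Cache-Control : No-cache"    # blank before the ':'

# The strict line is recognised as a header line, the loose one is not.
print(bool(headerRE.match(strict)))
print(bool(headerRE.match(loose)))
```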

I read RFC 2822 section 3.6.8, and yes, headerRE is correct according to it. But as you can see, headers can be written by any programmer, and blanks next to the ':' do occur, so my suggestion is:
headerRE = re.compile(r'^(From |[\041-\071\073-\176]*\s*:\s*|[\t ])')
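
A small sketch of what the suggested pattern accepts, again with made-up sample lines. It only exercises the regular expression; whether the rest of feedparser.py then handles the field name correctly is not checked here.

```python
import re

# The suggested pattern: optional whitespace is allowed around the ':'.
proposed = re.compile(r'^(From |[\041-\071\073-\176]*\s*:\s*|[\t ])')

samples = [
    "Cache-Control: No-cache",     # strict form
    "Cache-Control : No-cache",    # blanks around the ':'
    "Cache Control: No-cache",     # blank inside the name, still rejected
]

for line in samples:
    m = proposed.match(line)
    # Show the matched prefix, or None when the line is not seen as a header.
    print(repr(m.group(0)) if m else None)
```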

This may not strictly follow the RFC, but it behaves exactly like web browsers such as Firefox.

When a web site does not strictly follow the RFC, neither cookiejar nor requests gets the cookie right. That does not make sense, because web browsers can.
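
The effect can be reproduced without a network connection by feeding a header block to the email parser with its default compat32 policy. This is only a sketch: the header block and the cookie value are invented for illustration, not taken from the real server.

```python
from email.parser import Parser

# Invented header block; the second line has a blank before the ':'.
raw = (
    "Server: Apache\r\n"
    "Cache-Control : No-cache\r\n"
    "Set-Cookie: JSESSIONID=abc; Path=/\r\n"
    "\r\n"
)

msg = Parser().parsestr(raw)

# Header parsing stops at the first line the pattern rejects, so only
# the header before it is visible and the Set-Cookie line is missing.
print(msg["Server"])
print(msg["Set-Cookie"])

# The rejected line and everything after it end up in the payload.
print(msg.get_payload())

# The parser records the problem as a defect on the message.
print([type(d).__name__ for d in msg.defects])
```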

By the way, I changed my own feedparser.py, because I need to scrape web sites in my work.

There is also another way to solve the problem, but I think it is really not Pythonic:

import http.cookiejar, urllib.request

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

r = opener.open("http://hd.chinatax.gov.cn/guoshui/main.jsp")

for x in pls:
    if x.strip():
        if tmp[0]=='Set-Cookie':
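
The idea of such a manual workaround might be sketched as follows. This is an assumption about the general approach, not the exact code: the function name extract_set_cookie, the variable raw_headers and the sample cookie value are all hypothetical, and the sketch works on a raw header block given as text.

```python
def extract_set_cookie(raw_headers):
    """Return the values of all Set-Cookie lines in a raw header block.

    Hypothetical helper: it tolerates blanks around the ':' by
    stripping the field name and the value by hand.
    """
    cookies = []
    for line in raw_headers.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, sep, value = line.partition(":")
        # Compare the name case-insensitively after trimming blanks.
        if sep and name.strip().lower() == "set-cookie":
            cookies.append(value.strip())
    return cookies


# Invented sample input with blanks before the ':'.
raw_headers = (
    "Cache-Control : No-cache\r\n"
    "Set-Cookie : JSESSIONID=abc; Path=/\r\n"
    "\r\n"
)
found = extract_set_cookie(raw_headers)
print(found)
```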


yours . Mengwei lee

1067511899 at qq.com