[issue4773] HTTPMessage not documented and has inconsistent API across 2.6/3.0

Fri Mar 27 02:00:27 CET 2009

Brad Miller <bonelake at gmail.com> added the comment:

On Thu, Mar 26, 2009 at 4:29 PM, Barry A. Warsaw <report at bugs.python.org> wrote:

>
> Barry A. Warsaw <barry at python.org> added the comment:
>
> I propose that you only document the getitem header access API.  I.e.
> the thing that info() gives you can be used to access the message
> headers via message['content-type'].  That's an API common to both
> rfc822.Messages (the ultimate base class of mimetools.Message) and
> email.message.Message.
>

Having found myself in the awkward position of explaining the new 3.0 API
to my students, I've thought about this and have some ideas and questions.
I'm also willing to help with the documentation or any enhancements.

>>> x = urllib.request.urlopen('http://knuth.luther.edu/python/test.html')
>>> x['Date']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'addinfourl' object is unsubscriptable

I wish I knew what an addinfourl object was.

>>> x.info()['Date']
'Fri, 27 Mar 2009 00:41:34 GMT'

>>> x.headers['Date']
'Fri, 27 Mar 2009 00:41:34 GMT'

>>> x.headers.keys()
['Date', 'Server', 'Last-Modified', 'ETag', 'Accept-Ranges',
'Content-Length', 'Connection', 'Content-Type']

Using x.headers over x.info()  makes the most sense to me, but I don't know
that I can give any good rationale.  Which would we want to document?
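
For comparison, a minimal sketch of how the two spellings relate, assuming the behavior of current Python 3 releases, where info() on an addinfourl simply returns the headers attribute. The response is built by hand from urllib.response.addinfourl so that no network access is needed; the URL and the variable names are placeholders, not anything from the tracker discussion.

```python
import io
from email.message import Message
from urllib.response import addinfourl

# Stand-in headers; urlopen() would normally fill these from the server.
headers = Message()
headers["Date"] = "Fri, 27 Mar 2009 00:41:34 GMT"

# Build a response object by hand instead of calling urlopen().
# The URL is a placeholder and is never contacted.
x = addinfourl(io.BytesIO(b""), headers, "http://example.invalid/")

# info() and .headers refer to the same message object.
same_object = x.info() is x.headers

# Header lookup by item access is case-insensitive.
date = x.headers["date"]
```

If that holds, the choice between x.headers and x.info() is a documentation question rather than a behavioral one.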

>>> x.headers['Content-Type']
'text/html; charset=ISO-8859-1'

I guess technically this is correct since the charset is part of the
Content-Type header in HTTP but it does make life difficult for what I think
will be a pretty common use case in this new urllib:  read from the url (as
bytes) and then decode them into a string using the appropriate character
set.
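
A minimal sketch of that use case, assuming the headers object is an email.message.Message as in 3.x. The helper name decode_body and the UTF-8 fallback are illustrative choices, not part of urllib, and the headers are built locally as a stand-in for x.headers.

```python
from email.message import Message


def decode_body(raw, headers, default="utf-8"):
    """Decode raw bytes using the charset parameter of Content-Type."""
    # get_content_charset() returns the lowercased charset parameter,
    # or the fallback passed as its argument when the header has none.
    charset = headers.get_content_charset(default)
    return raw.decode(charset)


# Stand-in for x.headers, built locally so no network access is needed.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"

# b"\xe9" is e-acute in ISO-8859-1.
text = decode_body(b"caf\xe9", headers)
```

With a real response this would be called as decode_body(x.read(), x.headers).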

As you follow this road, you have the confusing option of these three calls:

>>> x.headers.get_charset()
>>> x.headers.get_content_charset()
'iso-8859-1'
>>> x.headers.get_charsets()
['iso-8859-1']

I think it should be a bug that get_charset() does not return anything in
this case.  It is not at all clear why get_content_charset() and
get_charset() should have different behavior.
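
The difference can be reproduced without a network connection. The sketch below builds an email.message.Message by hand with the same Content-Type value. As far as the email package documents it, get_charset() reports the Charset object attached with set_charset(), while get_content_charset() parses the charset parameter out of the Content-Type header, which would explain the empty result above. The variable names are illustrative.

```python
from email.message import Message

msg = Message()
msg["Content-Type"] = "text/html; charset=ISO-8859-1"

# None here: no Charset object was attached with set_charset().
payload_charset = msg.get_charset()

# Parsed from the Content-Type header and lowercased.
header_charset = msg.get_content_charset()

# One entry per message part; a non-multipart message has one part.
part_charsets = msg.get_charsets()
```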

Brad

>
> ----------
> nosy: +barry
>
> _______________________________________
> Python tracker <report at bugs.python.org>
> <http://bugs.python.org/issue4773>
> _______________________________________
>

----------
Added file: http://bugs.python.org/file13430/unnamed

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4773>
_______________________________________