<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>Hello,</div>

<div> </div>

<div>Not being an expert in web programming, I was given the task of parsing a web site. Nothing easier than that, I thought, and quickly read up on</div>

<div> </div>

<div>https://docs.python.org/3/library/urllib.request.html</div>

<div> </div>

<div>After passing the result to html.parser.HTMLParser, I got the error "TypeError: Can't convert 'bytes' object to str implicitly", because the response body is bytes and the parser expects str. OK, I read a bit more about this and found this example in the documentation referenced above:</div>

<div>
<pre><span style="font-size:10px;"><span style="font-family: courier new,courier,monospace;"><span class="gp">>>> </span><span class="k">with</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="s">'http://www.python.org/'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s">'utf-8'</span><span class="p">))</span>
<span class="gp">...</span>
<span class="go">&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"</span>
<span class="go">"http://www.w3.org/TR/xhtml1/DTD/xhtm</span></span></span></pre>

<div>I didn't like the hard-coded utf-8 decoding, because I did not actually know the encoding of the web site. After half an hour of googling (and learning things about Unicode I never wanted to learn) I found a very simple solution:</div>

<div> </div>

<div><span style="font-size:10px;"><span style="font-family: courier new,courier,monospace;">charset = f.headers.get_param('charset')</span></span></div>

<div> </div>

<div>It would have been nice to find this simple approach right in the example. The example could even be changed to</div>

<div>
<pre><span style="font-size:10px;"><span style="font-family: courier new,courier,monospace;"><span class="gp">>>> </span><span class="k">with</span> <span class="n">urllib</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">urlopen</span><span class="p">(</span><span class="s">'http://www.python.org/'</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="gp">... </span>    <span class="n">charset</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">headers</span><span class="o">.</span><span class="n">get_param</span><span class="p">(</span><span class="s">'charset'</span><span class="p">)</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="n">charset</span><span class="p">))</span>
<span class="gp">...</span>
<span class="go">&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"</span>
<span class="go">"http://www.w3.org/TR/xhtml1/DTD/xhtm</span></span></span></pre>
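<div>One caveat with the snippet above: get_param('charset') returns None when the server does not declare a charset in its Content-Type header, and decode(None) then fails. A slightly more defensive variant might look like the sketch below. The helper names decode_body and fetch_text and the utf-8 fallback are my own assumptions, not something from the documentation. It uses get_content_charset(), which is available on the same headers object because that object is an email.message.Message.</div>

```python
import urllib.request
from email.message import Message


def decode_body(raw, headers, fallback="utf-8"):
    """Decode raw bytes using the charset named in the Content-Type header.

    `headers` is expected to behave like email.message.Message, which is
    what f.headers is for an HTTP response returned by urlopen().
    `fallback` is an assumed default, used when no charset is declared.
    """
    # get_content_charset() returns None if the header has no charset.
    charset = headers.get_content_charset() or fallback
    try:
        return raw.decode(charset)
    except LookupError:
        # The server named an encoding that Python does not know.
        return raw.decode(fallback, errors="replace")


def fetch_text(url):
    """Fetch a URL and return its body as str (hypothetical helper)."""
    with urllib.request.urlopen(url) as f:
        return decode_body(f.read(), f.headers)


if __name__ == "__main__":
    # Offline demonstration with a hand-built header object.
    demo = Message()
    demo["Content-Type"] = "text/html; charset=ISO-8859-1"
    print(decode_body(b"caf\xe9", demo))
```

<div>The resulting str can then be passed to HTMLParser.feed() directly. Note that this only trusts the HTTP header. A charset declared in a meta tag inside the page itself is not looked at.</div>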

<div>OK, these are just my 2 cents.</div>

<div> </div>

<div>Kind Regards</div>

<div>Christoph</div>
</div>
</div></div></body></html>