urllib2 (py2.6) vs urllib.request (py3)

mattia gervaz at gmail.com
Tue Mar 17 07:15:37 EDT 2009


On Tue, 17 Mar 2009 10:55:21 +0000, R. David Murray wrote:

> mattia <gervaz at gmail.com> wrote:
>> Hi all, can you tell me why the module urllib.request (py3) adds extra
>> characters (b'fef\r\n and \r\n0\r\n\r\n') in a simple example like the
>> following, while urllib2 (py2.6) does not?
>> 
>> py2.6
>> >>> import urllib2
>> >>> f = urllib2.urlopen("http://www.google.com").read()
>> >>> fd = open("google26.html", "w")
>> >>> fd.write(f)
>> >>> fd.close()
>> 
>> py3
>> >>> import urllib.request
>> >>> f = urllib.request.urlopen("http://www.google.com").read()
>> >>> with open("google30.html", "w") as fd:
>> ...     print(f, file=fd)
>> ...
>> >>>
>> Opening the two HTML pages in Firefox, I get different results (the
>> extra characters mentioned earlier). Why?
> 
> The problem isn't a difference between urllib2 and urllib.request, it is
> between fd.write and print.  This produces the same result as your first
> example:
> 
> 
> >>> import urllib.request
> >>> f = urllib.request.urlopen("http://www.google.com").read()
> >>> with open("temp3.html", "wb") as fd:
> ...     fd.write(f)
> 
> 
> The "b'....'" is the stringified representation of a bytes object, which
> is what urllib.request returns in python3.  Note the 'wb', which is a
> critical difference from the python2.6 case.  If you omit the 'b' in
> python3, it will complain that you can't write bytes to the file object.
> 
> The thing to keep in mind is that print converts its argument to string
> before writing it anywhere (that's the point of using it), and that
> bytes (or buffer) and string are very different types in python3.
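
To illustrate the point quoted above, here is a minimal sketch of the
difference between print and write on a bytes object in python3. The
sample data and the file names are made up for the example; no network
access is involved.

```python
import os
import tempfile

# Stand-in for the bytes object that urlopen(...).read() returns (assumed data).
data = b"<html>\r\n</html>"

# print() calls str() on its argument, so a bytes object becomes its repr text.
as_text = str(data)

tmpdir = tempfile.mkdtemp()
printed_path = os.path.join(tmpdir, "printed.html")   # hypothetical file name
written_path = os.path.join(tmpdir, "written.html")   # hypothetical file name

# Text mode + print: the file receives the characters b'...' literally.
with open(printed_path, "w") as fd:
    print(data, file=fd)

# Binary mode + write: the file receives the raw bytes unchanged.
with open(written_path, "wb") as fd:
    fd.write(data)

with open(printed_path, "rb") as fd:
    printed = fd.read()
with open(written_path, "rb") as fd:
    written = fd.read()
```

Only the file written in binary mode matches the original bytes; the
printed one starts with the literal characters b' and contains escaped
\r\n sequences instead of real line endings.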

Well... now in the saved file I've got the extra characters "fef" at the
beginning and "0" at the end...
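
A possible explanation, not confirmed anywhere in this thread: "fef"
followed by \r\n and a final \r\n0\r\n\r\n look like the framing of HTTP/1.1
chunked transfer encoding, where each chunk is preceded by its size in
hexadecimal (0xfef = 4079 bytes) and the body ends with a zero-size
chunk. Assuming that is what is leaking into the data, a minimal sketch
of removing the framing by hand could look like this. decode_chunked is
a hypothetical helper, not part of urllib, and it ignores trailers.

```python
def decode_chunked(raw):
    """Strip HTTP/1.1 chunked framing from a bytes body (trailers ignored)."""
    body = bytearray()
    pos = 0
    while True:
        # Each chunk starts with a hexadecimal size line terminated by CRLF.
        line_end = raw.index(b"\r\n", pos)
        size_field = raw[pos:line_end].split(b";")[0]  # drop any chunk extension
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        start = line_end + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)

# Made-up sample body with two chunks and the terminating zero-size chunk.
sample = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
decoded = decode_chunked(sample)
```

If the assumption holds, the cleaner fix is for the HTTP layer to decode
the chunks itself, so a helper like this would only be a workaround.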


