[Tutor] urllib2, read data with specific encoding
Kent Johnson
kent37 at tds.net
Wed Sep 23 02:06:07 CEST 2009
On Tue, Sep 22, 2009 at 7:56 PM, Sander Sweers <sander.sweers at gmail.com> wrote:
> On Tue, 2009-09-22 at 18:04 -0400, Kent Johnson wrote:
>> > def reader(fobject, encoding='UTF-8'):
>> >     '''Read a fileobject with specified encoding, defaults UTF-8.'''
>> >     r = codecs.getreader(encoding)
>> >     data = r(fobject)
>> >     return data
>> >
>> > I would call it like reader(urllib2.urlopen(someurl), 'somencoding').
>> > Now I am looking for advice on whether this is the proper way of
>> > dealing with this type of issue. Is there a better practice, maybe?
>>
>> That seems ok if you want a file-like object.
>
> Ok good, I was worried I was doing something stupid :-)
>
>> If you just want a string it would be simpler to use
>> urllib2.urlopen(someurl).read().decode('someencoding')
>
> Wouldn't this have an extra conversion from str to unicode which my
> function skips?

No. IIUC your version also returns unicode strings; the decoding is
just hidden inside the codecs reader object, which decodes the bytes
as they are read.
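
Here is a minimal sketch of that point. It assumes an in-memory io.BytesIO as a stand-in for the object returned by urllib2.urlopen(someurl), so it runs without a network, and the sample bytes are made up for illustration. Both routes start from the same raw bytes and end with the same decoded text:

```python
import codecs
import io

def reader(fobject, encoding='UTF-8'):
    '''Wrap a byte-oriented file object so that reads return decoded text.'''
    return codecs.getreader(encoding)(fobject)

# Hypothetical payload standing in for the bytes a server would send.
raw = u'caf\xe9 \u20ac'.encode('utf-8')

# Route 1: the codecs reader decodes inside its read() call.
via_reader = reader(io.BytesIO(raw), 'utf-8').read()

# Route 2: read the raw bytes, then decode them explicitly.
via_decode = io.BytesIO(raw).read().decode('utf-8')

# Same decoding work either way, so the results are identical.
assert via_reader == via_decode
```

The reader version is mainly useful when you want to pass a file-like object along or read it in pieces; for a one-shot string the explicit decode() is shorter.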
Kent