python3 urlopen(...).read() returns bytes

Tue Dec 23 01:00:36 EST 2008

On Dec 22, 9:05 pm, Christian Heimes <li... at cheimes.de> wrote:
> ajaksu schrieb:
>
> > That said, a "decode to declared HTTP header encoding" version of
> > urlopen could be useful to give some users the output they want (text
> > from network io) or to make it clear why bytes is the safe way.
>
> Yeah, your idea sounds both useful and feasible. A patch is welcome! :)

Would monkeypatching what urlopen returns be good enough or should we
aim at a cleaner implementation?

Glenn, does this sketch work for you?

import socket
from urllib.request import urlopen

def urlopen_text(url, data=None,
                 timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    response = urlopen(url, data, timeout)
    # Keep the original bytes-returning methods before replacing them.
    _readline = response.readline
    _readlines = response.readlines
    _read = response.read
    # get_charsets()[0] is None when the server declares no charset;
    # falling back to ISO-8859-1 is an assumption, not a urllib rule.
    charset = response.headers.get_charsets()[0] or "iso-8859-1"
    def readline(limit=-1):
        return str(_readline(limit), encoding=charset)
    response.readline = readline
    def readlines(hint=-1):
        return [str(line, encoding=charset) for line in _readlines(hint)]
    response.readlines = readlines
    def read(n=None):
        # A partial read can end mid-character and then fail to decode.
        return str(_read(n), encoding=charset)
    response.read = read
    return response

Any comments/suggestions are very welcome. I could use some help from
people that know urllib on the best way to get the charset. Maybe
after some sleep I can code it in a less awful way :)
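
For comparison, here is a rough sketch of what a cleaner, non-monkeypatching
version might look like: wrap the response in io.TextIOWrapper and let it do
the decoding incrementally. The names text_response and FakeResponse are
hypothetical, and so is the ISO-8859-1 fallback; FakeResponse only exists so
the idea can be tried without touching the network.

```python
import io
from email.message import Message
from urllib.request import urlopen


def text_response(response, fallback="iso-8859-1"):
    """Wrap a bytes-returning response in a decoding text stream."""
    # get_content_charset() returns None when no charset is declared;
    # the fallback value is an assumption made for this sketch.
    charset = response.headers.get_content_charset() or fallback
    return io.TextIOWrapper(response, encoding=charset)


def urlopen_text(url, *args, **kwargs):
    """Like urlopen, but read()/readline() return str instead of bytes."""
    return text_response(urlopen(url, *args, **kwargs))


class FakeResponse(io.BytesIO):
    """Offline stand-in for an HTTP response (hypothetical, demo only)."""

    def __init__(self, body, content_type):
        super().__init__(body)
        self.headers = Message()
        self.headers["Content-Type"] = content_type


if __name__ == "__main__":
    demo = FakeResponse("héllo\n".encode("utf-8"), "text/html; charset=utf-8")
    print(text_response(demo).read())
```

Because TextIOWrapper decodes incrementally, a read(n) that lands in the
middle of a multi-byte character should not blow up the way decoding each
chunk on its own can.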

Daniel