[Python-Dev] bytes / unicode

Toshio Kuratomi a.badger at gmail.com
Tue Jun 22 19:21:23 CEST 2010


On Tue, Jun 22, 2010 at 08:31:13PM +0900, Stephen J. Turnbull wrote:
> Toshio Kuratomi writes:
>  > unicode handling redesign.  I'm stating my reading of the RFC not to defend
>  > the use case Philip has, but because I think that the outlook that non-text
>  > uris (before being percentencoded) are violations of the RFC
> 
> That's not what I'm saying.  What I'm trying to point out is that
> manipulating a bytes object as an URI sort of presumes a lot about its
> encoding as text.

I think we're more or less in agreement now but here I'm not sure.  What
manipulations are you thinking about?  Which stage of URI construction are
you considering?

I've just taken a quick look at python3.1's urllib module and I see that
there is a bit of confusion there.  But it's not about unicode vs bytes but
about whether a URI should be operated on at the real URI level or the
data-that-makes-a-uri level.

* all functions I looked at take python3 str rather than bytes so there's no
  confusing stuff here
* urllib.request.urlopen takes a strict uri.  That means that you must have
  a percent encoded uri at this point
* urllib.parse.urljoin takes regular string values
* urllib.parse and urllib.unparse take regular string values

> Since many of the URIs we deal with are more or
> less textual, why not take advantage of that?
>
Cool, so to summarize what I think we agree on:

* Percent encoded URIs are text according to the RFC.
* The data that is used to construct the URI is not defined as text by the
  RFC.
* However, it is very often text in an unspecified encoding
* It is extremely convenient for programmers to be able to treat the data
  that is used to form a URI as text in nearly all common cases.

-Toshio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20100622/a926e262/attachment.pgp>


More information about the Python-Dev mailing list