[Python-Dev] bytes / unicode
Toshio Kuratomi
a.badger at gmail.com
Tue Jun 22 19:21:23 CEST 2010
On Tue, Jun 22, 2010 at 08:31:13PM +0900, Stephen J. Turnbull wrote:
> Toshio Kuratomi writes:
> > unicode handling redesign. I'm stating my reading of the RFC not to defend
> > the use case Philip has, but because I think that the outlook that non-text
> > uris (before being percent-encoded) are violations of the RFC
>
> That's not what I'm saying. What I'm trying to point out is that
> manipulating a bytes object as an URI sort of presumes a lot about its
> encoding as text.
I think we're more or less in agreement now, but here I'm not sure. What
manipulations are you thinking about? Which stage of URI construction are
you considering?
I've just taken a quick look at Python 3.1's urllib module and I see that
there is a bit of confusion there. But it's not about unicode vs. bytes; it's
about whether a URI should be operated on at the real-URI level or at the
data-that-makes-a-URI level.
* All the functions I looked at take Python 3 str rather than bytes, so there's
no confusion on that front.
* urllib.request.urlopen takes a strict URI. That means you must already have
a percent-encoded URI at this point.
* urllib.parse.urljoin takes regular string values.
* urllib.parse.urlparse and urllib.parse.urlunparse take regular string values.
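To make the two levels concrete, here is a small sketch assuming a current Python 3 (not 3.1 specifically) and a made-up example.com URL. The raw data is percent-encoded first with urllib.parse.quote, and only then handled by the str-based parse/unparse/join functions, yielding the kind of strict URI that urlopen expects:

```python
from urllib.parse import quote, urljoin, urlparse, urlunparse

# Data that will become part of a URI (hypothetical example value):
# text containing a space and a non-ASCII character.
path_text = "/docs/café menu"

# Data-that-makes-a-URI level: percent-encode it first.
# quote() encodes str as UTF-8 by default and leaves "/" alone.
encoded_path = quote(path_text)

# Real-URI level: urlunparse/urlparse/urljoin all work on str values.
base = urlunparse(("http", "example.com", encoded_path, "", "", ""))
parts = urlparse(base)
joined = urljoin(base, "other%20page")

# `base` is now a strict, percent-encoded URI of the kind
# urllib.request.urlopen expects (not fetched here).
print(base)    # http://example.com/docs/caf%C3%A9%20menu
print(joined)  # http://example.com/docs/other%20page
```

Note that urljoin and urlparse never see the raw "café menu" text; they only see the already-encoded form.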
> Since many of the URIs we deal with are more or
> less textual, why not take advantage of that?
>
Cool, so to summarize what I think we agree on:
* Percent-encoded URIs are text according to the RFC.
* The data that is used to construct the URI is not defined as text by the
RFC.
* However, it is very often text in an unspecified encoding.
* It is extremely convenient for programmers to be able to treat the data
that is used to form a URI as text in nearly all common cases.
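The "unspecified encoding" point can be illustrated with a short sketch, assuming a current Python 3 urllib.parse and made-up sample values. The same text produces different percent-encoded output depending on which encoding the programmer picks, while arbitrary non-text bytes can still be encoded and recovered exactly:

```python
from urllib.parse import quote, quote_from_bytes, unquote, unquote_to_bytes

text = "café"  # hypothetical sample data

# Same text, two different encodings, two different URI fragments.
utf8_form = quote(text, encoding="utf-8")      # caf%C3%A9
latin1_form = quote(text, encoding="latin-1")  # caf%E9

# Data that isn't text at all can still be percent-encoded.
raw = b"\xff\x00data"
raw_form = quote_from_bytes(raw)               # %FF%00data

# Decoding round-trips only when the reader guesses the same encoding.
print(unquote(utf8_form))                        # café
print(unquote(latin1_form, encoding="latin-1"))  # café
print(unquote(latin1_form))   # wrong guess (UTF-8): 'caf\ufffd'
print(unquote_to_bytes(raw_form) == raw)         # True
```

Treating the data as text is convenient in the common case, but the encoding choice lives outside the URI itself.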
-Toshio