rrr at ronadam.com
Sun Feb 19 00:56:02 CET 2006
Josiah Carlson wrote:
> Ron Adam <rrr at ronadam.com> wrote:
>> Josiah Carlson wrote:
>>> Again, the problem is ambiguity; what does bytes.recode(something) mean?
>>> Are we encoding _to_ something, or are we decoding _from_ something?
>> This was just an example of one way that might work, but here are my
>> thoughts on why I think it might be good.
>> In this case, the ambiguity is reduced as far as the encoding and
>> decoding operations are concerned.
>> somestring = encodings.tostr( someunicodestr, 'latin-1')
>> It's pretty clear what is happening to me.
>> It will encode an object named someunicodestr to a string with
>> the 'latin-1' encoder.
> But now how do you get it back? encodings.tounicode(..., 'latin-1')?,
> unicode(..., 'latin-1')?
Yes, just do:
someunicodestr = encodings.tounicode( somestring, 'latin-1')
> What about string transformations:
> somestring = encodings.tostr(somestr, 'base64')
> How do we get that back? encodings.tostr() again is completely
> ambiguous, str(somestring, 'base64') seems a bit awkward (switching
In the case where a string is converted to another string, it would
probably be best to require that they all get converted to
unicode as an intermediate step. By doing that it becomes an explicit
two-step operation.
# string to string encoding
u_string = encodings.tounicode(s_string, 'base64')
s2_string = encodings.tostr(u_string, 'base64')
Or you could have a convenience function to do it in the encodings module:
def strtostr(s, sourcecodec, destcodec):
    u = tounicode(s, sourcecodec)
    return tostr(u, destcodec)
s2 = encodings.strtostr(s, 'base64', 'base64')
Which would be kind of pointless in this example, but it would be a good
way to test a codec.
assert s == s2
>>> Are we going to need to embed the direction in the encoding/decoding
>>> name (to_base64, from_base64, etc.)? That doesn't seem any better than
>>> binascii.b2a_base64 .
>> No, that's why I suggested two separate lists (or dictionaries might be
>> better). They can contain the same names, but the lists they are in
>> determine the context and point to the needed codec. And that step is
>> abstracted out by putting it inside the encodings.tostr() and
>> encodings.tounicode() functions.
>> So either function would call 'base64' from the correct codec list and
>> get the correct encoding or decoding codec it needs.
> Either the API you have described is incomplete, you haven't noticed the
> directional ambiguity you are describing, or I have completely lost it.
Most likely I gave an incomplete description of the API in this case
because there are probably several ways to implement it.
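A minimal sketch of one such implementation, assuming two module-level
dictionaries keyed by codec name. The dictionary names, the contents of the
codec tables, and the choice of UTF-8 as the text form inside base64 are all
assumptions made for illustration; none of this is an existing encodings API.
It is written for Python 3, where the byte string is bytes and the unicode
string is str.

```python
import base64

# Hypothetical registries. The same codec name appears in both, and the
# dictionary it is looked up in determines the direction.
_TO_STR = {  # unicode text -> byte string
    'latin-1': lambda u: u.encode('latin-1'),
    # Assumption: the text is stored as UTF-8 before being base64 encoded.
    'base64': lambda u: base64.b64encode(u.encode('utf-8')),
}
_TO_UNICODE = {  # byte string -> unicode text
    'latin-1': lambda s: s.decode('latin-1'),
    'base64': lambda s: base64.b64decode(s).decode('utf-8'),
}


def tostr(u, codec):
    """Encode unicode text to a byte string with the named codec."""
    return _TO_STR[codec](u)


def tounicode(s, codec):
    """Decode a byte string to unicode text with the named codec."""
    return _TO_UNICODE[codec](s)


def strtostr(s, sourcecodec, destcodec):
    """Convert string to string, always passing through unicode."""
    return tostr(tounicode(s, sourcecodec), destcodec)


somestring = tostr('caf\u00e9', 'latin-1')
someunicodestr = tounicode(somestring, 'latin-1')

s = base64.b64encode(b'hello')
s2 = strtostr(s, 'base64', 'base64')
assert s == s2
```

Because the lookup table is chosen by the function being called, the name
'base64' never has to carry a direction of its own.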
>>> What about .reencode and .redecode? It seems as
>>> though the 're' added as a prefix to .encode and .decode makes it
>>> clearer that you get the same type back as you put in, and it is also
>>> unambiguous to direction.
> I must not be expressing myself very well.
> Right now:
> s.encode() -> s
> s.decode() -> s, u
> u.encode() -> s, u
> u.decode() -> u
> Martin et al's desired change to encode/decode:
> s.decode() -> u
> u.encode() -> s
> No others.
Which would be similar to the functions I suggested. The main
difference is only whether it would be better to have them as methods or
separate factory functions and the spelling of the names. Both have
their advantages I think.
>> The method bytes.recode() always does a byte transformation which can
>> be almost anything. It's the context bytes.recode() is used in that
>> determines what's happening. In the above cases, it's using an encoding
>> transformation, so what it's doing is precisely what you would expect by
>> its context.
> Indeed, there is a translation going on, but it is not clear as to
> whether you are encoding _to_ something or _from_ something. What does
> s.recode('base64') mean? Are you encoding _to_ base64 or _from_ base64?
> That's where the ambiguity lies.
Bengt didn't propose adding .recode() to the string types, but only the
bytes type. The byte type would "recode" the bytes using a specific
transformation. I believe his view is it's a lower level API than
strings that can be used to implement the higher level encoding API
with, not replace the encoding API. Or that is the way I interpreted it.
>> There isn't a bytes.decode(), since that's just another transformation.
>> So only the one method is needed. Which makes it easier to learn.
> But ambiguous.
What's ambiguous about it? It's no more ambiguous than any math
operation where you can do it one way with one operation and get your
original value back with the same operation by using an inverse value.
n2=n+1; n3=n2+(-1); n==n3
n2=n*2; n3=n2*(.5); n==n3
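To make the analogy concrete, here is a rough sketch of how a single recode
operation could be paired with inverse transformations. The recode() function
is a hypothetical stand-in for the proposed bytes.recode() method, and passing
the standard base64 functions as the transformation is an assumption of this
sketch, not part of Bengt's proposal.

```python
import base64


def recode(data, transform):
    """Hypothetical stand-in for bytes.recode(): apply one
    bytes-to-bytes transformation and return the result."""
    result = transform(data)
    if not isinstance(result, bytes):
        raise TypeError('a recode transformation must return bytes')
    return result


original = b'some bytes'

# One operation, two inverse "values", like n+1 and n+(-1).
encoded = recode(original, base64.b64encode)
restored = recode(encoded, base64.b64decode)

assert restored == original
```

The direction lives in the transformation that is passed in, which is the
point Josiah is raising: a bare name like 'base64' does not say which of the
two is meant.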
>> Learning how the current system works comes awfully close to reverse
>> engineering. Maybe I'm overstating it a bit, but I suspect many end up
>> doing exactly that in order to learn how Python does it.
> Again, we _need_ better documentation, regardless of whether or when the
> removal of some or all .encode()/.decode() methods happen.
Yes, in the short term some parts of PEP 100 could be moved to the
Python docs, I think.