[Python-Dev] just say no...

Mark Hammond mhammond@skippinet.com.au
Sat, 13 Nov 1999 10:41:16 +1100


[Greg writes]

> As a separate argument, MAL can argue that "t#" should create an
> internal, associated buffer to hold a UTF-8 encoding and then return
> that. But the "s#" should return the raw bytes!
> [ and I'll argue against the response to "t#" anyhow... ]

Hmm.  Climbing over these dead bodies could get a bit smelly :-)

I'm inclined to agree that holding two internal buffers for the Unicode
object is not ideal.  However, I _am_ concerned with getting decent
PyArg_ParseTuple and Py_BuildValue support, and if the cost is an
extra buffer I will survive.  So let's look for solutions that don't
require it, rather than holding it up as evil when no other solution
is obvious.
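For concreteness, here is a rough model of the two-buffer arrangement being debated, written in modern Python purely as an illustration. The class name `BufferedUnicode` and its methods are hypothetical, UTF-16-LE is assumed as the internal representation, and the UTF-8 buffer is created lazily on the first "t#" request and then kept alongside the raw data. This is a sketch of the idea, not how CPython actually implements it.

```python
class BufferedUnicode:
    """Hypothetical model of a Unicode object that may hold two buffers."""

    def __init__(self, text: str) -> None:
        # Assumed internal representation: UTF-16-LE code units.
        self._raw = text.encode("utf-16-le")
        # Second buffer, only created on the first "t#" request.
        self._utf8 = None

    def as_s_hash(self) -> bytes:
        # "s#" as Greg proposes: hand back the raw internal bytes.
        return self._raw

    def as_t_hash(self) -> bytes:
        # "t#" as MAL proposes: build the UTF-8 encoding once, then cache it.
        if self._utf8 is None:
            self._utf8 = self._raw.decode("utf-16-le").encode("utf-8")
        return self._utf8


u = BufferedUnicode("foo.txt")
print(u.as_s_hash())  # b'f\x00o\x00o\x00.\x00t\x00x\x00t\x00'
print(u.as_t_hash())  # b'foo.txt'
```

The cached `_utf8` attribute is the "extra buffer": once created, it lives as long as the object does, which is exactly the memory cost being weighed here.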

My requirements appear to me to be very simple (for an anglophile):

Let's say I have a platform Unicode value, e.g., one I got from some
external library (say COM :-)  Let's assume for now that the Unicode
string is fully representable as ASCII, say a file or directory name
that COM gave me.  I simply want to be able to pass this Unicode object
to "open()" and have it work.  This assumes that open() will not become
"native Unicode", simply because the underlying C support is not
Unicode-aware: the name needs to be converted to a "char *" (i.e., it
will use the "t#" format).
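A minimal sketch of the Prob1 conversion, in modern Python 3 terms where `str` stands in for the Unicode object and `bytes` for the plain 8-bit string. The helper name `to_char_buffer` is hypothetical, and it assumes, as stated above, that the name is pure ASCII, so anything else is rejected rather than silently mangled.

```python
def to_char_buffer(value) -> bytes:
    """Hypothetical "t#"-style conversion: Unicode or 8-bit string -> char buffer."""
    if isinstance(value, bytes):
        # Already a plain 8-bit string: pass it through unchanged.
        return value
    if isinstance(value, str):
        # Assumption: only ASCII-representable names are accepted;
        # str.encode raises UnicodeEncodeError for anything else.
        return value.encode("ascii")
    raise TypeError("expected a string, got %s" % type(value).__name__)


print(to_char_buffer("readme.txt"))  # b'readme.txt'
```

Failing loudly on non-ASCII input keeps the simple case working without pretending to solve the general encoding question.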

The other side of the equation is when I expose a Python function
that talks Unicode, e.g., when I need to _pass_ a platform Unicode value
to an external library.  The Python programmer should be able to pass
either a Unicode object (no problem) or a PyString object.
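The reverse direction (Prob2) can be sketched the same way, again in modern Python 3 terms with `str` as the Unicode object and `bytes` as the PyString. The converter name `to_platform_unicode` is hypothetical, and widening an 8-bit string assumes its contents are ASCII.

```python
def to_platform_unicode(value) -> str:
    """Hypothetical converter for an extension function that takes Unicode."""
    if isinstance(value, str):
        # Already Unicode: nothing to do.
        return value
    if isinstance(value, bytes):
        # Plain 8-bit string: widen it, assuming ASCII content.
        return value.decode("ascii")
    raise TypeError("expected a string, got %s" % type(value).__name__)


print(to_platform_unicode(b"foo.txt"))  # foo.txt
print(to_platform_unicode("foo.txt"))   # foo.txt
```

Either argument type ends up as the same Unicode value, so the extension code only has to deal with one representation.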

In code terms:
Prob1:
  name = SomeComObject.GetFileName() # A Unicode object
  f = open(name)
Prob2:
  SomeComObject.SetFileName("foo.txt")

IMO it is important that we have a good strategy for dealing with this
for extensions.  MAL addresses one direction, but not the other.

Maybe if we toss around general solutions for this, the implementation
will fall out.  MAL's idea of the additional buffer starts to address
this, but isn't the whole story.

Any ideas on this?