[Python-3000] How will unicode get used?

Mon Sep 25 01:48:29 CEST 2006

Martin v. Löwis wrote:
> Gábor Farkas schrieb:
>> while i understand the constraints, i think it's not a good decision to 
>> leave this to be implementation-dependent.
>>
>> strings seem to me to be such basic functionality that their 
>> behaviour should not depend on the platform.
>>
>> for example, how is an application developer then supposed to write 
>> their applications?
> 
> An application developer should always know what the target platforms
> are. For example, does the code need to work with IronPython or not?

i think if IronPython claims to be a python implementation, then at 
least a simple hello-world style string manipulation program should 
behave the same way on IronPython and on CPython.

(of course when it's a 'bigger' program that uses some python libraries, 
  then yes, the developer should know. but we are talking about a builtin 
  type here)

> Python is not aiming at 100% portability at all costs. Many aspects
> are platform dependent, and while this has complicated some
> applications, it has simplified others (which could make use of
> platform details that otherwise would not have been exposed to the
> Python programmer).

hmmm.. i thought that all those 'platform dependent' aspects are in the 
libraries (win32/sys/posix/os/whatever), and not in the "core" part.

so, are there any in the "core" (stupid naming i know. i mean 
not-in-libraries) part?

> 
>> should he write his own slicing/whatever functions to get consistent 
>> behaviour on linux/windows?
> 
> Depends on the application, and the specific slicing operations.
> If the slicing appears in the processing of .ini files (say),
> no platform-dependent slicing should be necessary.

why?

or do you simply assume that an ini file cannot contain non-bmp unicode 
characters?

but if you'd like to have an example then:

let's say in an application i only want to display the first 70 
characters of a string.

now, for this to behave correctly on non-bmp characters, i will need to 
write a custom function, correct?
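a minimal sketch of what such a custom function might look like. it assumes 
the string is stored as 16-bit (UTF-16) code units, which the example 
simulates with a list of integers so that it runs on any current python. 
the names utf16_units, truncate_units and units_to_text are made up for 
this illustration and are not part of any library.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def units_to_text(units):
    """Turn a list of UTF-16 code units back into a str."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


def truncate_units(units, max_chars):
    """Keep at most max_chars code points, never splitting a surrogate pair."""
    out = []
    count = 0
    i = 0
    while i < len(units) and count < max_chars:
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A high surrogate followed by a low surrogate is one character.
            out.extend(units[i:i + 2])
            i += 2
        else:
            out.append(unit)
            i += 1
        count += 1
    return out


# "a", then U+1D11E (a non-bmp character), then "b": 3 characters, 4 units.
units = utf16_units("a\U0001D11Eb")
first_two = units_to_text(truncate_units(units, 2))  # "a" plus U+1D11E
```

a plain units[:2] on the same data would stop after the high surrogate and 
leave half a character at the end, which is the case the custom function 
exists to avoid.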

> 
>> but the same way i could say, that because most of the unix-world is 
>> utf-8, for those pythons the best way is to handle it internally as 
>> utf-8, couldn't i?
> 
> I think you live in a free country: you can certainly say that.
> I think you would be wrong. The common on-disk/on-wire representation
> of text should not influence the design of an in-memory representation.

sorry, i should have clarified this more.

i simply reacted to the situation that for example cpython-win32 and 
IronPython use 16bit unicode-strings, which makes it easy for them to 
communicate with the (afaik) mostly 16bit-unicode win32 API.

on the other hand, for example GTK uses utf8-encoded strings...so when 
on linux the python-GTK bindings want to transfer strings, they will 
have to do charset-conversion.

but this was only an example.
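to illustrate the conversion at the library boundary: whatever the 
in-memory representation is, the string gets encoded into the form the 
target API expects. the helper names below are hypothetical and do not 
correspond to any real binding; only str.encode is an actual python call.

```python
def to_utf8_api(text):
    """Bytes for an API that expects UTF-8 (hypothetical helper)."""
    return text.encode("utf-8")


def to_utf16_api(text):
    """Bytes for an API that expects little-endian UTF-16 (hypothetical helper)."""
    return text.encode("utf-16-le")


sample = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
utf8_bytes = to_utf8_api(sample)    # two bytes: 0xC3 0xA9
utf16_bytes = to_utf16_api(sample)  # two bytes: 0xE9 0x00
```

if the in-memory form already matches what the API expects, this step can 
be little more than a copy; otherwise it is a real conversion.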

> 
>> it simply seems strange to me to make compromises that make the life of 
>> the cpython-users harder, just to make the life of the 
>> jython/ironpython developers (i mean the 'creators') easier.
> 
> Guido didn't say that the life of the CPython user needs to be hard.

hmmm.. for me having to worry about string-handling differences in the 
programming language i use qualifies as 'harder'.

> He said it will be implementation-dependent, referring to Jython
> and IronPython.
> Whether or not CPython uses a consistent representation
> or consistent python-level experience across platforms is a different
> issue. CPython could behave absolutely consistently, and use four-byte
> Unicode on all systems, and the length of a non-BMP string would
> still be implementation-defined.
> 

i understand that difference.

(i just find it hard to believe, that string-handling does not seem 
important enough to make it truly cross-platform (or cross-implementation))
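to make the "length of a non-BMP string" point concrete, here is a small 
sketch that counts the same one-character string in different units. it 
assumes a current CPython (3.3 or later), where len() counts code points. 
that behaviour postdates this thread, so the UTF-16 figure stands in for 
what a 16-bit implementation would report.

```python
text = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

code_points = len(text)                       # 1 on CPython 3.3+
utf16_len = len(text.encode("utf-16-le")) // 2  # 2: a surrogate pair
utf32_len = len(text.encode("utf-32-le")) // 4  # 1: one four-byte unit
utf8_len = len(text.encode("utf-8"))            # 4 bytes
```

so the same string has length 1 or 2 depending on whether the 
implementation counts four-byte units or 16-bit units.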

gabor