[Python-3000] How will unicode get used?
gabor at nekomancer.net
Mon Sep 25 01:48:29 CEST 2006
Martin v. Löwis wrote:
> Gábor Farkas schrieb:
>> while i understand the constraints, i think it's not a good decision to
>> leave this to be implementation-dependent.
>> the strings seem to me as such a basic functionality, that its
>> behaviour should not depend on the platform.
>> for example, how is an application developer then supposed to write
>> their applications?
> An application developer should always know what the target platforms
> are. For example, does the code need to work with IronPython or not?
i think if IronPython claims to be a python implementation, then at
least a simple hello-world style string manipulation program should
behave the same way on IronPython and on CPython.
(of course when it's a 'bigger' program that uses some python libraries,
then yes, he should know. but we are talking about a builtin type here)
> Python is not aiming at 100% portability at all costs. Many aspects
> are platform dependent, and while this has complicated some
> applications, it has simplified others (which could make use of
> platform details that otherwise would not have been exposed to the
> Python programmer).
hmmm.. i thought that all those 'platform dependent' aspects are in the
libraries (win32/sys/posix/os/whatever), and not in the "core" part.
so, are there any in the "core" (stupid naming i know. i mean
>> should he write his own slicing/whatever functions to get consistent
>> behaviour on linux/windows?
> Depends on the application, and the specific slicing operations.
> If the slicing appears in the processing of .ini files (say),
> no platform-dependent slicing should be necessary.
or you simply assume that an ini file cannot contain non-bmp unicode
but if you'd like to have an example then:
let's say in an application i only want to display the first 70
characters of a string.
now, for this to behave correctly on non-bmp characters, i will need to
write a custom function, correct?
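for illustration, a minimal sketch of what such a custom function might look like, assuming the string is held as 16-bit (UTF-16) code units, as on cpython-win32 or IronPython. the helper names (utf16_units, first_chars, units_to_str) are made up for this example, and the list of code units only simulates a 16-bit string type:

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (simulated 16-bit string)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def first_chars(units, n):
    """Return the code units that cover the first n characters (code points)."""
    out = []
    count = 0
    i = 0
    while i < len(units) and count < n:
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # surrogate pair: one non-BMP character stored as two code units
            out.extend(units[i:i + 2])
            i += 2
        else:
            out.append(unit)
            i += 1
        count += 1
    return out


def units_to_str(units):
    """Turn a list of well-formed UTF-16 code units back into a string."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


text = "a\U00010400b"            # 'a', one non-BMP character, 'b'
units = utf16_units(text)        # four code units for three characters
naive = units[:2]                # plain slicing cuts the pair in half
safe = first_chars(units, 2)     # keeps both halves of the pair
```

with plain slicing the result ends in a lone high surrogate, while the custom function returns the two complete characters. for the 70-character case one would call first_chars(units, 70).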
>> but the same way i could say, that because most of the unix-world is
>> utf-8, for those pythons the best way is to handle it internally as
>> utf-8, couldn't i?
> I think you live in a free country: you can certainly say that.
> I think you would be wrong. The common on-disk/on-wire representation
> of text should not influence the design of an in-memory representation.
sorry, i should have clarified this more.
i simply reacted to the situation that for example cpython-win32 and
IronPython use 16bit unicode-strings, which makes it easy for them to
communicate with the (afaik) mostly 16bit-unicode win32 API.
on the other hand, for example GTK uses utf8-encoded strings...so when
on linux the python-GTK bindings want to transfer strings, they will
have to do charset-conversion.
but this was only an example.
>> it simply seems to me strange to make compromises that make the life of
>> the cpython-users harder, just to make the life for the
>> jython/ironpython developers (i mean the 'creators') easier.
> Guido didn't say that the life of the CPython user needs to be hard.
hmmm.. for me having to worry about string-handling differences in the
programming language i use qualifies as 'harder'.
> He said it will be implementation-dependent, referring to Jython
> and IronPython.
> Whether or not CPython uses a consistent representation
> or consistent python-level experience across platforms is a different
> issue. CPython could behave absolutely consistently, and use four-byte
> Unicode on all systems, and the length of a non-BMP string would
> still be implementation-defined.
i understand that difference.
(i just find it hard to believe, that string-handling does not seem
important enough to make it truly cross-platform (or cross-implementation))
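to make the difference concrete, here is a small sketch that measures the same non-BMP string in two ways. both helper names are hypothetical, and the counts are computed through explicit encodings so that they do not depend on how the running implementation stores strings:

```python
def count_code_points(text):
    """Length as a four-byte (UTF-32) representation would report it."""
    return len(text.encode("utf-32-le")) // 4


def count_utf16_units(text):
    """Length as a 16-bit (UTF-16) representation would report it."""
    return len(text.encode("utf-16-le")) // 2


clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the BMP
lengths = (count_code_points(clef), count_utf16_units(clef))
```

the single character counts as 1 in the four-byte view and as 2 in the 16-bit view, which is exactly the kind of difference a program doing len() or slicing would run into.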