Mailman 3 April 2000 - Python-Dev

Python 1.6a2 Unicode bug (was Re: comparing strings and ints)
by alisa＠robanal.demon.co.uk April 27, 2000

>I wrote:
>>A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray".
>
>Another way of putting this is:
>- utf-8 in an 8-bit string is to a unicode string what a pickle is to an
>object.
>- defaulting to utf-8 upon coercing is like implicitly trying to unpickle
>an 8-bit string when comparing it to an instance. Bad idea.
>
>Defaulting to Latin-1 is the only logical choice, no matter how
>western-culture-centric this may seem.
>
>Just

The Van Rossum Common Sense gene strikes again! You guys owe it to the world to have lots of children.

I agree 100%. Let me also add that if you want to do encoding work that goes beyond what the library gives you, you absolutely need a 'byte array' type which makes no assumptions and does nothing magic to its content. I have always thought of 8-bit strings as 'byte arrays' and not 'character arrays', and doing anything magic to them in literals or standard input is going to cause lots of trouble.

I think our proposal is BETTER than Java, Tcl, Visual Basic etc. for the following reasons:
- you can work with old-fashioned strings, which are understood by everyone to be arrays of bytes, and there is no magic conversion going on. The bytes in literal strings in your script file are the bytes that end up in the program.
- you can work with Unicode strings if you want
- you are in explicit control of conversions between them
- both types have similar methods so there isn't much to learn or remember

The 'no magic' thing is very important with Japanese, where very often you need to roll your own codecs and look at the raw bytes; any auto-conversion might not go through the filter you want and you've already lost information before you started, especially if your job is to repair possibly corrupt data. Any company with a few extra custom characters in the user-defined Shift-JIS range is going to suddenly find their Perl scripts are failing or trashing all their data as a result of the UTF-8 decision.

I'm also convinced that the majority of Python scripts won't need to work in Unicode. Even working with exotic languages, there is always a native 8-bit encoding. I have only used Unicode when
(a) working with data that is in several languages
(b) doing conversions, which requires a 'central point'
(c) wanting to do per-character operations safely on multi-byte data

I still haven't sorted out in my head whether the default encoding thing is a big red herring or is important; I already have a safe way to construct Unicode literals in my source files if I want to, using unicode('rawdata','myencoding'). But if there has to be one I'd say the following:
- strict ASCII is an option
- Latin-1 is the more generous option that is right for the most people, and has a 'special status' among 8-bit encodings
- UTF-8 is not one byte per character and will confuse people

Just my 2p worth,

Andy
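The contrast drawn above between Latin-1 and UTF-8 can be illustrated with a short sketch. It assumes modern Python 3, where the "byte array" is the bytes type and the Unicode string is str; it does not reproduce the Python 1.6a2 API discussed in the thread, and the variable names are only illustrative.

```python
# Illustration in modern Python 3 (bytes/str), not the Python 1.6a2 API.
raw = bytes(range(256))  # every possible 8-bit value

# Latin-1 maps byte value N to code point N, so decoding never fails
# and the round trip back to bytes is lossless.
text = raw.decode("latin-1")
round_trip_ok = text.encode("latin-1") == raw
code_points_match = [ord(ch) for ch in text] == list(raw)

# UTF-8 is variable-width: one character may need several bytes.
e_acute = "\u00e9"
utf8_len = len(e_acute.encode("utf-8"))
latin1_len = len(e_acute.encode("latin-1"))

# Arbitrary 8-bit data is not necessarily valid UTF-8, so an implicit
# UTF-8 decode of a byte string can fail.
try:
    b"\xe9".decode("utf-8")
    utf8_decodes = True
except UnicodeDecodeError:
    utf8_decodes = False

print(round_trip_ok, code_points_match, utf8_len, latin1_len, utf8_decodes)
```

Both conversions are spelled out explicitly here, which is the "explicit control" the message argues for; nothing is decoded behind the programmer's back.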

[Fwd: [Python-Dev] Where the speed is lost! (was: 1.6 speed)]
by Christian Tismer April 26, 2000

I forgot to cc python-dev. This file is closed for me.

The sun is shining again, life is so wonderful, and now for something completely different

- chris

L1 data cache profile for Python 1.5.2 and 1.6
by Neil Schemenauer April 26, 2000

Using this tool:

    http://www.cacheprof.org/

I got this output:

    http://www.enme.ucalgary.ca/~nascheme/python/cache.out
    http://www.enme.ucalgary.ca/~nascheme/python/cache-152.out

The cache miss rate for eval_code2 is about two times larger in 1.6. The overall miss rate is about the same. Is this significant? I suspect that the instruction cache is more important for eval_code2. Unfortunately cacheprof can only profile the L1 data cache. Perhaps someone will find this data …

RE: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading
by Salz, Rich April 26, 2000

>In my experience, allowing/requiring programmers to specify sharedness is
>a very rich source of hard-to-find bugs.

My experience is the opposite, since most objects aren't shared. :) You could probably do something like add an "owning thread" to each object structure, and on refcount throw an exception if not shared and the current thread isn't the owner. Not sure if space is a concern, but since the object is either shared or needs its own mutex, you make them a union:

    bool shared; …
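The owning-thread check suggested above might look roughly like the following Python sketch. It is only an illustration of the idea, not the interpreter's C object layout: OwnedObject, incref and try_incref are hypothetical names, the union from the truncated message is not modelled, and the locking a shared object would need is left out.

```python
import threading


class OwnedObject:
    """Hypothetical stand-in for an object header with an owning thread."""

    def __init__(self, shared=False):
        self.shared = shared
        self.owner = threading.get_ident()  # thread that created the object
        self.refcount = 1

    def incref(self):
        # An unshared object may only be touched by its owning thread.
        if not self.shared and threading.get_ident() != self.owner:
            raise RuntimeError("unshared object used outside its owning thread")
        self.refcount += 1


def try_incref(obj, results):
    """Run in a worker thread; record whether the refcount change was allowed."""
    try:
        obj.incref()
        results.append("ok")
    except RuntimeError:
        results.append("rejected")


private = OwnedObject()
public = OwnedObject(shared=True)
results = []
for obj in (private, public):
    worker = threading.Thread(target=try_incref, args=(obj, results))
    worker.start()
    worker.join()

print(results)
```

The point of the check is that an accidental cross-thread use of an unshared object fails loudly at the first refcount operation instead of corrupting state silently.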

Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)
by Just van Rossum April 26, 2000

Fredrik Lundh replied to himself in c.l.py:
>> as far as I can tell, it's supposed to be a feature.
>>
>> if you mix 8-bit strings with unicode strings, python 1.6a2
>> attempts to interpret the 8-bit string as an utf-8 encoded
>> unicode string.
>>
>> but yes, I also think it's a bug. but this far, my attempts
>> to get someone else to fix it has failed. might have to do
>> it myself... ;-)
>
>postscript: the powers-that-be has …

Re: [Python-checkins] CVS: python/dist/src/Modules socketmodule.c,1.104,1.105
by pf＠artcom-gmbh.de April 26, 2000

Guido van Rossum:
> Modified Files:
>       socketmodule.c
[...]
> *** 2526,2529 ****
> --- 2526,2532 ----
>   #ifdef MSG_DONTROUTE
>       insint(d, "MSG_DONTROUTE", MSG_DONTROUTE);
> + #endif
> + #ifdef MSG_DONTWAIT
> +     insint(d, "MSG_DONWAIT", MSG_DONTWAIT);
-------------------------^^?

Shouldn't this read "MSG_DONTWAIT"?
----------------------------^!

Nitpicking, Peter

Re: [Python-checkins] CVS: python/dist/src/Modules socketmodule.c,1.104,1.105
by Fredrik Lundh April 25, 2000

> +     insint(d, "MSG_DONWAIT", MSG_DONTWAIT);

better make that

> +     insint(d, "MSG_DONTWAIT", MSG_DONTWAIT);

right?

</F>

1.6 speed
by A.M. Kuchling April 25, 2000

Python 1.6a2 is around 10% slower than 1.5 on pystone. Any idea why?

[amk@mira Python-1.6a2]$ ./python Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 3.59
This machine benchmarks at 2785.52 pystones/second

[amk@mira Python-1.6a2]$ python1.5 Lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 3.19
This machine benchmarks at 3134.8 pystones/second

--amk
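As a quick check on the "around 10%" figure, the two runs quoted above can be compared directly. This sketch uses only the pass count and timings from the message; the variable names are illustrative, and two single runs are of course not a careful benchmark.

```python
passes = 10000
time_16a2 = 3.59  # seconds, the ./python (1.6a2) run quoted above
time_15 = 3.19    # seconds, the python1.5 run quoted above

# pystones/second is simply passes divided by elapsed time.
rate_16a2 = passes / time_16a2
rate_15 = passes / time_15

# The slowdown can be stated two ways: extra run time, or lost throughput.
extra_time_pct = (time_16a2 / time_15 - 1) * 100
lost_rate_pct = (1 - rate_16a2 / rate_15) * 100

print(f"1.6a2: {rate_16a2:.2f} pystones/s, 1.5: {rate_15:.2f} pystones/s")
print(f"extra run time: {extra_time_pct:.1f}%, lower throughput: {lost_rate_pct:.1f}%")
```

Measured as run time, 1.6a2 takes roughly 12.5% longer; measured as throughput, it delivers roughly 11% fewer pystones per second. Both are consistent with the rough figure in the message.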

Off-topic
by Christian Tismer April 25, 2000

Hey, don't blame me for posting a joke :-)

Please read from the beginning, don't look at the end first.

No, this is no offense...

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE …

map() methods (was: Re: [Patches] Review (was: Please review before applying))
by Guido van Rossum April 25, 2000

[Moving this to python-dev because it's a musing

> > The main point is to avoid string.*.
>
> Agreed. Also replacing map by a loop might not even be slower.
> What remains as open question: Several modules need access
> to string constants, and they therefore still have to import
> string.
> Is there an elegant solution to this?

import string

> That's why i asked for some way to access "".__class__ or
> whatever, to get into some common namespace with the …
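The trade-off discussed above can be sketched as follows. This assumes modern Python 3 rather than the 1.6 code base under review, and the sample data is made up for illustration. String methods in a plain loop stand in for map() over string-module functions, while the constants still require the import.

```python
import string  # still needed, but only for the constants

lines = ["  spam \n", "\teggs\n", " ham"]

# String methods replace calls to string-module functions, and a plain
# loop replaces map() over such a function.
stripped = []
for line in lines:
    stripped.append(line.strip())

# Constants such as the set of digits still live in the string module.
digits_only = [ch for ch in "py1.6a2" if ch in string.digits]

# In modern Python the type of a string literal is reachable without
# any import, which is the kind of access the quoted question asks for.
str_type = "".__class__

print(stripped, digits_only, str_type)
```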
