With the extensive changes in the works, Python 3.0.1 is shaping up to be a complete rerelease of 3.0 with API changes and major usability fixes. It will fully supplant the original 3.0 release, which was hobbled by poor IO performance. I propose to make the new release more attractive by backporting several module improvements already in 3.1, including two new itertools functions and one collections class. These are already fully documented, tested, and checked in to 3.1, and it would be a shame to let them sit idle for a year or so when the module updates are ready to ship. Raymond
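Raymond doesn't name the additions in this message. Assuming they are `itertools.compress`, `itertools.combinations_with_replacement` and `collections.Counter` (all three are listed as new in Python 3.1), here is a quick sketch of what each one does:

```python
# Assumption: these are the 3.1 additions Raymond refers to.
from collections import Counter
from itertools import combinations_with_replacement, compress

# compress(data, selectors): keep the items whose selector is truthy.
kept = list(compress("ABCDEF", [1, 0, 1, 0, 1, 1]))

# combinations_with_replacement(iterable, r): r-length combinations
# in which an element may be repeated.
pairs = ["".join(p) for p in combinations_with_replacement("ABC", 2)]

# Counter: a dict subclass mapping each element to its count.
counts = Counter("abracadabra")

print(kept)                   # ['A', 'C', 'E', 'F']
print(pairs)                  # ['AA', 'AB', 'AC', 'BB', 'BC', 'CC']
print(counts.most_common(2))  # [('a', 5), ('b', 2)]
```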
On Tue, Jan 27, 2009 at 11:00 AM, Raymond Hettinger
With the extensive changes in the works, Python 3.0.1 is shaping up to be a complete rerelease of 3.0 with API changes and major usability fixes. It will fully supplant the original 3.0 release, which was hobbled by poor IO performance.
I propose to make the new release more attractive by backporting several module improvements already in 3.1, including two new itertools functions and one collections class. These are already fully documented, tested, and checked in to 3.1, and it would be a shame to let them sit idle for a year or so when the module updates are ready to ship.
In that case, I recommend just releasing it as 3.1. I had always anticipated a 3.1 release much sooner than the typical release schedule. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum"
In that case, I recommend just releasing it as 3.1. I had always anticipated a 3.1 release much sooner than the typical release schedule.
That is a great idea. It's a strong cue that there is a somewhat major break with 3.0 (removed functions, API fixes, huge performance fixes, and whatnot). Raymond
On Jan 27, 2009, at 2:05 PM, Guido van Rossum wrote:
On Tue, Jan 27, 2009 at 11:00 AM, Raymond Hettinger wrote:
With the extensive changes in the works, Python 3.0.1 is shaping up to be a complete rerelease of 3.0 with API changes and major usability fixes. It will fully supplant the original 3.0 release, which was hobbled by poor IO performance.
I propose to make the new release more attractive by backporting several module improvements already in 3.1, including two new itertools functions and one collections class. These are already fully documented, tested, and checked in to 3.1, and it would be a shame to let them sit idle for a year or so when the module updates are ready to ship.
In that case, I recommend just releasing it as 3.1. I had always anticipated a 3.1 release much sooner than the typical release schedule.
I was going to object on principle to Raymond's suggestion to rip out the operator module functions in Python 3.0.1. I have no objection to ripping them out for 3.1. If you really think we need a Python 3.1 soon, then I won't worry about trying to get a 3.0.1 out soon. 3.1 is Benjamin's baby :). If OTOH we do intend to get a 3.0.1 out, say by the end of February, then please be careful to adhere to our guidelines for which version various changes can go in. For example, the operator methods need to be restored to the 3.0 maintenance branch, and any other API changes added to 3.0 need to be backed out and applied only to the python3 trunk. Barry
On Tue, Jan 27, 2009 at 11:29, Barry Warsaw
On Jan 27, 2009, at 2:05 PM, Guido van Rossum wrote:
On Tue, Jan 27, 2009 at 11:00 AM, Raymond Hettinger wrote:
With the extensive changes in the works, Python 3.0.1 is shaping up to be a complete rerelease of 3.0 with API changes and major usability fixes. It will fully supplant the original 3.0 release, which was hobbled by poor IO performance.
I propose to make the new release more attractive by backporting several module improvements already in 3.1, including two new itertools functions and one collections class. These are already fully documented, tested, and checked in to 3.1, and it would be a shame to let them sit idle for a year or so when the module updates are ready to ship.
In that case, I recommend just releasing it as 3.1. I had always anticipated a 3.1 release much sooner than the typical release schedule.
A quick 3.1 release also shows how committed we are to 3.x and that we realize that 3.0 had some initial growing pains that needed to be worked out.
I was going to object on principle to Raymond's suggestion to rip out the operator module functions in Python 3.0.1.
I thought it was for 3.1?
I have no objection to ripping them out for 3.1.
If you really think we need a Python 3.1 soon, then I won't worry about trying to get a 3.0.1 out soon. 3.1 is Benjamin's baby :).
Depending on what Benjamin wants to do, we could aim for a release by PyCon or at PyCon during the sprints. The sprint option is actually a rather nice idea if Benjamin is willing to spend sprint time on it (and is sticking around for the sprints), as I assume you, Barry, will be there to help in person, and we can squash last-minute issues really quickly.
If OTOH we do intend to get a 3.0.1 out, say by the end of February, then please be careful to adhere to our guidelines for which version various changes can go in. For example, the operator methods need to be restored to the 3.0 maintenance branch, and any other API changes added to 3.0 need to be backed out and applied only to the python3 trunk.
If you have the time for it, Barry, I am +1 on an end of February 3.0.1 with a March/April 3.1 if that works for Benjamin. -Brett
On Jan 27, 2009, at 2:39 PM, Brett Cannon wrote:
I was going to object on principle to Raymond's suggestion to rip out the operator module functions in Python 3.0.1.
I thought it was for 3.1?
Sorry, I probably misread Raymond's suggestion.
I have no objection to ripping them out for 3.1.
If you really think we need a Python 3.1 soon, then I won't worry about trying to get a 3.0.1 out soon. 3.1 is Benjamin's baby :).
Depending on what Benjamin wants to do, we could aim for a release by PyCon or at PyCon during the sprints. The sprint option is actually a rather nice idea if Benjamin is willing to spend sprint time on it (and is sticking around for the sprints), as I assume you, Barry, will be there to help in person, and we can squash last-minute issues really quickly.
Yep, I'm planning on sticking around, so that's a great idea.
If OTOH we do intend to get a 3.0.1 out, say by the end of February, then please be careful to adhere to our guidelines for which version various changes can go in. For example, the operator methods need to be restored to the 3.0 maintenance branch, and any other API changes added to 3.0 need to be backed out and applied only to the python3 trunk.
If you have the time for it, Barry, I am +1 on an end of February 3.0.1 with a March/April 3.1 if that works for Benjamin.
Or at least a 3.1 alpha/beta/whatever during PyCon. I'm sure I can find the time to do a 3.0.1 before PyCon. Barry
On Tue, Jan 27, 2009 at 1:00 PM, Raymond Hettinger
With the extensive changes in the works, Python 3.0.1 is shaping up to be a complete rerelease of 3.0 with API changes and major usability fixes. It will fully supplant the original 3.0 release, which was hobbled by poor IO performance.
I propose to make the new release more attractive by backporting several module improvements already in 3.1, including two new itertools functions and one collections class. These are already fully documented, tested, and checked in to 3.1, and it would be a shame to let them sit idle for a year or so when the module updates are ready to ship.
At the moment, there are 4 release blockers for 3.0.1. I'd like to see 3.0.1 released soon (within the next month). It would fix the biggest mistakes in the initial release, most of which have been committed since December. I'm sure it would be attractive enough with the nasty bugs fixed in it! Let's not completely open the floodgates. Releasing 3.1 in March or April also sounds good. I will be at the sprints for at least the first day. -- Regards, Benjamin
At the moment, there are 4 release blockers for 3.0.1. I'd like to see 3.0.1 released soon (within the next month).
I agree. In December, there was a huge sense of urgency that we absolutely had to have a 3.0.1 before the end of last year, and now people are talking about giving up on 3.0 entirely. Releasing 3.1 six months after 3.0 sounds reasonable; I don't think it should be released earlier (otherwise 3.0 looks fairly ridiculous). Regards, Martin
On Tue, Jan 27, 2009 at 3:22 PM, Benjamin Peterson
At the moment, there are 4 release blockers for 3.0.1. I'd like to see 3.0.1 released soon (within the next month). It would fix the biggest mistakes in the initial release, most of which have been committed since December. I'm sure it would be attractive enough with the nasty bugs fixed in it! Let's not completely open the floodgates.
Releasing 3.1 in March or April also sounds good. I will be at the sprints for at least the first day.
As an interested observer, but not yet a user, of the 3.x series, I was wondering about progress on restoring io performance and what release those improvements were slated for. This is the major blocker for me to begin porting my code that doesn't depend on numpy/scipy. Much of my current work is in bioinformatics, often dealing with multi-gigabyte datasets, so fast file io is critical. Without it, I'll have to live with 2.x for the indefinite future. Thanks, ~Kevin
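With multi-gigabyte files, per-call overhead dominates: Antoine's figures later in the thread show one-unit-at-a-time reads running orders of magnitude slower than 4096-unit reads. Whichever release ships the faster IO, a common way to keep that overhead down is to read large binary chunks instead of single bytes or lines. The sketch below assumes a line-oriented file where the job is counting newline bytes. The function name `count_newlines` and the 1 MiB chunk size are illustrative choices, not anything proposed in the thread.

```python
import os
import tempfile


def count_newlines(path, chunk_size=1 << 20):
    """Count newline bytes in `path` by streaming fixed-size binary chunks.

    Binary mode skips decoding entirely, and large reads keep the number of
    Python-level read calls low. A final line without a trailing newline is
    not counted.
    """
    count = 0
    with open(path, "rb") as f:
        # iter(callable, sentinel) calls f.read(chunk_size) until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count


if __name__ == "__main__":
    # Tiny stand-in for a multi-gigabyte, line-oriented data file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b">seq1\nACGTACGT\n>seq2\nTTGACC\n")
    try:
        print(count_newlines(path, chunk_size=4))  # prints 4
    finally:
        os.remove(path)
```

A single-byte newline can never straddle two chunks, so the result does not depend on `chunk_size`. Counting multi-byte patterns this way would need overlap handling.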
On Tue, Jan 27, 2009 at 12:28 PM, "Martin v. Löwis"
At the moment, there are 4 release blockers for 3.0.1. I'd like to see 3.0.1 released soon (within the next month).
I agree. In December, there was a huge sense of urgency that we absolutely had to have a 3.0.1 before the end of last year, and now people are talking about giving up on 3.0 entirely.
Releasing 3.1 six months after 3.0 sounds reasonable; I don't think it should be released earlier (otherwise 3.0 looks fairly ridiculous).
It sounds like my approval of Raymond's removal of certain (admittedly obsolete) operators from the 3.0 branch was premature. Barry at least thinks those should be rolled back. Others? -- --Guido van Rossum (home page: http://www.python.org/~guido/)
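Guido doesn't list which operator functions Raymond removed. Assume they are the 2.x-era helpers such as `operator.isCallable`, `operator.sequenceIncludes` and the `isSequenceType`/`isMappingType`/`isNumberType` family, none of which exist in current Python 3. Migrating code would then swap in equivalents like the ones below. The helper names are hypothetical. `collections.abc` is the modern home of the ABCs; 3.0 and 3.1 exposed them directly in `collections`.

```python
import collections.abc
import numbers
import operator

# Hypothetical helper names. Each comment gives the old operator-module spelling
# (assumed to be among the removed functions) and its modern equivalent.


def sequence_includes(seq, item):
    # operator.sequenceIncludes(seq, item)  ->  operator.contains(seq, item)
    return operator.contains(seq, item)


def is_sequence(obj):
    # operator.isSequenceType(obj)  ->  isinstance(obj, collections.abc.Sequence)
    return isinstance(obj, collections.abc.Sequence)


def is_mapping(obj):
    # operator.isMappingType(obj)  ->  isinstance(obj, collections.abc.Mapping)
    return isinstance(obj, collections.abc.Mapping)


def is_number(obj):
    # operator.isNumberType(obj)  ->  isinstance(obj, numbers.Number)
    return isinstance(obj, numbers.Number)


def is_callable(obj):
    # operator.isCallable(obj)  ->  callable(obj)
    # (callable() was missing from 3.0 and 3.1 and came back in 3.2.)
    return callable(obj)


if __name__ == "__main__":
    for old in ("isCallable", "sequenceIncludes", "isSequenceType",
                "isMappingType", "isNumberType"):
        # Each of these prints False on current Python 3 releases.
        print(old, hasattr(operator, old))
```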
Releasing 3.1 six months after 3.0 sounds reasonable; I don't think it should be released earlier (otherwise 3.0 looks fairly ridiculous).
It sounds like my approval of Raymond's removal of certain (admittedly obsolete) operators from the 3.0 branch was premature. Barry at least thinks those should be rolled back. Others?
I agree that not too much harm is done by removing stuff in 3.0.1 that had erroneously been left in the 3.0 release, in particular if 3.0.1 gets released quickly (e.g. within two months of the original release). If that is an acceptable policy, then those changes would fall under the policy. If the policy is *not* acceptable, a lot of changes to 3.0.1 need to be rolled back (e.g. the ongoing removal of __cmp__ fragments). Regards, Martin
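Some context on the `__cmp__` cleanup Martin mentions: Python 3 no longer consults `__cmp__`, and `sorted()` dropped its `cmp=` argument, so classes need rich comparison methods instead. Below is a minimal sketch of the usual migration, using a hypothetical `Version` class. Note that `functools.total_ordering` and `functools.cmp_to_key` only arrived later (Python 3.2, also 2.7), so this reflects modern Python rather than what 3.0.1 itself offered.

```python
import functools


@functools.total_ordering
class Version:
    """Hypothetical example: a class a 2.x codebase might have ordered with __cmp__."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    # Python 3 ignores __cmp__; define rich comparisons instead.
    # total_ordering fills in __le__, __gt__ and __ge__ from these two.
    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()

    # Defining __eq__ sets __hash__ to None, so restore hashability explicitly.
    def __hash__(self):
        return hash(self._key())

    def __repr__(self):
        return f"Version({self.major}, {self.minor})"


def old_style_cmp(a, b):
    """A 2.x-style three-way comparison: negative, zero, or positive."""
    return (a > b) - (a < b)


releases = [Version(3, 1), Version(3, 0), Version(2, 6)]
# cmp_to_key adapts an old three-way cmp function for key-based sorting.
print(sorted(releases, key=functools.cmp_to_key(old_style_cmp)))
# [Version(2, 6), Version(3, 0), Version(3, 1)]
```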
From: "Martin v. Löwis"
Releasing 3.1 six months after 3.0 sounds reasonable; I don't think it should be released earlier (otherwise 3.0 looks fairly ridiculous).
I think it should be released earlier and completely supplant 3.0 before more third-party developers spend time migrating code. We needed 3.0 to get released so we could get the feedback necessary to shake it out. Now it is time for it to fade into history so that we can take advantage of the lessons learned. The principles for the 2.x series don't really apply here. In 2.x, there was always a useful, stable, clean release already fielded, and there were tons of third-party apps that needed a slow rate of change. In contrast, 3.0 has a near-zero installed user base (at least in terms of being used in production). It has very few migrated apps. It is not particularly clean, and some of the work for it was incomplete when it was released. My preference is to drop 3.0 entirely (no incompatible bugfix release) and in early February release 3.1 as the real 3.x that migrators ought to aim for and that won't have incompatible bugfix releases. Then at PyCon, we can have a real bug day and fix up any chips in the paint. If 3.1 goes out right away, then it doesn't matter if 3.0 looks ridiculous. All eyes go to the latest release. Better to get this done before more people download 3.0 to kick the tires. Raymond
Hello Kevin,
As an interested observer, but not yet user of the 3.x series, I was wondering about progress on restoring io performance and what release those improvements were slated for.
There is an SVN branch with a complete rewrite (in C) of the IO stack. You can find it in branches/io-c. Apart from a problem in _ssl.c, it should be quite usable. Your tests and observations are welcome! Regards Antoine.
On Tue, Jan 27, 2009 at 3:19 PM, Raymond Hettinger
If 3.1 goes out right away, then it doesn't matter if 3.0 looks ridiculous. All eyes go to the latest release. Better to get this done before more people download 3.0 to kick the tires.
It seems like we are arguing over the version number of basically the same thing. I would like to see 3.0.1 released in early February for nearly the same reasons you name. However, it seems to me that there are two kinds of issues: those like the __cmp__ removal and some silly IO bugs, which have been fixed for a while and are waiting to be released, and projects like IO in C, which are important but would not make the schedule you and I want for 3.0.1/3.1. It's because of those longer-term features that I want both a 3.0.1 and a 3.1. If we immediately released 3.1, when would those longer-term projects that are important for migration make it into a stable release? 3.2 is probably a while off. -- Regards, Benjamin
Benjamin Peterson
At the moment, there are 4 release blockers for 3.0.1. I'd like to see 3.0.1 released soon (within the next month). It would fix the biggest mistakes in the initial release, most of which have been committed since December. I'm sure it would be attractive enough with the nasty bugs fixed in it! Let's not completely open the floodgates.
Releasing 3.1 in March or April also sounds good. I will be at the sprints for at least the first day.
+1 on all Benjamin said. The IO-in-C branch cannot reasonably be pulled into release30-maint, but it will be ready for 3.1. Speaking of which, testers are welcome (the branch is in branches/io-c). Also, I need someone to update the Windows build files. Regards Antoine.
[Benjamin Peterson]
It seems like we are arguing over the version number of basically the same thing. I would like to see 3.0.1 released in early February for nearly the same reasons you name. However, it seems to me that there are two kinds of issues: those like the __cmp__ removal and some silly IO bugs, which have been fixed for a while and are waiting to be released, and projects like IO in C, which are important but would not make the schedule you and I want for 3.0.1/3.1.
What is involved in finishing io-in-c? ISTM, that is critical and that its absence is a serious barrier to adoption in a production environment. How far away is it? Raymond
My preference is to drop 3.0 entirely (no incompatible bugfix release) and in early February release 3.1 as the real 3.x that migrators ought to aim for and that won't have incompatible bugfix releases. Then at PyCon, we can have a real bug day and fix up any chips in the paint.
I would fear that 3.1 then gets the same fate as 3.0. In May, we will all think "what a piece of junk that 3.1 release was, let's put it to history", and replace it with 3.2. By then, users will wonder if there will ever be a 3.x release that is any good. Regards, Martin
On Tue, Jan 27, 2009 at 14:31, "Martin v. Löwis"
My preference is to drop 3.0 entirely (no incompatible bugfix release) and in early February release 3.1 as the real 3.x that migrators ought to aim for and that won't have incompatible bugfix releases. Then at PyCon, we can have a real bug day and fix up any chips in the paint.
I would fear that 3.1 then gets the same fate as 3.0. In May, we will all think "what a piece of junk that 3.1 release was, let's put it to history", and replace it with 3.2. By then, users will wonder if there will ever be a 3.x release that is any good.
That's my fear as well. I have no problem doing a quick 3.0.1 release any time between now and the end of February and start with the first alpha or beta of 3.1 at PyCon. -Brett
Antoine Pitrou
There is an SVN branch with a complete rewrite (in C) of the IO stack. You can find it in branches/io-c. Apart from a problem in _ssl.c, it should be quite usable. Your tests and observations are welcome!
And I'll look at that _ssl.c problem. Bill
Raymond Hettinger
What is involved in finishing io-in-c?
Off the top of my head:
- fix the _ssl bug which prevents some tests from passing (issue #4967)
- clean up io.py (and decide what to do with the remaining Python code: basically, the parts of StringIO which are implemented in Python)
- of course, test in various situations, review the code, suggest possible improvements...
Now here are some performance figures. Text I/O is done in utf-8 with universal newlines enabled:
=== I/O in C ===
** Binary input **
[ 400KB ] read one unit at a time... 1.64 MB/s
[ 400KB ] read 20 units at a time... 27.2 MB/s
[ 400KB ] read 4096 units at a time... 845 MB/s
[ 20KB ] read whole contents at once... 924 MB/s
[ 400KB ] read whole contents at once... 883 MB/s
[ 10MB ] read whole contents at once... 980 MB/s
[ 400KB ] seek forward one unit at a time... 0.528 MB/s
[ 400KB ] seek forward 1000 units at a time... 516 MB/s
[ 400KB ] alternate read & seek one unit... 1.33 MB/s
[ 400KB ] alternate read & seek 1000 units... 490 MB/s
** Text input **
[ 400KB ] read one unit at a time... 2.28 MB/s
[ 400KB ] read 20 units at a time... 29.2 MB/s
[ 400KB ] read one line at a time... 71.7 MB/s
[ 400KB ] read 4096 units at a time... 97.4 MB/s
[ 20KB ] read whole contents at once... 108 MB/s
[ 400KB ] read whole contents at once... 112 MB/s
[ 10MB ] read whole contents at once... 89.7 MB/s
[ 400KB ] seek forward one unit at a time... 0.0904 MB/s
[ 400KB ] seek forward 1000 units at a time... 87.4 MB/s
** Binary append **
[ 20KB ] write one unit at a time... 0.668 MB/s
[ 400KB ] write 20 units at a time... 12.2 MB/s
[ 400KB ] write 4096 units at a time... 722 MB/s
[ 10MB ] write 1e6 units at a time... 1529 MB/s
** Text append **
[ 20KB ] write one unit at a time... 0.983 MB/s
[ 400KB ] write 20 units at a time... 16 MB/s
[ 400KB ] write 4096 units at a time... 236 MB/s
[ 10MB ] write 1e6 units at a time... 261 MB/s
** Binary overwrite **
[ 20KB ] modify one unit at a time... 0.677 MB/s
[ 400KB ] modify 20 units at a time... 12.1 MB/s
[ 400KB ] modify 4096 units at a time... 382 MB/s
[ 400KB ] alternate write & seek one unit... 0.212 MB/s
[ 400KB ] alternate write & seek 1000 units... 173 MB/s
[ 400KB ] alternate read & write one unit... 0.827 MB/s
[ 400KB ] alternate read & write 1000 units... 276 MB/s
** Text overwrite **
[ 20KB ] modify one unit at a time... 0.296 MB/s
[ 400KB ] modify 20 units at a time... 5.69 MB/s
[ 400KB ] modify 4096 units at a time... 151 MB/s
=== I/O in Python (branches/py3k) ===
** Binary input **
[ 400KB ] read one unit at a time... 0.174 MB/s
[ 400KB ] read 20 units at a time... 3.44 MB/s
[ 400KB ] read 4096 units at a time... 246 MB/s
[ 20KB ] read whole contents at once... 443 MB/s
[ 400KB ] read whole contents at once... 216 MB/s
[ 10MB ] read whole contents at once... 274 MB/s
[ 400KB ] seek forward one unit at a time... 0.188 MB/s
[ 400KB ] seek forward 1000 units at a time... 182 MB/s
[ 400KB ] alternate read & seek one unit... 0.0821 MB/s
[ 400KB ] alternate read & seek 1000 units... 81.2 MB/s
** Text input **
[ 400KB ] read one unit at a time... 0.218 MB/s
[ 400KB ] read 20 units at a time... 3.8 MB/s
[ 400KB ] read one line at a time... 3.69 MB/s
[ 400KB ] read 4096 units at a time... 34.9 MB/s
[ 20KB ] read whole contents at once... 70.5 MB/s
[ 400KB ] read whole contents at once... 81 MB/s
[ 10MB ] read whole contents at once... 68.7 MB/s
[ 400KB ] seek forward one unit at a time... 0.0709 MB/s
[ 400KB ] seek forward 1000 units at a time... 67.3 MB/s
** Binary append **
[ 20KB ] write one unit at a time... 0.15 MB/s
[ 400KB ] write 20 units at a time... 2.88 MB/s
[ 400KB ] write 4096 units at a time... 346 MB/s
[ 10MB ] write 1e6 units at a time... 728 MB/s
** Text append **
[ 20KB ] write one unit at a time... 0.0814 MB/s
[ 400KB ] write 20 units at a time... 1.51 MB/s
[ 400KB ] write 4096 units at a time... 118 MB/s
[ 10MB ] write 1e6 units at a time... 218 MB/s
** Binary overwrite **
[ 20KB ] modify one unit at a time... 0.123 MB/s
[ 400KB ] modify 20 units at a time... 2.34 MB/s
[ 400KB ] modify 4096 units at a time... 213 MB/s
[ 400KB ] alternate write & seek one unit... 0.0816 MB/s
[ 400KB ] alternate write & seek 1000 units... 71.4 MB/s
[ 400KB ] alternate read & write one unit... 0.0448 MB/s
[ 400KB ] alternate read & write 1000 units... 41.1 MB/s
** Text overwrite **
[ 20KB ] modify one unit at a time... 0.0723 MB/s
[ 400KB ] modify 20 units at a time... 1.36 MB/s
[ 400KB ] modify 4096 units at a time... 88.3 MB/s
Regards
Antoine.
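The thread doesn't include the benchmark script itself. As a rough illustration of what a row such as "read 20 units at a time" measures, here is a hypothetical sketch that times binary reads of a fixed size. It is not Antoine's script. It reports MiB/s from wall-clock time, and its numbers will vary with OS caching and hardware.

```python
import os
import tempfile
import time


def read_throughput(path, chunk_size):
    """Read `path` in binary mode, `chunk_size` bytes per call; return MiB/s."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_size):  # b"" at EOF ends the loop
            pass
    elapsed = time.perf_counter() - start
    return size / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")


def make_test_file(num_bytes):
    """Create a temporary file of random bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(os.urandom(num_bytes))
    return path


if __name__ == "__main__":
    path = make_test_file(400 * 1024)  # same 400KB size as most rows above
    try:
        for n in (1, 20, 4096):
            mbps = read_throughput(path, n)
            print(f"[ 400KB ] read {n} units at a time... {mbps:.3g} MB/s")
    finally:
        os.remove(path)
```

Because the file was just written, it is almost certainly in the OS page cache. A figure like this mostly measures per-call overhead in the IO stack rather than disk speed, which is the part the C rewrite targets.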
Martin v. Löwis
The IO-in-C branch cannot reasonably be pulled into release30-maint, but it will be ready for 3.1.
Even if 3.1 is released in February?
No, unless we take some risks and rush it in. (Technically, it seems to work, but it's such a critical piece of code that it would be nice to let it rest a little.) Regards Antoine.
On Tue, Jan 27, 2009 at 4:44 PM, Antoine Pitrou
Raymond Hettinger writes:
What is involved in finishing io-in-c?
Off the top of my head: - fix the _ssl bug which prevents some tests from passing (issue #4967) - clean up io.py (and decide what to do with the remaining Python code: basically, the parts of StringIO which are implemented in Python) - of course, test in various situations, review the code, suggest possible improvements...
There are also several IO bugs, like #5006, that should be fixed before it becomes official.
Now here are some performance figures. Text I/O is done in utf-8 with universal newlines enabled:
-- Regards, Benjamin
On Tue, Jan 27, 2009 at 4:44 PM, Antoine Pitrou
Now here are some performance figures. Text I/O is done in utf-8 with universal newlines enabled:
Would it be much trouble to also compare performance with Python 2.6? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
Daniel Stutzbach
Would it be much trouble to also compare performance with Python 2.6?
Here are the results on trunk. Keep in mind that text IO, while it is still `open(filename, "r")`, does not mean the same thing (in 2.x it returns undecoded byte strings).
=== 2.7 I/O (trunk) ===
** Binary input **
[ 400KB ] read one unit at a time... 1.48 MB/s
[ 400KB ] read 20 units at a time... 29.2 MB/s
[ 400KB ] read 4096 units at a time... 1038 MB/s
[ 20KB ] read whole contents at once... 1145 MB/s
[ 400KB ] read whole contents at once... 891 MB/s
[ 10MB ] read whole contents at once... 966 MB/s
[ 400KB ] seek forward one unit at a time... 0.893 MB/s
[ 400KB ] seek forward 1000 units at a time... 568 MB/s
[ 400KB ] alternate read & seek one unit... 1.11 MB/s
[ 400KB ] alternate read & seek 1000 units... 563 MB/s
** Text input **
[ 400KB ] read one unit at a time... 1.41 MB/s
[ 400KB ] read 20 units at a time... 28.4 MB/s
[ 400KB ] read one line at a time... 207 MB/s
[ 400KB ] read 4096 units at a time... 1060 MB/s
[ 20KB ] read whole contents at once... 1196 MB/s
[ 400KB ] read whole contents at once... 841 MB/s
[ 10MB ] read whole contents at once... 966 MB/s
[ 400KB ] seek forward one unit at a time... 0.873 MB/s
[ 400KB ] seek forward 1000 units at a time... 589 MB/s
** Binary append **
[ 20KB ] write one unit at a time... 0.887 MB/s
[ 400KB ] write 20 units at a time... 15.8 MB/s
[ 400KB ] write 4096 units at a time... 1071 MB/s
[ 10MB ] write 1e6 units at a time... 1523 MB/s
** Text append **
[ 20KB ] write one unit at a time... 1.33 MB/s
[ 400KB ] write 20 units at a time... 22.9 MB/s
[ 400KB ] write 4096 units at a time... 1244 MB/s
[ 10MB ] write 1e6 units at a time... 1540 MB/s
** Binary overwrite **
[ 20KB ] modify one unit at a time... 0.867 MB/s
[ 400KB ] modify 20 units at a time... 15.3 MB/s
[ 400KB ] modify 4096 units at a time... 446 MB/s
[ 400KB ] alternate write & seek one unit... 0.237 MB/s
[ 400KB ] alternate write & seek 1000 units... 151 MB/s
[ 400KB ] alternate read & write one unit... 0.221 MB/s
[ 400KB ] alternate read & write 1000 units... 153 MB/s
** Text overwrite **
[ 20KB ] modify one unit at a time... 1.32 MB/s
[ 400KB ] modify 20 units at a time... 22.5 MB/s
[ 400KB ] modify 4096 units at a time... 509 MB/s
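Benjamin's caveat is the key to comparing these numbers. In 2.x, text-mode `open()` hands back undecoded byte strings. In 3.x, text mode decodes the data (UTF-8 in Antoine's runs) and translates newlines, which adds work on every read. Below is a minimal sketch of that difference in current Python 3. The file contents are made up for illustration.

```python
import os
import tempfile

# UTF-8 bytes containing a non-ASCII character and three different line endings.
raw = "caf\u00e9\r\nline two\rline three\n".encode("utf-8")

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

try:
    # Binary mode: bytes come back untouched, with no decoding and no newline translation.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode in 3.x: decode as UTF-8 and, with newline=None (the default),
    # translate "\r\n" and "\r" to "\n" (universal newlines).
    with open(path, "r", encoding="utf-8", newline=None) as f:
        as_text = f.read()
finally:
    os.remove(path)

print(repr(as_bytes))
print(repr(as_text))
```

The text result is shorter than the raw bytes because "é" decodes from two bytes to one character and each "\r\n" collapses to "\n".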
On Jan 27, 2009, at 5:36 PM, Brett Cannon wrote:
On Tue, Jan 27, 2009 at 14:31, "Martin v. Löwis" wrote:
My preference is to drop 3.0 entirely (no incompatible bugfix release) and in early February release 3.1 as the real 3.x that migrators ought to aim for and that won't have incompatible bugfix releases. Then at PyCon, we can have a real bug day and fix up any chips in the paint.
I would fear that 3.1 then gets the same fate as 3.0. In May, we will all think "what a piece of junk that 3.1 release was, let's put it to history", and replace it with 3.2. By then, users will wonder if there will ever be a 3.x release that is any good.
That's my fear as well. I have no problem doing a quick 3.0.1 release any time between now and the end of February and start with the first alpha or beta of 3.1 at PyCon.
+1 Barry
On Tue, Jan 27, 2009 at 14:44, Antoine Pitrou
Raymond Hettinger writes:
What is involved in finishing io-in-c?
Off the top of my head: - fix the _ssl bug which prevents some tests from passing (issue #4967) - clean up io.py (and decide what to do with the remaining Python code: basically, the parts of StringIO which are implemented in Python)
The other VMs might appreciate the code being available and used if _io is not available for import. If you need help on how to have the tests run twice, once on the Python code and again on the C code, you can look at test_heapq and test_warnings for approaches.
- of course, test in various situations, review the code, suggest possible improvements...
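Brett's suggestion of running the same tests against both implementations can be sketched with a mixin. The test methods live in a plain class, and one `TestCase` subclass per implementation plugs in the module under test. This assumes a CPython where the pure-Python version is importable as `_pyio`, a private module that later CPython releases ship alongside the C-backed `io`. The class names are hypothetical, and the pattern is not taken from test_heapq or test_warnings.

```python
import io
import unittest

try:
    import _pyio  # pure-Python io implementation (private module, assumed available)
except ImportError:
    _pyio = None


class BytesIOTests:
    """Shared tests; not a TestCase itself, so unittest won't collect it directly."""

    module = None  # subclasses set this to io (C-backed) or _pyio (pure Python)

    def test_write_then_read(self):
        buf = self.module.BytesIO()
        buf.write(b"hello ")
        buf.write(b"world")
        buf.seek(0)
        self.assertEqual(buf.read(), b"hello world")

    def test_partial_reads(self):
        buf = self.module.BytesIO(b"abcdef")
        self.assertEqual(buf.read(2), b"ab")
        self.assertEqual(buf.tell(), 2)
        self.assertEqual(buf.read(), b"cdef")
        self.assertEqual(buf.read(), b"")

    def test_text_wrapper_translates_newlines(self):
        raw = self.module.BytesIO(b"a\r\nb")
        text = self.module.TextIOWrapper(raw, encoding="utf-8")
        self.assertEqual(text.read(), "a\nb")


class CBytesIOTests(BytesIOTests, unittest.TestCase):
    module = io


@unittest.skipIf(_pyio is None, "_pyio not available")
class PyBytesIOTests(BytesIOTests, unittest.TestCase):
    module = _pyio


if __name__ == "__main__":
    unittest.main()
```

Keeping the test bodies in one mixin means a behavioral difference between the C and Python versions shows up as a failure in exactly one of the two subclasses.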
Now here are some performance figures. Text I/O is done in utf-8 with universal newlines enabled:
That is impressive! Congrats to you and (I think) Amaury for all the hard work you guys have put in. -Brett
On Jan 27, 2009, at 3:48 PM, Martin v. Löwis wrote:
Releasing 3.1 six months after 3.0 sounds reasonable; I don't think it should be released earlier (otherwise 3.0 looks fairly ridiculous).
It sounds like my approval of Raymond's removal of certain (admittedly obsolete) operators from the 3.0 branch was premature. Barry at least thinks those should be rolled back. Others?
I agree that not too much harm is done by removing stuff in 3.0.1 that erroneously had been left in the 3.0 release - in particular if 3.0.1 gets released quickly (e.g. within two months of the original release).
If that is an acceptable policy, then those changes would fall under the policy. If the policy is *not* acceptable, a lot of changes to 3.0.1 need to be rolled back (e.g. the ongoing removal of __cmp__ fragments)
I have no problem with removing things that were advertised and/or documented to be removed in 3.0 but accidentally were not. That seems like a reasonable policy to me. However, if we did not tell people that something was going to be removed, then I don't think we can really remove it in 3.0. Barry
[Martin]
I would fear that 3.1 then gets the same fate as 3.0. In May, we will all think "what piece of junk was that 3.1 release, let's put it to history", and replace it with 3.2. By then, users will wonder if there is ever a 3.x release that is any good.
I thought the gist of Guido's idea was to label 3.0.1 as 3.1 to emphasize the magnitude of differences from 3.0. That seemed like a good idea to me. But I'm happy no matter what you want to call it. The important thing is that the bugfixes go in and the half-started removals get finished. I would like the next release (whatever it is called) to include the IO speedups which will help remove a barrier to adoption for serious use. I do hope the next release goes out as soon as possible. I use 3.0 daily and my impression is that the current version needs to be replaced as soon as possible. If it gets called 3.1, the nice side effect for me is that my itertools updates get fielded a bit sooner. But that is a somewhat unimportant consideration. I really have no opinion on what the next release gets called. Raymond
If something gets left in 3.0.1 and then ripped-out in 3.1, I think we're doing more harm than good. Very little code has been ported to 3.0 so far. Once there is a base, all changes become more difficult. In the interests of our users, I vote for sooner than later. Also, 3.0 is a special case because it is IMO a broken release. AFAICT, it is not in any distro yet. Hopefully, no one will keep it around and it will vanish silently. Raymond ----- Original Message ----- I have no problem with removing things that were advertised and/or documented to be removed in 3.0 but accidentally were not. That seems like a reasonable policy to me. However, if we did not tell people that something was going to be removed, then I don't think we can really remove it in 3.0. Barry
Raymond Hettinger
Also, 3.0 is a special case because it is IMO a broken release. AFAICT, it is not in any distro yet.
I have access to an Ubuntu 8.10 box and:

$ apt-cache search python3.0
idle-python3.0 - An IDE for Python (v3.0) using Tkinter
libpython3.0 - Shared Python runtime library (version 3.0)
python3-all - Package depending on all supported Python runtime versions
python3-all-dbg - Package depending on all supported Python debugging packages
python3-all-dev - Package depending on all supported Python development packages
python3-dbg - Debug Build of the Python Interpreter (version 3.0)
python3.0 - An interactive high-level object-oriented language (version 3.0)
python3.0-dbg - Debug Build of the Python Interpreter (version 3.0)
python3.0-dev - Header files and a static library for Python (v3.0)
python3.0-doc - Documentation for the high-level object-oriented language Python (v3.0)
python3.0-examples - Examples for the Python language (v3.0)
python3.0-minimal - A minimal subset of the Python language (version 3.0)

But it's not installed by default.

Regards
Antoine.
On Tue, Jan 27, 2009 at 4:54 PM, Antoine Pitrou
Daniel Stutzbach writes:
Would it be much trouble to also compare performance with Python 2.6?
Here are the results on trunk.
Thanks, Antoine! To make comparison easier, I put together the results into a Google Spreadsheet: http://spreadsheets.google.com/pub?key=pbqSxQEo4UXwPlifXmvPHGQ
Keep in mind Text IO, while it's still `open(filename, "r")`, does not mean the same thing.
That's because in Python 3, the Text IO has to convert to Unicode, correct? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
Raymond Hettinger schrieb:
[Martin]
I would fear that 3.1 then gets the same fate as 3.0. In May, we will all think "what piece of junk was that 3.1 release, let's put it to history", and replace it with 3.2. By then, users will wonder if there is ever a 3.x release that is any good.
I thought the gist of Guido's idea was to label 3.0.1 as 3.1 to emphasize the magnitude of differences from 3.0. That seemed like a good idea to me. But I'm happy no matter what you want to call it. The important thing is that the bugfixes go in and the half-started removals get finished. I would like the next release (whatever it is called) to include the IO speedups which will help remove a barrier to adoption for serious use.
FWIW, I completely agree here.
I do hope the next release goes out as soon as possible. I use 3.0 daily and my impression is that the current version needs to be replaced as soon as possible.
That's important to note: I do not use Python 3.x productively in any way, other than trying to port a bit of a library every now and then, and I expect that many others here are in the same position. In these matters, we should give more weight to what *actual users* like Raymond think. It's a great thing that we actually got 3.0 out, and didn't stall somewhere along the way, but the next step is to make sure it gets accepted and used, and doesn't get abandoned for a long time because of policies that come from the 2.x branch but might not be healthy for 3.x. Georg
On Tue, Jan 27, 2009 at 5:04 PM, Barry Warsaw
I have no problem with removing things that were advertised and/or documented to be removed in 3.0 but accidentally were not. That seems like a reasonable policy to me. However, if we did not tell people that something was going to be removed, then I don't think we can really remove it in 3.0.
As others have said, this would technically include cmp() removal. In the 2.x docs, there are big warnings by the operator functions and a suggestion to use ABCs. We also already have a 2to3 fixer for the module. -- Regards, Benjamin
Daniel Stutzbach
Thanks, Antoine! To make comparison easier, I put together the results into a Google Spreadsheet: http://spreadsheets.google.com/pub?key=pbqSxQEo4UXwPlifXmvPHGQ
Thanks, that's much more readable indeed.
That's because in Python 3, the Text IO has to convert to Unicode, correct?
Yes, exactly. Regards Antoine.
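To make the distinction concrete: in Python 3 the mode flag decides the return type. Binary mode hands back the raw bytes, while text mode decodes them to str with the given encoding. A minimal sketch, assuming a throwaway temporary file whose contents are purely illustrative:

```python
import os
import tempfile

# Write a small file containing one non-ASCII character, encoded as UTF-8.
payload = "caf\u00e9\nline two\n"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload.encode("utf-8"))

# Binary mode: raw bytes, no decoding step at all.
with open(path, "rb") as f:
    raw = f.read()

# Text mode: the same bytes are decoded to str (this is the extra work).
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

os.remove(path)
print(type(raw).__name__, len(raw))    # bytes 15  ('\u00e9' takes two bytes)
print(type(text).__name__, len(text))  # str 14
```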
On Jan 27, 2009, at 6:21 PM, Raymond Hettinger wrote:
If something gets left in 3.0.1 and then ripped-out in 3.1, I think we're doing more harm than good. Very little code has been ported to 3.0 so far. Once there is a base, all changes become more difficult.
In the interests of our users, I vote for sooner than later.
Also, 3.0 is a special case because it is IMO a broken release. AFAICT, it is not in any distro yet. Hopefully, no one will keep it around and it will vanish silently.
I stand by my opinion about the right way to do this. I also think that a 3.1 release 6 months after 3.0 is perfectly fine and serves our users just as well. Barry
On Tue, Jan 27, 2009 at 5:44 PM, Antoine Pitrou
Daniel Stutzbach writes:
That's because in Python 3, the Text IO has to convert to Unicode, correct?
Yes, exactly.
What kind of input are you using for the Text tests? I'm kind of surprised that the conversion to Unicode results in such a dramatic slowdown, if you're feeding it plain text (characters 0x00 through 0x7f). -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
Daniel Stutzbach
What kind of input are you using for the Text tests? I'm kind of surprised that the conversion to Unicode results in such a dramatic slowdown, if you're feeding it plain text (characters 0x00 through 0x7f).
It's some arbitrary text composed of 95% ASCII characters and 5% non-ASCII. On this specific example, utf8 decodes at around 250 MB/s, latin1 at almost 1 GB/s (on the same machine on which I ran the benchmarks). You can find the test here: http://svn.python.org/view/sandbox/trunk/iobench/
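One way to reproduce this kind of decode measurement locally is sketched below. It is not the iobench code: the 95%/5% corpus is a hypothetical stand-in built from 19 ASCII letters plus one "\u00e9", and absolute numbers will depend on the machine and CPython version.

```python
import time

def make_sample(size_chars: int) -> str:
    # Hypothetical corpus: 19 ASCII letters followed by one non-ASCII char,
    # i.e. 95% ASCII / 5% non-ASCII, repeated to the requested length.
    unit = "abcdefghijklmnopqrs" + "\u00e9"
    return (unit * (size_chars // len(unit) + 1))[:size_chars]

def decode_rate(data: bytes, encoding: str, repeat: int = 20) -> float:
    """Return the best observed decode throughput in MB/s."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        data.decode(encoding)
        best = min(best, time.perf_counter() - start)
    return len(data) / max(best, 1e-9) / 1e6

text = make_sample(1_000_000)
for enc in ("utf-8", "latin-1"):
    encoded = text.encode(enc)
    print(f"{enc:8s} {decode_rate(encoded, enc):8.1f} MB/s")
```

Taking the best of several runs, rather than the mean, is a common way to reduce noise from other activity on the machine.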
On Tue, Jan 27, 2009 at 6:15 PM, Antoine Pitrou
It's some arbitrary text composed of 95% ASCII characters and 5% non-ASCII. On this specific example, utf8 decodes at around 250 MB/s, latin1 at almost 1 GB/s (on the same machine on which I ran the benchmarks).
For the "10MB whole contents at once" test, we then have (assuming the code does no pipelining of disk I/O with decoding):
10MB / 980MB/s to read from disk = 10 ms
10MB / 250MB/s to decode from utf8 = 40 ms
10MB / (10ms + 40ms) = 200 MB/s
In practice, your results show around 90 MB/s. That's at least vaguely in the same ballpark. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
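Daniel's estimate can be written as a small helper. This is a sketch under his stated assumption of no overlap between reading and decoding: stage times simply add, so the effective rate is the size divided by the summed per-stage times, and the size cancels out. The 980 and 250 MB/s inputs are the figures quoted in the thread.

```python
def serial_throughput(size_mb: float, *stage_rates_mb_s: float) -> float:
    """Effective MB/s when stages run back to back with no pipelining."""
    # Time spent in each stage is size / rate; with no overlap they add up.
    total_seconds = sum(size_mb / rate for rate in stage_rates_mb_s)
    return size_mb / total_seconds

rate = serial_throughput(10, 980, 250)  # raw read, then UTF-8 decode
print(f"{rate:.0f} MB/s")               # 199 MB/s
```

The result is 199 rather than 200 MB/s only because the mail rounds 10.2 ms down to 10 ms.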
Daniel Stutzbach
For the "10MB whole contents at once" test, we then have: (assuming the code does no pipelining of disk I/O with decoding)
10MB / 980MB/s to read from disk = 10 ms
10MB / 250MB/s to decode from utf8 = 40 ms
10MB / (10ms + 40ms) = 200 MB/s
In practice, your results show around 90 MB/s. That's at least vaguely in the same ballpark.
Yes, the remaining CPU time is spent in the IncrementalNewlineDecoder (which does universal newline translation). Antoine.
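For readers unfamiliar with it, io.IncrementalNewlineDecoder is the stage that performs universal newline translation on top of a byte decoder. A small sketch of its behaviour, wrapping a standard incremental UTF-8 decoder; the byte chunks are made up, and a "\r\n" pair is deliberately split across two chunks to show that a trailing "\r" is held back until the next chunk arrives:

```python
import codecs
import io

# Wrap an incremental UTF-8 decoder; translate=True maps \r and \r\n to \n.
utf8 = codecs.getincrementaldecoder("utf-8")()
decoder = io.IncrementalNewlineDecoder(utf8, translate=True)

chunks = [b"line one\r", b"\nline two\r", b"line three\n"]
pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b"", final=True))  # flush any pending state
text = "".join(pieces)

print(repr(pieces[0]))   # 'line one'  (the trailing \r is held back)
print(repr(text))        # 'line one\nline two\nline three\n'
print(decoder.newlines)  # a tuple naming every newline style seen so far
```

This per-chunk scanning and replacing is extra work on top of decoding, which is consistent with Antoine's account of where the remaining CPU time goes.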
Barry Warsaw wrote:
On Jan 27, 2009, at 6:21 PM, Raymond Hettinger wrote:
If something gets left in 3.0.1 and then ripped-out in 3.1, I think we're doing more harm than good. Very little code has been ported to 3.0 so far. One there is a base, all changes become more difficult.
In the interests of our users, I vote for sooner than later.
Also, 3.0 is a special case because it is IMO a broken release. AFAICT, it is not in any distro yet. Hopefully, no one will keep it around and it will vanish silently.
I stand by my opinion about the right way to do this. I also think that a 3.1 release 6 months after 3.0 is perfectly fine and serves our users just as well.
+1 regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
Benjamin Peterson wrote:
There are also several IO bugs that should be fixed before it becomes official like #5006.
I looked at this one, but I discovered another bug with f.tell(): it's now issue #5008. That issue is now closed, so I will look at #5006 again. See also #5016 (f.seekable() bug). Victor
On 27 Jan 2009, at 23:56, Barry Warsaw wrote:
Also, 3.0 is a special case because it is IMO a broken release. AFAICT, it is not in any distro yet. Hopefully, no one will keep it around and it will vanish silently.
I stand by my opinion about the right way to do this. I also think that a 3.1 release 6 months after 3.0 is perfectly fine and serves our users just as well.
I'm lurking here, as I usually have nothing to contribute, but here's my take on this: <user> I'm generally a Python 2.4 user, but have recently been able to tinker in 2.6. I hope to be using 2.6 as my main language within a year. I anticipate dropping all 2.4 projects within 5 years. We have not yet dropped 2.3. I didn't know 3.0 is considered a broken release, but teething troubles are to be expected. Knowing this, I would be reluctant to use 3.0.1, it sounds like too small a change. If you put a lot of things into a minor point release you risk setting expectations about future ones. From the 2.x series I expect 2.x.{y,y+1} to be seamless, but 2.{x,x+1} to be more performant, include new features and potentially break complex code. I personally would see a 3.1 with C based IO support as being more sensible than a 3.0.1 with lots of changes. I wouldn't worry about 3.x being seen as a dead duck, as you say it's not in wide use yet. We trust you guys, if there's been big fixes there should be a big version update. Broadcast what's been made better and it'll encourage us to try it. </user> Matt
[Matthew Wilkes]
I didn't know 3.0 is considered a broken release, but teething troubles are to be expected. Knowing this, I would be reluctant to use 3.0.1, it sounds like too small a change.
Not to worry. Many of the major language features are stable and many of the rough edges are quickly getting ironed-out. Over time, anything that's slow will get optimized and all will be well. What we're discussing are subtleties of major vs minor releases. When the tp_compare change goes in, will it affect third-party C extensions enough to warrant a 3.1 name instead of 3.0.1? Are users better served by removing operator.isSequenceType() in 3.0.1 while there are still few early adopters and few converted third-party modules or will we help them more by warning them in advance and waiting for 3.1. The nice thing about the IO speedups is that the API is already set and won't change. So, the speedup doesn't really affect whether the release gets named 3.0.1 or 3.1. The important part is that we get it out as soon as it's solid so that we don't preclude adoption by users who need fast IO. Raymond
Steve Holden wrote:
Barry Warsaw wrote: [...]
I stand by my opinion about the right way to do this. I also think that a 3.1 release 6 months after 3.0 is perfectly fine and serves our users just as well.
+1
I should have been more explicit. I think that stuff that was slated for removal in 3.0 should be removed as soon as possible, and a micro release is fine for that. ISTM that if we really cared about our users we would have got this right before we released 3.0. Since we clearly didn't, it behooves us to make sure that any 3.1 release isn't a repeat performance. There are changes that should clearly have been made before 3.0 saw the light of day, which are now being discussed for incorporation. If those changes were *supposed* to be made before 3.0 came out then they should be made as soon as possible. Waiting for a major release only encourages people to use them, and once they get used, further changes will be seen as introducing incompatibilities that we have promised would not occur. So it seems that the operator functions should stand not on the order of their going, but depart. While a quick 3.1 release might look like the best compromise for now, it cannot then be followed with a quick 3.2 release, and then we are in the territory Martin warned about. Quality is crucial after a poor initial release: we have to engender confidence in the user base that we are not dicking them around with ill-thought-out changes. So on balance I think it might be better to live with the known inadequacies of 3.0, making small changes for 3.0.1 and possibly ignoring the policy that says we don't remove features in point releases (since they apparently should have been taken out of 3.0 but weren't). But this is only going to work if the quality of 3.1 is considerably higher than 3.0, making it worth the wait. I think that both 3.0 and 2.6 were rushed releases. 2.6 showed it in the inclusion (later recognizable as somewhat ill-advised so late in the day) of multiprocessing; 3.0 shows it in the very fact that this discussion has become necessary. So we face an important turning point: is 3.1 going to be serious production quality or not? 
Given that we have just been presented with a fabulous resource that could help improve delivered quality (I am talking about snakebite.org, of course) we might be well-advised to use the 3.1 release as a demonstration of how much it is going to improve the quality of delivered releases. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
Raymond Hettinger wrote:
[Antoine Pitrou]
Now here are some performance figures. Text I/O is done in utf-8 with universal newlines enabled:
That's a substantial boost. How does it compare to Py2.x equivalents?
Comparison of three cases (including performance ratios):

                                              MB/s      MB/s      MB/s
                                              in C   in py3k    in 2.7    C/3k  2.7/3k
** Binary append **
10M  write 1e6 units at a time             1529.00   728.000  1523.000    2.10    2.09
20K  write one unit at a time                0.668     0.150     0.887    4.45    5.91
400K write 20 units at a time               12.200     2.880    15.800    4.24    5.49
400K write 4096 units at a time             722.00   346.000  1071.000    2.09    3.10
** Binary input **
10M  read whole contents at once            980.00   274.000   966.000    3.58    3.53
20K  read whole contents at once            924.00   443.000  1145.000    2.09    2.58
400K alternate read & seek 1000 units      490.000    81.200   563.000    6.03    6.93
400K alternate read & seek one unit          1.330     0.082      1.11   16.20   13.52
400K read 20 units at a time                27.200     3.440    29.200    7.91    8.49
400K read 4096 units at a time              845.00   246.000  1038.000    3.43    4.22
400K read one unit at a time                  1.64     0.174     1.480    9.43    8.51
400K read whole contents at once            883.00   216.000   891.000    4.09    4.13
400K seek forward 1000 units at a time      516.00   182.000   568.000    2.84    3.12
400K seek forward one unit at a time         0.528     0.188     0.893    2.81    4.75
** Binary overwrite **
20K  modify one unit at a time               0.677     0.123     0.867    5.50    7.05
400K alternate read & write 1000 units     276.000    41.100   153.000    6.72    3.72
400K alternate read & write one unit         0.827     0.045      0.22   18.46    4.93
400K alternate write & seek 1000 units     173.000    71.400   151.000    2.42    2.11
400K alternate write & seek one unit         0.212     0.082     0.237    2.60    2.90
400K modify 20 units at a time              12.100     2.340    15.300    5.17    6.54
400K modify 4096 units at a time            382.00   213.000   446.000    1.79    2.09
** Text append **
10M  write 1e6 units at a time              261.00   218.000  1540.000    1.20    7.06
20K  write one unit at a time                0.983     0.081      1.33   12.08   16.34
400K write 20 units at a time               16.000     1.510     22.90   10.60   15.17
400K write 4096 units at a time             236.00   118.000  1244.000    2.00   10.54
** Text input **
10M  read whole contents at once            89.700    68.700   966.000    1.31   14.06
20K  read whole contents at once           108.000    70.500  1196.000    1.53   16.96
400K read 20 units at a time                29.200     3.800    28.400    7.68    7.47
400K read 4096 units at a time              97.400    34.900  1060.000    2.79   30.37
400K read one line at a time                71.700     3.690    207.00   19.43   56.10
400K read one unit at a time                 2.280     0.218      1.41   10.46    6.47
400K read whole contents at once           112.000    81.000   841.000    1.38   10.38
400K seek forward 1000 units at a time      87.400    67.300   589.000    1.30    8.75
400K seek forward one unit at a time         0.090     0.071     0.873    1.28   12.31
** Text overwrite **
20K  modify one unit at a time               0.296     0.072     1.320    4.09   18.26
400K modify 20 units at a time               5.690     1.360    22.500    4.18   16.54
400K modify 4096 units at a time           151.000    88.300   509.000    1.71    5.76

--Scott David Daniels Scott.Daniels@Acm.Org
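The two ratio columns are each throughput divided by the py3k (pure-Python io) throughput, so a value above 1 means "faster than current py3k". A quick check against three rows copied from the table above:

```python
# A few rows copied from the table: (label, C, py3k, 2.7), all in MB/s.
rows = [
    ("Binary append, 10M write 1e6 units", 1529.00, 728.000, 1523.000),
    ("Text input, 400K read one line at a time", 71.700, 3.690, 207.00),
    ("Text input, 10M read whole contents at once", 89.700, 68.700, 966.000),
]

for label, c_io, py3k, py27 in rows:
    # Each ratio column is a throughput divided by the py3k throughput.
    print(f"{label:45s} C/3k={c_io / py3k:6.2f}  2.7/3k={py27 / py3k:6.2f}")
```

Comparing 2.7/3k with C/3k row by row shows how much of the 2.x gap the C implementation closes; in the text rows the remaining gap is largest.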
-On [20090128 00:21], Raymond Hettinger (python@rcn.com) wrote:
Also, 3.0 is a special case because it is IMO a broken release. AFAICT, it is not in any distro yet. Hopefully, no one will keep it around and it will vanish silently.
It has been in FreeBSD's ports since December. Fairly good chance it is in pkgsrc also by now. Might even be that it is part of FreeBSD's 7.1-RELEASE.
So I reckon by 'distro' you were speaking of Linux only?
--
Jeroen Ruigrok van der Werven
-On [20090128 00:57], Barry Warsaw (barry@python.org) wrote:
I stand by my opinion about the right way to do this. I also think that a 3.1 release 6 months after 3.0 is perfectly fine and serves our users just as well.
When API fixes were mentioned, does that mean changes in the API which influence C extensions? If so, then I think a minor number update (3.1) is more warranted than a revision number update (3.0.1).
--
Jeroen Ruigrok van der Werven
[Scott David Daniels]
Comparison of three cases (including performance ratios):

                                              MB/s      MB/s      MB/s
                                              in C   in py3k    in 2.7    C/3k  2.7/3k
** Text append **
10M  write 1e6 units at a time              261.00   218.000  1540.000    1.20    7.06
20K  write one unit at a time                0.983     0.081      1.33   12.08   16.34
400K write 20 units at a time               16.000     1.510     22.90   10.60   15.17
400K write 4096 units at a time             236.00   118.000  1244.000    2.00   10.54
Do you know why the text-appends fell off so much in the 1st and last cases?
** Text input **
10M  read whole contents at once            89.700    68.700   966.000    1.31   14.06
20K  read whole contents at once           108.000    70.500  1196.000    1.53   16.96
...
400K read one line at a time                71.700     3.690    207.00   19.43   56.10
...
400K read whole contents at once           112.000    81.000   841.000    1.38   10.38
400K seek forward 1000 units at a time      87.400    67.300   589.000    1.30    8.75
400K seek forward one unit at a time         0.090     0.071     0.873    1.28   12.31
Looks like most of these still have substantial falloffs in performance. Is this part still a work in progress or is this as good as it's going to get?
** Text overwrite **
20K  modify one unit at a time               0.296     0.072     1.320    4.09   18.26
400K modify 20 units at a time               5.690     1.360    22.500    4.18   16.54
400K modify 4096 units at a time           151.000    88.300   509.000    1.71    5.76
Same question on this batch. Raymond
Hello,
Raymond Hettinger
                                              MB/s      MB/s      MB/s
                                              in C   in py3k    in 2.7    C/3k  2.7/3k
** Text append **
10M  write 1e6 units at a time              261.00   218.000  1540.000    1.20    7.06
20K  write one unit at a time                0.983     0.081      1.33   12.08   16.34
400K write 20 units at a time               16.000     1.510     22.90   10.60   15.17
400K write 4096 units at a time             236.00   118.000  1244.000    2.00   10.54
Do you know why the text-appends fell off so much in the 1st and last cases?
When writing large chunks of text (4096, 1e6), bookkeeping costs become marginal and encoding costs dominate. 2.x has no encoding costs, which explains why it's so much faster. A quick test tells me utf-8 encoding runs at 280 MB/s on this dataset (the 400KB text file). You see that there is not much left to optimize on large writes.
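A way to see why the encoder sets the ceiling: once per-call overhead is amortised, a large text write amounts to an encode followed by a binary write. The sketch below uses in-memory buffers and an arbitrary sample string to show that writing str through io.TextIOWrapper yields exactly the bytes of encoding by hand. It illustrates the data flow only and measures nothing.

```python
import io

text = "na\u00efve caf\u00e9\n" * 3

# Text layer: TextIOWrapper encodes str to bytes before handing it to the
# binary buffer. newline="\n" disables newline translation on write, so the
# output does not depend on the platform's os.linesep.
raw = io.BytesIO()
wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
wrapper.write(text)
wrapper.flush()

# Equivalent manual path: encode once, then write the bytes.
manual = io.BytesIO()
manual.write(text.encode("utf-8"))

print(raw.getvalue() == manual.getvalue())  # True
```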
** Text input **
10M  read whole contents at once            89.700    68.700   966.000    1.31   14.06
20K  read whole contents at once           108.000    70.500  1196.000    1.53   16.96
...
400K read one line at a time                71.700     3.690    207.00   19.43   56.10
...
400K read whole contents at once           112.000    81.000   841.000    1.38   10.38
400K seek forward 1000 units at a time      87.400    67.300   589.000    1.30    8.75
400K seek forward one unit at a time         0.090     0.071     0.873    1.28   12.31
Looks like most of these still have substantial falloffs in performance. Is this part still a work in progress or is this as good as it's going to get?
There is nothing obvious left to optimize in the read() department. Decoding and newline translation costs dominate. Decoding has already been optimized for the most popular encodings in py3k: http://mail.python.org/pipermail/python-checkins/2009-January/077024.html Newline translation follows a fast path depending on various heuristics. I also took particular care of the "read one line at a time" scenario because it's the most likely idiom when reading a text file. I think there is hardly anything left to optimize on this one. Your eyes are welcome, though. Note that the benchmark is run with the following default settings for text I/O: utf-8 encoding, universal newlines enabled, text containing only "\n" newlines. You can play with settings here: http://svn.python.org/view/sandbox/trunk/iobench/ Text seek() and tell(), on the other hand, are known to be slow, and could perhaps be improved. It is assumed, however, that they won't be used a lot for text files.
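For a rough local comparison in the spirit of those settings (utf-8, universal newlines, "\n"-only text), the "read whole contents at once" case can be timed in binary and text mode with the standard library alone. This is a hypothetical sketch, not the iobench tool: the file contents are made up and the numbers are only indicative of your own machine.

```python
import os
import tempfile
import time

def mb_per_s(nbytes: int, seconds: float) -> float:
    return nbytes / max(seconds, 1e-9) / 1e6

def time_whole_read(path: str, mode: str, **kwargs) -> float:
    """Time one 'read whole contents at once' pass; returns seconds."""
    start = time.perf_counter()
    with open(path, mode, **kwargs) as f:
        f.read()
    return time.perf_counter() - start

# Hypothetical test file: mostly ASCII text with "\n" newlines only
# (made-up data, not the file iobench uses).
sample = ("plain ascii line with an accent: caf\u00e9\n" * 20000).encode("utf-8")
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(sample)

try:
    binary = min(time_whole_read(path, "rb") for _ in range(5))
    # newline=None is the default for text mode: universal newlines on input.
    text = min(time_whole_read(path, "r", encoding="utf-8", newline=None)
               for _ in range(5))
    print(f"binary: {mb_per_s(len(sample), binary):8.1f} MB/s")
    print(f"text:   {mb_per_s(len(sample), text):8.1f} MB/s")
finally:
    os.remove(path)
```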
** Text overwrite **
20K  modify one unit at a time               0.296     0.072     1.320    4.09   18.26
400K modify 20 units at a time               5.690     1.360    22.500    4.18   16.54
400K modify 4096 units at a time           151.000    88.300   509.000    1.71    5.76
Same question on this batch.
There seems to be some additional overhead in this case. Perhaps it could be improved, I'll have to take a look... But I doubt overwriting chunks of text is a common scenario. Regards Antoine.
2009/1/28 Antoine Pitrou
When writing large chunks of text (4096, 1e6), bookkeeping costs become marginal and encoding costs dominate. 2.x has no encoding costs, which explains why it's so much faster.
Interesting. However, it's still "slower" in terms of perception. In 2.x, I regularly do the equivalent of f = open("filename", "r") ... read strings from f ... Yes, I know this is byte I/O in reality, but for everything I do (Latin-1 on input and output, and for most practical purposes ASCII-only) it simply isn't relevant to me. If Python 3.x makes this substantially slower (working in a naive mode where I ignore encoding issues), claiming it's "encoding costs" doesn't make any difference - in a practical sense, I don't get any benefits and yet I pay the cost. (You can say my approach is wrong, but so what? I'll just say that 2.x is faster for me, and not migrate. Ultimately, this is about "marketing" 3.x...) It would be helpful to limit this cost as much as possible - maybe that's simply ensuring that the default encoding for open is (in the majority of cases) a highly-optimised one whose costs *don't* dominate in the way you describe (although if you're using UTF-8, I'd guess that would be the usual default on Linux, so it looks like there's some work needed there). Hmm, I just checked and on Windows, it appears that sys.getdefaultencoding() is UTF-8. That seems odd - I would have thought the majority of Windows systems were NOT set to use UTF-8 by default... Paul.
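For the "naive" Latin-1 workflow Paul describes, one hedged option in 3.x is to name the encoding explicitly. Latin-1 maps each byte 0-255 to the code point with the same value, so decoding never fails and round-trips exactly. Also worth noting: in Python 3, open() takes its default encoding from the locale (locale.getpreferredencoding(False)), not from sys.getdefaultencoding(), which is the default for str.encode() and bytes.decode(). The exact defaults vary by platform and Python version.

```python
import locale
import sys

# Every possible byte value decodes under Latin-1 and encodes back unchanged.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
assert as_text.encode("latin-1") == all_bytes
assert all(ord(ch) == b for ch, b in zip(as_text, all_bytes))

# These answer different questions; open() consults the locale setting.
print("str/bytes default:", sys.getdefaultencoding())
print("open() default:   ", locale.getpreferredencoding(False))

# Hypothetical usage for the naive case (the file name is illustrative):
# with open("filename", "r", encoding="latin-1") as f:
#     data = f.read()
```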
On Wednesday 28 January 2009 11:55:16, Antoine Pitrou wrote:
2.x has no encoding costs, which explains why it's so much faster.
Why not test io.open() or codecs.open(), which create unicode strings? -- Victor Stinner aka haypo http://www.haypocalc.com/blog/
Paul Moore
It would be helpful to limit this cost as much as possible - maybe that's simply ensuring that the default encoding for open is (in the majority of cases) a highly-optimised one whose costs *don't* dominate in the way you describe
As I pointed out, utf-8, utf-16 and latin1 decoders have already been optimized in py3k. For *pure ASCII* input, utf-8 decoding is blazingly fast (1GB/s here). The dataset for iobench isn't pure ASCII though, and that's why it's not as fast. People are invited to test their own workloads with the io-c branch and report performance figures (and possible bugs). There are so many possibilities that the benchmark figures given by a generic tool can only be indicative. Regards Antoine.
Victor Stinner
On Wednesday 28 January 2009 11:55:16, Antoine Pitrou wrote:
2.x has no encoding costs, which explains why it's so much faster.
Why not test io.open() or codecs.open(), which create unicode strings?
The goal is to test the idiomatic way of opening text files (the "one obvious way to do it", if you want). There is no doubt that io.open() and codecs.open() in 2.x are much slower than the io-c branch. However, nobody is expecting very good performance from io.open() and codecs.open() in 2.x either. Regards Antoine.
2009/1/28 Antoine Pitrou
Paul Moore writes:
It would be helpful to limit this cost as much as possible - maybe that's simply ensuring that the default encoding for open is (in the majority of cases) a highly-optimised one whose costs *don't* dominate in the way you describe
As I pointed out, utf-8, utf-16 and latin1 decoders have already been optimized in py3k. For *pure ASCII* input, utf-8 decoding is blazingly fast (1GB/s here). The dataset for iobench isn't pure ASCII though, and that's why it's not as fast.
Ah, thanks. Although you said your data was 95% ASCII, and you're getting decode speeds of 250MB/s. That's a 75% slowdown for 5% of the data! Surely that's not right???
People are invited to test their own workloads with the io-c branch and report performance figures (and possible bugs). There are so many possibilities that the benchmark figures given by a generic tool can only be indicative.
At the moment, I don't have the time to download and build the branch, and in any case as I only have Visual Studio Express, I don't get the PGO optimisations, making any tests I do highly suspect. Paul. PS Can anyone comment on why Python defaults to utf-8 on Windows? That seems like a highly suspect default...
On Wednesday 28 January 2009 12:41:07, Antoine Pitrou wrote:
Why not test io.open() or codecs.open(), which create unicode strings?
There is no doubt that io.open() and codecs.open() in 2.x are much slower than the io-c branch. However, nobody is expecting very good performance from io.open() and codecs.open() in 2.x either.
I use codecs.open() in my programs and so I'm interested in the benchmark on this function ;-) But if I understand correctly, Python (3.1 ?) will be faster (or much faster) to read/write files in unicode, and that's great news ;-) -- Victor Stinner aka haypo http://www.haypocalc.com/blog/
On Wed, Jan 28, 2009 at 4:32 AM, Steve Holden
I think that both 3.0 and 2.6 were rushed releases. 2.6 showed it in the inclusion (later recognizable as somewhat ill-advised so late in the day) of multiprocessing; 3.0 shows it in the very fact that this discussion has become necessary.
What about some kind of mechanism to "triage" 3rd party modules? Something like: module gains popularity -> the core team decides it's worthy -> the module is included in the library in some kind of "contrib"/"ext" package (like the future mechanism) and for one major release stays in that package (so developers don't have to rush fixing _all_ the bugs they can while making a major release) -> after (at least) one major release the module moves up one level and it's considered stable and rock solid. Meanwhile the documentation must say that the 3rd party contributed module is not considered production ready, though usable, until the release current + 1. I don't know if it's feasible, if it's insane or what, it's just an idea I had. -- Lawrence, http://oluyede.org - http://twitter.com/lawrenceoluyede "It is difficult to get a man to understand something when his salary depends on not understanding it" - Upton Sinclair
Paul Moore
As I pointed out, utf-8, utf-16 and latin1 decoders have already been optimized in py3k. For *pure ASCII* input, utf-8 decoding is blazingly fast (1GB/s here). The dataset for iobench isn't pure ASCII though, and that's why it's not as fast.
Ah, thanks. Although you said your data was 95% ASCII, and you're getting decode speeds of 250MB/s. That's a 75% slowdown for 5% of the data! Surely that's not right???
If you look at how utf-8 decoding is implemented (in unicodeobject.c), it's quite obvious why it is so :-) There is a (very) fast path for chunks of pure ASCII data, and (fast but not blazingly fast) fallback for non ASCII data. Please don't think of it as a slowdown... It's still much faster than 2.x, which manages 130MB/s on the same data. Regards Antoine.
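The general shape of such an ASCII fast path can be sketched in pure Python. This is a hypothetical illustration of the technique, not a transcription of the C code in unicodeobject.c; the function name and chunk size are made up. It needs Python 3.7+ for bytes.isascii().

```python
import codecs

def decode_utf8_fast_path(data: bytes, chunk_size: int = 4096) -> str:
    """Decode UTF-8, taking a cheap path for chunks that are pure ASCII."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        pending, _ = decoder.getstate()  # bytes buffered from a split sequence
        if not pending and chunk.isascii():
            # Fast path: every ASCII byte is already its own code point.
            parts.append(chunk.decode("ascii"))
        else:
            # Slow path: full UTF-8 decoding, which also completes any
            # multi-byte sequence split across the previous chunk boundary.
            parts.append(decoder.decode(chunk))
    parts.append(decoder.decode(b"", final=True))  # raises on truncated input
    return "".join(parts)
```

A single non-ASCII byte forces its whole chunk onto the general path, which is one reason a text that is "only" 5% non-ASCII can decode much more slowly than pure ASCII.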
2009/1/28 Antoine Pitrou
If you look at how utf-8 decoding is implemented (in unicodeobject.c), it's quite obvious why it is so :-) There is a (very) fast path for chunks of pure ASCII data, and (fast but not blazingly fast) fallback for non ASCII data.
Thanks for the explanation.
Please don't think of it as a slowdown... It's still much faster than 2.x, which manages 130MB/s on the same data.
Don't get me wrong - I'm hugely grateful for this work. And personally, I don't expect that I/O speed is ever likely to be a real bottleneck in the type of program I write. But I'm concerned that (much as with the whole "Python 3.0 is incompatible, and it will be hard to port to" meme) people will pick up on raw benchmark figures - no matter how much they aren't comparing like with like - and start making it sound like "Python 3.0 I/O is slower than 2.x" - which is a great disservice to the good work that's been done. I do think it's worth taking care over the default encoding, though. Quite apart from performance, getting "correct" behaviour is important. I can't speak for Unix, but on Windows, the following behaviour feels like a bug to me:
>echo a£b >a1
>python
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print open("a1").read()
a£b
>>> ^Z
>\Apps\Python30\python.exe
Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print(open("a1").read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Apps\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "D:\Apps\Python30\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0153' in position 1: character maps to <undefined>
>>> ^Z
>chcp
Active code page: 850
Paul.
Le mercredi 28 janvier 2009 à 16:54 +0000, Paul Moore a écrit :
I do think it's worth taking care over the default encoding, though. Quite apart from performance, getting "correct" behaviour is important. I can't speak for Unix, but on Windows, the following behaviour feels like a bug to me: [...]
Please open a bug :) cheers Antoine.
On 2009-01-27 22:19, Raymond Hettinger wrote:
From: "Martin v. Löwis"
Releasing 3.1 6 months after 3.0 sounds reasonable; I don't think it should be released earlier (else 3.0 looks fairly ridiculous).
I think it should be released earlier and completely supplant 3.0 before more third-party developers spend time migrating code. We needed 3.0 to get released so we could get the feedback necessary to shake it out. Now, it is time for it to fade into history and take advantage of the lessons learned.
The principles for the 2.x series don't really apply here. In 2.x, there was always a useful, stable, clean release already fielded and there were tons of third-party apps that needed a slow rate of change.
In contrast, 3.0 has a near zero installed user base (at least in terms of being used in production). It has very few migrated apps. It is not particularly clean and some of the work for it was incomplete when it was released.
My preference is to drop 3.0 entirely (no incompatible bugfix release) and in early February release 3.1 as the real 3.x that migrators ought to aim for and that won't have incompatible bugfix releases. Then at PyCon, we can have a real bug day and fix-up any chips in the paint.
If 3.1 goes out right away, then it doesn't matter if 3.0 looks ridiculous. All eyes go to the latest release. Better to get this done before more people download 3.0 to kick the tires.
Why don't we just mark 3.0.x as experimental branch and keep updating/ fixing things that were not sorted out for the 3.0.0 release ?! I think that's a fair approach, given that the only way to get field testing for new open-source software is to release early and often. A 3.1 release should then be the first stable release of the 3.x series and mark the start of the usual deprecation mechanisms we have in the 2.x series. Needless to say, rushing 3.1 out now would only cause yet another experimental release... major releases do take time to stabilize. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 28 2009)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
M.-A. Lemburg wrote:
On 2009-01-27 22:19, Raymond Hettinger wrote:
From: "Martin v. Löwis"
Releasing 3.1 6 months after 3.0 sounds reasonable; I don't think it should be released earlier (else 3.0 looks fairly ridiculous).
I think it should be released earlier and completely supplant 3.0 before more third-party developers spend time migrating code. We needed 3.0 to get released so we could get the feedback necessary to shake it out. Now, it is time for it to fade into history and take advantage of the lessons learned.
The principles for the 2.x series don't really apply here. In 2.x, there was always a useful, stable, clean release already fielded and there were tons of third-party apps that needed a slow rate of change.
In contrast, 3.0 has a near zero installed user base (at least in terms of being used in production). It has very few migrated apps. It is not particularly clean and some of the work for it was incomplete when it was released.
My preference is to drop 3.0 entirely (no incompatible bugfix release) and in early February release 3.1 as the real 3.x that migrators ought to aim for and that won't have incompatible bugfix releases. Then at PyCon, we can have a real bug day and fix-up any chips in the paint.
If 3.1 goes out right away, then it doesn't matter if 3.0 looks ridiculous. All eyes go to the latest release. Better to get this done before more people download 3.0 to kick the tires.
Why don't we just mark 3.0.x as experimental branch and keep updating/ fixing things that were not sorted out for the 3.0.0 release ?! I think that's a fair approach, given that the only way to get field testing for new open-source software is to release early and often.
A 3.1 release should then be the first stable release of the 3.x series and mark the start of the usual deprecation mechanisms we have in the 2.x series. Needless to say, rushing 3.1 out now would only cause yet another experimental release... major releases do take time to stabilize.
+1 I don't think we do users any favours by being cautious in removing / fixing things in the 3.0 releases. Michael Foord -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
Paul Moore wrote:
Hmm, I just checked and on Windows, it appears that sys.getdefaultencoding() is UTF-8. That seems odd - I would have thought the majority of Windows systems were NOT set to use UTF-8 by default...
In Python 3, sys.getdefaultencoding() is "utf-8" on all platforms, just as it was "ascii" in 2.x, on all platforms. The default encoding isn't used for I/O; check f.encoding to find out what encoding is used to read the file you are reading. Regards, Martin
>>> print(open("a1").read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Apps\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "D:\Apps\Python30\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0153' in position 1: character maps to <undefined>
Looks right to me. Martin
2009/1/28 "Martin v. Löwis"
Paul Moore wrote:
Hmm, I just checked and on Windows, it appears that sys.getdefaultencoding() is UTF-8. That seems odd - I would have thought the majority of Windows systems were NOT set to use UTF-8 by default...
In Python 3, sys.getdefaultencoding() is "utf-8" on all platforms, just as it was "ascii" in 2.x, on all platforms. The default encoding isn't used for I/O; check f.encoding to find out what encoding is used to read the file you are reading.
Thanks for the explanation. It might be clearer to document this a little more explicitly in the docs for open() (on the basis that people using open() are the most likely to be naive about encodings). I'll see if I can come up with an appropriate doc patch. Paul.
2009/1/28 "Martin v. Löwis"
>>> print(open("a1").read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Apps\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "D:\Apps\Python30\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0153' in position 1: character maps to <undefined>
Looks right to me.
I don't see why. I wrote the file from the console (cp850), read it in Python using the default encoding (which I would expect to match the console encoding), wrote it to sys.stdout (which I would expect to use the console encoding). How did the character end up not being encodable, when I've only used one encoding throughout? (And if my assumptions about the encodings used are wrong at some point, that's what I'm suggesting is the error). Paul.
Thanks for the explanation. It might be clearer to document this a little more explicitly in the docs for open() (on the basis that people using open() are the most likely to be naive about encodings). I'll see if I can come up with an appropriate doc patch.
Notice that the determination of the specific encoding used is fairly elaborate:
- if IO is to a terminal, Python tries to determine the encoding of the terminal. This is mostly relevant for Windows (which uses, by default, the "OEM code page" in the terminal).
- if IO is to a file, Python tries to guess the "common" encoding for the system. On Unix, it queries the locale, and falls back to "ascii" if no locale is set. On Windows, it uses the "ANSI code page". On OSX, it uses the "system encoding".
- if IO is binary, (clearly) no encoding is used. Network IO is always binary.
- for file names, yet different algorithms apply. On Windows, it uses the Unicode API, so no need for an encoding. On Unix, it (again) uses the locale encoding. On OSX, it uses UTF-8 (just to be clear: this applies to the first argument of open(), not to the resulting file object).
Regards, Martin
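A quick way to see which of these rules applies on a given machine is to print the relevant values side by side. This is a sketch for current Python 3 releases, whose defaults have shifted since 3.0 (for example, UTF-8 mode can override the locale-derived file encoding); the helper name `report_encodings` is hypothetical. Note that `sys.getdefaultencoding()` is not what `open()` uses.

```python
import locale
import os
import sys
import tempfile


def report_encodings(path):
    """Collect the encodings involved when a text file is opened without encoding=."""
    with open(path) as f:  # no encoding= given, so the platform default applies
        file_encoding = f.encoding
    return {
        # Default for str/bytes conversions; not used for file I/O.
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # What open() actually picked for this file.
        "open(path).encoding": file_encoding,
        # The locale-derived guess that open() normally uses as its default.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        # Terminal encoding (None if stdout was replaced by an object without one).
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # Used for file *names*, i.e. the first argument of open().
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
    }


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a1")
    with open(path, "wb") as f:
        f.write(b"abc\n")
    for name, value in report_encodings(path).items():
        print(f"{name:38} {value}")
```

On a Windows console like the one in this thread, one would expect the file and console entries to differ (ANSI versus OEM code page), which is exactly the mismatch being discussed.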
On Wed, Jan 28, 2009 at 10:29 AM, "Martin v. Löwis"
Notice that the determination of the specific encoding used is fairly elaborate:
- if IO is to a terminal, Python tries to determine the encoding of the terminal. This is mostly relevant for Windows (which uses, by default, the "OEM code page" in the terminal).
- if IO is to a file, Python tries to guess the "common" encoding for the system. On Unix, it queries the locale, and falls back to "ascii" if no locale is set. On Windows, it uses the "ANSI code page". On OSX, it uses the "system encoding".
- if IO is binary, (clearly) no encoding is used. Network IO is always binary.
- for file names, yet different algorithms apply. On Windows, it uses the Unicode API, so no need for an encoding. On Unix, it (again) uses the locale encoding. On OSX, it uses UTF-8 (just to be clear: this applies to the first argument of open(), not to the resulting file object).
This is a very helpful explanation. Is it in the docs somewhere, or if it isn't, could it be? Steve -- I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a tiny blip on the distant coast of sanity. --- Bucky Katt, Get Fuzzy
Paul Moore wrote:
2009/1/28 "Martin v. Löwis":
>>> print(open("a1").read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Apps\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "D:\Apps\Python30\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0153' in position 1: character maps to <undefined>
Looks right to me.
I don't see why. I wrote the file from the console (cp850), read it in Python using the default encoding (which I would expect to match the console encoding), wrote it to sys.stdout (which I would expect to use the console encoding).
How did the character end up not being encodable, when I've only used one encoding throughout? (And if my assumptions about the encodings used are wrong at some point, that's what I'm suggesting is the error).
Well, first try to understand what the error *is*:
py> unicodedata.name('\u0153')
'LATIN SMALL LIGATURE OE'
py> unicodedata.name('£')
'POUND SIGN'
py> ascii('£')
"'\\xa3'"
py> ascii('£'.encode('cp850').decode('cp1252'))
"'\\u0153'"
So when Python reads the file, it uses cp1252. This is sensible - the console using cp850 doesn't change the fact that the "common" encoding of files on your system is cp1252. It is an unfortunate fact of Windows that the console window uses a different encoding from the rest of the system (namely, the console uses the OEM code page, and everything else uses the ANSI code page). Furthermore, U+0153 does not exist in cp850 (i.e. the terminal doesn't support œ), hence the exception.
Regards, Martin
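Martin's diagnosis can be replayed on any platform by naming the code pages explicitly instead of relying on Windows defaults. The sketch below assumes, as in Paul's session, that the console wrote the file in cp850 and that Python read it back as cp1252; decoding with the codec the bytes were actually written in (or passing `encoding=` to `open()`) avoids the mismatch.

```python
import unicodedata

# What `echo a£b >a1` put on disk: the console (OEM) code page, cp850.
raw = "a£b".encode("cp850")

# What Python 3.0 read back: the system's file default, the ANSI code page cp1252.
text = raw.decode("cp1252")
print(ascii(text), unicodedata.name(text[1]))

# Writing that result to a cp850 console then fails: cp850 has no 'œ'.
try:
    text.encode("cp850")
except UnicodeEncodeError as exc:
    print("cannot encode", ascii(exc.object[exc.start]), "at position", exc.start)

# Decoding with the encoding the file was actually written in round-trips cleanly.
print(raw.decode("cp850") == "a£b")
```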
2009/1/28 "Martin v. Löwis"
Well, first try to understand what the error *is*:
py> unicodedata.name('\u0153')
'LATIN SMALL LIGATURE OE'
py> unicodedata.name('£')
'POUND SIGN'
py> ascii('£')
"'\\xa3'"
py> ascii('£'.encode('cp850').decode('cp1252'))
"'\\u0153'"
So when Python reads the file, it uses cp1252. This is sensible - the console using cp850 doesn't change the fact that the "common" encoding of files on your system is cp1252. It is an unfortunate fact of Windows that the console window uses a different encoding from the rest of the system (namely, the console uses the OEM code page, and everything else uses the ANSI code page).
Ah, I see. That is entirely obvious. The key bit of information is that the default io encoding is cp1252, not cp850. I know that in theory, I see the consequences often enough (:-)), but it isn't "instinctive" for me. And the simple "default encoding is system dependent" comment is not very helpful in terms of warning me that there could be an issue. I do think that more wording around encoding defaults would be useful - as I said, I'll think about how best it could be made into a doc patch. I suspect the best approach would be to have a section (maybe in the docs for the codecs module) explaining all the details, and then a cross-reference to that from the various places (open, io) where default encodings are mentioned. Paul.
Furthermore, U+0153 does not exist in cp850 (i.e. the terminal doesn't support œ), hence the exception.
Regards, Martin
Steven Bethard wrote:
On Wed, Jan 28, 2009 at 10:29 AM, "Martin v. Löwis" wrote:
Notice that the determination of the specific encoding used is fairly elaborate:
- if IO is to a terminal, Python tries to determine the encoding of the terminal. This is mostly relevant for Windows (which uses, by default, the "OEM code page" in the terminal).
- if IO is to a file, Python tries to guess the "common" encoding for the system. On Unix, it queries the locale, and falls back to "ascii" if no locale is set. On Windows, it uses the "ANSI code page". On OSX, it uses the "system encoding".
- if IO is binary, (clearly) no encoding is used. Network IO is always binary.
- for file names, yet different algorithms apply. On Windows, it uses the Unicode API, so no need for an encoding. On Unix, it (again) uses the locale encoding. On OSX, it uses UTF-8 (just to be clear: this applies to the first argument of open(), not to the resulting file object).
This is a very helpful explanation. Is it in the docs somewhere, or if it isn't, could it be?
Here is the current entry on encodings in the Lib ref, built-in types, file objects.
file.encoding
The encoding that this file uses. When strings are written to a file, they will be converted to byte strings using this encoding. In addition, when the file is connected to a terminal, the attribute gives the encoding that the terminal is likely to use (that information might be incorrect if the user has misconfigured the terminal). The attribute is read-only and may not be present on all file-like objects. It may also be None, in which case the file uses the system default encoding for converting strings.
On Wed, 28 Jan 2009 18:52:41 +0000, Paul Moore
2009/1/28 "Martin v. Löwis":
Well, first try to understand what the error *is*:
py> unicodedata.name('\u0153')
'LATIN SMALL LIGATURE OE'
py> unicodedata.name('£')
'POUND SIGN'
py> ascii('£')
"'\\xa3'"
py> ascii('£'.encode('cp850').decode('cp1252'))
"'\\u0153'"
So when Python reads the file, it uses cp1252. This is sensible - the console using cp850 doesn't change the fact that the "common" encoding of files on your system is cp1252. It is an unfortunate fact of Windows that the console window uses a different encoding from the rest of the system (namely, the console uses the OEM code page, and everything else uses the ANSI code page).
Ah, I see. That is entirely obvious. The key bit of information is that the default io encoding is cp1252, not cp850. I know that in theory, I see the consequences often enough (:-)), but it isn't "instinctive" for me. And the simple "default encoding is system dependent" comment is not very helpful in terms of warning me that there could be an issue.
It probably didn't help that the exception raised told you that the error was in the "charmap" codec. This should have said "cp850" instead. The fact that cp850 is implemented in terms of "charmap" isn't very interesting. The fact that it failed while encoding some text using "cp850" is. Jean-Paul
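Until the message itself changes, an application can add the missing context where it writes. A minimal sketch: `write_text` is a hypothetical helper (not a stdlib function) that re-raises the error using the stream's own `encoding` attribute as the codec name, so the report says cp850 rather than charmap.

```python
import io


def write_text(stream, text):
    """Write text to a text stream, naming the stream's codec if encoding fails."""
    try:
        stream.write(text)
    except UnicodeEncodeError as exc:
        # Same text, position and reason, but labelled with the stream's real
        # encoding (e.g. 'cp850') instead of the implementation detail 'charmap'.
        raise UnicodeEncodeError(
            stream.encoding or exc.encoding, exc.object, exc.start, exc.end, exc.reason
        ) from exc


# Simulate a cp850 console with an in-memory binary buffer.
console = io.TextIOWrapper(io.BytesIO(), encoding="cp850")
try:
    write_text(console, "a\u0153b")
except UnicodeEncodeError as err:
    print(err)
```

The original exception is kept as `__cause__`, so nothing from the codec's own report is lost.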
Michael Foord wrote:
M.-A. Lemburg wrote:
Why don't we just mark 3.0.x as experimental branch and keep updating/ fixing things that were not sorted out for the 3.0.0 release ?! I think that's a fair approach, given that the only way to get field testing for new open-source software is to release early and often.
A 3.1 release should then be the first stable release of the 3.x series and mark the start of the usual deprecation mechanisms we have in the 2.x series. Needless to say, rushing 3.1 out now would only cause yet another experimental release... major releases do take time to stabilize.
+1
I don't think we do users any favours by being cautious in removing / fixing things in the 3.0 releases.
I have two main reactions to 3.0.
1. It is great for my purpose -- coding algorithms. The cleaner object and text models are a mental relief for me. So it was a service to me to release it. I look forward to it becoming standard Python and have made my small contribution by helping clean up the 3.0 version of the docs.
2. It is something of a trial run, and it should be fixed as soon as possible. I seem to remember something from Shakespeare(?): "If it twer done, tis best it twer done quickly".
Guido said something over a year ago to the effect that he did not expect 3.0 to be used as a production release, so I do not think it should be treated as one. Label it developmental and people will not try to keep it in use for years and years in the way that, say, 2.4 still is. tjr
On Wed, Jan 28, 2009 at 11:52 AM, Paul Moore
Ah, I see. That is entirely obvious. The key bit of information is that the default io encoding is cp1252, not cp850. I know that in theory, I see the consequences often enough (:-)), but it isn't "instinctive" for me. And the simple "default encoding is system dependent" comment is not very helpful in terms of warning me that there could be an issue.
I do think that more wording around encoding defaults would be useful - as I said, I'll think about how best it could be made into a doc patch. I suspect the best approach would be to have a section (maybe in the docs for the codecs module) explaining all the details, and then a cross-reference to that from the various places (open, io) where default encodings are mentioned.
It'd also help if the file repr gave the encoding:
>>> f = open('/dev/null')
>>> f
>>> import sys
>>> sys.stdout
Of course I can check .encoding manually, but it needs to be more intuitive. -- Adam Olsen, aka Rhamphoryncus
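For what it's worth, later Python 3 releases do show the encoding in the repr of text files. A quick check (the exact repr text, such as the class name, varies by version, so this sketch only relies on the `encoding=` field and on the `.encoding` attribute):

```python
import os
import sys

# os.devnull is '/dev/null' on Unix and 'nul' on Windows.
with open(os.devnull) as f:
    print(repr(f))      # includes name=, mode= and encoding= fields
    print(f.encoding)   # the same value, as a plain attribute

with open(os.devnull, encoding="latin-1") as f:
    print(repr(f))      # shows the encoding exactly as it was passed

print(repr(sys.stdout))
```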
On Wed, Jan 28, 2009 at 1:42 PM, Adam Olsen
It'd also help if the file repr gave the encoding:
+1 -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC http://stutzbachenterprises.com
Terry Reedy wrote:
Michael Foord wrote:
M.-A. Lemburg wrote:
Why don't we just mark 3.0.x as experimental branch and keep updating/ fixing things that were not sorted out for the 3.0.0 release ?! I think that's a fair approach, given that the only way to get field testing for new open-source software is to release early and often.
A 3.1 release should then be the first stable release of the 3.x series and mark the start of the usual deprecation mechanisms we have in the 2.x series. Needless to say, rushing 3.1 out now would only cause yet another experimental release... major releases do take time to stabilize.
+1
I don't think we do users any favours by being cautious in removing / fixing things in the 3.0 releases.
I have two main reactions to 3.0.
1. It is great for my purpose -- coding algorithms. The cleaner object and text models are a mental relief for me. So it was a service to me to release it. I look forward to it becoming standard Python and have made my small contribution by helping clean up the 3.0 version of the docs.
2. It is something of a trial run, and it should be fixed as soon as possible. I seem to remember something from Shakespeare(?): "If it twer done, tis best it twer done quickly".
Guido said something over a year ago to the effect that he did not expect 3.0 to be used as a production release, so I do not think it should be treated as one. Label it developmental and people will not try to keep it in use for years and years in the way that, say, 2.4 still is.
It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it "because it's the newest". regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
Steve Holden wrote:
2.6 showed it in the inclusion (later recognizable as somewhat ill-advised so late in the day) of multiprocessing;
Given the longstanding fork() bugs that were fixed as a result of that inclusion, I think "ill-advised" is too strong... could it have done with a little more time to bed down multiprocessing in particular? Possibly. Was it worth holding up the whole release just for that? I don't think so - we'd already fixed up the problems that the test suite and python-dev were likely to find, so the cost/benefit ratio on a delay would have been pretty poor. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it "because it's the newest".
It was (and probably still is) Guido's position that 3.0 *is* the version that newbies should be using. Regards, Martin
2009/1/28 Raymond Hettinger
[Adam Olsen]
It'd also help if the file repr gave the encoding:
+1 from me too. That will be a big help.
Definitely. People *are* going to get confused by encoding errors - let's give them all the help we can. Paul
"Martin v. Löwis" writes:
It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it "because it's the newest".
It was (and probably still is) Guido's position that 3.0 *is* the version that newbies should be using.
Indeed. See Terry Reedy's post. Somebody who is looking for a platform for a production application is not going to download something "because it's the newest". Sure, those advocating other platforms will carp about Python 3.0, but hey, where is Perl 6? "The amazing thing about a dancing bear is *not* how well it dances." Let's not get too worried about the PR aspects; just fixing the bugs as we go along will fix that to the extent that people are not totally prejudiced anyway. I think there is definitely something to the notion that the 3.x vs. 3.0.y distinction should signal something, and I personally like MAL's suggestion that 3.0.x should be marked some sort of beta in perpetuity, or at least until 3.1 is ready to ship as stable and production-ready. (That's AIUI, MAL's intent may be somewhat different.)
Stephen J. Turnbull wrote:
"Martin v. Löwis" writes:
It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it "because it's the newest".
By that logic, I would suggest removing 2.6 ;-) See below.
It was (and probably still is) Guido's position that 3.0 *is* the version that newbies should be using.
Indeed. See Terry Reedy's post.
When people ask on c.l.p, I recommend either 3.0 for the relative cleanliness or 2.5 (until now, at least) for the 3rd-party add-on availability (which will gradually improve for 2.6 and, more slowly, for 3.x). I expect that some newbies would find 2.6 a somewhat confusing mix of old and new. tjr
Terry Reedy wrote:
Stephen J. Turnbull wrote:
"Martin v. Löwis" writes:
It might also be a good idea to take the download link off the front page of python.org: until that happens newbies are going to keep coming along and downloading it "because it's the newest".
By that logic, I would suggest removing 2.6 ;-) See below.
It was (and probably still is) Guido's position that 3.0 *is* the version that newbies should be using.
Indeed. See Terry Reedy's post.
When people ask on c.l.p, I recommend either 3.0 for the relative cleanliness or 2.5 (until now, at least) for the 3rd-party add-on availability (which will gradually improve for 2.6 and, more slowly, for 3.x). I expect that some newbies would find 2.6 a somewhat confusing mix of old and new.
Fair point. At least we both agree that the current site doesn't best serve the punters. regards Steve
On 2009-01-29 01:59, Stephen J. Turnbull wrote:
I think there is definitely something to the notion that the 3.x vs. 3.0.y distinction should signal something, and I personally like MAL's suggestion that 3.0.x should be marked some sort of beta in perpetuity, or at least until 3.1 is ready to ship as stable and production-ready. (That's AIUI, MAL's intent may be somewhat different.)
That's basically it, yes. I don't think that marking 3.0 as experimental is bad in any way, as long as we're clear about it. Having lots of incompatible changes in a patch level release will start to get users worrying about the stability of the 3.x branch anyway, so a heads-up message and clear perspective for the 3.1 release is a lot better than dumping 3.0 altogether or not providing such a perspective at all. That said, we should stick to the statement already made for 3.0 (too early, as it now appears), i.e. that the same development and release processes will apply to the 3.x branch as we have for 2.x - starting with 3.1. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 29 2009)
On Wed, Jan 28, 2009, M.-A. Lemburg wrote:
Why don't we just mark 3.0.x as experimental branch and keep updating/ fixing things that were not sorted out for the 3.0.0 release ?! I think that's a fair approach, given that the only way to get field testing for new open-source software is to release early and often.
A 3.1 release should then be the first stable release of the 3.x series and mark the start of the usual deprecation mechanisms we have in the 2.x series. Needless to say, rushing 3.1 out now would only cause yet another experimental release... major releases do take time to stabilize.
Speaking as the original author of PEP6 (Bug Fix Releases), this sounds like a reasonable middle ground. I certainly advocate that nobody consider Python 3.0 for production software, and enshrining that into the dev process should work well. At the same time, I think each individual change that doesn't clearly fall into the PEP6 process of being a bugfix needs to be vetted beyond what's permitted for not-yet-released versions. The problem is that the obvious candidate for doing the vetting is the Release Manager, and Barry doesn't like this approach. The vetting does need to be handled by a core committer IMO -- MAL, are you volunteering? Anyone else? Barry, are you actively opposed to marking 3.0.x as experimental, or do you just dislike it? (I.e. are you -1 or -0?) -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.
[Aahz]
At the same time, I think each individual change that doesn't clearly fall into the PEP6 process of being a bugfix needs to be vetted beyond what's permitted for not-yet-released versions.
To get the ball rolling, I have a candidate for discussion. Very late in the 3.0 process (after feature freeze), the bsddb code was ripped out (good riddance). This had the unfortunate side-effect of crippling shelves which now fall back to using dumbdbm. I'm somewhat working on an alternate dbm based on sqlite3: http://code.activestate.com/recipes/576638/ It is a pure python module and probably will not be used directly, but shelves will see an immediate benefit (especially for large shelves) in terms of speed and space. On the one hand, it is an API change or new feature because people can (if they choose) access the dbm directly. OTOH, it is basically a performance fix for shelves whose API won't change at all. The part that is visible and incompatible is that 3.0.1 shelves won't be readable by 3.0.0.
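To illustrate why shelves could benefit without any change to their API: `shelve.Shelf` accepts any mapping from bytes keys to bytes values, so a small SQLite-backed mapping can be dropped in underneath it. This is a hypothetical sketch (`SQLiteDict` is an invented name), not the code of the recipe linked above; it does no locking, caching, or error recovery.

```python
import shelve
import sqlite3
from collections.abc import MutableMapping


class SQLiteDict(MutableMapping):
    """Hypothetical dbm-style mapping (bytes keys -> bytes values) stored in SQLite."""

    def __init__(self, filename=":memory:"):
        self.conn = sqlite3.connect(filename)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)"
        )

    def __getitem__(self, key):
        row = self.conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        # INSERT OR REPLACE overwrites an existing row with the same primary key.
        self.conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))

    def __delitem__(self, key):
        cursor = self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cursor.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # Materialise the keys so callers may modify the mapping while iterating.
        return iter([row[0] for row in self.conn.execute("SELECT key FROM kv")])

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def sync(self):
        # shelve.Shelf.sync() calls this if the underlying mapping has it.
        self.conn.commit()

    def close(self):
        # shelve.Shelf.close() calls this after sync().
        self.conn.commit()
        self.conn.close()


db = shelve.Shelf(SQLiteDict())  # in-memory database, so nothing touches the disk
db["primes"] = [2, 3, 5, 7]
db["config"] = {"verbose": True}
print(db["primes"], sorted(db), len(db))
db.close()
```

The shelf still pickles values and encodes keys itself; only the storage layer changes, which is the sense in which this is a performance fix rather than a new shelve API.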
The problem is that the obvious candidate for doing the vetting is the Release Manager, and Barry doesn't like this approach. The vetting does need to be handled by a core committer IMO -- MAL, are you volunteering? Anyone else?
It should be someone who is using 3.0 regularly (ideally someone who is working on fixing it). IMO, people who aren't exercising it don't really have a feel for the problems or the cost/benefits of the fixes.
Barry, are you actively opposed to marking 3.0.x as experimental, or do you just dislike it? (I.e. are you -1 or -0?)
My preference is to *not* mark it as experimental. Instead, I prefer doing what it takes to make the 3.0.x series viable. Raymond
On Thu, Jan 29, 2009 at 3:27 PM, Raymond Hettinger
To get the ball rolling, I have a candidate for discussion.
Very late in the 3.0 process (after feature freeze), the bsddb code was ripped out (good riddance). This had the unfortunate side-effect of crippling shelves which now fall back to using dumbdbm.
I'm somewhat working on an alternate dbm based on sqlite3: http://code.activestate.com/recipes/576638/ It is a pure python module and probably will not be used directly, but shelves will see an immediate benefit (especially for large shelves) in terms of speed and space.
On the one hand, it is an API change or new feature because people can (if they choose) access the dbm directly. OTOH, it is basically a performance fix for shelves whose API won't change at all. The part that is visible and incompatible is that 3.0.1 shelves won't be readable by 3.0.0.
That is too much for 3.0.1. It could affect external file formats, which strikes me as a bad idea. Sounds like a good candidate for 3.1, which we should be expecting in 4-6 months I hope. Also you could try to find shelve users (are there any?) and recommend they install this as a 3rd party package, with the expectation it'll be built into 3.1. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On the one hand, it is an API change or new feature because people can (if they choose) access the dbm directly. OTOH, it is basically a performance fix for shelves whose API won't change at all. The part that is visible and incompatible is that 3.0.1 shelves won't be readable by 3.0.0.
That is too much for 3.0.1. It could affect external file formats which strikes me as a bad idea.
We should have insisted that bsddb not be taken out until a replacement was put in. The process was broken with the RM insisting on feature freeze early in the game but letting tools like bsddb get ripped out near the end. IMO, it was foolish to do one without the other. After the second alpha was out, there was resistance to any additions or to revisiting any of the early changes; that was probably a mistake. Now we're deferring the fix for another 4-6 months and 3.0.x will never have it (at least not right out of the box, as shipped).
Also you could try to find shelve users (are there any?)
I'm a big fan of shelves and have always used them extensively. Not sure if anyone else cares about them though.
recommend they install this as a 3rd party package, with the expectation it'll be built into 3.1.
Will do. That was my original plan since the day bsddb got ripped out. Raymond
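For readers wondering what such a third-party dbm might look like, here is a minimal sketch. It is not Raymond's recipe (that lives at the ActiveState URL he posted); the class name `SqliteDB` is hypothetical. It stores bytes keys and bytes values in a single sqlite3 table and implements `collections.abc.MutableMapping`, which is enough for `shelve.Shelf` to wrap it, because Shelf encodes string keys to bytes and pickles values before handing them to the underlying mapping.

```python
import shelve
import sqlite3
from collections.abc import MutableMapping


class SqliteDB(MutableMapping):
    """Hypothetical bytes-to-bytes mapping backed by one sqlite3 table."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)"
        )

    def __getitem__(self, key):
        row = self._conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        with self._conn:  # commits the transaction on success
            self._conn.execute("REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))

    def __delitem__(self, key):
        with self._conn:
            cur = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        # fetchall() so callers may mutate the mapping while iterating
        for (key,) in self._conn.execute("SELECT key FROM kv").fetchall():
            yield key

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self._conn.close()


db = shelve.Shelf(SqliteDB())  # Shelf handles pickling and key encoding
db["spam"] = {"eggs": [1, 2, 3]}
print(db["spam"])  # {'eggs': [1, 2, 3]}
db.close()
```

Passing a file path instead of ":memory:" gives a persistent shelf. A real dbm module would also need `dbm.open()`-style flag handling ("r", "w", "c", "n"), which this sketch leaves out.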
A couple additional thoughts FWIW:
* whichdb() selects from multiple file formats, so 3.0.1 would still be able to read 3.0.0 files. It is the 2.x shelves that won't be readable at all under any scenario.
* If you're thinking that shelves have very few users and that 3.0.0 has had few adopters, doesn't that mitigate the effects of making a better format available in 3.0.1? Wouldn't this be the time to do it?
* The file format itself is not new or unreadable by 3.0.0. It is just a plain sqlite3 file. What is new is the ability of shelves to call sqlite. To me, that is something a little different than changing a pickle protocol or somesuch.
Raymond
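The claim that the new files are "just a plain sqlite3 file" is easy to check: every SQLite 3 database begins with the 16-byte header `SQLite format 3\0`. The sketch below uses a hypothetical `sniff_db_format` helper (not the stdlib's own `whichdb()` logic) that looks for that header first and otherwise defers to `dbm.whichdb()`, which returns None for a file that doesn't exist.

```python
import dbm
import os

# First 16 bytes of every SQLite 3 database file.
SQLITE_MAGIC = b"SQLite format 3\x00"


def sniff_db_format(path):
    """Hypothetical helper: 'sqlite3' for SQLite files, else whatever dbm.whichdb() says."""
    if os.path.isfile(path):
        with open(path, "rb") as f:
            if f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC:
                return "sqlite3"
    # Covers dbm.dumb (.dat/.dir pairs), ndbm, gdbm, or None if nothing is there.
    return dbm.whichdb(path)
```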
Raymond Hettinger
* If you're thinking that shelves have very few users and that 3.0.0 has had few adopters, doesn't that mitigate the effects of making a better format available in 3.0.1? Wouldn't this be the time to do it?
There was already another proposal for an sqlite-based dbm module, you may want to synchronize with it: http://bugs.python.org/issue3783 As I see it, the problem with introducing it in 3.0.1 is that we would be rushing in a new piece of code without much review or polish. Also, there are only two release blockers left for 3.0.1, so we might just finish those and release, then concentrate on 3.1. Regards Antoine.
Raymond Hettinger writes:
My preference is to *not* mark it as experimental.
Don't take the word "experimental" too seriously. It's clearly an exaggeration given the current state of 3.0.x. What is meant is an explicit announcement that the stability rules chosen in response to the bool-True-False brouhaha will be relaxed for the 3.0.x series *only*.
Instead, I prefer doing what it takes to make the 3.0.x series viable.
That's not an "instead", that's two independent choices. The point is that most of the people who are voicing concerns fear precisely that policy. I think that the important question is "can the 3.0.x series be made 'viable' in less than the time frame for 3.1?" If not, I really have to think it's DOA from the point of view of folks who consider 3.0.0 non-viable. I think that's what Barry and Martin are saying. Guido is saying something different. AIUI, he's saying that explicitly introducing controlled instability into 3.0.x of the form "this is what the extremely stable non-buggy inherited-from-3.0 core of 3.1 is going to look like" will be a great service to those who consider 3.0.0 non-viable. The key point is that new features in 3.1 are generally going to be considered less reliable than those inherited from 3.0, and thus a debugged 3.0, even if the implementations have been unstable, provides a way for the very demanding to determine what that set is, and to test how it behaves in their applications. I think it's worth a try, after consultation with some of the major developers who are the ostensible beneficiaries. But if tried, I think it's important to mark 3.0.x as "not yet stable" even if the instability is carefully defined and controlled.
On Fri, Jan 30, 2009, Antoine Pitrou wrote:
Raymond Hettinger writes:
* If you're thinking that shelves have very few users and that 3.0.0 has had few adopters, doesn't that mitigate the effects of making a better format available in 3.0.1? Wouldn't this be the time to do it?
There was already another proposal for an sqlite-based dbm module, you may want to synchronize with it: http://bugs.python.org/issue3783
As I see it, the problem with introducing it in 3.0.1 is that we would be rushing in a new piece of code without much review or polish. Also, there are only two release blockers left for 3.0.1, so we might just finish those and release, then concentrate on 3.1.
There's absolutely no reason not to have a 3.0.2 before 3.1 comes out. You're probably right that what Raymond wants to do is best not done for 3.0.1 -- but once we've agreed in principle that 3.0.x isn't a true production release of Python for PEP6 purposes, we can do "release early, release often". -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.
Don't take the word "experimental" too seriously. It's clearly an exaggeration given the current state of 3.0.x. What is meant is an explicit announcement that the stability rules chosen in response to the bool-True-False brouhaha will be relaxed for the 3.0.x series *only*.
The name for that shouldn't be "experimental", though. I don't think it needs any name at all. It would be sufficient to report in the release announcement that some stuff got removed in an incompatible way. This is also different from bool-True-False, which was an addition, not a removal.
I think that the important question is "can the 3.0.x series be made 'viable' in less than the time frame for 3.1?" If not, I really have to think it's DOA from the point of view of folks who consider 3.0.0 non-viable. I think that's what Barry and Martin are saying.
DOA == dead on arrival? I don't think Python 3.0 is dead. Instead, I think it is fairly buggy, but those bugs can be fixed. Removal of stuff is *not* a bug fix, of course. The *real* bugs in 3.0 are things like "IDLE doesn't work", "bdist_wininst doesn't work", etc. I personally can agree with removal of stuff (despite it not being a bug fix). More importantly, however, I want to support the relevant authority: if the release manager sets a policy on what is and what is not acceptable for a bug fix release, every committer should implement this policy (or at least not actively break it). With the removals in the code, I do think it is important to release 3.0.1 quickly, say, next week.
The key point is that new features in 3.1 are generally going to be considered less reliable than those inherited from 3.0, and thus a debugged 3.0, even if the implementations have been unstable, provides a way for the very demanding to determine what that set is, and to test how it behaves in their applications.
That is fairly abstract. What specific bugs in Python 3.0 are you talking about? Regards, Martin
On Thu, 29 Jan 2009 at 16:43, Raymond Hettinger wrote:
On Thu, 29 Jan 2009 at 15:40, Guido van Rossum wrote:
Also you could try to find shelve users (are there any?)
I'm a big fan of shelves and have always used them extensively. Not sure if anyone else cares about them though.
I use them. Not in any released products at the moment, though, and I haven't migrated the shelve-using code to 3.0 yet. So I'd be in favor of adding sqlite3 support as soon as practical. --RDM
"Martin v. Löwis" writes:
Don't take the word "experimental" too seriously. What is meant is an explicit announcement that the stability rules will be relaxed for the 3.0.x series *only*.
The name for that shouldn't be "experimental", though. I don't think it needs any name at all.
That's what I meant. I'm sure that whoever wrote the word "experimental" in the first place regrets it, because it doesn't reflect what they meant.
I think that the important question is "can the 3.0.x series be made 'viable' in less than the time frame for 3.1?" If not, I really have to think it's DOA from the point of view of folks who consider 3.0.0 non-viable. I think that's what Barry and Martin are saying.
DOA == dead on arrival? I don't think Python 3.0 is dead.
I'm sorry, DOA was a poor word choice, especially in this context. I meant that people who currently consider 3.0 non-viable are more likely to focus on the branch that will become 3.1 unless a "viable" 3.0.x will arrive *very* quickly.
That is fairly abstract. What specific bugs in Python 3.0 are you talking about?
I'm not talking about specific bugs; I'm perfectly happy with 3.0 for my purposes, and I think it very unlikely that any of the possibly destabilizing changes that have been proposed for 3.0.1 will affect me adversely. Rather, I'm trying to disentangle some of the unfortunate word choices that have been made (and I apologize for making one of my own!), and find common ground so that a policy can be set more quickly. IMO it's likely that there's really no audience for a 3.0.x series that conforms to the rules used for 2.x from 2.2.1 or so on. That is, there are people who really don't care because 3.0 is already a better platform for their application whether there are minor changes or not, and there are people who do care about stability but they're not going to use 3.0.x whether it adheres to the previous rules strictly or not. There are very few who will use 3.0.x if and only if it adheres strictly.
On Thu, Jan 29, 2009 at 4:58 PM, Raymond Hettinger
A couple additional thoughts FWIW:
* whichdb() selects from multiple file formats, so 3.0.1 would still be able to read 3.0.0 files. It is the 2.x shelves that won't be readable at all under any scenario.
* If you're thinking that shelves have very few users and that 3.0.0 has had few adopters, doesn't that mitigate the effects of making a better format available in 3.0.1? Wouldn't this be the time to do it?
* The file format itself is not new or unreadable by 3.0.0. It is just a plain sqlite3 file. What is new is the ability of shelves to call sqlite. To me, that is something a little different than changing a pickle protocol or somesuch.
Sorry, not convinced. This is a change of a different scale than removing stuff that should've been removed. I understand you'd like to see your baby released. But I think it's better to have it tried and tested by others *outside* the core distro first. dbm is not broken in 3.0, just slow. Well so be it, io.py is too and that's a lot more serious. I also note that on some systems at least ndbm and/or gdbm are supported. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
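Which of those dbm modules is present varies from build to build. This sketch uses a hypothetical helper, `available_dbm_backends`, that simply tries to import each `dbm` submodule. `dbm.sqlite3` only exists in Python 3.13 and later, so on older interpreters it is skipped, along with any backend whose C library wasn't compiled in; `dbm.dumb` is pure Python and always available.

```python
import importlib

# Candidate backends, in roughly the order dbm.open() prefers them.
CANDIDATES = ["dbm.sqlite3", "dbm.gnu", "dbm.ndbm", "dbm.dumb"]


def available_dbm_backends():
    """Return the dbm submodules that import successfully on this interpreter."""
    found = []
    for name in CANDIDATES:
        try:
            importlib.import_module(name)
        except ImportError:  # module absent or its C extension not built
            continue
        found.append(name)
    return found


print(available_dbm_backends())
```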
Aahz
There's absolutely no reason not to have a 3.0.2 before 3.1 comes out. You're probably right that what Raymond wants to do is best not done for 3.0.1 -- but once we've agreed in principle that 3.0.x isn't a true production release of Python for PEP6 purposes, we can do "release early, release often".
It's a possibility. To be honest, I didn't envision us releasing a 3.0.2 rather than focusing on 3.1 (which, as others said, can be released in a few months if we keep the amount of changes under control). But then it's only a matter of naming. We can continue the 3.0.x series and incorporate in them whatever was initially planned for 3.1 (including the IO-in-C branch, the dbm.sqlite module, etc.), and release 3.1 only when the whole thing is "good enough". Regards Antoine.
On 2009-01-30 11:40, Antoine Pitrou wrote:
Aahz writes:
There's absolutely no reason not to have a 3.0.2 before 3.1 comes out. You're probably right that what Raymond wants to do is best not done for 3.0.1 -- but once we've agreed in principle that 3.0.x isn't a true production release of Python for PEP6 purposes, we can do "release early, release often".
It's a possibility. To be honest, I didn't envision us releasing a 3.0.2 rather than focusing on 3.1 (which, as others said, can be released in a few months if we keep the amount of changes under control).
But then it's only a matter of naming. We can continue the 3.0.x series and incorporate in them whatever was initially planned for 3.1 (including the IO-in-C branch, the dbm.sqlite module, etc.), and release 3.1 only when the whole thing is "good enough".
That would be my preference. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 30 2009)
Antoine Pitrou wrote:
Raymond Hettinger writes:
* If you're thinking that shelves have very few users and that 3.0.0 has had few adopters, doesn't that mitigate the effects of making a better format available in 3.0.1? Wouldn't this be the time to do it?
There was already another proposal for an sqlite-based dbm module, you may want to synchronize with it: http://bugs.python.org/issue3783
As I see it, the problem with introducing it in 3.0.1 is that we would be rushing in a new piece of code without much review or polish.
Again
Also, there are only two release blockers left for 3.0.1, so we might just finish those and release, then concentrate on 3.1.
Seems to me that every deviation from the policy introduced as a result of the True/False debacle leads to complications and problems. There's no point having a policy instigated for good reasons if we can ignore those reasons on a whim. So to my mind, ignoring the policy *is* effectively declaring 3.0 to be, well, if not a dead parrot then at least a rushed release. Most consistently missing from this picture has been effective communications (in both directions) with the user base. Consequently nobody knows whether specific features are in serious use, and nobody knows whether 3.0 is intended to be a stable base for production software or not. Ignoring users, and acting as though we know what they are doing and what they want, is not going to lead to better acceptance of future releases. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
On Fri, 30 Jan 2009 07:03:03 -0500, Steve Holden
Antoine Pitrou wrote:
Raymond Hettinger writes:
* If you're thinking that shelves have very few users and that 3.0.0 has had few adopters, doesn't that mitigate the effects of making a better format available in 3.0.1? Wouldn't this be the time to do it?
There was already another proposal for an sqlite-based dbm module, you may want to synchronize with it: http://bugs.python.org/issue3783
As I see it, the problem with introducing it in 3.0.1 is that we would be rushing in a new piece of code without much review or polish.
Again
Also, there are only two release blockers left for 3.0.1, so we might just finish those and release, then concentrate on 3.1.
Seems to me that every deviation from the policy introduced as a result of the True/False debacle leads to complications and problems. There's no point having a policy instigated for good reasons if we can ignore those reasons on a whim.
So to my mind, ignoring the policy *is* effectively declaring 3.0 to be, well, if not a dead parrot then at least a rushed release.
Most consistently missing from this picture has been effective communications (in both directions) with the user base. Consequently nobody knows whether specific features are in serious use, and nobody knows whether 3.0 is intended to be a stable base for production software or not. Ignoring users, and acting as though we know what they are doing and what they want, is not going to lead to better acceptance of future releases.
My 2 cents as a user... I wouldn't consider v3.0.n (where n is small) for use in production. v3.1 however implies (to me at least) a level of quality where I would be disappointed if it wasn't production ready. Therefore I would suggest the main purpose of any v3.0.1 release is to make sure that v3.1 is up to scratch. Phil
2009/1/30 Steve Holden
Most consistently missing from this picture has been effective communications (in both directions) with the user base.
Serious question: does anybody know how to get better communication from the user base? My impression is that it's pretty hard to find out who is actually using 3.0, and get any feedback from them. I suppose a general query on clp might get some feedback, but otherwise, what? I've not seen any significant amount of blog activity on 3.0. As a small contribution, my position is as follows: I use Python mostly for one-off scripts, both at home and at work. I also use Python for a suite of database monitoring tools, as well as using some applications written in Python (Mercurial and MoinMoin, in particular). Ignore the applications, they aren't moving to 3.0 in the short term (based on comments from the application teams). For my own use, the key modules I need are cx_Oracle and pywin32. cx_Oracle was available for 3.0 very quickly (and apparently the port wasn't too hard, which is good feedback!). pywin32 is just now available in preview form. My production box is still using 2.5, and I will probably migrate to 2.6 in due course - but I'll probably leave 3.0 for the foreseeable future (I may rethink if MoinMoin becomes available on 3.0 sooner rather than later). For my desktop PC, I'm using 2.6 but as I do a fair bit of experimenting with modules, I'm taking it slowly (I'd like to see 2.6 binaries for a few more packages, really). I have 3.0 installed, but not as default, so frankly it doesn't get used unless I'm deliberately trying it out. Based on the recent threads, I'm thinking I really should make 3.0 the default just to get a better feel for it. The io-in-C changes would probably help push me to doing so (performance isn't really an issue for me, but I find I'm irrationally swayed by the "3.0 io is slow, but it's getting fixed soon by the io-in-C rewrite" messages I've been seeing - I have no idea if that's a general impression, or just a result of my following python-dev, though). 
It would make no difference to me, personally, whether *any* of the changes being discussed were released in 3.0.1 or 3.1 (except insofar as I'd like to see them sooner rather than later). So, in summary, for practical purposes I use 2.6. I probably could use 3.0 for a significant proportion of my needs, but the impressions I've been getting make me cautious. I'm using Windows, and although I *can* build a lot of stuff myself, I really don't want to be bothered, so I rely on bdist_wininst installers being available, which is an additional constraint. Paul.
On Jan 29, 2009, at 5:09 PM, Aahz wrote:
The problem is that the obvious candidate for doing the vetting is the Release Manager, and Barry doesn't like this approach. The vetting does need to be handled by a core committer IMO -- MAL, are you volunteering? Anyone else?
Barry, are you actively opposed to marking 3.0.x as experimental, or do you just dislike it? (I.e. are you -1 or -0?)
I'm opposed to marking 3.0 experimental, so I guess -1 there. It's the first model year of a redesigned nameplate, but it's still got four wheels, a good motor and it turns mostly in the direction you point it. :) No release is ever what everyone wants. There has never been a release where I haven't wanted to add or change something after the fact (see my recent 2.6 unicode grumblings). Perhaps frustratingly, but usually correctly, the community is very resistant to making such feature or API changes after a release is made. That's just life; we deal with it, work around it and work harder towards the next major release. If that's too burdensome, then maybe it's really the 18 month development cycle that needs to be re-examined. All that aside, I will support whatever community consensus or BDFL pronouncement is made here. Don't be surprised, though, if I'm more conservative than you want when you ask me. You can always appeal to a higher authority (python-dev or Guido). So don't worry, I'll continue to RM the 3.0 series! Barry
On Jan 29, 2009, at 6:27 PM, Raymond Hettinger wrote:
The problem is that the obvious candidate for doing the vetting is the Release Manager, and Barry doesn't like this approach. The vetting does need to be handled by a core committer IMO -- MAL, are you volunteering? Anyone else?
It should be someone who is using 3.0 regularly (ideally someone who is working on fixing it). IMO, people who aren't exercising it don't really have a feel for the problems or the cost/benefits of the fixes.
That's not the right way to look at it. I'm using 2.6 heavily these days, does that mean I get to decide what goes in it or not? No. Everyone here, whether they are using 2.6 or not should weigh in, with of course one BDFL to rule them all. Same goes for 3.0. This is a community effort and I feel strongly that we should work toward reaching consensus (that seems to be an American theme these days). Make your case, we'll listen to the pros and cons, decide as a community and then move on. Barry
On Jan 29, 2009, at 6:40 PM, Guido van Rossum wrote:
On Thu, Jan 29, 2009 at 3:27 PM, Raymond Hettinger
wrote: To get the ball rolling, I have a candidate for discussion.
Very late in the 3.0 process (after feature freeze), the bsddb code was ripped out (good riddance). This had the unfortunate side-effect of crippling shelves which now fall back to using dumbdbm.
I'm somewhat working on an alternate dbm based on sqlite3: http://code.activestate.com/recipes/576638/ It is a pure python module and probably will not be used directly, but shelves will see an immediate benefit (especially for large shelves) in terms of speed and space.
On the one hand, it is an API change or new feature because people can (if they choose) access the dbm directly. OTOH, it is basically a performance fix for shelves whose API won't change at all. The part that is visible and incompatible is that 3.0.1 shelves won't be readable by 3.0.0.
That is too much for 3.0.1. It could affect external file formats which strikes me as a bad idea.
Sounds like a good candidate for 3.1, which we should be expecting in 4-6 months I hope. Also you could try to find shelve users (are there any?) and recommend they install this as a 3rd party package, with the expectation it'll be built into 3.1.
I concur. Barry
On Jan 29, 2009, at 7:43 PM, Raymond Hettinger wrote:
We should have insisted that bsddb not be taken out until a replacement was put in. The process was broken with the RM insisting on feature freeze early in the game but letting tools like bsddb get ripped-out near the end. IMO, it was foolish to do one without the other.
Very good arguments were made for ripping bsddb out. Guido agreed. A replacement would have delayed 3.0 even more than it originally was, and the replacement would not have been battle tested. It's possible, maybe even likely, that the replacement would have been found inadequate later on and then we'd be saddled with a different mistake. Given that it's easy to make 3rd party packages work, I firmly believe this was the right decision. With a proven, solid, popular replacement available for several months, it will be easy to pull that into the 3.1 release. Barry
On Jan 29, 2009, at 8:34 PM, Stephen J. Turnbull wrote:
I think that the important question is "can the 3.0.x series be made 'viable' in less than the time frame for 3.1?" If not, I really have to think it's DOA from the point of view of folks who consider 3.0.0 non-viable. I think that's what Barry and Martin are saying.
Of course, the definition of "viable" is the key thing here. I'm not picking on Raymond, but what is not viable for him will be perfectly viable for other people. We have to be very careful not to view our little group of insiders as the sole universe of Python users (3.0 or otherwise).
Guido is saying something different. AIUI, he's saying that explicitly introducing controlled instability into 3.0.x of the form "this is what the extremely stable non-buggy inherited-from-3.0 core of 3.1 is going to look like" will be a great service to those who consider 3.0.0 non-viable.
The key point is that new features in 3.1 are generally going to be considered less reliable than those inherited from 3.0, and thus a debugged 3.0, even if the implementations have been unstable, provides a way for the very demanding to determine what that set is, and to test how it behaves in their applications.
I'm not sure I agree with that last paragraph. We have a pretty good track record of introducing stable new features in dot-x releases, so there's no reason to believe that the same won't work for 3.x.
I think it's worth a try, after consultation with some of the major developers who are the ostensible beneficiaries. But if tried, I think it's important to mark 3.0.x as "not yet stable" even if the instability is carefully defined and controlled.
It all depends on where that instability lies. If 3.0 crashed every time you raised an exception due to some core design flaw, then yeah, we'd have a problem. The fact that a bundled module doesn't do what you want it to does not scream instability to me. The should-have-been-removed features don't either. Barry
On Thu, Jan 29, 2009 at 8:25 PM, Raymond Hettinger
[Guido van Rossum]
Sorry, not convinced.
No worries. Py3.1 is not far off.
Just so I'm clear. Are you thinking that 3.0.x will never have fast shelves, or are you thinking 3.0.2 or 3.0.3 after some external deployment and battle-testing for the module?
I don't know about fast shelves, but I don't think your new module should be added to 3.0.x for any x. Who knows if there even will be a 3.0.2 -- it sounds like it's better to focus on 3.1 after 3.0.1. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Paul Moore wrote:
Serious question: does anybody know how to get better communication from the user base?
One of the nice things about Python is that the downloads are truly free -- no required 'registration'. On the other hand, there is no option to give feedback either. If PSF/devs wanted to add something to the site, and someone else volunteered to do the implementation, I would volunteer to help with both design and analysis. That said, I think a main determinant of general 3.0 use will be availability of 3rd-party libraries, including Windows binaries. So perhaps we should aim survey efforts at their authors. I have the impression that the C-API porting guide needs improvement for such effort. On the other hand, perhaps they wonder whether ports will be used. In that case, we need more reports like the post of Nick Efford: "
We'd love to switch to 3.0 as soon as possible (i.e., Oct 2009), as it is a significantly cleaner language for our purposes. [university CS courses] However, we make extensive use of third-party libraries and frameworks such as Pygame, wxPython, etc, to increase the motivation levels of students. The 3.0-readiness of these libraries and frameworks is inevitably going to be a factor in the decision we make this summer. "
Terry Jan Reedy
Serious question: does anybody know how to get better communication from the user base? My impression is that it's pretty hard to find out who is actually using 3.0, and get any feedback from them.
I think the bug tracker is a way in which users communicate with developers. There have been 296 issues since Dec 3rd that got tagged with version 3.0. The absolute majority of these were documentation problems (documentation was incorrect). Then, I would say we have installation problems, and then problems with IDLE. There is also a significant number of 2to3 problems.
I'm using Windows, and although I *can* build a lot of stuff myself, I really don't want to be bothered, so I rely on bdist_wininst installers being available, which is an additional constraint.
Notice that bdist_wininst doesn't really work in 3.0. So you likely won't see many packages until 3.0.1 is released. Regards, Martin
Just my 2 eurocents: I think version numbers communicate a couple of things. One thing they communicate is that if you go from x.y.0 to x.y.1 (or from x.y.34 to x.y.35 for that matter) you signify that this is a bug fix release, and that the risk of any of your stuff breaking is close to zero, unless you were somehow relying on what essentially was broken behavior. It's also true that a .0 anywhere indicates that you should wait, and that a .1 indicates that this should be safer. Of course, you can end up where these two things clash: you need to make a major change that breaks something, but at the same time you don't want to flag it as "Yes, this will be as bug-free as you would normally expect from a .1 release." My opinion is that in that case, the first rule should win out. Don't make potentially incompatible changes in a minor version increase. So it seems to me that a 3.0.1 bugfix release, followed by a 3.1 with the API changes and C IO, is at least the type of numbering I would expect.
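The numbering convention Lennart describes can be made concrete with a small sketch. `upgrade_kind` is a hypothetical helper that assumes purely numeric, dot-separated version strings (no "rc1" or "b2" suffixes) and only compares the first two components.

```python
def upgrade_kind(old, new):
    """Classify a version bump under the x.y.z convention (illustrative only)."""
    o = tuple(int(part) for part in old.split("."))
    n = tuple(int(part) for part in new.split("."))
    if n[:2] == o[:2]:
        return "bugfix"   # only the micro number moved: x.y.z -> x.y.z+1
    if n[0] == o[0]:
        return "feature"  # minor bump: API changes allowed, e.g. 3.0.1 -> 3.1
    return "major"        # e.g. 2.6 -> 3.0


print(upgrade_kind("3.0.0", "3.0.1"))  # bugfix
print(upgrade_kind("3.0.1", "3.1"))    # feature
```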
2009/1/31 "Martin v. Löwis"
Notice that bdist_wininst doesn't really work in 3.0. So you likely won't see many packages until 3.0.1 is released.
Ah, that might be an issue :-) Can you point me at specifics (bug reports or test cases)? I could see if I can help in fixing things. Paul.
participants (30)
- "Martin v. Löwis"
- Aahz
- Adam Olsen
- Antoine Pitrou
- Barry Warsaw
- Benjamin Peterson
- Bill Janssen
- Brett Cannon
- Daniel Stutzbach
- Georg Brandl
- Guido van Rossum
- Jean-Paul Calderone
- Jeroen Ruigrok van der Werven
- Kevin Jacobs <jacobs@bioinformed.com>
- Lawrence Oluyede
- Lennart Regebro
- M.-A. Lemburg
- Matthew Wilkes
- Michael Foord
- Nick Coghlan
- Paul Moore
- Phil Thompson
- Raymond Hettinger
- rdmurray@bitdance.com
- Scott David Daniels
- Stephen J. Turnbull
- Steve Holden
- Steven Bethard
- Terry Reedy
- Victor Stinner