Here is my todo list for Py2.3. Feel free to comment on whether it is too late to pursue these or whether I should continue to work on them for the second beta.
1) PEP 42 lists a request to add timeout settings to the higher level net libraries. Should this still be done? In Py2.3, sockets offers a setdefaulttimeout() function that provides an indirect way of meeting the same goal.
2) Jack Diedrich is working on two patches for itertools:
itertools.window(iterable, n=2) --> (a0, a1), (a1, a2), (a2, a3), ...
itertools.roundrobin(*iterables) which loops over the iterables returning one element from each and then cycles back to the first until all of the iterables are exhausted:
itertools.roundrobin('ab', 'cde') --> a, c, b, d, e
Both functions were discussed on comp.lang.python and have been requested by multiple users. Neither is easily implemented in terms of the existing tools. OTOH, the more tools you add, the harder it is to comprehend the toolset as a whole.
3) difflib now has functions to create a context diff or unified diff. A natural next step is to add a patch() function that applies the diff and finishes the roundtrip. It also helps fulfill the original reason for adding the context/unified diffs which was to make it easier for general python users to either create or apply patches.
4) I've had a long outstanding patch to add methods like isalpha() to string objects. The goal was to make sure that replacements exist for all the tools in the string module. The hold-up has been in making UniCode equivalents. If this is still wanted, I'll finish it up.
Raymond Hettinger
Here is my todo list for Py2.3. Feel free to comment on whether it is too late to pursue these or whether I should continue to work on them for the second beta.
1) PEP 42 lists a request to add timeout settings to the higher level net libraries. Should this still be done? In Py2.3, sockets offers a setdefaulttimeout() function that provides an indirect way of meeting the same goal.
IMO this is API design that should be done without the time pressure of a beta. I'd like you to experiment a bit with the setdefaulttimeout() approach to see if it is workable though.
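For readers who want to try the setdefaulttimeout() approach discussed here, a minimal sketch follows. It assumes only the standard socket module; the 5-second value is an arbitrary illustration, and no network connection is made.

```python
import socket

# Remember the current process-wide default so it can be restored.
previous = socket.getdefaulttimeout()

# Every socket created from now on starts with this timeout (in seconds),
# including sockets created internally by higher-level network libraries.
socket.setdefaulttimeout(5.0)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
new_socket_timeout = sock.gettimeout()
sock.close()

# Restore the earlier default; None means "block forever".
socket.setdefaulttimeout(previous)
```

The setting is global to the process, which is why it only meets the PEP 42 request indirectly: a library cannot choose its own timeout this way without affecting every other socket user.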
2) Jack Diedrich is working on two patches for itertools:
itertools.window(iterable, n=2) --> (a0, a1), (a1, a2), (a2, a3), ...
itertools.roundrobin(*iterables) which loops over the iterables returning one element from each and then cycles back to the first until all of the iterables are exhausted: itertools.roundrobin('ab', 'cde') --> a, c, b, d, e
Both functions were discussed on comp.lang.python and have been requested by multiple users. Neither is easily implemented in terms of the existing tools. OTOH, the more tools you add, the harder it is to comprehend the toolset as a whole.
itertools is new and yours; if you're comfortable with this, I'm okay with it.
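To make the two proposed tools concrete, here is a rough pure-Python sketch of their described behavior. It is not Jack's patch: it is written in present-day Python (generators, collections.deque with maxlen), the function bodies are my own assumption of the semantics quoted above, and window() is assumed to be called with n >= 1.

```python
from collections import deque
from itertools import islice


def window(iterable, n=2):
    """Yield overlapping n-tuples: (a0, a1), (a1, a2), (a2, a3), ..."""
    it = iter(iterable)
    buf = deque(islice(it, n), maxlen=n)
    if len(buf) == n:
        yield tuple(buf)
    for item in it:
        buf.append(item)  # maxlen silently drops the oldest element
        yield tuple(buf)


def roundrobin(*iterables):
    """Take one element from each iterable in turn until all are exhausted."""
    pending = deque(iter(it) for it in iterables)
    while pending:
        current = pending.popleft()
        try:
            value = next(current)
        except StopIteration:
            continue  # exhausted: do not put it back in the rotation
        pending.append(current)
        yield value
```

With these definitions, list(roundrobin('ab', 'cde')) gives ['a', 'c', 'b', 'd', 'e'], matching the example in the proposal.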
3) difflib now has functions to create a context diff or unified diff. A natural next step is to add a patch() function that applies the diff and finishes the roundtrip. It also helps fulfill the original reason for adding the context/unified diffs which was to make it easier for general python users to either create or apply patches.
Interesting. My own need is not for patching but for a three-way merge. If you could add a direct API for that, it'd be great. difflib.merge(mine, old, yours) -> new. With conflict markers a la diff3 -m -E.
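As background for the roundtrip being discussed, the sketch below shows what difflib already offers. difflib has no patch() or merge() function; the closest existing roundtrip is ndiff() plus restore(), which rebuilds either input from an ndiff-style delta rather than from a unified or context diff. The file names and sample lines are made up for illustration.

```python
import difflib

old = ["one\n", "two\n", "three\n"]
new = ["one\n", "2\n", "three\n", "four\n"]

# Creating a unified diff is supported directly.
unified = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))

# Applying a diff is only possible for ndiff-style deltas:
# restore(delta, 1) recovers the first sequence, restore(delta, 2) the second.
delta = list(difflib.ndiff(old, new))
rebuilt_old = list(difflib.restore(delta, 1))
rebuilt_new = list(difflib.restore(delta, 2))
```

A patch() for unified diffs, or the three-way merge asked for above, would have to parse hunk headers and cope with context that no longer matches, which restore() never needs to do.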
4) I've had a long outstanding patch to add methods like isalpha() to string objects. The goal was to make sure that replacements exist for all the tools in the string module. The hold-up has been in making UniCode equivalents. If this is still wanted, I'll finish it up.
I'm +0 on that. --Guido van Rossum (home page: http://www.python.org/~guido/)
"Raymond Hettinger" <raymond.hettinger@verizon.net> writes:
1) PEP 42 lists a request to add timeout settings to the higher level net libraries. Should this still be done?
No. It shouldn't be done wholesale, anyway, but with one protocol at a time. Patches should get review, as this is tricky stuff.
2) Jack Diedrich is working on two patches for itertools:
itertools.window(iterable, n=2) --> (a0, a1), (a1, a2), (a2, a3), ...
itertools.roundrobin(*iterables) which loops over the iterables returning one element from each and then cycles back to the first until all of the iterables are exhausted: itertools.roundrobin('ab', 'cde') --> a, c, b, d, e
We are approaching *beta* *2*. No new features, please. I find it much more important to get the release published than enhancing the features. Anybody with spare time on his hands should look into the ever-growing backlog of bug reports. Some of the bugs are really bad, much worse than the features would do good. For example, to do something really useful, get rid of the recursion in SRE (although this might be beyond the scope of a beta 2 as well).
3) difflib now has functions to create a context diff or unified diff. A natural next step is to add a patch() function that applies the diff and finishes the roundtrip.
No additions.
4) I've had a long outstanding patch to add methods like isalpha() to string objects. The goal was to make sure that replacements exist for all the tools in the string module. The hold-up has been in making UniCode equivalents. If this is still wanted, I'll finish it up.
It is still wanted, and I'd encourage you to come up with a patch. However, getting that patch into 2.4 might still be early enough, and you might have to start today to finish it before the first beta of 2.4. Regards, Martin
For example, to do something really useful, get rid of the recursion in SRE (although this might be beyond the scope of a beta 2 as well).
I've just done so! The recursion limit is gone. And the interesting part is, there's less code, and the logic is a lot simpler and more obvious. My tests have shown that there was no slow down, and indeed some cases are faster. All regression tests succeed (besides those that checked the recursion limit), and, of course, these work:
import re
re.match("^(?:x)*$", 5000000*"x")
<_sre.SRE_Match object at 0x300eb258>
re.match("^(?:x)*?$", 5000000*"x")
<_sre.SRE_Match object at 0x300eb5c8>
Now, for the question... <jumping frenetically> Can I commit!? Can I commit!? </> :-) -- Gustavo Niemeyer http://niemeyer.net
>> For example, to do something really useful, get rid of the recursion
>> in SRE (although this might be beyond the scope of a beta 2 as well).
Gustavo> I've just done so! The recursion limit is gone.
Way to go, Gustavo! I suspect we all owe you a beer (hic!). ;-)
Gustavo> Now, for the question...
Gustavo> <jumping frenetically> Can I commit!? Can I commit!? </>
If we weren't already at b1 I'd say go for it. With the expectation of only one more beta though, I'm not sure. I'd like to see a third beta or something else which would ensure it gets plenty of testing.
Perhaps you could build a test coverage version of the interpreter using gcc, run just the re regression tests and see if there are some lines of the changed code which are not being properly exercised. That would suggest where more tests are warranted.
Skip
Way to go, Gustavo! I suspect we all owe you a beer (hic!). ;-)
Thank you!! I'll mention this the next time we meet. 8-)~
If we weren't already at b1 I'd say go for it. With the expectation of only one more beta though, I'm not sure. I'd like to see a third beta or something else which would ensure it gets plenty of testing.
Yes, another beta would be nice. Anyway, I'm open to whatever the BDFL, or the BDFLT(eam) decides. :-)
Perhaps you could build a test coverage version of the interpreter using gcc, run just the re regression tests and see if there are some lines of the changed code which are not being properly exercised. That would suggest where more tests are warranted.
Thanks for the suggestion! I've just done that, and committed the new version to CVS. I was able to get 82% coverage, going by gcov's output:
82.07% of 1372 source lines executed in file ./Modules/_sre.c
This observation made me notice some interesting facts as well, like a few unreachable places that are not used by the current parser/compiler. These can probably be removed in the future.
Another thing I got in the process is the following regexp:
re.match("(?=a)*", "a")
This will blow up the recursion limit in the current implementation, and will consume all memory in the new implementation. IMO, this should raise an error during the parsing process, as it makes no sense to repeat a non-consuming group.
-- Gustavo Niemeyer http://niemeyer.net
Thanks for the suggestion! I've just done that, and committed the new version to CVS. I was able to get 82% coverage, going by gcov's output:
82.07% of 1372 source lines executed in file ./Modules/_sre.c
FYI, I took some time to write an overview of the procedure in my (wiki|web)log: https://moin.conectiva.com.br/GustavoNiemeyer self-promoting-ly y'rs -- Gustavo Niemeyer http://niemeyer.net
Gustavo Niemeyer wrote:
Thanks for the suggestion! I've just done that, and committed the new version to CVS. I was able to get 82% coverage, going by gcov's output:
82.07% of 1372 source lines executed in file ./Modules/_sre.c
FYI, I took some time to write an overview of the procedure in my (wiki|web)log:
Of course you can always patch the Makefile and add "-fprofile-arcs -ftest-coverage" to the OPT variable. The best solution would be to add a new configure option --with-coverage. BTW, I'm currently working on a new coverage tool. The major difference to Skip's tools is that the coverage information is imported into a database, so coverage information can be tracked over time. Bye, Walter Dörwald
<jumping frenetically> Can I commit!? Can I commit!? </>
It would be good if your fix was reviewed. Can you post it on SF?
Sure, once the code is completely ready. I'm still retouching the code. Anyway, without any intention of pretending to be a smart ass, I don't think anyone will be able to test and review it any more than I did. I've been working on this code, and understanding how it works, for quite some time now. If we do want this for 2.3, we should get this in ASAP, since the best way to test it now is the real world. And, of course, revision control is always your friend. :-) -- Gustavo Niemeyer http://niemeyer.net
On Thu, 19 Jun 2003, Gustavo Niemeyer wrote:
<jumping frenetically> Can I commit!? Can I commit!? </>
It would be good if your fix was reviewed. Can you post it on SF?
Sure, once the code is completely ready. I'm still retouching the code.
Anyway, without any intention of pretending to be a smart ass, I don't think anyone will be able to test and review it any more than I did. I've been working on this code, and understanding how it works, for quite some time now. If we do want this for 2.3, we should get this in ASAP, since the best way to test it now is the real world. And, of course, revision control is always your friend. :-)
As I seem to have assumed responsibility for keeping Python buildable on FreeBSD, which also seems to be the platform most sensitive to the recursion issue, I'd very much like to test your patch! -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac@pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia
<jumping frenetically> Can I commit!? Can I commit!? </>
FWIW, there's a preview version of the patch in SF #757624. -- Gustavo Niemeyer http://niemeyer.net
[Gustavo Niemeyer]
I've just done so! The recursion limit is gone. ...
Cool! Very cool. Check it in. Note that the primary effect will be to change complaints about the recursion limit to complaints about "infinite loops in sre" <0.4 wink>. BTW, it would be best to get Fredrik Lundh's blessing for this, since sre is still "his" package. I'm copying him on this. Some version of Gustavo's patch is here: http://www.python.org/sf/757624
I've just done so! The recursion limit is gone. ...
Cool! Very cool. Check it in. Note that the primary effect will be
Thanks! I'll do a few minor retouches, and check it in.
to change complaints about the recursion limit to complaints about "infinite loops in sre" <0.4 wink>.
That's not true! They'll be about memory blowing up. ;-)) More seriously, the real infinite loops (like the one I've shown in the other mail) should be prevented while parsing or compiling (which is hopefully not hard), and the cases where the user matches against a *really* huge string.. well.. 1000**1000**1000 is easier. :-)
BTW, it would be best to get Fredrik Lundh's blessing for this, since sre is still "his" package. I'm copying him on this. Some version of Gustavo's patch is here:
Given his wonderful work on SRE, I'd really like to know his opinion about it. OTOH, I must confess I have a bad experience waiting for Fredrik's opinion. :-) Hopefully, he will answer so quickly I'll feel ashamed for even mentioning this. -- Gustavo Niemeyer http://niemeyer.net
tim wrote:
BTW, it would be best to get Fredrik Lundh's blessing for this, since sre is still "his" package. I'm copying him on this.
on midsummer's eve? I'm supposed to be eating pickled herring and drinking schnapps, not trying to decipher C code...
Some version of Gustavo's patch is here:
looking at the patch, I'm 95% confident that it's the right thing (or close enough to the right thing ;-) but reading the unified patch is not exactly trivial; a brief prose description of the new mechanism would be nice.
have you benchmarked this on "real-world" examples, and on more than one platform? before-and-after figures for xmllib/tokenize on large source files would be a good indication of the performance impact (if any).
(and to be slightly nitpicking, I think it's good style to keep the alphabetical order when adding stuff to lists that are already in alphabetical order, unless you have really good reasons not to...)
</F>
on midsummer's eve? I'm supposed to be eating pickled herring and drinking schnapps, not trying to decipher C code...
I'm glad I was wrong.. :-)
Some version of Gustavo's patch is here:
looking at the patch, I'm 95% confident that it's the right thing (or close enough to the right thing ;-)
Cool! :-)
but reading the unified patch is not exactly trivial; a brief prose
Indeed. It looks awful. :-)
description of the new mechanism would be nice.
Basically, MAX_REPEAT and MIN_REPEAT were changed from
<REPEAT> <skip> <min> <max> ... <(MAX|MIN)_UNTIL> ...
to
<(MAX|MIN)_REPEAT> <skip> <min> <max> ... <SUCCESS> ...
and all logic was moved from (MAX|MIN)_UNTIL to (MAX|MIN)_REPEAT.
In the implementation, the main change was turning mark_stack into a generic data_stack, and using it to push the state of each iteration from MAX_REPEAT, so that it can pop them out while backtracking. Another way to implement this was to simply test tail-matching forwards, but while this would save memory and be easier to implement, it'd certainly affect performance.
MIN_REPEAT was quite straightforward, as it tests tail-matching forwards, thus no state saving is necessary.
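The idea of pushing per-iteration state on a data stack and popping it while backtracking can be illustrated with a toy model. This is not the _sre.c code and does not use its opcodes: it is a hypothetical Python function, limited to literal strings, that matches the equivalent of ^(?:item)*tail$ without any recursion.

```python
def match_greedy_repeat(item, tail, text):
    """Toy model of ^(?:item)*tail$ for literal strings, without recursion.

    Returns the offset where the repeat stopped, or None if nothing matches.
    """
    if not item:
        # Repeating something that consumes nothing would never terminate.
        raise ValueError("cannot repeat an empty item")

    # Greedy phase: record the position after every successful iteration.
    data_stack = [0]
    pos = 0
    while text.startswith(item, pos):
        pos += len(item)
        data_stack.append(pos)

    # Backtracking phase: pop saved positions, longest repeat first,
    # until the rest of the text equals the tail.
    while data_stack:
        pos = data_stack.pop()
        if text[pos:] == tail:
            return pos
    return None
```

Memory grows with the number of iterations instead of the C call stack growing with it, which is the trade-off described above. The explicit check for an empty item mirrors the concern about repeating a non-consuming group.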
have you benchmarked this on "real-world" examples, and on more than one platform? before-and-after figures for xmllib/tokenize on large source files would be a good indication on the performance impact (if any).
No, I haven't benchmarked it with real programs. OTOH, I've done tests with large streams which explored the worst case of both algorithms, and they have shown no negative impact; indeed, it's faster in some situations. For example, I could remove the MIN_REPEAT_ONE opcode, since the new generic MIN_REPEAT implementation is as fast as the old specific implementation. I'm also checking if it is possible to move some of the intelligence in MAX_REPEAT_ONE to MAX_REPEAT. Of course, I'll be thankful for any further benchmarks done on this code.
(and to be slightly nitpicking, I think it's good style to keep the alphabetical order when adding stuff to lists that are already in alphabetical order, unless you have really good reasons not to...)
I'm sorry. I thought it was ordered based on logical proximity, but I should have looked more carefully. Thanks for reviewing it! -- Gustavo Niemeyer http://niemeyer.net
implementation. I'm also checking if it is possible to move some of the intelligence in MAX_REPEAT_ONE to MAX_REPEAT.
Good news and bad news. I was able to greatly improve some cases using the principles of MAX_REPEAT_ONE, but I also found a problem in the current implementation. I'll be working on it. -- Gustavo Niemeyer http://niemeyer.net
On Fri, 20 Jun 2003, Gustavo Niemeyer wrote:
implementation. I'm also checking if it is possible to move some of the intelligence in MAX_REPEAT_ONE to MAX_REPEAT.
Good news and bad news. I was able to greatly improve some cases using the principles of MAX_REPEAT_ONE, but I also found a problem in the current implementation. I'll be working on it.
Not sure whether this refers to the second version of the patch. I've tried this (2nd version of patch) on both FreeBSD 4.8 and 5.1. gcc 2.95 (4.8) barfs on the code, but gcc 3.2.2 (5.1) is ok - attached patch shuts 2.95 up. Unfortunately, in both cases test_pyclbr goes off with the pixies with the patch applied. Interestingly, interrupting with a Ctrl-C produces a segmentation fault, and the gdb backtrace is the same for both platforms (4.8/5.1). If the bt is useful, I can send it direct. Don't yet know what other tests might be misbehaving, having only gotten as far as test_pyclbr. Regards, Andrew. -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac@pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia
Not sure whether this refers to the second version of the patch. [...]
I've just submitted a new patch to SF (version 3) which addresses these problems. Can you please repeat the test? Thank you very much! -- Gustavo Niemeyer http://niemeyer.net
On Sun, 22 Jun 2003, Gustavo Niemeyer wrote:
Not sure whether this refers to the second version of the patch. [...]
I've just submitted a new patch to SF (version 3) which addresses these problems. Can you please repeat the test?
With v3 of the patch, the interpreter survives the full regression test on both FreeBSD 4.8 (gcc 2.95.3) & FreeBSD 5.1 (gcc 3.2.2). Just for completeness, I tested with USE_RECURSION_LIMIT reverted to 10000 on both systems. I didn't monitor performance, other than to note that there was no gross slowdown... -- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac@bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac@pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia
Raymond Hettinger wrote:
Here is my todo list for Py2.3. 4) I've had a long outstanding patch to add methods like isalpha() to string objects. The goal was to make sure that replacements exist for all the tools in the string module. The hold-up has been in making UniCode equivalents. If this is still wanted, I'll finish it up.
Hmm, Unicode objects already have these methods...
I'd like to add another TODO to the list:
5) Add functions sys.setdefaultsourceencoding() and sys.getdefaultsourceencoding() which allow setting and querying the Python compiler's assumption about the default source code encoding (currently ASCII) much in the same way as sys.set/getdefaultencoding() work for the internal string encoding assumption.
Just like the latter, sys.setdefaultsourceencoding() should only be usable in site.py and get deleted from the sys module in the same way after completed execution of site.py.
I probably won't have time to write a patch for this, so volunteers are most welcome. If you have questions, feel free to ask.
Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Jun 21 2003)
Python/Zope Products & Consulting ... http://www.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
EuroPython 2003, Charleroi, Belgium: 3 days left
"M.-A. Lemburg" <mal@lemburg.com> writes:
4) I've had a long outstanding patch to add methods like isalpha() to string objects. The goal was to make sure that replacements exist for all the tools in the string module. The hold-up has been in making UniCode equivalents. If this is still wanted, I'll finish it up.
Hmm, Unicode objects already have these methods...
I think Raymond is talking about methods *like* isalpha :-) See patch 562501, he wants ispunct, isgraph, isprint, isctrl, isxdigit, i.e. all ctype.h macros. Unicode objects don't support these, and the ctype.h macros aren't available for them.
I'd like to add another TODO to the list:
5) Add functions sys.setdefaultsourceencoding() and sys.getdefaultsourceencoding() which allow setting and querying the Python compiler's assumption about the default source code encoding (currently ASCII) much in the same way as sys.set/getdefaultencoding() work for the internal string encoding assumption.
This should not go into 2.3 (just as the other features shouldn't go there). I personally envision that this is solved in a different way: Instead of having a global setting, there should be a context-specific one, for use in eval/exec, and interactive mode. The "default" encoding for source code files should not be settable, and should remain at ASCII forever. Regards, Martin
Martin v. Löwis wrote:
"M.-A. Lemburg" <mal@lemburg.com> writes:
4) I've had a long outstanding patch to add methods like isalpha() to string objects. The goal was to make sure that replacements exist for all the tools in the string module. The hold-up has been in making UniCode equivalents. If this is still wanted, I'll finish it up.
Hmm, Unicode objects already have these methods...
I think Raymond is talking about methods *like* isalpha :-) See patch 562501, he wants ispunct, isgraph, isprint, isctrl, isxdigit, i.e. all ctype.h macros. Unicode objects don't support these, and the ctype.h macros aren't available for them.
I see.
I'd like to add another TODO to the list:
5) Add functions sys.setdefaultsourceencoding() and sys.getdefaultsourceencoding() which allow setting and querying the Python compiler's assumption about the default source code encoding (currently ASCII) much in the same way as sys.set/getdefaultencoding() work for the internal string encoding assumption.
This should not go into 2.3 (just as the other features shouldn't go there).
It was planned for 2.3 several months ago. The fact that it isn't in there yet is mostly my fault: I didn't have time to cook up a patch and forgot to ask here for other volunteers.
I personally envision that this is solved in a different way: Instead of having a global setting, there should be a context-specific one, for use in eval/exec, and interactive mode. The "default" encoding for source code files should not be settable, and should remain at ASCII forever.
This feature is needed to calm down concerns of non-ASCII Python users who want to customize Python to better suit their needs for both educational and production use purposes. The same disclaimers as for the sys.setdefaultencoding() pair of APIs apply to these too. The implementation should follow the same path that the default encoding is using (storing it in the interpreter state instead of a global). -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Jun 21 2003)
"M.-A. Lemburg" <mal@lemburg.com> writes:
It was planned for 2.3 several months ago. The fact that it isn't in there yet is mostly my fault: I didn't have time to cook up a patch and forgot to ask here for other volunteers.
You mean, this was your plan? I am not aware of such a plan, and it is not part of the approved PEP 263. I would strongly object to such a change.
This feature is needed to calm down concerns of non-ASCII Python users who want to customize Python to better suit their needs for both educational and production use purposes.
People who want such a feature will have to fork Python. However, most users will accept to put encoding declarations into their source code. They will curse, and then they will get over it. Regards, Martin
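For reference, the per-file encoding declaration from PEP 263 that this exchange revolves around looks like the first line of the source below. The sketch is run under Python 3 semantics, which is an assumption that differs from the 2.3 setting being discussed: strings are Unicode and the default source encoding is UTF-8 rather than ASCII. The variable names are illustrative.

```python
# Source bytes as they would sit on disk: Latin-1 encoded, with a
# PEP 263 declaration on the first line.
source = "# -*- coding: latin-1 -*-\ngreeting = 'caf\u00e9'\n".encode("latin-1")

namespace = {}
# compile() honors the declaration when it is handed bytes.
exec(compile(source, "<example>", "exec"), namespace)
greeting = namespace["greeting"]
```

The declaration travels with the file, so the result does not depend on any interpreter-wide setting, which is the property argued for above.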
Martin v. Löwis wrote:
"M.-A. Lemburg" <mal@lemburg.com> writes:
It was planned for 2.3 several months ago. The fact that it isn't in there yet is mostly my fault: I didn't have time to cook up a patch and forgot to ask here for other volunteers.
You mean, this was your plan?
Partly, yes. Guido and I decided to add this feature in private discussions with Python users who were strongly opposed to the PEP 263 way of forcing the ASCII encoding onto existing Python source code.
I am not aware of such a plan, and it is not part of the approved PEP 263. I would strongly object to such a change.
Why is that ? The proposed APIs will work just like their counterparts for the internal Unicode/string conversion which have proven to quiet down discussions about choosing ASCII as default encoding. I expect the same to happen for the Python source code encoding default.
This feature is needed to calm down concerns of non-ASCII Python users who want to customize Python to better suit their needs for both educational and production use purposes.
People who want such a feature will have to fork Python. However, most users will accept to put encoding declarations into their source code. They will curse, and then they will get over it.
No need to fork Python :-) They can customize their site.py settings to their liking; of course, they will also have to live with the consequences, just as the users who tweak the default encoding of the interpreter. "Practicality beats purity." -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Jun 21 2003)
"M.-A. Lemburg" <mal@lemburg.com> writes:
I am not aware of such a plan, and it is not part of the approved PEP 263. I would strongly object to such a change.
Why is that ? The proposed APIs will work just like their counterparts for the internal Unicode/string conversion which have proven to quiet down discussions about choosing ASCII as default encoding.
But have they done any good? I don't consider quieting down of discussions a good thing per se.
"Practicality beats purity."
That is, unfortunately, convincing. I'll certainly bow to BDFL pronouncement, but I don't have to like this feature. So I withdraw my observation that this would be out of scope for the next beta. I'll hope that nobody volunteers to implement it, anyway :-) Any potential implementer, please find a way to integrate this with IDLE: In absence of a declared source encoding, IDLE should then probably assume that source files are in the system source encoding. Regards, Martin
Martin v. Löwis wrote:
"M.-A. Lemburg" <mal@lemburg.com> writes:
I am not aware of such a plan, and it is not part of the approved PEP 263. I would strongly object to such a change.
Why is that ? The proposed APIs will work just like their counterparts for the internal Unicode/string conversion which have proven to quiet down discussions about choosing ASCII as default encoding.
But have they done any good? I don't consider quieting down of discussions a good thing per se.
Providing more options often helps in finding compromises. Not that I like any of these APIs or that I have ever used them, but if they make people happy, I don't mind exposing them.
"Practicality beats purity."
That is, unfortunately, convincing. I'll certainly bow to BDFL pronouncement, but I don't have to like this feature.
No question about that :-)
So I withdraw my observation that this would be out of scope for the next beta. I'll hope that nobody volunteers to implement it, anyway :-) Any potential implementer, please find a way to integrate this with IDLE: In absence of a declared source encoding, IDLE should then probably assume that source files are in the system source encoding.
I think that we can safely leave providing patches for this to the people who will make use of the feature :-) -- Marc-Andre Lemburg eGenix.com Professional Python Software directly from the Source (#1, Jun 22 2003)
"M.-A. Lemburg" <mal@lemburg.com> writes:
I am not aware of such a plan, and it is not part of the approved PEP 263. I would strongly object to such a change.
Why is that ? The proposed APIs will work just like their counterparts for the internal Unicode/string conversion which have proven to quiet down discussions about choosing ASCII as default encoding.
[MvL]
But have they done any good? I don't consider quieting down of discussions a good thing per se.
"Practicality beats purity."
That is, unfortunately, convincing. I'll certainly bow to BDFL pronouncement, but I don't have to like this feature.
So I withdraw my observation that this would be out of scope for the next beta. I'll hope that nobody volunteers to implement it, anyway :-) Any potential implementer, please find a way to integrate this with IDLE: In absence of a declared source encoding, IDLE should then probably assume that source files are in the system source encoding.
Let's discuss this at EuroPython. We're all (MvL, MAL, me) going to be there, right? --Guido van Rossum (home page: http://www.python.org/~guido/)
"M.-A. Lemburg" <mal@lemburg.com> writes:
Why is that ? The proposed APIs will work just like their counterparts for the internal Unicode/string conversion which have proven to quiet down discussions about choosing ASCII as default encoding. I expect the same to happen for the Python source code encoding default.
It just occurred to me that these people can put
import warnings
warnings.filterwarnings("ignore", ".*pep-0263", DeprecationWarning)
into site.py to achieve nearly the same effect that they would get with sys.setdefaultsourceencoding().
People are probably concerned about the flood of warnings, and they want to silence them, so that everything continues to work as it did before. Ignoring those warnings appears to be the right solution, then.
If they want to make use of the new features (i.e. non-ASCII in Unicode literals), they still need to put an encoding declaration into the file. However they are probably willing to do that, as they are editing the file, anyway.
Regards, Martin
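A small sketch of how such a filter behaves, for anyone who wants to check it before editing site.py. The helper count_visible() and the sample messages are hypothetical; only the warnings module calls are real. The message argument is a regular expression matched against the start of the warning text, which is why the pattern begins with ".*".

```python
import warnings


def count_visible(messages):
    """Emit each text as a DeprecationWarning; return how many get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Inserted in front of the "always" filter, so it is consulted first.
        warnings.filterwarnings("ignore", ".*pep-0263", DeprecationWarning)
        for text in messages:
            warnings.warn(text, DeprecationWarning)
        return len(caught)
```

Only warnings whose text mentions pep-0263 are dropped; other deprecation warnings are still reported, so the filter is narrower than switching DeprecationWarning off entirely.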
On Monday, Jun 23, 2003, at 08:18 Europe/Amsterdam, Martin v. Löwis wrote:
"M.-A. Lemburg" <mal@lemburg.com> writes:
Why is that ? The proposed APIs will work just like their counterparts for the internal Unicode/string conversion which have proven to quiet down discussions about choosing ASCII as default encoding. I expect the same to happen for the Python source code encoding default.
It just occurred to me that these people can put
import warnings
warnings.filterwarnings("ignore", ".*pep-0263", DeprecationWarning)
into site.py to achieve nearly the same effect that they would get with sys.setdefaultsourceencoding().
It would silence the warnings, but I would guess that if you actually processed the file (for instance, open it in Idle) you would see strange characters, no? -- Jack Jansen, <Jack.Jansen@cwi.nl>, http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman
Martin v. Löwis wrote:
"M.-A. Lemburg" <mal@lemburg.com> writes:
Why is that ? The proposed APIs will work just like their counterparts for the internal Unicode/string conversion which have proven to quiet down discussions about choosing ASCII as default encoding. I expect the same to happen for the Python source code encoding default.
It just occurred to me that these people can put
import warnings
warnings.filterwarnings("ignore", ".*pep-0263", DeprecationWarning)
into site.py to achieve nearly the same effect that they would get with sys.setsourceencoding.
I know, but that trick only works in Python 2.3. In Python 2.4 they would get a SyntaxError and their scripts would simply fail to load.
People are probably concerned about the flood of warnings, and they want to silence them, so that everything continues to work as it did before. Ignoring those warnings appears to be the right solution, then.
If they are only concerned about the warnings in 2.3, yes,...
If they want to make use of the new features (i.e. non-ASCII in Unicode literals), they still need to put an encoding declaration into the file. However they are probably willing to do that, as they are editing the file, anyway.
... but some of them are also worried about raising the bar teaching Python to newbies. They don't want to start the Python course explaining advanced features like source code encodings.
--
Marc-Andre Lemburg
eGenix.com Professional Python Software directly from the Source (#1, Jun 23 2003)
Python/Zope Products & Consulting ... http://www.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
EuroPython 2003, Charleroi, Belgium: one day left
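For reference, the encoding declaration discussed in this exchange is a single comment line at the top of the file. The sketch below is an illustration written for a current Python 3 interpreter (where plain string literals are already Unicode), not the 2.3 behaviour itself; the variable names and the sample literal are assumptions chosen for the example.

```python
# Source bytes as an editor using Latin-1 would store them on disk.
# The first line is the PEP 263 declaration; byte 0xF6 is "o with umlaut".
source = b'# -*- coding: iso-8859-1 -*-\nname = "L\xf6wis"\n'

# compile() accepts bytes and honours the declared encoding when decoding.
namespace = {}
exec(compile(source, "<declared-example>", "exec"), namespace)

# The literal is decoded according to the declaration.
decoded_name = namespace["name"]
print(decoded_name == "L\u00f6wis")
```

The declaration is the only addition a student would have to copy into the file; the rest of the source stays as typed.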
"M.-A. Lemburg" <mal@lemburg.com> writes:
... but some of them are also worried about raising the bar teaching Python to newbies. They don't want to start the Python course explaining advanced features like source code encodings.
They don't have to. All they have to do is to arrange IDLE so that it always stores files as UTF-8 with BOM, and then everything will work out just fine, with no need of teaching. Regards, Martin
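A rough sketch of what "UTF-8 with BOM" means for the compiler, again assuming a current Python 3 interpreter: the three-byte signature at the start of the file identifies the encoding, so no declaration line is needed. The sample source text and variable names are made up for illustration.

```python
import codecs
import io
import tokenize

# What an editor would write when saving as "UTF-8 with BOM":
# the three signature bytes followed by the UTF-8 encoded source.
source_text = 'greeting = "Gr\u00fc\u00df Gott"\n'
data = codecs.BOM_UTF8 + source_text.encode("utf-8")

# The tokenizer recognises the signature without any coding comment.
encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)

# The compiler accepts the same bytes and decodes the literal correctly.
namespace = {}
exec(compile(data, "<bom-example>", "exec"), namespace)

print(encoding)
print(namespace["greeting"] == "Gr\u00fc\u00df Gott")
```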
Martin v. Löwis wrote:
"M.-A. Lemburg" <mal@lemburg.com> writes:
... but some of them are also worried about raising the bar teaching Python to newbies. They don't want to start the Python course explaining advanced features like source code encodings.
They don't have to. All they have to do is to arrange IDLE so that it always stores files as UTF-8 with BOM, and then everything will work out just fine, with no need of teaching.
That's a good hint, but it only works with IDLE and Notepad, doesn't it ? Other editors or Python IDEs will probably need some more time to grow such a feature.
--
Marc-Andre Lemburg
eGenix.com Professional Python Software directly from the Source (#1, Jun 24 2003)
"M.-A. Lemburg" <mal@lemburg.com> writes:
That's a good hint, but it only works with IDLE and Notepad, doesn't it ? Other editors or Python IDEs will probably need some more time to grow such a feature.
Correct. However, which other editors are commonly used to teach Python? Regards, Martin
Martin v. Löwis wrote:
"M.-A. Lemburg" <mal@lemburg.com> writes:
That's a good hint, but it only works with IDLE and Notepad, doesn't it ? Other editors or Python IDEs will probably need some more time to grow such a feature.
Correct. However, which other editors are commonly used to teach Python?
We have educational partners who use Komodo. (That said, Komodo should be able to do the right thing regardless). I'm sure many people use Emacs to teach Python too -- I used to in some circles =). --david
David Ascher <DavidA@ActiveState.com> writes:
We have educational partners who use Komodo. (That said, Komodo should be able to do the right thing regardless).
I'm sure many people use Emacs to teach Python too -- I used to in some circles =).
Thanks for the data. I'll try to find out what is needed to make them work with Python source encodings transparently (for Emacs, it will be tough, as it does not support UTF-8 signatures all that well). Regards, Martin
On Sat, Jun 21, 2003, M.-A. Lemburg wrote:
Martin v. Löwis wrote:
"M.-A. Lemburg" <mal@lemburg.com> writes:
5) Add functions sys.setdefaultsourceencoding() and sys.getdefaultsourceencoding() which allow setting and querying the Python compiler's assumption about the default source code encoding (currently ASCII) much in the same way as sys.set/getdefaultencoding() work for the internal string encoding assumption.
This should not go into 2.3 (just as the other features shouldn't go there).
It was planned for 2.3 several months ago. The fact that it isn't in there yet is mostly my fault: I didn't have time to cook up a patch and forgot to ask here for other volunteers.
-1 unless Guido puts forth a Pronouncement that there will be two more betas instead of just one.

-0 on a function that imports a specific module with a specific encoding.

While I understand the pain, I really don't want to break what "beta" means. I think that no patch should be accepted unless Guido explicitly Pronounces that it's okay this late.
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/
Usenet is not a democracy. It is a weird cross between an anarchy and a dictatorship.
-1 unless Guido puts forth a Pronouncement that there will be two more betas instead of just one.
I'm pressed for time and we're late. I'd rather get it out than do another beta. I think Martin might be able to take over deciding on what can go into the next beta and what not. There's always 2.3.1. I'll discuss this with Martin at EuroPython in a few days.
--Guido van Rossum (home page: http://www.python.org/~guido/)
participants (14)
- Aahz
- Andrew MacIntyre
- David Ascher
- Fredrik Lundh
- Guido van Rossum
- Gustavo Niemeyer
- Jack Jansen
- M.-A. Lemburg
- martin@v.loewis.de
- Neil Schemenauer
- Raymond Hettinger
- Skip Montanaro
- Tim Peters
- Walter Dörwald