Inconsistent script/console behaviour
Currently if you work in the console and define a function and then immediately call it, it will fail with a SyntaxError. For example, copy and paste this completely valid Python script into the console:

```
def some():
    print "XXX"
some()
```

There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here who agree that if you paste a valid Python script into the console, it should work without changes. -- anatoly t.
On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik <techtonik@gmail.com> wrote:
Currently if you work in the console and define a function and then immediately call it, it will fail with a SyntaxError. For example, copy and paste this completely valid Python script into the console:

```
def some():
    print "XXX"
some()
```

There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here who agree that if you paste a valid Python script into the console, it should work without changes.
You can't fix this without completely changing the way the interactive console treats blank lines. Note that it's not just that a blank line is required after a function definition -- you also *can't* have a blank line *inside* a function definition. The interactive console is optimized for people entering code by typing, not by copying and pasting large gobs of text. If you think you can have it both ways, show us the code. -- --Guido van Rossum (python.org/~guido)
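For readers tracing the mechanics: the console buffers input and, after every line, asks `codeop.compile_command()` whether the buffer is a complete statement; `None` means "keep reading". A minimal sketch replaying the pasted script (the Python 2 `print` is spelled as a function here so the snippet runs today):

```python
import codeop

buffer = []
for line in ['def some():', "    print('XXX')", 'some()']:
    buffer.append(line)
    source = '\n'.join(buffer)
    try:
        code = codeop.compile_command(source)
    except SyntaxError as exc:
        # The dedented some() lands inside the still-open def block,
        # which is exactly the error the original post reports.
        print('SyntaxError:', exc)
        break
    print(repr(line), '->', 'complete' if code else 'still incomplete')
```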
On 24.09.2011 01:32, Guido van Rossum wrote:
On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik <techtonik@gmail.com> wrote:
Currently if you work in the console and define a function and then immediately call it, it will fail with a SyntaxError. For example, copy and paste this completely valid Python script into the console:

```
def some():
    print "XXX"
some()
```

There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here who agree that if you paste a valid Python script into the console, it should work without changes.

You can't fix this without completely changing the way the interactive console treats blank lines. Note that it's not just that a blank line is required after a function definition -- you also *can't* have a blank line *inside* a function definition.
While the former could be changed (I think), the latter certainly cannot. So it's probably not worth changing established behavior. Georg
Could you elaborate on what would be wrong if function definitions ended only after an explicitly less indented line? The only problem that comes to mind is global scope "if" statements that wouldn't execute when expected (we actually might need to terminate them with a dedented "pass"). On Sep 24, 2011 4:26 AM, "Georg Brandl" <g.brandl@gmx.net> wrote:
On 24.09.2011 01:32, Guido van Rossum wrote:
On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik <techtonik@gmail.com> wrote:
Currently if you work in the console and define a function and then immediately call it, it will fail with a SyntaxError. For example, copy and paste this completely valid Python script into the console:

```
def some():
    print "XXX"
some()
```

There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here who agree that if you paste a valid Python script into the console, it should work without changes.

You can't fix this without completely changing the way the interactive console treats blank lines. Note that it's not just that a blank line is required after a function definition -- you also *can't* have a blank line *inside* a function definition.
While the former could be changed (I think), the latter certainly cannot. So it's probably not worth changing established behavior.
Georg
You're right that in principle for function definitions there is no ambiguity. But you also presented the downfall of that proposal: all multi-clause statements would still need an explicit way of termination, and of course the "pass" would be exceedingly ugly, not to mention much more confusing than the current way. Georg

On 24.09.2011 11:53, Yuval Greenfield wrote:
Could you elaborate on what would be wrong if function definitions ended only after an explicitly less indented line? The only problem that comes to mind is global scope "if" statements that wouldn't execute when expected (we actually might need to terminate them with a dedented "pass").
On Sep 24, 2011 4:26 AM, "Georg Brandl" <g.brandl@gmx.net> wrote:

On 24.09.2011 01:32, Guido van Rossum wrote:

On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik <techtonik@gmail.com> wrote:

Currently if you work in the console and define a function and then immediately call it, it will fail with a SyntaxError. For example, copy and paste this completely valid Python script into the console:

```
def some():
    print "XXX"
some()
```

There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here who agree that if you paste a valid Python script into the console, it should work without changes.

You can't fix this without completely changing the way the interactive console treats blank lines. Note that it's not just that a blank line is required after a function definition -- you also *can't* have a blank line *inside* a function definition.
While the former could be changed (I think), the latter certainly cannot. So it's probably not worth changing established behavior.
I see a lot of flawed "proposals". This is clearly a python-ideas discussion. (Anatoly, take note -- please post your new gripe there.) In the meantime, there's a reasonable workaround if you have to copy/paste a large block of formatted code:
```
exec('''
.
.
.
<put anything you like here>
.
.
.
''')
```
The only thing that you can't put in there is a triple-quoted string using single quotes. -- --Guido van Rossum (python.org/~guido)
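Concretely, pasting the earlier example wrapped this way runs unchanged (a minimal illustration of Guido's workaround; shown with Python 3's print function so it parses today):

```python
exec('''
def some():
    print("XXX")

some()
''')
# prints: XXX
```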
On Sat, Sep 24, 2011 at 11:27 AM, Georg Brandl <g.brandl@gmx.net> wrote:
On 24.09.2011 01:32, Guido van Rossum wrote:
On Fri, Sep 23, 2011 at 4:25 PM, anatoly techtonik <techtonik@gmail.com> wrote:
Currently if you work in the console and define a function and then immediately call it, it will fail with a SyntaxError. For example, copy and paste this completely valid Python script into the console:

```
def some():
    print "XXX"
some()
```

There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here who agree that if you paste a valid Python script into the console, it should work without changes.

You can't fix this without completely changing the way the interactive console treats blank lines. Note that it's not just that a blank line is required after a function definition -- you also *can't* have a blank line *inside* a function definition.
While the former could be changed (I think), the latter certainly cannot. So it's probably not worth changing established behavior.
I've just hit this UX bug once more, but now I am more prepared. Despite Guido's proposal to move to python-ideas, I continue the discussion here, because:

1. It is not a proposal, but a defect (well, you may argue, but please, don't)
2. This thread has a history of analysis of what's going wrong in the console
3. This thread also has the developers' decision that answers the questions "why is it so wrong?" and "why can't/won't it be fixed?"
4. Yesterday I heard from a Java person that Python is hard to pick up, and remembered how I struggled with indentation myself trying to 'learn by example' in the console

Right now I am trying to cope with point 3. To summarize, let's speak code that is copy/pasted into the console. Two things will make me happy if they behave consistently between the console and a .py file:

```
---ex1---
def some():
    print "XXX"
some()
---/ex1---

--ex1.output--
[ex1.py]
XXX
[console]
  File "<stdin>", line 3
    some()
    ^
SyntaxError: invalid syntax
--/ex1.output--

--ex2--
def some():
pass
--/ex2--

--ex2.output--
[ex2.py]
  File "./ex2.py", line 2
    pass
    ^
IndentationError: expected an indented block
[console]
  File "<stdin>", line 2
    pass
    ^
IndentationError: expected an indented block
--/ex2.output--
```

The second example already works as expected. Why is it not possible to fix ex1? Guido said:
You can't fix this without completely changing the way the interactive console treats blank lines.
But the fix doesn't require changing the way the interactive console treats blank lines at all. It only requires finishing the current block when a dedented line is encountered, instead of throwing an obviously confusing SyntaxError. At the very least it should not say SyntaxError, because the code is perfectly valid Python code. If it is merely invalid "Python console code", the error message should say that explicitly. That would be a correct, user-friendly fix for this UX issue, but I'd still like the behavior itself to be fixed, i.e. allow dedented lines to end the current block in the console without a SyntaxError. Right now I don't see the reasons why it is not possible. Please speak code when replying about use cases/examples that would break - I didn't quite get the problem with "global scope if" statements. -- anatoly t.
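The behaviour anatoly asks for can at least be prototyped on top of `codeop`. A rough sketch of the dedent rule (hypothetical behaviour, not how `code.InteractiveConsole` actually works):

```python
import codeop

class PasteFriendlyBuffer:
    """Hypothetical console buffer: a dedented line flushes the pending
    indented block instead of raising SyntaxError (a sketch only)."""

    def __init__(self):
        self.lines = []
        self.namespace = {}

    def push(self, line):
        # Dedent rule: a non-indented line arriving after an indented
        # block body runs the buffered block, then starts a new statement.
        if (self.lines and self.lines[-1].startswith((' ', '\t'))
                and line and line[0] not in ' \t'):
            self.flush()
        self.lines.append(line)
        if codeop.compile_command('\n'.join(self.lines)) is not None:
            self.flush()

    def flush(self):
        if self.lines:
            source = '\n'.join(self.lines) + '\n'
            exec(compile(source, '<stdin>', 'exec'), self.namespace)
            self.lines = []

buf = PasteFriendlyBuffer()
for line in ['def some():', '    print("XXX")', 'some()']:
    buf.push(line)
buf.flush()   # prints XXX
```

Note, though, that this naive rule would also flush a top-level `if` block the moment a dedented `else:` arrived, which is the multi-clause problem Georg raised earlier in the thread.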
On 15 December 2011 09:58, anatoly techtonik <techtonik@gmail.com> wrote:
1. It is not a proposal, but a defect (well, you may argue, but please, don't)
You can't copy/paste multiline scripts into the system shell either, unless you append "\". Similar problems likely exist in a lot of other interactive shells (Ruby?). And that makes sense to me, because they are supposed to be used interactively. Might it be good to change this? Maybe. Is the current behavior objectively wrong? No, in my opinion. --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/
Hi all,

The current dict implementation is getting pretty old; isn't it time we had a new one (for xmas)? I have a new dict implementation which allows sharing of keys between objects of the same class. You can check it out here: http://bitbucket.org/markshannon/hotpy_new_dict

Performance: for numerical applications, with few instances of user-defined classes, performance is pretty much unchanged, degrading about 1% for pystones. For applications that create lots of instances of user-defined classes, performance is improved and memory savings are large. For the gcbench benchmark (from Unladen Swallow), CPython with the new dict is about 9% faster and, more importantly, reduces memory use from 99 MB to 61 MB (a 38% reduction).

All tests were done on my ancient 32-bit Intel Linux machine; please try it out on your machines and let me know what sort of results you get. By the way, it passes all the tests, but there are strange interactions with weakrefs and the GC. (Try running the tests, you'll see what I mean.)

Cheers, Mark
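For readers new to the idea: instances of one class mostly share the same attribute names, so the keys (and their hashes) can live in a single table attached to the class while each instance keeps only a values array. A toy Python model of the split (illustrative only; the real work is in C in the linked branch):

```python
class SharedKeys:
    """Key table shared by every instance dict of one class."""
    def __init__(self):
        self.index = {}                  # attribute name -> slot number

    def slot(self, name):
        # First assignment of a new name grows the shared table.
        if name not in self.index:
            self.index[name] = len(self.index)
        return self.index[name]

class SharedDict:
    """Per-instance part: just values, one slot per shared key."""
    def __init__(self, keys):
        self.keys = keys
        self.values = []

    def __setitem__(self, name, value):
        i = self.keys.slot(name)
        self.values.extend([None] * (i + 1 - len(self.values)))
        self.values[i] = value

    def __getitem__(self, name):
        return self.values[self.keys.index[name]]

point_keys = SharedKeys()                # one key table for the whole class
p1, p2 = SharedDict(point_keys), SharedDict(point_keys)
p1['x'], p1['y'] = 1, 2
p2['x'], p2['y'] = 3, 4
assert (p1['x'], p2['y']) == (1, 4)      # keys stored once, values per object
```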
On Thu, 15 Dec 2011 22:18:18 +0000 Mark Shannon <mark@hotpy.org> wrote:
For the gcbench benchmark (from Unladen Swallow), CPython with the new dict is about 9% faster and, more importantly, reduces memory use from 99 MB to 61 MB (a 38% reduction).

All tests were done on my ancient 32-bit Intel Linux machine; please try it out on your machines and let me know what sort of results you get.
Benchmark results under a Core i5, 64-bit Linux:

```
Report on Linux localhost.localdomain 2.6.38.8-desktop-8.mga #1 SMP Fri Nov 4 00:05:53 UTC 2011 x86_64 x86_64
Total CPU cores: 4

### call_method ###
Min: 0.292352 -> 0.274041: 1.07x faster
Avg: 0.292978 -> 0.277124: 1.06x faster
Significant (t=17.31)
Stddev: 0.00053 -> 0.00351: 6.5719x larger

### call_method_slots ###
Min: 0.284101 -> 0.273508: 1.04x faster
Avg: 0.285029 -> 0.274534: 1.04x faster
Significant (t=26.86)
Stddev: 0.00068 -> 0.00135: 1.9969x larger

### call_simple ###
Min: 0.225191 -> 0.222104: 1.01x faster
Avg: 0.227443 -> 0.222776: 1.02x faster
Significant (t=9.53)
Stddev: 0.00181 -> 0.00056: 3.2266x smaller

### fastpickle ###
Min: 0.482402 -> 0.493695: 1.02x slower
Avg: 0.486077 -> 0.496568: 1.02x slower
Significant (t=-5.35)
Stddev: 0.00340 -> 0.00276: 1.2335x smaller

### fastunpickle ###
Min: 0.394846 -> 0.433733: 1.10x slower
Avg: 0.397362 -> 0.436318: 1.10x slower
Significant (t=-23.73)
Stddev: 0.00234 -> 0.00283: 1.2129x larger

### float ###
Min: 0.052567 -> 0.051377: 1.02x faster
Avg: 0.053812 -> 0.052669: 1.02x faster
Significant (t=3.72)
Stddev: 0.00110 -> 0.00107: 1.0203x smaller

### json_dump ###
Min: 0.381395 -> 0.391053: 1.03x slower
Avg: 0.381937 -> 0.393219: 1.03x slower
Significant (t=-7.15)
Stddev: 0.00043 -> 0.00350: 8.1447x larger

### json_load ###
Min: 0.347112 -> 0.369763: 1.07x slower
Avg: 0.347490 -> 0.370317: 1.07x slower
Significant (t=-69.64)
Stddev: 0.00045 -> 0.00058: 1.2717x larger

### nbody ###
Min: 0.238068 -> 0.219208: 1.09x faster
Avg: 0.238951 -> 0.220000: 1.09x faster
Significant (t=36.09)
Stddev: 0.00076 -> 0.00090: 1.1863x larger

### nqueens ###
Min: 0.262282 -> 0.252576: 1.04x faster
Avg: 0.263835 -> 0.254497: 1.04x faster
Significant (t=7.12)
Stddev: 0.00117 -> 0.00269: 2.2914x larger

### regex_effbot ###
Min: 0.060298 -> 0.057791: 1.04x faster
Avg: 0.060435 -> 0.058128: 1.04x faster
Significant (t=17.82)
Stddev: 0.00012 -> 0.00026: 2.1761x larger

### richards ###
Min: 0.148266 -> 0.143755: 1.03x faster
Avg: 0.150677 -> 0.145003: 1.04x faster
Significant (t=5.74)
Stddev: 0.00200 -> 0.00094: 2.1329x smaller

### silent_logging ###
Min: 0.057191 -> 0.059082: 1.03x slower
Avg: 0.057335 -> 0.059194: 1.03x slower
Significant (t=-17.40)
Stddev: 0.00020 -> 0.00013: 1.4948x smaller

### unpack_sequence ###
Min: 0.000046 -> 0.000042: 1.10x faster
Avg: 0.000048 -> 0.000044: 1.09x faster
Significant (t=128.98)
Stddev: 0.00000 -> 0.00000: 1.8933x smaller
```

gcbench first showed no memory consumption difference (using "ps -u"). I then removed the "stretch tree" (which apparently reserves memory upfront) and I saw a ~30% memory saving as well as a 20% performance improvement on large sizes.

Regards

Antoine.
Antoine Pitrou wrote:
On Thu, 15 Dec 2011 22:18:18 +0000 Mark Shannon <mark@hotpy.org> wrote:
For the gcbench benchmark (from Unladen Swallow), CPython with the new dict is about 9% faster and, more importantly, reduces memory use from 99 MB to 61 MB (a 38% reduction).

All tests were done on my ancient 32-bit Intel Linux machine; please try it out on your machines and let me know what sort of results you get.
Benchmark results under a Core i5, 64-bit Linux:
Report on Linux localhost.localdomain 2.6.38.8-desktop-8.mga #1 SMP Fri Nov 4 00:05:53 UTC 2011 x86_64 x86_64 Total CPU cores: 4
[full benchmark output quoted above; snipped]
Thanks for running the benchmarks. It's probably best not to attach too much significance to a few percent here and there, but it's good to see that performance is OK.
gcbench first showed no memory consumption difference (using "ps -u"). I then removed the "stretch tree" (which apparently reserves memory upfront) and I saw a ~30% memory saving as well as a 20% performance improvement on large sizes.
I should say how I did my memory tests. I did a search using ulimit to limit the maximum amount of memory the process was allowed; the given numbers were the minimum required to complete. I did not remove the "stretch tree". Cheers, Mark.
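Mark's search can also be scripted rather than run by hand. A sketch of the same ulimit-style measurement from Python (`min_memory_mb` is a hypothetical helper, Unix-only):

```python
import resource
import subprocess

def min_memory_mb(cmd, lo=8, hi=512):
    """Binary-search the smallest address-space limit (in MiB) under
    which cmd still exits successfully."""
    def run_within(mb):
        def limit():                      # runs in the child before exec
            n = mb * 2 ** 20
            resource.setrlimit(resource.RLIMIT_AS, (n, n))
        return subprocess.call(cmd, preexec_fn=limit) == 0
    while lo < hi:
        mid = (lo + hi) // 2
        if run_within(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# e.g. min_memory_mb(['python', 'gcbench.py'])
```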
Mark Shannon wrote:
I have a new dict implementation which allows sharing of keys between objects of the same class.
We already have the __slots__ mechanism for memory savings. Have you done any comparisons with that? Seems to me that __slots__ ought to save even more memory, since it eliminates the per-instance dict altogether rather than just the keys half of it. -- Greg
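The comparison Greg asks about is easy to approximate, with the caveat that `sys.getsizeof` does not see shared key tables, so it understates the benefit of Mark's scheme (exact byte counts vary by Python version and platform):

```python
import sys

class WithDict:
    def __init__(self):
        self.x, self.y = 1, 2

class WithSlots:
    __slots__ = ('x', 'y')
    def __init__(self):
        self.x, self.y = 1, 2

d, s = WithDict(), WithSlots()
# dict-backed instance: object header plus a separate __dict__
print(sys.getsizeof(d) + sys.getsizeof(d.__dict__))
# slotted instance: values stored directly in the object, no dict at all
print(sys.getsizeof(s))
```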
Greg Ewing wrote:
Mark Shannon wrote:
I have a new dict implementation which allows sharing of keys between objects of the same class.
We already have the __slots__ mechanism for memory savings. Have you done any comparisons with that?
You can't make Python programmers use slots, nor can you automatically change existing programs. Are you suggesting that because the __slots__ mechanism exists, the dict implementation doesn't have to be efficient?
Seems to me that __slots__ ought to save even more memory, since it eliminates the per-instance dict altogether rather than just the keys half of it.
Of course using __slots__ saves more memory, but people don't use them much. Cheers, Mark.
Terry Reedy wrote:
On 12/16/2011 5:03 AM, Mark Shannon wrote:
Of course using __slots__ saves more memory, but people don't use them much.
Do you think the stdlib should be using __slots__ more?
For some things, yes, but where it's critical, slots are already used. Take the ordered dict: the nodes in it use slots. The advantage of improving things in the VM is that we don't have to rewrite half of the stdlib. Cheers, Mark.
On Fri, Dec 16, 2011 at 11:32 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 12/16/2011 5:03 AM, Mark Shannon wrote:
Of course using __slots__ saves more memory, but people don't use them much.
Do you think the stdlib should be using __slots__ more?
Note that unlike some other, more advanced approaches, slots do change semantics. There are many cases out there where people stuff arbitrary things onto stdlib objects; this works fine without __slots__, but will stop working as soon as you introduce them. A change from no slots to using slots is not only a performance issue. Cheers, fijal
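A concrete illustration of the semantic change Maciej describes (class names made up):

```python
class Flexible:
    pass                       # instances get a __dict__ on demand

class Slotted:
    __slots__ = ('x',)         # no per-instance __dict__

f = Flexible()
f.anything = 1                 # fine: stored in f.__dict__

s = Slotted()
s.x = 1                        # fine: declared slot
try:
    s.anything = 1             # no __dict__ and no such slot
except AttributeError as exc:
    print(exc)                 # 'Slotted' object has no attribute 'anything'
```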
On Sat, Dec 17, 2011 at 12:53, Maciej Fijalkowski <fijall@gmail.com> wrote:
Note that unlike some other more advanced approaches, slots do change semantics. There are many cases out there where people would stuff arbitrary things on stdlib objects and this works fine without __slots__, but will stop working as soon as you introduce them. A change from no slots to using slots is not only a performance issue.
Yeah... This whole idea reeks of polymorphic inline caches (called "shapes" or "hidden classes" in SpiderMonkey and V8, respectively), where they dynamically try to infer what kind of class an object has, such that the __slots__ optimization can be done without making it visible in the semantics. The Unladen Swallow guys mention in their ProjectPlan that the overhead of opcode fetch/dispatch makes that hard, though. Cheers, Dirkjan
On Sat, Dec 17, 2011 at 2:31 PM, Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
On Sat, Dec 17, 2011 at 12:53, Maciej Fijalkowski <fijall@gmail.com> wrote:
Note that unlike some other more advanced approaches, slots do change semantics. There are many cases out there where people would stuff arbitrary things on stdlib objects and this works fine without __slots__, but will stop working as soon as you introduce them. A change from no slots to using slots is not only a performance issue.
Yeah... This whole idea reeks of polymorphic inline caches (called "shapes" or "hidden classes" in SpiderMonkey and v8, respectively), where they dynamically try to infer what kind of class an object has, such that the __slots__ optimization can be done without making it visible in the semantics. The Unladen Swallow guys mention in their ProjectPlan that the overhead of opcode fetch/dispatch makes that hard, though.
Cheers,
Dirkjan
It's done in PyPy, btw. Works like a charm :) It's called a sharing dict, and the idea dates back to Self and its maps. There is also an ongoing effort to specialize on the types of fields, so you don't have to box, say, ints stored on classes. That's however in progress now :)
The current dict implementation is getting pretty old, isn't it time we had a new one (for xmas)?
I like the approach, and I think something should be done indeed. If you don't contribute your approach, I'd like to drop at least ma_smalltable for 3.3. A number of things about your branch came to my mind:

- it would be useful to have a specialized representation for all-keys-are-strings. In that case, me_hash could be dropped from the representation. You would get savings compared to the status quo even in the non-shared case.
- why does _dictkeys need to be a full-blown Python object? We need refcounting and the size, but not the type slot.
- I wonder whether the shared keys could be computed at compile time, considering all attribute names that get assigned for self. The compiler could list those in the code object, and class creation could iterate over all methods (taking base classes into account). A sketch of this idea follows below.

Regards, Martin
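A sketch of what that compile-time harvesting could look like, done here with the `ast` module instead of the real compiler (purely illustrative; as Martin notes later, it only needs to be a heuristic):

```python
import ast

def self_attrs(class_source):
    """Collect attribute names assigned on `self` in a class body."""
    names = set()
    for node in ast.walk(ast.parse(class_source)):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if (isinstance(target, ast.Attribute)
                        and isinstance(target.value, ast.Name)
                        and target.value.id == 'self'):
                    names.add(target.attr)
    return names

src = '''
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def move(self, dx):
        self.x += dx        # AugAssign: missed by this simple version
'''
print(self_attrs(src))      # {'x', 'y'}
```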
- I wonder whether the shared keys could be computed at compile time, considering all attribute names that get assigned for self. The compiler could list those in the code object, and class creation could iterate over all methods (taking base classes into account).
This is hard, because sometimes you don't quite know what the self *is* even, especially if __init__ calls some methods or there is any sort of control flow. You can however track what gets assigned at runtime and have shapes associated with objects.
On 22.12.2011 19:15, Maciej Fijalkowski wrote:
- I wonder whether the shared keys could be computed at compile time, considering all attribute names that get assigned for self. The compiler could list those in the code object, and class creation could iterate over all methods (taking base classes into account).
This is hard, because sometimes you don't quite know what the self *is* even, especially if __init__ calls some methods or there is any sort of control flow. You can however track what gets assigned at runtime at have shapes associated with objects.
Actually, it's fairly easy, as it only needs to be heuristic. I am proposing exactly the heuristic specified above ("attribute names that get assigned for self"). I don't think that __init__ calling methods is much of an issue here, since those methods then still have attributes assigned to self. Regards, Martin
Martin v. Löwis wrote:
The current dict implementation is getting pretty old, isn't it time we had a new one (for xmas)?
I like the approach, and I think something should be done indeed. If you don't contribute your approach, I'd like to drop at least ma_smalltable for 3.3.
A number of things about your branch came to my mind:

- it would be useful to have a specialized representation for all-keys-are-strings. In that case, me_hash could be dropped from the representation. You would get savings compared to the status quo even in the non-shared case.

It might be tricky switching key tables, and I don't think it would save much memory, as keys that are widely shared take up very little memory anyway, and not many other dicts are long-lived.
(It might improve performance for dicts used for keyword arguments)
- why does _dictkeys need to be a full-blown Python object? We need refcounting and the size, but not the type slot.

It doesn't. It's just a hangover from my original HotPy implementation, where all objects needed a type for the GC. So yes, the type slot could be removed.
- I wonder whether the shared keys could be computed at compile time, considering all attribute names that get assigned for self. The compiler could list those in the code object, and class creation could iterate over all methods (taking base classes into account).
It probably wouldn't be that hard to make a guess at compile time as to what the shared keys would be, but it doesn't really matter. The generation of intermediate shared keys will only happen once per class, so the overhead would be negligible. To cut down on that overhead, we could use a ref-count trick: if the instance being updated and its class hold the only two refs to an immutable keys(-set -table -vector?) then just treat it as mutable. I'll modify the repo to incorporate these changes when I have a chance. Cheers, Mark.
- it would be useful to have a specialized representation for all-keys-are-strings. In that case, me_hash could be dropped from the representation. You would get savings compared to the status quo even in the non-shared case.

It might be tricky switching key tables, and I don't think it would save much memory, as keys that are widely shared take up very little memory anyway, and not many other dicts are long-lived.
Why do you say that? In a plain 3.3 interpreter, I counted 595 dict objects (see script below). Of these, 563 (so nearly all of them) had only strings as keys. Among those, I found 286 different key sets, of which 231 occurred only once (i.e. wouldn't be shared). Together, the string dictionaries had 13282 keys, and you could save as many pointers (actually more, because there will be more key slots than keys). I'm not sure why you think the string dicts with unshared keys would be short-lived. But even if they were, what matters is the steady-state number of dictionaries - if for every short-lived dictionary that gets released another one is created, any memory savings from reducing the dict size would still materialize.
- I wonder whether the shared keys could be computed at compile time, considering all attribute names that get assigned for self. The compiler could list those in the code object, and class creation could iterate over all methods (taking base classes into account).
It probably wouldn't be that hard to make a guess at compile time as to what the shared keys would be, but it doesn't really matter. The generation of intermediate shared keys will only happen once per class, so the overhead would be negligible.
I'm not so much concerned about overhead, but about the correctness/effectiveness of the heuristics. For a class with dynamic attributes, you may well come up with a very large key set. With source analysis, you wouldn't attempt to grow the key set beyond what is likely being shared.

Regards, Martin

```python
import sys

d = sys.getobjects(0, dict)   # needs a Py_TRACE_REFS debug build
print(len(d), "dicts")
d2 = []
for o in d:
    keys = o.keys()
    if not keys:
        continue
    types = tuple(set(type(k) for k in keys))
    if types != (str,):
        continue
    d2.append(tuple(sorted(keys)))
print(len(d2), "str dicts")
freq = {}
for keys in d2:
    freq[keys] = freq.get(keys, 0) + 1
print(len(freq), "different key sets")
freq = sorted(freq.items(), key=lambda t: t[1])
print(len([o for o in freq if o[1] == 1]), "unsharable")
print(sum(len(o[0]) for o in freq), "keys")
print(freq[-10:])
```
Martin v. Löwis wrote:
- it would be useful to have a specialized representation for all-keys-are-strings. In that case, me_hash could be dropped from the representation. You would get savings compared to the status quo even in the non-shared case.

It might be tricky switching key tables, and I don't think it would save much memory, as keys that are widely shared take up very little memory anyway, and not many other dicts are long-lived.

Why do you say that? In a plain 3.3 interpreter, I counted 595 dict objects (see script below). Of these, 563 (so nearly all of them) had only strings as keys. Among those, I found 286 different key sets, of which 231 occurred only once (i.e. wouldn't be shared).
Together, the string dictionaries had 13282 keys, and you could save as many pointers (actually more, because there will be more key slots than keys).
The question is how much memory needs to be saved to be worth adding the complexity: 10 KB, no; 100 MB, yes. So data from "real" benchmarks would be useful. Also, I'm assuming that it would be tricky to implement correctly due to implicit assumptions in the rest of the code. If I'm wrong and it's easy to implement, then please do.
I'm not sure why you think the string dicts with unshared keys would be short-lived.
Not all, but most. Most dicts with unshared keys would most likely be for keyword parameters. Explicit dicts tend to be few in number. (When I say few I mean up to 1k, not 100k or 1M). Module dicts are very likely to have unshared keys; they number in the 10s or 100s, but they do tend to be large.
But even if they were, what matters is the steady-state number of dictionaries - if for every short-lived dictionary that gets released another one is created, any memory savings from reducing the dict size would still materialize.

But only a few KB?
- I wonder whether the shared keys could be computed at compile time, considering all attribute names that get assigned for self. The compiler could list those in the code object, and class creation could iterate over all methods (taking base classes into account).
It probably wouldn't be that hard to make a guess at compile time as to what the shared keys would be, but it doesn't really matter. The generation of intermediate shared keys will only happen once per class, so the overhead would be negligible.
I'm not so much concerned about overhead, but about correctness/ effectiveness of the heuristics. For a class with dynamic attributes, you may well come up with a very large key set. With source analysis, you wouldn't attempt to grow the keyset beyond what likely is being shared.
I agree some sort of heuristic is required to limit excessive growth and prevent pathological behaviour. The current implementation just has a cut-off at a certain size; it could definitely be improved. As I said, I'll update the code soon and then, well, what's the phrase... Oh yes, "patches welcome" ;) Thanks for the feedback. Cheers, Mark.
Mark Shannon, 23.12.2011 12:21:
Martin v. Löwis wrote:
- it would be useful to have a specialized representation for all-keys-are-strings. In that case, me_hash could be dropped from the representation. You would get savings compared to the status quo even in the non-shared case.

It might be tricky switching key tables, and I don't think it would save much memory, as keys that are widely shared take up very little memory anyway, and not many other dicts are long-lived.

Why do you say that? In a plain 3.3 interpreter, I counted 595 dict objects (see script below). Of these, 563 (so nearly all of them) had only strings as keys. Among those, I found 286 different key sets, of which 231 occurred only once (i.e. wouldn't be shared).

Together, the string dictionaries had 13282 keys, and you could save as many pointers (actually more, because there will be more key slots than keys).

The question is how much memory needs to be saved to be worth adding the complexity: 10 KB, no; 100 MB, yes. So data from "real" benchmarks would be useful.
Consider taking a parsed MiniDOM tree as a benchmark. It contains so many instances of just a couple of different classes that it just has to make a huge difference if each of those instances is even just a bit smaller. It should also make a clear difference for plain Python ElementTree. I attached a benchmark script that measures the parsing speed as well as the total memory usage of the in-memory tree. You can get data files from the following places; just download them and pass their file names on the command line:

http://gnosis.cx/download/hamlet.xml
http://www.ibiblio.org/xml/examples/religion/ot/ot.xml

Here are some results from my own machine for comparison: http://blog.behnel.de/index.php?p=197

Stefan
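Stefan's script was an attachment and is not preserved in this archive. A minimal stand-in measuring the same two quantities, parse time and the memory retained by the in-memory tree, might look like this (`tracemalloc` is used for portability, though it only appeared in Python 3.4, after this thread):

```python
import sys
import time
import tracemalloc
from xml.dom import minidom

def bench(path):
    tracemalloc.start()
    start = time.perf_counter()
    doc = minidom.parse(path)        # builds one Python object per XML node
    elapsed = time.perf_counter() - start
    retained, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print('%s: %.2fs, %.1f MiB retained' % (path, elapsed, retained / 2**20))
    return doc                       # keep the tree alive while measuring

if __name__ == '__main__':
    for filename in sys.argv[1:]:    # e.g. hamlet.xml ot.xml
        bench(filename)
```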
Consider taking a parsed MiniDOM tree as a benchmark. It contains so many instances of just a couple of different classes that it just has to make a huge difference if each of those instances is even just a bit smaller. It should also make a clear difference for plain Python ElementTree.
Of course, for minidom, Mark's current implementation should already save quite a lot of memory, since all elements and text nodes have the same attributes. Still, it would be good to see how Mark's implementation deals with that. Regards, Martin
Martin v. Löwis wrote:
If I'm wrong and it's easy to implement, then please do.
Ok, so I take it that you are not interested in the idea. No problem.
It's just that I don't think it would yield results commensurate with the effort. Also, I think it's worth keeping the initial version as simple as reasonably possible. Refinements can be added later. Cheers, Mark.
On 12/15/2011 3:58 AM, anatoly techtonik wrote:
1. It is not a proposal, but a defect (well, you may argue, but please, don't)
You state a controversial opinion as a fact and then request that others not discuss it. To me, this is a somewhat obnoxious hit-and-run tactic. If you do not want the point discussed, don't bring it up. Anyway, I will follow your request and not argue. Since that opinion is a central point, not discussing it does not leave much to say. -- Terry Jan Reedy
On 24/09/2011 00:32, Guido van Rossum wrote:
The interactive console is optimized for people entering code by typing, not by copying and pasting large gobs of text.
If you think you can have it both ways, show us the code.
Anatoly wants ipython's new qtconsole. This "does the right thing" because it's a GUI app and so can manipulate the content on paste... Not sure if you can do that in a console app... cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk
On Fri, 23 Sep 2011 16:32:30 -0700, Guido van Rossum wrote:
You can't fix this without completely changing the way the interactive console treats blank lines. Note that it's not just that a blank line is required after a function definition -- you also *can't* have a blank line *inside* a function definition.
The interactive console is optimized for people entering code by typing, not by copying and pasting large gobs of text.
If you think you can have it both ways, show us the code.
Apology for the advertising, but if the OP is really interested in that kind of behavior, then instead of asking for making the default shell more complex, he can use IPython, which supports what he's looking for:

```
In [5]: def some():
   ...:     print 'xxx'
   ...: some()
   ...:
xxx
```

and even blank lines inside functions (albeit only in certain locations):

```
In [6]: def some():
   ...:
   ...:     print 'xxx'
   ...: some()
   ...:
xxx
```

Now, the dances we have to do in ipython to achieve that are much more complex than what would be reasonable to have in the default '>>>' python shell, which should remain simple, light and robust. But ipython is a simple install for someone who wants fancier features for interactive work.

Cheers, f
On Mon, Dec 19, 2011 at 7:47 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Fernando Perez writes:
Apology for the advertising,
If there's any apologizing to be done, it's on Anatoly's part. Your post was short, to the point, information-packed, and should put a big fat open-centered ideographic full stop period to this thread.
Fernando clearly showed that IPython rocks, because CPython suxx. I don't think anybody should apologize for the intention to fix this by enhancing CPython, so as a python-dev subscriber you should be ashamed of yourself for this proposal already. ;) Thanks everyone else for explaining the problem with the current implementation. I'll post a follow-up as soon as I have time to wrap my head around the details and see for myself why the IPython solution is so hard to implement. -- anatoly t.
anatoly techtonik writes:
Fernando clearly showed that IPython rocks, because CPython suxx.
<sigh/> No, IPython rocks because it focuses on doing one thing well: providing an interactive environment that takes advantage of the many features that Python provides in support. CPython should do the same: specifically, focus on the *language* that we all consider excellent but still can be improved, and on the (still) leading implementation of the language and the stdlib.[1]
so as a python-dev subscriber you should be ashamed of yourself for this proposal already. ;)
ROTFLMAO! No, I still think you're making an awfully big deal of something that doesn't need fixing, and I wish you would stop.

Footnotes:
[1] Note that this *is* *one* task, because CPython has chosen a definition of "language excellence" that includes prototype implementation of proposed language features and "batteries included".
I agree that it should, and it doesn't. I also recall that not having empty lines between function/class definitions can cause indentation errors when pasting into the console on my Windows machine. --Yuval On Sep 23, 2011 7:26 PM, "anatoly techtonik" <techtonik@gmail.com> wrote:
Currently if you work in the console and define a function and then immediately call it, it will fail with a SyntaxError. For example, copy and paste this completely valid Python script into the console:

```
def some():
    print "XXX"
some()
```

There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here who agree that if you paste a valid Python script into the console, it should work without changes. -- anatoly t.
On 9/23/2011 7:25 PM, anatoly techtonik wrote:
Currently if you work in the console and define a function and then immediately call it, it will fail with a SyntaxError. For example, copy and paste this completely valid Python script into the console:

```
def some():
    print "XXX"
some()
```

There is an issue for that that was just closed by Eric. However, I'd like to know if there are people here who agree that if you paste a valid Python script into the console, it should work without changes.
For this kind of multi-line, multi-statement pasting, open an IDLE edit window for tem.py (my name) or such, paste, and run with F5. I have found that this works better for me than direct pasting. An interactive Lisp interpreter can detect end-of-statement without a blank line by matching a closing paren to the open paren that starts every expression. -- Terry Jan Reedy
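Terry's Lisp point in a few lines: a reader only has to balance parentheses to know a form is complete, so no blank-line convention is needed (a sketch that ignores parens inside strings):

```python
def form_complete(text):
    depth = 0
    for ch in text:
        depth += (ch == '(') - (ch == ')')
        if depth < 0:
            raise SyntaxError("unbalanced ')'")
    return depth == 0 and '(' in text

print(form_complete("(defun some () (print 'xxx))"))  # True: run it now
print(form_complete("(defun some ()"))                # False: keep reading
```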
On Fri, Sep 23, 2011 at 18:49, Terry Reedy <tjreedy@udel.edu> wrote:
An interactive Lisp interpreter can detect end-of-statement without a blank line by matching a closing paren to the open paren that starts every expression.
Braces-loving programmers around the world are feverishly writing a PEP as we speak.
participants (17)

- "Martin v. Löwis"
- anatoly techtonik
- Antoine Pitrou
- Brian Curtin
- Chris Withers
- Dirkjan Ochtman
- Fernando Perez
- Georg Brandl
- Giampaolo Rodolà
- Greg Ewing
- Guido van Rossum
- Maciej Fijalkowski
- Mark Shannon
- Stefan Behnel
- Stephen J. Turnbull
- Terry Reedy
- Yuval Greenfield