Revisiting a frozenset display literal
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
Inspired by this enhancement request: https://bugs.python.org/issue46393 I thought it might be time to revist the idea of a frozenset display. This has been discussed a few times before, such as here: https://mail.python.org/pipermail/python-ideas/2018-July/051902.html We have displays for the most important builtin data structures: - lists [1, 2, 3] - tuples (1, 2, 3) - dicts {1: 0, 2: 0, 3: 0} - sets {1, 2, 3} (as well as literals for ints, floats, strings and bytes) but not for frozensets. So the only way to guarantee that you have a frozenset is to explicitly call the builtin, which is not only a runtime call rather than a compile-time operation, but can be monkey-patched or shadowed. CPython already has some neat optimizations in place to use frozensets instead of sets, for example:
and the compiler can build frozensets of literals as a constant. Ironically, this means that actually making a frozenset explicitly does far more work than needed:
Got that? To create a frozenset of literals, first the compiler creates a frozenset constant containing what you wanted. Then at runtime, it: - looks up frozenset in globals and builtins; - loads the pre-prepared frozenset (which is exactly what we want); - creates a new set from that frozenset; - calls the frozenset() function on that set to create a new frozenset that duplicates the pre-prepared one; - and finally garbage-collects the temporary set. So to get the frozenset we want, we start with the frozenset we want, and make an unnecessary copy the long way o_O If you didn't know that every step in that song and dance routine was necessary, it would seem ludicrously wasteful. If we had a frozenset display, we could avoid most of that work, optimizing that down to a LOAD_CONST like this:
It seems to me that all of the machinery to make this work already exists. The compiler already knows how to create frozensets at compile-time, avoiding the need to lookup and call the frozenset() builtin. All we need is syntax for a frozenset display. How does this work for you? f{1, 2, 3} -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Sun, Jan 16, 2022 at 7:35 PM Steven D'Aprano <steve@pearwood.info> wrote:
How does this work for you?
f{1, 2, 3}
While it's tempting, it does create an awkward distinction. f(1, 2, 3) # look up f, call it with parameters f[1, 2, 3] # look up f, subscript it with paramters f{1, 2, 3} # construct a frozenset And that means it's going to be a bug magnet. Are we able to instead make a sort of vector literal? <1, 2, 3> Unfortunately there aren't many symbols available, and Python's kinda locked into a habit of using just one at each end (rather than, say, (<1, 2, 3>) or something), so choices are quite limited. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 09:18:40PM +1100, Chris Angelico wrote:
You forgot f"1, 2, {x+1}" # eval some code and construct a string Not to mention: r(1, 2, 3) # look up r, call it with parameters r[1, 2, 3] # look up r, subscript it r"1, 2, 3" # a string literal
And that means it's going to be a bug magnet.
I don't think that f{} will be any more of a bug magnet than f"" and r"" already are.
Are we able to instead make a sort of vector literal?
<1, 2, 3>
Back in the days when Python's parser was LL(1), that wasn't possible. Now that it uses a PEG parser, maybe it is, but is it desirable? Reading this makes my eyes bleed: >>> <1, 2, 3> < <1, 2, 3, 4> True
Triple quoted strings say hello :-) {{1, 2, 3}} would work, since that's currently a runtime error. But I prefer the f{} syntax. -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Sun, Jan 16, 2022 at 11:18 PM Steven D'Aprano <steve@pearwood.info> wrote:
Strings behave differently in many many ways. Are there any non-string types that differ?
Fair point, but I can't imagine people comparing two literals like that. It's not quite as bad if you replace the left side with a variable or calculation, though it's still kinda weird.
See above, strings are different, and people treat them differently.
{{1, 2, 3}} would work, since that's currently a runtime error. But I prefer the f{} syntax.
Yeah, I think that ship has sailed. Double punctuation just isn't Python's thing, so there aren't really any good ways to shoehorn more data types into fewer symbols. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 11:41:52PM +1100, Chris Angelico wrote:
On Sun, Jan 16, 2022 at 11:18 PM Steven D'Aprano <steve@pearwood.info> wrote:
There are plenty of non-string types which differ :-) Differ in what way? I don't understand your question. You were concerned that adding a prefix to a delimiter in the form of f{...} would be a bug magnet, but we have had prefixes on delimiters for 30 years in the form of r"..." etc, and it hasn't been a problem. I mean, sure, the occasional beginner might get confused and write len{mystring} and if by some fluke they call f() rather than len() they will get a silent failure instead of a SyntaxError, but is this really a serious problem that is common enough to get labelled "a bug magnet"? I've been coding in Python for two decades and I still occassionally mess up round and square brackets, especially late at night, and I won't tell you how often I write my dict displays with equal signs {key=value}, or misspell str.center. And I still cringe about the time a few years back where my brain forgot that Python spells it "None" rather than "nil" like in Pascal, and I spent about an hour writing a ton of "if obj is nil"... tests. Typos and brain farts happen.
They don't have to be literals inside the brackets. Especially in the REPL, `{*a} < {*b}` is a quick way of testing that every element of a is an element of b. [...]
Triple quoted strings say hello :-)
See above, strings are different, and people treat them differently.
Do they? How are they different? You have a start delimiter and an end delimiter. The only difference I see is that with strings the delimiter is the same, instead of a distinct open and close delimiter. But that difference is surely not a reason to reject the use of a prefix. "We can't use a prefix because the closing delimiter is different from the opening delimiter" does not follow. -- Steve
data:image/s3,"s3://crabby-images/83003/83003405cb3e437d91969f4da1e4d11958d94f27" alt=""
On 2022-01-16 16:11, Steven D'Aprano wrote:
Well, there is a big difference, which is that the stuff between other delimiters (parentheses, brackets, etc.) is wholly constrained by Python syntax, whereas the stuff between string delimiters is free-form text, with only a few restrictions (like not being able to use the delimiter itself, or to include newlines in single-quoted strings). Whether that difference is important for your proposal I won't address right now. But it is a big difference. It also greatly affects how people view the code, since syntax highlighters will often color an entire string literal with the same color, whereas they don't typically do that for other kinds of delimited chunks, instead highlighting only the delimiters themselves. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 04:23:47PM -0800, Brendan Barnwell wrote:
Which is also wholly constrained by Python syntax, seeing as they are Python strings :-)
Are any of these differences relevant to putting a prefix on the opening delimiter? If they are not, why mention them? We could also talk about the difference between the numeric value of the ASCII symbols, or the number of pixels in a " versus a { glyph, or the linguistic history of the words "quotation mark" versus "curly bracket" too, but none of these things seem to be any more relevant than whether IDEs and syntax colourizers colour "..." differently to {1, None, 4.5}. Can we bypass what could end up being a long and painful discussion if I acknowledge that, yes, frozensets are different to strings, and so the syntax is different. (Different things necessarily have different syntax. Otherwise they would be indistinguishable.) - Sets (like lists, tuples and dicts) are compound objects that contain other objects, so their displays involve comma-separated items; - string (and byte) literals are not, except in the sense that strings can be considered to be an array of single-character substrings. I thought that was so obvious and so obviously irrelevant that it didn't even need mentioning. Perhaps I am wrong. (It has to happen eventually *wink*) If somebody can explain *why* that matters, rather than just declare that it rules out using a prefix, I would appreciate the education. Hell, even if your argument is just "Nope, I just don't like the look of it!", I would respect that even if I disagree. Aesthetics are important, even when they are totally subjective. If it helps, Julia supports this syntax for typed dicts: Dict{keytype, valuetype}(key => value) where the braces {keytype, valuetype} are optional. That's not a display syntax as such, or a prefix, but it is visually kinda similar. Here are some similar syntax forms with prefixes: * Dylan list displays: #(a, b, c) * Smalltalk drops the comma separators: #(a b c) * Scheme and Common Lisp: '(a b c) and double delimiters: * Pike: ({ a, b, c }) * Pike dicts: ([ a:b, c:d ]) Not that we could use any of those as-given. -- Steve
data:image/s3,"s3://crabby-images/2eb67/2eb67cbdf286f4b7cb5a376d9175b1c368b87f28" alt=""
On 2022-01-17 01:36, Steven D'Aprano wrote:
How about doubling-up the braces: {{1, 2, 3}} and for frozen dicts: {{1: 'one', 2: 'two', 3: 'three'}} if needed? Those currently raise exception because sets and dics are unhashable. It might be confusing, though, if you try to nest them, putting a frozenset in a frozenset: {{ {{1, 2, 3}} }} or, without the extra spaces: {{{{1, 2, 3}}}} but how often would you do that?
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 03:05:36AM +0000, MRAB wrote:
How about doubling-up the braces:
{{1, 2, 3}}
I mentioned that earlier. Its not *awful*, but as you point out yourself, it does run into the problem that nested sets suffer from brace overflow. # A set of sets. {{{{}}, {{1, 2}}, {{'a', {{}}, None}}}}
and for frozen dicts:
We don't even have a frozen dict in the stdlib, so I'm not going to discuss that here. If and when we get a frozen dict, if it is important enough to be a builtin, then we can debate syntax for it.
but how often would you do that?
Often enough: https://www.delftstack.com/howto/python/python-set-of-sets/ https://stackoverflow.com/questions/37105696/how-to-have-a-set-of-sets-in-py... https://stackoverflow.com/questions/5931291/how-can-i-create-a-set-of-sets-i... -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Mon, Jan 17, 2022 at 11:18 AM Steven D'Aprano <steve@pearwood.info> wrote:
*sigh* I know you love to argue for the sake of arguing, but seriously, can't you read back to your own previous message and get your own context? With punctuation like parentheses, square brackets, angle brackets, etc, does not ever, to my knowledge, have prefixes. ONLY strings behave differently. Are there any non-string types which have special behaviour based on a prefix, like you're suggesting for sets?
Yes. ONLY on strings. That's exactly what I said. Strings are different. For starters, we already have multiple different data types that can come from quoted literals, plus a non-literal form that people treat like a literal (f-strings). Is there any non-string type that doesn't follow that pattern?
len{mystring} looks a lot like len(mystring), but len"mystring" looks very very different. Or do you treat all punctuation exactly the same way?
You mess up round and square brackets, yes. You might mess up double and single quotes, too, in languages that care. But you aren't going to mess up brackets and quotes.
People treat them differently. That's why f-strings are a thing: we treat strings as strings even when they're expressions. Strings ARE special. I asked you if there are any non-strings that are similarly special. You have not found any examples. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 12:16:07PM +1100, Chris Angelico wrote:
Speaking of context, it is not nice of you to strip the context of my very next sentence, in order to make me out to be the bad guy here. "Differ in what way? I don't understand your question." Let me repeat: I do not understand your question. In what way do you think that non-string types differ, that is relevant to the discussion? There are plenty of ways that they differ. I don't see how those differences are meaningful. If you do, please explain. (I see that further on, you made an attempt. Thank you, I will respond to that below.)
With punctuation like parentheses, square brackets, angle brackets, etc, does not ever, to my knowledge, have prefixes.
In another post, I have pointed out a few languages which do something very close to this, e.g. Scheme. A better example is Coconut. Coconut allows an optional 's' prefix on sets, so as to allow `s{}` for an empty set. It also allows the 'f' prefix for frozensets: https://coconut.readthedocs.io/en/v1.1.0/DOCS.html#set-literals But even if it has never been done before, someone has to be the first. There was a time that no language ever used r"..." for raw strings, or f"..." for f-strings. There was a time where there was no language in the world that used slice notation, or {a, b} for sets. There was once a time that no language had list comprehension syntax. And now Python has all those syntactic features. "It has never been done before" is a weak argument, and it is especially weak when it *has* been done before. We have at least three syntactic forms that use an alphabetical prefix on a delimiter. It seems rather dubious that you are happy to use the < and > operators as delimiters, just to avoid a prefix. - There is no precedent in Python of a symbol being used as both an operator and a delimiter: none of ( [ { } ] ) are ever used as operators, and no operator + - etc are ever used as delimiters. But you are happy to make this radical change to the language, even though you agree that it looks pretty awful when used with the < and > operators. - But there is precedent in Python of adding an alphabetic prefix to delimiters: we have b' r' f'. But you are unhappy with making a minor change to the language by putting the prefix on a symbol different from a quote mark, because the data type is not a string or bytes. Your position seems to be: We have used prefixes on delimiters before, therefore we cannot do it again; and we've never used operators as delimiters before, therefore we should do it now. Is that accurate? If not, I apologise for misunderstanding you, but please explain what you mean. These are not rhetorical questions: (1) Why does it matter that string and bytes syntax are the only types that currently use a prefix on a delimiter? Surely delimiters are delimiters, whatever the type they represent. (2) What is so special about the string and bytes types that tells you that it is okay to use a prefix on the delimiter, but other data structures must not do the same?
I am not proposing any special behaviour. This is *syntax*, not behaviour. Frozensets will continue to behave exactly the same as they behave now. If that was not clear, I apologise for giving you the wrong impression. As mentioned above, Coconut already supports this, and some other languages support similar prefixes on delimiters for lists and dicts.
So... not "ONLY" (your emphasis, not mine) strings. As you say, there are already two different data types, strings and bytes, plus a non-literal executable code (f-strings). Again, I don't understand your point. You have just told us that there are already non-string literals and types that use a prefix on the delimiter, and then you ask me if there are any non-string types that follows that pattern. Haven't you just answered your own question? Sure, bytes and f-strings are quite closely related to regular strings, and frozensets are not. But why does that matter? This is not a rhetorical question. You obviously feel that there is something special or magical about strings that makes it okay to stick a prefix on the opening delimiter, but for the life of me I can't see what it is. Is it just taste? You just don't like the look of `f{}`?
Yes it does look different, but what does that have to do with anything? f() and f"" look very similar, and r() and r"" do too. I think this idea that people will confuse f{} for a function call, and extrapolate to using it in arbitrary functions, is unjustified. But even if they do, except in the very special case where the function is called exactly `f`, they will get a SyntaxError. And even then, the error will be pretty obvious. Calling this a "bug magnet" seems to be a gross exaggeration.
Then why is this a problem for curly brackets, when you just agreed that it is not a problem for round and square brackets? We don't write b"" or r"" or f"" when we want b() or r[] or f(), or vice versa, but we'll suddenly start confusing f"" and f{}? I think that is pure FUD (Fear, Uncertainty, Doubt). -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Mon, Jan 17, 2022 at 2:01 PM Steven D'Aprano <steve@pearwood.info> wrote:
An f-string still yields a string. It's not a literal but it's still a string. A bytestring is still a string. It's not a Unicode string but it's still a string. These are not the same as lists, tuples, sets, dicts, etc, which contain arbitrary objects. ONLY strings. You just picked up on the part where I said "not only literals" and got it completely backwards. Strings are not the same as lists. Strings are not the same as tuples. Strings are the only data type in Python that has prefixes that determine the data type you get. Strings are TREATED DIFFERENTLY by programmers, which is why f-strings get treated like string literals. Strings. Are. Different. I do not know how to make this any clearer. If you do not understand my position, please stop misrepresenting me. ChrisA
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
Is there no way to optimize the byte code without adding to the language? Not that it’s a bad idea anyway, but I wonder if frozen sets are common enough to warrant a change. Are there any performance advantages to a frozen set? I ask because I do often use sets that could be frozen, but don’t need to be. E.g. they don’t change, nor are they used as keys. For example: If flag in {‘the’, ‘allowable’, ‘flags’}: … If a frozen set was even a little bit faster or used less memory, it would be nice to be able to create one directly. -CHB On Sun, Jan 16, 2022 at 8:50 AM MRAB <python@mrabarnett.plus.com> wrote:
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/b4d21/b4d2111b1231b43e7a4c304a90dae1522aa264b6" alt=""
Summary: Further information is provided, which suggests that it may be best to amend Python so that "frozenset({1, 2, 3})" is the literal for eval("frozenset({1, 2, 3})"). Steve D'Aprano correctly notes that the bytecode generated by the expression x in {1, 2 ,3} is apparently not optimal. He then argues that introducing a frozenset literal would allow x in f{1, 2, 3} # New syntax, giving a frozenset literal would allow better bytecode to be generated. However, the following has the same semantics as "x in {1, 2, 3}" and perhaps gives optimal bytecode.
For comparison, here's the bytecode Steve correctly notes is apparently not optimal.
Steve states that "x in {1, 2, 3}" when executed calls "frozenset({1, 2, 3})", and in particular looks up "frozenset" in builtins and literals. I can see why he says that, but I've done an experiment that suggests otherwise.
I suspect that if you look up in the C-source for Python, you'll find that dis.dis ends up using frozenset({1, 2, 3}) as the literal for representing the result of evaluating frozenset({1, 2, 3}). The following is evidence for this hypothesis:
To conclude, I find it plausible that: 1. The bytecode generated by "x in {1, 2, 3}" is already optimal. 2. Python already uses "frozenset({1, 2, 3})" as the literal representation of a frozenset. Steve in his original post mentioned the issue https://bugs.python.org/issue46393, authored by Terry Reedy. Steve rightly comments on that issue that "may have been shadowed, or builtins monkey-patched, so we cannot know what frozenset({1, 2, 3}) will return until runtime." Steve's quite right about this shadowing problem. In light of my plausible conclusions I suggest his goal of a frozenset literal might be better achieved by making 'frozenset' a keyword, much as None and True and False are already keywords.
Once this is done we can then use frozenset({1, 2, 3}) as the literal for a frozenset, not only in dis.dis and repr and elsewhere, but also in source code. As a rough suggestion, something like from __future__ import literal_constructors_as_keywords would prevent monkey-patching of set, frozenset, int and so forth (just as True cannot be monkeypatched). I thank Steve for bringing this interesting question to our attention, for his earlier work on the issue, and for sharing his current thoughts on this matter. It's also worth looking at the message for Gregory Smith that Steve referenced in his original post. https://mail.python.org/pipermail/python-ideas/2018-July/051902.html Gregory wrote: frozenset is not the only base type that lacks a literals leading to loading values into these types involving creation of an intermediate throwaway object: bytearray. bytearray(b'short lived bytes object') I hope this helps. -- Jonathan
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
I’m a bit confused — would adding a “literal” form for frozenset provide much, if any, of an optimization? If not, that means it’s only worth doing for convenience. How often do folks need a frozen set literal? I don’t think I’ve ever used one. If we did, then f{‘this’: ‘that’} should make a frozen dict, yes? On Sun, Jan 16, I suggest his goal of a frozenset literal might be better achieved by making 'frozenset' a keyword, much as None and True and False are already keywords. Adding a keyword is a very Big Deal. I don’t think this rises to that level at all. It was done for True and False because having them as non-redefineable names referencing singletons is really helpful. -CHB
data:image/s3,"s3://crabby-images/5f8b2/5f8b2ad1b2b61ef91eb396773cce6ee17c3a4eca" alt=""
On Mon, 17 Jan 2022 at 00:46, Christopher Barker <pythonchb@gmail.com> wrote:
You won't have used one because they have not yet existed (hence this thread).
If we did, then f{‘this’: ‘that’} should make a frozen dict, yes?
A frozen dict would also be useful but the implementation doesn't exist. If it did exist then in combination with this proposal that syntax for frozen dicts would be an obvious extension. A more relevant question right now is if any other set syntax should apply to frozensets e.g. should this work: >>> squares = f{x**2 for x in range(10)} -- Oscar
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Jan 17, 2022 at 9:58 AM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Although we don't have frozenset literal now, we have frozenset. So we can estimate how frozenset literal is useful by seeing how frozenset is used. Unless how the literal improve codes is demonstrated, I am -0.5 on new literal only for consistency. Regards, -- Inada Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Sun, Jan 16, 2022 at 7:05 PM Steven D'Aprano <steve@pearwood.info> wrote:
I never suggested adding this "for consistency".
Then what ARE you suggesting it for? As far as I can tell, it would be a handy shorthand. And you had suggested it could result in more efficient bytecode, but I think someone else thought that wasn't the case. It could lead to some optimization -- literals being treated as contents, yes? But what does that matter? are they heavenly used in any common code? -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 10:33:41PM -0800, Christopher Barker wrote:
Apologies if my initial post was not clear enough: https://mail.python.org/archives/list/python-ideas@python.org/message/GRMNMW... CPython already has all the machinery needed to create constant frozensets of constants at compile time. It already implicitly optimizes some expressions involving tuples and sets into frozensets, but such optimizations are fragile and refactoring your code can remove them. Ironically, that same optimization makes the explicit creation of a frozenset needlessly inefficient. See also b.p.o. #46393. The only thing we need in order to be able to explicitly create frozensets efficiently, without relying on fragile, implicit, implementation-dependent peephole optimizations which may or may not trigger, and without triggering the usual global+builtins name lookup, is (I think) syntax for a frozenset display. That would make the creation of frozensets more efficient, possibly encourage people who currently are writing slow and inefficient code like targets = (3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88) if n in targets: do_something() to use a frozenset, as they probably should already be doing.
As far as I can tell, it would be a handy shorthand.
If you consider tuple, list and dict displays to be a handy shortcut, then I guess this would be too :-)
And you had suggested it could result in more efficient bytecode, but I think someone else thought that wasn't the case.
I see no reason why it wouldn't lead to more efficient bytecode, at least sometimes.
But what does that matter? are they heavenly used in any common code?
Personally, I think my code using frozensets is extremely heavenly :-) I doubt that frozensets are, or ever will be, as common as lists or dicts. In that sense, sets (frozen or otherwise) are, I guess, "Second Tier" data structures: - first tier are lists, tuples, dicts; - second tier are sets, deques etc. Or possibly "tier and a half" in that unlike deques they are builtins, which suggest that they are somewhat more important. In the top level of the stdlib (not dropping down into packages or subdirectories), I count 29 calls to frozenset. (Compared to 14 calls to deque, so on mere call count, I would say frozenset is twice as important as deque :-) Out of those 29 calls, I think that probably 13 would be good candidates to use a frozenset display form (almost half). For example: ast.py: binop_rassoc = frozenset(("**",)) # f{("**",)} asyncore.py: ignore_log_types = frozenset({'warning'}) # f{'warning'} Not all of them are purely literals, e.g. asyncore.py: _DISCONNECTED = frozenset({ECONNRESET, ENOTCONN, ...}) would still have to generate the frozenset at runtime, but it wouldn't need to look up the frozenset name to do so so there would still be some benefit. If we were particularly keen, that might go up to 19 out of the 29. The benefit is not huge. This is not list comprehensions or decorator syntax, which revolutionized the way we write Python, it is an incremental improvement. If the compiler didn't already have the machinery in place for building compile-time constant frozensets, this might not be worth the effort. But since we do, the cost of adding a frozenset display is relatively low (most of the work is already done, yes?) and so the benefit needs only to be small to justify the small(?) amount of work needed. -- Steve
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On Mon, 17 Jan 2022 at 10:12, Steven D'Aprano <steve@pearwood.info> wrote:
More realistically, would they not use a set already, as in targets = {3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88} if n in targets: do_something() ? Is using a frozenset a significant improvement for that case? Because I doubt that anyone currently using a tuple would suddenly switch to a frozenset, if they haven't already switched to a set. Sure, there might be the odd person who sees the release notes and is prompted by the mention of frozenset literals to re-think their code, but that's probably a vanishingly small proportion of the audience for this change. BTW, I should say that I'm actually +0.5 on the idea. It seems like a reasonable thing to want, and if an acceptable syntax can be found, then why not? But I doubt it's going to have a major impact either way. Paul
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Jan 17, 2022 at 7:44 PM Paul Moore <p.f.moore@gmail.com> wrote:
This is very inefficient because building a set is much heavier in `n in tuple`. We should write `if n in {3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88}` for now. Or we should write `_TARGETS = frozenset((3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88))` in global scope and use it as `if n in _TARGETS`. -- Inada Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Jan 17, 2022 at 7:10 PM Steven D'Aprano <steve@pearwood.info> wrote:
Both are in class scope so the overhead is very small.
Name lookup is faster than building set in most case. So I don't think cost to look name up is important at all. Proposed literal might have significant efficiency benefit only when: * It is used in the function scope. and, * It can not be optimized by the compiler now. I am not sure how many such usages in stdlib. Regards, -- Inada Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 08:04:50PM +0900, Inada Naoki wrote:
Name lookup is faster than building set in most case. So I don't think cost to look name up is important at all.
But the cost to look up the name is *in addition* to building the set. If you saw this code in a review: t = tuple([1, 2, 3, 4, 5]) would you say "that is okay, because the name lookup is smaller than the cost of building the list"? I wouldn't. I would change the code to `(1, 2, 3, 4, 5)`.
Sometimes, now, the compiler *pessimizes* the construction of the frozen set. See b.p.o #46393. -- Steve
data:image/s3,"s3://crabby-images/b4d21/b4d2111b1231b43e7a4c304a90dae1522aa264b6" alt=""
The compiler can figure out that the value of {1, 2, 3} is a set containing the elements 1, 2 and 3. The problem with the value of frozenset({1, 2, 3}) is that the value of frozenset depends on the context. This is because frozenset = print is allowed. According to help(repr): repr(obj, /) Return the canonical string representation of the object. For many object types, including most builtins, eval(repr(obj)) == obj. Consistency suggests that if x = f{1, 2, 3} gives always gives frozenset as the value of x then repr(x) should be the string 'f{1, 2, 3}'. At present, I think, repr(x) always returns a literal if it can. However, changing the repr of frozenset introduces problems of backwards compatibility, particularly in doctests and documentation. Another way to achieve consistency is to make frozenset a keyword, in the same way that None, True and False are identifiers that are also language keywords. Both proposals as stated have negative side-effects. I suggest we explore ways of reducing the above and any other side effects. -- Jonathan
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Jan 17, 2022 at 8:49 PM Steven D'Aprano <steve@pearwood.info> wrote:
I meant it is negligible so we can just ignore it while this discussion.
* I never said it. I just said just lookup cost is not good reason because you listed name lookup cost for rationale. Please stop strawman. * tuple construction is much faster than set construction. So name lookup speed is more important for tuple. * Constant tuple is much much frequently used than constant set.
I saw. And I know all the discussions in the b.p.o. already. But how important it is for Python depends on how often it is used, especially in hot code. Regards, -- Inada Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 11:18:13PM +0900, Inada Naoki wrote:
On my computer, the name lookup is almost a quarter of the time to build a set: [steve ~]$ python3.10 -m timeit "frozenset" 10000000 loops, best of 5: 24.4 nsec per loop [steve ~]$ python3.10 -m timeit "{1, 2, 3, 4, 5}" 2000000 loops, best of 5: 110 nsec per loop and about 10% of the total time: [steve ~]$ python3.10 -m timeit "frozenset({1, 2, 3, 4, 5})" 1000000 loops, best of 5: 237 nsec per loop If I use a tuple instead of the set, it is about 12% of the total time: [steve ~]$ python3.10 -m timeit "frozenset((1, 2, 3, 4, 5))" 2000000 loops, best of 5: 193 nsec per loop So not negligible. -- Steve
data:image/s3,"s3://crabby-images/b4d21/b4d2111b1231b43e7a4c304a90dae1522aa264b6" alt=""
Earlier today in https://bugs.python.org/issue46393, Serhiy Storchaka wrote: As Steven have noted the compiler-time optimization is not applicable here because name frozenset is resolved at run-time. In these cases where a set of constants can be replaced with a frozenset of constants (in "x in {1,2,3}" and in "for x in {1,2,3}") the compiler does it. And I don't think there is an issue which is worth changing the language. Creating a frozenset of constants is pretty rare, and it is even more rare in tight loops. The most common cases (which are pretty rare anyway) are already covered. -- Jonathan
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Mon, Jan 17, 2022 at 3:50 AM Steven D'Aprano <steve@pearwood.info> wrote:
Of course, everyone would -- because tuple displays already exist. I'd suggest refactoring that code even if the compiler could completely optimize it away. Would you let: l = list([1, 2, 3, 4, 5]) pass code review either? even if there were no performance penalty? I wouldn't, because it's redundant, not because it's slower. Also that pattern is actually very common for types that aren't built-in (or even in the stdlib). It's always kind of bugged me that I need to write: arr = np.array([1, 2, 3, 4]) And I'm creating a list just so I can pass it to the array constructor. But in practice, it's not a performance problem at all. And in code in the wild, I'll bet numpy arrays are used orders of magnitude more than frozen sets ;-) Sometimes, now, the compiler *pessimizes* the construction of the frozen
set. See b.p.o #46393.
yup. Using a 'constant' frozenset is slower than 'constant' set, when doing not much else: In [29]: def setfun(): ...: s = {1, 3, 5, 2} ...: i = 3 ...: if i in s: ...: return 'yes' ...: In [30]: def fsetfun(): ...: s = frozenset((1, 3, 5, 2)) ...: i = 3 ...: if i in s: ...: return 'yes' ...: In [31]: %timeit setfun() 194 ns ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) In [32]: %timeit fsetfun() 286 ns ± 2.72 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) But: would you notice if that function did any real work? And I think we could call this one of the many micro-optimizations we have in Python: Don't use a frozenset as a constant when a regular set will do. So it comes down to how often frozen sets as constants are required. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Sun, Jan 16, 2022 at 4:55 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
di you really notunderstand my point? I have never used the frozenset() with a literal. i.e. never had a use case for a frozenset literal. As I mentioned in another note, I do use set displays where they *could* be frozen sets, but I dont think they've ever needed to be. And if there isn't a performance advantage, then I'm fine with that.
Ah yes, I think my brain blipped because there have been multiple proposals on this list for such a thing -- but they were never realized.
>>> squares = f{x**2 for x in range(10)}
Interesting idea. It feels a bit like that's realyl opening a door to a lot if proposals -- is that good or bad thing? -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 12:54:58AM +0000, Oscar Benjamin wrote:
If display syntax for frozensets were to be approved, then we should consider frozenset comprehensions as well. That would be an obvious extension of the syntax. But I don't know if there are technical difficulties with that proposal that might make it less attractive. -- Steve
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 04:43:36PM -0800, Christopher Barker wrote:
I’m a bit confused — would adding a “literal” form for frozenset provide much, if any, of an optimization?
Yes. In at least some cases, it would avoid going through the song and dance: 1. create a frozenset 2. convert the frozenset to a regular set 3. convert the regular set back to a frozenset 4. garbage collect the regular set See the b.p.o. ticket referenced earlier, as well as the disassembled code. In other cases it would avoid: 1. create a set, tuple or list 2. create a frozenset 3. garbage collect the set, tuple or list It would also avoid the name lookup of `frozenset`, and guarantee that even if that name was shadowed or monkey-patched, you still get a genuine frozenset. (Just as [1, 2, 3] is guaranteed to return a genuine list, even if the name "list" is deleted, shadowed or replaced.) At the moment, sets and frozensets still share the same implementation, but some years ago Serhiy suggested that he had some optimizations in mind that would make frozensets smaller than regular sets.
How often do folks need a frozen set literal? I don’t think I’ve ever used one.
If you are writing `if x in ("this", "that", "another", "more")` then you probably should be using a frozenset literal, since membership testing in sets is faster than linear search of a tuple. I think that the CPython peephole optimizer actually replaces that tuple with a frozenset, which is cool, but you can defeat that optimization and go back to slow linear search by refactoring the code and giving the targets a name: targets = ("this", "that", "another", "more") if x in targets: ...
If we did, then f{‘this’: ‘that’} should make a frozen dict, yes?
We would have to get a frozen dict first, but if we did, that would be an obvious syntax to use. -- Steve
data:image/s3,"s3://crabby-images/3ab06/3ab06bda198fd52a083b7803a10192f5e344f01c" alt=""
Not really relevant for the discussion, but CPython automaticly creates a frozenset here (set display with immutable members) as an optimisation.
AFAIK the primary advantage of doing this is that the frozenset gets created once instead of every time the expression is executed. Frozenset itself is not faster than a regular set. Ronald — Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Mon, Jan 17, 2022 at 9:07 AM Ronald Oussoren <ronaldoussoren@mac.com> wrote:
I think it's quite relevant to the discussion, because as far as I can tell, better performance in particular cases is the primary motivator. Funny that this has come up -- not too long ago, I did some experiments with code like the above: and to the surprise of myself and some other long-time Pythonistas I work with, using sets, rather tha tuples in those kinds of constructs, e.g.: if something in <a small collection or literals>: was always as faster or faster with sets than tuples. That was surprising because we assumed that construction of a set would be slower than construction of a tuple. And that was probably the case ten years ago. The proof is in the pudding,so I never bothered to figure out why, but now I know :-) Back to the topic at hand -- IIUC, set constants are already optimized, so the only places having a frozenset display would be when it is a constant, and it has to be a frozenset, where a regular one won't do. And that would only be noticeable if it was in a function that didn't do much else, and was called often. And in that case, it could be put in the global scope to ameliorate some of that cost. I believe Stephens' point is that the benefit may be fairly small, but so is the cost. I'm not so sure. I kind of like the idea myself, and the cost does seem small, but I don't think we should underestimate the cost of even this small complexity increase in the language. Sure, folks don't have toeven know it exists to write fine code, but it would be one more thing that newbies will need to figure out when they see it in others' code. In fact, there' a lot of what I might call "Python Scripters" that aren't even familiar with the set display at all. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/f3b2e/f3b2e2e3b59baba79270b218c754fc37694e3059" alt=""
but I don't think we should underestimate the cost of even this small complexity increase in the language.
Actually, I think _maybe_ in this case the "complexity increase" cost is _negative_. People might waste more time looking for a way of spelling a frozenset literal than just filling in "frozenset(....)". I for one, even knowing that the cost of writing "frozenset({1,2,3})" is negligible, would "feel" better there was a way to spell that without the needless conversions. That said, an appropriate prefix for the {} just as we do for strigns would be nice, and I disagree that it would be a significant source for "bugs". The "@{" is a nice way out if people think "f{}" would be too close to "f()". And "<1,2,3>" just for frozensets are indeed overkill. We already do "literal prefixing" with `"` after all. and formally extending this prefix usage as needed for other literals seems like a nice path. But, as far as bikeshedding go, we also have "literal sufixing" (2.0j anyone?)- maybe "{1,2,3}f" ? On Mon, Jan 17, 2022 at 2:43 PM Christopher Barker <pythonchb@gmail.com> wrote:
data:image/s3,"s3://crabby-images/47610/4761082e56b6ffcff5f7cd21383aebce0c5ed191" alt=""
On Tue, Jan 18, 2022 at 10:02 AM Joao S. O. Bueno <jsbueno@python.org.br> wrote:
I have been following along with not much to comment but this response sparked something in me. After reading all the viewpoints I think I would be +1 on the basic idea, and a +1 on the postfix/suffix syntax just suggested... the other syntaxes I'm more of +0.5 I like the way the suffix FLOWS with the act of writing the program. When I write a set, I am primarily focused on *what I am going to put in it*, and whether or not it should be mutable is kind of a later thought/debate in my head after I have established what it contains. As a dumb example, if my task at hand is "I need to create a bag of sports balls", I am mostly thinking about what goes into that bag at first, so I will write that first:
{Ball("basketball"), Ball("soccer"), Ball("football"), Ball("golf")}
Now I get to the end of that line, and I then sort of naturally think "ok does it make sense to freeze this" after i know what is in it. With the postfix syntax, I then either type the f:
{Ball("basketball"), Ball("soccer"), Ball("football"), Ball("golf")}f
...or not. With a prefix type syntax, or a smooth bracket syntax, either: A. it takes slightly more "work' at this point to "convert" the set to a frozenset, OR B. i have to think about ahead of time-- before i have actually written what is in the set- whether it will be frozen, or not. In contrast, when you are deciding whether to write a list vs a tuple, you are deciding between two things that are fundamentally far more different IDEAS than a "bag of things, frozen or unfrozen". A list is very often more of an open ended stack than it is "an unfrozen tuple". A tuple is very often much more of an object that can be used as a dictionary key, or a member of a set, than it is a container of things (of course, it is a container of things, too). These differences make is a lot easier to choose, ahead of time, which one makes sense before you have even written the line of code. Maybe I'm making too much of this, but I really like the idea of deciding at the END of the set literal whether to tack on that "f". --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
I'm +1 on the idea. I'm happy with the f{ ... } syntax (although I did suggest something else). We already have letter-prefixes, let's stick to them rather than adding something new (which conceivably might one day find another use). Best wishes Rob Cliffe On 18/01/2022 15:53, Ricky Teachey wrote:
data:image/s3,"s3://crabby-images/e15cd/e15cd966f7ed6ae679f0885777782b9db7cb880e" alt=""
One thing to consider is if we're going to have a syntax capable of creating an empty frozenset, we need one that creates an empty set. if f{...} exists, then s{...} should also exist? Regards João Bernardo On Tue, Jan 18, 2022 at 2:59 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
data:image/s3,"s3://crabby-images/efe10/efe107798b959240e12a33a55e62a713508452f0" alt=""
Even if f{1} creates a frozenset, I don't think f{} should create a frozenset. I think it makes more sense to keep f{1: 2} open for frozendict if it ever makes it in. Also, {} should be consisten with f{} (both should create dicts). If you want an empty frozenset, you would have to do it the same way you do it for sets: either frozenset() or f{*()}. Best Neil On Tuesday, January 18, 2022 at 1:19:30 PM UTC-5 João Bernardo wrote:
data:image/s3,"s3://crabby-images/d9209/d9209bf5d3a65e4774057bb062dfa432fe6a311a" alt=""
Not a huge fan of an f-prefix for a frozen set (I prefer just recognizing the case and optimizing the byte code, I don't think frozensets are used often enough to justify its own syntax), but I love {,} for an empty set. On Tue, Jan 18, 2022 at 4:13 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
-- -Dr. Jon Crall (him)
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
On 19/01/22 6:41 am, Rob Cliffe via Python-ideas wrote:
I'm happy with the f{ ... }
Fine with me too. I'd also be happy with making frozenset a keyword. It's hard to imagine it breaking any existing code, it avoids having to make any syntax changes, and all current uses of frozenset() on a constant set would immediately benefit from it. -- Greg
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Wed, Jan 19, 2022 at 11:30:36AM +1300, Greg Ewing wrote:
I'd also be happy with making frozenset a keyword.
- int, float, str, tuple, dict, set, exceptions, len, etc are not keywords, so they can be shadowed (for good or bad); - alone out of all the builtin types and functions, frozenset is a keyword. Shadowing of builtin functions is a double-edged feature. But I think that, having made the decision to make them *not* keywords, they should either *all* be keywords, or *none*. It is weird to have some of them shadowable and some of them not. None, True and False are special values, in a category of their own, so I don't think the inconsistency there is important. But having frozenset _alone_ out of the builtin functions and types a keyword would be a real Wat? moment. I know about "foolish consistency", but then there is also foolish inconsistency. If we could just make one builtin function/type a keyword, with all the optimization advantages that allows for, would we *really* choose frozenset as the most important? I don't know, the answer isn't clear to me. But it certainly wouldn't be my first choice. -- Steve
data:image/s3,"s3://crabby-images/2eb67/2eb67cbdf286f4b7cb5a376d9175b1c368b87f28" alt=""
On 2022-01-18 23:50, Steven D'Aprano wrote:
A suggestion (that you're probably not going to like!) would be to have a way of indicating explicitly that you're referring a builtin, e.g. `frozenset` (using backticks). You could redefine "frozenset", but `frozenset` still refers to the builtin "frozenset".
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
On 19/01/22 6:41 am, Rob Cliffe via Python-ideas wrote:
I'm happy with the f{ ... }
Fine with me too. I'd also be happy with making frozenset a keyword. It's hard to imagine it breaking any existing code, it avoids having to make any syntax changes, and all current uses of frozenset() on a constant set would immediately benefit from it. -- Greg
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Mon, Jan 17, 2022 at 9:55 AM Steven D'Aprano <steve@pearwood.info> wrote:
And semantically, I believe that's correct. The set display MUST create a new set every time, and since the compiler/optimizer can't assume that the name 'frozenset' hasn't been rebound, it has to then call the function to get its result. So it will, by language definition, build a set from the given values, then build a frozenset from that. It optimizes "add 1, add 2, add 3" down to "here's a frozenset, add them", but it can't skip the whole job. def frozenset(stuff): print("You can't optimize me out") stuff.remove(2) return builtins.frozenset(stuff) If the compiler did any of the optimizations it's currently not doing, this code would break. The only way to solve this is an actual frozenset literal, and I would love to have one, but it stands or falls on the syntax. ChrisA
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On Sun, 16 Jan 2022 at 22:55, Steven D'Aprano <steve@pearwood.info> wrote:
I may just be reiterating your point here (if I am, I'm sorry - I'm not completely sure), but isn't that required by the definition of the frozenset function. You're calling frozenset(), which is defined to "Return a new frozenset object, optionally with elements taken from iterable". The iterable is the (non-frozen) set {1, 2, 3}. The function def f1(): return f{1, 2, 3} (using f{...} as a frozenset literal) does something different - it returns the *same* object, compiled once at function definition time, every time it's called. So frozenset literals would allow us to express something that we can't currently express (at least not without going through some complicated contortions) in Python. I'm not sure it's a particularly *important* thing to be able to do, but whatever ;-) Paul
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Mon, Jan 17, 2022 at 10:14 AM Paul Moore <p.f.moore@gmail.com> wrote:
Where is that definition? According to help(frozenset), it will "[b]uild an immutable unordered collection of unique elements", so it doesn't necessarily have to be a brand new object. With mutables, it does have to return a new one every time (list(x) will give you a shallow copy of x even if it's already a list), but with immutables, it's okay to return the same one (str and tuple will return self). ChrisA
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On Mon, 17 Jan 2022 at 01:11, Chris Angelico <rosuav@gmail.com> wrote:
https://docs.python.org/3/library/functions.html#func-frozenset
I'd read "build" as meaning it is a new object. But regardless, I view the language documentation as authoritative here.
That's an optimisation, not a guarantee, so while it's relevant, it's not sufficient (IMO). However, it's not clear how relevant this is to the actual proposal, so the precise details probably aren't that important. Paul
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Sun, Jan 16, 2022 at 3:14 PM Paul Moore <p.f.moore@gmail.com> wrote:
Exactly. In a way, what this might do is open the door to interning (some) frozen sets, like cPython does for some ints and strings. But that's only helpful is they are very heavily used, which they are not. And I'm sure a way to intern them could be done without a literal anyway. (using f{...} as a frozenset literal) does something different - it
returns the *same* object, compiled once at function definition time, every time it's called.
why/how would it do that? It *could* do that -- as above, with interning. but: def fun(): return "some string" doesn't return the same string, unless it's iterned, which is an implementation detail, yes? Stephen -- I'm going to paraphrase you now: We don't make changes to Python syntax unless there is a compelling reason. I'm really lost on what the compelling reason is for this one? There are any number of python types with no "literal" (well, not any number, it's quite defined. but still) heck, we don't even have literals for Decimal. Why this one? -CHB --- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Sun, Jan 16, 2022 at 6:34 PM Chris Angelico <rosuav@gmail.com> wrote:
I *think* that's only if it's interned -- and in any case, is a guarantee of the language, or an optimization? I tried to test with a longer string, and it was the same one, but then I found in this arbitrary post on the internet: ... in Python 3.7, this has been changed to 4096 characters ( I guess I haven't played with that since 3.7) -- I haven't actually tried with a string literal linger than 4096 chars :-) But this certainly doesn't: In [1]: def fun(): ...: return [1,2,3] ...: In [2]: l1 = fun() In [3]: l2 = fun() In [4]: l1 is l2 Out[4]: False So the issue is immutability and interning, not "literal display". My point is that a frozenset litteral could open the door to interning frozen sets, but that could probably be done anyway. And why? Are they heavily used in any code base? -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Mon, Jan 17, 2022 at 4:38 PM Christopher Barker <pythonchb@gmail.com> wrote:
When you return a literal, like that, the literal becomes a function constant. That's not necessarily a language guarantee, but I would be hard-pressed to imagine a Python implementation that copied it for no reason.
The difference is that list display isn't a literal, but a simple quoted string is. The square brackets are guaranteed to construct a new list every time.
My point is that a frozenset litteral could open the door to interning frozen sets, but that could probably be done anyway. And why? Are they heavily used in any code base?
A frozenset literal would allow the same frozenset to be reused, but it's not about interning. You're right that interning would be possible even without a literal, but in order to intern frozensets without a literal, we'd still need to construct a brand new (non-frozen) set to pass to the constructor, since there needs to be a way to indicate which elements we want. Technically, there are no "complex literals", but thanks to constant folding, a value like 3+4j can become a function constant. In theory, I could imagine a Python implementation that treats 3 and 4j as constants, but adds them at runtime, producing a unique complex object every time. But since there are no shadowable name lookups, it's easy enough to take advantage and gain a nice optimization. In theory, what could be done for frozensets would be to use a *tuple* for the arguments, which can then be used for an intern-style lookup: _frozenset = frozenset _fscache = {} def frozenset(t): if type(t) is tuple: # strict check, don't allow subclasses if t not in _fscache: _fscache[t] = _frozenset(t) return _fscache[t] return _frozenset(t) This would remove some of the copying, but at the cost of retaining all the tuples (in other words: now you have a classic caching problem). Could be useful for situations where it really truly is constant, but it's still a bit of a roundabout way to do things. A true frozenset literal can't possibly be shadowed, so the compiler would be free to do all the same optimizations that it does with tuples. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 06:10:58PM -0800, Christopher Barker wrote:
In a way, what this might do is open the door to interning (some) frozen sets, like cPython does for some ints and strings.
I don't think that interning is relevant here. Interning is orthogonal to the existence of a frozenset display. You don't need to use a literal or display syntax to take advantage of interning: # At least some versions of CPython >>> m = int('78') >>> n = int('78') >>> m is n True And the existence of a literal or display syntax does not imply interning, e.g. floats, tuples. If there was a use for interning frozensets, we could do so regardless of whether or not there is a display syntax.
Inside a function, CPython can cache literals and immutable displays made from purely literals, e.g. the tuple (1, 2, 3) but not the tuple (1, 2, x)) in co_consts. But we can't do any of that if the only way to create a frozenset is to lookup the name "frozenset" and call that function. So while not all frozensets inside a function could be built at compile-time and retrieved from co_consts, some of them could -- if only we had a display syntax equivalent to tuple displays. Ironically, if the compiler can prove that a regular mutable set display is only ever used once inside a function, CPython will make it a frozenset:
So ironically the only way to put a constant frozenset into co_consts is to write it as a mutable set and only use it once!
We don't make changes to Python syntax unless there is a compelling reason.
"Compelling" depends on the cost of making the change. The bigger the cost, the more compelling the reason. The smaller the change, then the reason need not be as huge. I think this is a small change with a moderate benefit, which puts the cost vs benefit ratio on the benefit side.
Nearly all of those types are not builtins. As far as I can see, frozenset is the only commonly used, immutable, builtin type that doesn't have a literal display syntax, and the consequence of that is that code using frozensets does much more work than needed. -- Steve
data:image/s3,"s3://crabby-images/2eb67/2eb67cbdf286f4b7cb5a376d9175b1c368b87f28" alt=""
On 2022-01-17 06:07, Greg Ewing wrote:
U+2744 Snowflake, anyone?
my_frozenset = ❄{1, 2, 3}
That makes me think about using '@': my_frozenset = @{1, 2, 3} It's currently used as a prefix for decorators, but that's at the start of a line. It could be a problem for the REPL, though:
@{1, 2, 3}
Looks like a decorator there, but you could parenthesise it in that case:
(@{1, 2, 3})
data:image/s3,"s3://crabby-images/83003/83003405cb3e437d91969f4da1e4d11958d94f27" alt=""
On 2022-01-16 00:27, Steven D'Aprano wrote:
I don't like that syntax. In the first place, as others have noted, it treads too close to existing function-call and indexing syntax. In those syntaxes, `f` is a name, whereas here it is not a name but just a syntactic marker (as in f-strings). In the second place, I don't like the idea of using prefixes to change the types of objects like this. As far as I know we only have one example of that, namely byte strings. The other string prefixes do not create different types, they just affect how the literal is eventually parsed into a string. I would prefer some sort of augmented bracket notation, like {:1, 2, 3:} or something along those lines. I do think it would be nice to have a frozenset literal, because I'm a purity-beats-practicality person and it bugs me that there isn't one. However, from that perspective I'd much rather have a frozen-dict type and then a syntax for it. :-) -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 05:53:19PM -0800, Brendan Barnwell wrote:
I think the horse has well and truly bolted on that. We've had syntactic markers that look like names forever: r"" goes back to Python 1.x days. Given the proposed syntax `f{a, b, c}`, that would be a transformation of three arguments a, b and c to a frozenset. That makes it closer to a function call than r"abc", which is not a transformation at all. So if this looks like a function call, good, that's because semantically it is kind of like a function call.
Okay, so we have a precedence where a prefix on the delimiter changes the type: b'' is a different type to ''. In Python 2, it was the other way around, it was u'' that returned a different type. I think that the use of a prefix has been a moderately good design, good enough to cautiously extend it beyond raw strings to unicode strings, f-strings, and byte-strings, and beyond that to other delimiters. But if you think that the difference between '' and b'' is a terrible design flaw that has caused all sorts of badness in Python 3, I'm all ears. Please explain, and persuade me.
I would prefer some sort of augmented bracket notation, like {:1, 2, 3:} or something along those lines.
You don't think that people will associate {:1, 2:} with badly-written dicts rather than sets? If not, is there some sort of mental association between frozensets and those colons inside the set syntax? -- Steve
data:image/s3,"s3://crabby-images/083fb/083fb9fce1476ebe02d0a5d8c76d5547020ebe75" alt=""
My preferred syntax for a frozenset literal would be something like {1, 2, 3}.freeze() This requires no new syntax, and can be safely optimized at compile time (as far as I can tell). set.freeze would be a new method of sets which could also be used at run time. It would return a new frozenset object and wouldn't alter the set object (so perhaps the name I suggested isn't ideal). Of course frozenset.freeze would just return itself.
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Wed, Jan 19, 2022 at 6:31 PM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
+0.5. I'm not sure if CPython is currently optimizing this (I tried "spam".upper() and it didn't constant-fold), but it certainly could. Making this work would depend on several optimizations: 1) Recognize literal.method() as being potentially constant-foldable 2) Marke some methods as pure and therefore optimizable 3) Recognize that the (mutable) set to the left of .freeze() can be frozen just as "a in {1,2,3}" can But yes, in theory, this could work. There's no way that it can be semantically wrong, no way to shadow that method. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Wed, Jan 19, 2022 at 07:20:12AM +0000, Ben Rudiak-Gould wrote:
I like that, it is similar to a proposal for docstrings: https://bugs.python.org/issue36906 This would be safe to optimize at compile time, so long as the contents of the set were all literals. And for implementations that didn't optimize at compile time, it would be no worse than the situation now. Perhaps a better name would be "frozen()` since that doesn't imply an in-place operation like "freeze" does. -- Steve
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Wed, Jan 19, 2022 at 07:12:04AM -0500, Ricky Teachey wrote:
Why does it need to be called at all?
{1, 2, 3}.frozen
For the same reason that most methods are methods, not properties. The aim of a good API is not to minimize the amount of typing, it is to communicate the *meaning* of the code as best as possible. `{1, 2, 3}.frozen` says that the result is an attribute (property, member) of the set. Like *name* to a person, or *tail* to a dog, the attribute API represents something which is part of, or a quality of, the object. The frozenset is not an attribute of the set, it is a transformation of the set into a different type. A transformation should be written as an explicit function or method call, not as attribute access. Yes, we can hide that transformation behind a property just to save typing two characters, but that is an abuse of notation. Also, it is standard in Python to avoid properties if the computation could be expensive. Copying a large set or millions of elements into a frozenset could be expensive, so we should keep it a method call. -- Steve
data:image/s3,"s3://crabby-images/f3b2e/f3b2e2e3b59baba79270b218c754fc37694e3059" alt=""
Here is another hint that this usage would not resolve the problem of having a literal frozenset. Even in the core of this discussion, with folks participating and knowing what they are talking about, the first thing that comes to mind when seeing a method call is that the target set would be copied. The messages are just about no-need to copy this into a frozenset. But upon seeing a method call, we can just think first of a runtime behavior Correctly, btw. Any optimization there would be an exception, that people would have to know by heart. On Wed, Jan 19, 2022 at 10:31 AM Steven D'Aprano <steve@pearwood.info> wrote:
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Wed, Jan 19, 2022 at 6:21 AM Joao S. O. Bueno <jsbueno@python.org.br> wrote:
sure, and in the general case, a_set.frozen() would presumably return a copy. as does frozenset(a_set) -- which is all we have at the moment.
But people shouldn't have to think about to trigger a compiler optimization anyway, that's just a nice bonus the compiler does for you. If this does all come to pass, then: s = {3,8,2}.frozen() will be slightly faster, in some case, than s = frozenset({3,8,2} but the result would be the same. There are plenty of tricks to in python to get a touch more performance, this would just be one more and frankly pretty rare that it would make an noticable difference at all. +1 on this +0 on f{} -1 on making frozenset a keyword -CHB On Wed, Jan 19, 2022 at 10:31 AM Steven D'Aprano <steve@pearwood.info>
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/47610/4761082e56b6ffcff5f7cd21383aebce0c5ed191" alt=""
On Thu, Jan 20, 2022 at 3:35 AM Stephen J. Turnbull < stephenjturnbull@gmail.com> wrote:
Another agreement with Chris' ratings: +1 for .frozen() +0 on f{} -1 on keyword for frozenset But that still leaves the literal for the empty set as a problem. I'm still not sure what I think about {,} as an empty set. I tend to think it looks like "empty dictionary" and so could be confusing. Perhaps something like set.frozen() or set().frozen() could be optimized? --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
data:image/s3,"s3://crabby-images/9304b/9304b5986315e7566fa59b1664fd4591833439eb" alt=""
Well, I've just waded through this discussion. This all feels to me like a special case of "wanting a constant for bytecode". What is we had a "freeze" operator, eg: |foo| which would produce a frozen version of foo. I'm liking the vertical bars myself, I think because it feels vaguely slightly analogous to "absolute value". So (assuming we can squeeze it into the expression syntax): |{1,2,3}| always makes a direct frozen set on the same basis that: x in {1,2,3} directly makes a frozenset by expression inspection. Then Paired with a __freeze__ dunder method, this applies to any type, not just sets. (Where appropriate of course.) So: |{1,2,3}| frozen set |[1,2,3]| tuple! |any-iterable| tuple! |{1:2, 3:4}| frozen dict Ths saves us (a) inventing a special syntax for frozen sets and (b) gateways to freezing many things, starting with the obvious above, via the __freeze__ dunder method. This feels more general and less bikeshedable. My main question is: is the syntax unambiguous? Cheers, Cameron Simpson <cs@cskk.id.au>
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Fri, 21 Jan 2022 at 19:53, Cameron Simpson <cs@cskk.id.au> wrote:
It's worth noting that a rejected PEP isn't the final and uneditable conclusion of a proposal. If you can show that something in the past seventeen years means this should be revisited, then by all means, revive the idea. (I suspect that, in this case, the rejection still applies - sets are still the only thing you'd viably want to freeze - but the option is there if you feel you can answer the original objections. Seventeen years is a long time.) ChrisA
data:image/s3,"s3://crabby-images/2eb67/2eb67cbdf286f4b7cb5a376d9175b1c368b87f28" alt=""
On 2022-01-21 00:18, Cameron Simpson wrote:
I don't know whether it's unambiguous, but it could be confusing. For example, what does this mean: | a | b | ? It's: | (a | b) | I think. The problem is that '|' could be an opening '|', a closing '|', or an infix '|'. You don't get this problem with differing open and closing pairs such as '(' and ')'.
data:image/s3,"s3://crabby-images/9304b/9304b5986315e7566fa59b1664fd4591833439eb" alt=""
On 21Jan2022 01:16, MRAB <python@mrabarnett.plus.com> wrote:
Yeah.
Probably. Running precedence the other way (or even worse, letting the valid combinations just shake out) would be confusing.
Yes, but have you _seen_ the bickering about the existing bracket choices just for frozenset? Eww. Hence the going for a distinct operator altogether. Yes, I'd prefer brackets of some kind too, but they're taken. Cheers, Cameron Simpson <cs@cskk.id.au>
data:image/s3,"s3://crabby-images/f3b2e/f3b2e2e3b59baba79270b218c754fc37694e3059" alt=""
lower and @) new brackets for {} . I can't see how they are "all taken" when the strongest argument against prefixing seems to be "but _only strings_ should have prefixes". (with the "typing f{} instead of f() is going to be a bug magnet" as a runner up). None of those stand up to any logical analysis It is ok voting that "the language should not be made more complex at this point, and we won't add any new syntax for a frozenset", but I think that if it is agreed that frozensets are ok, a prefix is just straightforward. And then, adopting prefixes for curly braces, you have 52 other bracket types to try and sell this "generic freezer operator" you are presenting here. :-). On Fri, Jan 21, 2022 at 5:52 AM Cameron Simpson <cs@cskk.id.au> wrote:
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Sat, 22 Jan 2022 at 00:30, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
Distinguishing upper and lower would be even more inconsistent, since strings don't. Or rather, they sorta-kinda do, but always define that x and X mean the same thing. The at sign is an operator, and cannot be used as a prefix. So you have 26, which is approximately 25 more than you will actually want, or maybe 23 if there's an explicit prefix for "set" and "dict" in there somewhere.
Nice how you are the judge of whether it stands up to logical analysis. Nice way to pooh-pooh an argument without any actual reasoning. Oh wait, that's exactly what logical analysis would be... So actually it's your rebuttal that doesn't stand logical analysis :)
Prefixes on braces are not a generic freezer operator. An operator has to be able to act on any object, not a specific syntactic construct. Unless you're suggesting that q{obj} would mean "obj, but frozen", which would be a quite inconsistent protocol, given that it's doing the same job as a function call, looks disturbingly similar to a function call, but is actually an operator. I don't know of anything else in Python that behaves that way, except *maybe* the way a comma can create a tuple, or can be used as part of a series of things. (And that's a pretty weak line of argument, except in specific cases where it's actually ambiguous - see assert (x,y) for instance.) ChrisA
data:image/s3,"s3://crabby-images/9304b/9304b5986315e7566fa59b1664fd4591833439eb" alt=""
Joao, apologies for replying late - I never got back to this message. I wrote:
On 21Jan2022 10:29, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
Agreed. I think I found a short prefix "f{" hard to see, but that is just "practice". I'm not inherently in the "but _only strings_ should have prefixes" camp. My remark about the bickering was aimed at unadorned brackets eg the "{{frozen literal set here }}" suggestion. Anyway, this post is just for clarification, not arguing any particular point. Cheers, Cameron Simpson <cs@cskk.id.au>
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Jan 21, 2022 at 11:18:27AM +1100, Cameron Simpson wrote:
A frozen "any iterable" is not necessarily a tuple. For example, a frozen binary tree should probably keep the tree structure and methods; a frozen dict.keys() should be a frozen set; and its not clear what a frozen iterator should do. Should it run the iterator to exhaustion? Seems odd. What about non-collections? What's a frozen re.MatchObject? -- Steve
data:image/s3,"s3://crabby-images/47610/4761082e56b6ffcff5f7cd21383aebce0c5ed191" alt=""
On Fri, Jan 21, 2022 at 5:04 AM Steven D'Aprano <steve@pearwood.info> wrote:
lord have mercy, what a can of worms this could end up being!: frozen iostream frozen property frozen function object frozen module object frozen iterator frozen datetime i mean, i could certainly imagine rational (maybe even useful...?) ideas for ALL of these. can you imagine the endless discussion about what to do with the shiny new frozen operator, for every object under the sun? obviously it would to nothing by raise an error by default. but people would be asking to freeze everything and there would be mountains of ideas threads and it would never end. i'm not saying that reason means we don't have such an operator, but it seems to me this just shows the decision on PEP 315 was the right one. why have such a general operator, that could conceivably- and will be endlessly requested and argued- be expanded to apply MANY things that even though YAGNI for nearly all of them, when the only REAL need is only for a frozenset? --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Fri, 21 Jan 2022 at 22:52, Ricky Teachey <ricky@teachey.org> wrote:
Let's be fair here... The idea of freezing is to make it hashable, so there's no point talking about freezing a function, module, or datetime, since they are already hashable. Don't saddle the proposal with issues it doesn't have :) (Technically this applies to an re.Match object too, actually, although I had to check to be sure. I've never once wanted to use one as a dict key. In contrast, I most certainly *have* used functions as dict keys, and it's safe and dependable.) ChrisA
data:image/s3,"s3://crabby-images/47610/4761082e56b6ffcff5f7cd21383aebce0c5ed191" alt=""
On Fri, Jan 21, 2022 at 6:57 AM Chris Angelico <rosuav@gmail.com> wrote:
Great point! I learned something. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Jan 21, 2022 at 10:56:42PM +1100, Chris Angelico wrote:
Let's be fair here... The idea of freezing is to make it hashable,
And immutable.
so there's no point talking about freezing a function, module,
Neither of which are immutable. -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Sat, 22 Jan 2022 at 09:45, Steven D'Aprano <steve@pearwood.info> wrote:
Okay, so what would freezing a function be useful for, then? What is your use-case here? Mutable objects can't be used as dict keys, so there is a strong use-case for versions of them which can. But when arbitrary attributes don't contribute to equality, freezing becomes largely irrelevant. I can subclass frozenset and allow attributes. Do we then need a "really frozen set"? What is the point of such a protocol, given that my subclass is still hashable? Or are you just arguing for the sake of arguing? ChrisA
data:image/s3,"s3://crabby-images/9304b/9304b5986315e7566fa59b1664fd4591833439eb" alt=""
On 21Jan2022 20:57, Steven D'Aprano <steve@pearwood.info> wrote:
Yeah, I had misgivings myself. I can imagine the freeze operator falling back to iteration if there's no __freeze__dunder (like bool falls back to length). I can equally imagine just raising a TypeError for no __freeze__. More inclined to the latter on reflection - iterators get consumed.
For example, a frozen binary tree should probably keep the tree structure and methods; a frozen dict.keys() should be a frozen set;
Sure. According to their __freeze__.
What about non-collections? What's a frozen re.MatchObject?
A type error. Let's not get insane. The idea is a generic operator, but not everything can be used with such an operator. Cheers, Cameron Simpson <cs@cskk.id.au>
data:image/s3,"s3://crabby-images/083fb/083fb9fce1476ebe02d0a5d8c76d5547020ebe75" alt=""
There seem to be two different reasons people want a generic freeze syntax: 1. Making a hashable copy of an arbitrary object 2. Avoiding O(n) rebuilding of literals on every use (a constant for bytecode, as you put it) In both 1 and 2, not only the object but all of its children need to be immutable. For 2, that's the status quo, but for 1 it seems like a bigger problem. There is already a solution of sorts for 1: pickle. It may even be more efficient than a subobject-by-subobject deep freeze since it stores the result contiguously in RAM. On the other hand it can't share storage with already-hashable objects. For the second one, I would rather have an "inline static" syntax (evaluates to the value of an anonymous global variable that is initialized on first use), since it would be more broadly useful, and the disadvantages seem minor. (The disadvantages I see are that it's built per run instead of per compile, it's theoretically mutable (but mutation would be evident at the sole use site), and it may use more heap space depending on how constants are implemented which I don't know.) As for the syntax... well, backticks, obviously...
data:image/s3,"s3://crabby-images/9304b/9304b5986315e7566fa59b1664fd4591833439eb" alt=""
On 21Jan2022 12:22, Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
The purpose of the operator (aside from genericity) was to enable expression inspection by the compiler so that it can do for "|{1,2,3}|" what it already does for "x in {1,2,3}". The "generic" side to the operator approach was to provide a "freeze" protocol one could use for generic objects.
In both 1 and 2, not only the object but all of its children need to be immutable.
This is not strictly true. My own notion was a "shallow" freeze, not a recursive freeze. For hashability, provided the hash and equality tests only consider the frozen components, there's no need for a deep freeze - just a frezze of the relevant aspects. Cheers, Cameron Simpson <cs@cskk.id.au>
data:image/s3,"s3://crabby-images/5f8b2/5f8b2ad1b2b61ef91eb396773cce6ee17c3a4eca" alt=""
On Thu, 20 Jan 2022 at 10:19, Ricky Teachey <ricky@teachey.org> wrote:
I really don't understand (having read everything above) why anyone prefers {1,2,3}.frozen() over f{1,2,3}. Yes, some people coming from some other languages might get confused (e.g. in Mathematica this is function call syntax) but that's true of anything: you have to learn Python syntax to use Python. The fact that {1,2,3} is a set and f{1,2,3} is a frozenset is not difficult to explain or to understand, especially in a language that already uses single letter prefixes for other things. The .frozen() method is a strangely indirect way to achieve a minor optimisation. Outside of attempting to achieve that optimisation it's basically useless because any time you would have written obj.frozen() you could have simply written frozenset(obj) so it does nothing to improve code that uses frozensets. With f{...} you have a nice syntax that clearly creates a frozenset directly and that can be used for repr. This is some actual code that I recently wrote using frozensets to represent monomials in a sparse representation of a multivariate polynomial:
With the f{...} proposal you have actual syntax for this:
With .frozen() it's
That difference in code/repr may or may not seem like an improvement to different people but that should be the real point of discussion if talking about a frozenset literal. The performance impact of frozenset literals is not going to be noticeable in any real application. My polynomial class makes extensive use of frozensets and is something that I do need to be as fast as possible. I just looked through the code I have for that class and none of the performance sensitive routines could benefit from this because they all actually need to build their elements in a dict before converting to a frozenset anyway e.g.: def mul(self, other): """multiply two (frozenset) monomials""" powermap = dict(self) for g, n in other: other_n = powermap.get(g) if other_n is None: powermap[g] = n else: powermap_n = other_n + n if powermap_n: powermap[g] = powermap_n else: powermap.pop(g) return frozenset(powermap.items()) I've just profiled this and the call to frozenset is always dwarfed by the time taken in the preceding loop which shows how cheap converting between builtins is compared to pretty much any other code. If you're using literals then of necessity you are talking about small sets. Even just using a small set over a small tuple is a hardly noticeable difference in speed in most situations: In [12]: s = {1,2,3} In [13]: t = (1,2,3) In [14]: timeit 2 in s 44.9 ns ± 0.17 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) In [15]: timeit 2 in t 59.9 ns ± 5.67 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) -- Oscar
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Fri, 21 Jan 2022 at 22:52, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
If set.frozen() is optimized, then str.upper() can be optimized the same way, which means there's a lot of places where constant folding can be used. We commonly write code like "7*24*60*60" to mean the number of seconds in a week, confident that it'll be exactly as fast as writing "604800", and there's no particular reason that method calls can't get the same optimization, other than that it hasn't been done yet. While dedicated syntax might be as good, it also wouldn't help with string methods (or int methods - I don't see it a lot currently, but maybe (1234).to_bytes() could become more popular), and it would also be completely backward incompatible - you can't feature-test for syntax without a lot of hassle with imports and alternates. In contrast, code that wants to use set.frozen() can at least test for that with a simple try/except in the same module. Not one of the proposed syntaxes has seen any sort of strong support. This isn't the first time people have proposed a syntactic form for frozensets, and it never achieves sufficient consensus to move forward.
With f{...} you have a nice syntax that clearly creates a frozenset directly and that can be used for repr. This is some actual code that I recently wrote using frozensets to represent monomials in a sparse representation of a multivariate polynomial:
"Clearly" is subjective. Any syntax could be used for repr, including {1,2,3}.frozen(), so f{1,2,3} doesn't have any particular edge there. Personally, I think that string literals are not the same thing as tuple/list/dict/set displays, and letter prefixes are not as useful on the latter.
Yes, it most certainly would change the repr. I don't see why that's an issue.
That difference in code/repr may or may not seem like an improvement to different people but that should be the real point of discussion if talking about a frozenset literal. The performance impact of frozenset literals is not going to be noticeable in any real application.
My polynomial class makes extensive use of frozensets and is something that I do need to be as fast as possible. I just looked through the code I have for that class and none of the performance sensitive routines could benefit from this because they all actually need to build their elements in a dict before converting to a frozenset anyway e.g.:
I don't understand polynomials as frozensets. What's the point of representing them that way? Particularly if you're converting to and from dicts all the time, why not represent them as dicts? Or as some custom mapping type, if you need it to be hashable?
Well, yes. This sort of code isn't what's being optimized here. Massaging data between different formats won't be enhanced by a literal form.
Again, not the point of the literal form. There's no significant difference between a set and a frozenset when testing for inclusion, so you're testing something meaningless here. The point of a literal form is that it is guaranteed to mean what you intend it to mean. That's why we (usually) use {"a":1, "b":2} rather than dict(a=1, b=2) - not because it's faster (it is, but not by that big a margin), but because it doesn't depend on the name dict. Maybe that's not important to your code. That's fine. Not every feature has to benefit every programmer. A frozenset display syntax would only benefit me in a few places. But the benefits aren't where you're looking for them, so naturally you're not going to find them there. :) ChrisA
data:image/s3,"s3://crabby-images/5f8b2/5f8b2ad1b2b61ef91eb396773cce6ee17c3a4eca" alt=""
On Fri, 21 Jan 2022 at 12:15, Chris Angelico <rosuav@gmail.com> wrote:
The proposal for .frozen() is not about optimising method calls on literals in general: the proposal is to add a method that is basically redundant but purely so that calls to the method can be optimised away.
I'm not saying it's an issue. That was a genuine question. So I guess you'd expect this:
This btw is my real point of my post which you seem to have missed (I probably should have kept it more direct):
That difference in code/repr may or may not seem like an improvement to different people but that should be the real point of discussion if talking about a frozenset literal. The performance impact of frozenset literals is not going to be noticeable in any real application.
If we take performance out of the equation would anyone actually propose to add a .frozen() method so that obj.frozen() could be used instead of frozenset(obj)? If so then what is the argument for having a redundant way of doing this?
Hashability is the point. The polynomial is a dict mapping monomials to coefficients and the monomials are frozensets of factors so that they are hashable with unordered equality. Another option would just be a sorted tuple of tuples and then instead of frozenset(d.items()) you'd have tuple(sorted(d.items())) but that's slower in my timings (for all input sizes). Any custom type with pure Python __eq__/__hash__ methods would slow everything down because apart from a couple of functions like the one I showed all these objects are used for is as dict keys. -- Oscar
data:image/s3,"s3://crabby-images/f3b2e/f3b2e2e3b59baba79270b218c754fc37694e3059" alt=""
Sorry for deviating here, but this kind of argumentation is one that is sensitive for me - but here it is: I personally do not think the comment above adds anything to the discussion at hand. We've been presented to a real-world use case of frozensets that would benefit in readability from having a dedicated literal. How good it is to question the way it is coded without even checking the project out? (and even so, do that publicly in a non related discussion?) I had this happen to me in an email here, when I tried an early version of match/case in a code in a project of mine. Despite being a more or less internal API, the code was bashed in a way, in an unasked for code review, it took out the fun I had in coding the project for months. So, please, take care when deviating from the discussion at hand. Back on topic: It looks like this thing of "prefixes are valid for strigns and no good for anything else" is, as yoiu put it, Chris, a personal thing. Do we have anyone else in this thread commenting (or even "+1ing") on that side? As I've mentioned a couple of times before: is there any other arguments against "f{}" other than "prefixes should be for strings only" (and the"bug magnet" perceived by many as a plain incorrect statement )? If there is not, then we are not at "there is no viable syntax", as prefixable braces are perfectly viable. It is whether it should be done or not, despite some people finding it ugly, which is subjective. At that point, I argue that despite adding still more things to the syntax, it is one that will spare time in average than the other way around, due to the time people, needing frozensets for the first time in any project, waste looking for a literal syntax for them only to find out there is not any. On Fri, Jan 21, 2022 at 9:16 AM Chris Angelico <rosuav@gmail.com> wrote:
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Sat, 22 Jan 2022 at 00:56, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
My comment was part of a lengthy response to a post which itself added little, because it *would not benefit from literals*. My post was arguing that this was the case. It was, in fact, entirely part of this discussion. I did take care, and I was making points that are fully relevant to whether a literal syntax would be beneficial or not.
Every opinion expressed in this thread is a personal one. You can't dismiss one of them that way without dismissing them all. Does the viewpoint have merit? It either does, or does not, regardless of who came up with it. Please take care when deviating from discussion of actual arguments to discussion of people.
We don't have anyone justifying it or disproving it. Only people saying whether they like it or not. Ultimately, it is a matter of aesthetics - whether the syntax is ugly or elegant. My opinion is that it is ugly. Your opinion, it seems, is that it is elegant. These are two equally valid opinions.
On the contrary; if it is ugly, it probably shouldn't be done, so the two questions are actually the same question. Unless by "viable syntax" you are distinguishing from syntaxes which can be rejected swiftly as being ambiguous or already legal with other semantics (for instance, ({1,2,3}) should be rejected as unviable), but few of the proposed syntaxes fall foul of that.
Have you any stats on this? There is no literal/display syntax for datetimes, regular expressions, ranges, bytearrays, or a host of other common types. How often do people reach for a literal syntax for those? (I say "literal/display" since, technically, dicts have a display syntax, not a literal, and complex numbers are written as a constant-folded sum, but in practical terms, those count. The other types don't even get that.) If frozensets are so special that they need syntax, why not ranges, which are used far more frequently? With the method idea, there's a fully backward compatible way to add it to the language, and an optimization that can be applied to many types. So I'm -1 on dedicated syntax for frozensets, -1 on prefixes on braces, and +0.5 on set.frozen() and the ability to constant-fold that. But that's just personal opinion. Like everything else in this thread. ChrisA
data:image/s3,"s3://crabby-images/9304b/9304b5986315e7566fa59b1664fd4591833439eb" alt=""
[... big snip...] On 22Jan2022 01:41, Chris Angelico <rosuav@gmail.com> wrote:
Well, some data. a) if the, for example, f{constant-set-display} syntax is generalisable (notionally, where we generalise it or not), it offers a path to frozen literals for other things via a prefix notation, should be become desirable. b) literal regexps: people use these _all the time_, conceptually. To the latter: Perl has literal regexps, you just write: /regexp-goes-here/ It even has a verbose commentable form of that to aid writing understandable regexps (hahaha!). What, we're not Perl? True, but you see _lots_ of code like this: # apologies if I have the argument order wrong here if re.match('regexp-string', 'target-string'): which effectively relies on the re module's autocaching of regexps to be efficient while skipping the more overt: # top of module foo_re = re.compile('regexp-string'[,options]) .... # in the main code if m := foo_re.match('target-string'): The former is nothing else but a metaphor for a literal regexp. I'm _not_ arguing for regexp literals in Python - IMO they're undesirable, a separate argument. (Note: not "undesired", just undesirable: to be avoided except when they're the right solution.) Cheers, Cameron Simpson <cs@cskk.id.au>
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Tue, 1 Feb 2022 at 09:02, Cameron Simpson <cs@cskk.id.au> wrote:
In other words: If we create literals for a bunch of different things, then frozensets would be neat, but if we don't, then other types should have priority. Is that correct? I'm of the opinion that range objects should get literal syntax before frozensets do. You're of the opinion that regexps should get literal syntax before frozensets do. We're broadly in agreement here. I would *much* rather see {1,2,3}.frozen() be constant-foldable than f{1,2,3} as a literal. (That said: I'm actually not convinced that regexps need literal syntax, because it wouldn't benefit alternate regexp engines on PyPI.) ChrisA
data:image/s3,"s3://crabby-images/9304b/9304b5986315e7566fa59b1664fd4591833439eb" alt=""
On 01Feb2022 09:13, Chris Angelico <rosuav@gmail.com> wrote:
I don't have a priority really. I was trying to address your "There is no literal/display syntax for datetimes, regular expressions, ranges, bytearrays, or a host of other common types. How often do people reach for a literal syntax for those?" question with a concrete commonplace example. I expect that people often want a literal (or at least convenient) way to express whatever they're working with. "Make a class" only goes so far, though it is pretty far.
I'm of the opinion that range objects should get literal syntax before frozensets do.
Maybe; I'm of the opinion that maybe they should use a similar syntactic approach to avoid bloating the syntaxes one must deal with. Isn't a slice close to a literal range?
You're of the opinion that regexps should get literal syntax before frozensets do.
Very much NOT. I explicitly said I wasn't arguing for that, just arguing that many people write code which would be using literal regexps if they were there. Personally, literal regexps are NOT ahead of frozensets in my mind. To quote the last piece of my post: I'm _not_ arguing for regexp literals in Python - IMO they're undesirable, a separate argument. (Note: not "undesired", just undesirable: to be avoided except when they're the right solution.) I'm kind of -0.1 on literal regexps - it makes them _too_ convenient.
I would *much* rather see {1,2,3}.frozen() be constant-foldable than f{1,2,3} as a literal.
I'm ambivalent myself. The general "prefix" approach is slowly growing on me. However, "{1,2,3}.frozen() be constant-foldable" is indeed implementable without inventing new syntax.
(That said: I'm actually not convinced that regexps need literal syntax, because it wouldn't benefit alternate regexp engines on PyPI.)
That too. Unless there was a suffix syntax to attach a literal regexp to a specific implementation "/regexp-here/foo_class". <snark/> Cheers, Cameron Simpson <cs@cskk.id.au>
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Fri, Jan 21, 2022 at 3:52 AM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Because it doesn't require any change to the language -- it's a feature, not a language change. Every change to the language is s substantial burden to the whole community. Even a "small" one like this. It was absolutely worth for, e.g. f-strings, because they are a feature that has very broad use. This would have narrow use, and, I think even the OP said is more about potential optimization than a nicer syntax. If others think this syntax is substantially nicer, sure -- but then I'd argue that frozensets are simply not that commonly used -- you yourself realized that while you have an important use case, you aren't using literals that often anyway (at all?). Python is not a high-performance language -- has it ever had a feature added primirly so it could be optimized? (that is a serious question). And this does seem like a very small change, but is it? 1) folks, a couple years from now, reading new code might have never heard of a frozenset, and see: frozenset(something) or {1,5,2}.frozen() will probably have a pretty good idea what those mean, and if they have no clue, then it's easy to look up. f{3,1,6} not so much. And once we have ONE prefix on a bracket, I"ll bet you folks will suggest more ... small change medium churn tiny benefit (frankly, it's confusing enough that {a,b,c} makes a set and {} makes a dict, but what can we do? there's only so many brackets :-( -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Jan 21, 2022 at 08:36:37AM -0800, Christopher Barker wrote:
The change from "" ASCII byte strings to Unicode was a substaintial change that lead to a substantial burden on library developers, at least in the short-term. Adding a new operator like `@` has been barely noticed. Out of the two extremes, I don't think that adding f{} displays would be closer to changing the meaning of strings than to adding the matmul operator. I do acknowledge that f{} would be perplexing to beginners who just saw it written down with no explanation or context, but then the same applies to slicing and list comprehensions.
Python is not a high-performance language -- has it ever had a feature added primirly so it could be optimized? (that is a serious question).
Maybe we should stop dismissing performance as unimportant. Nobody has ever said that Python is too fast. I was going to joke that nobody has ever asked "How do I make Python slower?", but sure enough somebody has: https://stackoverflow.com/questions/16555120/how-can-i-slow-down-a-loop-in-p... *wink* I acknowledge that this specific change would likely be a micro- optimization, but looking beyond this proposal, perhaps we should start considering adding features specifically for performance? People are developing entire new Python interpreters and variants to improve performance, and have done so for many years. There is an impressive list of past and present such projects: - Cython - Nuitka - Pyston - Cinder - wpython - Hotpy - Unladen Swallow - Numba - Pythran among others. So maybe we should consider *not* saying "Python is fast enough", or "Just re-write it in C", and consider language features that are aimed specifically or mainly at performance. Just a thought to mull over. -- Steve
data:image/s3,"s3://crabby-images/c9741/c974183416df30c9f9c32ee54d2e8443d1cff567" alt=""
I hope the following remarks are constructive to the subject. It seems what's not liked about f{*x} notation is mainly that it looks parallel to f(*x) or e.g., often possibly f[(*x, )] , which seems to be a problem mostly only because of the character "f". If the prefix were "frozenset", then frozenset{*x} would be similar to frozenset(x) so it wouldn't be too wrong if the notation looks as if it's for a modification of a call of frozenset till this name is made to refer to something else by the programmer's own preference, even in which case, the name may not be very likely to refer to something conceptually far from the built-in frozenset. I also expect the notation wouldn't be too cryptic any more then. Best regards, Takuo Matsuoka
data:image/s3,"s3://crabby-images/437f2/437f272b4431eff84163c664f9cf0d7ba63c3b32" alt=""
Matsuoka Takuo writes:
I hope the following remarks are constructive to the subject.
They are constructive, but I disagree with the factual assessment:
It seems what's not liked about f{*x} notation is mainly that it looks parallel to f(*x) or e.g., often possibly f[(*x, )] ,
I think for many of us (specifically me, but I don't think I'm alone) it's equally important that we aren't persuaded that there's a need for a frozenset literal great enough to overcome the normal reluctance to add syntax. A number of important cases are already optimized to frozensets, and use cases and (more important) benchmarks showing that this optimization is important enough to add syntax so the programmer can hand-code it are mostly to entirely lacking. Steve
data:image/s3,"s3://crabby-images/b4d21/b4d2111b1231b43e7a4c304a90dae1522aa264b6" alt=""
Hi Steve D'Aprano started this thread on 16 Jan, referencing https://bugs.python.org/issue46393. In the 95th message in this thread, on 27 Jan, Stephen J. Turnbull wrote: I think for many of us (specifically me, but I don't think I'm alone) it's
On 17 Jan, Serhiy Storchaka wrote to the b.p.o issue Steve D'Aprano referenced: As Steven have noted the compiler-time optimization is not applicable here
because name frozenset is resolved at run-time.
In these cases where a set of constants can be replaced with a frozenset of
constants (in "x in {1,2,3}" and in "for x in {1,2,3}") the compiler does it.
And I don't think there is an issue which is worth changing the language.
This is message 96 in this thread. Perhaps something bad (or good) will happen when we get to message 100. https://www.theregister.com/2022/01/27/linux_999_commits/ -- Jonathan
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
On 20/01/22 3:17 am, Joao S. O. Bueno wrote:
Frozensets are immutable, so nobody should be making any assumptions about whether equal frozensets are the same object or not -- just as with ints, strings, etc. It would be just as legitimate e.g. for "ABC".upper() to return the same string object. There is no exception here to be learned. -- Greg
data:image/s3,"s3://crabby-images/ffff5/ffff5de931393fdc626e9776683853497585ed98" alt=""
Hello, I'm new here. Has anyone proposed the following solution yet? list.from_args(1, 2, 3) tuple.from_args(1, 2, 3) set.from_args(1, 2, 3) frozenset.from_args(1, 2, 3) iter.from_args(1, 2, 3) array.array.from_args('i', 1, 2, 3)
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Sat, Jan 29, 2022 at 3:16 AM Jure Šorn <sornjure@gmail.com> wrote:
frozenset.from_args(1, 2, 3)
This wouldn’t solve the problem at hand, as ‘frozenset’ could be rebound. And what advantage does this have over the existing constructors? -CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Sun, Jan 16, 2022 at 7:35 PM Steven D'Aprano <steve@pearwood.info> wrote:
How does this work for you?
f{1, 2, 3}
While it's tempting, it does create an awkward distinction. f(1, 2, 3) # look up f, call it with parameters f[1, 2, 3] # look up f, subscript it with paramters f{1, 2, 3} # construct a frozenset And that means it's going to be a bug magnet. Are we able to instead make a sort of vector literal? <1, 2, 3> Unfortunately there aren't many symbols available, and Python's kinda locked into a habit of using just one at each end (rather than, say, (<1, 2, 3>) or something), so choices are quite limited. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 09:18:40PM +1100, Chris Angelico wrote:
You forgot f"1, 2, {x+1}" # eval some code and construct a string Not to mention: r(1, 2, 3) # look up r, call it with parameters r[1, 2, 3] # look up r, subscript it r"1, 2, 3" # a string literal
And that means it's going to be a bug magnet.
I don't think that f{} will be any more of a bug magnet than f"" and r"" already are.
Are we able to instead make a sort of vector literal?
<1, 2, 3>
Back in the days when Python's parser was LL(1), that wasn't possible. Now that it uses a PEG parser, maybe it is, but is it desirable? Reading this makes my eyes bleed: >>> <1, 2, 3> < <1, 2, 3, 4> True
Triple quoted strings say hello :-) {{1, 2, 3}} would work, since that's currently a runtime error. But I prefer the f{} syntax. -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Sun, Jan 16, 2022 at 11:18 PM Steven D'Aprano <steve@pearwood.info> wrote:
Strings behave differently in many many ways. Are there any non-string types that differ?
Fair point, but I can't imagine people comparing two literals like that. It's not quite as bad if you replace the left side with a variable or calculation, though it's still kinda weird.
See above, strings are different, and people treat them differently.
{{1, 2, 3}} would work, since that's currently a runtime error. But I prefer the f{} syntax.
Yeah, I think that ship has sailed. Double punctuation just isn't Python's thing, so there aren't really any good ways to shoehorn more data types into fewer symbols. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 11:41:52PM +1100, Chris Angelico wrote:
On Sun, Jan 16, 2022 at 11:18 PM Steven D'Aprano <steve@pearwood.info> wrote:
There are plenty of non-string types which differ :-) Differ in what way? I don't understand your question. You were concerned that adding a prefix to a delimiter in the form of f{...} would be a bug magnet, but we have had prefixes on delimiters for 30 years in the form of r"..." etc, and it hasn't been a problem. I mean, sure, the occasional beginner might get confused and write len{mystring} and if by some fluke they call f() rather than len() they will get a silent failure instead of a SyntaxError, but is this really a serious problem that is common enough to get labelled "a bug magnet"? I've been coding in Python for two decades and I still occassionally mess up round and square brackets, especially late at night, and I won't tell you how often I write my dict displays with equal signs {key=value}, or misspell str.center. And I still cringe about the time a few years back where my brain forgot that Python spells it "None" rather than "nil" like in Pascal, and I spent about an hour writing a ton of "if obj is nil"... tests. Typos and brain farts happen.
They don't have to be literals inside the brackets. Especially in the REPL, `{*a} < {*b}` is a quick way of testing that every element of a is an element of b. [...]
Triple quoted strings say hello :-)
See above, strings are different, and people treat them differently.
Do they? How are they different? You have a start delimiter and an end delimiter. The only difference I see is that with strings the delimiter is the same, instead of a distinct open and close delimiter. But that difference is surely not a reason to reject the use of a prefix. "We can't use a prefix because the closing delimiter is different from the opening delimiter" does not follow. -- Steve
data:image/s3,"s3://crabby-images/83003/83003405cb3e437d91969f4da1e4d11958d94f27" alt=""
On 2022-01-16 16:11, Steven D'Aprano wrote:
Well, there is a big difference, which is that the stuff between other delimiters (parentheses, brackets, etc.) is wholly constrained by Python syntax, whereas the stuff between string delimiters is free-form text, with only a few restrictions (like not being able to use the delimiter itself, or to include newlines in single-quoted strings). Whether that difference is important for your proposal I won't address right now. But it is a big difference. It also greatly affects how people view the code, since syntax highlighters will often color an entire string literal with the same color, whereas they don't typically do that for other kinds of delimited chunks, instead highlighting only the delimiters themselves. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 04:23:47PM -0800, Brendan Barnwell wrote:
Which is also wholly constrained by Python syntax, seeing as they are Python strings :-)
Are any of these differences relevant to putting a prefix on the opening delimiter? If they are not, why mention them? We could also talk about the difference between the numeric value of the ASCII symbols, or the number of pixels in a " versus a { glyph, or the linguistic history of the words "quotation mark" versus "curly bracket" too, but none of these things seem to be any more relevant than whether IDEs and syntax colourizers colour "..." differently to {1, None, 4.5}. Can we bypass what could end up being a long and painful discussion if I acknowledge that, yes, frozensets are different to strings, and so the syntax is different. (Different things necessarily have different syntax. Otherwise they would be indistinguishable.) - Sets (like lists, tuples and dicts) are compound objects that contain other objects, so their displays involve comma-separated items; - string (and byte) literals are not, except in the sense that strings can be considered to be an array of single-character substrings. I thought that was so obvious and so obviously irrelevant that it didn't even need mentioning. Perhaps I am wrong. (It has to happen eventually *wink*) If somebody can explain *why* that matters, rather than just declare that it rules out using a prefix, I would appreciate the education. Hell, even if your argument is just "Nope, I just don't like the look of it!", I would respect that even if I disagree. Aesthetics are important, even when they are totally subjective. If it helps, Julia supports this syntax for typed dicts: Dict{keytype, valuetype}(key => value) where the braces {keytype, valuetype} are optional. That's not a display syntax as such, or a prefix, but it is visually kinda similar. Here are some similar syntax forms with prefixes: * Dylan list displays: #(a, b, c) * Smalltalk drops the comma separators: #(a b c) * Scheme and Common Lisp: '(a b c) and double delimiters: * Pike: ({ a, b, c }) * Pike dicts: ([ a:b, c:d ]) Not that we could use any of those as-given. -- Steve
data:image/s3,"s3://crabby-images/2eb67/2eb67cbdf286f4b7cb5a376d9175b1c368b87f28" alt=""
On 2022-01-17 01:36, Steven D'Aprano wrote:
How about doubling-up the braces: {{1, 2, 3}} and for frozen dicts: {{1: 'one', 2: 'two', 3: 'three'}} if needed? Those currently raise exception because sets and dics are unhashable. It might be confusing, though, if you try to nest them, putting a frozenset in a frozenset: {{ {{1, 2, 3}} }} or, without the extra spaces: {{{{1, 2, 3}}}} but how often would you do that?
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 03:05:36AM +0000, MRAB wrote:
How about doubling-up the braces:
{{1, 2, 3}}
I mentioned that earlier. Its not *awful*, but as you point out yourself, it does run into the problem that nested sets suffer from brace overflow. # A set of sets. {{{{}}, {{1, 2}}, {{'a', {{}}, None}}}}
and for frozen dicts:
We don't even have a frozen dict in the stdlib, so I'm not going to discuss that here. If and when we get a frozen dict, if it is important enough to be a builtin, then we can debate syntax for it.
but how often would you do that?
Often enough: https://www.delftstack.com/howto/python/python-set-of-sets/ https://stackoverflow.com/questions/37105696/how-to-have-a-set-of-sets-in-py... https://stackoverflow.com/questions/5931291/how-can-i-create-a-set-of-sets-i... -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Mon, Jan 17, 2022 at 11:18 AM Steven D'Aprano <steve@pearwood.info> wrote:
*sigh* I know you love to argue for the sake of arguing, but seriously, can't you read back to your own previous message and get your own context? With punctuation like parentheses, square brackets, angle brackets, etc, does not ever, to my knowledge, have prefixes. ONLY strings behave differently. Are there any non-string types which have special behaviour based on a prefix, like you're suggesting for sets?
Yes. ONLY on strings. That's exactly what I said. Strings are different. For starters, we already have multiple different data types that can come from quoted literals, plus a non-literal form that people treat like a literal (f-strings). Is there any non-string type that doesn't follow that pattern?
len{mystring} looks a lot like len(mystring), but len"mystring" looks very very different. Or do you treat all punctuation exactly the same way?
You mess up round and square brackets, yes. You might mess up double and single quotes, too, in languages that care. But you aren't going to mess up brackets and quotes.
People treat them differently. That's why f-strings are a thing: we treat strings as strings even when they're expressions. Strings ARE special. I asked you if there are any non-strings that are similarly special. You have not found any examples. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 12:16:07PM +1100, Chris Angelico wrote:
Speaking of context, it is not nice of you to strip the context of my very next sentence, in order to make me out to be the bad guy here. "Differ in what way? I don't understand your question." Let me repeat: I do not understand your question. In what way do you think that non-string types differ, that is relevant to the discussion? There are plenty of ways that they differ. I don't see how those differences are meaningful. If you do, please explain. (I see that further on, you made an attempt. Thank you, I will respond to that below.)
With punctuation like parentheses, square brackets, angle brackets, etc, does not ever, to my knowledge, have prefixes.
In another post, I have pointed out a few languages which do something very close to this, e.g. Scheme. A better example is Coconut. Coconut allows an optional 's' prefix on sets, so as to allow `s{}` for an empty set. It also allows the 'f' prefix for frozensets: https://coconut.readthedocs.io/en/v1.1.0/DOCS.html#set-literals But even if it has never been done before, someone has to be the first. There was a time that no language ever used r"..." for raw strings, or f"..." for f-strings. There was a time where there was no language in the world that used slice notation, or {a, b} for sets. There was once a time that no language had list comprehension syntax. And now Python has all those syntactic features. "It has never been done before" is a weak argument, and it is especially weak when it *has* been done before. We have at least three syntactic forms that use an alphabetical prefix on a delimiter. It seems rather dubious that you are happy to use the < and > operators as delimiters, just to avoid a prefix. - There is no precedent in Python of a symbol being used as both an operator and a delimiter: none of ( [ { } ] ) are ever used as operators, and no operator + - etc are ever used as delimiters. But you are happy to make this radical change to the language, even though you agree that it looks pretty awful when used with the < and > operators. - But there is precedent in Python of adding an alphabetic prefix to delimiters: we have b' r' f'. But you are unhappy with making a minor change to the language by putting the prefix on a symbol different from a quote mark, because the data type is not a string or bytes. Your position seems to be: We have used prefixes on delimiters before, therefore we cannot do it again; and we've never used operators as delimiters before, therefore we should do it now. Is that accurate? If not, I apologise for misunderstanding you, but please explain what you mean. These are not rhetorical questions: (1) Why does it matter that string and bytes syntax are the only types that currently use a prefix on a delimiter? Surely delimiters are delimiters, whatever the type they represent. (2) What is so special about the string and bytes types that tells you that it is okay to use a prefix on the delimiter, but other data structures must not do the same?
I am not proposing any special behaviour. This is *syntax*, not behaviour. Frozensets will continue to behave exactly the same as they behave now. If that was not clear, I apologise for giving you the wrong impression. As mentioned above, Coconut already supports this, and some other languages support similar prefixes on delimiters for lists and dicts.
So... not "ONLY" (your emphasis, not mine) strings. As you say, there are already two different data types, strings and bytes, plus a non-literal executable code (f-strings). Again, I don't understand your point. You have just told us that there are already non-string literals and types that use a prefix on the delimiter, and then you ask me if there are any non-string types that follows that pattern. Haven't you just answered your own question? Sure, bytes and f-strings are quite closely related to regular strings, and frozensets are not. But why does that matter? This is not a rhetorical question. You obviously feel that there is something special or magical about strings that makes it okay to stick a prefix on the opening delimiter, but for the life of me I can't see what it is. Is it just taste? You just don't like the look of `f{}`?
Yes it does look different, but what does that have to do with anything? f() and f"" look very similar, and r() and r"" do too. I think this idea that people will confuse f{} for a function call, and extrapolate to using it in arbitrary functions, is unjustified. But even if they do, except in the very special case where the function is called exactly `f`, they will get a SyntaxError. And even then, the error will be pretty obvious. Calling this a "bug magnet" seems to be a gross exaggeration.
Then why is this a problem for curly brackets, when you just agreed that it is not a problem for round and square brackets? We don't write b"" or r"" or f"" when we want b() or r[] or f(), or vice versa, but we'll suddenly start confusing f"" and f{}? I think that is pure FUD (Fear, Uncertainty, Doubt). -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Mon, Jan 17, 2022 at 2:01 PM Steven D'Aprano <steve@pearwood.info> wrote:
An f-string still yields a string. It's not a literal but it's still a string. A bytestring is still a string. It's not a Unicode string but it's still a string. These are not the same as lists, tuples, sets, dicts, etc, which contain arbitrary objects. ONLY strings. You just picked up on the part where I said "not only literals" and got it completely backwards. Strings are not the same as lists. Strings are not the same as tuples. Strings are the only data type in Python that has prefixes that determine the data type you get. Strings are TREATED DIFFERENTLY by programmers, which is why f-strings get treated like string literals. Strings. Are. Different. I do not know how to make this any clearer. If you do not understand my position, please stop misrepresenting me. ChrisA
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
Is there no way to optimize the byte code without adding to the language? Not that it’s a bad idea anyway, but I wonder if frozen sets are common enough to warrant a change. Are there any performance advantages to a frozen set? I ask because I do often use sets that could be frozen, but don’t need to be. E.g. they don’t change, nor are they used as keys. For example: If flag in {‘the’, ‘allowable’, ‘flags’}: … If a frozen set was even a little bit faster or used less memory, it would be nice to be able to create one directly. -CHB On Sun, Jan 16, 2022 at 8:50 AM MRAB <python@mrabarnett.plus.com> wrote:
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/b4d21/b4d2111b1231b43e7a4c304a90dae1522aa264b6" alt=""
Summary: Further information is provided, which suggests that it may be best to amend Python so that "frozenset({1, 2, 3})" is the literal for eval("frozenset({1, 2, 3})"). Steve D'Aprano correctly notes that the bytecode generated by the expression x in {1, 2 ,3} is apparently not optimal. He then argues that introducing a frozenset literal would allow x in f{1, 2, 3} # New syntax, giving a frozenset literal would allow better bytecode to be generated. However, the following has the same semantics as "x in {1, 2, 3}" and perhaps gives optimal bytecode.
For comparison, here's the bytecode Steve correctly notes is apparently not optimal.
Steve states that "x in {1, 2, 3}" when executed calls "frozenset({1, 2, 3})", and in particular looks up "frozenset" in builtins and literals. I can see why he says that, but I've done an experiment that suggests otherwise.
I suspect that if you look up in the C-source for Python, you'll find that dis.dis ends up using frozenset({1, 2, 3}) as the literal for representing the result of evaluating frozenset({1, 2, 3}). The following is evidence for this hypothesis:
To conclude, I find it plausible that: 1. The bytecode generated by "x in {1, 2, 3}" is already optimal. 2. Python already uses "frozenset({1, 2, 3})" as the literal representation of a frozenset. Steve in his original post mentioned the issue https://bugs.python.org/issue46393, authored by Terry Reedy. Steve rightly comments on that issue that "may have been shadowed, or builtins monkey-patched, so we cannot know what frozenset({1, 2, 3}) will return until runtime." Steve's quite right about this shadowing problem. In light of my plausible conclusions I suggest his goal of a frozenset literal might be better achieved by making 'frozenset' a keyword, much as None and True and False are already keywords.
Once this is done we can then use frozenset({1, 2, 3}) as the literal for a frozenset, not only in dis.dis and repr and elsewhere, but also in source code. As a rough suggestion, something like from __future__ import literal_constructors_as_keywords would prevent monkey-patching of set, frozenset, int and so forth (just as True cannot be monkeypatched). I thank Steve for bringing this interesting question to our attention, for his earlier work on the issue, and for sharing his current thoughts on this matter. It's also worth looking at the message for Gregory Smith that Steve referenced in his original post. https://mail.python.org/pipermail/python-ideas/2018-July/051902.html Gregory wrote: frozenset is not the only base type that lacks a literals leading to loading values into these types involving creation of an intermediate throwaway object: bytearray. bytearray(b'short lived bytes object') I hope this helps. -- Jonathan
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
I’m a bit confused — would adding a “literal” form for frozenset provide much, if any, of an optimization? If not, that means it’s only worth doing for convenience. How often do folks need a frozen set literal? I don’t think I’ve ever used one. If we did, then f{‘this’: ‘that’} should make a frozen dict, yes? On Sun, Jan 16, I suggest his goal of a frozenset literal might be better achieved by making 'frozenset' a keyword, much as None and True and False are already keywords. Adding a keyword is a very Big Deal. I don’t think this rises to that level at all. It was done for True and False because having them as non-redefineable names referencing singletons is really helpful. -CHB
data:image/s3,"s3://crabby-images/5f8b2/5f8b2ad1b2b61ef91eb396773cce6ee17c3a4eca" alt=""
On Mon, 17 Jan 2022 at 00:46, Christopher Barker <pythonchb@gmail.com> wrote:
You won't have used one because they have not yet existed (hence this thread).
If we did, then f{‘this’: ‘that’} should make a frozen dict, yes?
A frozen dict would also be useful but the implementation doesn't exist. If it did exist then in combination with this proposal that syntax for frozen dicts would be an obvious extension. A more relevant question right now is if any other set syntax should apply to frozensets e.g. should this work: >>> squares = f{x**2 for x in range(10)} -- Oscar
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Jan 17, 2022 at 9:58 AM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Although we don't have frozenset literal now, we have frozenset. So we can estimate how frozenset literal is useful by seeing how frozenset is used. Unless how the literal improve codes is demonstrated, I am -0.5 on new literal only for consistency. Regards, -- Inada Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Sun, Jan 16, 2022 at 7:05 PM Steven D'Aprano <steve@pearwood.info> wrote:
I never suggested adding this "for consistency".
Then what ARE you suggesting it for? As far as I can tell, it would be a handy shorthand. And you had suggested it could result in more efficient bytecode, but I think someone else thought that wasn't the case. It could lead to some optimization -- literals being treated as contents, yes? But what does that matter? are they heavenly used in any common code? -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 10:33:41PM -0800, Christopher Barker wrote:
Apologies if my initial post was not clear enough: https://mail.python.org/archives/list/python-ideas@python.org/message/GRMNMW... CPython already has all the machinery needed to create constant frozensets of constants at compile time. It already implicitly optimizes some expressions involving tuples and sets into frozensets, but such optimizations are fragile and refactoring your code can remove them. Ironically, that same optimization makes the explicit creation of a frozenset needlessly inefficient. See also b.p.o. #46393. The only thing we need in order to be able to explicitly create frozensets efficiently, without relying on fragile, implicit, implementation-dependent peephole optimizations which may or may not trigger, and without triggering the usual global+builtins name lookup, is (I think) syntax for a frozenset display. That would make the creation of frozensets more efficient, possibly encourage people who currently are writing slow and inefficient code like targets = (3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88) if n in targets: do_something() to use a frozenset, as they probably should already be doing.
As far as I can tell, it would be a handy shorthand.
If you consider tuple, list and dict displays to be a handy shortcut, then I guess this would be too :-)
And you had suggested it could result in more efficient bytecode, but I think someone else thought that wasn't the case.
I see no reason why it wouldn't lead to more efficient bytecode, at least sometimes.
But what does that matter? are they heavenly used in any common code?
Personally, I think my code using frozensets is extremely heavenly :-) I doubt that frozensets are, or ever will be, as common as lists or dicts. In that sense, sets (frozen or otherwise) are, I guess, "Second Tier" data structures: - first tier are lists, tuples, dicts; - second tier are sets, deques etc. Or possibly "tier and a half" in that unlike deques they are builtins, which suggest that they are somewhat more important. In the top level of the stdlib (not dropping down into packages or subdirectories), I count 29 calls to frozenset. (Compared to 14 calls to deque, so on mere call count, I would say frozenset is twice as important as deque :-) Out of those 29 calls, I think that probably 13 would be good candidates to use a frozenset display form (almost half). For example: ast.py: binop_rassoc = frozenset(("**",)) # f{("**",)} asyncore.py: ignore_log_types = frozenset({'warning'}) # f{'warning'} Not all of them are purely literals, e.g. asyncore.py: _DISCONNECTED = frozenset({ECONNRESET, ENOTCONN, ...}) would still have to generate the frozenset at runtime, but it wouldn't need to look up the frozenset name to do so so there would still be some benefit. If we were particularly keen, that might go up to 19 out of the 29. The benefit is not huge. This is not list comprehensions or decorator syntax, which revolutionized the way we write Python, it is an incremental improvement. If the compiler didn't already have the machinery in place for building compile-time constant frozensets, this might not be worth the effort. But since we do, the cost of adding a frozenset display is relatively low (most of the work is already done, yes?) and so the benefit needs only to be small to justify the small(?) amount of work needed. -- Steve
data:image/s3,"s3://crabby-images/8e91b/8e91bd2597e9c25a0a8c3497599699707003a9e9" alt=""
On Mon, 17 Jan 2022 at 10:12, Steven D'Aprano <steve@pearwood.info> wrote:
More realistically, would they not use a set already, as in targets = {3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88} if n in targets: do_something() ? Is using a frozenset a significant improvement for that case? Because I doubt that anyone currently using a tuple would suddenly switch to a frozenset, if they haven't already switched to a set. Sure, there might be the odd person who sees the release notes and is prompted by the mention of frozenset literals to re-think their code, but that's probably a vanishingly small proportion of the audience for this change. BTW, I should say that I'm actually +0.5 on the idea. It seems like a reasonable thing to want, and if an acceptable syntax can be found, then why not? But I doubt it's going to have a major impact either way. Paul
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Jan 17, 2022 at 7:44 PM Paul Moore <p.f.moore@gmail.com> wrote:
This is very inefficient because building a set is much heavier in `n in tuple`. We should write `if n in {3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88}` for now. Or we should write `_TARGETS = frozenset((3, 5, 7, 11, 12, 18, 27, 28, 30, 35, 57, 88))` in global scope and use it as `if n in _TARGETS`. -- Inada Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Jan 17, 2022 at 7:10 PM Steven D'Aprano <steve@pearwood.info> wrote:
Both are in class scope so the overhead is very small.
Name lookup is faster than building set in most case. So I don't think cost to look name up is important at all. Proposed literal might have significant efficiency benefit only when: * It is used in the function scope. and, * It can not be optimized by the compiler now. I am not sure how many such usages in stdlib. Regards, -- Inada Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 08:04:50PM +0900, Inada Naoki wrote:
Name lookup is faster than building set in most case. So I don't think cost to look name up is important at all.
But the cost to look up the name is *in addition* to building the set. If you saw this code in a review: t = tuple([1, 2, 3, 4, 5]) would you say "that is okay, because the name lookup is smaller than the cost of building the list"? I wouldn't. I would change the code to `(1, 2, 3, 4, 5)`.
Sometimes, now, the compiler *pessimizes* the construction of the frozen set. See b.p.o #46393. -- Steve
data:image/s3,"s3://crabby-images/b4d21/b4d2111b1231b43e7a4c304a90dae1522aa264b6" alt=""
The compiler can figure out that the value of {1, 2, 3} is a set containing the elements 1, 2 and 3. The problem with the value of frozenset({1, 2, 3}) is that the value of frozenset depends on the context. This is because frozenset = print is allowed. According to help(repr): repr(obj, /) Return the canonical string representation of the object. For many object types, including most builtins, eval(repr(obj)) == obj. Consistency suggests that if x = f{1, 2, 3} gives always gives frozenset as the value of x then repr(x) should be the string 'f{1, 2, 3}'. At present, I think, repr(x) always returns a literal if it can. However, changing the repr of frozenset introduces problems of backwards compatibility, particularly in doctests and documentation. Another way to achieve consistency is to make frozenset a keyword, in the same way that None, True and False are identifiers that are also language keywords. Both proposals as stated have negative side-effects. I suggest we explore ways of reducing the above and any other side effects. -- Jonathan
data:image/s3,"s3://crabby-images/995d7/995d70416bcfda8f101cf55b916416a856d884b1" alt=""
On Mon, Jan 17, 2022 at 8:49 PM Steven D'Aprano <steve@pearwood.info> wrote:
I meant it is negligible so we can just ignore it while this discussion.
* I never said it. I just said just lookup cost is not good reason because you listed name lookup cost for rationale. Please stop strawman. * tuple construction is much faster than set construction. So name lookup speed is more important for tuple. * Constant tuple is much much frequently used than constant set.
I saw. And I know all the discussions in the b.p.o. already. But how important it is for Python depends on how often it is used, especially in hot code. Regards, -- Inada Naoki <songofacandy@gmail.com>
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 11:18:13PM +0900, Inada Naoki wrote:
On my computer, the name lookup is almost a quarter of the time to build a set: [steve ~]$ python3.10 -m timeit "frozenset" 10000000 loops, best of 5: 24.4 nsec per loop [steve ~]$ python3.10 -m timeit "{1, 2, 3, 4, 5}" 2000000 loops, best of 5: 110 nsec per loop and about 10% of the total time: [steve ~]$ python3.10 -m timeit "frozenset({1, 2, 3, 4, 5})" 1000000 loops, best of 5: 237 nsec per loop If I use a tuple instead of the set, it is about 12% of the total time: [steve ~]$ python3.10 -m timeit "frozenset((1, 2, 3, 4, 5))" 2000000 loops, best of 5: 193 nsec per loop So not negligible. -- Steve
data:image/s3,"s3://crabby-images/b4d21/b4d2111b1231b43e7a4c304a90dae1522aa264b6" alt=""
Earlier today in https://bugs.python.org/issue46393, Serhiy Storchaka wrote: As Steven have noted the compiler-time optimization is not applicable here because name frozenset is resolved at run-time. In these cases where a set of constants can be replaced with a frozenset of constants (in "x in {1,2,3}" and in "for x in {1,2,3}") the compiler does it. And I don't think there is an issue which is worth changing the language. Creating a frozenset of constants is pretty rare, and it is even more rare in tight loops. The most common cases (which are pretty rare anyway) are already covered. -- Jonathan
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Mon, Jan 17, 2022 at 3:50 AM Steven D'Aprano <steve@pearwood.info> wrote:
Of course, everyone would -- because tuple displays already exist. I'd suggest refactoring that code even if the compiler could completely optimize it away. Would you let: l = list([1, 2, 3, 4, 5]) pass code review either? even if there were no performance penalty? I wouldn't, because it's redundant, not because it's slower. Also that pattern is actually very common for types that aren't built-in (or even in the stdlib). It's always kind of bugged me that I need to write: arr = np.array([1, 2, 3, 4]) And I'm creating a list just so I can pass it to the array constructor. But in practice, it's not a performance problem at all. And in code in the wild, I'll bet numpy arrays are used orders of magnitude more than frozen sets ;-) Sometimes, now, the compiler *pessimizes* the construction of the frozen
set. See b.p.o #46393.
yup. Using a 'constant' frozenset is slower than 'constant' set, when doing not much else: In [29]: def setfun(): ...: s = {1, 3, 5, 2} ...: i = 3 ...: if i in s: ...: return 'yes' ...: In [30]: def fsetfun(): ...: s = frozenset((1, 3, 5, 2)) ...: i = 3 ...: if i in s: ...: return 'yes' ...: In [31]: %timeit setfun() 194 ns ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) In [32]: %timeit fsetfun() 286 ns ± 2.72 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) But: would you notice if that function did any real work? And I think we could call this one of the many micro-optimizations we have in Python: Don't use a frozenset as a constant when a regular set will do. So it comes down to how often frozen sets as constants are required. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Sun, Jan 16, 2022 at 4:55 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
di you really notunderstand my point? I have never used the frozenset() with a literal. i.e. never had a use case for a frozenset literal. As I mentioned in another note, I do use set displays where they *could* be frozen sets, but I dont think they've ever needed to be. And if there isn't a performance advantage, then I'm fine with that.
Ah yes, I think my brain blipped because there have been multiple proposals on this list for such a thing -- but they were never realized.
>>> squares = f{x**2 for x in range(10)}
Interesting idea. It feels a bit like that's realyl opening a door to a lot if proposals -- is that good or bad thing? -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Mon, Jan 17, 2022 at 12:54:58AM +0000, Oscar Benjamin wrote:
If display syntax for frozensets were to be approved, then we should consider frozenset comprehensions as well. That would be an obvious extension of the syntax. But I don't know if there are technical difficulties with that proposal that might make it less attractive. -- Steve
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jan 16, 2022 at 04:43:36PM -0800, Christopher Barker wrote:
I’m a bit confused — would adding a “literal” form for frozenset provide much, if any, of an optimization?
Yes. In at least some cases, it would avoid going through the song and dance: 1. create a frozenset 2. convert the frozenset to a regular set 3. convert the regular set back to a frozenset 4. garbage collect the regular set See the b.p.o. ticket referenced earlier, as well as the disassembled code. In other cases it would avoid: 1. create a set, tuple or list 2. create a frozenset 3. garbage collect the set, tuple or list It would also avoid the name lookup of `frozenset`, and guarantee that even if that name was shadowed or monkey-patched, you still get a genuine frozenset. (Just as [1, 2, 3] is guaranteed to return a genuine list, even if the name "list" is deleted, shadowed or replaced.) At the moment, sets and frozensets still share the same implementation, but some years ago Serhiy suggested that he had some optimizations in mind that would make frozensets smaller than regular sets.
How often do folks need a frozen set literal? I don’t think I’ve ever used one.
If you are writing `if x in ("this", "that", "another", "more")` then you probably should be using a frozenset literal, since membership testing in sets is faster than linear search of a tuple. I think that the CPython peephole optimizer actually replaces that tuple with a frozenset, which is cool, but you can defeat that optimization and go back to slow linear search by refactoring the code and giving the targets a name: targets = ("this", "that", "another", "more") if x in targets: ...
If we did, then f{‘this’: ‘that’} should make a frozen dict, yes?
We would have to get a frozen dict first, but if we did, that would be an obvious syntax to use. -- Steve
data:image/s3,"s3://crabby-images/3ab06/3ab06bda198fd52a083b7803a10192f5e344f01c" alt=""
Not really relevant for the discussion, but CPython automaticly creates a frozenset here (set display with immutable members) as an optimisation.
AFAIK the primary advantage of doing this is that the frozenset gets created once instead of every time the expression is executed. Frozenset itself is not faster than a regular set. Ronald — Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/
data:image/s3,"s3://crabby-images/a3b9e/a3b9e3c01ce9004917ad5e7689530187eb3ae21c" alt=""
On Mon, Jan 17, 2022 at 9:07 AM Ronald Oussoren <ronaldoussoren@mac.com> wrote:
I think it's quite relevant to the discussion, because as far as I can tell, better performance in particular cases is the primary motivator. Funny that this has come up -- not too long ago, I did some experiments with code like the above: and to the surprise of myself and some other long-time Pythonistas I work with, using sets, rather tha tuples in those kinds of constructs, e.g.: if something in <a small collection or literals>: was always as faster or faster with sets than tuples. That was surprising because we assumed that construction of a set would be slower than construction of a tuple. And that was probably the case ten years ago. The proof is in the pudding,so I never bothered to figure out why, but now I know :-) Back to the topic at hand -- IIUC, set constants are already optimized, so the only places having a frozenset display would be when it is a constant, and it has to be a frozenset, where a regular one won't do. And that would only be noticeable if it was in a function that didn't do much else, and was called often. And in that case, it could be put in the global scope to ameliorate some of that cost. I believe Stephens' point is that the benefit may be fairly small, but so is the cost. I'm not so sure. I kind of like the idea myself, and the cost does seem small, but I don't think we should underestimate the cost of even this small complexity increase in the language. Sure, folks don't have toeven know it exists to write fine code, but it would be one more thing that newbies will need to figure out when they see it in others' code. In fact, there' a lot of what I might call "Python Scripters" that aren't even familiar with the set display at all. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
data:image/s3,"s3://crabby-images/f3b2e/f3b2e2e3b59baba79270b218c754fc37694e3059" alt=""
but I don't think we should underestimate the cost of even this small complexity increase in the language.
Actually, I think _maybe_ in this case the "complexity increase" cost is _negative_. People might waste more time looking for a way of spelling a frozenset literal than just filling in "frozenset(....)". I for one, even knowing that the cost of writing "frozenset({1,2,3})" is negligible, would "feel" better there was a way to spell that without the needless conversions. That said, an appropriate prefix for the {} just as we do for strigns would be nice, and I disagree that it would be a significant source for "bugs". The "@{" is a nice way out if people think "f{}" would be too close to "f()". And "<1,2,3>" just for frozensets are indeed overkill. We already do "literal prefixing" with `"` after all. and formally extending this prefix usage as needed for other literals seems like a nice path. But, as far as bikeshedding go, we also have "literal sufixing" (2.0j anyone?)- maybe "{1,2,3}f" ? On Mon, Jan 17, 2022 at 2:43 PM Christopher Barker <pythonchb@gmail.com> wrote:
data:image/s3,"s3://crabby-images/47610/4761082e56b6ffcff5f7cd21383aebce0c5ed191" alt=""
On Tue, Jan 18, 2022 at 10:02 AM Joao S. O. Bueno <jsbueno@python.org.br> wrote:
I have been following along with not much to comment but this response sparked something in me. After reading all the viewpoints I think I would be +1 on the basic idea, and a +1 on the postfix/suffix syntax just suggested... the other syntaxes I'm more of +0.5 I like the way the suffix FLOWS with the act of writing the program. When I write a set, I am primarily focused on *what I am going to put in it*, and whether or not it should be mutable is kind of a later thought/debate in my head after I have established what it contains. As a dumb example, if my task at hand is "I need to create a bag of sports balls", I am mostly thinking about what goes into that bag at first, so I will write that first:
{Ball("basketball"), Ball("soccer"), Ball("football"), Ball("golf")}
Now I get to the end of that line, and I then sort of naturally think "ok does it make sense to freeze this" after i know what is in it. With the postfix syntax, I then either type the f:
{Ball("basketball"), Ball("soccer"), Ball("football"), Ball("golf")}f
...or not. With a prefix type syntax, or a smooth bracket syntax, either: A. it takes slightly more "work' at this point to "convert" the set to a frozenset, OR B. i have to think about ahead of time-- before i have actually written what is in the set- whether it will be frozen, or not. In contrast, when you are deciding whether to write a list vs a tuple, you are deciding between two things that are fundamentally far more different IDEAS than a "bag of things, frozen or unfrozen". A list is very often more of an open ended stack than it is "an unfrozen tuple". A tuple is very often much more of an object that can be used as a dictionary key, or a member of a set, than it is a container of things (of course, it is a container of things, too). These differences make is a lot easier to choose, ahead of time, which one makes sense before you have even written the line of code. Maybe I'm making too much of this, but I really like the idea of deciding at the END of the set literal whether to tack on that "f". --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
data:image/s3,"s3://crabby-images/552f9/552f93297bac074f42414baecc3ef3063050ba29" alt=""
I'm +1 on the idea. I'm happy with the f{ ... } syntax (although I did suggest something else). We already have letter-prefixes, let's stick to them rather than adding something new (which conceivably might one day find another use). Best wishes Rob Cliffe On 18/01/2022 15:53, Ricky Teachey wrote:
data:image/s3,"s3://crabby-images/e15cd/e15cd966f7ed6ae679f0885777782b9db7cb880e" alt=""
One thing to consider is if we're going to have a syntax capable of creating an empty frozenset, we need one that creates an empty set. if f{...} exists, then s{...} should also exist? Regards João Bernardo On Tue, Jan 18, 2022 at 2:59 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
data:image/s3,"s3://crabby-images/efe10/efe107798b959240e12a33a55e62a713508452f0" alt=""
Even if f{1} creates a frozenset, I don't think f{} should create a frozenset. I think it makes more sense to keep f{1: 2} open for frozendict if it ever makes it in. Also, {} should be consisten with f{} (both should create dicts). If you want an empty frozenset, you would have to do it the same way you do it for sets: either frozenset() or f{*()}. Best Neil On Tuesday, January 18, 2022 at 1:19:30 PM UTC-5 João Bernardo wrote:
data:image/s3,"s3://crabby-images/d9209/d9209bf5d3a65e4774057bb062dfa432fe6a311a" alt=""
Not a huge fan of an f-prefix for a frozen set (I prefer just recognizing the case and optimizing the byte code, I don't think frozensets are used often enough to justify its own syntax), but I love {,} for an empty set. On Tue, Jan 18, 2022 at 4:13 PM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
-- -Dr. Jon Crall (him)
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
On 19/01/22 6:41 am, Rob Cliffe via Python-ideas wrote:
I'm happy with the f{ ... }
Fine with me too. I'd also be happy with making frozenset a keyword. It's hard to imagine it breaking any existing code, it avoids having to make any syntax changes, and all current uses of frozenset() on a constant set would immediately benefit from it. -- Greg
participants (26)
-
Ben Rudiak-Gould
-
Brendan Barnwell
-
Cameron Simpson
-
Chris Angelico
-
Christopher Barker
-
Eric V. Smith
-
Greg Ewing
-
Inada Naoki
-
Jelle Zijlstra
-
Joao S. O. Bueno
-
Jonathan Crall
-
Jonathan Fine
-
João Bernardo
-
Jure Šorn
-
Matsuoka Takuo
-
MRAB
-
Neil Girdhar
-
Oscar Benjamin
-
Paul Bryan
-
Paul Moore
-
Ram Rachum
-
Ricky Teachey
-
Rob Cliffe
-
Ronald Oussoren
-
Stephen J. Turnbull
-
Steven D'Aprano