TL;DR: A new built-in attribute whose purpose is to provide a simple way for developers to detect the Python implementation (CPython, Jython, IronPython, PyPy, etc.), among other information.
Ok, so the reason I'm suggesting this is for another suggestion I'll submit at a later date (once I feel this one has run its course, or the contributors decide about it).
The goal of this attribute (as mentioned above) is to give developers quick and simple information about the Python runtime, such as whether it's running on CPython or PyPy, and other details.
Key information this attribute provides: the implementation's name (e.g. CPython), the implementation's version (which may be independent of Python's), and the Python version (e.g. 3.10).
Optional information can include the platform's name/architecture, and whether it's a JIT environment, an interpreter, or both.
This attribute is also flexible, so implementors can provide extra fields showing, for example, whether it mimics another implementation, or information unique to (or mimicked by) it.
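For a concrete picture, part of this information is already reachable today via sys.implementation and sys.version_info; the proposed attribute would bundle it (plus the optional extras) in one place. A quick sketch of what is available now:

import sys

print(sys.implementation.name)      # e.g. "cpython" or "pypy"
print(sys.implementation.version)   # the implementation's own version info
print(sys.version_info[:2])         # language version, e.g. (3, 10)
print(sys.platform)                 # platform name, e.g. "linux"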
Another TL;DR: I'm not great at getting ideas across, but I hope you get the idea behind this.
collections.Counter has a most_common([n]) method which returns the n most
common elements of the counter, but in case of a tie the result is
unspecified --- whereas in practice insertion order breaks the
tie. For example:
>>> Counter(["a","a","b","a","b","c","c","d"]).most_common(2)
[('a', 3), ('b', 2)]
>>> Counter(["a","a","c","a","b","b","c","d"]).most_common(2)
[('a', 3), ('c', 2)]
In some cases (which I believe are not rare) you would like to break
the tie yourself or get the top elements by *rank*. Using our example:
Rank  Elements
0     {"a"}
1     {"b", "c"}
2     {"d"}
I propose a new method top_n(n) that returns the top elements in the
first n ranks. For example:
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(0)
[('a', 3)]
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(1)
[('a', 3), ('b', 2), ('c', 2)]
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(2)
[('a', 3), ('b', 2), ('c', 2), ('d', 1)]
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(99)
[('a', 3), ('b', 2), ('c', 2), ('d', 1)]
>>> Counter(["a","a","b","a","b","c","c","d"]).top_n(-1)
[]
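For reference, a rough sketch of how this could be built on top of most_common() (the behaviour for negative n here is just one possible choice):

from itertools import groupby

def top_n(counter, n):
    # Sketch only: return the elements in ranks 0..n, where each rank is a
    # distinct count. most_common() is already sorted by count, descending.
    if n < 0:
        return []
    result = []
    ranks = groupby(counter.most_common(), key=lambda pair: pair[1])
    for rank, (count, group) in enumerate(ranks):
        if rank > n:
            break
        result.extend(group)
    return result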
Some points to discuss:
* What should the return type be? A list of tuples like most_common(),
or List[Tuple[int, List[T]]] that conveys the rank information too?
Each tuple is a rank, whose first element is the frequency and whose
second element is the list of elements, e.g. [(3, ['a']), (2, ['b',
'c']), (1, ['d'])].
* Should rank start at 0 or 1?
* Shall negative numbers raise an exception or return an empty list
like most_common()?
I would love to hear your opinion on this, and if there is interest, I
am happy to try to implement it too.
Regards,
Bora M. Alper
https://boramalper.org/
How about enabling the subscription operator (`[]`) for generator expressions? Also for `zip()`, `key()`, etc. They could be evaluated lazily in the background, only up to the requested index, to avoid evaluating the whole expression into something like a list or tuple and then indexing it.
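For comparison, something close to this can already be spelled with itertools.islice, which consumes the generator only up to the requested position (example mine):

from itertools import islice

squares = (x * x for x in range(10**9))

# Roughly what squares[5] could mean under the proposal:
item = next(islice(squares, 5, None))
print(item)   # 25 -- only the first six values were ever produced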
Hello,
This is my first mail here, so greetings for everyone :)
I would like to introduce an idea of the new operators for dict objects.
Any thoughts are welcome.
Since Python3.9 there is a great feature for merging dicts using | (OR)
operator:
{1: "a"} | {2: "b"} == {1: "a", 2: "b"}
Thus, this basic operation can be done smoothly (also inline).
PEP: https://www.python.org/dev/peps/pep-0584/
Another common operation on dicts is subsetting - building a new dict with
only a part of the data. Usually that part is described as a list
of keys to keep or a list of keys to skip.
Therefore it would be very handy to have a built-in option to filter a
dict in a similar fashion, for example using the & (AND) operator against a
list/tuple/set/frozenset of keys that should be kept in the result:
{1: "a", 2: "b", 3: "c"} & [1, 3, 4] == {1: "a", 3: "c"}
{1: "a", 2: "b", 3: "c"} & {1, 3, 4} == {1: "a", 3: "c"}
Using the & operator:
dict_object & list_of_keys
would be equal to the following expression:
{key: value for key, value in dict_object.items() if key in list_of_keys}
Additionally, a similar option for omitting specified keys could be provided
with the - (minus) operator against a list/tuple/set/frozenset of keys that
should be omitted:
{1: "a", 2: "b", 3: "c"} - [3, 4] == {1: "a", 2: "b"}
{1: "a", 2: "b", 3: "c"} - {3, 4} == {1: "a", 2: "b"}
Using the - operator:
dict_object - list_of_keys
would be equal to the following expression:
{key: value for key, value in dict_object.items() if key not in list_of_keys}
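To make the intended semantics concrete, here is a rough emulation with a dict subclass (illustration only, not part of the proposal):

class FilterableDict(dict):
    # Illustrative only: emulates the proposed & and - semantics.
    def __and__(self, keys):
        keys = set(keys)
        return FilterableDict({k: v for k, v in self.items() if k in keys})

    def __sub__(self, keys):
        keys = set(keys)
        return FilterableDict({k: v for k, v in self.items() if k not in keys})

d = FilterableDict({1: "a", 2: "b", 3: "c"})
assert d & [1, 3, 4] == {1: "a", 3: "c"}
assert d - {3, 4} == {1: "a", 2: "b"}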
Best,
Tomasz
I'm in favor of keeping the PEP as it currently is. Mappings are naturally
structural subtypes of one another, therefore mapping patterns should be
consistent with class patterns.
car = Car(...)

match car:
    case Vehicle():
        pass
    case Car():  # will never match
        pass
This example is analogous to the first one in the discussion. If Car is a
subclass of Vehicle, then Vehicle() is a more general pattern than Car()
and will always match despite the instance not being exactly of type
Vehicle. With mapping patterns it's exactly the same thing. You need to
match the more specific patterns first. Matching x and y is more general
than matching x, y and z.
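A quick sketch (mine, not from the PEP) of that ordering point with a mapping pattern:

point = {"x": 1, "y": 2, "z": 3}

match point:
    case {"x": x, "y": y, "z": z}:   # more specific: must come first
        print("3D:", x, y, z)
    case {"x": x, "y": y}:           # more general: also matches 3D points
        print("2D:", x, y)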
One more thing. If we compare the proposed behavior to other languages, the
most relevant example would be object destructuring in JavaScript:
const { 'x': x, 'y': y } = { 'x': 1, 'y': 2, 'z': 3 };
const { x, y } = { x: 1, y: 2, z: 3 }; // more common short form
Object destructuring only matches the specified fields. You can also match
the remaining fields but it's always explicit:
const { x, y, ...rest } = { x: 1, y: 2, z: 3 };
console.log(rest); // { z: 3 }
The pattern is widely adopted and the behavior generally lines up with
people's expectations.
I am working on a toolbox for computer-archaeology where old data media are "excavated" and presented on a web-page. (https://github.com/Datamuseum-DK/AutoArchaeologist for anybody who cares).
Since these data media can easily sum to tens of gigabytes, mmap and virtual memory are my weapons of choice, and that has brought me into an obscure corner of Python where few people seem to venture: I want to access the buffer-protocol from "userland".
The fundamental problem is that if I have an image of a disk and it has 2 partitions, I end up with the "mmap.mmap" object that mapped the raw disk image, and two "bytes" or "bytearray" objects, each containing one partition, for a total memory footprint of twice the size of the disk.
As the tool dives into the filesystems in the partitions and creates objects for the individual files in the filesystem, that grows to three times the size of the disk etc.
To avoid this, I am writing a "bytes-like" scatter-gather class (not yet committed), and that is fine as far as it goes.
If I want to write one of my scatter-gather objects to disk, I have to:
fd.write(bytes(myobj))
As a preliminary point, I think that is just wrong: A class with a __bytes__ method should satisfy any needs the buffer-protocol might have, so this should work:
fd.write(myobj)
But taking this a little bit further, I think __bytes__ should be allowed to be an iterator, provided the object also offers __len__, so that this would work:
class bar:
    def __len__(self):
        return 3
    def __bytes__(self):
        yield b'0'
        yield b'1'
        yield b'2'

open("/tmp/_", "wb").write(bar())
This example is of course trivial, but have the yield statements hand out hundreds of megabytes, and the savings in time and malloc-space become very tangible.
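For concreteness, a bare-bones sketch (my illustration, not the actual class) of the kind of scatter-gather object meant here, and the piecewise-write workaround it currently needs:

class ScatterGather:
    # Illustration only: holds bytes-like pieces (e.g. memoryview slices of
    # an mmap) without copying them into one big buffer.
    def __init__(self, *chunks):
        self.chunks = chunks

    def __len__(self):
        return sum(len(c) for c in self.chunks)

    def __bytes__(self):
        return b"".join(self.chunks)   # the copy the proposal wants to avoid

    def write_to(self, fd):
        for chunk in self.chunks:      # current workaround: write piecewise
            fd.write(chunk)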
Poul-Henning
Hello,
On Mon, 16 Nov 2020 08:39:30 +1100
Steven D'Aprano <steve(a)pearwood.info> wrote:
[]
> > The baseline of my version is much simpler:
> >
> > # This makes "const" a kind of hard keyword for this module
> > from __future__ import const
> >
> > FOO: const = 1 # obviously, this is constant
> Oh, well,
To start with, in the original thread I wanted to concentrate on issues
of PEP634/PEP635, and whether those issues are prominent enough to
hope for them to be addressed. So, I changed the subject (and we'd soon
be redirected to python-ideas anyway; indeed, I cc: there).
> if all it takes is to add a new keyword, then constants are
> easy!
A new annotation. And the new subject is "constants in Python: Starting
simple and gradually adding more", which hopefully should set the
theme. Please remember how we arrived here: it's from the fact that
PEP634 doesn't allow for the following trivial code:
MY_CONST = 1

match foo:
    case MY_CONST:
        ...
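For readers joining from python-ideas, a quick illustration (mine) of why that snippet doesn't do what it looks like under PEP634: a bare name in a case clause is a capture pattern, while a dotted name is a value pattern:

import enum

class Color(enum.Enum):
    RED = 1

MY_CONST = 1

match 2:
    case MY_CONST:       # capture pattern: always matches and rebinds MY_CONST
        print(MY_CONST)  # prints 2

match Color.RED:
    case Color.RED:      # value pattern: a dotted name is compared by ==
        print("matched by value")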
We're considering (another alternative) how to address that. Please
don't make a leap towards supporting (at once) const-ness equivalent to
statically typed languages.
> No need to worry about how constantness affects name resolution,
> or how the syntax interacts with type-hints:
"const" is *the* type hint.
> spam: const: Widget = build_widget(*args)
That syntax is clearly invalid. And composability of type annotations
(aka type hints) is a known, looming issue. We now have
https://www.python.org/dev/peps/pep-0593/ , but in all fairness, it
seems like a stopgap measure rather than an elegant solution. (In full
fairness, the entire "typing" module's annotations aren't very elegant,
but as we know, it's not a part of the language core, but part of
CPython's stdlib. That means it's only *one* of the possible annotation
schemes for *Python*.)
So, under PEP593, the above would be written
spam: Annotated[Widget, const] = build_widget(*args)
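As a side note, under that spelling the marker stays visible at runtime, so a tool could pick it up; a small self-contained illustration where "const" is just a hypothetical sentinel object:

from typing import Annotated

const = object()          # hypothetical marker object, for illustration only

class Widget:
    pass

def build_widget(*args):
    return Widget()

spam: Annotated[Widget, const] = build_widget()

# The marker is preserved in the annotation's metadata, so a checker or a
# compiler pass could notice it:
assert const in __annotations__["spam"].__metadata__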
If you want a glimpse of what alternatives might look like, then: given that
"|" is going to be used for union types, why not try "&" for
composition?
spam: Widget & const = build_widget(*args)
But again, for "initial support of constants in Python, prompted by the
introduction of pattern matching facilities", we don't need to worry
about all that right away.
> # Maybe this is better?
> # let spam: Widget = build_widget(*args)
As you remember, at the beginning of my proposal, I wrote "The baseline
of my version ...". Baseline == level 0. For "level 2", I considered
const spam: Widget = ...
I don't think that "let" should be used. We're putting emphasis on
*constantness* (not lexical scoping of variables [immutable by
functional tradition, though that's "minor"]). JavaScript (now) has
both "const" and "let", and I think that's pretty reasonable approach.
So, let's save "let" for possible later uses.
So, what's "level 1"?
As I mentioned, under "level 0", "from __future__ import const" has a
magic meaning (akin to other "from __future__"'s). Under "level 1",
"const" is just a normal annotation symbol imported from a module. So,
the following would be possible:
=========
from __future__ import const

FOO: const = 1

def fun():
    const = 1  # Just a variable in function scope

    def sub():
        # Gimme powerz back
        from __future__ import const

        BAR: const = 2

# Back to annotation in global scope
BAZ: const = 3
=========
"level 0" should be implementable in CPython in ~1hr.
"level 1" is realistically what we should shoot for.
"level 2" (the dedicated keyword), I'm still not sure about. "const" is
very baseline annotation, and deserves a dedicated keyword. But the
underlying problem is composability of (arbitrary) annotations. I'd
rather keep searching for graal in that regard, before giving up and
introduce a dedicated thing just for "const".
> or how "constant" interacts with mutability:
>
> spam: const = []
> spam.append(99) # Allowed?
> spam.extend(items)
> spam.sort()
"const" is an annotation just like any other. And it affects *runtime*
in the same as any other annotation in CPython affects it: in no way.
"const" is however introduced as a hint for *compile-time*, so compiler
could make some simple inferences and maybe even optimizations based
on it.
> or for that matter, whether or not constants are actually constant.
>
> spam: const = 1
> spam = 2
The compiler can, and thus would, catch that as an error (or a warning for
a beta version?).
> If constants aren't actually constant, but just a tag on symbols,
> then you would be right, it probably would be trivially easy to add
> "constants" to Python.
Right.
> But I think most people agree making them behave as constants is a
> pretty important feature.
As mentioned, "const" is just an annotation like any other, except
compiler has some insight into it. Dealing with runtime is distant goal
for CPython. (Which is for now offloaded to static type checkers and
libraries/alternative Python implementations.)
> *The* most critical feature of all, really.
>
> Given a declared constant:
>
> # Module a.py
> x: const = None
>
> how do you prevent not just code in module `a` from rebinding the
> value:
Outside of the current scope of the discussion.
However, if you're interested in that topic, then I've implemented it in
my Python dialect, Pycopy (https://github.com/pfalcon/pycopy). It's on
my TODO to post an RFC to python-ideas for beating.
[]
--
Best regards,
Paul mailto:pmiscml@gmail.com
Sorry, forgot to use "reply to all"
---------- Forwarded message ---------
From: André Roberge <andre.roberge(a)gmail.com>
Date: Sat, Nov 14, 2020 at 11:06 AM
Subject: Re: [Python-ideas] Re: Global flag for whether a module is __main__
To: Steven D'Aprano <steve(a)pearwood.info>
On Sat, Nov 14, 2020 at 10:45 AM Steven D'Aprano <steve(a)pearwood.info>
wrote:
> On Sat, Nov 14, 2020 at 08:10:44AM -0400, André Roberge wrote:
>
> > > What if you import the `__main__` module? What does `__imported__` say
> > > now, and how do you check for "running as a script" if `__main__` has
> > > imported itself -- or some other module has imported it?
> > >
> >
> > Running a module (no matter what its name is) from a command line would
> set
> > __imported__ to False for that module.
> > Using import some_module (or __import__("some_module")) would set
> > some_module.__imported__ to True.
>
> Do you understand that a module can be both run and imported at the
> same time?
>
>
> # example.py
> import __main__
> print(__main__.__file__)
>
As others have mentioned, many beginners are thoroughly confused by the
meaning of the idiom

if __name__ == "__main__":
    ...
The idea behind the name __imported__ (and, I gather, somewhat similar to
the original suggestion of __main__ that started this thread) is to reduce
such confusion.
For what I had in mind, the semantics would be the same as though the
following were inserted at the top of the module:

__imported__ = True
if __name__ == "__main__":
    __imported__ = False
André
> If you save that snippet as "example.py", and then run it:
>
> python3 example.py
>
> you have an example of a module that is being run and imported
> simultaneously.
>
>
> --
> Steve
Currently, the simplest and most idiomatic way to check whether a module was
run as a script rather than imported is:
if __name__ == "__main__":
People generally learn this by rote memorization, because users often want the
ability to add testing code or command line interfaces to their modules before
they understand enough about Python's data model to have any idea why this
works. Understanding what's actually happening requires you to know that:
1. the script you ask Python to run is technically a module,
2. every module has a unique name assigned to it,
3. a module's `__name__` global stores this unique import name,
4. and "__main__" is a magic name for the initial script's module.
A new (writable) global attribute called `__main__` would simplify this case,
allowing users to simply test
if __main__:
It would behave as though
__main__ = (__name__ == "__main__")
is executed in each module's namespace before the module's own code runs.
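A hypothetical usage sketch under the proposal (the `__main__` boolean below does not exist today):

# some_module.py -- hypothetical usage under the proposal

def main():
    print("running as a script")

if __main__:        # proposed per-module boolean, set by the interpreter
    main()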
Because this would be writable, I don't see any backwards compatibility issues.
It wouldn't negatively impact any modules that might already be defining
`__main__` (for example, by doing `import __main__`). They'd simply redefine it
and go on using the `__main__` module as they always have. And a package with
a `__main__.py` does not have a `__main__` attribute.
It would be easier to teach, easier to learn, and easier to memorize, and
a nice simplification for users at the cost of only very slightly more
complexity in the data model.