Hi folks,
I normally wouldn't bring something like this up here, except I think
that there is a possibility of something being done--a language
documentation clarification if nothing else, though possibly an actual
code change as well.
I've been having an argument with a colleague over the last couple
days over the proper order of statements when setting up a
try/finally to perform cleanup of some action. On some level we're
both being stubborn I think, and I'm not looking for resolution as to
who's right/wrong or I wouldn't bring it to this list in the first
place. The original argument was over setting and later restoring
os.environ, but we ended up arguing over
threading.Lock.acquire/release which I think is a more interesting
example of the problem, and he did raise a good point that I do want
to bring up.
</prologue>
My colleague's contention is that given
lock = threading.Lock()
this is simply *wrong*:
lock.acquire()
try:
    do_something()
finally:
    lock.release()
whereas this is okay:
with lock:
    do_something()
Ignoring other details of how threading.Lock is actually implemented,
assuming that Lock.__enter__ calls acquire() and Lock.__exit__ calls
release() then as far as I've known ever since Python 2.5 first came
out these two examples are semantically *equivalent*, and I can't find
any way of reading PEP 343 or the Python language reference that would
suggest otherwise.
However, there *is* a difference, and it has to do with how signals are
handled, particularly w.r.t. context managers implemented in C (hence
we are talking CPython specifically):
If Lock.__enter__ is a pure Python method (even if it maybe calls some
C methods), and a SIGINT is handled during execution of that method,
then in almost all cases a KeyboardInterrupt exception will be raised
from within Lock.__enter__--this means the suite under the with:
statement is never evaluated, and Lock.__exit__ is never called. You
can be fairly sure the KeyboardInterrupt will be raised from somewhere
within a pure Python Lock.__enter__ because there will usually be at
least one remaining opcode to be evaluated, such as RETURN_VALUE.
Because of how delayed execution of signal handlers is implemented in
the pyeval main loop, this means the signal handler for SIGINT will be
called *before* RETURN_VALUE, resulting in the KeyboardInterrupt
exception being raised. Standard stuff.
However, if Lock.__enter__ is a PyCFunction things are quite
different. If you look at how the SETUP_WITH opcode is implemented,
it first calls the __enter__ method with _PyObject_CallNoArg. If this
returns NULL (i.e. an exception occurred in __enter__) then "goto
error" is executed and the exception is raised. However if it returns
non-NULL the finally block is set up with PyFrame_BlockSetup and
execution proceeds to the next opcode. At this point a potentially
waiting SIGINT is handled, resulting in KeyboardInterrupt being raised
while inside the with statement's suite, so the finally block, and
hence Lock.__exit__, is entered.
Long story short, because Lock.__enter__ is a C function, assuming
that it succeeds normally then
with lock:
    do_something()
always guarantees that Lock.__exit__ will be called if a SIGINT was
handled inside Lock.__enter__, whereas with
lock.acquire()
try:
    ...
finally:
    lock.release()
there is at least a small possibility that the SIGINT handler is called
after the CALL_FUNCTION op but before the try/finally block is entered
(e.g. before executing POP_TOP or SETUP_FINALLY). So the end result
is that the lock is held and never released after the
KeyboardInterrupt (whether or not it's handled somehow).
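For what it's worth, the window is visible in the bytecode. Here's a
minimal sketch using the dis module (opcode names are from CPython 3.6
and vary a bit between versions; do_something is just a stand-in):

import dis

def critical(lock):
    lock.acquire()
    try:
        do_something()
    finally:
        lock.release()

dis.dis(critical)
# Between the call and the protected region this shows something like:
#   CALL_FUNCTION  0    <- lock.acquire() returns here
#   POP_TOP             <- a SIGINT handled here escapes the try/finally
#   SETUP_FINALLY  ...  <- only from here on is lock.release() guaranteed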
Whereas, again, if Lock.__enter__ is a pure Python function there's
less likely to be any difference (though I don't think the possibility
can be ruled out entirely).
At the very least I think this quirk of CPython should be mentioned
somewhere (since in all other cases the semantic meaning of the
"with:" statement is clear). However, I think it might be possible to
gain more consistency between these cases if pending signals are
checked/handled after any direct call to PyCFunction from within the
ceval loop.
Sorry for the tl;dr; any thoughts?
Hi,
For technical reasons, many functions of the Python standard libraries
implemented in C have positional-only parameters. Example:
-------
$ ./python
Python 3.7.0a0 (default, Feb 25 2017, 04:30:32)
>>> help(str.replace)
replace(self, old, new, count=-1, /) # <== notice "/" at the end
...
>>> "a".replace("x", "y") # ok
'a'
>>> "a".replace(old="x", new="y") # ERR!
TypeError: replace() takes at least 2 arguments (0 given)
-------
When converting the methods of the builtin str type to the internal
"Argument Clinic" tool (tool to generate the function signature,
function docstring and the code to parse arguments in C), I asked if
we should add support for keyword arguments in str.replace(). The
answer was quick: no! It's a deliberate design choice.
Quote of Yury Selivanov's message:
"""
I think Guido explicitly stated that he doesn't like the idea to
always allow keyword arguments for all methods. I.e. `str.find('aaa')`
just reads better than `str.find(needle='aaa')`. Essentially, the idea
is that for most of the builtins that accept one or two arguments,
positional-only parameters are better.
"""
http://bugs.python.org/issue29286#msg285578
I just noticed a module on PyPI to implement this behaviour on Python functions:
https://pypi.python.org/pypi/positional
My question is: would it make sense to implement this feature in
Python directly? If yes, what should be the syntax? Use "/" marker?
Use the @positional() decorator?
Do you see concrete cases where it's a deliberate choice to deny
passing arguments as keywords?
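To make the decorator option concrete, here's a rough sketch of how a
@positional() decorator could work in pure Python (illustrative only;
the PyPI "positional" module linked above may behave differently):

import functools

def positional(n):
    """Reject keyword use of the first n parameters."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in func.__code__.co_varnames[:n]:
                if name in kwargs:
                    raise TypeError("%s() got positional-only argument "
                                    "passed as keyword: %r"
                                    % (func.__name__, name))
            return func(*args, **kwargs)
        return wrapper
    return decorator

@positional(2)
def find(haystack, needle):
    return haystack.find(needle)

find("aaa", "a")         # ok
find("aaa", needle="a")  # TypeError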
Don't you like writing int(x="123") instead of int("123")? :-) (I know
that Serhiy Storchaka hates the name of the "x" parameter of the int
constructor ;-))
By the way, I read that the "/" marker is unknown to almost all Python
developers, and that the [...] syntax should be preferred, but
inspect.signature() doesn't support this syntax. Maybe we should fix
signature() and use [...] format instead?
Replace "replace(self, old, new, count=-1, /)" with "replace(self,
old, new[, count=-1])" (or maybe even not document the default
value?).
Python 3.5 help (docstring) uses "S.replace(old, new[, count])".
Victor
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
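For example, a minimal reproduction of the mistake:

def foo(a, b):
    return a + b

foo('a', 'b')  # intended: returns 'ab'
foo('a' 'b')   # missing comma: the literals concatenate to 'ab', and this
               # raises TypeError: foo() missing 1 required positional
               # argument: 'b'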
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--
--Guido van Rossum (python.org/~guido)
Hi,
This mail is the consequence of a true story, a story where CPython
got defeated by JavaScript, Java, C# and Go.
One of the teams at the company where I'm working had a kind of
benchmark to compare the different languages on top of their
respective "official" web servers such as Node.js, Aiohttp, Dropwizard
and so on. The test by itself was pretty simple and tried to test the
happy path of the logic, a piece of code that fetches N rules from
another system and then applies them to X whatevers, also fetched from
another system, something like this:
def filter(rule, whatever):
    if rule.x in whatever.x:
        return True

def run():
    rules = get_rules()
    whatevers = get_whatevers()
    cnt = 0
    for rule in rules:
        for whatever in whatevers:
            if filter(rule, whatever):
                cnt = cnt + 1
    return cnt
The performance of Python compared with the other languages was almost
10x slower. It's true that they didn't optimize the code, but they
didn't optimize it for any of the other languages either, so all of
them paid the same cost in terms of iterations.
Once I saw the code I proposed a pair of changes: removing the call to
the filter function by inlining it, and caching the rule's attributes,
something like this:
for rule in rules:
    x = rule.x
    for whatever in whatevers:
        if x in whatever.x:
            cnt += 1
CPython's performance improved 3-4x just from doing these "silly" things.
The case of the rule cache is IMHO very striking: we have plenty of
examples in many repositories where caching non-local variables is a
widely used pattern, so why hasn't a way to do it implicitly and by
default been considered?
The slowness of function calls in CPython is a quite recurrent topic,
and it looks like it's still an unsolved problem.
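To illustrate both effects, here is a rough micro-benchmark sketch (the
names and data are made up; absolute numbers will vary):

import timeit

setup = """
class Rule:
    x = 'needle'
rule = Rule()
data = ['hay needle hay'] * 1000
def check(rule, item):
    return rule.x in item
"""

# function call + attribute lookup on every iteration
print(timeit.timeit("for item in data: check(rule, item)",
                    setup=setup, number=1000))
# inlined test with the attribute cached in a local variable
print(timeit.timeit("x = rule.x\nfor item in data: x in item",
                    setup=setup, number=1000))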
Sure, I'm missing many things and I do not have all of the
information. With this mail I want to gather the information that
might help me understand why CPython is where it is regarding these
two slow patterns.
This could be considered an unimportant thing, but it's more relevant
than one might expect, at least IMHO. If the default code that you can
write in a language is slow, and an alternative exists to make it
faster, the language is doing something wrong.
BTW: PyPy looks like it is immune [1]
[1] https://gist.github.com/pfreixes/d60d00761093c3bdaf29da025a004582
--
--pau
Hi! I joined this list because I'm interested in filling a gap in Python's
standard library, relating to text encodings.
There is an encoding with no name of its own. It's supported by every
current web browser and standardized by WHATWG. It's so prevalent that if
you ask a Web browser to decode "iso-8859-1" or "windows-1252", you will
get this encoding _instead_. It is probably the second or third most common
text encoding in the world. And Python doesn't quite support it.
You can see the character table for this encoding at:
https://encoding.spec.whatwg.org/index-windows-1252.txt
For the sake of discussion, let's call this encoding "web-1252". WHATWG
calls it "windows-1252", but notice that it's subtly different from
Python's "windows-1252" encoding. Python's windows-1252 has bytes that are
undefined:
>>> b'\x90'.decode('windows-1252')
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 0:
character maps to <undefined>
In web-1252, the bytes that are undefined according to windows-1252 map to
the control characters in those positions in iso-8859-1 -- that is, the
Unicode codepoints with the same number as the byte. In web-1252, b'\x90'
would decode as '\u0090'.
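A minimal sketch of the decoding side (a hypothetical helper, not the
codec machinery ftfy actually registers):

def web_1252_decode(data):
    """Decode as windows-1252, falling back to the same-numbered
    control character for the undefined bytes."""
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode('windows-1252'))
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in
            # windows-1252; map them straight to U+0081 etc.
            chars.append(chr(byte))
    return ''.join(chars)

assert web_1252_decode(b'\x90') == '\u0090'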
This may seem like a silly encoding that encourages doing horrible things
with text. That's pretty much the case. But there's a reason every Web
browser implements it:
- It's compatible with windows-1252
- Any sequence of bytes can be round-tripped through it without losing
information
It's not just this one encoding. WHATWG's encoding standard (
https://encoding.spec.whatwg.org/) contains modified versions of
windows-1250 through windows-1258 and windows-874.
Support for these encodings matters to me, in part, because I maintain a
Unicode data-cleaning library, "ftfy". One thing it does is to detect and
undo encoding/decoding errors that cause mojibake, as long as they're
detectable and reversible. Looking at real-world examples of text that has
been damaged by mojibake, it's clear that lots of text is transferred
through what I'm calling the "web-1252" encoding, in a way that's
incompatible with Python's "windows-1252".
In order to be able to work with and fix this kind of text, ftfy registers
new codecs -- and I implemented this even before I knew that they were
standardized in Web browsers. When ftfy is imported, you can decode text as
"sloppy-windows-1252" (the name I chose for this encoding), for example.
ftfy can tell people a sequence of steps that they can use in the future to
fix text that's like the text they provided. Very often, these steps
require the sloppy-windows-1252 or sloppy-windows-1251 encoding, which
means the steps only work with ftfy imported, even for people who are not
using the features of ftfy.
Support for these encodings also seems highly relevant to people who use
Python for web scraping, as it would be desirable to maximize compatibility
with what a Web browser would do.
This really seems like it belongs in the standard library instead of being
an incidental feature of my library. I know that code in the standard
library has "one foot in the grave". I _want_ these legacy encodings to
have one foot in the grave. But some of them are extremely common, and
Python code should be able to deal with them.
Adding these encodings to Python would be straightforward to implement.
Does this require a PEP, a pull request, or further discussion?
The proposed implementation of dataclasses prevents defining fields with
defaults before fields without defaults. This can create limitations on
logical grouping of fields and on inheritance.
Take, for example, the case:
@dataclass
class Foo:
    some_default: dict = field(default_factory=dict)

@dataclass
class Bar(Foo):
    other_field: int
this results in the error:

Traceback (most recent call last):
  File "<ipython-input>", line 6, in <module>
    class Bar(Foo):
  File "~/.pyenv/versions/3.6.2/envs/clover_pipeline/lib/python3.6/site-packages/dataclasses.py", line 753, in dataclass
    return wrap(_cls)
  File "~/.pyenv/versions/3.6.2/envs/clover_pipeline/lib/python3.6/site-packages/dataclasses.py", line 745, in wrap
    return _process_class(cls, repr, eq, order, hash, init, frozen)
  File "~/.pyenv/versions/3.6.2/envs/clover_pipeline/lib/python3.6/site-packages/dataclasses.py", line 677, in _process_class
    else 'self',
  File "~/.pyenv/versions/3.6.2/envs/clover_pipeline/lib/python3.6/site-packages/dataclasses.py", line 424, in _init_fn
    raise TypeError(f'non-default argument {f.name!r} '
TypeError: non-default argument 'other_field' follows default argument
I understand that this is a limitation of positional arguments because the
effective __init__ signature is:
def __init__(self, some_default: dict = <something>, other_field: int):
However, keyword-only arguments allow an entirely reasonable solution to
this problem:
def __init__(self, *, some_default: dict = <something>, other_field: int):
And they have the added benefit of making the fields in the __init__
call entirely explicit.
So, I propose the addition of a keyword_only flag to the @dataclass
decorator that renders the __init__ method using keyword-only arguments:
@dataclass(keyword_only=True)
class Bar(Foo):
    other_field: int
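With such a flag, the Foo/Bar example above would just work, since
field ordering no longer matters for keyword-only parameters
(hypothetical usage, as keyword_only does not exist today):

bar = Bar(other_field=1)  # some_default still comes from its factory
bar = Bar(some_default={'k': 'v'}, other_field=1)
bar = Bar(1)              # raises TypeError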
--George Leslie-Waksman
In South Asia, a different style of digit delimiters for large numbers is
used than in Europe, North America, Australia, etc. With some minor
spelling differences, the term lakh is used for a hundred-thousand, and it
is generally written as '1,00,000'.
In turn, a crore is 100 lakh, and is written as '1,00,00,000'. Extending
this pattern, larger numbers continue to use digits in groups of two
(other than the smallest grouping, which keeps three digits). So, e.g.
1e12 is written as 10,00,00,00,00,000.
It's nice that we now have the optional underscore in numeric literals. So
we could write a number as either `12_34_56_78_00_000` or
`1_234_567_800_000` depending on what region of the world and which
convention was more familiar.
However, in *formatting* those numbers, the format mini-language only
allows the European convention. So e.g.
In [1]: x = 12_34_56_78_00_000
In [2]: "{:,d}".format(x)
Out[2]: '1,234,567,800,000'
In [3]: f"{x:,d}"
Out[3]: '1,234,567,800,000'
In order to get Indian number delimiters, you'd have to write a custom
formatting function, notwithstanding that something like 1.5 billion people
use the three-then-two delimiting convention.
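Such a function is short but fiddly to get right; a sketch:

def indian_group(n):
    """Group digits Indian-style: the last three together, then twos."""
    s = str(abs(n))
    if len(s) > 3:
        head, tail = s[:-3], s[-3:]
        parts = []
        while len(head) > 2:
            parts.insert(0, head[-2:])
            head = head[:-2]
        if head:
            parts.insert(0, head)
        s = ','.join(parts + [tail])
    return '-' + s if n < 0 else s

assert indian_group(12_34_56_78_00_000) == '12,34,56,78,00,000'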
I propose that Python should have an additional grouping option, or some
other way to specify this grouping convention. Oddly, the '_' grouping
symbol is available, even though no one actually uses that grouper outside
of programming languages like Python, e.g.:
In [4]: f"{x:_d}"
Out[4]: '1_234_567_800_000'
I guess this is nice for something like round-tripping numbers used in
code, but it's not a symbol anyone uses "natively" (I understand why comma
or period cannot be used in numeric literals since they mean something else
in Python already).
I'm not sure what symbol or combination I would recommend, but finding
something suitable shouldn't be so hard. Perhaps now that backtick no
longer has any other meaning in Python, it could be used since it looks
similar to a comma. E.g. in Python 3.8 we might have:
>>> f"{x:`d}"
'12,34,56,78,00,000'
(actually, this probably isn't a parser issue even in Python 2, since
it's already inside quotes; but the issue is moot).
Or maybe a two character version like:
>>> f"{x:2,d}"
'12,34,56,78,00,000'
Or:
>>> f"{x:,,d}"
'12,34,56,78,00,000'
Even if `2,` was used, that wouldn't preclude giving an additional length
descriptor after it. Now we can have:
>>> f"{x:,.2f}"
'1,234,567,800,000.00'
Perhaps in the future this would work:
>>> f"{x:2,.2f}"
'12,34,56,78,00,000.00'
--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
Hi.
Currently, int(), str.isdigit(), str.isalnum(), etc. accept
non-ASCII strings.
>>> s = "１２３"
>>> s
'１２３'
>>> s.isdigit()
True
>>> print(ascii(s))
'\uff11\uff12\uff13'
>>> int(s)
123
But sometimes, we want to accept only ASCII strings. For example,
ipaddress module uses:
_DECIMAL_DIGITS = frozenset('0123456789')
...
if _DECIMAL_DIGITS.issuperset(str):
ref: https://github.com/python/cpython/blob/e76daebc0c8afa3981a4c5a8b54537f756...
If str has an isascii() method, it can be simpler:
`if s.isascii() and s.isdigit():`
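For reference, a pure-Python stand-in with the proposed semantics
(sketch only):

def isascii(s):
    return all(ord(ch) < 0x80 for ch in s)

assert isascii('123')
assert not isascii('１２３')  # fullwidth digits: isdigit() is still True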
I want to add it in Python 3.7 if there are no objections.
Regards,
--
INADA Naoki <songofacandy(a)gmail.com>
In my work I make a lot of use of struct pack & unpack, but I sometimes
find it awkward that I either have to supply the whole format string up
front, or add explicit position-tracking mechanisms to my code so as to
use pack_into or unpack_from, whenever the structures I am dealing with
are not simple arrays of a single type. In my particular use case,
de/encoding messages from a remote device that sometimes forwards
collected information to & from the devices that it controls, the
endianness is not always consistent between message elements in a
single message.
It would be very nice if:
a) iter_unpack produced an iterator whose next method optionally took
an additional format-string parameter (with endianness permitted) to
replace the current format string in use, and
b) there was a matching iter_pack function whose next method took
either more data to pack into the buffer (extending it) or a new format
string, passed either as a named parameter or via an exposed set_fmt
method on the packer.
This would cover my specific use case, as well as the ones where a
buffer or stream is constructed of blocks consisting of a byte or word
that specifies the data type(s) to follow (and sometimes a count),
followed by the actual data.
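In the meantime, here is a rough sketch of the kind of stateful helper
I have in mind, built on unpack_from and calcsize (the names are mine,
not a proposed API):

import struct

class StreamUnpacker:
    """Track an offset into a buffer, letting the format string,
    including its endianness prefix, change between reads."""

    def __init__(self, buffer, offset=0):
        self.buffer = buffer
        self.offset = offset

    def unpack(self, fmt):
        values = struct.unpack_from(fmt, self.buffer, self.offset)
        self.offset += struct.calcsize(fmt)
        return values

# e.g. a message with a big-endian header and a little-endian payload:
#     u = StreamUnpacker(data)
#     msg_type, count = u.unpack('>HB')
#     readings = u.unpack('<%df' % count)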
--
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect
those of my employer.
Hello,
Some time ago, I set up some logging using stdout in a program with the
`stdout_redirected()` context manager, which had to close and reopen
stdout to work.
Unsurprisingly, the StreamHandler didn't take it well.
So I made a Handler class which is able to reload the stream (AKA get
the new sys.stdout) whenever the old one isn't writable.
But there might be some more legitimate use cases for stubborn
StreamHandlers like that (ones that are not ugly-looking,
hopefully-temporary patches).
The way I see it for now is a StreamHandler subclass which, instead of
having a `stream` argument, would have `getStream`, `reloadStream`
and `location`, and would be used this way:
On initialisation, it would load the stream object, then act as a
regular StreamHandler, but checking that the stream is writable at each
`handler.emit()` call. If it is not, then reload it.
If given, the `getStream` argument (a callable object which returns a
ready-to-use stream) is used to load/reload the underlying stream;
otherwise the stream is fetched from the location described by
`location`, and, if it is still not writable, `reloadStream()` is
called (which should put a usable stream object at `location`) before
trying to fetch it again.
Here is the current implementation I have:

```
from logging import StreamHandler

from .config import _resolve as resolve  # will (uglily) be used later


class ReloadingHandler(StreamHandler):
    """
    A stream handler which reloads the stream object from one place if
    an error occurs.
    """

    def __init__(self, getStream=None, reloadStream=None, location=None):
        """
        Initialize the handler.

        If stream is not specified, sys.stderr is used.
        """
        self.getStream = getStream
        self.stream = None  # to be overwritten later
        if getStream is None:
            if location is None:
                self.location = 'sys.stderr'  # note the lack of 'ext://'
                self.reloadStream = None
            else:
                self.reloadStream = reloadStream
                self.location = location
        stream = self.reload()  # gets the stream
        StreamHandler.__init__(self, stream)

    def reload(self):
        if self.getStream is not None:
            stream = self.getStream()
        else:
            try:
                stream = resolve(self.location)
                exc = None
            except Exception as err:
                exc = err  # is this really needed?
                stream = None  # just retry for now
            if stream is None or not stream.writable():
                if self.reloadStream is None:
                    if exc:
                        raise exc
                    else:
                        raise ValueError("ReloadingHandler couldn't "
                                         "reload a valid stream")
                self.reloadStream()  # should put a usable stream at `location`
                # if it fails this time, do not catch the exception here
                stream = resolve(self.location)
        return stream

    def emit(self, record):
        """
        Emit a record.

        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing
        newline. If exception information is present, it is formatted
        using traceback.print_exception and appended to the stream. If
        the stream has an 'encoding' attribute, it is used to determine
        how to do the output to the stream.
        """
        if not self.stream.writable():
            self.stream = self.reload()
        StreamHandler.emit(self, record)
```
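For reference, this is roughly how I use it (a sketch; getStream is
re-called whenever the old stream stops being writable, so the handler
always picks up the current sys.stdout):

import logging
import sys

handler = ReloadingHandler(getStream=lambda: sys.stdout)
logging.getLogger().addHandler(handler)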
What do you think? (about the idea, the implementation, and the way I
wrote this email)