So when putting together my enum implementation I came up against something
I've run into a few times in different situations, and which I thought
warranted some discussion as its own proposal: there are times when it
would be extremely convenient if, when unpacking an object (using the
"a, b, c = foo" notation), that object had a way to know how many values
it needed to provide.
I would like to propose the following:
When the unpack notation is used, the interpreter will automatically look
for an __unpack__ method on the rvalue, and if it exists will call it as:
rvalue.__unpack__(min_count, max_count)
... and use the returned value as the iterable to unpack into the variables
instead of the original object. (If no __unpack__ method exists, it will
just use the rvalue itself, as it does currently)
In the case of simple unpacking ("a, b, c = foo"), min_count and max_count
will be the same value (3). In the case of extended unpacking ("a, b, c,
*d = foo"), min_count would be the smallest length that will not cause an
unpacking exception (3, in this case), and max_count would be None.
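To make the shape of the protocol concrete, here is a rough sketch of a
"smart" unpackable object (hypothetical, of course, since __unpack__ does
not exist today; the FirstN class is purely illustrative):

import itertools

class FirstN:
    "Unpack to exactly as many items as the assignment target requires."
    def __init__(self, iterable):
        self._iterable = iterable

    def __unpack__(self, min_count, max_count):
        if max_count is None:
            # extended unpacking ("a, b, *rest = foo"): no upper bound
            return iter(self._iterable)
        # simple unpacking: produce exactly max_count items
        return itertools.islice(self._iterable, max_count)

# Under the proposal, "a, b, c = FirstN(itertools.count())" would call
# __unpack__(3, 3) and bind a, b, c = 0, 1, 2, even though the underlying
# iterator is infinite.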
By effectively separating the notion of "unpackable" from "iterable", this
would allow for a few useful things:
1. It would allow otherwise non-iterable objects to support the unpack
notation if it actually makes sense to do so
2. It would allow for "smart" unpackable objects (such as __ in my enum
example), which would "just work" no matter how many values they're
required to produce.
3. It would allow (properly designed) infinite (or
unknown-possibly-large-size) iterators to be unpackable into a finite
sequence of variables if desired (which can only be done with extra
special-case code currently)
4. It could make backwards compatibility easier for APIs which return
iterables (i.e. if you want to change a function that used to return a
3-tuple so that it returns a 5-tuple, then instead of inherently breaking
all existing code that unpacks the return value (written assuming only 3
items would ever be returned), you could return an unpackable object which
works correctly with both old and new code)
Thoughts?
--Alex
Dear Python enthusiasts,
While replacing some of my bash scripts with Python scripts, I found the
following useful. I wrote a small utility module that allows piping
input and output in a very similar way to how you do it in bash. The
major advantage over just using shell=True is that this does not
expose us to the latter's security risks and quoting hazards.
The major advantage of using a pipe instead of just manually feeding
the output of one process to the other is that with a pipe, it doesn't
all have to fit into memory at once. However, the Python way of doing
that is rather cumbersome:
>>> from subprocess import Popen, PIPE
>>> p1 = Popen(['echo', 'a\nb'], stdout=PIPE)
>>> p2 = Popen(['head', '-n1'], stdin=p1.stdout, stdout=PIPE)
>>> p2.communicate()
('a\n', None)
I've been able to replace this by:
>>> from pipeutils import call
>>> (call('echo', 'a\nb') | call('head', '-n1')).output()
'a\n'
And similarly, I can do direct file redirects like this:
>>> (call('echo', 'Hello world!') > 'test.txt').do()
0
>>> (call('cat') < 'test.txt').output()
'Hello world!\n'
I think that this is a lot more concise and readable. As far as I'm
concerned, this was the only reason I would ever use bash scripts
instead of Python scripts.
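In case it helps to see the general shape: below is a stripped-down sketch
of how this kind of pipe syntax can be built on operator overloading. This
is a simplification I'm writing out for illustration, not the actual
pipeutils source (see the repository for that), and the file redirects
(<, >) are left out:

from subprocess import Popen, PIPE

class call:
    def __init__(self, *args):
        self.args = list(args)
        self.stdin_source = None          # upstream "call" in the pipeline

    def __or__(self, other):              # call(...) | call(...)
        other.stdin_source = self
        return other

    def _spawn(self):
        if self.stdin_source is not None:
            upstream = self.stdin_source._spawn()
            return Popen(self.args, stdin=upstream.stdout, stdout=PIPE)
        return Popen(self.args, stdout=PIPE)

    def output(self):
        out, _ = self._spawn().communicate()
        return out.decode()

# (call('echo', 'a\nb') | call('head', '-n1')).output()  ->  'a\n'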
I would love to hear opinions on this. How would people like this as a library?
The source is online at https://bitbucket.org/tkluck/pipeutils
Best, Timo
An iterator iter represents the remainder of some collection, concrete
or not, finite or not. If the remainder is not empty, its .__next__
method selects one object, mutates iter to represent the reduced
remainder (now possibly empty), and returns the one item.
At various times, people have asked for the ability to determine whether
an iterator is exhausted, whether a next() call will return or raise. If
it will return, people have also asked for the ability to peek at what
the return would be, without disturbing what the return will be. For
instance, in the 'iterable.__unpack__ method' thread, Alex Stewart today wrote:
> The problem is that there is no standard way to query an iterator
> to find out if it has more data available without automatically
> consuming that next data element in the process.
It turns out that there is a solution that gives the ability to both
test emptiness (in the standard way) and peek ahead, without modifying
the iterator protocol. It merely requires a wrapper iterator, much like
the ones in itertools. I have posted one before and give my current
version below.
Does anyone else think this should be added to itertools? It seems to not
be completely obvious to everyone, is more complex than some of the
existing itertools, and cannot be composed from them either. (Nor can it
be written as a generator function.)
Any of the names can be changed. Perhaps the class should be 'peek' and
the lookahead object something else. The sentinel should be read-only if
possible. I considered whether the peek object should be read-only, but
someone would say that they *want* to be able to replace the next object to
be yielded. Peeking into an exhausted iterable could raise instead of
returning the sentinel, but I don't know if that would be more useful.
----------------
class lookahead():
    "Wrap iterator with lookahead to both peek and test exhausted"

    _NONE = object()

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._set_peek()

    def __iter__(self):
        return self

    def __next__(self):
        if self.peek is self._NONE:
            raise StopIteration  # exhausted; don't leak the sentinel
        ret = self.peek
        self._set_peek()
        return ret

    def _set_peek(self):
        try:
            self.peek = next(self._it)
        except StopIteration:
            self.peek = self._NONE

    def __bool__(self):
        return self.peek is not self._NONE

def test_lookahead():
    it = lookahead('abc')
    while it:
        a = it.peek
        b = next(it)
        print('next:', b, '; is peek:', a is b)

test_lookahead()
--------------------
>>>
next: a ; is peek: True
next: b ; is peek: True
next: c ; is peek: True
--
Terry Jan Reedy
Context: http://bugs.python.org/issue16942 (my patch, changing
FileCookieJar to be an abc, defining the interfaces for
*FileCookieJar).
This pertains to Terry's question about whether or not it makes sense
that an abstract base class extends a concrete class. After putting in
a little thought, he's right. It doesn't make sense.
After further thought, I'm relatively confident that the hierarchy as
it stands should be changed. Currently what's implemented in the
stdlib looks like this:
CookieJar
  |
  +-- FileCookieJar
        |
        +-- MozillaCookieJar
        +-- LWPCookieJar
What I'm proposing is that the structure be broken up as follows:
FileCookieJarProcessor          CookieJar
  |
  +-- MozillaCookieJarProcessor
  +-- LWPCookieJarProcessor
The intention here is to have processors that operate /on/ a cookiejar
object via composition rather than inheritance. This aligns with how
urllib.request.HTTPCookieProcessor works (which has the added bonus of
cross-module consistency).
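To illustrate the composition idea, here is a rough sketch of the shape I
have in mind (my own simplification, not the patch itself; the signatures
are assumptions modeled on FileCookieJar's save/load):

from abc import ABCMeta, abstractmethod
from http.cookiejar import CookieJar

class FileCookieJarProcessor(metaclass=ABCMeta):
    "Persists the cookies of a CookieJar it wraps, rather than being one."

    def __init__(self, cookiejar=None, filename=None):
        self.cookiejar = cookiejar if cookiejar is not None else CookieJar()
        self.filename = filename

    @abstractmethod
    def save(self, filename=None):
        "Write self.cookiejar's cookies out to a file."

    @abstractmethod
    def load(self, filename=None):
        "Populate self.cookiejar from a file."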
The only attributes that the concrete FileCookieJarProcessor classes touch
(at least, in the stdlib) are _cookies and _cookies_lock. I have mixed
feelings about whether these should keep the "non-public" _ prefix:
keeping it means the processors reach into another object's non-public
fields, which breaks convention, but I'm unsure of the ramifications of
making them public.
Making this change then allows FileCookieJarProcessor to be an abstract
base class without inheriting from CookieJar, which never made a whole lot
of sense from an architecture standpoint.
I have yet to see what impact these changes have to the cookiejar
extensions at http://wwwsearch.sf.net but plan on doing so if this
approach seems sound.
This will obviously break backwards compatibility, so I'm not entirely
sure what best practice is around that: leave well enough alone even
though it might not make sense, keep the old implementations around
and deprecate them to be eventually replaced by the processors, or
other ideas?
--
Demian Brecht
http://demianbrecht.github.com
I've been noticing a lot of security-related issues being discussed in
the Python world since the Ruby YAML problem came out. Is it time to
consider adding an alternative to pickle that is safe(r) by default?
Pickle is usable in situations few other things are, because it can
handle cyclic references and virtually any python object. The only
stdlib alternative I'm aware of is json, which can do neither of those
things. (Or at least, not without significant extra serialization
code.) I would imagine that any alternative supplied should be easy
enough to use that pickle users would seriously consider switching,
and include at least those features.
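Cyclic references are a concrete example of where json gives up but
pickle doesn't (a quick demonstration):

import json, pickle

node = {'name': 'root'}
node['self'] = node                           # a cyclic reference

restored = pickle.loads(pickle.dumps(node))   # round-trips fine
assert restored['self'] is restored

json.dumps(node)   # raises ValueError: Circular reference detected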
The benefit of using a secure alternative to pickle is that it
increases the difficulty of creating an insecure application, even for
those that are aware of the risks of the pickle module. With the
pickle module, you are one mistake away from an insecure program: all
you need is to have a way for the attacker to influence input to
pickle. With a secure alternative, even if you make that mistake, it
doesn't immediately result in a compromised application. You would
need another mistake on top of that, one that results in the deserialized
input being used improperly.
The only third party library I'm aware of that attempts to be a
safe/usable pickle replacement is cerealizer[1]_. Would it be
worth considering adding cerealizer, or something like it, to the
stdlib?
.. [1]: http://home.gna.org/oomadness/en/cerealizer/index.html
-- Devin
I understand it's still beta but anyways, here's my little wish list for Tulip.
* provide a 'demo' / 'example' directory containing very simple
scripts showing the most basic usages such as:
- echo_tcp_client.py
- echo_tcp_server.py
- echo_tcp_server_w_timeout.py (same as echo_server.py but also
disconnects the client after a certain time of inactivity)
- echo_tcp_ssl_client.py
- echo_tcp_ssl_server.py
- echo_udp_client.py
- echo_udp_server.py
* move all *test*.py scripts into a separate 'test' directory
* if it's not part of the API intended to be public, move
tulip/http_client.py elsewhere ('examples'/'demo' or a brand new
'scripts'/'tools' directory)
* (minor) same for check.py, crawl.py, curl.py, sslsrv.py, which look
like they belong elsewhere
* write a simple benchmark framework testing (at least) sending,
receiving and the internal scheduler (I'd like to help with this one)
--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/
Ok, so at the risk of muddying the waters even more, I've put together yet
another possible way to do enums, and would be interested to hear comments..
Based on the list of required/desired/undesired properties laid out in the
enum PEP thread, I believe a lot of the back and forth to date has been due
to the fact that all of the proposed implementations fall short of
fulfilling at least some of the desired properties for a general-purpose
enum implementation in different ways. I've put together something based
on ideas from Tim's, Barry's, other things thrown out in discussion, and a
few of my own which I think comes closer to ticking off most of the boxes.
(The code and some examples are available at
https://github.com/foogod/pyenum)
Enums/groups are defined by creating subclasses of Enum, similarly to
Barry's implementation. The distinction is that the base "Enum" class does
not have any associated values (they're just singleton objects with names).
At a basic level, the values themselves are defined like so:
class Color (Enum):
    RED = __
    GREEN = __
    BLUE = __
As has been (quite correctly) pointed out before, the single-underscore (_)
has a well-established meaning in most circles related to gettext/etc, so
for this application I picked the next best thing, the double-underscore
(__) instead. I think this is reasonably mnemonic as a "fill in the blank"
placeholder, and also not unduly verbose. One advantage of using __ over,
say, ellipsis (...) is that since it involves a name resolution, we can add
(just a little!) magic to generate distinct __ objects on each reference,
so, for example, the following actually works as the user probably expects
it to:
class Color (Enum):
    RED = CRIMSON = __
    BLUE = __
(RED and CRIMSON actually become aliases referring to the same enum value,
but BLUE is a different enum value)
One other rather notable advantage to __ is that we can define a special
multiplication behavior for it which allows us to make the syntax much more
compact:
class Color (Enum):
    RED, GREEN, BLUE, ORANGE, VIOLET, BEIGE, ULTRAVIOLET, ULTRABEIGE = __ * 8
(as an aside, I have an idea about how we might be able to get rid of the
"* 8" altogether, but it requires a different (I think generally useful)
change to the language which I will propose separately)
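For those wondering how the per-reference magic can work at all, here is
the general trick in simplified form (a sketch only; the actual code in the
GitHub repo differs in details): the metaclass's __prepare__ hook returns a
namespace that mints a fresh placeholder each time the name __ is looked
up, so "RED = CRIMSON = __" evaluates __ once and aliases, while separate
references each get their own placeholder:

class _Placeholder:
    "Stands in for an enum value until the metaclass assigns the real one."
    def __init__(self, value=None, **meta):
        self.value, self.meta = value, meta

    def __call__(self, value=None, **meta):    # supports __(1, doc="...")
        return _Placeholder(value, **meta)

    def __mul__(self, n):                      # supports "A, B, C = __ * 3"
        return tuple(_Placeholder() for _ in range(n))

class _EnumNamespace(dict):
    def __missing__(self, key):
        if key == '__':
            return _Placeholder()              # fresh object per reference
        raise KeyError(key)                    # fall back to globals/builtins

class EnumMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwds):
        return _EnumNamespace()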
Each enum value is actually an instance of its defining class, so you can
determine what type of enum something is with simple inheritance checks:
>>> isinstance(Color.RED, Color)
True
For enums which need to have int values, we can use IntEnum instead of Enum:
class Errno (IntEnum):
    EPERM = 1
    ENOENT = 2
    ESRCH = 3
    EINTR = 4
(Technically, IntEnum is just a special case of the more generic TypeEnum
class:
class IntEnum (TypeEnum, basetype=int):
    pass
..which means that theoretically you could define enums based on (almost)
any base type (examples.py has an example using floats))
Anyway, as expected, enums have reasonable strs/reprs, and can be easily
translated to/from base values using traditional "casting" syntax:
>>> Errno.EPERM
<__main__.Errno.EPERM (1)>
>>> str(Errno.EPERM)
'EPERM'
>>> int(Errno.EPERM)
1
>>> Errno(1)
<__main__.Errno.EPERM (1)>
You can also lookup enums by name using index notation:
>>> Errno['EPERM']
<__main__.Errno.EPERM (1)>
It's worth noting here that TypeEnums are actually subclasses of their
basetype, so IntEnums are also ints, and can be used as drop-in
replacements in any existing code which expects an int argument.
They can also be compared directly as if they were ints:
if exc.errno == Errno.EPERM:
    do_something()
TypeEnums enforce uniqueness:
>>> class Foo (IntEnum):
...     A = 1
...     B = 1
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
ValueError: Duplicate enum value: 1
However, you can define "aliases" for enums, if you want to:
>>> class Foo (IntEnum):
...     A = 1
...     B = A
...
>>> Foo.A
<__main__.Foo.A (1)>
>>> Foo.B
<__main__.Foo.A (1)>
Of course, pure (valueless) Enum instances are logical singletons (i.e.
they only compare equal to themselves), so there's no potential for
duplication there.
Finally, a special feature of __ is that it can also be called with
parameters to create enum values with docstrings or other metadata:
>>> class Errno (IntEnum):
...     EPERM = __(1, doc="Permission denied")
...
>>> Errno.EPERM.__doc__
'Permission denied'
>>> class Color (Enum):
...     RED = __(doc="The color red", rgb=(1., 0., 0.))
...
>>> Color.RED.__doc__
'The color red'
>>> Color.RED.rgb
(1.0, 0.0, 0.0)
A few other notable properties:
- Enum values are hashable, so they can be used as dict keys, etc.
- Though enum values will compare equal to their associated basetype
values (Errno.EPERM == 1), enum values from different classes never compare
equal to each other (Foo.A != Errno.EPERM). This prevents enum-aware code
from accidentally mistaking one enum for a completely different enum (with
a completely different meaning) simply because they map to the same int
value. (A sketch of how this can work follows, just after this list.)
- Using the class constructor does not create new enums, it just returns
the existing singleton enum associated with the passed value (Errno(1)
returns Errno.EPERM, not a new Errno). If there is no predefined enum
matching that value, a ValueError is raised, thus using the constructor is
basically a "cast" operation, and only works within the supported range of
values. (New enum values can be created using the .new() method on the
class, however, if desired)
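As promised above, a simplified sketch (again, not the actual pyenum code)
of how an int-based enum value can compare equal to plain ints while never
comparing equal to enums of a different class:

class _IntEnumValue(int):
    "Base for IntEnum values: an int with class-aware equality."
    def __eq__(self, other):
        if isinstance(other, _IntEnumValue) and type(other) is not type(self):
            return False             # different enum classes never compare equal
        return int(self) == other    # plain ints still compare by value

    __hash__ = int.__hash__          # stay hashable (dict keys, sets, ...)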
So... thoughts?
--Alex
Hi all,
The ideas thrown around and Tim's prototype in the thread "constant/enum
type in stdlib" show that it's possible to use some metaclass magic to
implement very convenient syntax for enums, while only including them in
the stdlib (no language changes):
from enum import Enum

class Color(Enum):
    RED, BLUE, GREEN
I, for one, am convinced that with this in hand we should *not* add special
syntax for enums, since the gains would be minimal over the above proposal.
While the parallel thread (and Tim's Bitbucket issue tracker) discusses his
proposed implementation, I think it may be worthwhile to start from the
other direction by writing a PEP that aims to include this in 3.4. The way
I see it, the PEP should discuss and help us settle upon a minimal set of
features deemed important in an enum.
If that sounds OK to people, then someone should write that PEP :-) This
could be Tim, or Barry who's been maintaining flufl.enum for a long time.
If no one steps up, I will gladly do it.
Thoughts?
Eli
Hi all,
Sorry to jump into this discussion so late, but I just finished reading
through this thread and had a few thoughts..
(Sorry this message is a bit long. TL;DR: Please check the list of
required/desired/undesired properties at the end and let me know if I've
gotten anything seriously wrong with my interpretation of the discussion
thus far)
It seems to me that this particular thread started out as a call to step
away from existing implementations and take a look at this issue from the
direction of "what do we want/need" instead, but then it quickly got
sidetracked back into discussing all the details of various
existing/proposed implementations. I'd like to try to take a step back
(again) for a minute and raise the question: What do we actually want to
get out of this whole endeavor?
First of all, as I see it, there are two main (and fairly distinct) use
cases for enums in Python:
1. Predefined "unique values" for passing around within Python code
2. API-defined constants for interacting with non-Python libraries, etc.
(e.g. C defines/enums that need to be represented in Python, or database
field values)
In non-Python code, typically enums have always been represented under the
covers as ints, and therefore must be passed back and forth as numbers.
The fact that they have an integer value, however, is purely an
implementation artifact. It comes from the fact that C and some other
languages don't have a rich enough type system to properly make enums their
own distinct types, but Python does not have this limitation, and I think
we should be careful not to constrain the way we do things within Python
just because of the limitations of other languages.
Where possible I believe we should conceptually be thinking of enums not as
"sequences of ints" but more as "collections of singletons". That is, they
are simply objects, with a defined name and type, which compare equal to
themselves but not to others, and are generally related to others by some
sort of grouping mechanism (and the same name always maps to the same
object). In this context, the idea of assigning a "value" to an enum makes
little sense and is arguably completely unnecessary. (and if we eliminate
this aspect, it mitigates many of the issues that have been brought up
about evaluation order and accidental duplication, in addition to
potentially making the base design a lot simpler)
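As a trivial illustration of that framing (just an illustration of the
concept, not a proposed implementation):

class _EnumValue:
    "A named singleton; equality is simply identity."
    def __init__(self, group, name):
        self.group, self.name = group, name
    def __repr__(self):
        return '<{}.{}>'.format(self.group, self.name)

RED = _EnumValue('Color', 'RED')
GREEN = _EnumValue('Color', 'GREEN')
# RED == RED, RED != GREEN, and RED never compares equal to any int --
# no "values" needed anywhere.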
Obviously, the second use case does require an association between enums
and (typically int) values, but that could be viewed as simply a special
case of the larger definition of "enums", rather than the other way around.
I do think one thing worth noting, however, is that (at least in my
experience) the cases which require associating names with values pretty
much always require that every name have a specific value, so the value
for each and every enum within the group should probably be defined
explicitly anyway (I have encountered very few cases where it's actually
useful to mix-and-match "I care about this value but I don't care about
those ones"). It doesn't seem unreasonable, therefore, to define two
different categories of enums: one that has no concept of "value" (for
pure-Python), and one which does have associated values but all values have
to be specified explicitly (for the "mapping constants" case).
On a related note, to be honest, I'm not really sure I can think of any
realistic use cases for "string enums" (or really anything other than ints
in general). Does anybody have an example of where this would actually be
useful as opposed to just using "pure" (valueless) enums (which would
already have string names)?
Anyway, in the interest of trying to get the discussion back onto more
theoretical ground, I also wanted to try to summarize the more general
thoughts/impressions I've gleaned from the discussions up to this point.
From what I can tell, there seems to be some general consensus that enums
probably should (or shouldn't) have the following properties:
Required properties (without these, any implementation is not generally
useful, or is at least something different than an "enum"):
1. Enums must be groupable in some way (i.e. "Colors", or "Error values")
2. Enums within the same group must not compare equal to each other
(unless two names are intentionally defined to map to the same enum (i.e.
"aliases"))
3. (Within reason and the limitations of Python) Once defined, an enum's
properties (i.e. its name, identity, group membership, relationships to
other objects, etc) must be treated as immutable (i.e. not change out from
under the programmer unexpectedly). Conceptually they should be considered
to be "constants".
Desirable properties (which ones are more or less desirable will vary for
different people, but from what I've heard I think everybody sorta agrees
that all of these could be good things as long as they don't cause other
problems):
1. Enums should represent themselves (str/repr) by symbolic names, not
as ints, etc.
2. Enums from different groups should preferably not compare equal to
each other (even if they have the same associated int value).
3. It should be possible to determine what group an enum belongs to.
4. Enums/groups should be definable inline using a fairly simple Python
syntax.
5. It should also be relatively easy to define enums/groups
programmatically.
6. By default, enums should be referenceable as relatively simple
identifiers (i.e. no need for quoting, function-calls, etc, just
variables/attributes/etc)
7. If the programmer doesn't care about the value of an enum, she
shouldn't have to explicitly state a meaningless value.
8. (If an enum does have an associated value) it should be easy to
compare with and/or convert back and forth between enums and values (so
that they can be used with existing APIs).
9. It would be nice to be able to associate docstrings, and possibly
other metadata with enums.
Undesirable properties:
1. Enum syntax should not be "too magic". (In particular, it's pretty
clear at this point that creating new enums as a side-effect of name
lookups (even as convenient as it does make the syntax) is ultimately not
gonna fly)
2. The syntax for defining enums should not be so onerous or verbose
that it's significantly harder to use than the existing idioms people are
already using.
3. The syntax for defining enums should not be so alien that it will
completely baffle programmers who are already used to typical Python
constructs.
4. It shouldn't be necessary to quote enum names when defining them
(since they won't be quoted when they're used)
I want to check: Is this a valid summary of things? Anything I missed, or
do people have substantial objections to any of the
required/desirable/undesirable points I mentioned?
Obviously, it may not be possible to achieve all of the desirable
properties at the same time, but I think it's useful to start with an idea
of what we'd ideally like, and then we can sit down and see how close we
can actually get to it..
(Actually, on pondering these properties, I've started to put together a
slightly different enum implementation which I think has some potential
(it's somewhat a cross between Barry's and Tim's with a couple of ideas of
my own). I think I'll flesh it out a little more and then put it up for
comment as a separate thread, if people don't mind..)
--Alex
---------- Forwarded message ----------
From: Ian Cordasco <graffatcolmingov@gmail.com>
Date: Tue, Feb 19, 2013 at 11:35 AM
Subject: Re: [Python-ideas] argparse - add support for environment variables
To: Miki Tebeka <miki.tebeka@gmail.com>
Cc: python-ideas@googlegroups.com
Why not:
parser.add_argument('--spam', default=os.environ.get('SPAM', 7))
This way if SPAM isn't set, your default is 7. If SPAM is set, your
default becomes that.
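If you want the full command line -> environment -> default chain as a
reusable pattern, a thin wrapper over argparse would do it (a hypothetical
sketch; argparse has no env= keyword today):

import os
from argparse import ArgumentParser

class EnvArgumentParser(ArgumentParser):
    def __init__(self, *args, env=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._env = os.environ if env is None else env

    def add_argument(self, *args, env=None, **kwargs):
        # environment beats the hard-coded default; command line beats both
        if env is not None and env in self._env:
            kwargs['default'] = self._env[env]
        return super().add_argument(*args, **kwargs)

parser = EnvArgumentParser()
parser.add_argument('--spam', env='SPAM', default=7)
print(parser.parse_args().spam)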
On Tue, Feb 19, 2013 at 11:03 AM, Miki Tebeka <miki.tebeka@gmail.com> wrote:
> Greetings,
>
> The usual way of resolving configuration is command line -> environment ->
> default.
> Currently argparse supports only command line -> default, I'd like to
> suggest an optional "env" keyword to add_argument that will also resolve
> from environment. (And also optional env dictionary to the ArgumentParser
> __init__ method [or to parse_args], which will default to os.environ).
>
> Example:
> [spam.py]
>
> parser = ArgumentParser()
>
> parser.add_argument('--spam', env='SPAM', default=7)
> args = parser.parse_args()
> print(args.spam)
>
> ./spam.py -> 7
> ./spam.py --spam=12 -> 12
> SPAM=9 ./spam.py -> 9
> SPAM=9 ./spam.py --spam=12 -> 12
>
> What do you think?
> --
> Miki
GMail decided to reply to python-ideas at googlegroups.com, but this
was my response which I shamefully did not bottom post.