I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
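To make this concrete, here is a toy model in plain Python of what
a permanent object would buy us (only a sketch of the idea, not
CPython internals; the sentinel value and function names are made up
for illustration):

PERMANENT = -1  # hypothetical sentinel marking an immortal object

class Obj(object):
    def __init__(self, permanent=False):
        self.refcount = PERMANENT if permanent else 1

def incref(obj):
    if obj.refcount == PERMANENT:
        return  # no-op: no write to the object's memory, no locking
    obj.refcount += 1

A decref would test for the sentinel and return in the same way.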
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts would need to be checked to see if they
are zero. When the local refcount is decremented and reaches zero,
the global refcount can be checked without locking, since a zero
will never be written to it until there are truly no non-local
references remaining.
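The ownership protocol, sketched as a toy model in plain Python
(again just a model of the proposal, not real CPython code; Obj,
decref and dealloc are made-up names for illustration):

import threading

class Obj(object):
    def __init__(self):
        self.owner = threading.get_ident()  # owning thread id
        self.local_rc = 1     # touched only by the owning thread
        self.global_rc = 0    # touched only while holding the lock
        self.lock = threading.Lock()

def dealloc(obj):
    pass  # placeholder for actual memory reclamation

def decref(obj):
    if threading.get_ident() == obj.owner:
        obj.local_rc -= 1
        # a zero is never written to global_rc while non-local
        # references remain, so this unlocked read is safe
        if obj.local_rc == 0 and obj.global_rc == 0:
            dealloc(obj)
    else:
        with obj.lock:
            obj.global_rc -= 1
            if obj.global_rc == 0 and obj.local_rc == 0:
                dealloc(obj)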
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
Hello all,
I was looking into function annotations, which are to be introduced in
Python 3000, and I found this old message sent to this mailing list:
http://mail.python.org/pipermail/python-ideas/2007-January/000037.html
I would like to restart the discussion of attribute annotation, because in
my opinion it can be a very powerful feature. I personally think about using
it for SOAP message serialization, or any kind of XML serialization; the
idea would be to annotate various object attributes to be marshaled as
either XML elements or XML attributes of the current node that reflects the
object.
Thank you,
--
Ali
Hi
List comprehensions (and generator expressions) come in two
'flavours' at the moment:
(1) [f(x) for x in L], which stands for map(f, L). Let's call this a
'map comprehension'
(2) [f(x) for x in L if p(x)], which stands for map(f, filter(p, L)).
Let's call this a 'map-filter comprehension'.
Now if one wants to write simply filter(p, L) as a list
comprehension, one has to write:
(3) [x for x in L if p(x)]. This could be called a 'filter
comprehension'.
The 'x for x in L' is not very nice IMHO, but it is often handy to
use such expressions over 'filter(...)', e.g. building the sublist of a
given list consisting of all the items of a given type could be
written as:
filter(lambda x: isinstance(x, FilteringType), heterogeneous_list)
or:
[x for x in heterogeneous_list if isinstance(x, FilteringType)]
I still prefer the list comprehension over the lambda/filter
combination, but neither feels very satisfying (to me :) (not that
one cannot use partial in the filter version)
Why not just drop the 'x for' at the start of a 'filter
comprehension' (or generator expression)? Thus (3) could be written
more simply as:
(3') [x in L if p(x)]
This is consistent with common mathematical notation:
* { f(x) | x \in L } means the set of all f(x) for x in L
* { f(x) | x \in L, p(x) } means the set of all f(x) for x in L
satisfying predicate p.
* { x \in L | p(x) } means the set of all x in L satisfying predicate p.
--
Arnaud
ABSTRACT:
I propose the following syntax:
rowexpr: convert_to_euros(salary) > 50000 and deptno == 10
Rowexpr returns a function object that can be called
with one argument, row, and the items "salary" and
"deptno" are used to derefence row.
RATIONALE:
A lot of time people write Python code that works with
other small interpreters, such as SQL engines, math
libraries, etc., that have an expression syntax of
their own. To use SQL as an example, I may want my
SQL engine to use this Python expression:
where convert_to_euros(salary) > 50000 and deptno =
10
To interact with SQL engines, I could imagine code
like the following:
sql(..., lambda row: convert_to_euros(row['salary'])
> 50000 and row['deptno'] == 10) # not concise
sql(..., sql_parse('convert_to_euros(salary) > 50000
and deptno = 10')) # requires a sophisticated library
sql(..., [and_op,
[ge_op(func_call_op(convert_to_euros, 'salary'), 50000),
eq_op(itemgetter_op('deptno'), 10)]]) # UGLY!!!
SOLUTION:
Let the Python interpreter build expression objects
for you. You write:
rowexpr: convert_to_euros(salary) > 50000 and deptno == 10
Python translates that into the same bytecode as:
lambda row: convert_to_euros(row['salary']) > 50000 and
row['deptno'] == 10
Benefits:
1) Allows smaller code.
2) Having Python translate the rowexpr creates an
opportunity for better error messages, catching more
things at startup time, etc.
3) In the particular case of SQL, it enables a development
methodology where you might start off writing code that processes
data in memory, but with mostly SQL-like syntax, and then switch over
to a real SQL database as scalability becomes a bigger concern.
COMMON OBJECTIONS:
1) More syntax, of course.
2) Folks used to SQL would still need to adjust to
differences between Python expressions (==) and SQL
(=).
3) In the case of SQL, people may already be happy
enough with current tools, and of course, interpreters
external to Python can cache expressions, etc.
OPEN QUESTIONS:
Would a rowexpr be limited to dictionaries, or could
it also try to access object attributes?
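One possible answer, sketched as a hypothetical helper (none of this
is part of the proposal itself): try mapping access first and fall
back to attribute access, so rowexpr-generated code could work with
both dictionaries and objects:

def row_get(row, name):
    try:
        return row[name]
    except (TypeError, KeyError):
        return getattr(row, name)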
This is in response to the various suggestions for embedding SQL
syntax in Python and various other suggestions made over the years.
Note that this suggestion is only a starting point for discussion, I
don't have any illusion that the idea as presented here would have
strong support. I'm fully aware that the actual proposal given below
looks "butt-ugly", but maybe someone can invent something better.
Background: Python is frequently used as a foundation for Domain
Specific Languages (DSLs). Examples include SQLObject, SCons, and many
others. These are all examples of a mini-language which is constructed
on top of the Python syntax, using operator overloading. The DSL is
often embedded within Python code (as in the case of SQLObject), and
can be invoked programmatically from regular (non-DSL) Python code.
However, there are some limitations on what can be expressed. For
example, in current interpreters, it is not possible to override the
'and' and 'or' operators. While there is currently a PEP that proposes
adding this capability, there is some disagreement as to how much this
would impact the efficiency of the non-override case. The reason is
that in the non-override case, the compiler is able to take advantage
of the "short-circuit" properties of these operators; if the operators
can be overridden (which is decided after compilation), then the
compiler would no longer be able to make these optimization
assumptions.
A similar limitation in DSLs is the treatment of unbound identifiers.
Identifiers which have already been bound to objects are not a
problem, since you can simply override the various operators of those
objects. However, for identifiers which have not yet been assigned to
a variable, a number of ugly hacks are required. Typically what is
done is to have a set of "special" objects which act as placeholders
for unbound variable names, for example _x, _y, _z, and so on.
Unfortunately, this requires the person writing the DSL expression to
pre-declare unbound variables, which is exactly the opposite of how
regular Python works. Another method that is sometimes used is to
declare unbound variables as attributes of some general "unbound
variable" object, so you can say something like Query.x, Query.y, and
so forth. But again, this is rather wordy and robs the DSL of its
ability to concisely express a particular semantic intent, which is
the whole point of DSLs in the first place.
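For illustration, the "unbound variable object" hack usually looks
something like this (all names here are hypothetical):

class _Var(object):
    def __init__(self, name):
        self.name = name
    def __gt__(self, other):
        # build a node of the DSL's expression tree
        return ('>', self.name, other)

class _Query(object):
    def __getattr__(self, name):
        return _Var(name)

Query = _Query()
expr = Query.age > 25   # yields the tree ('>', 'age', 25)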
Another limitation of Python's ability to represent DSLs is that you
cannot override the assignment operator.
My idea is to alleviate these problems by giving DSLs some help at the
parsing level. By informing the parser that a given expression is a
DSL, we can relax some of the normal assumptions of the Python syntax,
and instead require the DSL interpreter to handle them, which allows
the DSL to have greater flexibility.
Two things we want to avoid: First, we want to avoid "programmable
syntax", meaning that we cannot define DSL-specific semantics at parse
time. The reason for this is that DSLs are generally defined in their
own specific imported modules, and the Python parser has no knowledge
of imports, and therefore has no access to the DSL's syntax rules.
Secondly, we want to avoid conflicts between multiple DSLs. So, for
example, if I am using two different DSLs in the same module, we can't
have two different meanings for "and".
The solution to both of these concerns is to say that the parser's
support for DSLs is completely generic - in other words, we can tell
the parser that a given expression is a DSL, but we can tell it
nothing specific about that *particular* DSL. Instead, it is up to the
DSL interpreter - which operates at runtime, not parse time - to take
up where the parser left off.
This is similar to PJE's suggestion of a way to "quote" Python syntax,
returning an AST. In such a scheme, the Python parser doesn't
completely compile the quoted expression to Python byte code, but
instead produces an AST (or AST-like data structure which might not
exactly be an AST), which is then interpreted by the DSL.
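For comparison, the closest thing available without new syntax is to
hand the DSL a string and let it parse the AST itself; with the ast
module (added in Python 2.6) that looks like:

import ast

tree = ast.parse('name == "John" and age > 25', mode='eval')
# the DSL interpreter then walks tree.body, a BoolOp node
print(ast.dump(tree.body))

The ?? operator would produce the same kind of tree without the
detour through a string.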
So far this all seems reasonable enough to my ear :) Next is the part
where I go mad. (As my friend Sara said to me recently, "I'm not a mad
scientist - I'm a mad natural philosopher.")
For purposes of discussion I am going to introduce the token '??' (two
consecutive question marks) as the "DSL operator", in other words a
signal to the parser that a given expression is a DSL. I'm sure that
there could be a long and intense bikeshed discussion debating the
specific operator syntax, but for now I'll choose ?? since it's
unambiguously parsable.
The '??' operator can be used in one of three ways. First, you can
declare an entire expression as a DSL:
a = ??(c + d)
The parser will take all of the tokens inside the parenthesized group
and return them as an AST; it will not attempt to generate code to
evaluate any of the operators within the group.
There's an extra bit of syntactic sugar: If the DSL expression is the
only argument to a function, you can combine the function call and the
DSL quote:
func??(c+d)
Meaning "pass the AST for 'c + d' to the function 'func'". For most
DSLs, the 'func' will actually be the name of the DSL, so the typical
use pattern will be 'name_of_dsl??(expr)'.
The second use is to quote an identifier:
a = ??c + d
Now, you would think that this would quote the variable 'c' and
attempt to add it to 'd', but this is not the way it works. Instead,
the ?? operator invokes a kind of operator overloading that happens at
the parser level, meaning that any attempt to create an expression
that operates on a quoted variable also gets quoted. This only affects
binary and unary operators, not function calls, array references or
assignment. So in this case, the entire expression "??c + d" gets
quoted.
The third way a DSL operator can be used is to quote an operator:
a = c ??= d
Again, this allows us to generate the AST "c = d" and assign it to the
variable "a".
(I recognize that these last two syntactical variations would make the
interpretation of regular Python ASTs significantly more complex,
which is a serious drawback. Part of the motivation for both of these
variations is to deal with the 'and', 'or' and '=' problem, by
overriding the Python compiler's normal treatment of these operators.
However, it may be that the first form alone is sufficient, which
would be relatively easy to handle in the Python compiler, I think -
you would simply have the AST.c module look for a DSL operator and
not call the normal AST processing steps for arguments to that
operator.)
Some examples of DSLs using the operator:
An SQL query as a DSL:
query = SQL??(name == "John" and age > 25)
result_iter = query.execute()
Syntax for an algebraic solver:
@arity??(x + x)
def reduce(x):
    return 2 * x

@arity??(x + 0)
def reduce(x):
    return x

@arity??(x * 0)
def reduce(x):
    return 0

@arity??(x * 1)
def reduce(x):
    return x

@arity??(x * a + y * a)
def reduce(x):
    return (x+y)*a
# (and so on for all the other reduce() rules)
print(reduce??((a+b*3) + (a+b*3)))
>>> a*2+b*6
(In the above, we define a function 'reduce' that is similar to a
generic function, except that the dispatch logic is based on the form
of the AST being passed as an argument, using a one-way unification
algorithm to do argument pattern matching.)
I realize that there are some aspects of this proposal that may seem
silly, and that's true; the only reason I'm even suggesting this at
all is that much of what I'm writing here has already been suggested
or at least implied by previous discussions.
Flame on!
--
-- Talin
I know about cheeseshop and I know about setuptools. But how about
all this be standard tools on python 3.0, so we can have something
like CPAN but better. With setuptools being default on py3k you can
already easy_install everything, ok but then we can do even more.
When the user runs a Python script, if it requires a version of a lib
that he doesn't have, it could download it and install it in his home
folder (something along the lines of ~/pythonlibs). This would make it
much easier to distribute Python programs. I think this is somewhat
like what Java Web Start does. What do you guys think? Also, why is no
one talking about having a tool like setuptools in the stdlib?
--
Leonardo Santagada
"If it looks like a duck, and quacks like a duck, we have at least to
consider the possibility that we have a small aquatic bird of the
family anatidae on our hands." - Douglas Adams
Ok, looks like there's not much chance of agreeing on a syntax, so
here's a decorator that covers the two use cases I know of::
* matching the signature of dict() and dict.update()
* allowing arguments to have their names changed without worrying
about backwards compatibility (e.g. ``def func(sequence)`` changing to
``def func(iterable)``)
I've pasted the docstring and code below. The docstring should give
you a pretty good idea of the functionality. Let me know if you have
use cases this wouldn't cover.
"""
Functions declared as @positional_only prohibit any arguments from being
passed in as keyword arguments::
>>> @positional_only
... def foo(a, b=None, *args):
...     return a, b, args
...
>>> foo(1, 2, 3)
(1, 2, (3,))
>>> foo(a=1, b=2)
Traceback (most recent call last):
...
TypeError: foo() does not allow keyword arguments
Note that you can't use **kwargs arguments with a @positional_only
function::
>>> @positional_only
... def foo(a, b=None, **kwargs):
...     pass
...
Traceback (most recent call last):
...
TypeError: foo() may not have **kwargs
If you need the functionality of **kwargs, you can request it by
adding a final argument with the name _kwargs. The keyword arguments
will be passed in (as a dict) through this argument::
>>> @positional_only
... def foo(a, b=None, _kwargs=None):
...     return a, b, _kwargs
...
>>> foo(1)
(1, None, {})
>>> foo(1, 2)
(1, 2, {})
>>> foo(1, bar=42, baz='spam')
(1, None, {'baz': 'spam', 'bar': 42})
When your function is called with the wrong number of arguments, your
callers will only see the number of arguments they expected. That is,
argument counts will not include the _kwargs argument::
>>> foo()
Traceback (most recent call last):
...
TypeError: foo() takes at least 1 positional argument(s) (0 given)
>>> foo(1, 2, 3)
Traceback (most recent call last):
...
TypeError: foo() takes at most 2 positional argument(s) (3 given)
Note that unlike a normal **kwargs, using _kwargs in a
@positional_only function causes *all* keyword arguments to be
collected, even those with the same names as other function
parameters::
>>> foo(1, a=42, b='spam')
(1, None, {'a': 42, 'b': 'spam'})
>>> foo(1, _kwargs='this is just silly')
(1, None, {'_kwargs': 'this is just silly'})
This is of course because all function parameters are considered to
be positional only and therefore unnamed for the purposes of argument
parsing.
Note that you cannot use *args or **kwargs with the _kwargs argument::
>>> @positional_only
... def foo(a, b, _kwargs, *args):
...     pass
...
Traceback (most recent call last):
...
TypeError: foo() may not have *args
>>> @positional_only
... def foo(a, b, _kwargs, **kwargs):
...     pass
...
Traceback (most recent call last):
...
TypeError: foo() may not have **kwargs
"""
import doctest
import inspect
import functools
def positional_only(func):
    name = func.__name__
    # don't allow **kwargs
    arg_names, args_name, kwargs_name, _ = inspect.getargspec(func)
    if kwargs_name is not None:
        raise TypeError('%s() may not have **kwargs' % name)
    # if no keyword arguments were requested, create a wrapper that
    # only accepts positional arguments (as *args)
    if not arg_names or arg_names[-1] != '_kwargs':
        # wrapper that raises an error for **kwargs
        def positional_wrapper(*args, **kwargs):
            if kwargs:
                msg = '%s() does not allow keyword arguments'
                raise TypeError(msg % name)
            return func(*args)
    # if keyword arguments were requested (through a final _kwargs),
    # create a wrapper that collects **kwargs and passes them in as
    # the final positional argument
    else:
        # don't allow *args
        if args_name is not None:
            msg = '%s() may not have *args'
            raise TypeError(msg % func.__name__)
        # determine defaults and expected argument counts
        defaults = func.func_defaults
        defaults = defaults and defaults[:-1] or []
        n_expected = func.func_code.co_argcount - 1
        min_expected = n_expected - len(defaults)
        def positional_wrapper(*args, **kwargs):
            # raise a TypeError for wrong number of arguments here
            # because func() needs to take the extra 'kwargs'
            # argument but the caller shouldn't know about it
            arg_count_err = None
            if len(args) < min_expected:
                arg_count_err = 'at least %i' % min_expected
            if len(args) > n_expected:
                arg_count_err = 'at most %i' % n_expected
            if arg_count_err is not None:
                msg = '%s() takes %s positional argument(s) (%i given)'
                raise TypeError(msg % (name, arg_count_err, len(args)))
            # fill in defaults and add the final _kwargs argument,
            # then call the function with only positional arguments
            n_missing = n_expected - len(args)
            args += func.func_defaults[-n_missing - 1:-1]
            args += (kwargs,)
            return func(*args)
    # return the wrapped function
    functools.update_wrapper(positional_wrapper, func)
    return positional_wrapper

if __name__ == '__main__':
    doctest.testmod()
STeVe
--
I'm not *in*-sane. Indeed, I am so far *out* of sane that you appear a
tiny blip on the distant coast of sanity.
--- Bucky Katt, Get Fuzzy
The dictionary fromkeys method seems out of place as well as misnamed, IMHO.
In the current 2.5 branch it occurs only twice (excluding tests):
----
$ grep -r "fromkeys" Lib/*.py
Lib/httplib.py: header_names = dict.fromkeys([k.lower() for k in headers])
Lib/UserDict.py: def fromkeys(cls, iterable, value=None):
----
In httplib.py, it is used as a set to remove duplicates.
There are enough correct uses of it in the wild to keep the behavior, but
it can be done in a better way.
I feel it really should be called set_keys and implemented as a method that
operates on the current dictionary instead of being a constructor for a new
dictionary. That will allow you to add keys with a default value to an
already existing dictionary, or to create a new one with a dictionary
constructor.
dict().set_keys(s, v=None) # The current fromkeys behavior.
I think this reads better and can be used in a wider variety of situations.
It could be useful for setting an existing dictionary to a default state.
# reset status of items.
status.set_keys(status.keys(), v=0)
Or more likely, resetting a partial subset of the keys to some initial state.
The reason I started looking at this is I wanted to split a dictionary into
smaller dictionaries and my first thought was that fromkeys would do that.
But of course it doesn't.
What I wanted was to be able to specify the keys and get the values from
the existing dictionary into the new dictionary without using a for loop to
iterate over the keys.
d = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}
d_odds = d.from_keys([1, 3, 5]) # new dict of items 1, 3, 5
d_evens = d.from_keys([2, 4]) # new dict of items 2, 4
There currently isn't a way to split a dictionary without iterating its
contents, even if you know the keys you need beforehand.
A from_keys method would be the inverse complement of the update method.
A del_keys method could replace the clear method. del_keys would be more
useful as it could operate on a partial set of keys.
d.del_keys(d.keys()) # The current clear method behavior.
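A rough sketch of the intended semantics, written as plain functions
over today's dicts (the methods themselves don't exist yet):

def set_keys(d, keys, v=None):
    for k in keys:
        d[k] = v

def from_keys(d, keys):
    return dict((k, d[k]) for k in keys)

def del_keys(d, keys):
    for k in keys:
        del d[k]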
Some potentially *very common* uses:
# This first one works now, but I included it for completeness. ;-)
def mergedicts(d1, d2):
    """ Combine two dictionaries. """
    dd = dict(d1)
    dd.update(d2)
    return dd

def splitdict(d, keys):
    """ Split dictionary d using keys. """
    keys_rest = set(d.keys()) - set(keys)
    return d.from_keys(keys), d.from_keys(keys_rest)

def split_from_dict(d, keys):
    """ Removes and returns a subdict of d with keys. """
    dd = d.from_keys(keys)
    d.del_keys(keys)
    return dd

def copy_items(d1, d2, keys):
    """ Copy items from dictionary d1 to d2. """
    d2.update(d1.from_keys(keys)) # I really like this!

def move_items(d1, d2, keys):
    """ Move items from dictionary d1 to d2. """
    d2.update(d1.from_keys(keys))
    d1.del_keys(keys)
I think the set_keys, from_keys, and del_keys methods could add both
performance and clarity benefits to Python.
So to summarize...
1. Replace existing fromkeys method with a set_keys method.
2. Add a partial copy items from_keys method.
3. Replace the clear method with a del_keys method.
So this replaces two methods and adds one more. Overall I think the
usefulness of these would be very good.
I also think it will work very well with the Python 3000 keys method
returning an iterator. (And it would leave us with only one more method
than we currently have.)
Any thoughts?
Cheers,
Ron
--- Steve Howell <showell30(a)yahoo.com> wrote:
>
> --- Josiah Carlson <jcarlson(a)uci.edu> wrote:
> > It could, but I would happily bet you $100 that it
> > won't.
>
> Maybe not by 2010, but I'll make a gentlemen's bet
> that native SQL is in Python by 2005.
>
...when I come back in my time machine.
2015
I find myself often writing somewhat tedious code in
Python that is just slicing/aggregating fairly simple
data structures like a list of dictionaries.
Here is something that I'm trying to express, roughly:
charges =
    sum float(charge) from events
    on setup, install
    group by name
Here is the Python code that I wrote:
def groupDictBy(lst, keyField):
    dct = {}
    for item in lst:
        keyValue = item[keyField]
        if keyValue not in dct:
            dct[keyValue] = []
        dct[keyValue].append(item)
    return dct

dct = groupDictBy(events, 'name')
for name in dct:
    events = dct[name]
    charges = {}
    for bucket in ('setup', 'install'):
        charges[bucket] = sum(
            [float(event['charge'])
             for event in events
             if event['bucket'] == bucket])
Comments are welcome on improving the code itself, but
I wonder if Python 3k (or 4k?) couldn't have some kind
of native SQL-like ways of manipulating lists and
dictionaries. I don't have a proposal myself, just
wonder if others have felt this kind of pain, and
maybe it will spark a Pythonic solution.
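For what it's worth, the grouping above can be written a little more
tightly with collections.defaultdict (available since Python 2.5);
this is just a cleanup of the same code, reusing the events list from
above, not the SQL-like syntax I'm wishing for:

from collections import defaultdict

def group_dict_by(lst, key_field):
    dct = defaultdict(list)
    for item in lst:
        dct[item[key_field]].append(item)
    return dct

for name, grouped in group_dict_by(events, 'name').items():
    charges = dict(
        (bucket, sum(float(e['charge'])
                     for e in grouped
                     if e['bucket'] == bucket))
        for bucket in ('setup', 'install'))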