There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:
class foo(object):
    @classmethod
    @property
    def bar(cls, ...):
        ...
Essentially the permutations are, I think:
{'unadorned' | abc.abstract} x {'normal' | static | class} x
{method | property | non-callable attribute}.
concreteness  implicit first arg  type                    name                                                comments
------------  ------------------  ----------------------  --------------------------------------------------  -----------
{unadorned}   {unadorned}         method                  def foo():                                          exists now
{unadorned}   {unadorned}         property                @property                                           exists now
{unadorned}   {unadorned}         non-callable attribute  x = 2                                               exists now
{unadorned}   static              method                  @staticmethod                                       exists now
{unadorned}   static              property                @staticproperty                                     proposing
{unadorned}   static              non-callable attribute  {degenerate case *}                                 unnecessary
{unadorned}   class               method                  @classmethod                                        exists now
{unadorned}   class               property                @classproperty or @classmethod;@property            proposing
{unadorned}   class               non-callable attribute  {degenerate case *}                                 unnecessary
abc.abstract  {unadorned}         method                  @abc.abstractmethod                                 exists now
abc.abstract  {unadorned}         property                @abc.abstractproperty                               exists now
abc.abstract  {unadorned}         non-callable attribute  @abc.abstractattribute or @abc.abstract;@attribute  proposing
abc.abstract  static              method                  @abc.abstractstaticmethod                           exists now
abc.abstract  static              property                @abc.staticproperty                                 proposing
abc.abstract  static              non-callable attribute  {degenerate case *}                                 unnecessary
abc.abstract  class               method                  @abc.abstractclassmethod                            exists now
abc.abstract  class               property                @abc.abstractclassproperty                          proposing
abc.abstract  class               non-callable attribute  {degenerate case *}                                 unnecessary

* degenerate case - variables don't have arguments
I think the meanings of the new ones are pretty straightforward, but in
case they are not...
@staticproperty - like @property only without an implicit first
argument. Allows the property to be called directly from the class
without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to the
method is the class. Allows the property to be called directly from the
class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable variable that must be
overridden in subclasses
@abc.abstractstaticproperty - like @abc.abstractproperty only for
@staticproperty
@abc.abstractclassproperty - like @abc.abstractproperty only for
@classproperty
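To illustrate the intent, here is a rough sketch of what @classproperty
could look like as a plain (read-only) descriptor. This is just an
illustration under the proposed name, not a reference implementation:

class classproperty(object):
    """Read-only property whose getter receives the class, not an instance."""
    def __init__(self, fget):
        self.fget = fget
    def __get__(self, obj, objtype=None):
        # Invoked for both Foo.bar and Foo().bar; hand the class to the getter.
        if objtype is None:
            objtype = type(obj)
        return self.fget(objtype)

class Foo(object):
    @classproperty
    def bar(cls):
        return "I belong to %s" % cls.__name__

Foo.bar     # 'I belong to Foo' - no throw-away instance needed
Foo().bar   # same result

(Assigning to Foo.bar still just rebinds the class attribute, and the
abstract variants need more machinery, which is part of why the full matrix
is less trivial than it looks.)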
--rich
At the moment, the array module of the standard library allows one to
create arrays of different numeric types and to initialize them from
an iterable (eg, another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow one
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array generation in such a way that you
could pass an iterator (as now) as second argument, but if you pass a
single integer value, it should be treated as the number of items to
allocate.
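(Under this proposal, the workaround below would collapse to something like
a = array.array("l", 6 * 10**9) - to be clear, that is the proposed
behaviour, not something that works today.)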
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (eg, "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
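As an aside, a shorter workaround - which may or may not be faster for
multi-GB arrays, I have not benchmarked it - is to let sequence repetition
do the allocation in one step:

import array

def filled_array_by_repeat(typecode, n, value=0):
    # Repetition of a one-element array allocates the full result at once.
    return array.array(typecode, [value]) * n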
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
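For example, a missing comma inside a list of strings silently merges two
items instead of raising an error:

names = [
    'alice',
    'bob'      # missing comma: 'bob' and 'carol' are concatenated
    'carol',
]
# len(names) == 2 and names[1] == 'bobcarol'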
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--
--Guido van Rossum (python.org/~guido)
Stephen J. Turnbull wrote:
> Vernon D. Cole writes:
>
>> I cannot compile a Python extension module with any Microsoft compiler
>> I can obtain.
>
> Your pain is understood, but it's not simple to address it.
FWIW, I'm working on making the compiler easily obtainable. The VS 2008 link that was posted is unofficial, and could theoretically disappear at any time (I'm not in control of that), but the Windows SDK for Windows 7 and .NET 3.5 SP1 (http://www.microsoft.com/en-us/download/details.aspx?id=3138) should be around for as long as Windows 7 is supported. The correct compiler (VC9) is included in this SDK, but unfortunately does not install the vcvarsall.bat file that distutils expects. (Though it's pretty simple to add one that will switch on %1 and call the correct vcvars(86|64|...).bat.)
The SDK needed for Python 3.3 and 3.4 (VC10) is even worse - there are many files missing. I'm hoping we'll be able to set up some sort of downloadable package/tool that will fix this. While we'd obviously love to move CPython onto our latest compilers, it's simply not possible (for good reason). Python 3.4 is presumably locked to VC10, but hopefully 3.5 will be able to use whichever version is current when that decision is made.
> The basic problem is that the ABI changes. Therefore it's going to require
> a complete new set of *all* C extensions for Windows, and the duplication
> of download links for all those extensions from quite a few different vendors
> is likely to confuse a lot of users.
Specifically, the CRT changes. The CRT is an interesting mess of data structures that are exposed in header files, which means while you can have multiple CRTs loaded, they cannot touch each other's data structures at all or things will go bad/crash, and there's no nice way to set it up to avoid this (my colleague who currently owns MSVCRT suggested a not-very-nice way to do it, but I don't think it's going to be reliable enough). Python's stable ABI helps, but does not solve this problem.
The file APIs are the worst culprits. The layout of FILE* objects can and does change between CRT versions, and file descriptors are simply indices into an array of these objects that is exposed through macros rather than function calls. As a result, you cannot mix either FILE pointers or file descriptors between CRTs. The only safe option is to build with the matching CRT, and for MSVCRT, this means with the matching compiler. It's unfortunate, and the responsible teams are well aware of the limitation, but it's history at this point, so we have no choice but to work with it.
Cheers,
Steve
Dear all,
I guess this is so obvious that someone must have suggested it before:
in list comprehensions you can currently exclude items based on the if
conditional, e.g.:
[n for n in range(1,1000) if n % 4 == 0]
Why not extend this filtering by allowing a while statement in addition to
if, as in:
[n for n in range(1,1000) while n < 400]
A trivial effect in this example, I agree, since you could achieve the same
by using range(1,400), but I hope you get the point.
This intuitively understandable extension would provide a big speed-up for
sorted lists where processing all the input is unnecessary.
Consider this:
some_names = ["Adam", "Andrew", "Arthur", "Bob", "Caroline", "Lancelot"]
# a sorted list of names

[n for n in some_names if n.startswith("A")]
# certainly gives a list of all names starting with A, but ...

[n for n in some_names while n.startswith("A")]
# would have saved two comparisons
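For comparison, the closest existing spelling I know of uses
itertools.takewhile, which stops at the first failing item but is arguably
less readable than the proposed syntax:

from itertools import takewhile

some_names = ["Adam", "Andrew", "Arthur", "Bob", "Caroline", "Lancelot"]
list(takewhile(lambda n: n.startswith("A"), some_names))
# ['Adam', 'Andrew', 'Arthur'] - stops after the first non-A name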
Best,
Wolfgang
The transition to Python 3 is happening but there is still a massive
amount of code that needs to be ported. One of the most disruptive
changes in Python 3 is the strict separation of bytes from unicode
strings. Most of the other incompatible changes can be handled by
2to3.
Here is a far out idea to make transition smoother. Release version
2.8 of Python with nearly all Python 3.x incompatible changes except
for the bytes/unicode changes. This could include:
- print as function
- default string literal as unicode
- return view objects for dict.keys(), etc
- rename modules in standard library
- rename long to int
- rename .next() to __next__()
- accept only new 'raise' syntax
- remove backticks for repr
- rename unicode to str
- removal of 'apply', 'buffer', 'callable', 'execfile'
- exec as function
- rename os.getcwdu() to os.getcwd()
- remove dict.has_key
- move intern to sys.intern()
- rename xrange to range
- remove xreadlines
New features of Python 3.x could be backported if easy since they
could be useful to entice developers to move from 2.7 to 2.8.
Problems with this idea:
- it would be a huge amount of work. There are thousands of
commits to Python 3.x since it was branched. Most of them are not
related to the above features but backporting them would still be
a huge effort. I tried backporting 'print' as a function just to get
an idea of the work.
- if people install this new version of Python as the default, old
scripts and programs will break. I believe this breakage was the
motivation for making Python 3 an all-at-once jump. I'm not sure
how to handle this, maybe this version could be used only by
developers during their Python 3 porting efforts. Alternatively,
only install it as 'python2.8', never 'python' or 'python2'.
An alternative approach to producing Python 2.8 would be to start
with the Python 3.x latest branch. Modify bytesobject and
unicodeobject to have as close to Python 2 behavior as practical.
A-journey-of-a-thousand-miles-begins-ly y'rs
Neil
Yet another idea that some of you will find strange.
It is a parallel Python development process. It doesn't affect or
replace current practice, so nobody gets hurt. It is also about open
process, where openness means transparency (eliminate hidden
communication), inclusiveness (eliminate exclusive rights and
privileges) and accessibility (eliminate awkward practices and poor
user experience).
The idea is to split development of Python into two-week cycles. Every
two weeks is an "iteration". An iteration consists of these phases:
1. Planning (one, two days)
2. Execution
3. Testing
4. Demo
5. Retrospective
Those of you who are familiar with the concept of a "sprint" and know
something about "agile" buzzwords will find this idea familiar. In fact, it
is borrowed from some of the best practices of remote teams who use this
methodology.
(Planning) During the first, planning phase, the people who'd like to
participate choose what should be implemented in this iteration. For that
there should be a list of things to be done; this list is called the
"backlog". People collaboratively estimate complexity and sort the things
by priority.
(Execution) You take a thing from the backlog and mark that you're working
on it, so that other people who are also interested can find you. If you
need help, you split the thing into subtasks and open these tasks up for
people to find and jump in on.
(Testing) This is the phase in which the work done is compared with the
thing's actual description. Sometimes this leads to new insights, new
ideas, new bugs and more work to be done in a subsequent iteration.
Sometimes it turns out that during execution the thing diverged completely
from what was originally planned.
(Demo) Demonstration of the things done. Record progress, give credit and
mark things in the backlog as done. The demo is made for the broader
community, not just for the list of participants.
(Retrospective) This is an important phase dedicated to gathering and
processing feedback to improve the iteration loop. Every person reports
what he/she liked and disliked, and what the overall fun level was. Ideas
for improvement are then born from that feedback - be it tools, interaction
with people or other things that get in the way.
--
anatoly t.
On 25 Jan 2014 04:29, "Andrew Barnert" <abarnert(a)yahoo.com> wrote:
>
> On Jan 24, 2014, at 10:20, Antoine Pitrou <solipsis(a)pitrou.net> wrote:
>
> > On Fri, 24 Jan 2014 20:13:26 +0200
> > Serhiy Storchaka <storchaka(a)gmail.com>
> > wrote:
> >> On 24.01.14 19:36, Antoine Pitrou wrote:
> >>> On Fri, 24 Jan 2014 19:30:00 +0200
> >>> Serhiy Storchaka <storchaka(a)gmail.com>
> >>> wrote:
> >>>> On 24.01.14 18:56, Antoine Pitrou wrote:
> >>>>> On Fri, 24 Jan 2014 08:47:14 -0800 (PST)
> >>>>> Ram Rachum <ram.rachum(a)gmail.com> wrote:
> >>>>>> I propose implementing str.rreplace. (It'll be to str.replace what
> >>>>>> str.rsplit is to str.split.)
> >>>>>
> >>>>> I suppose it only differs when the count parameter is supplied?
> >>>>>
> >>>>> I don't think it can hurt, except for the funny looks of its name.
> >>>>> In any case, if str.rreplace is added then so should bytes.rreplace
> >>>>> and bytearray.rreplace.
> >>>>
> >>>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove.
> >>>
> >>> Not sure what those have to do with rreplace(). Overgeneralization
> >>> doesn't help.
> >>
> >> If we open a door for rreplace, it would not be easy to close it for
> >> rindex and rremove.
> >
> > Perhaps you underestimate our collective door closing skills ;)
>
> While we're speculatively overgeneralizing, couldn't all of the
> index/find/remove/replace/etc. methods take a negative n to count from the
> end, making r variants unnecessary?
Strings already provide rfind and rindex (they're just not part of the
general sequence API).
Since strings are immutable, there's also no call for an "rremove".
rreplace (pronounced "ar-replace", like "ar-split" et al) is more
obvious than a negative count, and seems like an almost exact parallel to
rsplit.
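For what it's worth, the proposed semantics can be sketched in pure Python
on top of rsplit (treating rreplace as the hypothetical new name):

def rreplace(s, old, new, count=-1):
    # Split at most `count` times from the right, then rejoin with the
    # replacement; a negative count degenerates to ordinary str.replace.
    return new.join(s.rsplit(old, count))

rreplace("a-b-c", "-", "+", 1)   # 'a-b+c'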
On the other hand, I don't recall ever lamenting its absence. Call me +0 on
the idea.
Cheers,
Nick.
Dear all,
I am still testing the new statistics module and I found two cases where the
behavior of the module seems suboptimal to me.
My most important concern is the module's internal _sum function and its
implications; the other is about passing Counter objects to module
functions.
As for the first subject:
Specifically, I am not happy with the way the function handles different
types. Currently _coerce_types gets called for every element in the
function's input sequence and type conversion follows quite complicated
rules, and - what is worse - makes the outcome of _sum() and thereby mean()
dependent on the order of items in the input sequence, e.g.:
>>> mean((1,Fraction(2,3),1.0,Decimal(2.3),2.0, Decimal(5)))
1.9944444444444445
>>> mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))
File "C:\Python33\statistics.py", line 369, in mean
return _sum(data)/n
File "C:\Python33\statistics.py", line 157, in _sum
T = _coerce_types(T, type(x))
File "C:\Python33\statistics.py", line 327, in _coerce_types
raise TypeError('cannot coerce types %r and %r' % (T1, T2))
TypeError: cannot coerce types <class 'fractions.Fraction'> and <class
'decimal.Decimal'>
(This is because, as _sum iterates over the input, Fraction wins over int,
then float wins over Fraction and over everything else that follows in the
first example; in the second case Fraction wins over int, but then Fraction
vs Decimal is undefined and raises an error.)
Confusing, isn't it? So here's the code of the _sum function:
def _sum(data, start=0):
    """_sum(data [, start]) -> value

    Return a high-precision sum of the given numeric data. If optional
    argument ``start`` is given, it is added to the total. If ``data`` is
    empty, ``start`` (defaulting to 0) is returned.

    Examples
    --------
    >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75)
    11.0

    Some sources of round-off error will be avoided:

    >>> _sum([1e50, 1, -1e50] * 1000)  # Built-in sum returns zero.
    1000.0

    Fractions and Decimals are also supported:

    >>> from fractions import Fraction as F
    >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
    Fraction(63, 20)

    >>> from decimal import Decimal as D
    >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
    >>> _sum(data)
    Decimal('0.6963')
    """
    n, d = _exact_ratio(start)
    T = type(start)
    partials = {d: n}  # map {denominator: sum of numerators}
    # Micro-optimizations.
    coerce_types = _coerce_types
    exact_ratio = _exact_ratio
    partials_get = partials.get
    # Add numerators for each denominator, and track the "current" type.
    for x in data:
        T = _coerce_types(T, type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n
    if None in partials:
        assert issubclass(T, (float, Decimal))
        assert not math.isfinite(partials[None])
        return T(partials[None])
    total = Fraction()
    for d, n in sorted(partials.items()):
        total += Fraction(n, d)
    if issubclass(T, int):
        assert total.denominator == 1
        return T(total.numerator)
    if issubclass(T, Decimal):
        return T(total.numerator)/total.denominator
    return T(total)
Internally, the function uses exact ratios for its calculations (which I
think is very nice) and only goes through all the pain of coercing types to
return
T(total.numerator)/total.denominator
where T is the final type resulting from the chain of conversions.
I think a much cleaner (and probably faster) implementation would be to
first gather all the types in the input sequence, then decide what to
return in an input-order-independent way. My tentative implementation:
def _sum2(data, start=None):
    if start is not None:
        t = set((type(start),))
        n, d = _exact_ratio(start)
    else:
        t = set()
        n = 0
        d = 1
    partials = {d: n}  # map {denominator: sum of numerators}
    # Micro-optimizations.
    exact_ratio = _exact_ratio
    partials_get = partials.get
    # Add numerators for each denominator, and build up a set of all types.
    for x in data:
        t.add(type(x))
        n, d = exact_ratio(x)
        partials[d] = partials_get(d, 0) + n
    T = _coerce_types(t)  # decide which type to use based on set of all types
    if None in partials:
        assert issubclass(T, (float, Decimal))
        assert not math.isfinite(partials[None])
        return T(partials[None])
    total = Fraction()
    for d, n in sorted(partials.items()):
        total += Fraction(n, d)
    if issubclass(T, int):
        assert total.denominator == 1
        return T(total.numerator)
    if issubclass(T, Decimal):
        return T(total.numerator)/total.denominator
    return T(total)
This leaves the re-implementation of _coerce_types. Personally, I'd prefer
something as simple as possible, maybe even:
def _coerce_types(types):
    if len(types) == 1:
        return next(iter(types))
    return float
but that's just a suggestion.
In this case then:
>>> _sum2((1,Fraction(2,3),1.0,Decimal(2.3),2.0, Decimal(5)))/6
1.9944444444444445
>>> _sum2((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))/6
1.9944444444444445
Let's check the examples from the _sum docstring just to be sure:
>>> _sum2([3, 2.25, 4.5, -0.5, 1.0], 0.75)
11.0
>>> _sum2([1e50, 1, -1e50] * 1000) # Built-in sum returns zero.
1000.0
>>> from fractions import Fraction as F
>>> _sum2([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
Fraction(63, 20)
>>> from decimal import Decimal as D
>>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
>>> _sum2(data)
Decimal('0.6963')
Now the second issue:
It is maybe more a matter of taste and concerns the effects of passing a
Counter() object to various functions in the module.
I know this is undocumented and it's probably the user's fault if he tries
that, but still:
>>> from collections import Counter
>>> c=Counter((1,1,1,1,2,2,2,2,2,3,3,3,3))
>>> c
Counter({1: 4, 2: 5, 3: 4})
>>> mode(c)
2
Cool, mode knows how to work with Counters (interpreting them as frequency
tables)
>>> median(c)
2
Looks good
>>> mean(c)
2.0
Very well
But the truth is that only mode really works as you may think and we were
just lucky with the other two:
>>> c=Counter((1,1,2))
>>> mean(c)
1.5
oops
>>> median(c)
1.5
hmm
From a quick look at the code you can see that mode actually converts your
input to a Counter behind the scenes anyway, so it has no problem.
mean and median, on the other hand, are simply iterating over their input,
so if that input happens to be a mapping, they'll use just the keys.
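For completeness, the frequency-table interpretation can already be
requested explicitly today by expanding the Counter with its elements()
method before passing it in:

from collections import Counter
from statistics import mean, median

c = Counter((1, 1, 2))
mean(list(c.elements()))     # 1.333..., c treated as a frequency table
median(list(c.elements()))   # 1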
I think there are two simple ways to avoid this pitfall:
1) add an explicit warning to the docs explaining this behavior or
2) make mean and median do the same magic with Counters as mode does, i.e.
make them check for Counter as the input type and deal with it as if it
were a frequency table. I'd favor this behavior because it looks like
little extra code, but may be very useful in many situations. I'm not quite
sure whether maybe even all mappings should be treated that way?
Ok, that's it for now I guess. Opinions anyone?
Best,
Wolfgang
On Thu, Jan 30, 2014 at 11:03:38AM -0800, Larry Hastings wrote:
> On Mon, Jan 27, 2014 at 9:41 AM, Wolfgang
> <wolfgang.maier(a)biologie.uni-freiburg.de
> <mailto:wolfgang.maier@biologie.uni-freiburg.de>> wrote:
> >I think a much cleaner (and probably faster) implementation would be
> >to gather first all the types in the input sequence, then decide what
> >to return in an input order independent way.
>
> I'm willing to consider this a "bug fix". And since it's a new function
> in 3.4, we don't have an installed base. So I'm willing to consider
> fixing this for 3.4.
I'm hesitant to require two passes over the data in _sum. Some
higher-order statistics like variance are currently implemented using
two passes, but ultimately I'd like to support single-pass algorithms
that can operate on large but finite iterators.
But I will consider it as an option.
I'm also hesitant to make the promise that _sum will be
order-independent. Addition in Python isn't:
py> class A(int):
...     def __add__(self, other):
...         return type(self)(super().__add__(other))
...     def __repr__(self):
...         return "%s(%d)" % (type(self).__name__, self)
...
py> class B(A):
...     pass
...
py> A(1) + B(1)
A(2)
py> B(1) + A(1)
B(2)
[...]
> Yes, exactly. If the support for Counter is half-baked, let's prevent
> it from being used now.
I strongly disagree with this. Counters are currently treated the same
as any other iterable, and built-in sum and math.fsum don't treat them
specially:
py> from collections import Counter
py> c = Counter([1, 1, 1, 1, 1, 2])
py> c
Counter({1: 5, 2: 1})
py> sum(c)
3
py> from math import fsum
py> fsum(c)
3.0
If you're worried about people coming to rely on this, and thus running
into trouble in the future if Counters get treated specially for (say)
weighted data, then I'd accept a warning in the docs, or even a runtime
warning. But not an exception.
--
Steven