Mailman 3 January 2014 - Python-ideas

@classproperty, @abc.abstractclasspropery, etc.
by K. Richard Pixley 16 Dec '20

There's a whole matrix of these and I'm wondering why the matrix is currently sparse rather than implementing them all. Or rather, why we can't stack them as:

    class foo(object):
        @classmethod
        @property
        def bar(cls, ...):
            ...

Essentially the permutations are, I think: {'unadorned'|abc.abstract}{'normal'|static|class}{method|property|non-callable attribute}.

concreteness  implicit first arg  type                    name                                                 comments
{unadorned}   {unadorned}         method                  def foo():                                           exists now
{unadorned}   {unadorned}         property                @property                                            exists now
{unadorned}   {unadorned}         non-callable attribute  x = 2                                                exists now
{unadorned}   static              method                  @staticmethod                                        exists now
{unadorned}   static              property                @staticproperty                                      proposing
{unadorned}   static              non-callable attribute  {degenerate case - variables don't have arguments}   unnecessary
{unadorned}   class               method                  @classmethod                                         exists now
{unadorned}   class               property                @classproperty or @classmethod;@property             proposing
{unadorned}   class               non-callable attribute  {degenerate case - variables don't have arguments}   unnecessary
abc.abstract  {unadorned}         method                  @abc.abstractmethod                                  exists now
abc.abstract  {unadorned}         property                @abc.abstractproperty                                exists now
abc.abstract  {unadorned}         non-callable attribute  @abc.abstractattribute or @abc.abstract;@attribute   proposing
abc.abstract  static              method                  @abc.abstractstaticmethod                            exists now
abc.abstract  static              property                @abc.abstractstaticproperty                          proposing
abc.abstract  static              non-callable attribute  {degenerate case - variables don't have arguments}   unnecessary
abc.abstract  class               method                  @abc.abstractclassmethod                             exists now
abc.abstract  class               property                @abc.abstractclassproperty                           proposing
abc.abstract  class               non-callable attribute  {degenerate case - variables don't have arguments}   unnecessary

I think the meanings of the new ones are pretty straightforward, but in case they are not...

@staticproperty - like @property, only without an implicit first argument. Allows the property to be called directly from the class without requiring a throw-away instance.

@classproperty - like @property, only the implicit first argument to the method is the class. Allows the property to be called directly from the class without requiring a throw-away instance.

@abc.abstractattribute - a simple, non-callable variable that must be overridden in subclasses

@abc.abstractstaticproperty - like @abc.abstractproperty, only for @staticproperty

@abc.abstractclassproperty - like @abc.abstractproperty, only for @classproperty

--rich
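
For readers who want to experiment with the two proposed read-only forms, here is a minimal sketch built on the descriptor protocol. The names classproperty and staticproperty follow the proposal, but the implementations below are assumptions made for illustration; they are not part of the standard library, and they only cover reads (no setter or deleter). The Config class is likewise hypothetical.

```python
class classproperty:
    """Hypothetical read-only property whose getter receives the class."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        # Accessed via the class: instance is None, owner is the class.
        # Accessed via an instance: fall back to the instance's type.
        if owner is None:
            owner = type(instance)
        return self.fget(owner)


class staticproperty:
    """Hypothetical read-only property whose getter takes no implicit argument."""

    def __init__(self, fget):
        self.fget = fget

    def __get__(self, instance, owner=None):
        return self.fget()


class Config:
    _registry = {"debug": True}

    @classproperty
    def settings(cls):
        # Return a copy so callers cannot mutate the class-level dict.
        return dict(cls._registry)

    @staticproperty
    def version():
        return "1.0"


# No throw-away instance is needed for either lookup.
print(Config.settings)
print(Config.version)
```

Because neither descriptor defines __set__, assigning Config.settings = ... simply rebinds the class attribute; intercepting that would need support on the metaclass, which this sketch does not attempt.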

Specify number of items to allocate for array.array() constructor
by Sven Rahmann 22 Feb '20

At the moment, the array module of the standard library lets you create arrays of different numeric types and initialize them from an iterable (e.g., another array). What's missing is the possibility to specify the final size of the array (number of items), especially for large arrays.

I'm thinking of suffix arrays (a text indexing data structure) for large texts, e.g. the human genome and its reverse complement (about 6 billion characters from the alphabet ACGT). The suffix array is a long int array of the same size (8 bytes per number, so it occupies about 48 GB of memory).

At the moment I am extending an array in chunks of several million items at a time, which is slow and not elegant. The function below also initializes each item in the array to a given value (0 by default).

Is there a reason why the array.array constructor does not allow you to simply specify the number of items that should be allocated? (I do not really care about the contents.) Would this be a worthwhile addition to / modification of the array module?

My suggestion is to modify array generation in such a way that you could pass an iterator (as now) as the second argument, but if you pass a single integer value, it should be treated as the number of items to allocate.

Here is my current workaround (which is slow):

    import array

    def filled_array(typecode, n, value=0, bsize=(1<<22)):
        """returns a new array with given typecode
        (eg, "l" for long int, as in the array module)
        with n entries, initialized to the given value (default 0)
        """
        # one reusable block of bsize items
        a = array.array(typecode, [value]*bsize)
        x = array.array(typecode)
        r = n
        # append whole blocks while at least bsize items remain
        while r >= bsize:
            x.extend(a)
            r -= bsize
        # append the remaining r < bsize items
        x.extend([value]*r)
        return x
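
As a point of comparison, and not as an answer to the proposal itself, array objects already support sequence repetition, so a one-item array can be multiplied up to the target length without a chunked extend loop. The helper name preallocated_array is an assumption for this sketch, and whether it is fast enough at the 6-billion-item scale described above is untested here.

```python
import array


def preallocated_array(typecode, n, value=0):
    """Return an array of n items, each set to value (hypothetical helper)."""
    # Repeating a one-element array avoids building a large Python list
    # of boxed integers first.
    return array.array(typecode, [value]) * n


zeros = preallocated_array("l", 1000)
halves = preallocated_array("d", 3, 0.5)
print(len(zeros), zeros[0], zeros[-1])
print(halves.tolist())
```

The constructor signature stays untouched in this approach; the proposal above would instead overload the second argument so that a bare integer means "allocate this many items".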

Implicit string literal concatenation considered harmful?
by Guido van Rossum 14 Mar '18

I just spent a few minutes staring at a bug caused by a missing comma -- I got a mysterious argument count error because instead of foo('a', 'b') I had written foo('a' 'b'). This is a fairly common mistake, and IIRC at Google we even had a lint rule against this (there was also a Python dialect used for some specific purpose where this was explicitly forbidden). Now, with modern compiler technology, we can (and in fact do) evaluate compile-time string literal concatenation with the '+' operator, so there's really no reason to support 'a' 'b' any more. (The reason was always rather flimsy; I copied it from C but the reason why it's needed there doesn't really apply to Python, as it is mostly useful inside macros.) Would it be reasonable to start deprecating this and eventually remove it from the language? -- --Guido van Rossum (python.org/~guido)
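
A small sketch of the failure mode described above. The function foo and the list contents are made up for the illustration; the behaviour shown (adjacent literals merging into one string) is standard Python.

```python
def foo(first, second):
    return first + second


# Intended call: foo('a', 'b'). With the comma missing, the two literals
# are concatenated into the single argument 'ab'.
try:
    foo('a' 'b')
except TypeError as exc:
    error = str(exc)

# The same slip inside a list silently drops an element instead of failing.
names = ['alpha', 'beta' 'gamma']

# The explicit spelling that would remain if implicit concatenation went away.
explicit = 'a' + 'b'

print(error)
print(names)
```

The list case is arguably the nastier one, since nothing raises: the list is simply one item shorter than intended.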

Re: [Python-ideas] Update the required C compiler for Windows to a supported version.
by Steve Dower 02 Mar '14

Stephen J. Turnbull wrote:
> Vernon D. Cole writes:
>
>> I cannot compile a Python extension module with any Microsoft compiler
>> I can obtain.
>
> Your pain is understood, but it's not simple to address it.

FWIW, I'm working on making the compiler easily obtainable. The VS 2008 link that was posted is unofficial, and could theoretically disappear at any time (I'm not in control of that), but the Windows SDK for Windows 7 and .NET 3.5 SP1 (http://www.microsoft.com/en-us/download/details.aspx?id=3138) should be around for as long as Windows 7 is supported. The correct compiler (VC9) is included in this SDK, but unfortunately the SDK does not install the vcvarsall.bat file that distutils expects. (Though it's pretty simple to add one that will switch on %1 and call the correct vcvars(86|64|...).bat.)

The SDK needed for Python 3.3 and 3.4 (VC10) is even worse - there are many files missing. I'm hoping we'll be able to set up some sort of downloadable package/tool that will fix this.

While we'd obviously love to move CPython onto our latest compilers, it's simply not possible (for good reason). Python 3.4 is presumably locked to VC10, but hopefully 3.5 will be able to use whichever version is current when that decision is made.

> The basic problem is that the ABI changes. Therefore it's going to require
> a complete new set of *all* C extensions for Windows, and the duplication
> of download links for all those extensions from quite a few different vendors
> is likely to confuse a lot of users.

Specifically, the CRT changes. The CRT is an interesting mess of data structures that are exposed in header files, which means that while you can have multiple CRTs loaded, they cannot touch each other's data structures at all or things will go bad/crash, and there's no nice way to set it up to avoid this (my colleague who currently owns MSVCRT suggested a not-very-nice way to do it, but I don't think it's going to be reliable enough). Python's stable ABI helps, but does not solve this problem.

The file APIs are the worst culprits. The layout of FILE* objects can and does change between CRT versions, and file descriptors are simply indices into an array of these objects that is exposed through macros rather than function calls. As a result, you cannot mix either FILE pointers or file descriptors between CRTs.

The only safe option is to build with the matching CRT, and for MSVCRT, this means with the matching compiler. It's unfortunate, and the responsible teams are well aware of the limitation, but it's history at this point, so we have no choice but to work with it.

Cheers,
Steve

while conditional in list comprehension ??
by Wolfgang Maier 21 Feb '14

Dear all,

I guess this is so obvious that someone must have suggested it before: in list comprehensions you can currently exclude items based on the if conditional, e.g.:

    [n for n in range(1,1000) if n % 4 == 0]

Why not extend this filtering by allowing a while statement in addition to if, as in:

    [n for n in range(1,1000) while n < 400]

Trivial effect, I agree, in this example since you could achieve the same by using range(1,400), but I hope you get the point. This intuitively understandable extension would provide a big speed-up for sorted lists where processing all the input is unnecessary.

Consider this:

    some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"]  # a sorted list of names
    [n for n in some_names if n.startswith("A")]
    # certainly gives a list of all names starting with A, but ...
    [n for n in some_names while n.startswith("A")]
    # would have saved two comparisons

Best,
Wolfgang
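
For comparison, the early-exit behaviour asked for here can be approximated today with itertools.takewhile. The sketch below reuses the names list from the message; the counting predicate starts_with_a is an addition made only to show how many items each variant inspects.

```python
from itertools import takewhile

some_names = ["Adam", "Andrew", "Arthur", "Bob", "Caroline", "Lancelot"]

calls = []


def starts_with_a(name):
    # Record every name that gets tested so the two approaches can be compared.
    calls.append(name)
    return name.startswith("A")


# The existing 'if' filter tests every item in the list.
filtered = [n for n in some_names if starts_with_a(n)]
tested_by_if = len(calls)

calls.clear()

# takewhile stops at the first item that fails the predicate ("Bob").
prefix = list(takewhile(starts_with_a, some_names))
tested_by_takewhile = len(calls)

print(filtered, tested_by_if)
print(prefix, tested_by_takewhile)
```

Both variants yield the same three names on this sorted input; they differ only in how far they scan, which matches the "saved two comparisons" remark above.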

Create Python 2.8 as a transition step to Python 3.x
by Neil Schemenauer 21 Feb '14

The transition to Python 3 is happening but there is still a massive amount of code that needs to be ported. One of the most disruptive changes in Python 3 is the strict separation of bytes from unicode strings. Most of the other incompatible changes can be handled by 2to3.

Here is a far out idea to make the transition smoother. Release version 2.8 of Python with nearly all Python 3.x incompatible changes except for the bytes/unicode changes. This could include:

- print as function
- default string literal as unicode
- return view objects for dict.keys(), etc
- rename modules in standard library
- rename long to int
- rename .next() to __next__()
- accept only new 'raise' syntax
- remove backticks for repr
- rename unicode to str
- removal of 'apply', 'buffer', 'callable', 'execfile'
- exec as function
- rename os.getcwdu() to os.getcwd()
- remove dict.has_key
- move intern to sys.intern()
- rename xrange to range
- remove xreadlines

New features of Python 3.x could be backported if easy since they could be useful to entice developers to move from 2.7 to 2.8.

Problems with this idea:

- it would be a huge amount of work. There are thousands of commits to Python 3.x since it was branched. Most of them are not related to the above features but backporting them would still be a huge effort. I tried backporting 'print' as a function just to get an idea of the work.

- if people install this new version of Python as the default, old scripts and programs will break. I believe this breakage was the motivation for making Python 3 an all-at-once jump. I'm not sure how to handle this; maybe this version could be used only by developers during their Python 3 porting efforts. Alternatively, only install it as 'python2.8', never 'python' or 'python2'.

An alternative approach to producing Python 2.8 would be to start with the latest Python 3.x branch. Modify bytesobject and unicodeobject to have as close to Python 2 behavior as practical.

A-journey-of-a-thousand-miles-begins-ly y'rs

Neil

Iterative development
by anatoly techtonik 08 Feb '14

Yet another idea that some of you will find strange. It is a parallel Python development process. It doesn't affect or replace current practice, so nobody gets hurt. It is also about open process, where openness means transparency (eliminate hidden communication), inclusiveness (eliminate exclusive rights and privileges) and accessibility (eliminate awkward practices and poor user experience).

The idea is to split development of Python into two-week cycles. Each two-week cycle is an "iteration". An iteration consists of phases:

1. Planning (one or two days)
2. Execution
3. Testing
4. Demo
5. Retrospective

Some of you who are familiar with the concept of a "sprint" and know something about "agile" buzzwords will find this idea familiar. In fact, this is borrowed from some of the best practices of working with remote teams who use this methodology.

(Planning) During the first, planning phase, people who'd like to participate choose what should be implemented in this iteration. For that there should be a list of things to be done. This list is called the "backlog". People collaboratively estimate complexity and sort the things by priority.

(Execution) You take a thing from the backlog and mark that you're working on it, so that other people who are also interested can find you. If you need help, you split the thing into subtasks and make these tasks open for people to find and jump in.

(Testing) This is the phase in which the work done is compared with the actual description of the thing. Sometimes this leads to new insights, new ideas, new bugs and more work to be done in a subsequent iteration. Sometimes it turns out that during execution the thing completely diverged from what was originally planned.

(Demo) Demonstration of the things done. Record progress, give credit and mark things in the backlog as done. The demo is made for the broader community rather than just for the list of participants.

(Retrospective) This is an important phase that is dedicated to gathering and processing feedback to improve the iteration loop. Every person reports what he/she liked and disliked, and what the % of overall fun was. Then ideas are born from the feedback about what can be improved, be it tools, interaction with people or other things that get in the way.

--
anatoly t.

Re: [Python-ideas] str.rreplace
by Nick Coghlan 04 Feb '14

On 25 Jan 2014 04:29, "Andrew Barnert" <abarnert(a)yahoo.com> wrote:
>
> On Jan 24, 2014, at 10:20, Antoine Pitrou <solipsis(a)pitrou.net> wrote:
>
> > On Fri, 24 Jan 2014 20:13:26 +0200
> > Serhiy Storchaka <storchaka(a)gmail.com>
> > wrote:
> >> 24.01.14 19:36, Antoine Pitrou wrote:
> >>> On Fri, 24 Jan 2014 19:30:00 +0200
> >>> Serhiy Storchaka <storchaka(a)gmail.com>
> >>> wrote:
> >>>> 24.01.14 18:56, Antoine Pitrou wrote:
> >>>>> On Fri, 24 Jan 2014 08:47:14 -0800 (PST)
> >>>>> Ram Rachum <ram.rachum(a)gmail.com> wrote:
> >>>>>> I propose implementing str.rreplace. (It'll be to str.replace what
> >>>>>> str.rsplit is to str.split.)
> >>>>>
> >>>>> I suppose it only differs when the count parameter is supplied?
> >>>>>
> >>>>> I don't think it can hurt, except for the funny looks of its name.
> >>>>> In any case, if str.rreplace is added then so should bytes.rreplace and
> >>>>> bytearray.rreplace.
> >>>>
> >>>> bytearray.rremove, tuple.rindex, list.rindex, list.rremove.
> >>>
> >>> Not sure what those have to do with rreplace(). Overgeneralization
> >>> doesn't help.
> >>
> >> If open a door for rreplace, it would be not easy to close it for rindex
> >> and rremove.
> >
> > Perhaps you underestimate our collective door closing skills ;)
>
> While we're speculatively overgeneralizing, couldn't all of the
> index/find/remove/replace/etc. methods take a negative n to count from
> the end, making r variants unnecessary?

Strings already provide rfind and rindex (they're just not part of the general sequence API). Since strings are immutable, there's also no call for an "rremove".

rreplace (pronounced as "ar-replace", like "ar-split" et al) is more obvious than a negative count, and seems like an almost exact parallel to rsplit.

On the other hand, I don't recall ever lamenting its absence. Call me +0 on the idea.

Cheers,
Nick.
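
To make the rsplit parallel concrete, here is one way the proposed behaviour could be emulated with existing string methods. The free function rreplace and its count default are assumptions for this sketch; no such method exists on str.

```python
def rreplace(text, old, new, count=-1):
    """Replace up to count occurrences of old with new, scanning from the right.

    Hypothetical stand-in for the proposed str.rreplace; count=-1 means
    'replace all', mirroring str.replace. An empty old raises ValueError
    because str.rsplit rejects an empty separator.
    """
    return new.join(text.rsplit(old, count))


last_only = rreplace("a.b.c.d", ".", "-", 1)
everything = rreplace("a.b.c.d", ".", "-")

# With overlapping candidates the direction of the scan becomes visible.
from_left = "aaa".replace("aa", "X", 1)
from_right = rreplace("aaa", "aa", "X", 1)

print(last_only, everything)
print(from_left, from_right)
```

As noted earlier in the thread, the result differs from str.replace only when a count is supplied.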

statistics module in Python3.4
by Wolfgang 02 Feb '14

Dear all,

I am still testing the new statistics module and I found two cases where the behavior of the module seems suboptimal to me. My most important concern is the module's internal _sum function and its implications; the other one is about passing Counter objects to module functions.

As for the first subject: Specifically, I am not happy with the way the function handles different types. Currently _coerce_types gets called for every element in the function's input sequence and type conversion follows quite complicated rules, and - what is worse - makes the outcome of _sum() and thereby mean() dependent on the order of items in the input sequence, e.g.:

    >>> mean((1,Fraction(2,3),1.0,Decimal(2.3),2.0, Decimal(5)))
    1.9944444444444445
    >>> mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))
    Traceback (most recent call last):
      File "<pyshell#7>", line 1, in <module>
        mean((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))
      File "C:\Python33\statistics.py", line 369, in mean
        return _sum(data)/n
      File "C:\Python33\statistics.py", line 157, in _sum
        T = _coerce_types(T, type(x))
      File "C:\Python33\statistics.py", line 327, in _coerce_types
        raise TypeError('cannot coerce types %r and %r' % (T1, T2))
    TypeError: cannot coerce types <class 'fractions.Fraction'> and <class 'decimal.Decimal'>

(this is because when _sum iterates over the input, Fraction wins over int, then float wins over Fraction and over everything else that follows in the first example, but in the second case Fraction wins over int, and then Fraction vs Decimal is undefined and throws an error).

Confusing, isn't it? So here's the code of the _sum function:

    def _sum(data, start=0):
        """_sum(data [, start]) -> value

        Return a high-precision sum of the given numeric data. If optional
        argument ``start`` is given, it is added to the total. If ``data`` is
        empty, ``start`` (defaulting to 0) is returned.

        Examples
        --------

        >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75)
        11.0

        Some sources of round-off error will be avoided:

        >>> _sum([1e50, 1, -1e50] * 1000)  # Built-in sum returns zero.
        1000.0

        Fractions and Decimals are also supported:

        >>> from fractions import Fraction as F
        >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
        Fraction(63, 20)

        >>> from decimal import Decimal as D
        >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
        >>> _sum(data)
        Decimal('0.6963')

        """
        n, d = _exact_ratio(start)
        T = type(start)
        partials = {d: n}  # map {denominator: sum of numerators}
        # Micro-optimizations.
        coerce_types = _coerce_types
        exact_ratio = _exact_ratio
        partials_get = partials.get
        # Add numerators for each denominator, and track the "current" type.
        for x in data:
            T = _coerce_types(T, type(x))
            n, d = exact_ratio(x)
            partials[d] = partials_get(d, 0) + n
        if None in partials:
            assert issubclass(T, (float, Decimal))
            assert not math.isfinite(partials[None])
            return T(partials[None])
        total = Fraction()
        for d, n in sorted(partials.items()):
            total += Fraction(n, d)
        if issubclass(T, int):
            assert total.denominator == 1
            return T(total.numerator)
        if issubclass(T, Decimal):
            return T(total.numerator)/total.denominator
        return T(total)

Internally, the function uses exact ratios for its calculations (which I think is very nice) and only goes through all the pain of coercing types to return

    T(total.numerator)/total.denominator

where T is the final type resulting from the chain of conversions.

I think a much cleaner (and probably faster) implementation would be to gather first all the types in the input sequence, then decide what to return in an input order independent way. My tentative implementation:

    def _sum2(data, start=None):
        if start is not None:
            t = set((type(start),))
            n, d = _exact_ratio(start)
        else:
            t = set()
            n = 0
            d = 1
        partials = {d: n}  # map {denominator: sum of numerators}

        # Micro-optimizations.
        exact_ratio = _exact_ratio
        partials_get = partials.get

        # Add numerators for each denominator, and build up a set of all types.
        for x in data:
            t.add(type(x))
            n, d = exact_ratio(x)
            partials[d] = partials_get(d, 0) + n
        T = _coerce_types(t)  # decide which type to use based on set of all types
        if None in partials:
            assert issubclass(T, (float, Decimal))
            assert not math.isfinite(partials[None])
            return T(partials[None])
        total = Fraction()
        for d, n in sorted(partials.items()):
            total += Fraction(n, d)
        if issubclass(T, int):
            assert total.denominator == 1
            return T(total.numerator)
        if issubclass(T, Decimal):
            return T(total.numerator)/total.denominator
        return T(total)

this leaves the re-implementation of _coerce_types. Personally, I'd prefer something as simple as possible, maybe even:

    def _coerce_types (types):
        if len(types) == 1:
            return next(iter(types))
        return float

, but that's just a suggestion.

In this case then:

    >>> _sum2((1,Fraction(2,3),1.0,Decimal(2.3),2.0, Decimal(5)))/6
    1.9944444444444445
    >>> _sum2((1,Fraction(2,3),Decimal(2.3),1.0,2.0, Decimal(5)))/6
    1.9944444444444445

let's check the examples from the _sum docstring just to be sure:

    >>> _sum2([3, 2.25, 4.5, -0.5, 1.0], 0.75)
    11.0
    >>> _sum2([1e50, 1, -1e50] * 1000)  # Built-in sum returns zero.
    1000.0
    >>> from fractions import Fraction as F
    >>> _sum2([F(2, 3), F(7, 5), F(1, 4), F(5, 6)])
    Fraction(63, 20)
    >>> from decimal import Decimal as D
    >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")]
    >>> _sum2(data)
    Decimal('0.6963')

Now the second issue: It is maybe more a matter of taste and concerns the effects of passing a Counter() object to various functions in the module. I know this is undocumented and it's probably the user's fault if he tries that, but still:

    >>> from collections import Counter
    >>> c=Counter((1,1,1,1,2,2,2,2,2,3,3,3,3))
    >>> c
    Counter({1: 4, 2: 5, 3: 4})
    >>> mode(c)
    2

Cool, mode knows how to work with Counters (interpreting them as frequency tables)

    >>> median(c)
    2

Looks good

    >>> mean(c)
    2.0

Very well

But the truth is that only mode really works as you may think and we were just lucky with the other two:

    >>> c=Counter((1,1,2))
    >>> mean(c)
    1.5

oops

    >>> median(c)
    1.5

hmm

From a quick look at the code you can see that mode actually converts your input to a Counter behind the scenes anyway, so it has no problem. mean and median, on the other hand, are simply iterating over their input, so if that input happens to be a mapping, they'll use just the keys.

I think there are two simple ways to avoid this pitfall:

1) add an explicit warning to the docs explaining this behavior or
2) make mean and median do the same magic with Counters as mode does, i.e. make them check for Counter as the input type and deal with it as if it were a frequency table. I'd favor this behavior because it looks like little extra code, but may be very useful in many situations. I'm not quite sure whether maybe even all mappings should be treated that way?

Ok, that's it for now I guess. Opinions anyone?

Best,
Wolfgang
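
To illustrate option 2 without touching the module, a caller can expand a Counter into its individual observations before handing it over. The helper name expand_counts is an assumption for this sketch and is not part of the statistics module; it only shows what "treat the Counter as a frequency table" would mean for the Counter((1,1,2)) example above.

```python
from collections import Counter
from statistics import mean, median


def expand_counts(data):
    """Turn a Counter into the flat list of observations it summarises.

    Hypothetical helper: anything that is not a Counter is passed through
    unchanged, so ordinary iterables behave as before.
    """
    if isinstance(data, Counter):
        return list(data.elements())
    return data


c = Counter((1, 1, 2))

# Iterating the Counter directly only sees the keys 1 and 2.
keys_only_mean = mean(c)

# Expanding first weights each key by its count: observations 1, 1, 2.
weighted_mean = mean(expand_counts(c))
weighted_median = median(expand_counts(c))

print(keys_only_mean, weighted_mean, weighted_median)
```

Expanding materialises every observation, so for very large counts a weighted computation over c.items() would be the more economical route; that variant is not sketched here.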

Re: [Python-ideas] statistics module in Python3.4
by Steven D'Aprano 01 Feb '14

On Thu, Jan 30, 2014 at 11:03:38AM -0800, Larry Hastings wrote:
> On Mon, Jan 27, 2014 at 9:41 AM, Wolfgang
> <wolfgang.maier(a)biologie.uni-freiburg.de> wrote:
> >I think a much cleaner (and probably faster) implementation would be
> >to gather first all the types in the input sequence, then decide what
> >to return in an input order independent way.
>
> I'm willing to consider this a "bug fix". And since it's a new function
> in 3.4, we don't have an installed base. So I'm willing to consider
> fixing this for 3.4.

I'm hesitant to require two passes over the data in _sum. Some higher-order statistics like variance are currently implemented using two passes, but ultimately I'd like to support single-pass algorithms that can operate on large but finite iterators.

But I will consider it as an option.

I'm also hesitant to make the promise that _sum will be order-independent. Addition in Python isn't:

    py> class A(int):
    ...     def __add__(self, other):
    ...         return type(self)(super().__add__(other))
    ...     def __repr__(self):
    ...         return "%s(%d)" % (type(self).__name__, self)
    ...
    py> class B(A):
    ...     pass
    ...
    py> A(1) + B(1)
    A(2)
    py> B(1) + A(1)
    B(2)

[...]

> Yes, exactly. If the support for Counter is half-baked, let's prevent
> it from being used now.

I strongly disagree with this. Counters are currently treated the same as any other iterable, and built-in sum and math.fsum don't treat them specially:

    py> from collections import Counter
    py> c = Counter([1, 1, 1, 1, 1, 2])
    py> c
    Counter({1: 5, 2: 1})
    py> sum(c)
    3
    py> from math import fsum
    py> fsum(c)
    3.0

If you're worried about people coming to rely on this, and thus running into trouble in the future if Counters get treated specially for (say) weighted data, then I'd accept a warning in the docs, or even a runtime warning. But not an exception.

--
Steven
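
As an illustration of what a single-pass statistic over an iterator can look like, here is a sketch of Welford's running-variance update. This is a swapped-in textbook technique, not the implementation used by the statistics module, and the function name running_variance is an assumption. It works in floats, so it does not carry the exact-ratio guarantees of _sum discussed above.

```python
import statistics


def running_variance(iterable):
    """Sample variance in one pass over an iterator (Welford's update)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in iterable:
        count += 1
        delta = x - mean
        mean += delta / count
        # Uses the deviation from both the old and the updated mean.
        m2 += delta * (x - mean)
    if count < 2:
        raise ValueError("variance requires at least two data points")
    return m2 / (count - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]

# iter(data) can only be consumed once, which a two-pass approach cannot handle.
single_pass = running_variance(iter(data))
two_pass = statistics.variance(data)

print(single_pass, two_pass)
```

The trade-off is the one raised in the message: a single pass copes with one-shot iterators, while collecting type information or a mean up front requires either a second pass or buffering the data.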
