At the moment, the array module of the standard library allows you to
create arrays of different numeric types and to initialize them from
an iterable (e.g., another array).
What's missing is the possibility to specify the final size of the
array (number of items) up front, especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow you
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction in such a way that you
could pass an iterable (as now) as the second argument, but if you
pass a single integer value, it would be treated as the number of
items to allocate.
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (e.g., "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)  # one pre-filled chunk
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)        # append a full chunk
        r -= bsize
    x.extend(a[:r])        # append the remaining r items
    return x
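For what it's worth, here is a sketch of a faster workaround available today (the function names are mine, not a proposed API): bytes(k) allocates k zero bytes in one step, and a one-item array can be repeated, so neither approach needs a chunked extend loop.

```python
import array

def zero_filled_array(typecode, n):
    # bytes(n * itemsize) allocates zeroed storage in one step;
    # reinterpreting it as the target typecode gives n zero items.
    itemsize = array.array(typecode).itemsize
    return array.array(typecode, bytes(n * itemsize))

def filled(typecode, n, value):
    # Array repetition is implemented in C and avoids building
    # a large intermediate Python list.
    return array.array(typecode, [value]) * n
```

This still pays for a copy that a size-aware constructor could avoid, but it is much faster than extending in chunks.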
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C, but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
in preprocessor macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
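To make the pitfall concrete (and to show that, in CPython at least, the compiler folds constant '+' concatenation, so nothing is lost by requiring the explicit spelling):

```python
def foo(a, b):
    return (a, b)

# Implicit concatenation silently turns two arguments into one:
try:
    foo('a' 'b')          # meant foo('a', 'b') -- comma missing
except TypeError as e:
    print(e)              # mysterious argument count error

# The explicit '+' form means the same thing and is folded to a
# single constant at compile time:
assert 'a' + 'b' == 'a' 'b' == 'ab'
```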
--Guido van Rossum (python.org/~guido)
*PEP 397 says that any Python script is able to choose the language
version which will run it, among all the versions installed on the
computer, using (on Windows) a launcher in the "C:\windows" folder.*
Can the IDLE version be chosen like this too, or can the IDLE "run"
command be told which version to use?
(Unless it is already like this and I just have a problem.)
Thank you, and have a nice day/evening!
(And sorry if this sounds irritating.)
Hi all, maybe the best list would be python-dev, but I don't dare
wake the sleeping lions there :)
So my question is this: is there a coherent mobile strategy among the
core dev people? I mean, you release Python for Linux/macOS/Windows,
and the question is whether there are any plans to do the same for a
mobile platform. It doesn't have to be Android or iOS, just anything
that the core dev team chooses and sticks with.
I've been developing Python apps for smartphones (mostly hobby
projects though) using SL4A, but that seems to be dead. Now people
suggest using Kivy, which seems to be alive, but who knows for how
long. There are some other projects (QPython, etc.) which are small
and equally unreliable, at least looking from the outside. Is Kivy
now an officially blessed distribution? Since Google has been so
integral to both Python (through employing Guido) and Android, I'd
think it would make sense for Google to have an official Python
environment for Android, in cooperation with the Python dev team.
Does the PSF have an opinion on this? It would be great if there were
something for mobile phones that we could rely on not going away,
just as with Linux/macOS/Windows.
Or are there issues which preclude this from the start?
Psss, psss, put it down! - http://www.cafepress.com/putitdown
For context please see http://bugs.python.org/issue22937.
I have two questions I'm hoping to get answered through this thread:
- does the change in question need a PEP? Antoine seemed to think it
didn't, as long as it was opt-in (via the unittest CLI etc)
- Is my implementation approach sound (for traceback, unittest I
think I have covered :))?
Implementation-wise, I think it's useful to work within the current
traceback module layout - that is, to alter extract_stack to
(optionally) include rendered data about locals, and then look for
that and format it in format_list.
I'm sure there is code out there that depends on the quadruple nature
of extract_stack though, so I think we need to preserve that. Three
strategies occurred to me. One is to have parallel functions, one
quadruple, one quintuple. A second is to have the return value of
extract_stack be a quintuple when a new keyword parameter
include_locals is passed. Lastly, and this is my preferred one, we
return a tuple subclass with an attribute containing a dict with
the rendered data on the locals; this can be present but None, or even
just absent when extract_stack was not asked to include locals.
The last option is my preferred one because the other two both imply
a data structure which is likely to break existing code - and
while you'd have to opt into having them, it seems likely to require a
systematic set of upgrades, vs. having an optional attribute that can
be ignored.
So - thoughts?
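The third option could be sketched roughly like this (the class name and the .locals attribute are hypothetical illustrations, not the actual patch):

```python
class FrameSummaryTuple(tuple):
    """A (filename, lineno, name, line) quadruple that unpacks exactly
    like the tuples extract_stack() returns today, plus a .locals
    attribute holding rendered locals, or None when not captured."""

    def __new__(cls, filename, lineno, name, line, locals=None):
        self = super().__new__(cls, (filename, lineno, name, line))
        self.locals = locals
        return self

# Existing callers keep working unchanged:
entry = FrameSummaryTuple('x.py', 3, 'f', 'raise ValueError', {'a': '1'})
filename, lineno, name, line = entry   # quadruple unpacking intact
assert entry.locals == {'a': '1'}
```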
Robert Collins <rbtcollins(a)hp.com>
HP Converged Cloud
I want to have a method of `pathlib.Path` that would send a file to the
recycle bin. (i.e. soft delete.)
What do you think about adding this?
I see there's a PyPI package `Send2Trash` that does this, but it would be
nice if this was in the standard library.
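A rough sketch of the shape such an API could take. The method name .trash() and the destination directory are made up here; a real implementation would call the platform's recycle-bin API (as Send2Trash does) rather than move files into a plain directory.

```python
import os
import pathlib
import shutil
import tempfile

# Hypothetical stand-in for the OS recycle bin, for illustration only.
TRASH_DIR = pathlib.Path(tempfile.mkdtemp(prefix='fake-trash-'))

# Subclass the concrete platform class (PosixPath/WindowsPath),
# the usual trick for extending pathlib.Path portably.
class TrashablePath(type(pathlib.Path())):
    def trash(self):
        """Soft-delete: move this file to the (fake) trash folder."""
        shutil.move(str(self), str(TRASH_DIR / self.name))

fd, name = tempfile.mkstemp()
os.close(fd)
p = TrashablePath(name)
p.trash()
assert not p.exists()
```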
On Wed, Dec 31, 2014 at 6:00 AM, Antoine Pitrou <solipsis(a)pitrou.net> wrote:
>> Why would you use it? The official documentation mentions
>> implementing a [descriptor]. See [this paste] for a simple
>> example of this sort of thing.
> How about something like this (untested):
A cursory glance suggests that might work for my use-case, but I would
want to examine it more thoroughly for race conditions and other
hazards. In particular, I'd be concerned about the potential
performance cost of removing things immediately on finalization (the
stdlib implementation uses a semi-lazy approach here, which I still
don't entirely understand). I'm also a little nervous about doing
anything remotely interesting from a finalizer.
With a little fine-tuning, however, this could be pulled into a
separate WeakKeyIDDictionary implementation, which would be more
modular and easier to work with.
I mainly did up this patch to see how hard it would be, and now that
it's turned out to be fairly simple, I'm curious as to whether it
would actually be useful to people.
At the point where a global/builtin name lookup is about to raise
NameError, first try calling a Python function, along the same lines
as __getattr__. If that function hasn't been defined, raise NameError
as normal; but if it has, let it either raise NameError or return some
object, which is then used as if the name had been bound to it.
Patch is here:
The idea is to allow convenient interactive use; auto-importing
modules is easy, and importing names from modules ("exp" --> math.exp)
can be done easily enough too, given a list of modules to try.
It's probably not a good idea to use this in application code, and I
definitely wouldn't encourage it in library code, but it has its uses
interactively.
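Some of this convenience can be approximated today without patching the interpreter, via a CPython-specific trick: when the namespace passed to exec() is a dict subclass, name lookups fall back to __getitem__, so a __missing__ hook can auto-import (this is a sketch of the effect, not the patch itself):

```python
import importlib

class AutoImportNamespace(dict):
    """Namespace that imports a module on first use of its name."""
    def __missing__(self, key):
        try:
            module = importlib.import_module(key)
        except ImportError:
            raise KeyError(key)
        self[key] = module  # cache so the hook fires only once
        return module

ns = AutoImportNamespace()
exec("hypot = math.hypot(3, 4)", ns)  # 'math' was never imported here
print(ns['hypot'])  # 5.0
```

Unlike the patch, this only helps code run through such a namespace, not ordinary module-level code, which is exactly why a real hook at the NameError point is more convenient.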
Let's talk about weakref.WeakKeyDictionary.
First, what is it? It's a dictionary whose keys are referenced
weakly. That is, the dictionary takes weak references to its keys.
If a key is garbage collected, it magically vanishes from the
dictionary. This saves programmers much of the trouble of manually
culling dead weak references from data structures.
Why would you use it? The official documentation mentions
implementing a [descriptor]. See [this paste] for a simple
example of this sort of thing. Turning the WeakKeyDictionary into a
regular dictionary would be troublesome. Since the descriptor object
will (probably) never die, if it used a normal dictionary, it would
keep all/most/some/more-than-one instances of Bar() alive forever.
While this simple example would be better implemented as a property,
or even a bare attribute, it is nice if you have many separate Bar
classes and complicated per-property processing.
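A minimal version of the descriptor pattern described above (the names WeakProperty, Bar, and foo are illustrative):

```python
import weakref

class WeakProperty:
    """Per-instance storage held outside the instance, keyed weakly,
    so entries vanish when the owning instance is collected."""
    def __init__(self):
        self._values = weakref.WeakKeyDictionary()

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self._values[obj]

    def __set__(self, obj, value):
        self._values[obj] = value

class Bar:
    foo = WeakProperty()

b = Bar()
b.foo = 42
assert b.foo == 42
del b   # the WeakKeyDictionary entry disappears with the instance
```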
Why is it broken? To go back to our simple example, imagine someone
subclasses Bar (let's call the subclass Baz) and implements __eq__()
and/or __hash__(). Instances of Baz may not work at all (if __hash__
is None) or they may "work" but behave nonsensically (all values-equal
instances of Baz will have the same .foo attribute - imagine trying to
track that bug down!). While we could forbid such a subclass in our
API docs, this runs afoul of the open/closed principle.
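Concretely, the failure mode looks like this (Baz and the descriptor here are hypothetical stand-ins for the pattern above):

```python
import weakref

class WeakProperty:
    def __init__(self):
        self._values = weakref.WeakKeyDictionary()
    def __get__(self, obj, objtype=None):
        return self if obj is None else self._values[obj]
    def __set__(self, obj, value):
        self._values[obj] = value

class Bar:
    foo = WeakProperty()

class Baz(Bar):
    def __init__(self, key):
        self.key = key
    def __eq__(self, other):
        return isinstance(other, Baz) and self.key == other.key
    def __hash__(self):
        return hash(self.key)

a, b = Baz(1), Baz(1)   # distinct objects, but equal and same hash
a.foo = 'mine'
print(b.foo)            # 'mine' -- b never set .foo, yet "has" it
```

The WeakKeyDictionary hashes its keys, so the two values-equal instances collide on the same slot.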
What can we do about it? It's possible someone is relying on this
behavior, so we probably should not change the existing class.
Instead, we can offer a new class called, say, WeakKeyIDDictionary.
Said class would use object identity instead of object
equality/hashing to uniquify keys. It would also reference keys
weakly, just like WeakKeyDictionary does now.
Implementing such a class is somewhat tricky, since weak references
are difficult to reason about correctly. I've written up a
[simplistic prototype], but I'm concerned about both performance
and correctness. Specifically, I know that I need to call
_remove_dead_refs() periodically, but I don't see where I can do that
without compromising performance (e.g. making key lookup O(n) is a Bad
Idea). I also would note it looks nothing like the stdlib
implementations of WeakKey/ValueDictionary, which finagle raw weakrefs
directly. These problems are related, of course, since the stdlib
uses finalizer callbacks to cull dead references as they die.
Frankly, the stdlib's code looks incredibly hairy to me and I doubt I
could correctly re-implement it. Finally, I've no idea if it's
subject to race conditions or other issues (it is a completely
untested example I threw together in 5 minutes).
Because this class is simultaneously useful and hard to write
correctly, it would be a nice addition to the standard library
(probably not in the form I've written it!), or failing that, some
other library. I've Googled for such a library and come up empty.
Should I submit a bug, a PEP, or is this whole idea just stupid?
On Tue, Dec 30, 2014 at 9:33 PM, Nils Bruin <bruin.nils(a)gmail.com> wrote:
> We ran into this for sage (www.sagemath.org) and we implemented
> "MonoDict" with basically the semantics you describe. It is
> implemented as a cython extension class. See:
> I can confirm your observation that it's hard to get right, but I think our
> implementation by now is both correct and fast. The idiosyncratic
> name reveals its origin in "TripleDict" defined in the same file, which
> was developed first.
> Incidentally, we also have a WeakValueDict implementation that is faster
> and a little safer than at least the Python2 version:
> These implementations receive quite a bit of exercise in sage,
> so I would recommend that you look at them before developing
> something from scratch yourself.
This is quite interesting. Is there any interest in maintaining that
code in a separate library? Sage is a rather large dependency which,
for better or for worse, has little to do with my own work. I'm
reluctant to add something that large for a single utility class. If
there is no such interest, I might consider forking.
In my code, I've worked around it by having the owner class and the
descriptor collude (basically, the owner's instances have a private
attribute which the descriptor uses), but it's far from the cleanest
solution.