On 20Apr2014 14:41, anatoly techtonik <techtonik(a)gmail.com> wrote:
>On Sun, Sep 29, 2013 at 2:26 AM, Cameron Simpson <cs(a)zip.com.au> wrote:
>> -1 for any names commencing with __ (or even _).
>
>So you like it DIR, PATH
As per-module/file predefined global "constants"? Possibly. Still -0, but open
to argument on naming issues.
>> -1 for new globals.
>
>So you want this:
> from os.path import DIR, PATH
I'm less good on these. That would import some staticish values (or are they to
be functions?) which can't be right for the importing module. I suspect I'm
misunderstanding your intent here.
>> -1 because I can imagine wanting different nuances on the definitions
>> above; in particular for DIR I can well imagine wanting bare
>> dirname(abspath(FILE)) - semantically different to your construction.
>> There's lots of scope for bikeshedding here.
>
>Assuming that FILE is always absolute, how is this:
> dirname(abspath(FILE))
>different from this:
> abspath(dirname(FILE))
>?
Depends what "absolute" means. When the source file was obtained by traversing
a symlink, is FILE naive (ignored the symlink, just has to start at "/") or
resolved (absolute path to FILE which does not traverse a symlink)?
I can imagine use cases for both, and therefore the bikeshedding.
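The distinction can be sketched concretely (a throwaway demo; the directory names are invented): abspath keeps the path as spelled, so a traversed symlink survives, while realpath resolves symlinks to the physical location.

```python
import os
import os.path
import tempfile

# Build a file reached through a symlink: d/link -> d/real, with
# FILE spelled via the 'link' name.
d = tempfile.mkdtemp()
os.mkdir(os.path.join(d, 'real'))
os.symlink(os.path.join(d, 'real'), os.path.join(d, 'link'))
FILE = os.path.join(d, 'link', 'mod.py')
open(FILE, 'w').close()

# "Naive": the symlink spelling is kept.
print(os.path.dirname(os.path.abspath(FILE)))   # ends with .../link
# "Resolved": the symlink is traversed to the physical directory.
print(os.path.dirname(os.path.realpath(FILE)))  # ends with .../real
```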
>> -1 because this is trivial, trivial code.
>
>Code is trivial, but the problem is not. In particular, this code
>doesn't work after os.chdir
I'm fairly sure I acknowledged this. [...] Aha:
On Mon, 30 Sep 2013 08:17:46 +1000 I wrote:
Of course, chdir and passing paths to programs-which-are-not-my-children
present scope for wanting abspath, but in the very common simple
case: unnecessary and therefore undesirable.
And I'm aware that modules-inside-zip-files don't work with this;
let us ignore that; they won't work with abspath either:-)
And what should happen for code from zipfiles? Something must happen, even if
it is just NameError or the like, for "not supported".
>> -1 because you can do all this with relative paths anyway, no need for abspath
>
>Non reliable because of os.chdir
As mentioned above, yes. But very often irrelevant. (Hmm, maybe not in the
general case of library code imported before a chdir and used afterwards.)
Chdir's global nature (within the process) is one reason casual use of it is
discouraged; an amazing amount of stuff can be broken by a chdir.
>> -1 because I can imagine being unable to compute abspath in certain
>> circumstances ( certainly on older UNIX systems you could be
>> inside a directory without sufficient privileges to walk back
>> up the tree for getcwd and its equivalents )
>
>Fear and uncertainty. Can you confirm that Python is able to launch a script, but
>abspath fails in these circumstances? There might be a backwards-compatibility break in 3.5
I'd need a suitable system today. On Linux, getcwd() has been a system call for
a long time (I remember being slapped down on that very issue once). Let's try
this Mac:
$ mkdir -p fooo/baaa
$ cd fooo/baaa
$ chmod 0 ..
$ /bin/pwd
pwd: .: Permission denied
$
So, yeah, not always reliable.
This means that anything that presupplies names for [PHP-like __FILE__ and
__DIR__] needs to compute them on demand, not up front. And the downside is that if
these idioms, and importantly reliance on these idioms, become common then
suddenly every Python program becomes a ticking time bomb for being run in an
unusual-but-valid environment.
Any kind of security conscious environment may require programs to run in
constrained contexts. It is best to minimise the presumptions that a program
makes in the name of convenience. More than once I've met devs whose code ran
only in their (very very open/unsecured) dev environment and failed even in our
staging environments.
So while it is rare, I don't think "/bin/pwd doesn't work" should be entirely
ignored.
Cheers,
Cameron Simpson <cs(a)zip.com.au>
Every particle continues in its state of rest or uniform motion in a straight
line except insofar as it doesn't. - Sir Arthur Eddington
On Apr 20, 2014 3:02 AM, "Tal Einat" <taleinat(a)gmail.com> wrote:
>
> FYI your handling of negative indexes is buggy.
>
It's horrible! I'm not quite sure it can even be called "handling." Take my
example only as a prototype of the concept that lets me run the benchmark.
That said, doing the negative indices right should just be some arithmetic
on self.start and self.stop, not anything too complicated or differently
performing.
>
> On Thu, Apr 17, 2014 at 6:49 PM, David Mertz <mertz(a)gnosis.cx> wrote:
>> [...]
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas(a)python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
Antoine Pitrou wrote (on python-dev):
> On Fri, 18 Apr 2014 22:31:29 -0400
> Nick Coghlan <ncoghlan at gmail.com> wrote:
> > After spending some time talking to the folks at the PyCon Twisted
> > sprints, they persuaded me that adding back the iterkeys/values/items
> > methods for mapping objects would be a nice way to eliminate a key
> > porting hassle for them (and likely others), without significantly
> > increasing the complexity of Python 3.
> I'm -1 on this. This is destroying the simplification effort of the
> dict API in Python 3.
I'm one of the masses who basically ignores py3 in favour of py2.7 (or
2.6 or even 2.4...) because the code I write has to run on old python.
And even if I /want/ to be compatible with py3, the only way I can do
that is by doing things that perform poorly in the version of python
I'll actually be using in production, or by having ugly special cases
and boilerplate all over the place. Things like this really are a
barrier to adoption for py3...
There's already a "from __future__ ..." mechanism to help with
backwards incompatible changes (creating new keywords and syntax in
particular). But that only helps people stay on old versions of
python.
So why not create the opposite: "from __history__ import ..." that
allows you to use obsoleted ways of programming with new versions of
python? If you want to use iter* on dicts in py3.5, require a "from
__history__ import dict_iter" statement at the top of the file.
For py2.7, there is no __history__ module, but it's easy to add one to
your project that just says "dict_iter = True" to let the import
statement succeed. In general (looking forward to py4?) you could (at
least in theory) have "from __history__ import FOO" be a noop if FOO
is unknown, add support for whatever historical practice if it's known
and a barrier to adoption, and have it be a syntax error when it's
finally no longer supported at all. That way you only need to actually
define a symbol via the __history__ module when you're actually
removing an obsolete idiom from the language.
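A rough sketch of how the py2.7-side shim could work; `__history__` and `dict_iter` are the proposal's hypothetical names, and the project-local module is simulated here with an in-memory one:

```python
import sys
import types

# Simulate the proposed project-local shim: on py2.7 there is no
# built-in __history__, so a project ships a module whose only job is
# to let the import statement succeed. (All names are hypothetical.)
shim = types.ModuleType('__history__')
shim.dict_iter = True
sys.modules['__history__'] = shim

# Application code could then say, on any interpreter:
from __history__ import dict_iter
print(dict_iter)  # True
```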
Cheers,
aj
--
Anthony Towns <aj(a)erisian.com.au>
> Date: Fri, 18 Apr 2014 18:27:04 -0400
> From: Nick Coghlan <ncoghlan(a)gmail.com>
>
> On 18 April 2014 15:49, Éric Araujo <merwok(a)netwok.org> wrote:
> >
> > It seems to me the problem is defined as specific to Windows, and the
> > solution takes inspiration from other operating systems. I think a new
> > rationale explaining why bring back that solution to these other OSes is
> > needed.
>
> It would be about removing the current cross-platform discrepancy in
> the instructions at
>
> https://docs.python.org/3/installing/#work-with-multiple-versions-of-python…
>
> Not a high priority for me personally, but I figured it was worth
> mentioning in case it captured someone's interest.
>
> Cheers,
> Nick.
>
+1
I am a switch hitter. I spend almost as much time on Windows as on Linux,
and to keep myself from being completely confused, I make the environments
as similar as possible. The most frequently-used utility on my Windows box
must be "ls.bat" (which runs "dir").
It is true that the "real reason" for py.exe was to enable #! processing,
and that works wonderfully -- mine also launches IronPython, Jython, PyPy,
and (just to prove a point) Perl. [Trivia: hello_world.pl runs perfectly in
Python.]
But, the "py" command-line command is really habit forming. For weeks now,
"py: command not found" has been haunting me.
Somehow the name "Python Launcher for Windows for Linux" sounds wrong, but
I want one. It works so well.
It's a fairly common problem to want to .find() or .replace() or .split() any one of multiple characters. Currently the go-to solutions are:
A:
clean_number = formatted_phone_number.replace('-', '').replace('(', '').replace(')','').replace(' ','')
B:
get_rid_of = ["-", "(", ")", " "]
clean_number = formatted_phone_number
for ch in get_rid_of:
    clean_number = clean_number.replace(ch, '')
C:
import re
clean_number = re.sub('[-\(\) ]', '', formatted_phone_number)
While none of these is especially terrible, they're also far from nice or clean. And whenever I'm faced with this kind of problem my automatic reaction is to type:
clean_number = formatted_phone_number.replace(["-","(",")"," "],"")
That is what I intuitively want to do, and it is the syntax other people often use when trying to describe that they want to replace multiple characters. I think this is because its semantics follow very logically from the original replace statement you learn.
Instead of saying "replace this with that" it's saying "replace these with that".
In the case of split() it gets even worse: to split on multiple delimiters you almost have to resort to using re. However, for such simple cases re is serious overkill. You have to teach people about an entire new module, explain what regular expressions are, and explain what new syntax like "(abc|def)" and "[abcdef]" means, when you could just use the string functions and list syntax they already understand.
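For reference, what "split on any of several delimiters" looks like today with re, using the phone-number delimiters from the earlier example (the leading '-' inside the character class is literal):

```python
import re

# Splitting on any of '-', '(', ')' or ' ' requires a regex character
# class; empty strings appear where two delimiters are adjacent.
parts = re.split(r'[-() ]', '(555) 123-4567')
print(parts)  # ['', '555', '', '123', '4567']
```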
While re is an absolute life saver in certain situations, it is very non-performant for little one-off operations because it still has to compile a whole regular expression. Below is a quick test in IPython, intentionally bypassing the cache:
In [1]: a = "a"*100+"b"
In [2]: %timeit -n 1 -r 1 a.find('b')
1 loops, best of 1: 3.31 µs per loop
In [3]: import re
In [4]: %%timeit -n 1 -r 1 re.purge()
...: re.search('[b]', 'a')
...:
1 loops, best of 1: 132 µs per loop
So for all those reasons, this is what I propose: making .find() support lists of targets, .split() support lists of delimiters, and .replace() support lists of targets. The objective of this is not to support all possible permutations of string operations; I expect there are many cases that this will not solve. However, it is meant to make the built-in string operations support a slightly larger set of very common operations which fit intuitively with the existing syntax.
I'd also like to note what my own concerns were with this idea:
My first concern was that this might break existing code. But a quick check shows that passing a list to these methods currently raises a TypeError, so it doesn't affect backwards compatibility at all.
My second concern was with handling the possibility of collisions within the list (i.e. "snowflake".replace(['snow', 'snowflake'], '')). This could be ameliorated by explicitly deciding that whichever match begins earlier will be applied before considering the others, and if two matches start at the same position the target earlier in the list will be resolved first. However, I'd argue that if you really need explicit control over the match order of words which contain each other, that's a pretty good time to start considering regular expressions.
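The earliest-match-wins rule described above can be prototyped on top of re; `replace_multi` is a hypothetical helper for illustration, not an actual str method:

```python
import re

def replace_multi(s, targets, replacement):
    # re scans left to right, and alternation tries branches in order,
    # so joining the escaped targets with '|' gives exactly the
    # proposed semantics: the earliest match wins, and ties go to the
    # target listed first.
    pattern = '|'.join(re.escape(t) for t in targets)
    return re.sub(pattern, replacement, s)

print(replace_multi('(555) 123-4567', ['-', '(', ')', ' '], ''))  # 5551234567
print(replace_multi('snowflake', ['snow', 'snowflake'], ''))      # flake
```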
Below are a sampling of questions from Stack Overflow which would have benefited from the existence of this syntax.
http://stackoverflow.com/questions/21859203/how-to-replace-multiple-charact…
http://stackoverflow.com/questions/4998629/python-split-string-with-multipl…
http://stackoverflow.com/questions/10017147/python-replace-characters-in-st…
http://stackoverflow.com/questions/14215338/python-remove-multiple-characte…
Cheers,
- Alex
Something that came up during PyCon was the idea of a "py" script that
brought "Python Launcher for Windows" explicit version dispatch to Linux.
Anyone care to try their hand at writing such a script?
Cheers,
Nick.
With the current socket.sendall() implementation, if an error occurs it's
impossible to tell how much data was sent. As such I'm wondering whether it
would make sense to add a "counter" parameter which gets incremented
internally:
sent = 0
try:
    sock.sendall(data, counter=sent)
except socket.error as err:
    print("only %s bytes were sent" % sent)
This would both allow to not lose information on error and avoid keeping
track of the total data being sent, which usually requires an extra len()
call. E.g. when sending a file:
file = open('somefile', 'rb')
total = 0
while True:
    chunk = file.read(8192)
    if not chunk:
        break
    sock.sendall(chunk, counter=total)
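For comparison, the only way to get that count today is to reimplement the sendall loop by hand; a sketch (sendall_counting is a made-up helper, not a stdlib API):

```python
import socket

def sendall_counting(sock, data):
    # Manual version of sendall that remembers its progress, so that
    # an error can report how many bytes actually went out. This is
    # the boilerplate the proposed counter= parameter would eliminate.
    view = memoryview(data)
    sent = 0
    try:
        while sent < len(view):
            sent += sock.send(view[sent:])
    except socket.error as err:
        raise RuntimeError('only %d bytes were sent' % sent) from err
    return sent
```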
Thoughts?
--
Giampaolo - http://grodola.blogspot.com
Following Brandon Rhodes very nice PyCon 2014 talk on datatypes, I was
struck by increased guilt over a program I am developing. I spoke with him
in the hallway about a good approach to improving performance while keeping
code nice looking.
The basic idea in my program is that I have a large(ish) list of tokens
that I generate in parsing a special language, and I frequently take slices
from this list--and often enough, slices out of those slices. In some
cases, the most straightforward approach in code is a slice-to-end, e.g.:
start = find_thing(tokens)
construct_in = tokens[start:]
Obviously, this winds up doing a lot of copying (of object references, not
of actual data, but still). Often the above step is followed by something
like:
end = find_end(construct_in)
construct = construct_in[:end]
This second step isn't bad in my case since the actual construct will be
dozens of tokens, not thousands or millions, and once I find it I want to
keep it around and process it further.
I realize, of course, that I could program my 'find_end()' differently so
that it took a signature more like 'find_end(tokens, start=start)'. But
with recursion and some other things I do, this becomes inelegant.
What I'd really like is a "ListView" that acts something like NumPy's
non-copying slices. However numpy, of course, only deals with arrays and
matrices of uniform numeric types. I want a non-copying "slice" of a list
of generic Python objects. Moreover, I believe that such a type is useful
enough to be worth including in the collections module generally.
As an initial implementation, I created the below. In the module
self-tests, the performance increase is about 100x, but the particular
ad-hoc benchmark I wrote isn't necessarily well-representative of all
use-cases.
% python3 ListView.py
A bunch of slices from list: 1.29 seconds
A bunch of slices from DummyListView: 1.19 seconds
A bunch of slices from ListView: 0.01 seconds
-------------------------------------------------------------------
### ListView.py ###
import sys
from time import time
from random import randint, random
from collections.abc import Sequence

class DummyListView(Sequence):
    def __init__(self, l):
        self.list = l
    def __len__(self):
        return len(self.list)
    def __getitem__(self, i):
        return self.list[i]

class ListView(Sequence):
    def __init__(self, seq, start=0, stop=None):
        if hasattr(seq, '__getitem__'):
            self.list = seq
        else:
            self.list = list(seq)
        self.start = start
        self.stop = len(self.list) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, i):
        if isinstance(i, slice):
            start = self.start if i.start is None else self.start+i.start
            if i.stop is None:
                stop = self.stop
            else:
                stop = self.start + i.stop
            return ListView(self.list, start, stop)
        else:
            val = self.list[i+self.start]
            if i < 0:
                return val
            elif not self.start <= i+self.start < self.stop:
                raise IndexError("View of sequence [%d:%d], index %d" % (
                    self.start, self.stop, i))
            return val

    def __str__(self):
        return "ListView of %d item list, slice [%d:%d]" % (
            len(self.list), self.start, self.stop)

    def __repr__(self):
        return "ListView(%s)" % self.list[self.start:self.stop]

    def to_list(self):
        return list(self.list[self.start:self.stop])

class Thing(object):
    def __init__(self, x):
        self.x = x
    def __repr__(self):
        return "Thing(%f)" % self.x

if __name__ == '__main__':
    NUM = 100000
    things = [Thing(random()) for _ in range(NUM)]
    slices = [sorted((randint(0, NUM-1), randint(0, NUM-1)))
              for _ in range(100)]
    offset = randint(0, 100)
    for name, cls in (("list", list),
                      ("DummyListView", DummyListView),
                      ("ListView", ListView)):
        begin = time()
        s = "A bunch of slices from %s: " % name
        print(s.rjust(38), end='')
        sys.stdout.flush()
        l = cls(things)
        for i in range(8):
            for start, stop in slices:
                sl = l[start:stop]
                size = stop-start
                for i in range(3):
                    subslice1 = sl[:offset]
                    subslice2 = sl[offset:]
        print("%0.2f seconds" % (time()-begin))
--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
Hello,
There is a very common pattern for creating optional arguments
when you can't use None:
_optional = object()

def foo(*, arg1='spam', arg3=None, arg4=_optional):
    if arg4 is _optional:
        ...  # caller didn't pass *anything* for arg4
    else:
        ...  # caller did pass some (maybe None) value for arg4
It's a bit annoying to create these marker objects, and also,
if you try to render the signature of such a function, you'll get
something like:
"(*, arg1='spam', arg3=None, arg4=<object object at 0x104be7080>)"
What if we add a standard marker for this use-case:
functools.optional or inspect.Parameter.optional?
Yury
On Wed, Apr 16, 2014 at 10:09 AM, Yury Selivanov <yselivanov.ml(a)gmail.com>
wrote:
> There is a very common pattern for creating optional arguments
> when you can't use None:
>
>
>
> It's a bit annoying to create these marker objects, and also,
> if you try to render the signature of such a function, you'll get
> something like:
>
> "(*, arg1='spam', arg3=None, arg4=<object object at 0x104be7080>)"
>
> What if we add a standard marker for this use-case:
> functools.optional or inspect.Parameter.optional?
>
>
There is already a singleton which works very well for this use case:
def foo(*, arg1='spam', arg3=None, arg4=NotImplemented):
    if arg4 is NotImplemented:
        ...  # caller didn't pass *anything* for arg4
    else:
        ...  # caller did pass some (maybe None) value for arg4
It is already defined, and reads like sensible English.