On 20Apr2014 14:41, anatoly techtonik <techtonik(a)gmail.com> wrote:
>On Sun, Sep 29, 2013 at 2:26 AM, Cameron Simpson <cs(a)zip.com.au> wrote:
>> -1 for any names commencing with __ (or even _).
>
>So you like it DIR, PATH
As per-module/file predefined global "constants"? Possibly. Still -0, but open
to argument on naming issues.
>> -1 for new globals.
>
>So you want this:
> from os.path import DIR, PATH
I'm less good on these. That would import some staticish values (or are they to
be functions?) which can't be right for the importing module. I suspect I'm
misunderstanding your intent here.
>> -1 because I can imagine wanting different nuances on the definitions
>> above; in particular for DIR I can well imagine wanting bare
>> dirname(abspath(FILE)) - semantically different to your construction.
>> There's lots of scope for bikeshedding here.
>
>Assuming that FILE is always absolute, how is this:
> dirname(abspath(FILE))
>different from this:
> abspath(dirname(FILE))
>?
Depends what "absolute" means. When the source file was obtained by traversing
a symlink, is FILE naive (ignored the symlink, just has to start at "/") or
resolved (absolute path to FILE which does not traverse a symlink)?
I can imagine use cases for both, and therefore the bikeshedding.
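The distinction can be sketched concretely (a throwaway demo; the directory names are invented): abspath keeps the path as spelled, so a traversed symlink survives, while realpath resolves symlinks to the physical location.

```python
import os
import os.path
import tempfile

# Build a file reached through a symlink: d/link -> d/real, with
# FILE spelled via the 'link' name.
d = tempfile.mkdtemp()
os.mkdir(os.path.join(d, 'real'))
os.symlink(os.path.join(d, 'real'), os.path.join(d, 'link'))
FILE = os.path.join(d, 'link', 'mod.py')
open(FILE, 'w').close()

# "Naive": the symlink spelling is kept.
print(os.path.dirname(os.path.abspath(FILE)))   # ends with .../link
# "Resolved": the symlink is traversed to the physical directory.
print(os.path.dirname(os.path.realpath(FILE)))  # ends with .../real
```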
>> -1 because this is trivial, trivial code.
>
>Code is trivial, but the problem is not. In particular, this code
>doesn't work after os.chdir
I'm fairly sure I acknowledged this. [...] Aha:
On Mon, 30 Sep 2013 08:17:46 +1000 I wrote:
Of course, chdir and passing paths to programs-which-are-not-my-children
present scope for wanting abspath, but in the very common simple
case: unnecessary and therefore undesirable.
And I'm aware that modules-inside-zip-files don't work with this;
let us ignore that; they won't work with abspath either:-)
And what should happen for code from zipfiles? Something must happen, even if
it is just NameError or the like, for "not supported".
>> -1 because you can do all this with relative paths anyway, no need for abspath
>
>Non reliable because of os.chdir
As mentioned above, yes. But very often irrelevant. (Hmm, maybe not in the
general case of library code imported before a chdir and used afterwards.)
Chdir's global nature (within the process) is one reason casual use of it is
discouraged; an amazing amount of stuff can be broken by a chdir.
>> -1 because I can imagine being unable to compute abspath in certain
>> circumstances ( certainly on older UNIX systems you could be
>> inside a directory without sufficient privileges to walk back
>> up the tree for getcwd and its equivalents )
>
>Fear and uncertainty. Can you confirm that Python is able to launch a script, but
>abspath fails in these circumstances? There might be a backwards-compatibility break in 3.5
I'd need a suitable system today. On Linux, getcwd() has been a system call for
a long time (I remember being slapped down on that very issue once). Let's try
this Mac:
$ mkdir -p fooo/baaa
$ cd fooo/baaa
$ chmod 0 ..
$ /bin/pwd
pwd: .: Permission denied
$
So, yeah, not always reliable.
This means that anything that presupplies names for [PHP-like __FILE__ and
__DIR__] needs to compute them on demand, not up front. And the downside is that if
these idioms, and importantly reliance on these idioms, become common then
suddenly every Python program becomes a ticking time bomb for being run in an
unusual-but-valid environment.
Any kind of security conscious environment may require programs to run in
constrained contexts. It is best to minimise the presumptions that a program
makes in the name of convenience. More than once I've met devs whose code ran
only in their (very very open/unsecured) dev environment and failed even in our
staging environments.
So while it is rare, I don't think "/bin/pwd doesn't work" should be entirely
ignored.
Cheers,
Cameron Simpson <cs(a)zip.com.au>
Every particle continues in its state of rest or uniform motion in a straight
line except insofar as it doesn't. - Sir Arthur Eddington
On Apr 20, 2014 3:02 AM, "Tal Einat" <taleinat(a)gmail.com> wrote:
>
> FYI your handling of negative indexes is buggy.
>
It's horrible! I'm not quite sure it can even be called "handling." Take my
example only as a prototype of the concept that lets me run the benchmark.
That said, doing the negative indices right should just be some arithmetic
on self.start and self.stop, not anything too complicated or differently
performing.
>
> On Thu, Apr 17, 2014 at 6:49 PM, David Mertz <mertz(a)gnosis.cx> wrote:
>> [...]
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas(a)python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
Antoine Pitrou wrote (on python-dev):
> On Fri, 18 Apr 2014 22:31:29 -0400
> Nick Coghlan <ncoghlan at gmail.com> wrote:
> > After spending some time talking to the folks at the PyCon Twisted
> > sprints, they persuaded me that adding back the iterkeys/values/items
> > methods for mapping objects would be a nice way to eliminate a key
> > porting hassle for them (and likely others), without significantly
> > increasing the complexity of Python 3.
> I'm -1 on this. This is destroying the simplification effort of the
> dict API in Python 3.
I'm one of the masses who basically ignores py3 in favour of py2.7 (or
2.6 or even 2.4...) because the code I write has to run on old python.
And even if I /want/ to be compatible with py3, the only way I can do
that is by doing things that perform poorly in the version of python
I'll actually be using in production, or by having ugly special cases
and boilerplate all over the place. Things like this really are a
barrier to adoption for py3...
There's already a "from __future__ ..." mechanism to help with
backwards incompatible changes (creating new keywords and syntax in
particular). But that only helps people stay on old versions of
python.
So why not create the opposite: "from __history__ import ..." that
allows you to use obsoleted ways of programming with new versions of
python? If you want to use iter* on dicts in py3.5, require a "from
__history__ import dict_iter" statement at the top of the file.
For py2.7, there is no __history__ module, but it's easy to add one to
your project that just says "dict_iter = True" to let the import
statement succeed. In general (looking forward to py4?) you could (at
least in theory) have "from __history__ import FOO" be a noop if FOO
is unknown, add support for whatever historical practice if it's known
and a barrier to adoption, and have it be a syntax error when it's
finally no longer supported at all. That way you only need to actually
define a symbol via the __history__ module when you're actually
removing an obsolete idiom from the language.
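A rough sketch of how the py2.7-side shim could work; `__history__` and `dict_iter` are the proposal's hypothetical names, and the project-local module is simulated here with an in-memory one:

```python
import sys
import types

# Simulate the proposed project-local shim: on py2.7 there is no
# built-in __history__, so a project ships a module whose only job is
# to let the import statement succeed. (All names are hypothetical.)
shim = types.ModuleType('__history__')
shim.dict_iter = True
sys.modules['__history__'] = shim

# Application code could then say, on any interpreter:
from __history__ import dict_iter
print(dict_iter)  # True
```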
Cheers,
aj
--
Anthony Towns <aj(a)erisian.com.au>
> Date: Fri, 18 Apr 2014 18:27:04 -0400
> From: Nick Coghlan <ncoghlan(a)gmail.com>
>
> On 18 April 2014 15:49, Éric Araujo <merwok(a)netwok.org> wrote:
> >
> > It seems to me the problem is defined as specific to Windows, and the
> > solution takes inspiration from other operating systems. I think a new
> > rationale explaining why bring back that solution to these other OSes is
> > needed.
>
> It would be about removing the current cross-platform discrepancy in
> the instructions at
>
> https://docs.python.org/3/installing/#work-with-multiple-versions-of-python…
>
> Not a high priority for me personally, but I figured it was worth
> mentioning in case it captured someone's interest.
>
> Cheers,
> Nick.
>
+1
I am a switch hitter. I spend almost as much time on Windows as on Linux,
and to keep myself from being completely confused, I make the environments
as similar as possible. The most frequently-used utility on my Windows box
must be "ls.bat" (which runs "dir").
It is true that the "real reason" for py.exe was to enable #! processing,
and that works wonderfully -- mine also launches IronPython, Jython, PyPy,
and (just to prove a point) Perl. [Trivia: hello_world.pl runs perfectly in
Python.]
But, the "py" command-line command is really habit forming. For weeks now,
"py: command not found" has been haunting me.
Somehow the name "Python Launcher for Windows for Linux" sounds wrong, but
I want one. It works so well.
It's a fairly common problem to want to .find() or .replace() or .split() any one of multiple characters. Currently the go-to solutions are:
A:
clean_number = formatted_phone_number.replace('-', '').replace('(', '').replace(')','').replace(' ','')
B:
get_rid_of = ["-", "(", ")", " "]
clean_number = formatted_phone_number
for ch in get_rid_of:
    clean_number = clean_number.replace(ch, '')
C:
import re
clean_number = re.sub('[-\(\) ]', '', formatted_phone_number)
While none of these is especially terrible, they're also far from nice or clean. And whenever I'm faced with this kind of problem my automatic reaction is to type:
clean_number = formatted_phone_number.replace(["-","(",")"," "],"")
That is what I intuitively want to do, and it is the syntax other people often use when trying to describe that they want to replace multiple characters. I think this is because its semantics follow very logically from the original replace statement you learn.
Instead of saying "replace this with that" it's saying "replace these with that".
In the case of split() it gets even worse: to split on multiple delimiters you almost have to resort to using re. However, for such simple cases re is serious overkill. You have to teach people about an entire new module, explain what regular expressions are, and explain what new syntax like "(abc|def)" and "[abcdef]" means, when you could just use the string functions and list syntax they already understand.
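For reference, what "split on any of several delimiters" looks like today with re, using the phone-number delimiters from the earlier example (the leading '-' inside the character class is literal):

```python
import re

# Splitting on any of '-', '(', ')' or ' ' requires a regex character
# class; empty strings appear where two delimiters are adjacent.
parts = re.split(r'[-() ]', '(555) 123-4567')
print(parts)  # ['', '555', '', '123', '4567']
```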
While re is an absolute life saver in certain situations, it is very non-performant for little one-off operations because it still has to compile a whole regular expression. Below is a quick test in IPython, intentionally bypassing the cache:
In [1]: a = "a"*100+"b"
In [2]: %timeit -n 1 -r 1 a.find('b')
1 loops, best of 1: 3.31 µs per loop
In [3]: import re
In [4]: %%timeit -n 1 -r 1 re.purge()
...: re.search('[b]', 'a')
...:
1 loops, best of 1: 132 µs per loop
So for all those reasons, this is what I propose: making .find() support lists of targets, .split() support lists of delimiters, and .replace() support lists of targets. The objective of this is not to support all possible permutations of string operations; I expect there are many cases that this will not solve. However, it is meant to make the built-in string operations support a slightly larger set of very common operations which fit intuitively with the existing syntax.
I'd also like to note what my own concerns were with this idea:
My first concern was that this might break existing code. But a quick check shows that passing a list to these methods currently raises a TypeError, so it doesn't affect backwards compatibility at all.
My second concern was with handling the possibility of collisions within the list (i.e. "snowflake".replace(['snow', 'snowflake'], '')). This could be ameliorated by explicitly deciding that whichever match begins earlier will be applied before considering the others, and if two matches start at the same position the target earlier in the list will be resolved first. However, I'd argue that if you really need explicit control over the match order of words which contain each other, that's a pretty good time to start considering regular expressions.
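The earliest-match-wins rule described above can be prototyped on top of re; `replace_multi` is a hypothetical helper for illustration, not an actual str method:

```python
import re

def replace_multi(s, targets, replacement):
    # re scans left to right, and alternation tries branches in order,
    # so joining the escaped targets with '|' gives exactly the
    # proposed semantics: the earliest match wins, and ties go to the
    # target listed first.
    pattern = '|'.join(re.escape(t) for t in targets)
    return re.sub(pattern, replacement, s)

print(replace_multi('(555) 123-4567', ['-', '(', ')', ' '], ''))  # 5551234567
print(replace_multi('snowflake', ['snow', 'snowflake'], ''))      # flake
```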
Below are a sampling of questions from Stack Overflow which would have benefited from the existence of this syntax.
http://stackoverflow.com/questions/21859203/how-to-replace-multiple-charact…
http://stackoverflow.com/questions/4998629/python-split-string-with-multipl…
http://stackoverflow.com/questions/10017147/python-replace-characters-in-st…
http://stackoverflow.com/questions/14215338/python-remove-multiple-characte…
Cheers,
- Alex
Something that came up during PyCon was the idea of a "py" script that
brought "Python Launcher for Windows" explicit version dispatch to Linux.
Anyone care to try their hand at writing such a script?
Cheers,
Nick.
With the current socket.sendall() implementation, if an error occurs it's
impossible to tell how much data was sent. As such I'm wondering whether it
would make sense to add a "counter" parameter which gets incremented
internally:
sent = 0
try:
    sock.sendall(data, counter=sent)
except socket.error as err:
    print("only %s bytes were sent" % sent)
This would both allow to not lose information on error and avoid keeping
track of the total data being sent, which usually requires an extra len()
call. E.g. when sending a file:
file = open('somefile', 'rb')
total = 0
while True:
    chunk = file.read(8192)
    if not chunk:
        break
    sock.sendall(chunk, counter=total)
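For comparison, the only way to get that count today is to reimplement the sendall loop by hand; a sketch (sendall_counting is a made-up helper, not a stdlib API):

```python
import socket

def sendall_counting(sock, data):
    # Manual version of sendall that remembers its progress, so that
    # an error can report how many bytes actually went out. This is
    # the boilerplate the proposed counter= parameter would eliminate.
    view = memoryview(data)
    sent = 0
    try:
        while sent < len(view):
            sent += sock.send(view[sent:])
    except socket.error as err:
        raise RuntimeError('only %d bytes were sent' % sent) from err
    return sent
```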
Thoughts?
--
Giampaolo - http://grodola.blogspot.com
Following Brandon Rhodes very nice PyCon 2014 talk on datatypes, I was
struck by increased guilt over a program I am developing. I spoke with him
in the hallway about a good approach to improving performance while keeping
code nice looking.
The basic idea in my program is that I have a large(ish) list of tokens
that I generate in parsing a special language, and I frequently take slices
from this list--and often enough, slices out of those slices. In some
cases, the most straightforward approach in code is a slice-to-end, e.g.:
start = find_thing(tokens)
construct_in = tokens[start:]
Obviously, this winds up doing a lot of copying (of object references, not
of actual data, but still). Often the above step is followed by something
like:
end = find_end(construct_in)
construct = construct_in[:end]
This second step isn't bad in my case since the actual construct will be
dozens of tokens, not thousands or millions, and once I find it I want to
keep it around and process it further.
I realize, of course, that I could program my 'find_end()' differently so
that it took a signature more like 'find_end(tokens, start=start)'. But
with recursion and some other things I do, this becomes inelegant.
What I'd really like is a "ListView" that acts something like NumPy's
non-copying slices. However numpy, of course, only deals with arrays and
matrices of uniform numeric types. I want a non-copying "slice" of a list
of generic Python objects. Moreover, I believe that such a type is useful
enough to be worth including in the collections module generally.
As an initial implementation, I created the below. In the module
self-tests, the performance increase is about 100x, but the particular
ad-hoc benchmark I wrote isn't necessarily well-representative of all
use-cases.
% python3 ListView.py
A bunch of slices from list: 1.29 seconds
A bunch of slices from DummyListView: 1.19 seconds
A bunch of slices from ListView: 0.01 seconds
-------------------------------------------------------------------
### ListView.py ###
import sys
from time import time
from random import randint, random
from collections.abc import Sequence

class DummyListView(Sequence):
    def __init__(self, l):
        self.list = l
    def __len__(self):
        return len(self.list)
    def __getitem__(self, i):
        return self.list[i]

class ListView(Sequence):
    def __init__(self, seq, start=0, stop=None):
        if hasattr(seq, '__getitem__'):
            self.list = seq
        else:
            self.list = list(seq)
        self.start = start
        self.stop = len(self.list) if stop is None else stop

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, i):
        if isinstance(i, slice):
            start = self.start if i.start is None else self.start+i.start
            if i.stop is None:
                stop = self.stop
            else:
                stop = self.start + i.stop
            return ListView(self.list, start, stop)
        else:
            val = self.list[i+self.start]
            if i < 0:
                return val
            elif not self.start <= i+self.start < self.stop:
                raise IndexError("View of sequence [%d:%d], index %d" % (
                    self.start, self.stop, i))
            return val

    def __str__(self):
        return "ListView of %d item list, slice [%d:%d]" % (
            len(self.list), self.start, self.stop)

    def __repr__(self):
        return "ListView(%s)" % self.list[self.start:self.stop]

    def to_list(self):
        return list(self.list[self.start:self.stop])

class Thing(object):
    def __init__(self, x):
        self.x = x
    def __repr__(self):
        return "Thing(%f)" % self.x

if __name__ == '__main__':
    NUM = 100000
    things = [Thing(random()) for _ in range(NUM)]
    slices = [sorted((randint(0, NUM-1), randint(0, NUM-1)))
              for _ in range(100)]
    offset = randint(0, 100)
    for name, cls in (("list", list),
                      ("DummyListView", DummyListView),
                      ("ListView", ListView)):
        begin = time()
        s = "A bunch of slices from %s: " % name
        print(s.rjust(38), end='')
        sys.stdout.flush()
        l = cls(things)
        for i in range(8):
            for start, stop in slices:
                sl = l[start:stop]
                size = stop-start
                for i in range(3):
                    subslice1 = sl[:offset]
                    subslice2 = sl[offset:]
        print("%0.2f seconds" % (time()-begin))
--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
Hello,
There is a very common pattern for creating optional arguments
when you can't use None:
_optional = object()

def foo(*, arg1='spam', arg3=None, arg4=_optional):
    if arg4 is _optional:
        ...  # caller didn't pass *anything* for arg4
    else:
        ...  # caller did pass some (maybe None) value for arg4
It's a bit annoying to create these marker objects, and also,
if you try to render the signature of such a function, you'll get
something like:
"(*, arg1='spam', arg3=None, arg4=<object object at 0x104be7080>)"
What if we add a standard marker for this use-case:
functools.optional or inspect.Parameter.optional?
Yury
On Wed, Apr 16, 2014 at 10:09 AM, Yury Selivanov <yselivanov.ml(a)gmail.com>
wrote:
> There is a very common pattern for creating optional arguments
> when you can't use None:
>
>
>
> It's a bit annoying to create these marker objects, and also,
> if you try to render the signature of such a function, you'll get
> something like:
>
> "(*, arg1='spam', arg3=None, arg4=<object object at 0x104be7080>)"
>
> What if we add a standard marker for this use-case:
> functools.optional or inspect.Parameter.optional?
>
>
There is already a singleton which works very well for this use case:
def foo(*, arg1='spam', arg3=None, arg4=NotImplemented):
    if arg4 is NotImplemented:
        ...  # caller didn't pass *anything* for arg4
    else:
        ...  # caller did pass some (maybe None) value for arg4
It is already defined, and reads like sensible English.