In many uses of os.walk it is desirable to skip version control
directories (which are usually hidden directories), to the point that
almost all of the examples given look like:
import os
for root, dirs, files in os.walk(some_dir):
    if 'CVS' in dirs:
        dirs.remove('CVS')  # or .svn or .hg etc.
    # do something...
But of course there are many version control systems, to the point that
much of my personal code looks like the following (note that I have to
use a multitude of version control systems due to project requirements):
import os
vcs_dirs = ['.hg', '.svn', 'CVS', '.git', '.bzr']  # Version control directory names I know
for root, dirs, files in os.walk(some_dir):
    for dirname in vcs_dirs:
        if dirname in dirs:
            dirs.remove(dirname)
I am sure that I am missing many other version control systems, but the
one thing that all of the ones I am familiar with have in common is that
they default to creating their files in hidden directories. I know that
the above sometimes hits problems on Windows if someone manually created
a directory and you end up with aberrations such as Csv\ or .SVN ....
Since it could be argued that hidden directories are possibly more
common than symlinks (especially in the Windows world, of course), and
hidden directories have normally been hidden by someone for a reason, it
seems to make sense to me to normally ignore them in directory
traversal.
Obviously there are also occasions when it makes sense to include VCS,
or other hidden, directories and files (e.g. "Where did all of my disk
space go?" or "delete recursively"), so I would like to suggest
including in the os.walk family of functions an additional parameter to
control skipping all hidden directories - either positively or negatively.
Names that spring to mind include:
* nohidden
* nohidden_dirs
* hidden
* hidden_dirs
This change could be made with no impact on current behaviour by
defaulting to hidden=True (or nohidden=False), which would just about
ensure that no existing code is broken. Alternatively, quite a few bugs
in existing code could be quietly fixed (and some new ones introduced)
by defaulting to the new behaviour.
Since the implementation of os.walk has changed to use os.scandir, which
exposes the returned file status via os.DirEntry.stat(), the overhead
should be minimal.
An alternative would be to add another new function, say os.vwalk(), to
only walk visible entries.
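To illustrate, here is a minimal sketch of such a visible-only walk written as a wrapper today (the name walk_visible is hypothetical, and the dot-prefix test only covers Unix-style hidden names; the Windows hidden attribute would need an extra stat check):

```python
import os

def walk_visible(top):
    # Sketch: like os.walk, but prunes hidden entries (names starting
    # with '.') during top-down traversal, so hidden subtrees are
    # never descended into.
    for root, dirs, files in os.walk(top):
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        yield root, dirs, [f for f in files if not f.startswith('.')]
```

A hidden= parameter on os.walk itself could implement the same pruning internally, before yielding.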
Note that a decision would have to be made on whether to include such
filtering when topdown is False. Personally I am tempted to include the
filtering so as to maintain consistency, but ignoring the filter when
topdown is False (or when topdown is False and the hidden behaviour is
unspecified) might make sense if skipping hidden directories becomes the
new default (then recursively removing files & directories would still
include processing hidden items by default).
If this receives a positive response I would be happy to undertake the
effort involved in producing a PR.
--
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect
those of my employer.
Hi everybody,
Our PEP idea is to propose adding a default value to the itemgetter and
attrgetter functions.
This was inspired by bug 14384 (https://bugs.python.org/issue14384),
opened by Miki TEBEKA.
For example, we could do:
values = {'x': 43, 'y': 55}
x, y, z = itemgetter('x', 'y', 'z', default=0)(values)
print(x, y, z)
43 55 0
instead of:
values = {'x': 43, 'y': 55}
x = values.get('x', 0)
y = values.get('y', 0)
z = values.get('z', 0)
print(x, y, z)
43 55 0
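A plain-function sketch of the proposed behaviour (the name itemgetter_default is hypothetical, and unlike the real operator.itemgetter this sketch only handles mappings with a .get method):

```python
def itemgetter_default(*keys, default=None):
    # Hypothetical sketch of itemgetter with a default: missing keys
    # yield the default instead of raising KeyError.
    def getter(mapping):
        return tuple(mapping.get(k, default) for k in keys)
    return getter

values = {'x': 43, 'y': 55}
x, y, z = itemgetter_default('x', 'y', 'z', default=0)(values)
print(x, y, z)  # 43 55 0
```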
The goal is to have more concise code and to improve consistency among
getattr, attrgetter and itemgetter.
What do you think about this?
MAILLOL Vincent
GALODE Alexandre
2018-05-07 21:56 GMT+02:00 Neil Girdhar <mistersheik(a)gmail.com>:
> Regular expressions are not just "an order of magnitude better"—they're
> asymptotically faster. See
> https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
> for a non-regular-expression algorithm.
Hence my
>> [Jacco wrote, capitalized important words]
>> regular expressions would probably be AT LEAST an order of magnitude
>> better in speed, if it's a bottleneck to you. But pure python
>> implementation for this is a lot easier than it would be for the
>> current string.count().
>>
But I think my point stands that that's what you need to do if speed
is an issue, and python code is fine when it isn't.
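For the overlapping case specifically (which str.count does not handle), a zero-width lookahead is the usual regex trick - a quick sketch:

```python
import re

def count_overlapping(haystack, needle):
    # str.count counts non-overlapping occurrences; a zero-width
    # lookahead consumes no characters, so successive matches overlap.
    return len(re.findall('(?=%s)' % re.escape(needle), haystack))

print('AAA'.count('AA'))               # 1 (non-overlapping)
print(count_overlapping('AAA', 'AA'))  # 2 (overlapping)
```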
Also, interesting read. Thanks.
Jacco
Hi,
There’s an error with the string method count().
x = 'AAA'
y = 'AA'
print(x.count(y))
The output is 1, instead of 2.
I write programs on SoloLearn mobile app.
Warm regards,
Julia Kim
Hi all!
Really interesting discussion on here.
Personally I feel that PEP 572, as it stands, undermines the readability of
the language, in particular for those new to the language or to programming.
I have issue with both the in-expression assignment, and the current
proposed operator approach.
Below is a modified version of a comment I made on a Reddit thread about
this.
I think the first point is that beginners often read more code than they
write - to build a mental picture of how the language works, and to build
their solution from existing parts. Whether or not this is a good means of
learning, it's quite commonplace (and I learned using such a method).
If this PEP passes, you will find people use the syntax. And it may well
end up being disproportionately used in the early stages because of "new
feature" semantics.
Here is an extract from the Wikipedia A* search article:
function reconstruct_path(cameFrom, current)
    total_path := [current]
    while current in cameFrom.Keys:
        current := cameFrom[current]
        total_path.append(current)
    return total_path
In Python 3.7, that looks like this:
def reconstruct_path(cameFrom, current):
    total_path = [current]
    while current in cameFrom.keys():
        current = cameFrom[current]
        total_path.append(current)
    return total_path
(Of course it's not entirely Pythonic, but the point is that the pseudocode
is designed to be semantically readable, and the Python code is very similar.)
However with PEP 572, the beginner will now encounter both := assignments
and = assignments. When to use which one? Now they need to learn the edge
cases / semantic differences between expression and statement explicitly,
rather than picking this up as they go. I am not making this argument
because I think that this is the best way to learn a programming language;
I'm simply arguing that there is more cognitive overhead for someone
unfamiliar with the language in choosing the appropriate feature.
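To make that overhead concrete, here is a small sketch of the two forms a reader would now see side by side (the input string is hypothetical; the second form uses the := syntax as proposed by the PEP):

```python
import re

line = '42 apples'  # example input

# Statement form - the one beginners see everywhere today:
match = re.match(r'\d+', line)
if match:
    print(match.group(0))  # 42

# Expression form under PEP 572 - same logic, a second syntax to learn:
if m := re.match(r'\d+', line):
    print(m.group(0))  # 42
```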
In terms of implementation, it's especially odd that the current proposal
(AFAICT) only binds to a name, rather than an assignment target. This feels
very wrong, despite the fact that I can understand why it is suggested. I
feel like the Python 3 series in particular has been making syntax more
uniform, so that there aren't quirks and edge cases of "this only works in
this context" (besides async, of course), which is one of the things that
makes Python so expressive.
Furthermore, with this PEP, assignment can now happen inside of expressions
- and so one of the most fundamental benefits of expressions being
effectively immutable in terms of local names is lost.
In terms of prevalence, I don't think that these situations do occur all
that often. I definitely agree that regex is the prime candidate for this
kind of syntax. However, in the examples given (matching 3+ regexes), I
would use a loop for simplicity anyway. When it comes to assigning to a
result that is only used in the conditional block, this is certainly a case
that benefits from the PEP.
--------------------------------------------------------
If, however, the motivation for the PEP is deemed significant enough to
warrant its inclusion in a future release, then I would like to suggest
that the keyword approach is superior to the operator variant. In
particular, I prefer `where` to the `given` or `let` candidates, as I
think it is more descriptive and slightly shorter to type ;)
Thanks!
Angus Hollands
Hi all,
This is a bit of a wacky idea, but I think it might be doable and have
significant benefits, so throwing it out there to see what people
think.
In asyncio, there are currently three kinds of calling conventions for
asynchronous functions:
1) Ones which return a Future
2) Ones which return a raw coroutine object
3) Ones which return a Future, but are documented to return a
coroutine object, because we want to possibly switch to doing that in
the future and are hoping people won't depend on them returning a
Future
In practice these have slightly different semantics. For example,
types (1) and (3) start executing immediately, while type (2) doesn't
start executing until passed to 'await' or some function like
asyncio.gather. For type (1), you can immediately call
.add_done_callback:
func_returning_future().add_done_callback(...)
while for type (2) and (3), you have to explicitly call ensure_future first:
asyncio.ensure_future(func_returning_coro()).add_done_callback(...)
In practice, these distinctions are mostly irrelevant and annoying to
users; the only thing you can do with a raw coroutine is pass it to
ensure_future() or equivalent, and the existence of type (3) functions
means that you can't even assume that functions documented as
returning raw coroutines actually return raw coroutines, or that these
will stay the same across versions. But it is a source of confusion,
see e.g. this thread on async-sig [1], or this one [2]. It also makes
it harder to evolve asyncio, since any function documented as
returning a Future cannot take advantage of async/await syntax. And
it's forced the creation of awkward APIs like the "coroutine hook"
used in asyncio's debug mode.
Other languages with async/await, like C# and JavaScript, don't have
these problems, because they don't have raw coroutine objects at all:
when you mark a function as async, that directly converts it into a
function that returns a Future (or local equivalent). So the
difference between async functions and Future-returning functions is
only relevant to the person writing the function; callers don't have
to care, and can assume that the full Future interface is always
available.
I think Python did a very smart thing in *not* hard-coding Futures
into the language, like C#/JS do. But, I also think it would be nice
if we didn't force regular asyncio users to be aware of all these
details.
So here's an idea: we add a new kind of hook that coroutine runners
can set. In async_function.__call__, it creates a coroutine object,
and then invokes this hook, which then can wrap the coroutine into a
Task (or Deferred or whatever is appropriate for the current coroutine
runner). This way, from the point of view of regular asyncio users,
*all* async functions become functions-returning-Futures (type 1
above):
async def foo():
    pass

# This returns a Task running on the current loop
foo()
Of course, async loops need a way to get at the actual coroutine
objects, so we should also provide some method on async functions to
do that:
foo.__corocall__() -> returns a raw coroutine object
And as an optimization, we can make 'await <funcall>' invoke this, so
that in regular async function -> async function calls, we don't pay
the cost of setting up an unnecessary Task object:
# This
await foo(*args, **kwargs)

# Becomes sugar for:
try:
    _callable = foo.__corocall__
except AttributeError:
    # Fallback, so 'await function_returning_promise()' still works:
    _callable = foo
_awaitable = _callable(*args, **kwargs)
await _awaitable
(So this hook is actually quite similar to the existing coroutine
hook, except that it's specifically only invoked on bare calls, not on
await-calls.)
Of course, if no coroutine runner hook is registered, then the default
should remain the same as now. This also means that common idioms
like:
loop.run_until_complete(asyncfn())
still work, because at the time asyncfn() is called, no loop is
running, asyncfn() silently returns a regular coroutine object, and
then run_until_complete knows how to handle that.
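That idiom can be checked directly today with a minimal, self-contained example:

```python
import asyncio

async def asyncfn():
    return 42

# No loop is running when asyncfn() is called, so it silently returns a
# plain coroutine object, which run_until_complete then drives.
loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(asyncfn())
finally:
    loop.close()
print(result)  # 42
```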
This would also help libraries like Trio that remove Futures
altogether; in Trio, the convention is that 'await asyncfn()' is
simply the only way to call asyncfn, and writing a bare 'asyncfn()' is
always a mistake – but one that is currently confusing and difficult
to detect because all it does is produce a warning ("coroutine was
never awaited") at some potentially-distant location that depends on
what the GC does. In this proposal, Trio could register a hook that
raises an immediate error on bare 'asyncfn()' calls.
This would also allow libraries built on Trio-or-similar to migrate a
function from sync->async or async->sync with a deprecation period.
Since in Trio sync functions would always use __call__, and async
functions would always use __corocall__, then during a transition
period one could use a custom object that defines both, and has one of
them emit a DeprecationWarning. This is a problem that comes up a lot
in new libraries, and currently doesn't have any decent solution.
(It's actually happened to Trio itself, and created the only case
where Trio has been forced to break API without a deprecation period.)
The main problem I can see here is cases where you have multiple
incompatible coroutine runners in the same program. In this case,
problems could arise when you call asyncfn() under runner A and pass
it to runner B for execution, so you end up with a B-flavored
coroutine getting passed to A's wrapping hook. The kinds of cases
where this might happen are:
- Using the trio-asyncio library to run asyncio libraries under trio
- Using Twisted's Future<->Deferred conversion layer
- Using async/await to implement an ad hoc coroutine runner (e.g. a
state machine) inside a program that also uses an async library
I'm not sure if this is OK or not, but I think it might be?
trio-asyncio's API is already designed to avoid passing coroutine
objects across the boundary – to call an asyncio function from trio
you write
await run_asyncio(async_fn, *args)
and then run_asyncio switches into asyncio context before actually
calling async_fn, so that's actually totally fine. Twisted's API does
not currently work this way, but I think it's still in enough of an
early provisional state that it could be fixed. And for users
implementing ad hoc coroutine runners, I think this is (a) rare, (b)
using generators is probably better style, since the only difference
between async/await and generators is that async/await is explicitly
supposed to be opaque to users and signal "this is your async
library", (c) if they're writing a coroutine runner then they can set
up their coroutine runner hook appropriately anyway. But there would
be some costs here; the trade-off would be a significant
simplification and increase in usability, because regular users could
simply stop having to know about 'coroutine objects' and 'awaitables'
and all that entirely, and we'd be able to take more advantage of
async/await in existing libraries.
What do you think?
-n
[1] https://mail.python.org/pipermail/async-sig/2018-May/000484.html
[2] https://mail.python.org/pipermail/async-sig/2018-April/000470.html
--
Nathaniel J. Smith -- https://vorpus.org
That's buggy code in either version. A month is not necessarily 30 days,
and a year is not necessarily 365 days. An example without such painful
bugs might be more compelling... though the bug-free version likely makes
the use case less obvious.
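For reference, a calendar-aware sketch of the 'months' case (the helper name add_months is hypothetical; it clamps the day to the target month's length, so e.g. Jan 31 plus one month gives Feb 28/29):

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    # Hypothetical helper: true calendar months rather than the
    # 30-days-per-month approximation; clamps the day-of-month.
    y, m = divmod(d.month - 1 + months, 12)
    year, month = d.year + y, m + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

print(add_months(date(2020, 1, 31), 1))  # 2020-02-29
```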
On Thu, May 3, 2018, 2:37 PM Robert Roskam <raiderrobert(a)gmail.com> wrote:
> Hey Chris,
>
> So I started extremely generally with my syntax, but it seems like I
> should provide a lot more examples of real use. Examples are hard. Here's
> my hastily put together example from an existing piece of production code:
>
>
> # Existing Production Code
> from datetime import timedelta, date
> from django.utils import timezone
>
>
> def convert_time_to_timedelta(unit:str, amount:int, now:date):
>     if unit in ['days', 'hours', 'weeks']:
>         return timedelta(**{unit: amount})
>     elif unit == 'months':
>         return timedelta(days=30 * amount)
>     elif unit == 'years':
>         return timedelta(days=365 * amount)
>     elif unit == 'cal_years':
>         return now - now.replace(year=now.year - amount)
>
>
>
>
> # New Syntax for same problem
>
>
> def convert_time_to_timedelta_with_match(unit:str, amount:int, now:date):
>     return match unit:
>         'days', 'hours', 'weeks' => timedelta(**{unit: amount})
>         'months' => timedelta(days=30 * amount)
>         'years' => timedelta(days=365 * amount)
>         'cal_years' => now - now.replace(year=now.year - amount)
>
> On Thursday, May 3, 2018 at 2:02:54 PM UTC-4, Chris Angelico wrote:
>>
>> On Fri, May 4, 2018 at 3:18 AM, Ed Kellett <e+pytho...(a)kellett.im>
>> wrote:
>> > I believe the intention in the example you quoted is syntax something
>> like:
>> >
>> > <match-case> ::= <pattern>
>> > | <pattern> "if" <expression>
>> >
>> > where the expression is a guard expression evaluated in the context of
>> > the matched pattern.
>> >
>> > IOW, it could be written like this, too:
>> >
>> > number = match x:
>> >     1 if True => "one"
>> >     y if isinstance(y, str) => f'The string is {y}'
>> >     _ if True => "anything"
>> >
>> > I do see a lot of room for bikeshedding around the specific spelling.
>> > I'm going to try to resist the temptation ;)
>>
>> Okay, let me try to tease apart your example.
>>
>> 1) A literal matches anything that compares equal to that value.
>> 2) A name matches anything at all, and binds it to that name.
>> 2a) An underscore matches anything at all. It's just a name, and
>> follows a common convention.
>> 3) "if cond" modifies the prior match; if the condition evaluates as
>> falsey, the match does not match.
>> 4) As evidenced below, a comma-separated list of comparisons matches a
>> tuple with as many elements, and each element must match.
>>
>> Ultimately, this has to be a series of conditions, so this is
>> effectively a syntax for an elif tree as an expression.
>>
>> For another example, here's a way to use inequalities to pick a
>> numeric formatting:
>>
>> display = match number:
>>     x if x < 1e3: f"{number}"
>>     x if x < 1e6: f"{number/1e3} thousand"
>>     x if x < 1e9: f"** {number/1e6} million **"
>>     x if x < 1e12: f"an incredible {number/1e9} billion"
>>     _: "way WAY too many"
>>
>> I guarantee you that people are going to ask for this to be spelled
>> simply "< 1e3" instead of having the "x if x" part. :)
>>
>> > How about this?
>> >
>> > def hyperop(n, a, b):
>> >     return match (n, a, b):
>> >         (0, _, b) => b + 1
>> >         (1, a, 0) => a
>> >         (2, _, 0) => 0
>> >         (_, _, 0) => 1
>> >         (n, a, b) => hyperop(n-1, a, hyperop(n, a, b-1))
>> >
>> > versus:
>> >
>> > def hyperop(n, a, b):
>> >     if n == 0:
>> >         return b + 1
>> >     if n == 1 and b == 0:
>> >         return a
>> >     if n == 2 and b == 0:
>> >         return 0
>> >     if b == 0:
>> >         return 1
>> >     return hyperop(n-1, a, hyperop(n, a, b-1))
>>
>> I have no idea what this is actually doing, and it looks like a port
>> of Haskell code. I'd want to rewrite it as a 'while' loop with maybe
>> one level of recursion in it, instead of two. (Zero would be better,
>> but I think that's not possible. Maybe?) Is this something that you do
>> a lot of? Is the tuple (n, a, b) meaningful as a whole, or are the
>> three values independently of interest?
>>
>> Not sure how this is useful without a lot more context.
>>
>> ChrisA
>> _______________________________________________
>> Python-ideas mailing list
>> Python...(a)python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
A brain dump, inspired by various use cases that came up during the
binding expression discussions.
Idea: introduce a "local" pseudo-function to capture the idea of
initialized names with limited scope.
As an expression, it's
"local" "(" arguments ")"
- Because it "looks like" a function call, nobody will expect the targets
of named arguments to be fancier than plain names.
- `a=12` in the "argument" list will (& helpfully so) mean pretty much the
same as "a=12" in a "def" statement.
- In a "local call" on its own, the scope of a named argument begins at the
start of the next (if any) argument, and ends at the closing ")". For the
duration, any variable of the same name in an enclosing scope is shadowed.
- The parentheses allow for extending over multiple lines without needing
to teach editors (etc) any new tricks (they already know how to format
function calls with arglists spilling over multiple lines).
- The _value_ of a local "call" is the value of its last "argument". In
part, this is a way to sneak in C's comma operator without adding cryptic
new line noise syntax.
Time for an example. First a useless one:
a = 1
b = 2
c = local(a=3) * local(b=4)
Then `c` is 12, but `a` is still 1 and `b` is still 2. Same thing in the end:
c = local(a=3, b=4, a*b)
And just to be obscure, also the same:
c = local(a=3, b=local(a=2, a*a), a*b)
There the inner `a=2` temporarily shadows the outer `a=3` just long
enough to compute `a*a` (4).
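For comparison, part of this can be approximated today with the lambda-with-defaults idiom (just a sketch of existing spellings, not the proposal itself):

```python
a = 1
b = 2
# Each lambda's parameter default shadows the outer name, roughly as in
# c = local(a=3) * local(b=4):
c = (lambda a=3: a)() * (lambda b=4: b)()
print(a, b, c)  # 1 2 12

# And local(a=3, b=4, a*b) as a single throwaway "scope":
c = (lambda a=3, b=4: a * b)()
print(c)  # 12
```

Unlike local(), though, one parameter's default cannot refer to an earlier one (defaults are evaluated in the enclosing scope), so the quadratic example has no equally tidy spelling.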
This is one that little else really handled nicely:
r1, r2 = local(D = b**2 - 4*a*c,
               sqrtD = math.sqrt(D),
               twoa = 2*a,
               ((-b + sqrtD)/twoa, (-b - sqrtD)/twoa))
Everyone's favorite:
if local(m = re.match(regexp, line)):
    print(m.group(0))
Here's where it's truly essential that the compiler know everything
about "local", because in _that_ context it's required that the new
scope extend through the end of the entire block construct (exactly
what that means TBD - certainly through the end of the `if` block, but
possibly also through the end of its associated (if any) `elif` and
`else` blocks - and similarly for while/else constructs).
Of course that example could also be written as:
if local(m = re.match(regexp, line), m):
    print(m.group(0))
or more specifically:
if local(m = re.match(regexp, line), m is not None):
    print(m.group(0))
or even:
if local(m = re.match(regexp, line)) is not None:
    print(m.group(0))
A listcomp example, building the squares of integers from an iterable
but only when the square is a multiple of 18:
squares18 = [i2 for i in iterable if local(i2=i*i) % 18 == 0]
That's a bit mind-bending, but becomes clear if you picture the
kinda-equivalent nest:
for i in iterable:
    if local(i2=i*i) % 18 == 0:
        append i2 to the output list
That should also make clear that if `iterable` or `i` had been named
`i2` instead, no problem. The `i2` created by `local()` is in a
wholly enclosed scope.
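For comparison, the closest spelling available today uses a one-element inner loop to name the square (a known listcomp idiom, not the proposal; the example input is made up):

```python
iterable = range(12)  # example input
# 'for i2 in [i * i]' binds i2 inside the comprehension's own scope,
# much like local(i2=i*i) would.
squares18 = [i2 for i in iterable for i2 in [i * i] if i2 % 18 == 0]
print(squares18)  # [0, 36]
```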
Drawbacks: since this is just a brain dump, absolutely none ;-)
Q: Some of those would be clearer if it were the more Haskell-like
local(...) "in" expression
A: Yup, but for some of the others needing to add "in m" would be
annoyingly redundant noise. Making an "in" clause optional doesn't
really fly either, because then
local(a='z') in 'xyz'
would be ambiguous. Is it meant to return `'xyz'`, or evaluate `'z'
in 'xyz'`? And any connector other than "in" would make the loose
resemblance to Haskell purely imaginary ;-)
Q: Didn't you drone on about how assignment expressions with complex
targets seemed essentially useless without also introducing a "comma
operator" - and now you're sneaking the latter in but _still_ avoiding
complex targets?!
A. Yes, and yes :-) The syntactic complexity of the fully general
assignment statement is just too crushing to _sanely_ shoehorn into
any "expression-like" context.
Q: What's the value of this? local(a=7, local(a=a+1, a*2))
A: 16. Obviously.
Q: Wow - that _is_ obvious! OK, what about this, where there is no
`a` in any enclosing scope: local(a)
A: I think it should raise NameError, just like a function call would.
There is no _intent_ here to allow merely declaring a local variable
without supplying an initial value.
Q: What about local(2, 4, 5)?
A: It should return 5, and introduce no names. I don't see a point to
trying to outlaw stupidity ;-) Then again, it would be consistent
with the _intent_ to require that all but the last "argument" be of
the `name=expression` form.
Q: Isn't changing the meaning of scope depending on context waaaay magical?
A: Yup! But in a language with such a strong distinction between
statements and expressions, without a bit of deep magic there's no
single syntax I can dream up that could work well for both that didn't
require _some_ deep magic. The gimmick here is something I expect
will be surprising the first time it's seen, less so the second, and
then you're never confused about it again.
Q: Are you trying to kill PEP 572?
A: Nope! But since this largely subsumes the functionality of binding
expressions, I did want to put this out there before 572's fate is
history. Binding expressions are certainly easier to implement, and I
like them just fine :-)
Note: the thing I'm most interested in isn't debates, but in whether
this would be of real use in real code.
Hi all,
I've been following the discussion of assignment expressions and what the
syntax for them should be for a while. The ones that seem to crop up most
are the original spelling, :=, the "as" keyword (and variants including
it), and the recent "local" pseudo-function idea.
I have another idea for a spelling of assignment expressions, mostly
inspired by real sentence structure. In a review I was writing recently, I
referred to CGI, Computer Generated Imagery. As a way of introducing the
full term and its abbreviation, I said "Computer Generated Imagery (CGI)".
That got me thinking: why not have something similar in Python? Obviously
simple parentheses ("expr(name)") definitely wouldn't work, that's a
function call. Similarly, brackets ("expr[name]") would be interpreted as
subscription. So why not use curly brackets, "expr{name}"? That doesn't
conflict with anything (currently it's a syntax error), and could be
documented appropriately.
Going back to the regex example, this is how it would look in that case:
if re.match(exp, string){m}:
    print(m.group(0))
I am currently unsure how it would affect scope. I think it should be
effectively equivalent to a regular assignment statement (and hence follow
identical scope rules), i.e. the above example would be equivalent to the
following in all ways except syntax:
m = re.match(exp, string)
if m:
    print(m.group(0))
Thoughts? Please do let me know if there's some critical flaw with this
idea that I missed (or tell me if it's an amazing idea ;)), and just give
feedback, I guess.
Sincerely,
Ken;