Where I work, some teams use flake8 and some use pylint. While pylint is more thorough, it is also slower and pickier, and the general sense is to strongly prefer flake8.
I honestly expect that running either with close-to-default flags on stdlib code would be a nightmare, and I wouldn't want *any* directives for either one to appear in stdlib code, ever.
In some ideal future all code would just be reformatted before it's checked in -- we're very far from that, and I used to be horrified by the very idea, but in the Go world this is pretty much standard practice, and the people at work who are using it are loving it. So I'm trying to have an open mind about this. But there simply isn't a tool that does a good enough job of automatically reformatting Python code.
What I was thinking of was a much weaker option like tabnanny.py by Tim Peters (still in the stdlib!), but I don't know whether this is feasible.
What we need now is not more opinions on which formatter or linter is best. We need someone to actually do some work and estimate how much code would be changed if we ran e.g. tabnanny.py (or something more advanced!) over the entire stdlib, how much code would break (even the most conservative formatter sometimes breaks code that wasn't expecting to be reformatted -- e.g. we used to have tests with significant trailing whitespace), and how often the result would be just too ugly to look at. If you're not willing to work on that, please don't respond to this thread.
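As a rough starting point, the first of those measurements (how much code would change) could be sketched as below. This is only an illustration under stated assumptions: the "formatter" is a made-up, very conservative rule (strip trailing whitespace, expand tabs in leading indentation to 8-column stops), not what tabnanny.py or any real tool does, and the names normalize_line, count_changes and survey are hypothetical. It says nothing about how much code would break or how ugly the result would be, and it happily counts whitespace inside multi-line string literals, which is exactly where breakage would be expected.

```python
import os
import sys


def normalize_line(line):
    """Hypothetical conservative rule: strip trailing whitespace and
    expand tabs in the leading indentation to 8-column stops."""
    body = line.rstrip()
    stripped = body.lstrip(" \t")
    indent = body[: len(body) - len(stripped)]
    return indent.expandtabs(8) + stripped


def count_changes(source):
    """Return (changed_lines, total_lines) for one file's text."""
    lines = source.splitlines()
    changed = sum(1 for line in lines if normalize_line(line) != line)
    return changed, len(lines)


def survey(root):
    """Walk root and total up the changes over every .py file."""
    files_touched = lines_changed = lines_total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Not every file in the tree is UTF-8, so don't crash on
            # undecodable bytes; they don't matter for this count.
            with open(path, encoding="utf-8", errors="replace") as f:
                changed, total = count_changes(f.read())
            files_touched += changed > 0
            lines_changed += changed
            lines_total += total
    return files_touched, lines_changed, lines_total


if __name__ == "__main__":
    # Assumes it is run from the top of a CPython checkout by default.
    root = sys.argv[1] if len(sys.argv) > 1 else "Lib"
    print("files touched: %d, lines changed: %d of %d" % survey(root))
```

Swapping normalize_line for a call into a real tool would give a more meaningful number; the walking and counting around it would stay the same.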
--Guido