There's a whole matrix of these and I'm wondering why the matrix is
currently sparse rather than implementing them all. Or rather, why we
can't stack them as:
class foo(object):
    @classmethod
    @property
    def bar(cls, ...):
        ...
Essentially the permutations are, I think:
{'unadorned'|abc.abstract} {'normal'|static|class} {method|property|non-callable attribute}.
concreteness | implicit first arg | type | name | comments
{unadorned} | {unadorned} | method | def foo(): | exists now
{unadorned} | {unadorned} | property | @property | exists now
{unadorned} | {unadorned} | non-callable attribute | x = 2 | exists now
{unadorned} | static | method | @staticmethod | exists now
{unadorned} | static | property | @staticproperty | proposing
{unadorned} | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
{unadorned} | class | method | @classmethod | exists now
{unadorned} | class | property | @classproperty or @classmethod;@property | proposing
{unadorned} | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | {unadorned} | method | @abc.abstractmethod | exists now
abc.abstract | {unadorned} | property | @abc.abstractproperty | exists now
abc.abstract | {unadorned} | non-callable attribute | @abc.abstractattribute or @abc.abstract;@attribute | proposing
abc.abstract | static | method | @abc.abstractstaticmethod | exists now
abc.abstract | static | property | @abc.abstractstaticproperty | proposing
abc.abstract | static | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
abc.abstract | class | method | @abc.abstractclassmethod | exists now
abc.abstract | class | property | @abc.abstractclassproperty | proposing
abc.abstract | class | non-callable attribute | {degenerate case - variables don't have arguments} | unnecessary
I think the meanings of the new ones are pretty straightforward, but in
case they are not...
@staticproperty - like @property only without an implicit first
argument. Allows the property to be called directly from the class
without requiring a throw-away instance.
@classproperty - like @property, only the implicit first argument to the
method is the class. Allows the property to be called directly from the
class without requiring a throw-away instance.
@abc.abstractattribute - a simple, non-callable variable that must be
overridden in subclasses
@abc.abstractstaticproperty - like @abc.abstractproperty only for
@staticproperty
@abc.abstractclassproperty - like @abc.abstractproperty only for
@classproperty
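For what it's worth, here is a rough sketch of how @classproperty could be written today as a plain read-only descriptor (the names are mine, not an existing API, and this is not the proposed built-in):

import __future__  # noqa: placeholder to keep this runnable on Python 2 and 3

class classproperty(object):
    """Like property, but passes the class, not an instance, to the getter."""
    def __init__(self, fget):
        self.fget = fget
    def __get__(self, obj, owner=None):
        if owner is None:
            owner = type(obj)
        return self.fget(owner)

class Foo(object):
    @classproperty
    def bar(cls):
        return cls.__name__.upper()

print(Foo.bar)    # 'FOO' -- no throw-away instance needed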
--rich
At the moment, the array module of the standard library allows creating
arrays of different numeric types and initializing them from an iterable
(e.g., another array).
What's missing is the possibility to specify the final size of the
array (number of items), especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB of memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow
simply specifying the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction in such a way that you
could still pass an iterable (as now) as the second argument, but if you pass a
single integer value, it would be treated as the number of items to
allocate.
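Under this proposal the allocation would look something like this (a hypothetical signature, not an existing API):

import array

# hypothetical: an integer second argument means "allocate this many items"
sa = array.array('l', 6000000000)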
Here is my current workaround (which is slow):
import array

def filled_array(typecode, n, value=0, bsize=(1 << 22)):
    """Return a new array with the given typecode
    (e.g., "l" for long int, as in the array module)
    with n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
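For what it's worth, repeating a one-element array already allocates the whole result in a single step and should be considerably faster than extending in chunks (assuming a constant fill value is acceptable):

import array

n = 6 * 10**9
x = array.array('l', [0]) * n          # one allocation, every item set to 0

# or, for zero-filled arrays on Python 3, build from a zero-filled bytes object
y = array.array('l')
y.frombytes(bytes(n * y.itemsize))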
I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
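For illustration, CPython's peephole optimizer folds small constant '+' concatenations at compile time, so the explicit spelling costs nothing at runtime (a quick check, not part of the proposal):

import dis

def f():
    return 'a' + 'b'    # folded into the single constant 'ab' at compile time

dis.dis(f)              # shows a single LOAD_CONST 'ab', no runtime concatenation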
Would it be reasonable to start deprecating this and eventually remove
it from the language?
--
--Guido van Rossum (python.org/~guido)
Dear all,
I guess this is so obvious that someone must have suggested it before:
in list comprehensions you can currently exclude items based on the if
conditional, e.g.:
[n for n in range(1,1000) if n % 4 == 0]
Why not extend this filtering by allowing a while statement in addition to
if, as in:
[n for n in range(1,1000) while n < 400]
Trivial effect, I agree, in this example since you could achieve the same by
using range(1,400), but I hope you get the point.
This intuitively understandable extension would provide a big speed-up for
sorted lists where processing all the input is unnecessary.
Consider this:
some_names = ["Adam", "Andrew", "Arthur", "Bob", "Caroline", "Lancelot"]  # a sorted list of names
[n for n in some_names if n.startswith("A")]
# certainly gives a list of all names starting with A, but ...
[n for n in some_names while n.startswith("A")]
# ... would have saved two comparisons
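For comparison, itertools.takewhile already expresses this early-exit behaviour today, just not with comprehension syntax (reusing the some_names list above):

from itertools import takewhile

list(takewhile(lambda n: n.startswith("A"), some_names))
# ['Adam', 'Andrew', 'Arthur'] -- stops at the first non-match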
Best,
Wolfgang
Wouldn't it be nice if this
>>> import ast
>>> print repr(ast.parse("(1 + 1)").body[0].value)
<_ast.BinOp object at 0x0000000001E94B38>
printed something more useful?
>>> print repr(ast.parse("(1 + 1)").body[0].value)
BinOp(left=Num(n=1), op=Add(), right=Num(n=1))
I've been doing some work on macropy <https://github.com/lihaoyi/macropy>,
which uses the ast.* classes extensively, and it's annoying that we have to
resort to dirty tricks like monkey-patching the AST classes (for CPython
2.7) or even monkey-patching __builtin__.repr (to get it working on PyPy)
just to get
eval(repr(my_ast)) == my_ast
to hold true. And a perfectly good solution already exists in the
ast.dump() method, too! (It would also be nice if "==" did a structural
comparison on the ast.* classes too, but that's a different issue).
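For reference, ast.dump() already produces the text shown above; the suggestion is essentially to make that (or something like it) the default repr:

import ast

node = ast.parse("(1 + 1)").body[0].value
print(ast.dump(node))
# BinOp(left=Num(n=1), op=Add(), right=Num(n=1))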
-Haoyi
Gzip files can contain an extra field [1], and some applications use
this for extending the gzip format. The current GzipFile implementation
ignores this field on input and doesn't allow creating a new file with
an extra field.
ZIP file entries can also contain an extra field [2]. Currently it is just
saved as raw bytes in the `extra` attribute of ZipInfo.
I propose to support the extra field for gzip files and to provide
structural access to its subfields.
f = gzip.GzipFile('somefile.gz', 'rb')
f.extra_bytes # A raw extra field as bytes
# iterating over all subfields
for xid, data in f.extra_map.items():
...
# get Apollo file type information
f.extra_map[b'AP'] # (or f.extra_map['AP']?)
# creating gzip file with extra field
f = gzip.GzipFile('somefile.gz', 'wb', extra=extrabytes)
f = gzip.GzipFile('somefile.gz', 'wb', extra=[(b'AP', apollodata)])
f = gzip.GzipFile('somefile.gz', 'wb', extra={b'AP': apollodata})
# change Apollo file type information
f.extra_map[b'AP'] = ...
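As a rough sketch of what the structural access could involve (my own illustration, not the patch attached to the issue), the extra field in RFC 1952 is a sequence of subfields, each a two-byte ID followed by a two-byte little-endian length and the data:

import struct

def parse_extra(extra_bytes):
    """Split a raw gzip extra field into a {subfield id: data} dict (sketch)."""
    fields = {}
    pos = 0
    while pos + 4 <= len(extra_bytes):
        xid = extra_bytes[pos:pos + 2]                           # e.g. b'AP'
        (xlen,) = struct.unpack('<H', extra_bytes[pos + 2:pos + 4])
        fields[xid] = extra_bytes[pos + 4:pos + 4 + xlen]
        pos += 4 + xlen
    return fields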
Issue #17681 [3] has preliminary patches. There is some open doubt about
the interface. Isn't it over-engineered?
Currently GzipFile supports seamlessly reading a sequence of separately
compressed gzip files. Every such chunk can have its own extra field (this
is used in dictzip, for example). It would be desirable to be able to
read only until the end of the current chunk, in order not to miss an extra
field.
[1] http://www.gzip.org/format.txt
[2] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
[3] http://bugs.python.org/issue17681
FWIW, I am +1 on the ability to read YAML-based configs in Python
without dependencies, but waiting for several years is hard.
Maybe try an alternative, data-driven development process, as opposed
to the traditional PEP-based all-or-nothing style, to speed things up?
It is possible to make users happy incrementally and keep development
fun without sacrificing too much on the Zen side. If open source is
about scratching your own itches, then the most effective way to
implement a spec would be to allow people to add support for their own
flavor without disrupting the work of others.
For some reason I think most people don't need the full YAML spec,
especially if the final implementation will be slow and heavy.
So instead of:
import yaml
I'd start with something more generic and primitive:
from datatrans import yamlish
Here `datatrans` is a data transformation framework taking care
of the usual text parsing (data partitioning), partition mapping (structure
transformation) and conversion (binary to string etc.), trying to be as fast
and lightweight as possible and leaving a vast field for future optimizations
at the algorithmic level. `yamlish` is an implementation which is not vastly
optimized (datatrans is to yamlish as RPython is to PyPy) and can be
easily extended to cover more YAML (hopefully). Hence the name -
it doesn't pretend to parse YAML - it parses some supported subset,
which is improved over time by different parties (if datatrans is done right
enough to provide readable (maintainable + extensible) implementation code).
There is an existing package called `yamlish` on PyPI - I am not talking
about it - it is PyYAML based, which is not an option for now as I see it.
So I stole its name. Sorry. That PyPI package was used to parse the TAP
format, which is, again, a subset.
It appears that YAML is good for humans for its subsets. It leaves an
impression (maybe it's just an illusion) that development work for subset
support can also be partitioned. If `datatrans` "done right" is possible, it
will allow incremental addition of new YAML features as the need for
them arises (new data examples are added). Or it can help to build
parsers for YAML subsets that are intentionally limited to make them
performance efficient.
Because `datatrans` is a package isolating the parsing, mapping and
conversion parts of the process to make it modular and extensible, it can
serve as a reference point for various kinds of (scientific) papers, including
the ones that prove that such a data transformation framework is impossible.
As for the `yamlish` submodule, the first important paper covering it will be
a reference matrix of supported features.
While it all sounds too complicated, driving development by data and real
short-term user needs (as opposed to designing everything upfront) will
make the process more attractive. In data-driven development, there are not
many things that can break - you either continue parsing previous data or
not. The output from the parsing process may change over time, but it may
be controlled by configuring the last step of the data transformation phase.
`Parsing an AppEngine config file` or `reading package metadata` are good
starting points. Once the package metadata subset is parsed, it is done and
won't break. The implementation for metadata parsing may mature in the
distutils package, the AppEngine one in its SDK, and both can later be
merged, sending patches for `datatrans` to the stdlib. The question is only
how to design the output format for the parse stage. I am not sure everything
should be convertible into Python objects using the "best fit" technique.
I will be pretty comfortable if the target format is not native Python
objects at all. More than that - I would even insist on avoiding conversion
to native Python objects from the start. The ideal output for the first
version would be a generic tree structure with defined names for YAML
elements - a tree that can be represented as XML where these names are tags,
and that can therefore be traversed and selected using XPath/jQuery-style
syntax.
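To make that concrete, here is a toy illustration of my own (not an existing datatrans or yamlish API) of what such a tree might look like for a two-key YAML mapping, using ElementTree and its limited XPath support:

import xml.etree.ElementTree as ET

# hypothetical yamlish output for:
#     name: example
#     version: 1.0
root = ET.Element('map')
for key, value in (('name', 'example'), ('version', '1.0')):
    entry = ET.SubElement(root, 'entry', key=key)
    entry.text = value

print(root.find("entry[@key='name']").text)    # 'example'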
It will take several years for the implementation to mature, and in the end
there will be plenty of backward compatibility matters with the API,
formatting and serializing. So the first thing I'd do is [abandon
serialization]. From the point of view of my self-proclaimed data
transformation theory, the input and output formats are data. If the output
format is not human readable - such as linked Python data structures in
memory - it wastes time and hinders development. Serializing Python objects
is a problem at a different level; it is an example of a binary, abstract,
memory-only output format - a lot of properties that you don't want to deal
with while working with data.
To summarize:
1. full spec support is no goal
2. data driven (using real world examples/stories)
3. incremental (one example/story at a time)
4. research based (beautiful ideas vs ugly real world limitations)
5. maintainable (code is easy to read and understand the structure)
6. extensible (easy to find out the place to be modified)
7. core "generic tree" data structure as an intermediate format and
"yaml tree" data structure as a final format from parsing process
P.S. I am willing to work on this "data transformation theory" stuff and a
prototype implementation, because it is generally useful in many
areas. But I need support.
--
anatoly t.
The recent thread/post/whatever on os.path.join has gotten me thinking. Say
I wanted to join a Windows path...on Ubuntu. This is what I get:
ryan@DevPC-LX:~$ python
Python 2.7.3 (default, Sep 26 2013, 20:03:06)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.path.join('C:\\', 'x.jpg')
'C:\\/x.jpg'
>>>
Isn't something wrong there? My idea: check for \'s in the path. If there
are any, assume \ is the path separator, not /.
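For reference, the stdlib already exposes the Windows flavour explicitly: os.path is just an alias for the platform's path module, and ntpath/posixpath can be used directly on any platform, which avoids guessing the separator:

import ntpath, posixpath

ntpath.join('C:\\', 'x.jpg')       # 'C:\\x.jpg'  -- Windows rules, even on Ubuntu
posixpath.join('/home', 'x.jpg')   # '/home/x.jpg' -- POSIX rules, even on Windows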
--
Ryan
In the comments of
http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing…
were some complaints about the interpretation of the bounds for
negative strides, and I have to admit it feels wrong. Where did we go
wrong? For example,
"abcde"[::-1] == "edcba"
as you'd expect, but there is no number you can put as the second bound to
get the same result:
"abcde"[:1:-1] == "edc"
"abcde"[:0:-1] == "edcb"
but
"abcde":-1:-1] == ""
I'm guessing it all comes from the semantics I assigned to negative stride
for range() long ago, unthinkingly combined with the rules for negative
indices.
Are we stuck with this forever? If we want to fix this in Python 4 we'd
have to start deprecating negative stride with non-empty lower/upper bounds
now. And we'd have to start deprecating negative step for range()
altogether, recommending reversed(range(lower, upper)) instead.
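For illustration, the recommended spelling produces the same sequence as today's negative step:

list(range(5, 0, -1))           # [5, 4, 3, 2, 1]
list(reversed(range(1, 6)))     # [5, 4, 3, 2, 1]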
Thoughts? Is NumPy also affected?
--
--Guido van Rossum (python.org/~guido)