Towards statically checked ORM projections
Dear typing-sig,

I've noticed recently that there is practically no support for type-safe ORM/ODM projections in the broader Python ecosystem.

A little context: ORMs (object-relational mappers) and ODMs (object-document mappers) are tools or libraries for interacting with databases. SQLAlchemy and the Django ORM are probably the most famous examples. Basically, you model a database table as a class, and rows in the table are instances of that class. ODMs are a little simpler than ORMs: they practically only map rows/documents to instances of a class, while ORMs are more complex.

Writing a type-safe ODM nowadays is super simple, and there are loads out there. The idea is this:

```
@dataclass
class User:
    id: int
    username: str

user = await fetch(User, id=1)
```

This can be made to work with the proper type annotations without much fuss.

Now, the issue is doing projections in a type-safe manner. A projection basically means loading only a subset of the fields, usually for performance. Let's say that we're only interested in the user's username (imagine there are 30 other fields in the class that we don't care about). That would look kinda like:

```
from dataclasses import fields

user_projection: tuple[str] = await fetch_projection(User, id=1, fields(User)[1])
```

This can't really be very type-safe since Mypy treats `fields(User)[1]` as `dataclasses.Field*[Any]`. Now, if Mypy treated it as `dataclasses.Field*[str]`, I assume that would be a different story, and the function could be annotated to return a 1-tuple of `str`.

As mentioned, I've looked at a bunch of ORM/ODM libraries (and written a few internally) and as far as I can tell this use-case is very much unsupported as of yet. (I would appreciate counter-examples, obviously!) This made me sad so I came to the list to see if there's anything to be done.
To be super honest I'm more interested in getting attrs support for this, but attrs and dataclasses are so similar I figured if someone did the work in the dataclass plugin, the logic could also be ported over to the attrs plugin. attrs also has a nicer API for actually getting the fields, so the equivalent attrs example would be:

```
from attrs import fields as f

user_projection = await fetch_projection(User, id=1, f(User).username)
```

Wouldn't it be super cool if Mypy (or other type checkers) could check this statically?
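To make the gap concrete, here is a minimal sketch in Python of what can already be typed today and what cannot. Everything except `dataclasses.fields` is an assumption made for illustration: the in-memory `_ROWS` table, the `FieldRef` handle, and the exact signatures of `fetch` and `fetch_projection` (the field argument is placed before the keyword-only `id` so that the call is valid Python). The point is that `fetch` is easy to annotate, while the projection only type-checks because the `FieldRef[str]` annotation is written by hand.

```python
import asyncio
from dataclasses import dataclass, fields
from typing import Any, Dict, Generic, Tuple, Type, TypeVar

T = TypeVar("T")
F = TypeVar("F")


@dataclass
class User:
    id: int
    username: str


# Hypothetical in-memory "table" standing in for a real database.
_ROWS: Dict[int, Dict[str, Any]] = {1: {"id": 1, "username": "tin"}}


async def fetch(cls: Type[T], *, id: int) -> T:
    """Load a full row; a checker infers the result type from `cls`."""
    return cls(**_ROWS[id])


class FieldRef(Generic[F]):
    """Hypothetical typed field handle, carrying the field's type as F."""

    def __init__(self, name: str) -> None:
        self.name = name


async def fetch_projection(cls: Type[T], field: FieldRef[F], *, id: int) -> Tuple[F]:
    """Load a single field; the result type follows from the handle's parameter."""
    return (_ROWS[id][field.name],)


# At runtime the declared type is already available on the Field object ...
username_field = fields(User)[1]
# ... but the static type has to be spelled out by hand on the handle.
USERNAME: FieldRef[str] = FieldRef(username_field.name)

user = asyncio.run(fetch(User, id=1))
projection = asyncio.run(fetch_projection(User, USERNAME, id=1))
```

If a checker inferred `Field[str]` for `fields(User)[1]` (or for `f(User).username` in attrs), the hand-written `FieldRef` annotation would no longer be needed.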
    user_projection: tuple[str] = await fetch_projection(User, id=1, fields(User)[1])

This can't really be very type-safe since Mypy treats `fields(User)[1]` as `dataclasses.Field*[Any]`. Now, if Mypy treated it as `dataclasses.Field*[str]`, I assume that would be a different story, and the function could be annotated to return a 1-tuple of `str`.
Typecheckers would have to special-case `dataclasses.fields(Foo)` to return a tuple of Fields from the specific dataclass. So, `fields(User)` would have type `Tuple[Field[int], Field[str]]`. In your example, `fields(User)[1]` would have type `Field[str]`.

This approach would work when projecting a single field or a fixed number of fields.

# Variadic tuple + Map

To support projecting *arbitrary* numbers of fields, we can use variadic tuples (PEP 646 [1]) and the `Map` operator (to be introduced in a follow-up PEP).

As a concrete example, sqlalchemy's `Session.query` accepts arbitrary columns (or classes) and returns a Query object. A Query object is basically an iterator of tuples.

Example of `query` from the sqlalchemy docs [2]:

```python
class User(Base):
    __tablename__ = 'users'

    id = Column(Integer, Sequence('user_id_seq'), primary_key=True)
    name = Column(String(50))
    fullname = Column(String(50))
    nickname = Column(String(50))

for name, fullname in session.query(User.name, User.fullname):
    print(name, fullname)

# Output:
# ed Ed Jones
# wendy Wendy Williams
# mary Mary Contrary
# fred Fred Flintstone
```
We could type the `query` function as follows:
```python
# Generic alias to capture the type of a class or a column (i.e., a field).
ClassOrColumn = Type[T] | Column[T]

Ts = TypeVarTuple("Ts")

class Session:
    def query(self, *args: *Map[ClassOrColumn, Ts]) -> Query[Tuple[*Ts]]:
        ...
```
```python
# (1)
reveal_type(User.name)  # => Column[str]
# actually Column[Optional[str]], but keeping it simple here
reveal_type(User.fullname)  # => Column[str]

session.query(User.name, User.fullname)  # => Query[Tuple[str, str]]

# For the above function call, the `query` function behaves as if it were the following:
def query(self, entity1: ClassOrColumn[T1], entity2: ClassOrColumn[T2]) -> Query[Tuple[T1, T2]]: ...
```

Step-by-step explanation:

+ Because `query` is given two arguments, `Ts` is seen as a tuple of two TypeVars: `Tuple[T1, T2]`.
+ The `Map[ClassOrColumn, Ts]` maps `ClassOrColumn` over each element of `Tuple[T1, T2]` to give `Tuple[ClassOrColumn[T1], ClassOrColumn[T2]]`.
+ Finally, using `*args: *<some_tuple>` means that it will accept arguments corresponding to the tuple.
+ So, `*args: *Tuple[ClassOrColumn[T1], ClassOrColumn[T2]]` means it will accept two arguments, one of type `ClassOrColumn[T1]` and another of `ClassOrColumn[T2]`.
+ That binds `T1` to `str` and `T2` to `str`, giving a return type of `Query[Tuple[str, str]]`.

Other examples follow similarly:

```python
# (2)
session.query(User, User.name, User.fullname)  # => Query[Tuple[User, str, str]]
```

We can adapt the above approach to the ORM/ODM projection functions you had in mind. But, first, both PEP 646 and the yet-to-be-submitted `Map` PEP have to be accepted :)

If you're interested in these developments, you could attend the monthly "Tensor typing" meetings announced on this list or read the meeting minutes [3].

[1]: https://www.python.org/dev/peps/pep-0646/
[2]: https://docs.sqlalchemy.org/en/14/orm/tutorial.html Note that `query` omits the tuple when there is just one argument. So, we'd need an `overload` for that case.
[3]: https://mail.python.org/archives/list/typing-sig@python.org/message/52SAAWTG...
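Without `Map`, the fixed-arity behaviour from the step-by-step explanation can be approximated with one `overload` per argument count. The sketch below is only an illustration under stated assumptions: `Column`, `Query`, and `Session` are small hypothetical stand-ins written for this example, not SQLAlchemy's classes, and only columns (not whole classes) are handled.

```python
from typing import Any, Dict, Generic, Iterator, List, Tuple, TypeVar, overload

T = TypeVar("T")
T1 = TypeVar("T1")
T2 = TypeVar("T2")
T3 = TypeVar("T3")
R = TypeVar("R")


class Column(Generic[T]):
    """Hypothetical typed column handle (not SQLAlchemy's Column)."""

    def __init__(self, name: str) -> None:
        self.name = name


class Query(Generic[R]):
    """Hypothetical query result: an iterable of rows of type R."""

    def __init__(self, rows: List[R]) -> None:
        self._rows = rows

    def __iter__(self) -> Iterator[R]:
        return iter(self._rows)


class Session:
    """Hypothetical session over an in-memory list of row dictionaries."""

    def __init__(self, table: List[Dict[str, Any]]) -> None:
        self._table = table

    # One overload per supported arity; each spells out the tuple type by hand.
    @overload
    def query(self, c1: Column[T1], c2: Column[T2]) -> Query[Tuple[T1, T2]]: ...
    @overload
    def query(
        self, c1: Column[T1], c2: Column[T2], c3: Column[T3]
    ) -> Query[Tuple[T1, T2, T3]]: ...
    def query(self, *columns: Column[Any]) -> Query[Any]:
        # Project each stored row onto the requested columns, in order.
        rows = [tuple(row[c.name] for c in columns) for row in self._table]
        return Query(rows)


name: Column[str] = Column("name")
fullname: Column[str] = Column("fullname")

session = Session([{"name": "ed", "fullname": "Ed Jones", "nickname": "eddie"}])
rows = list(session.query(name, fullname))  # checker view: List[Tuple[str, str]]
```

The obvious drawback is that every arity needs its own overload, which is the repetition that variadic tuples plus `Map` are meant to remove.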
-- S Pradeep Kumar
TypeScript has a way to do this, doesn't it? I'm just copying this from a book:

    type Pick<T, K extends keyof T> = {
        [propname in K]: T[propname]
    }

(FWIW this is the kind of stuff for which I had wanted to relax annotation syntax. But reading the python-dev tea leaves, it doesn't look like it's in our future.)
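For comparison, Python's type system has no direct counterpart to `Pick`, but the same shape can be built at runtime. The sketch below uses `dataclasses.make_dataclass` for that; the helper name `pick` is made up for this example. A type checker sees the result only as an opaque class, which is exactly the gap under discussion.

```python
from dataclasses import dataclass, fields, make_dataclass
from typing import Any, Type


@dataclass
class User:
    id: int
    username: str


def pick(cls: Type[Any], *names: str) -> Type[Any]:
    """Hypothetical helper: build a dataclass holding only the named fields of cls."""
    available = {f.name: f.type for f in fields(cls)}
    missing = [n for n in names if n not in available]
    if missing:
        raise AttributeError(f"{cls.__name__} has no field(s): {missing}")
    # Reuse the original annotations so the narrowed class keeps the field types.
    return make_dataclass(f"{cls.__name__}Pick", [(n, available[n]) for n in names])


UserName = pick(User, "username")
row = UserName(username="tin")
```

Here `row.username` works at runtime, but a checker cannot tell that it is a `str`, because the class is created dynamically.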
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Thank you for bringing this up! I hope popular web frameworks start adopting dataclasses for representing their models and use type safe method chaining APIs to interface with business logic.

On Fri, Apr 23, 2021 at 10:18 AM Tin Tvrtković <tinchester@gmail.com> wrote:

```
from dataclasses import fields

user_projection: tuple[str] = await fetch_projection(User, id=1, fields(User)[1])
```

In fquery [1], this would be spelled as:

    UserQuery([1]).project(["username"]).to_json().send()

My request is to consider that some of the implementations may not be presenting a flat result set. For example this query:

    resp = (
        UserQuery([1])
        .edge("friends")
        .edge("friends")
        .project(["name", ":id"])
        .take(3)
        .to_json()
        .send()
    )

produces:

https://github.com/adsharma/fquery/blob/main/tests/test_data/test_data_two_h...

Without the "to_json()" it produces a graph of similarly nested python objects.

Type checkers that can understand these queries and provide type safety to consumers of the result set would be a great reason for web frameworks to adopt this vs the status quo where the internals of a relational database are exposed via a python API.

-Arun

[1] https://github.com/adsharma/fquery/
If you want that to happen you should probably lobby specific web frameworks (e.g. by starting a discussion in their issue tracker).
-- --Guido (mobile)
Like Arun says, I was interested in solving this (or at least a critical subset of this) at the level of dataclasses/attrs since I figure those can be used downstream. Since the discussion sprung back to life somehow, a progress update: I've started a PR to the attrs plugin with the necessary updates: https://github.com/python/mypy/pull/10467. David Euresti has volunteered to provide feedback while I polish it up and fix the tests. I'm very new to the Mypy codebase so it might be a while though.
Awesome, that sounds like a good plan!
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Thank you for that suggestion. Opened one for Django here: https://code.djangoproject.com/ticket/32759

-Arun
On Fri, Apr 23, 2021 at 7:18 PM Tin Tvrtković <tinchester@gmail.com> wrote:
Dear typing-sig,
I've noticed recently that there is practically no support for type-safe ORM/ODM projections in the broader Python ecosystem.
I was under the impression that SQLAlchemy is moving in this direction with the 1.4 release:

https://docs.sqlalchemy.org/en/14/orm/mapping_styles.html#declarative-mappin...

and:

https://docs.sqlalchemy.org/en/14/orm/mapping_styles.html#imperative-mapping...

I have not used SQLAlchemy 1.4 yet, so I can't report on the effectiveness of this approach.

S.

-- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier
Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/
Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/
Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ & http://pydata.fr/
On Tue, May 18, 2021 at 9:12 AM Stéfane Fermigier <sf@fermigier.com> wrote:
On Fri, Apr 23, 2021 at 7:18 PM Tin Tvrtković <tinchester@gmail.com> wrote:
Dear typing-sig,
I've noticed recently that there is practically no support for type-safe ORM/ODM projections in the broader Python ecosystem.
I was under the impression that SQLAlchemy is moving in this direction with the 1.4 release:
https://docs.sqlalchemy.org/en/14/orm/mapping_styles.html#declarative-mappin...
Why not: https://github.com/adsharma/dataclasses-sql/blob/master/tests/test_decorator...

    @dataclass
    @sql
    class Car:
        brand: str = field(metadata={"key": True})
        model: str = field(metadata={"key": True})
        mileage: float

This is also built on top of sqlalchemy.

-Arun
On Tue, May 18, 2021 at 9:33 AM Arun Sharma <arun@sharma-home.net> wrote:
Why not:
https://github.com/adsharma/dataclasses-sql/blob/master/tests/test_decorator...
Please ignore. The second example here does something equivalent and seems to be well thought out:

https://docs.sqlalchemy.org/en/14/orm/mapping_styles.html#example-two-datacl...

-Arun
participants (6)
- Arun Sharma
- Arun Sharma
- Guido van Rossum
- S Pradeep Kumar
- Stéfane Fermigier
- Tin Tvrtković