Would keeping pyarrow optional allow us to bump the minimum version more aggressively than if it becomes required?
Yep. Optional actually allows us to have different version minimums as well, e.g. parquet vs. the csv reader vs. string could all be different (though possibly confusing).
On Jul 8, 2020, at 3:16 PM, Brock Mendel <jbrockmendel@gmail.com> wrote:
Would keeping pyarrow optional allow us to bump the minimum version more aggressively than if it becomes required?
_______________________________________________
Pandas-dev mailing list
Pandas-dev@python.org
https://mail.python.org/mailman/listinfo/pandas-dev
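To illustrate the point above about per-feature minimums for an optional dependency, here is a minimal sketch. The `MIN_PYARROW` table, its version numbers, and the helper names are all hypothetical and chosen for illustration; they are not the versions or the mechanism pandas actually uses.

```python
import importlib

# Hypothetical per-feature minimums, for illustration only;
# these are NOT the versions pandas actually requires.
MIN_PYARROW = {
    "parquet": "0.15.0",
    "csv": "0.17.0",
    "string": "1.0.0",
}


def parse_version(version):
    """Turn '0.17.1' into (0, 17, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.0' and '1.0.0' compare as equal.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported(feature, installed_version):
    """Return True if the installed version meets the feature's own minimum."""
    if installed_version is None:  # the optional dependency is not installed
        return False
    return parse_version(installed_version) >= parse_version(MIN_PYARROW[feature])


def installed_pyarrow_version():
    """Return pyarrow's version string, or None if pyarrow is not installed."""
    try:
        module = importlib.import_module("pyarrow")
    except ImportError:
        return None
    return getattr(module, "__version__", None)
```

With a table like this, an older pyarrow could still serve the parquet path while the string path asks for a newer release, which is also where the "possibly confusing" part comes from: `is_supported("parquet", v)` and `is_supported("string", v)` can disagree for the same installed version `v`.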
Definitely, we will require the latest Arrow release for quite some time for strings. I guess we will keep bumping the minimum version for that continuously, at least until the end of the year.
On Thu, Jul 9, 2020, at 2:22 AM, Jeff Reback wrote:
yep
optional actually allows us to have different version mins as well
e.g. parquet vs. the csv reader vs. string could all be different (though possibly confusing)
Hi all,

To try to clarify my hesitation to decide now about requiring Arrow for the "string" dtype (instead of using it optionally, with a fallback to the current Python object-based implementation of StringDtype):

One of the goals of the string dtype, apart from potential speed-ups using Arrow in the future, is also better usability: getting rid of the confusing "object" dtype for something as simple as strings (a use case many newcomers will see in the first example). And I think that the idea is to make this new string dtype the default when you have only string values, somewhere in the relatively near (but undecided) future. For example, let's say we do a 2.0 release next year using the string dtype as the default for string columns. Requiring Arrow for the string dtype would thus basically mean requiring Arrow for pandas in general, *if* we keep the plan to make this the default.

And starting to use Arrow much more in pandas and adding it as a required dependency is certainly a discussion we should have, but it's also a much bigger discussion than just the string dtype (how easy is it nowadays to install (including source installations), platform support, install size increase, minimum required dependency and possible conflicts with other packages/systems, ...).

At the same time: I think it is rather easy, at least in the short term, to start experimenting with an Arrow-backed string dtype without requiring Arrow for the "string" dtype in general (we already have the Python code for it, we can keep that side by side for now). But we can discuss the details about this on the issue (https://github.com/pandas-dev/pandas/issues/35169).

So to summarize: we should discuss this, but I think we should frame the question not as "require Arrow for an experimental, opt-in dtype", but as "Arrow as a required dependency for pandas". And given that this is a larger discussion, let's treat it for now as separate from advancing the Arrow-backed string dtype.

Joris

On Thu, 9 Jul 2020 at 09:12, Uwe L. Korn <mail@uwekorn.com> wrote:
Definitely, we will require the latest Arrow release for quite some time for strings. I guess we will keep bumping the minimum version for that continuously, at least until the end of the year.
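The "optional, with fallback" arrangement described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: `pick_string_storage` and its parameters are hypothetical names, not an existing pandas API, and the real design is what the linked issue is meant to settle.

```python
import importlib.util


def pick_string_storage(prefer_arrow=True, arrow_available=None):
    """Choose a backend for a string dtype (hypothetical helper).

    Returns "pyarrow" when Arrow is preferred and importable, and otherwise
    falls back to "python", i.e. the existing object-based implementation.
    """
    if arrow_available is None:
        # Detect the optional dependency without importing it.
        arrow_available = importlib.util.find_spec("pyarrow") is not None
    if prefer_arrow and arrow_available:
        return "pyarrow"
    return "python"
```

Because the fallback branch always exists, the dtype itself never requires Arrow; only the faster code path does, which keeps the question of Arrow as a hard dependency separate.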
participants (4)
- Brock Mendel
- Jeff Reback
- Joris Van den Bossche
- Uwe L. Korn