On Tue, Apr 3, 2018 at 1:06 AM, Daπid <davidmenhur@gmail.com> wrote:


On 31 March 2018 at 02:17, Ralf Gommers <ralf.gommers@gmail.com> wrote:


On Fri, Mar 30, 2018 at 12:03 PM, Eric Larson <larson.eric.d@gmail.com> wrote:
A top-level module for them alone sounds like overkill, and I'm not sure discoverability alone is enough of a reason.

Fine by me. And if we follow the idea that these should be added sparingly, we can maintain discoverability without it growing out of hand by populating the See Also sections of each function.

I agree with this: the 2 images and 1 ECG signal (to be added) that we have don't justify a top-level module. We don't want to ship more than the absolute minimum of datasets. The package is already very large, which is problematic in certain cases; e.g. numpy + scipy still fits within the AWS Lambda limit of 50 MB, but there's not much margin.
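One way to check how much margin is left is a minimal sketch that locates the installed numpy and scipy directories with importlib and reports both their raw on-disk size and the size of a deflate-compressed zip of them. It assumes the 50 MB figure quoted above applies to the zipped upload (AWS's current limits should be checked, since zipped and unzipped limits differ); the helper names package_dirs, tree_size and zipped_size are hypothetical.

```python
import importlib.util
import os
import tempfile
import zipfile

# Budget quoted in the thread (assumed to apply to the zipped upload);
# check AWS's current documentation before relying on it.
BUDGET_BYTES = 50 * 1024 * 1024


def package_dirs(name):
    """Directories of an installed package, or [] if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.submodule_search_locations is None:
        return []
    return list(spec.submodule_search_locations)


def tree_size(paths):
    """Uncompressed size in bytes of all regular files under ``paths``."""
    total = 0
    for base in paths:
        for root, _dirs, files in os.walk(base):
            for name in files:
                full = os.path.join(root, name)
                if os.path.isfile(full) and not os.path.islink(full):
                    total += os.path.getsize(full)
    return total


def zipped_size(paths):
    """Size in bytes of a deflate-compressed zip of everything under ``paths``."""
    with tempfile.TemporaryFile() as tmp:
        with zipfile.ZipFile(tmp, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for base in paths:
                parent = os.path.dirname(os.path.abspath(base))
                for root, _dirs, files in os.walk(base):
                    for name in files:
                        full = os.path.join(root, name)
                        zf.write(full, os.path.relpath(full, parent))
        # ZipFile leaves the temp file open; measure it from its end.
        tmp.seek(0, os.SEEK_END)
        return tmp.tell()


if __name__ == "__main__":
    dirs = package_dirs("numpy") + package_dirs("scipy")
    raw, packed = tree_size(dirs), zipped_size(dirs)
    print(f"uncompressed: {raw / 2**20:.1f} MiB, zipped: {packed / 2**20:.1f} MiB")
    print("within budget" if packed <= BUDGET_BYTES else "over budget")
```

Comparing the zipped figure rather than the raw one matters here, because extension modules with debug sections tend to compress well and the raw number would overstate the footprint of the upload.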

The biggest subpackage is sparse, and there most of the space is taken by _sparsetools.cpython-35m-x86_64-linux-gnu.so. According to size -A -d, the biggest sections in it are debug sections. The same goes for the second-biggest subpackage, special. Can it run without those sections? On preliminary checks, it seems that stripping .debug_info and .debug_loc trims the size down from 38 MB to 3.7 MB, and the test suite still passes.
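To reproduce that measurement, a rough sketch: run GNU size in SysV format with decimal sizes (size -A -d) on an extension module, parse its per-section rows, and compute what fraction of the total belongs to .debug_* sections. It assumes GNU binutils' size is on PATH; the function names section_sizes, debug_share and report are hypothetical.

```python
import subprocess


def section_sizes(size_output):
    """Parse `size -A -d` (SysV format, decimal) output into {section: bytes}."""
    sizes = {}
    for line in size_output.splitlines():
        parts = line.split()
        # Section rows look like ".debug_info   123456   0"; this skips the
        # "file :" line, the "section size addr" header and the Total row.
        if len(parts) >= 2 and parts[0].startswith(".") and parts[1].isdigit():
            sizes[parts[0]] = sizes.get(parts[0], 0) + int(parts[1])
    return sizes


def debug_share(sizes):
    """Fraction of the total section size taken by .debug_* sections."""
    total = sum(sizes.values())
    debug = sum(v for k, v in sizes.items() if k.startswith(".debug"))
    return debug / total if total else 0.0


def report(path):
    """Run GNU size on `path` and print its five largest sections."""
    out = subprocess.run(["size", "-A", "-d", path], capture_output=True,
                         text=True, check=True).stdout
    sizes = section_sizes(out)
    for name, nbytes in sorted(sizes.items(), key=lambda kv: -kv[1])[:5]:
        print(f"{name:20s} {nbytes:>12,d}")
    print(f"debug share: {debug_share(sizes):.0%}")
```

Calling report() on the _sparsetools shared object from an installed scipy would list its largest sections and the debug share.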

Should work. That's a lot more gain than I'd realized. Given that we hardly ever get useful gdb tracebacks, it may be worth considering doing that for releases.
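A release-time step along those lines might look like the following sketch. It assumes GNU binutils' objcopy is available and removes only the two sections measured in the thread (.debug_info and .debug_loc); strip --strip-debug would be the blunter alternative that drops all debug sections. The function names are hypothetical, and dry_run defaults to True so nothing is modified unless explicitly requested.

```python
import os
import subprocess

# The two sections measured in the thread; other .debug_* sections are kept.
DEBUG_SECTIONS = (".debug_info", ".debug_loc")


def shared_objects(root):
    """Yield paths of compiled extension modules (.so) under `root`."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(".so"):
                yield os.path.join(dirpath, name)


def objcopy_command(path, sections=DEBUG_SECTIONS):
    """Build an objcopy invocation that removes `sections` from `path`."""
    cmd = ["objcopy"]
    cmd += [f"--remove-section={s}" for s in sections]
    # With no output file given, objcopy rewrites the input file in place.
    cmd.append(path)
    return cmd


def strip_tree(root, dry_run=True):
    """Return the commands for every .so under `root`; run them unless dry_run."""
    cmds = [objcopy_command(p) for p in shared_objects(root)]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```

Running the full test suite against the stripped tree afterwards would be the natural check, mirroring the preliminary test described above.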
 

If we really need to trim the size for installing in environments like Lambda, could we have a scipy-lite for production: the same as scipy, but without the unnecessary debug information? I imagine tracebacks would be less informative, but that shouldn't matter in production. My first thought was to remove docstrings, comments, tests, and data, but they may not amount to enough to be worth the trouble.

Recipes for such things are floating around, and it makes sense to do that. I'd rather not maintain an official scipy-lite package, though; I'd rather make choices within scipy that enable third parties to do that.

Ralf

 


On the topic at hand, I would agree with having a few small datasets to showcase functionality. I think a few kilobytes can go a long way for demonstrations and benchmarks. As far as I can see, a top-level module is free: it wouldn't add any maintenance burden, and it would make the datasets easier to find.

/David.

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@python.org
https://mail.python.org/mailman/listinfo/scipy-dev