[SciPy-Dev] New subpackage: scipy.data

Scott Sievert sievert.scott at gmail.com
Thu Mar 29 21:00:54 EDT 2018


Including some datasets would also help make the SciPy benchmarks more
realistic. Right now the benchmarks use synthetic data (at least the
signal benchmarks do).
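
A minimal sketch of what that could look like, following the asv
conventions the SciPy benchmark suite already uses; the
scipy.datasets.electrocardiogram() accessor is an assumption here,
standing in for whatever bundled dataset the subpackage would provide:

    from scipy import signal

    class RealDataFilter:
        """Benchmark a filter on recorded rather than synthetic data."""

        def setup(self):
            # Assumed accessor: a 1-D array of ECG samples. Today the
            # setup would synthesize noise, e.g. np.random.randn(...).
            from scipy.datasets import electrocardiogram
            self.data = electrocardiogram()

        def time_butter_filtfilt(self):
            # Zero-phase low-pass filtering of the recorded signal.
            b, a = signal.butter(4, 0.1)
            signal.filtfilt(b, a, self.data)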

Scott

On March 29, 2018 at 7:17:28 PM, Ilhan Polat (ilhanpolat at gmail.com) wrote:

Yes, that's true, but GitHub seems like a robust place for the data to
live. Otherwise we can just point to any hard-coded URL. If the size gets
bigger, though, keeping the data within SciPy doesn't seem viable, in
terms of both wheel size and clone time. It all depends on what the
future of the datasets would be.
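
For concreteness, a rough sketch of that hard-coded-URL approach: fetch a
file on first use and cache it locally. The base URL and cache directory
below are placeholders, not a real scipy-datasets location:

    import os
    from urllib.request import urlretrieve

    # Placeholder repo; a real setup would pin a release tag, not master.
    _BASE_URL = "https://raw.githubusercontent.com/scipy/scipy-datasets/master/"
    _CACHE = os.path.join(os.path.expanduser("~"), ".cache", "scipy-datasets")

    def fetch(name):
        """Return a local path for dataset `name`, downloading it if missing."""
        os.makedirs(_CACHE, exist_ok=True)
        path = os.path.join(_CACHE, name)
        if not os.path.exists(path):
            # Off to the interwebz: grab just this file, not a whole clone.
            urlretrieve(_BASE_URL + name, path)
        return path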

On Fri, Mar 30, 2018 at 2:03 AM, <josef.pktd at gmail.com> wrote:

> On Thu, Mar 29, 2018 at 7:54 PM, Ilhan Polat <ilhanpolat at gmail.com> wrote:
> > Would a separate repo scipy-datasets help? Then something like
> >
> > try:
> >     import scipy_datasets
> > except ImportError:
> >     warn("I'm off to interwebz")
> >     download_from_repo()
> >
> > might be feasible. The download part can either be that particular
> > dataset or the whole scipy-datasets clone.
> >
>
> IMO:
>
> It depends on the scale at which this should go.
> I don't think it's worth it (maintaining and installing another
> package or repo) for scipy, given that scipy is mostly a basic
> numerical library and not driven by specific applications.
>
> For most areas there should already be online repos or packages, and
> it would be enough to have the accessor functions in scipy.datasets.
> The only area I can think of where there might not be a readily
> available online source for datasets is signal.
>
> Josef
>
>
> >
> >
> >
> > On Fri, Mar 30, 2018 at 1:16 AM, Stefan van der Walt
> > <stefanv at berkeley.edu> wrote:
> >>
> >> On Thu, 29 Mar 2018 18:54:52 -0400, Warren Weckesser wrote:
> >> > Can you summarize the problems that make you regret including the
> >> > data?
> >>
> >> - The size of the repository (extra time on each clone, for data
> >>   that isn't needed in most use cases)
> >>
> >> - Artificial limit on data sizes: we now have a default place to store
> >>   data, but we still need an additional mechanism for larger datasets.
> >>   How do you choose the threshold for what goes in, what is too big?
> >>
> >> - Because these tiny embedded datasets are easily available, they become
> >>   the default for demos.  If data is stored externally, realistic
> >>   examples become more feasible and likely.
> >>
> >> Best regards
> >> Stéfan

_______________________________________________
SciPy-Dev mailing list
SciPy-Dev at python.org
https://mail.python.org/mailman/listinfo/scipy-dev