<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Thank you for your feedback, Alex!<br>
<br>
<div class="moz-cite-prefix">On 10/02/2018 09:28 AM, Alex Garel
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:cf272c39-96d3-9b11-fe8c-d931e0bd411a@garel.org">
<br>
<ul>
        <li>chunk processing (a way of handling streaming data): when
          dealing with a lot of data, the ability to call fit_partial and
          then use transform on chunks of data is a great help. But it's
          not well exposed in the current docs and API,</li>
</ul>
</blockquote>
    This has been discussed in the past, but it looks like no one was
    excited enough about it to add it to the roadmap.<br>
    This would require substantial additions to the API. Olivier, who
    was quite interested in this before, now seems to be more interested
    in integration with dask, which might achieve the same thing.<br>
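    For reference, here is a minimal sketch of the chunked workflow as
    it can be written by hand today. In scikit-learn the incremental
    method is spelled partial_fit. The iter_chunks helper and the toy
    data are assumptions made for illustration, not part of the library,
    and the sketch assumes the data can be iterated over twice.<br>
<pre>
```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

# Toy data standing in for a dataset that would not fit in memory.
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
y = np.greater(X[:, 0], 5.0).astype(int)


def iter_chunks(X, y, chunk_size=100):
    """Hypothetical helper: yield consecutive (X, y) slices."""
    for start in range(0, X.shape[0], chunk_size):
        stop = start + chunk_size
        yield X[start:stop], y[start:stop]


# Pass 1: learn the scaling statistics incrementally.
scaler = StandardScaler()
for X_chunk, _ in iter_chunks(X, y):
    scaler.partial_fit(X_chunk)

# Pass 2: transform each chunk and update the classifier on it.
clf = SGDClassifier(random_state=0)
for X_chunk, y_chunk in iter_chunks(X, y):
    clf.partial_fit(scaler.transform(X_chunk), y_chunk,
                    classes=np.array([0, 1]))
```
</pre>
    Note that even this two-step example needs two passes over the
    data: the scaler has to see every chunk before any chunk can be
    transformed for the classifier.<br>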
<blockquote type="cite"
cite="mid:cf272c39-96d3-9b11-fe8c-d931e0bd411a@garel.org">
<ul>
<li> and a lot of models do not support it, while they could.</li>
</ul>
</blockquote>
Can you give examples of that?
<blockquote type="cite"
cite="mid:cf272c39-96d3-9b11-fe8c-d931e0bd411a@garel.org">
<ul>
        <li>Also, pipeline does not support fit_partial and there is no
          fit_transform_partial.</li>
</ul>
</blockquote>
    What would you expect those to do? Each step in the pipeline might
    require passing over the whole dataset multiple times before being
    able to transform anything. That makes it basically impossible to
    support this in the pipeline with the current interface.<br>
    Even if only a single pass over the dataset were required, that
    wouldn't work with the current interface.<br>
    If we passed around generators that allow looping over the whole
    dataset, that would work. But it would be unclear how to support a
    streaming setting.<br>
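    To make the generator idea concrete, here is a rough sketch in
    plain Python. Everything in it is hypothetical (the two toy
    transformers, fit_steps_on_chunks and make_chunks); none of it is
    existing or planned scikit-learn API. It only shows why each step
    needs its own full pass over a re-iterable source of chunks.<br>
<pre>
```python
class RunningMeanCenterer:
    """Toy transformer (hypothetical): learns a mean incrementally."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def partial_fit(self, chunk):
        self.total += sum(chunk)
        self.count += len(chunk)
        return self

    def transform(self, chunk):
        mean = self.total / self.count
        return [value - mean for value in chunk]


class RunningMaxScaler:
    """Toy transformer (hypothetical): scales by the largest magnitude."""

    def __init__(self):
        self.max_abs = 0.0

    def partial_fit(self, chunk):
        self.max_abs = max([self.max_abs] + [abs(v) for v in chunk])
        return self

    def transform(self, chunk):
        return [value / self.max_abs for value in chunk]


def fit_steps_on_chunks(steps, make_chunks):
    """Fit steps in order; return the number of passes over the data.

    make_chunks is a callable returning a fresh iterator over all chunks,
    so the data can be looped over once per step.
    """
    passes = 0
    for index, step in enumerate(steps):
        for chunk in make_chunks():
            # Earlier steps are already fitted, so they can transform.
            for fitted in steps[:index]:
                chunk = fitted.transform(chunk)
            step.partial_fit(chunk)
        passes += 1
    return passes


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]


def make_chunks():
    for start in range(0, len(data), 2):
        yield data[start:start + 2]


steps = [RunningMeanCenterer(), RunningMaxScaler()]
n_passes = fit_steps_on_chunks(steps, make_chunks)
```
</pre>
    With a true stream, make_chunks could not be called a second time,
    which is exactly the case this approach leaves open.<br>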
<br>
<blockquote type="cite"
cite="mid:cf272c39-96d3-9b11-fe8c-d931e0bd411a@garel.org">
<ul>
<li>while handling "Passing around information that is not (X,
y)", is there any plan to have transform being able to
transform X and y ? This would ease lots of problems like
subsampling, resampling or masking data when too incomplete. <br>
</li>
</ul>
</blockquote>
An API for subsampling is on the roadmap :)<br>
<br>
</body>
</html>