<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Thank you for your feedback, Alex!<br>

    <br>

    <div class="moz-cite-prefix">On 10/02/2018 09:28 AM, Alex Garel

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:cf272c39-96d3-9b11-fe8c-d931e0bd411a@garel.org">


      <br>

      <ul>

        <li>chunk processing (kind of handling streaming data) :  when

          dealing with lot of data, the ability to fit_partial, then use

          transform on chunks of data is of good help. But it's not well

          exposed in current doc and API,</li>

      </ul>

    </blockquote>

    This has been discussed in the past, but it looks like no one was

    excited enough about it to add it to the roadmap.<br>

    Supporting it would require substantial additions to the API. Olivier, who

    was quite interested in this before, now seems<br>

    to be more interested in integration with dask, which might achieve

    the same thing.<br>
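    For reference, here is a minimal sketch of the chunked workflow that is
    already possible for estimators that implement partial_fit (StandardScaler
    and SGDClassifier here).<br>
    The iter_chunks helper and the synthetic data are made up for
    illustration and stand in for a real out-of-core reader.<br>
<pre>
```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler


def iter_chunks(X, y, chunk_size):
    """Yield consecutive (X, y) slices (hypothetical helper, not in scikit-learn)."""
    for start in range(0, X.shape[0], chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]


# Synthetic data, only for illustration.
rng = np.random.RandomState(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
y = np.greater(X[:, 0], 5.0).astype(int)

# Pass 1: accumulate the mean and variance chunk by chunk.
scaler = StandardScaler()
for X_chunk, _ in iter_chunks(X, y, 100):
    scaler.partial_fit(X_chunk)

# Pass 2: transform each chunk and update the classifier incrementally.
# The full set of classes has to be known up front for partial_fit.
clf = SGDClassifier(random_state=0)
classes = np.unique(y)
for X_chunk, y_chunk in iter_chunks(X, y, 100):
    clf.partial_fit(scaler.transform(X_chunk), y_chunk, classes=classes)

predictions = clf.predict(scaler.transform(X[:5]))
```
</pre>
    Note that this already needs two passes over the data, one per step,
    and the loop is written by hand.<br>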

    <blockquote type="cite"

      cite="mid:cf272c39-96d3-9b11-fe8c-d931e0bd411a@garel.org">

      <ul>

        <li> and a lot of models do not support it, while they could.</li>

      </ul>

    </blockquote>

    Can you give examples of that?

    <blockquote type="cite"

      cite="mid:cf272c39-96d3-9b11-fe8c-d931e0bd411a@garel.org">

      <ul>

        <li>Also pipeline does not support fit_partial and there is not

          fit_transform_partial.</li>

      </ul>

    </blockquote>

    What would you expect those to do? Each step in the pipeline might

    require passing over the whole dataset multiple times<br>

    before being able to transform anything. That basically makes the

    current interface unworkable for the pipeline.<br>

    Even if only a single pass over the dataset were required, that

    wouldn't work with the current interface.<br>

    If we passed around generators that allow looping over the

    whole dataset, that would work. But it would be unclear<br>

    how to support a streaming setting.<br>

    <br>
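    To make the generator idea concrete, here is a rough pure-Python sketch.
    Everything in it (the two toy steps, fit_chunked_pipeline,
    transform_chunked, make_chunks) is hypothetical and not part of
    scikit-learn.<br>
    It assumes each step needs exactly one pass and that the data can be
    replayed from the start, which is exactly what a true stream does not
    allow.<br>
<pre>
```python
class RunningMeanCenterer:
    """Toy step (hypothetical): subtracts the mean accumulated over all chunks."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def partial_fit(self, chunk):
        self.total += sum(chunk)
        self.count += len(chunk)
        return self

    def transform(self, chunk):
        mean = self.total / self.count
        return [value - mean for value in chunk]


class RunningMaxScaler:
    """Toy step (hypothetical): divides by the largest absolute value seen."""

    def __init__(self):
        self.max_abs = 0.0

    def partial_fit(self, chunk):
        self.max_abs = max([self.max_abs] + [abs(value) for value in chunk])
        return self

    def transform(self, chunk):
        return [value / self.max_abs for value in chunk]


def fit_chunked_pipeline(steps, make_chunks):
    """Fit the steps one after another; each step costs one full pass.

    make_chunks is a zero-argument callable that returns a fresh iterator
    over the chunks, so the dataset can be replayed for every step.
    """
    fitted = []
    for step in steps:
        for chunk in make_chunks():
            # Push the chunk through the steps that are already fitted.
            for previous in fitted:
                chunk = previous.transform(chunk)
            step.partial_fit(chunk)
        fitted.append(step)
    return fitted


def transform_chunked(fitted, make_chunks):
    """Lazily yield each chunk after it has gone through every fitted step."""
    for chunk in make_chunks():
        for step in fitted:
            chunk = step.transform(chunk)
        yield chunk


data = [[1.0, 2.0], [3.0, 4.0], [5.0]]


def make_chunks():
    return iter(data)


fitted = fit_chunked_pipeline([RunningMeanCenterer(), RunningMaxScaler()], make_chunks)
result = list(transform_chunked(fitted, make_chunks))
```
</pre>
    The second step cannot see a single transformed chunk until the first
    step has consumed the whole dataset, so the number of passes grows with
    the number of steps.<br>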

    <blockquote type="cite"

      cite="mid:cf272c39-96d3-9b11-fe8c-d931e0bd411a@garel.org">

      <ul>

        <li>while handling "Passing around information that is not (X,

          y)", is there any plan to have transform being able to

          transform X and y ? This would ease lots of problems like

          subsampling, resampling or masking data when too incomplete. <br>

        </li>

      </ul>

    </blockquote>

    An API for subsampling is on the roadmap :)<br>
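    Until then, subsampling has to happen outside of transform, since
    transform only returns X. A minimal sketch using sklearn.utils.resample,
    which applies the same row selection to X and y (the toy arrays are made
    up for illustration, and this is not the roadmap API):<br>
<pre>
```python
import numpy as np
from sklearn.utils import resample

# Toy data: row i of X is [2 * i, 2 * i + 1] and y[i] is i.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Draw 4 rows without replacement; X and y stay aligned row by row.
X_sub, y_sub = resample(X, y, replace=False, n_samples=4, random_state=0)
```
</pre>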


    <br>

  </body>

</html>