2014-05-15 14:00 GMT-03:00 <numpy-discussion-request@scipy.org>:

Send NumPy-Discussion mailing list submissions to
numpy-discussion@scipy.org

To subscribe or unsubscribe via the World Wide Web, visit
http://mail.scipy.org/mailman/listinfo/numpy-discussion
or, via email, send a message with subject or body 'help' to
numpy-discussion-request@scipy.org

You can reach the person managing the list at
numpy-discussion-owner@scipy.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of NumPy-Discussion digest..."

Today's Topics:

1. smoothing function (rodrigo koblitz)
2. Fancy Indexing of Structured Arrays is Slow (Dave Hirschfeld)
3. [JOB] Scientific software engineer at the Met Office (Phil Elson)
4. Re: smoothing function (josef.pktd@gmail.com)
5. Re: smoothing function (Nathaniel Smith)
6. Re: smoothing function (josef.pktd@gmail.com)

----------------------------------------------------------------------

Message: 1
Date: Thu, 15 May 2014 09:04:03 -0300
From: rodrigo koblitz <rodrigokoblitz@gmail.com>
Subject: [Numpy-discussion] smoothing function
To: numpy-discussion@scipy.org
Message-ID:
<CAAZkdU_5yw9qigWVofVrPZLptgs75q14Y7vaWoGpQW_nqtrpdA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,
I'm reading Zuur's book (ecology models with R) and trying to reproduce it
entirely in Python.
It has this function in R:
M4 <- gam(So ~ s(De) + factor(ID), subset = I1)

The 's' term indicates that So is modelled as a smoothing function of De.

I'm looking for something close to this in Python.

Can someone help me?

Best regards,
Koblitz

------------------------------

Message: 2
Date: Thu, 15 May 2014 12:31:50 +0000 (UTC)
From: Dave Hirschfeld <dave.hirschfeld@gmail.com>
Subject: [Numpy-discussion] Fancy Indexing of Structured Arrays is
Slow
To: numpy-discussion@scipy.org
Message-ID: <loom.20140515T135603-598@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

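As can be seen from the code below (or in the notebook linked beneath), fancy
indexing of a structured array is about twice as slow as indexing both fields
independently (3.45 ms versus 1.57 ms). Since the independent version performs
two indexing operations, does that make a single structured-array lookup
roughly 4x slower than a single plain-array lookup?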

I found that fancy indexing was a bottleneck in my application so I was
hoping to reduce the overhead by combining the arrays into a structured
array and only doing one indexing operation. Unfortunately that doubled the
time that it took!

Is there any reason for this? If not, I'm happy to open an enhancement issue
on GitHub - just let me know.

Thanks,
Dave

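# Assumes a pylab-style IPython session where np, randn and randint are
# already available (import numpy as np; from numpy.random import randn, randint).
In [32]: nrows, ncols = 365, 10000

In [33]: items = np.rec.fromarrays(randn(2, nrows, ncols),
...:                               names=['widgets', 'gadgets'])

In [34]: row_idx = randint(0, nrows, ncols)
...: col_idx = np.arange(ncols)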

In [35]: %timeit filtered_items = items[row_idx, col_idx]
100 loops, best of 3: 3.45 ms per loop

In [36]: %%timeit
...: widgets = items['widgets'][row_idx, col_idx]
...: gadgets = items['gadgets'][row_idx, col_idx]
...:
1000 loops, best of 3: 1.57 ms per loop
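
For anyone wanting to reproduce the comparison outside IPython, here is a
minimal standalone sketch of the same two timings. It assumes only NumPy and
the standard-library timeit module; the helper names index_structured and
index_fields and the fixed seed are illustrative and not part of the original
session. Absolute timings will differ by machine and NumPy version, so none
are quoted here.

```python
import timeit

import numpy as np

nrows, ncols = 365, 10000
rng = np.random.RandomState(0)  # fixed seed, chosen only for repeatability

# Two 2-D float arrays packed into one record array with two fields.
items = np.rec.fromarrays(rng.randn(2, nrows, ncols), names='widgets,gadgets')

# One random row per column, so each lookup returns ncols elements.
row_idx = rng.randint(0, nrows, ncols)
col_idx = np.arange(ncols)


def index_structured():
    """Fancy-index the structured array once (both fields together)."""
    return items[row_idx, col_idx]


def index_fields():
    """Fancy-index each field separately (two indexing operations)."""
    return (items['widgets'][row_idx, col_idx],
            items['gadgets'][row_idx, col_idx])


# Both approaches should select exactly the same values.
filtered = index_structured()
widgets, gadgets = index_fields()
assert np.array_equal(filtered['widgets'], widgets)
assert np.array_equal(filtered['gadgets'], gadgets)

# Report the best per-call time of three repeats for each approach.
for fn in (index_structured, index_fields):
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print('%s: %.3f ms per call' % (fn.__name__, best * 1e3))
```

The equality checks matter as much as the timings: they confirm that the two
code paths are interchangeable before their speed is compared.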

http://nbviewer.ipython.org/urls/gist.githubusercontent.com/dhirschfeld/98b9970fb68adf23dfea/raw/10c0f968ea1489f0a24da80d3af30de7106848ac/Slow%20Structured%20Array%20Indexing.ipynb

https://gist.github.com/dhirschfeld/98b9970fb68adf23dfea

------------------------------

Message: 3
Date: Thu, 15 May 2014 16:13:10 +0100
From: Phil Elson <pelson.pub@gmail.com>
Subject: [Numpy-discussion] [JOB] Scientific software engineer at the
Met Office
To: Discussion of Numerical Python <numpy-discussion@scipy.org>,
matplotlib development list <matplotlib-devel@lists.sourceforge.net>
Message-ID:
<CA+L60sAj1zoedxALDhuHp6aTo+KvcJxzVRJV7nq76Xy_OirurQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I just wanted to let you know that there is currently a vacancy for a
full-time developer at the Met Office, the UK's National Weather Service,
within our Analysis, Visualisation and Data (AVD) team.

I'm posting on this list as the Met Office's AVD team are heavily involved
in the development of Python packages to support the work that our
scientists undertake on a daily basis. The vast majority of the AVD team's
time is spent working on our own open source Python packages Iris, cartopy
and biggus as well as working on packages such as numpy, scipy, matplotlib
and IPython; so we don't see this as just a great opportunity to work
within a world class scientific organisation, but a role which will also
deliver real benefits to the wider scientific Python community.

Please see http://goo.gl/3ScFaZ for full details and how to apply, or
contact HREnquiries@metoffice.gov.uk if you have any questions.

Many Thanks,

Phil

------------------------------

Message: 4
Date: Thu, 15 May 2014 11:54:30 -0400
From: josef.pktd@gmail.com
Subject: Re: [Numpy-discussion] smoothing function
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Message-ID:
<CAMMTP+AkRLNgqiXO0PtfW_KRdGThdP8++Wcy3Bc23YZMV-h+PA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Thu, May 15, 2014 at 8:04 AM, rodrigo koblitz
<rodrigokoblitz@gmail.com>wrote:

> Hello,
> I'm reading Zuur's book (ecology models with R) and trying to reproduce it
> entirely in Python.
> It has this function in R:
> M4 <- gam(So ~ s(De) + factor(ID), subset = I1)
>
> The 's' term indicates that So is modelled as a smoothing function of De.
>
> I'm looking for something close to this in Python.
>

These kinds of general questions are better asked on the scipy-user mailing
list, which covers more general topics than numpy-discussion.

As far as I know, GAMs are not available in Python; at least, I have never
come across any.

statsmodels has an ancient GAM in the sandbox that has never been connected
to any smoother, since lowess, spline and kernel regression support was
missing. Nobody is working on that right now.
If you have only a single nonparametric variable, then statsmodels also has
a partial linear model based on kernel regression. It is not cleaned up or
verified, but Padarn is currently working on this.

I think in this case using a penalized linear model with spline basis
functions would be more efficient, but there is also nothing clean
available, AFAIK.
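
To make the idea of a penalized linear model with spline basis functions
concrete, here is a minimal NumPy-only sketch. It is not statsmodels code: it
assumes a truncated-linear basis with a ridge penalty on the knot
coefficients, and the function names, knot placement and lam values are all
illustrative choices. The factor(ID) term from the original model is left out
to keep it short.

```python
import numpy as np


def truncated_linear_basis(x, knots):
    """Design matrix with columns [1, x, (x - k_1)_+, ..., (x - k_K)_+]."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    hinges = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, hinges])


def fit_penalized_spline(x, y, knots, lam):
    """Solve (X'X + lam * D) beta = X'y, penalizing only the hinge terms."""
    X = truncated_linear_basis(x, knots)
    penalty = np.zeros(X.shape[1])
    penalty[2:] = 1.0  # intercept and slope are left unpenalized
    A = X.T @ X + lam * np.diag(penalty)
    return np.linalg.solve(A, X.T @ y)


# Synthetic data (an assumption for illustration): noisy sine curve.
rng = np.random.RandomState(0)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2 * np.pi * x)
y = truth + 0.1 * rng.randn(x.size)
knots = np.linspace(0.05, 0.95, 19)

# Small lam: flexible fit. Large lam: hinge coefficients shrink towards
# zero, so the fit approaches a straight line.
beta_wiggly = fit_penalized_spline(x, y, knots, lam=1e-4)
beta_smooth = fit_penalized_spline(x, y, knots, lam=1e2)

fitted = truncated_linear_basis(x, knots) @ beta_wiggly
rmse = np.sqrt(np.mean((fitted - truth) ** 2))
print('RMSE of flexible fit against the true curve: %.3f' % rmse)
```

In practice lam would be chosen by cross-validation or a similar criterion,
which is a large part of the "last 10%" of work mentioned below.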

It's not too difficult to write the basic models, but it takes time to
figure out the last 10% and to verify the results and write unit tests.

If you make your code publicly available, then I would be very interested
in a link. I'm trying to collect examples from books that have a python
solution.

Josef

>
> Can someone help me?
>
> Best regards,
> Koblitz
>

------------------------------

Message: 5
Date: Thu, 15 May 2014 17:17:43 +0100
From: Nathaniel Smith <njs@pobox.com>
Subject: Re: [Numpy-discussion] smoothing function
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Message-ID:
<CAPJVwBns59n=Ddd3O-M7ESc0=deA+MTdp1CfsPkeTr3qfPiVzg@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

On Thu, May 15, 2014 at 1:04 PM, rodrigo koblitz
<rodrigokoblitz@gmail.com> wrote:
> Hello,
> I'm reading Zuur's book (ecology models with R) and trying to reproduce it
> entirely in Python.
> It has this function in R:
> M4 <- gam(So ~ s(De) + factor(ID), subset = I1)
>
> The 's' term indicates that So is modelled as a smoothing function of De.
>
> I'm looking for something close to this in Python.

The closest thing that doesn't require writing your own code is
probably to use patsy's [1] support for (simple unpenalized) spline
basis transformations [2]. I think using statsmodels this works like:

import statsmodels.formula.api as smf
# adjust '5' to taste -- bigger = wigglier, less bias, more overfitting
results = smf.ols("So ~ bs(De, 5) + C(ID)", data=my_df).fit()
print(results.summary())

To graph the resulting curve you'll want to use the results to somehow
do "prediction" -- I'm not sure what the API for that looks like in
statsmodels. If you need help figuring it out, then asking on the
statsmodels list or Stack Overflow is probably the quickest way to get
help.

-n

[1] http://patsy.readthedocs.org/en/latest/
[2] http://patsy.readthedocs.org/en/latest/builtins-reference.html#patsy.builtins.bs

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

------------------------------

Message: 6
Date: Thu, 15 May 2014 12:47:25 -0400
From: josef.pktd@gmail.com
Subject: Re: [Numpy-discussion] smoothing function
To: Discussion of Numerical Python <numpy-discussion@scipy.org>
Message-ID:
<CAMMTP+Be-OZfidm-Gw+EzJm4fcb9zyQZX_aF+mWfSMaH9GZPhQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Thu, May 15, 2014 at 12:17 PM, Nathaniel Smith <njs@pobox.com> wrote:

> On Thu, May 15, 2014 at 1:04 PM, rodrigo koblitz
> <rodrigokoblitz@gmail.com> wrote:
> > Hello,
> > I'm reading Zuur's book (ecology models with R) and trying to reproduce it
> > entirely in Python.
> > It has this function in R:
> > M4 <- gam(So ~ s(De) + factor(ID), subset = I1)
> >
> > The 's' term indicates that So is modelled as a smoothing function of De.
> >
> > I'm looking for something close to this in Python.
>
> The closest thing that doesn't require writing your own code is
> probably to use patsy's [1] support for (simple unpenalized) spline
> basis transformations [2]. I think using statsmodels this works like:
>
> import statsmodels.formula.api as smf
> # adjust '5' to taste -- bigger = wigglier, less bias, more overfitting
> results = smf.ols("So ~ bs(De, 5) + C(ID)", data=my_df).fit()
> print(results.summary())
>

Nice

>
> To graph the resulting curve you'll want to use the results to somehow
> do "prediction" -- I'm not sure what the API for that looks like in
> statsmodels. If you need help figuring it out, then asking on the
> statsmodels list or Stack Overflow is probably the quickest way to get
> help.
>

This seems to work (in a very simple made-up example):

results.predict({'De':np.arange(1,5), 'ID':['a']*4}, transform=True)
#array([ 0.75 , 1.08333333, 0.75 , 0.41666667])
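
Putting the two snippets together, a self-contained sketch might look like
the following. The data are synthetic and the column values are made up
purely for illustration; it assumes pandas, patsy and statsmodels are
installed. The prediction grid stays inside the observed range of De, since
patsy's bs may refuse points outside the outermost knots.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the book's data (assumption): a smooth effect of
# De plus a constant shift for each level of ID.
rng = np.random.RandomState(0)
n = 120
my_df = pd.DataFrame({
    'De': rng.uniform(0.0, 10.0, n),
    'ID': rng.choice(['a', 'b', 'c'], n),
})
shift = my_df['ID'].map({'a': 0.0, 'b': 1.0, 'c': -1.0})
my_df['So'] = np.sin(my_df['De']) + shift + 0.2 * rng.randn(n)

# Unpenalized B-spline basis with 5 degrees of freedom plus a categorical
# term for ID, fitted by ordinary least squares.
results = smf.ols('So ~ bs(De, 5) + C(ID)', data=my_df).fit()

# Predict along a grid of De values for a single ID level; these are the
# points one would plot to draw the fitted smooth curve.
grid = pd.DataFrame({
    'De': np.linspace(my_df['De'].min(), my_df['De'].max(), 50),
    'ID': 'a',
})
curve = np.asarray(results.predict(grid))
print(curve[:5])
```

Plotting grid['De'] against curve (for example with matplotlib) then gives
the fitted smooth for ID 'a'; repeating with another level shows the same
shape shifted up or down.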

Josef

> -n
>
> [1] http://patsy.readthedocs.org/en/latest/
> [2]
> http://patsy.readthedocs.org/en/latest/builtins-reference.html#patsy.builtins.bs
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org

------------------------------

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

End of NumPy-Discussion Digest, Vol 92, Issue 19
************************************************