On Mon, May 25, 2009 at 4:27 PM, Skipper Seabold

On Mon, May 25, 2009 at 3:40 PM, Joe Harrington

wrote: On Mon, 25 May 2009 13:51:38 -0400, josef.pktd@gmail.com wrote:

On Mon, May 25, 2009 at 11:50 AM, Joe Harrington

wrote: On Sun, 24 May 2009 18:14:42 -0400 josef.pktd@gmail.com wrote:

On Sun, May 24, 2009 at 4:33 PM, Joe Harrington

wrote: I hate to ask for another function in numpy, but there's an obvious one missing in the financial group: xirr. ?It could be done as a new function or as an extension to the existing np.irr.

The internal rate of return (np.irr) is defined as the growth rate that would give you a zero balance at the end of a period of investment given a series of cash flows into or out of the investment at regular intervals (the first and last cash flows are usually an initial deposit and a withdrawal of the current balance).

This is useful in academics, but if you're tracking a real investment, you don't just withdraw or add money on a perfectly annual basis, nor do you want a calc with thousands of days of zero entries just so you can handle the uneven intervals by evening them out. ?Both excel and openoffice define a "xirr" function that pairs each cash flow with a date. ?Would there be an objection to either a xirr or adding an optional second arg (or a keyword arg) to np.irr in numpy? ?Who writes the code is a different question, but that part isn't hard.

3 comments:

* open office has also the other function in an x??? version, so it might be good to add it consistently to all functions

* date type: scikits.timeseries and the gsoc for implementing a date type would be useful to have a clear date type, or would you want to base it only on python standard library

* real life accuracy: given that there are large differences in the definition of a year for financial calculations, any simple implementation would be only approximately accurate. for example in the open office help, oddlyield list the following option

Basis is chosen from a list of options and indicates how the year is to be calculated. Basis Calculation 0 or missing US method (NASD), 12 months of 30 days each 1 Exact number of days in months, exact number of days in year 2 Exact number of days in month, year has 360 days 3 Exact number of days in month, year has 365 days 4 European method, 12 months of 30 days each

So, my question: what's the purpose of the financial function in numpy? Currently it provides convenient functions for (approximate) interest calculations. If they get expanded to a "serious" implementation of, for example, the main financial functions listed in the open office help (just for reference) then maybe numpy is not the right location for it.

I started to do something similar in matlab, and once I tried to use real dates instead of just counting months, the accounting rules get quickly very messy.

Using dates as you propose would be very convenient, but the users shouldn't be surprised that their actual payments at the end of the year don't fully match up with what numpy told them.

my 3cents

Josef

First point: agreed. ?I wish this community had a design review process for numpy and scipy, so that these things could get properly hashed out, and not just one person (even Travis) suggesting something and everyone else saying yeah-sure-whatever.

Does anyone on the list have the financial background to suggest what functions "should" be included in a basic set of financial routines? xirr is the only one I've ever used in a spreadsheet, myself.

Other points: Yuk. ?You're right.

When these first came up for discussion, I had a Han Solo moment ("I've got a baaad feeling about this...") but I couldn't put my finger on why. ?They seemed like simple and limited functions with high utility. ?Certainly anything as open-ended as financial-industry rules should go elsewhere (scikits, scipy, monpy, whatever).

But, that doesn't prevent a user-supplied, floating-point time array from going into a function in numpy. ?The rate of return would be in units of that array. ?Functions that convert date/time in some format (or many) and following some rule (or one of many) to such a floating array can still go elsewhere, maintained by people who know the definitions, if they have interest (pun intended). ?That would make the functions in numpy much more useful without bloating them or making them a maintenance nightmare.

If you think of time just as a regularly spaced, e.g. days, but with sparse points on it, or as a continuous variable, then extending the current functions should be relatively easy. I guess the only questions are compounding, annual, quarterly or at each payment, and whether the annual rate is calculated as real compounded annualized rate or as accounting annual rate, e.g. quarterlyrate*4.

This leaves "What is the present value, if you get 100 Dollars at the 10th day of each month (or at the next working day if the 10th day is a holiday or a weekend) for the next 5 years and the monthly interest rate is 5/12%?" for another day.

Initially I understood you wanted the date as a string or date type as in e.g open office. What would be the units of the user-supplied, floating-point time array? It is still necessary to know the time units to provide an annualized rate, unless the rate is in continuous time, exp(r*t). I don't know whether this would apply to all functions in numpy.finance, it's a while since I looked at the code. Maybe there are some standard simplifications in open office or excel.

I briefly skimmed the list of function in the open office help, and it would be useful to have them available, e.g. as a package in scipy. But my google searches in the past for applications in finance with a compatible license didn't provide much useful code that could form the basis of a finance package.

Adding more convenience and functionality to numpy.finance is useful, but if they get extended with slow feature creep, then another location (scipy) might be more appropriate and would be more expandable, even if it happens only slowly.

That's just my opinion (obviously), I'm a relative newbie to numpy/scipy and still working my way through all the different subpackages.

np.irr is defined on (anonymous) constant time intervals and gives you the growth per time interval. The code is very short, basically a call to np.roots(values):

def irr(values): """ Return the Internal Rate of Return (IRR).

This is the rate of return that gives a net present value of 0.0.

Parameters ---------- values : array_like, shape(N,) Input cash flows per time period. At least the first value would be negative to represent the investment in the project.

Returns ------- out : float Internal Rate of Return for periodic input values.

Examples -------- >>> np.irr([-100, 39, 59, 55, 20]) 0.2809484211599611

""" res = np.roots(values[::-1]) # Find the root(s) between 0 and 1 mask = (res.imag == 0) & (res.real > 0) & (res.real <= 1) res = res[mask].real if res.size == 0: return np.nan rate = 1.0/res - 1 if rate.size == 1: rate = rate.item() return rate

So, I think this is a continuous definition of growth, not some periodic compounding.

I'd propose the time array would be in anonymous units, and the result would be in terms of those units. For example, if an interval of 1.0 in the time array were one fortnight, it would give interest in units of continuous growth per fortnight, etc. Anything with many more options than that does not belong in numpy (but it would be interesting to have elsewhere).

Here is my stab at xirr. It depends on the python datetime module and the Newton - Raphson algorithm in scipy.optimize, but it could be taken as a starting point if someone wants to get rid of the dependencies (I haven't worked too much with dates or NR before). The reference for the open office version is here http://wiki.services.openoffice.org/wiki/Documentation/How_Tos/Calc:_XIRR_fu..., and it performs in exactly the same way (assumes 365 days a year). It also doesn't take a 'begin' or 'end' argument for when the payments are made. but this is already in the numpy.financial and could be added easily.

def _discf(rate, pmts, dates): import numpy as np dcf=[] for i,cf in enumerate(pmts): d=dates[i]-dates[0] dcf.append(cf*(1+rate)**(-d.days/365.)) return np.add.reduce(dcf)

def xirr(pmts, dates, guess=.10): ''' IRR function that accepts irregularly spaced cash flows

Parameters ---------- values: array_like Contains the cash flows including the initial investment dates: array_like Contains the dates of payments as in the form (year, month, day)

Returns: Float Internal Rate of Return

Notes ---------- In general the xirr is the solution to

.. math:: \sum_{t=0}^M{\frac{v_t}{(1+xirr)^{(date_t-date_0)/365}}} = 0

Examples -------------- dates=[[2008,2,5],[2008,7,5],[2009,1,5]] pmts=[-2750,1000,2000] print xirr(pmts,dates) ''' from datetime import date from scipy.optimize import newton

for i,dt in enumerate(dates): dates[i]=date(*dt)

f = lambda x: _discf(x, pmts, dates)

return newton(f, guess)

if __name__=="__main__": dates=[[2008,2,5],[2008,7,5],[2009,1,5]] pmts=[-2750,1000,2000] print xirr(pmts,dates)

While I was still trying to think about the general problem, Skipper already implemented a solution. The advantage of Skippers implementation using actual dates instead of just an array of numbers is that it is possible to directly calculate the annual irr, since the time units are well specified. The only problem is the need for an equation solver in numpy. Just using a date tuple would remove the problem of string parsing, and it might be possible to extend it later to a date array. So, I think it would be possible to include Skippers solution, with some cleanup and testing, if an equation solver can be found or if np.roots can handle high order (sparse) polynomials. Below is my original message, which is based on the assumption of a date array that is just an array of numbers without any time units associated with it. Josef """

From my reading of the current irr, you have compounding of the interest rate at the given time interval. So if your data is daily data, you would get a daily interest rate with daily compounded rates, which might not be the most interesting number that the user wants. For the iir function it would still be very easy for the user to convert the daily or monthly rate to the annualized rate, (1+r_d)**365 -1 (1+r_m)**12 -1 (?).

For the implementation, would np.roots still work if you have 1000 days for example, or 360 months, or a few hundred fortnights? What would be the alternative in numpy for finding the root? equation solvers are in scipy. For arbitrary time units with possible large numbers, working with exp should be easier . In this case the exponent would be floats and not integers, so not a polynomial. I think in the continuous time version, we need to solve for r in sum(values*exp(-r*dates)) = 0 Can this be done in numpy? If dates are floats where the unit is one year, then this would give the continuously compounded annual rate, I think. Another property of the current function, that I just realized, is, that it doesn't allow for negative interest rates. This might not be a problem for the intended use, but if you look at real, i.e. inflation adjusted, interest rates then it happens often enough. Other options that might work if np.roots can handle it, would be to use integer time internally but fractional time from the user where the integer unit would be the reference period and the fractions would be for example 2/12 for the second month. I never tried this but using fractional units has a long enough tradition in finance. Or that the user optionally specifies the time units ("y" or "m" or "d") or number of periods per year (365, 12, 52, 26) """