complaints about no replies last week

Wed Apr 1 10:00:18 EDT 2009

On Mar 31, 4:07 pm, Arnaud Delobelle <arno... at googlemail.com> wrote:
> prueba... at latinmail.com writes:
>
> [...]
>
>
>
> > Well since I attracted a couple people's attention I will describe the
> > problem in more detail. Describing the problem properly is probably as
> > hard as solving it, so excuse me if I struggle a bit.
>
> > The problem is for a health insurance company and involves the period
> > of time a person is covered. Most insurance companies allow not only
> > for the main member to be insured but his family: the spouse and the
> > dependents (children). This additional coverage costs extra but less
> > than a full new insurance. So for example if Alice buys an insurance
> > worth at 100 dollars a month, she can insure her husband Bob for an
> > additional 50 dollars. Under certain circumstances Alice may go off
> > the insurance and only Bob stays. In that case the price goes back to
> > 100 dollars or maybe there is a deal for 80 or something like that. In
> > other words the cost of the insurance is dependent on the combination
> > of family members that participate in it. Additionally not only do we
> > have different family compositions but also different insurance
> > products. So you can get medical, dental and vision insurance.
>
> > All that data is stored in a database that is not very tidy and looks
> > something like this:
>
> > First Day of Coverage, Last Day of Coverage, Relationship, Product
> > 5/3/2005, 5/3/2005, D, M
> > 9/10/2005, 10/10/2005, S, V
> > 3/15/2005, 7/15/2005, M, M
> > 3/1/2005, 6/1/2005, S, M
> > 5/15/2005, 7/20/2005, D, D
> > 9/10/2005, 1/1/2140, M, V
> > 2/1/2005, 5/3/2005, M, M
>
> > Where
> > Relationship: M=Main Member, S=Spouse, D=Dependent
> > Product: M=Medical, D=Dental, V=Vision
>
> > As far as I know at the present time there are no deals based on
> > combination of products purchased so we will handle each product
> > independently. What I want is a simple algorithm that allows me to
> > calculate something like this out of it (hopefully I didn’t make a
> > mistake):
>
> > Medical:
> > 2/1/2005, 2/28/2005, M
> > 3/1/2005, 5/2/2005, M&S
> > 5/3/2005, 5/3/2005, M&S&D
> > 5/4/2005, 6/1/2005, M&S
> > 6/2/2005, 7/15/2005, M
>
> > Dental:
> > 5/15/2005, 7/20/2005, D
>
> > Vision:
> > 9/10/2005, 10/10/2005, M&S
> > 10/11/2005, 1/1/2140, M
>
> OK the approach I describe in my previous email works fine for this
> particular problem. I implement it below - the function walk_ivals is at
> the heart of it, I've made it as simple as possible and it's pretty
> clear that it is O(nlogn).
>
> The function that actually sorts the data is union(), and it's just a
> call to walk_ivals with callback the function acc() which is constant
> time, therefore union() itself has the same complexity as walk_ivals.
>
> There are no comments - I don't have the time to add any, sorry!
>
> --------------------------------------------------
> import datetime
> from collections import defaultdict
>
> def walk_ivals(ivals, callback, endvals=(-1, 1)):
>     endpoints = [(x, data) for ival, data in ivals for x in zip(ival, endvals)]
>     endpoints.sort()
>     for (endpoint, endval), data in endpoints:
>         callback(endpoint, endval, data)
>
> def union(ivals):
>     timelines = defaultdict(list)
>     mult = defaultdict(lambda: defaultdict(int))
>     def acc(date, step, data):
>         rel, prod = data
>         old_mult = mult[prod][rel]
>         mult[prod][rel] -= step
>         if not(old_mult and mult[prod][rel]):
>             rels = [rel for rel, m in mult[prod].iteritems() if m]
>             if timelines[prod] and timelines[prod][-1][0] == date:
>                 timelines[prod][-1][1] = rels
>             else:
>                 timelines[prod].append([date, rels])
>     walk_ivals(ivals, acc)
>     return timelines
>
> test_data = """5/3/2005, 5/3/2005, D, M
> 9/10/2005, 10/10/2005, S, V
> 3/15/2005, 7/15/2005, M, M
> 3/1/2005, 6/1/2005, S, M
> 5/15/2005, 7/20/2005, D, D
> 9/10/2005, 1/1/2140, M, V
> 2/1/2005, 5/3/2005, M, M"""
>
> def parse_date(date_string):
>     month, day, year = map(int, date_string.split('/'))
>     return datetime.date(year, month, day)
>
> def parse_data(data_string):
>     data = []
>     for line in data_string.split("\n"):
>         start, end, rel, prod = line.split(",")
>         start, end = map(parse_date, (start + datetime.timedelta(1), end))
>         rel, prod = rel.strip(), prod.strip()
>         data.append([(start, end), (rel, prod)])
>     return data
>
> def test():
>     ivals = parse_data(test_data)
>     timelines = union(ivals)
>     for prod, timeline in timelines.iteritems():
>         print "-"*20
>         print "Product", prod
>         for date, covers in timeline:
>             print date, ' & '.join(covers)
>
> if __name__ == '__main__':
>    test()      
> --------------------------------------------------
>
> Here is what it outputs:
>
> marigold:junk arno$ python intervals2.py
> --------------------
> Product M
> 2005-02-01 M
> 2005-03-01 S & M
> 2005-05-03 S & M & D
> 2005-05-04 S & M
> 2005-06-02 M
> 2005-07-16
> --------------------
> Product D
> 2005-05-15 D
> 2005-07-21
> --------------------
> Product V
> 2005-09-10 S & M
> 2005-10-11 M
> 2140-01-02
>
> --------------------------------------------------
>
> The date output is slightly different from yours - I didn't realise you
> had time intervals and now I don't have the time to change it, although
> it's only a cosmetic change (the end-date of each interval is the
> start-date of the next one minus 1 day).
>
> HTH
>
> PS: warning - I have done some minor editing of the code after pasting
> it, so it might break (although it should be fine).
>
> --
> Arnaud

Thanks Arnaud,
indeed I got this error when running the program:

Traceback (most recent call last):
  File "C:/Python26/timeint2.py", line 57, in <module>
    test()
  File "C:/Python26/timeint2.py", line 48, in test
    ivals = parse_data(test_data)
  File "C:/Python26/timeint2.py", line 42, in parse_data
    start, end = map(parse_date, (start + datetime.timedelta(1), end))
TypeError: cannot concatenate 'str' and 'datetime.timedelta' objects

easy fix:
        start, end = map(parse_date, (start , end))
        end+=datetime.timedelta(1)

I am assuming you want close-open intervals.
I also added another line to the test:

test_data = """5/3/2005, 5/3/2005, D, M
9/10/2005, 10/10/2005, S, V
3/15/2005, 7/15/2005, M, M
3/1/2005, 6/1/2005, S, M
5/15/2005, 7/20/2005, D, D
9/10/2005, 1/1/2140, M, V
2/1/2005, 5/3/2005, M, M
3/1/2004, 7/1/2004, M, M"""

result:
--------------------
Product M
2004-03-01 M
2004-07-02
2005-02-01 M
2005-03-01 S & M
2005-05-03 S & M & D
2005-05-04 S & M
2005-06-02 M
2005-07-16
--------------------
Product D
2005-05-15 D
2005-07-21
--------------------
Product V
2005-09-10 S & M
2005-10-11 M
2140-01-02

The output will have to be massaged to get the begin and end of each
interval, but otherwise this is more or less what I was looking for.

The walk_ivals and union function are 20 lines of pretty dense code. I
will have to study it a bit to understand how it really works.

Thanks again