What is the best data structure for a very simple spreadsheet?
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Jan 3 07:28:40 EST 2010
On Sun, 03 Jan 2010 03:27:46 -0800, vsoler wrote:
> My application would contain a limited set of "cells" represented by the
> instances of a Cell class:
>
> class Cell:
> ...
>
> A1=Cell(7)
> A2=Cell(2*A1)
> A3=Cell(3*A1+A2)
> A4=Cell(A3*4)
>
> Of course, A1 = 7, A2 = 14, A3 = 35 and A4 = 140
>
> Now, I somehow want to be able to show a dependency tree
>
> 1 level dependency trees
> A1: None
> A2: A1
> A3: A1, A2
> A4: A3
>
> All levels dependency trees
>
> A1: None
> A2: A1
> A3: A1, A2
> A4: A3, A2, A1
>
> Leaf + values dependency trees:
>
> A1: 7
> A2: A1=7, 2
> A3: 3, A1=7, 2
> A4: 3, A1=7, 2, 4
>
> What I'd like to know is:
>
> 1) what are, in your opinion, the basic elements of the Cell class?
def Cell(object):
def __init__(self, payload):
self.payload = payload
def __str__(self):
return str(self.payload)
def __float__(self):
return float(self.payload)
def dependency(self):
try:
return self.payload.dependency()
except AttributeError:
return ['None']
Cells can contain one of three things: a number, a string, or a formula.
The first two can be supported by providing a built-in Python object
(float or str) as payload. You can support formulae two ways that I can
think of:
(1) Provide a formula as a string, with a leading '='. Then, whenever you
want to evaluate such a cell, you fetch the string from the cell, parse
it, generate an arithmetic expression, and calculate it.
(2) Instead of parsing the formula on every single spreadsheet refresh,
use a couple of helper classes:
class Indirect(object):
def __init__(self, ref, sheet=None):
if sheet is None:
self.sheet = default_sheet()
else:
self.sheet = sheet
self.ref = ref
def __str__(self):
return str(self.sheet[self.ref])
def float(self):
return float(self.sheet[self.ref])
def dependency(self):
return [self.ref]
class Formula(object):
def __init__(self, x, y, op):
self.x = x
self.y = y
self.op = op
def eval(self):
return self.op(float(x), float(y))
def dependency(self):
return self.x.dependency(level) + self.y.dependency(level)
Then do something like this:
sheet = {}
sheet['A1'] = Cell(7)
sheet['A2'] = Cell(Formula(2, Indirect('A1'), operator.mul))
sheet['A3'] = Cell(
Formula(
Formula(3, Indirect('A1'), operator.mul),
Indirect('A2'),
operator.add
))
sheet['A4'] = Cell(Formula(Indirect('A3'), 4, operator.mul))
Then you only need to parse each human-readable formula like '=2*A1' once.
> 2) Do I need a parser to evaluate the formulas like “3*A1+A2”?
Yes.
> Can you recommend one library that already contains one?
Try PyParsing.
> 3) Do I need a tree
> data structure to represent my data? would the tree be an attribute of
> the class instance?
I suspect a dict will be faster.
To get the dependencies of each cell:
for key, value in sheet.items():
print key, value.dependency()
Keep in mind I haven't tested ANY of this -- it is entirely stream of
consciousness. I've never actually done this, so I have no idea whether
it is a good approach or not, but it seems to me that it might be.
--
Steven
More information about the Python-list
mailing list