RE: [Python-Dev] Re: Decimal data type issues
[Kevin Jacobs]
#- >[Jewett, Jim J]
#- >#- Under the current implementation:
#- >#-     (0, (2, 4, 0, 0, 0), -4)
#- >#- is not quite the same as
#- >#-     (0, (2, 4), -1)
#- >#- Given this, it should be possible for the user to specify
#- >#- (at creation) which is desired.
#- >
#- >It *is* possible:
#- >
#- >>>> Decimal('2.4000')
#- >Decimal( (0, (2, 4, 0, 0, 0), -4) )
#- >
#- >>>> Decimal('2.4')
#- >Decimal( (0, (2, 4), -1) )
#-
#- <sarcasm>Great!</sarcasm>.  One of my previous posts specifically listed
#- that I didn't want to have to pre-parse and reformulate string literals
#- to achieve the desired precision and scale.  The "external"

<lost> what? </lost>  :p

I still don't understand why you want that.

#- >If you construct using precision, and the precision is smaller than the
#- >quantity of digits you provide, you'll get rounded, but if the precision
#- >is greater than the quantity of digits you provide, you don't get filled
#- >with zeros.
#-
#- Rounding is exactly what should be done if one exceeds the desired
#- precision.  Using less than the desired precision (i.e., not filling in
#- zeros) may be okay for many applications.  This is because any operations
#- on the value will have to be performed with the precision defined in the
#- decimal context.  Thus, the results will be identical, other than that the
#- Decimal instance may not store the maximum precision available by the
#- schema.

If I don't misunderstand, you're saying that storing additional zeroes is important to your future operations?

Let's make an example.  If I have '2.4000', I go into decimal and get:
>>> Decimal('2.4000')
Decimal( (0, (2, 4, 0, 0, 0), -4) )
If I have '2.4', I go into decimal and get:
>>> Decimal('2.4')
Decimal( (0, (2, 4), -1) )
Are you trying to say that you want Decimal to fill up that number with zeroes...
Decimal('2.4', scale=4)    # behaviour not intended, just an example
Decimal( (0, (2, 4, 0, 0, 0), -4) )
...just to represent that you have that precision in your measurements and reflect that in future arithmetic operations?

If yes, I think that: a) '2.4' and '2.4000' will behave identically in future operations; b) why do you need to represent the precision of your measurement in the number?

.	Facundo
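The two internal forms quoted above are easy to inspect in the decimal module as it eventually shipped: as_tuple() returns exactly this (sign, digits, exponent) triple. A minimal check, assuming the released API rather than the 2004 sandbox module:

    from decimal import Decimal

    print(Decimal('2.4000').as_tuple())  # DecimalTuple(sign=0, digits=(2, 4, 0, 0, 0), exponent=-4)
    print(Decimal('2.4').as_tuple())     # DecimalTuple(sign=0, digits=(2, 4), exponent=-1)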
Batista, Facundo wrote:
[Kevin Jacobs]
#- <sarcasm>Great!</sarcasm>.  One of my previous posts specifically listed
#- that I didn't want to have to pre-parse and reformulate string literals
#- to achieve the desired precision and scale.  The "external"
<lost> what? </lost> :p
I still don't understand why you want that.
It seems that Jim and I want to be able to easily create Decimal instances that conform to a pre-specified (maximum) scale and (maximum) precision.  The motivation for this is clearly explained in that section of the PostgreSQL manual that I sent the other day, i.e., numeric and decimal values in SQL are specified in terms of scale and precision parameters.  Thus, I would like to create decimal instances that conform to those schema -- i.e., they would be rounded appropriately and overflow errors generated if they exceeded either the maximum precision or scale.  e.g.:

  Decimal('20000.001', precision=4, scale=0) === Decimal('20000')
  Decimal('20000.001', precision=4, scale=0) raises an overflow exception
  Decimal('20000.001', precision=5, scale=3) raises an overflow exception
  Decimal('200.001',   precision=6, scale=3) === Decimal('200.001')
  Decimal('200.000',   precision=6, scale=3) === Decimal('200') or Decimal('200.000')
      (depending on if precision and scale are interpreted as absolutes or maximums)

In order to be able to accomplish this behavior in an "external" library, either the literals would have to be pre-parsed and manipulated, OR an intermediate Decimal value would be created using the raw literal, which would then be used to detect overflows and then apply the necessary rounding criteria based on the desired (maximum) scale.

Hopefully this is somewhat clearer.
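A rough sketch of the kind of helper being asked for, written against the decimal module as it was eventually released; the function name, and the choice to treat scale as exact and precision as a maximum, are illustrative assumptions rather than anything specified in the thread:

    from decimal import Decimal, ROUND_HALF_EVEN

    def sql_decimal(literal, precision, scale):
        # Round (or pad) to exactly `scale` fractional digits.
        quantum = Decimal(1).scaleb(-scale)          # scale=3 -> Decimal('0.001')
        d = Decimal(literal).quantize(quantum, rounding=ROUND_HALF_EVEN)
        # A SQL NUMERIC(precision, scale) column allows at most
        # precision - scale digits before the decimal point.
        sign, digits, exponent = d.as_tuple()
        if len(digits) + exponent > precision - scale:
            raise OverflowError("value does not fit NUMERIC(%d, %d)" % (precision, scale))
        return d

Under these assumptions, sql_decimal('20000.001', 5, 0) gives Decimal('20000'), while sql_decimal('20000.001', 4, 0) raises OverflowError.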
If I don't misunderstand, you're saying that storing additional zeroes is important to your future operations?
Not for mine. I would be content with interpreting the scale and precision parameters as maximums rather than absolutes. However, it is important to poll other users, since their applications may be less forgiving.
Decimal('2.4', scale=4)    # behaviour not intended, just an example
Decimal( (0, (2, 4, 0, 0, 0), -4) )
...just to represent that you have that precision in your measurements and reflect that in future arithmetic operations?
If yes, I think that: a) '2.4' and '2.4000' will behave identically in future operations; b) why do you need to represent the precision of your measurement in the number?
Neither.  It is well understood that operations on Decimal instances must rely on the context.  The idea here is to overflow and round correctly upon instance creation without going through a great deal of additional effort.

Thanks,
-Kevin
On Tue, Apr 20, 2004, Kevin Jacobs wrote:
Neither. It is well understood that operations on Decimal instances must rely on the context. The idea here is to overflow and round correctly upon instance creation without going through a great deal of additional effort.
Why do you think this is a "great deal" of effort?  I still have some trouble understanding why you think this should go into Decimal rather than being an add-on.
--
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/
"I used to have a .sig but I found it impossible to please everyone..."  --SFJ
Aahz wrote:
On Tue, Apr 20, 2004, Kevin Jacobs wrote:
Neither. It is well understood that operations on Decimal instances must rely on the context. The idea here is to overflow and round correctly upon instance creation without going through a great deal of additional effort.
Why do you think this is a "great deal" of effort? I still have some trouble understanding why you think this should go into Decimal rather than being an add-on.
It could be an add-on, but it seems a common and fundamental enough operation that it should be well supported by the core library.  External implementations may be less efficient, as they cannot take advantage of the internal implementation details that a better integrated solution would offer.

This isn't something I am willing to go to war on, but at the same time, I'm willing to expend some effort to lobby for inclusion.  Either way, I will have the necessary infrastructure to accomplish my aims, though my goal is for everyone to have it without re-inventing the wheel.  Silence on this topic benefits nobody.

Thanks,
-Kevin
On Tue, Apr 20, 2004, Kevin Jacobs wrote:
Aahz wrote:
On Tue, Apr 20, 2004, Kevin Jacobs wrote:
Neither. It is well understood that operations on Decimal instances must rely on the context. The idea here is to overflow and round correctly upon instance creation without going through a great deal of additional effort.
Why do you think this is a "great deal" of effort? I still have some trouble understanding why you think this should go into Decimal rather than being an add-on.
It could be an add-on, but it seems a common and fundamental enough operation that it should be well supported by the core library. External implementations may be less efficient, as they cannot take advantage of the internal implementation details that a better integrated solution would offer.
This isn't something I am willing to go to war on, but at the same time, I'm willing to expend some effort to lobby for inclusion. Either way, I will have the necessary infrastructure to accomplish my aims, though my goal is for everyone to have it without re-inventing the wheel. Silence on this topic benefits nobody.
How is your need here more common and fundamental than Money?  I see here a repeat of the discussions around the new datetime module, where the decision was made to keep the core implementation dirt-simple, with enough hooks for people to add functionality.  What makes this case different?
--
Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/
"I used to have a .sig but I found it impossible to please everyone..."  --SFJ
Kevin Jacobs
This isn't something I am willing to go to war on, but at the same time, I'm willing to expend some effort to lobby for inclusion. Either way, I will have the necessary infrastructure to accomplish my aims, though my goal is for everyone to have it without re-inventing the wheel. Silence on this topic benefits nobody.
I thought I'd try to comment, based on this.  After all, I do have an interest in the issue (I'm an Oracle user), and so if there is an issue, it could well affect me, so I should at least be sure I understand the point.

... and I discovered that I understand Decimal far less than I thought I did.  But after some experimentation, and reading of the spec, I think that I've hit on a key point:

The internal representation of a Decimal instance, and specifically the number of digits of precision stored internally, has no impact on anything, *except when the instance is converted to a string*.

The reason for this is that every possible operation on a Decimal instance uses context, with the sole exception of the "convert to string" operations (sections 4 and 5 of the spec).

As a result of this, I'm not sure that it's valid to care about the internal representation of a Decimal instance.
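A small illustration of that point against the decimal module as it later shipped (the exact spellings are an assumption here, since the thread predates the final API):

    from decimal import Decimal, getcontext

    getcontext().prec = 4
    # Construction and str() ignore context: every digit of the literal is kept.
    print(Decimal('20000.001'))        # 20000.001
    # Arithmetic honours context: the result is rounded to 4 significant digits.
    print(Decimal('20000.001') + 0)    # 2.000E+4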
It seems that Jim and I want to be able to easily create Decimal instances that conform to a pre-specified (maximum) scale and (maximum) precision.
But here we have that same point - Decimal instances do not "conform to" a scale/precision.
The motivation for this is clearly explained in that section of the PostgreSQL manual that I sent the other day. i.e., numeric and decimal values in SQL are specified in terms of scale and precision parameters. Thus, I would like to create decimal instances that conform to those schema -- i.e., they would be rounded appropriately and overflow errors generated if they exceeded either the maximum precision or scale.
OK, so what you are talking about is rounding during construction. Or is it? Hang on, and let's look at your examples.
e.g.:
Decimal('20000.001', precision=4, scale=0) === Decimal('20000')
This works fine with the current Decimal:
>>> Decimal("20000.001").round(prec=4) == Decimal("20000")
True
Do you dislike the need to construct an exact Decimal, and then round it? On what grounds? I got the impression that you thought it would be "hard", but I don't think the round() method is too hard to use... (Although I would say that the documentation in the PEP is currently very lacking in its coverage of how to use the type - I found the round() method after a lot of experimentation. Before the Decimal module is ready for prime time, it needs some serious documentation effort).
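For reference, the round() method used above belonged to the sandbox implementation of the time; in the decimal module as eventually released, the closest spelling for rounding at construction time is Context.create_decimal (mentioned only as a pointer, not as anything proposed in the thread):

    from decimal import Context

    ctx = Context(prec=4)
    # create_decimal honours the context, so the literal is rounded to 4 digits.
    print(ctx.create_decimal("20000.001"))   # 2.000E+4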
Decimal('20000.001', precision=4, scale=0) raises an overflow exception
Hang on - this example is the same as the previous one, but you want a different result! In any case, the General Decimal Arithmetic spec doesn't have a concept of overflow when a precision is exceeded (only when the implementation-defined maximum exponent is exceeded), so I'm not sure what you want to happen here in the context of the spec.
Decimal('20000.001', precision=5, scale=3) raises an overflow exception
A similar comment about overflow applies here. I can imagine that you want to know if information has been lost, but that's no problem - check like this:
>>> Decimal("20000.001").round(prec=5) == Decimal("20000.001")
False
Decimal('200.001', precision=6, scale=3) === Decimal('200.001')
Again, not an issue:
>>> Decimal("200.001").round(prec=6) == Decimal("200.001")
True
Decimal('200.000', precision=6, scale=3) === Decimal('200') or Decimal('200.000') (depending on if precision and scale are interpreted as absolutes or maximums)
This doesn't make sense, given that Decimal("200") == Decimal("200.000").  Unless your use of === is meant to imply "has the same internal representation as", in which case I don't believe that you have a right to care what the internal representation is.

I've avoided considering scale too much here - Decimal has no concept of scale, only precision.  But that's effectively just a matter of multiplying by the appropriate power of 10, so shouldn't be a major issue.

Apologies if I've completely misunderstood or misrepresented your problem here.  If it's any consolation, I've learned a lot in the process of attempting to comment.

Paul.
--
This signature intentionally left blank
[Paul Moore]
... and I discovered that I understand Decimal far less than I thought I did. But after some experimentation, and reading of the spec, I think that I've hit on a key point:
The internal representation of a Decimal instance, and specifically the number of digits of precision stored internally, has no impact on anything, *except when the instance is converted to a string*
The reason for this is that every possible operation on a Decimal instance uses context, with the sole exception of the "convert to string" operations (sections 4 and 5 of the spec).
As a result of this, I'm not sure that it's valid to care about the internal representation of a Decimal instance.
It's not quite right.  While this is a floating-point arithmetic, it was designed to "preserve scale" so long as precision isn't exceeded.  Because you can set precision to anything, this is more powerful than it sounds at first.

For example, Decimal("20.32") - Decimal("8.02") is displayed as Decimal("12.30").  That final trailing zero is effectively inherited from "the scales" of the inputs, and is entirely thanks to the fact that the internal arithmetic is unnormalized.  More info can be found here:

    http://www2.hursley.ibm.com/decimal/IEEE-cowlishaw-arith16.pdf

    However, if unnormalized floating-point is used with sufficient
    precision to ensure that rounding does not occur during simple
    calculations, then exact scale-preserving (type-preserving) arithmetic
    is possible, and the performance and other overheads of normalization
    are avoided.

and in Cowlishaw's FAQ:

    http://www2.hursley.ibm.com/decimal/decifaq4.html#unnari
    Why is decimal arithmetic unnormalized?
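A quick demonstration of that scale preservation, using the decimal module as released (the behaviour matches the description above as long as the context precision is not exceeded):

    from decimal import Decimal

    # Unnormalized arithmetic: trailing zeros inherited from the operands survive.
    print(Decimal("20.32") - Decimal("8.02"))   # 12.30
    print(Decimal("1.30") + Decimal("1.20"))    # 2.50
    print(Decimal("1.3") * Decimal("1.20"))     # 1.560  (result scale = sum of input scales)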
... I've avoided considering scale too much here - Decimal has no concept of scale, only precision.
As above, it was carefully designed to *support* apps that need to preserve scale, but as a property that falls out of a more powerful and more general arithmetic. Note that the idea that various legacy apps *agree* on what "preserve scale" means to begin with is silly -- there's a large variety of mutually incompatible rules in use (e.g., for multiplication there's at least "the scale of the result is the sum of the scales of the multiplicands", "is the larger of the multiplicands' scales", "is the smaller of the multiplicands' scales", and "is a fixed value independent of the multiplicands' scales"). Decimal provides a single arithmetic capable of emulating all those (and more), but it's up to the app to use enough precision to begin with, and rescale according its own bizarre rules.
But that's effectively just a matter of multiplying by the appropriate power of 10, so shouldn't be a major issue.
It can also require rounding, so it's not wholly trivial. For example, what's 2.5 * 2.5? Under the "larger (or smaller) input scale" rules, it's 6.2 under "banker's rounding" or 6.3 under "European, and American tax rounding" rules. Decimal won't give either of those directly (unless precision is set to the ridiculously low 2), but it's easy to get either of them *using* Decimal (or to get 6.25 directly, which is the least surprising result to people).
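To make that concrete, here is one way to get each of those results by rescaling the exact product; the quantize() call and the '0.1' quantum encode the "larger of the input scales" rule by hand, since the module itself applies no such rule:

    from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

    exact = Decimal("2.5") * Decimal("2.5")    # Decimal('6.25') -- the exact, least surprising result
    rule = Decimal("0.1")                      # "larger of the input scales": one fractional digit
    print(exact.quantize(rule, rounding=ROUND_HALF_EVEN))   # 6.2  (banker's rounding)
    print(exact.quantize(rule, rounding=ROUND_HALF_UP))     # 6.3  ("European/American tax" rounding)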
"Tim Peters"
[Paul Moore]
As a result of this, I'm not sure that it's valid to care about the internal representation of a Decimal instance.
It's not quite right. While this is a floating-point arithmetic, it was designed to "preserve scale" so long as precision isn't exceeded. [...]
Tim,

Thanks for taking the time to clarify this.  I'm going to need to think about this some more to grasp the implications, but I see where it's intended to apply.

FWIW, my feeling now is that Kevin's requirement is something that can be handled by a subclass of Decimal, or a class which contains a Decimal.  I'm not convinced by Kevin's suggestion that the operations needed are "hard" - code complexity can (and should) be encapsulated inside the subclass, and I don't see the need for runtime inefficiency.

Specifically, I can't see why, if you can first get an (effectively, according to whatever rules you want to apply) exact Decimal representation of your "number", you can't do any further scaling and changing of precision, etc, entirely with Decimal instances, and with minimal loss of runtime efficiency.

Maybe a concrete example of what Kevin is after (I have a database background, so I'm happy if it's based around SQL NUMBER datatypes) would clarify his concerns.

Paul.
--
This signature intentionally left blank
[Paul Moore]
... FWIW, my feeling now is that Kevin's requirement is something that can be handled by a subclass of Decimal, or a class which contains a Decimal.
I should emphasize that Decimal has always been intended to be an implementation of IBM's proposed standard for decimal arithmetic, not to emulate some arbitrary bag of behaviors picked from a particular database, or other app (let alone to emulate the union of all schemes in current use). I'm especially keen to keep it "pure" at the start of its life. There are extensions I'd like to see too, but I'm keeping quiet about them for now (although adding a .copy() method to context objects is clearly harmless).
I'm not convinced by Kevin's suggestion that the operations needed are "hard" - code complexity can (and should) be encapsulated inside the subclass, and I don't see the need for runtime inefficiency.
The current implementation is so inefficient in so many ways I'm not concerned about efficiency arguments at all now. The point is much more to get the spec implemented correctly at first.
Specifically, I can't see why, if you can first get an (effectively, according to whatever rules you want to apply) exact Decimal representation of your "number", you can't do any further scaling and changing of precision, etc, entirely with Decimal instances, and with minimal loss of runtime efficiency.
Indeed, the individual digits are stored as individual decimal digits right now, so picking them apart is as cheap as it gets.
Maybe a concrete example of what Kevin is after (I have a database background, so I'm happy if it's based around SQL NUMBER datatypes) would clarify his concerns.
That's fine -- although I don't expect it to influence the first release of Decimal (which is aimed at implementing a specific standard).
[Kevin Jacobs] ...
Hopefully this is somewhat clearer.
Sorry, it really isn't to me. When you extract "a number" from one of your databases, what do you get from it, concretely? A triple of (decimal string with embedded decimal point, integer precision, integer scale)? An integer value with an integer scale? A decimal string w/o embedded decimal point and an integer scale? Etc. I gave up then after the first two examples:
... Thus, I would like to create decimal instances that conform to those schema -- i.e., they would be rounded appropriately and overflow errors generated if they exceeded either the maximum precision or scale. e.g.:
Decimal('20000.001', precision=4, scale=0) === Decimal('20000')
Decimal('20000.001', precision=4, scale=0) raises an overflow exception
The inputs on those two lines look identical to me, so I'm left more lost than before -- you can't really want Decimal('20000.001', precision=4, scale=0) *both* to return 20000 *and* raise an overflow exception.

In any case, that's not what the IBM standard supports.  Context must be respected in its abstract from-string operation, and maximum precision is a component of context.  If context's precision is 4, then from-string('20000.001') would round to the most-significant 4 digits (according to the rounding mode specified in context), and signal both the "inexact" and "rounded" conditions.

What "signal" means: if the trap-enable flags are set in context for either or both of those conditions, an exception will be raised; if the trap-enable flags for both of those conditions are clear, then the inexact-happened and rounded-happened status flags in context are set, and you can inspect them or not (as you please).

That's what the standard provides.  More than that would be extensions to the standard.  The standard is precise about semantics, and it's plenty to implement (just) all of that at the start.
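For readers following along with the module as released, the trap/flag machinery described above looks roughly like this (the spellings are the released API and may differ in detail from what existed in 2004):

    from decimal import Context, Inexact, Rounded

    ctx = Context(prec=4)
    d = ctx.create_decimal("20000.001")             # from-string honouring context: rounds to 4 digits
    print(d)                                        # 2.000E+4
    print(ctx.flags[Inexact], ctx.flags[Rounded])   # both status flags are now set

    ctx.traps[Inexact] = True                       # enable the trap instead of just setting the flag
    ctx.create_decimal("20000.001")                 # now raises decimal.Inexact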
Tim Peters wrote:
[Kevin Jacobs] ...
Hopefully this is somewhat clearer.
Sorry, it really isn't to me. When you extract "a number" from one of your databases, what do you get from it, concretely? A triple of (decimal string with embedded decimal point, integer precision, integer scale)? An integer value with an integer scale? A decimal string w/o embedded decimal point and an integer scale? Etc.
Sorry for all of the unnecessary confusion.  I am in and out of meetings all of this week and have been trying to keep up several technical conversations in 5 minute breaks between sessions.  As such, my examples were flawed.  However, I now have _10_ minutes to answer some of your questions, so hopefully I can explain slightly better.

First, I get decimal numbers from many database adapters, flat files, XML files, in a variety of string formats, mainly.  Virtually all are decimal string representations (i.e., a string of numbers with an optional decimal point thrown in somewhere).  Not all of them encode scale explicitly by adding trailing zeros, though most of the time they do conform to a given maximum precision.  A few sources do provide decimals as an integer with an explicit decimal scale exponent.
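For that last case (an integer value with an explicit scale exponent), a construction path that avoids string munging is straightforward with the released module; the helper name here is purely illustrative:

    from decimal import Decimal

    def from_int_and_scale(value, scale):
        # e.g. (2400001, 3) -> Decimal('2400.001'); scaleb() shifts the exponent exactly.
        return Decimal(value).scaleb(-scale)

    print(from_int_and_scale(2400001, 3))   # 2400.001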
Thus, I would like to create decimal instances that conform to those schema -- i.e., they would be rounded appropriately and overflow errors generated if they exceeded either the maximum precision or scale. e.g.:
Decimal('20000.001', precision=4, scale=0) === Decimal('20000')
Decimal('20000.001', precision=4, scale=0) raises an overflow exception
The inputs on those two lines look identical to me, so I'm left more lost than before -- you can't really want Decimal('20000.001', precision=4, scale=0) *both* to return 20000 *and* raise an overflow exception.
Clearly not.  The first example was supposed to have a precision of 5:

  Decimal('20000.001', precision=5, scale=0) === Decimal('20000')
In any case, that's not what the IBM standard supports. Context must be respected in its abstract from-string operation, and maximum precision is a component of context. If context's precision is 4, then
from-string('20000.001')
would round to the most-significant 4 digits (according to the rounding mode specified in context), and signal both the "inexact" and "rounded" conditions. What "signal" means: if the trap-enable flags are set in context for either or both of those conditions, an exception will be raised; if the trap-enable flags for both of those conditions are clear, then the inexact-happened and rounded-happened status flags in context are set, and you can inspect them or not (as you please).
Yes -- this is what I would like to have happen, but with a short-cut to support this common operation.  My previous comment about "great difficulty" was not in terms of the implementation, but rather the number of times it would have to be developed independently, if not readily available.  However, I am still not aware of a trivial way to enforce a given scale when creating decimal instances.  As you point out in a separate e-mail, there are many operations that in effect preserve scale due to unnormalized arithmetic operations.

However, this conversation is somewhat academic since there does not seem to be a consensus that adding support for construction with scale and precision parameters is of general use.  So I will create my own decimal subclass and/or utility function and be on my merry way.

Thanks,
-Kevin
[Kevin Jacobs]
Sorry for all of the unnecessary confusion. ...
No problem.
First, I get decimal numbers from many database adapters, flat files, XML files, in a variety of string formats, mainly. Virtually all are decimal string representations (i.e., a string of numbers with an optional decimal point thrown in somewhere).
Then what problem are you trying to address when reading these numbers in? Is it that you don't trust, e.g., that a column of a database declared with some specific (precision, scale) pair enforced its own restrictions? Using Decimal.Decimal(string) exactly as-is today, you'll get exactly whatever number a string-of-digits-possibly-with-a-decimal-point specifies.
Not all of them encode scale explicitly by adding trailing zeros, though most of the time they do conform to a given maximum precision. A few sources do provide decimals as an integer with an explicit decimal scale exponent.
The spec doesn't supply any shortcuts for changing the exponent, because multiplication and division by powers of 10 are exact (barring underflow and overflow). Perhaps a shortcut for that would be handy, but it's not semantically necessary. ...
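As it turned out, the module that was eventually released did grow such a shortcut: scaleb() adjusts the exponent directly, and exactly, barring overflow/underflow.  Noted here only as a pointer, not as part of the spec discussion above:

    from decimal import Decimal

    print(Decimal("2.4").scaleb(3))     # 2.4E+3
    print(Decimal("2400").scaleb(-2))   # 24.00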
Clearly not. The first example was supposed to have a precision of 5:
Decimal('20000.001', precision=5, scale=0) === Decimal('20000')
So you're really doing a data conversion step? That is, you don't really want the numbers your data source gives you, but want to transform them first on input? You *can*, of course, it just strikes me as an odd desire.
In any case, that's not what the IBM standard supports. Context must be respected in its abstract from-string operation, and maximum precision is a component of context. If context's precision is 4, then
from-string('20000.001')
would round to the most-significant 4 digits (according to the rounding mode specified in context), and signal both the "inexact" and "rounded" conditions. What "signal" means: if the trap-enable flags are set in context for either or both of those conditions, an exception will be raised; if the trap-enable flags for both of those conditions are clear, then the inexact-happened and rounded-happened status flags in context are set, and you can inspect them or not (as you please).
Yes -- this is what I would like to have happen, but with a short-cut to support this common operation.
What, exactly, is "this common operation"? Everything I described in that paragraph happens automatically as a result of a single from-string operation. Is it that you're reading in 10 numbers and want a unique precision for each one? That would surprise me. For example, if you're reading a column of a database, I'd expect that a single max precision would apply to each number in that column.
My previous comment about "great difficulty" was not in terms of the implementation, but rather the number of times it would have to be developed independently, if not readily available.
Well, I don't know what "it" is, exactly, so I'll shut up.
However, I am still not aware of a trivial way to enforce a given scale when creating decimal instances.
Sorry, I don't even know what "enforce a given scale" *means*.  If, for example, you want to round every input to the nearest penny, set context to use the rounding method you mean by "the nearest", define a penny object:

    penny = Decimal("0.01")

and then pass that to the quantize() method on each number:

    to_pennies = [Decimal(n).quantize(penny) for n in input_strings]

Then every result will have exactly two digits after the decimal point.  If for some mysterious reason you actually want to raise an exception if any information is lost during this step, set context's inexact-trap flag once at the start.  If you want to raise an exception even if just a trailing 0 is lost, set context's rounding-trap flag once at the start.
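A runnable rendering of that recipe against the released module (the input strings are made up for illustration):

    from decimal import Decimal, getcontext, Inexact

    penny = Decimal("0.01")
    input_strings = ["20.325", "8.02"]                    # hypothetical inputs
    to_pennies = [Decimal(n).quantize(penny) for n in input_strings]
    print(to_pennies)                                     # [Decimal('20.32'), Decimal('8.02')]

    getcontext().traps[Inexact] = True                    # trap instead of silently setting the flag
    Decimal("20.325").quantize(penny)                     # now raises decimal.Inexact: a digit was lost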
As you point out in a separate e-mail, there are many operations that in effect preserve scale due to unnormalized arithmetic operations.
Yes.
However, this conversation is somewhat academic since there does not seem to be a consensus that adding support for construction with scale and precision parameters are of general use.
Except I'd probably oppose them at this stage even if they were of universal use: we're trying to implement a specific standard here at the start. Note that a construction method that honors context is required by the spec, and precision is part of context.
So I will create my own decimal subclass and/or utility function and be on my merry way.
Don't forget to share it! I suspect that's the only way I'll figure out what you're after <wink>.
participants (5)

- Aahz
- Batista, Facundo
- Kevin Jacobs
- Paul Moore
- Tim Peters