Significant digits in a float?
Roy Smith
roy at panix.com
Tue Apr 29 09:38:33 EDT 2014
In article <535f0f9f$0$29965$c3e8da3$5496439d at news.astraweb.com>,
Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> On Mon, 28 Apr 2014 12:00:23 -0400, Roy Smith wrote:
>
> [...]
> > Fundamentally, these numbers have between 0 and 4 decimal digits of
> > precision,
>
> I'm surprised that you have a source of data with variable precision,
> especially one that varies by a factor of TEN THOUSAND.
OK, you're surprised.
> I don't know what justification you have for combining such a
> mix of data sources.
Because that's the data that was given to me. Real life data is messy.
> One possible interpretation of your post is that you have a source of
> floats, where all the numbers are actually measured to the same
> precision, and you've simply misinterpreted the fact that some of them
> look like they have less precision.
Another possibility is that they're latitude/longitude coordinates, some
of which are given to the whole degree, some of which are given to
greater precision, all the way down to the ten-thousandth of a degree.
> What reason do you have to think that something recorded to 14
> decimal places was only intended to have been recorded to 4?
Because I understand the physical measurement these numbers represent.
Sometimes, Steve, you have to assume that when somebody asks a question,
they have actually asked the question they intended to ask.
> Perhaps you need to explain why you're doing this, as it seems
> numerically broken.
These are latitude and longitude coordinates of locations. Some
locations are known to a specific street address. Some are known to a
city. Some are only known to the country. So, for example, the 38.0
value represents the latitude, to the nearest whole degree, of the
geographic center of the contiguous United States.
> I really think you need to go back to the source. Trying to infer the
> precision of the measurements from the accident of the string formatting
> seems pretty dubious to me.
Sure it is. But, like I said, real-life data is messy. You can wring
your hands and say, "this data sucks, I can't use it", or you can figure
out some way to deal with it. Which is the whole point of my post. The
best I've come up with is inferring something from the string formatting
and I'm hoping there is something better I can do.
> But I suppose if you wanted to infer the number of digits after the
> decimal place, excluding trailing zeroes (why, I do not understand), up
> to a maximum of four digits, then you could do:
>
> s = "%.4f" % number # rounds to four decimal places
> s = s.rstrip("0") # ignore trailing zeroes, whether significant or not
> count = len(s.split(".")[1])
This at least seems a little more robust than just calling str(). Thank
you :-)
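For reference, a minimal sketch of the quoted snippet wrapped in a function. The name decimal_places and the max_places parameter are made up here for illustration and are not from the original post.

```python
def decimal_places(number, max_places=4):
    """Count digits after the decimal point, ignoring trailing zeros.

    max_places is a hypothetical parameter and is assumed to be >= 1.
    """
    # Round to a fixed number of places so float noise such as
    # 38.230000000000004 does not inflate the count.
    s = "%.*f" % (max_places, number)
    # Drop trailing zeros; "38.0000" becomes "38." here.
    s = s.rstrip("0")
    # Whatever remains after the decimal point is the digit count.
    return len(s.split(".")[1])


print(decimal_places(38.0))
print(decimal_places(38.25))
print(decimal_places(-97.1234))
```

Unlike the str() approach, this maps 38.0 to zero decimal places rather than one.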
> Assuming all the numbers fit in the range where they are shown in non-
> exponential format.
They're latitude/longitude, so they all fall into [-180, 180].
> Perhaps you ought to be using Decimal rather than float.
Like I said, "The numbers are given to me as Python floats; I have no
control over that".
> > I'm willing to accept the fact that I won't be able to differentiate
> > between float("38.0") and float("38.0000"). Both of those map to 1,
> > which is OK for my purposes.
>
> That seems... well, "bizarre and wrong" are the only words that come to
> mind.
I'm trying to intuit, from the values I've been given, which coordinates
are likely to be accurate to within a few miles. I'm willing to accept
a few false negatives. If the number is float("38"), I'm willing to
accept that it might actually be float("38.0000"), and I might be
throwing out a good data point that I don't need to.
For the purpose I'm using the data for, excluding the occasional good
data point won't hurt me. Including the occasional bad one will.
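A rough sketch of that filtering policy, under stated assumptions: the helper names and the min_places threshold are hypothetical, and requiring two decimal places on both coordinates is only one possible reading of "within a few miles" (one degree of latitude spans roughly 69 miles, so a hundredth of a degree is under a mile).

```python
def decimal_places(number, max_places=4):
    # Same string-formatting trick as suggested in the thread:
    # round to max_places, strip trailing zeros, count what is left.
    s = ("%.*f" % (max_places, number)).rstrip("0")
    return len(s.split(".")[1])


def likely_precise(lat, lon, min_places=2):
    """Guess whether a coordinate pair was recorded to fine precision.

    min_places is an assumed threshold, not something from the data source.
    A precise value that happens to end in zeros (e.g. 38.5000) is rejected,
    which is the accepted false negative.
    """
    return (decimal_places(lat) >= min_places
            and decimal_places(lon) >= min_places)


points = [(38.0, -97.0), (40.7128, -74.006), (38.5, -97.1234)]
kept = [p for p in points if likely_precise(*p)]
print(kept)
```

The filter errs on the side of discarding: a point is kept only when both values show enough digits, so a good point may be dropped but a country-level centroid should not slip through.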
> By the way, you contradict yourself here. Earlier, you described 38.0 as
> having zero decimal places (which is wrong). Here you describe it as
> having one, which is correct, and then in a later post you describe it as
> having zero decimal places again.
I was sloppy there. I was copy-pasting data from my program output.
Observe:
>>> print float("38")
38.0
In standard engineering parlance, the string "38" represents a number
with a precision of +/- 1 unit. Unfortunately, Python's default str()
representation turns this into "38.0", which implies +/- 0.1 unit.
Floats represented as strings (at least in some disciplines, such as
engineering) include more information than just the value. By the
number of trailing zeros, they also include information about the
precision of the measurement. That information is lost when the string
is converted to an IEEE float. I'm trying to intuit that information
back, and as I mentioned earlier, am willing to accept that the
intuiting process will be imperfect. There is real-life value in
imperfect processes.
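A small illustration of the loss described above. Decimal is used here only to show what the original strings carried; it is not an option when the values already arrive as floats.

```python
from decimal import Decimal

# As strings (or Decimals), the two values make different precision claims.
coarse = Decimal("38")
fine = Decimal("38.0000")
print(coarse.as_tuple().exponent)  # number of decimal places, negated
print(fine.as_tuple().exponent)

# Once converted to float, the two are indistinguishable.
same = float("38") == float("38.0000")
print(same)
print(repr(float("38.0000")))
```

The Decimal exponents differ (0 versus -4), while the two floats compare equal and both print as 38.0.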