Significant digits in a float?

Tue Apr 29 09:38:33 EDT 2014

In article <535f0f9f$0$29965$c3e8da3$5496439d at news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:

> On Mon, 28 Apr 2014 12:00:23 -0400, Roy Smith wrote:
> 
> [...]
> > Fundamentally, these numbers have between 0 and 4 decimal digits of
> > precision, 
> 
> I'm surprised that you have a source of data with variable precision, 
> especially one that varies by a factor of TEN THOUSAND.

OK, you're surprised.

> I don't know what justification you have for combining such a
> mix of data sources.

Because that's the data that was given to me.  Real life data is messy.

> One possible interpretation of your post is that you have a source of 
> floats, where all the numbers are actually measured to the same 
> precision, and you've simply misinterpreted the fact that some of them 
> look like they have less precision.

Another possibility is that they're latitude/longitude coordinates, some 
of which are given to the whole degree, some of which are given to 
greater precision, all the way down to the ten-thousandth of a degree.

> What reason do you have to think that something recorded to 14 
> decimal places was only intended to have been recorded to 4?

Because I understand the physical measurement these numbers represent.  
Sometimes, Steve, you have to assume that when somebody asks a question, 
they have actually asked the question they intended to ask.

> Perhaps you need to explain why you're doing this, as it seems 
> numerically broken.

These are latitude and longitude coordinates of locations.  Some 
locations are known to a specific street address.  Some are known to a 
city.  Some are only known to the country.  So, for example, the 38.0 
value represents the latitude, to the nearest whole degree, of the 
geographic center of the contiguous United States.

> I really think you need to go back to the source. Trying to infer the 
> precision of the measurements from the accident of the string formatting 
> seems pretty dubious to me.

Sure it is.  But, like I said, real-life data is messy.  You can wring 
your hands and say, "this data sucks, I can't use it", or you can figure 
out some way to deal with it.  Which is the whole point of my post.  The 
best I've come up with is inferring something from the string formatting, 
and I'm hoping there is something better I could do.

> But I suppose if you wanted to infer the number of digits after the 
> decimal place, excluding trailing zeroes (why, I do not understand), up 
> to a maximum of four digits, then you could do:
> 
> s = "%.4f" % number  # rounds to four decimal places
> s = s.rstrip("0")  # ignore trailing zeroes, whether significant or not
> count = len(s.split(".")[1])

This at least seems a little more robust than just calling str().  Thank 
you :-)
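
For reference, the quoted three-liner can be wrapped up as a small function.  
This is only a sketch: the name decimal_places and the max_places parameter 
are made up for illustration, and partition() is used instead of split() so 
that max_places=0 does not raise an IndexError.

```python
def decimal_places(number, max_places=4):
    """Count digits after the decimal point, ignoring trailing zeros.

    The value is first rounded to max_places decimal places, so the
    result is always between 0 and max_places.
    """
    s = "%.*f" % (max_places, number)   # e.g. 38.5 -> "38.5000"
    _, _, frac = s.partition(".")       # frac is "" if there is no "."
    return len(frac.rstrip("0"))        # drop trailing zeros, then count


if __name__ == "__main__":
    for value in (38.0, 38.5, 38.1234, 38.12345678, -77.0365):
        print(value, decimal_places(value))
```

Note that 38.0 and 38.0000 are the same float, so both come out as 0 here; 
the function cannot tell them apart.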

> Assuming all the numbers fit in the range where they are shown in non-
> exponential format.

They're latitude/longitude, so they all fall into [-180, 180].
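
A side note on the exponential-format caveat: it applies to str(), not to 
"%.4f".  The range [-180, 180] bounds how large the values get, but not how 
close to zero.  str() switches to exponent notation for very small 
magnitudes, whereas "%f" formatting always produces fixed-point output.  A 
quick check (the helper name fixed4 is made up for illustration):

```python
def fixed4(number):
    """Format a float with exactly four digits after the decimal point."""
    return "%.4f" % number


if __name__ == "__main__":
    tiny = 0.00001             # a coordinate very close to the equator
    print(str(tiny))           # str() uses exponent notation here
    print(fixed4(tiny))        # fixed-point, rounded to four places
    print(fixed4(180.0))
    print(fixed4(-180.0))
```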

> Perhaps you ought to be using Decimal rather than float.

Like I said, "The numbers are given to me as Python floats; I have no 
control over that".

> > I'm willing to accept the fact that I won't be able to differentiate
> > between float("38.0") and float("38.0000").  Both of those map to 1,
> > which is OK for my purposes.
> 
> That seems... well, "bizarre and wrong" are the only words that come to 
> mind.

I'm trying to intuit, from the values I've been given, which coordinates 
are likely to be accurate to within a few miles.  I'm willing to accept 
a few false negatives.  If the number is float("38"), I'm willing to 
accept that it might actually be float("38.0000"), and I might be 
throwing out a good data point that I don't need to.

For the purpose I'm using the data for, excluding the occasional good 
data point won't hurt me.  Including the occasional bad one will.
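
As a sketch of that trade-off: one degree of latitude is roughly 69 miles, 
so two decimal places is on the order of a mile or less.  The threshold 
MIN_PLACES = 2 and the names below are assumptions for illustration, not 
something stated in this thread.  Requiring BOTH coordinates to pass is the 
conservative choice: it throws away some good points but lets fewer bad 
ones through.

```python
MIN_PLACES = 2  # assumed threshold; tune it for the accuracy you need


def decimal_places(number, max_places=4):
    """Count digits after the decimal point, ignoring trailing zeros."""
    s = "%.*f" % (max_places, number)
    _, _, frac = s.partition(".")
    return len(frac.rstrip("0"))


def probably_precise(lat, lon, min_places=MIN_PLACES):
    """Guess whether a coordinate pair was recorded to fine precision.

    Both values must show at least min_places decimal digits, so a
    coarse point is unlikely to slip through (at the cost of rejecting
    some precise points that happen to end in zeros).
    """
    return (decimal_places(lat) >= min_places
            and decimal_places(lon) >= min_places)


if __name__ == "__main__":
    points = [(38.0, -97.0), (40.7128, -74.006), (51.5, -0.1)]
    kept = [p for p in points if probably_precise(*p)]
    print(kept)
```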

> By the way, you contradict yourself here. Earlier, you described 38.0 as 
> having zero decimal places (which is wrong). Here you describe it as 
> having one, which is correct, and then in a later post you describe it as 
> having zero decimal places again.

I was sloppy there.  I was copy-pasting data from my program output.  
Observe:

>>> print(float("38"))
38.0

In standard engineering parlance, the string "38" represents a number 
with a precision of +/- 1 unit.  Unfortunately, Python's default str() 
representation turns this into "38.0", which implies +/- 0.1 unit.

Floats represented as strings (at least in some disciplines, such as 
engineering) include more information than just the value.  By the 
number of trailing zeros, they also include information about the 
precision of the measurement.  That information is lost when the string 
is converted to an IEEE float.  I'm trying to intuit that information 
back, and as I mentioned earlier, am willing to accept that the 
intuiting process will be imperfect.  There is real-life value in 
imperfect processes.
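
A short illustration of the information loss described above.  Decimal is 
used here only as a contrast, since it keeps the trailing zeros of the 
original string; it is not a fix when the data already arrives as floats.

```python
from decimal import Decimal

# As floats, all three spellings collapse to the same value.
coarse = float("38")
fine = float("38.0000")
print(coarse == fine)          # the written precision is gone
print(str(coarse), str(fine))  # both print identically

# As Decimals, the values compare equal but the exponent still
# records how many digits were written after the decimal point.
d_coarse = Decimal("38")
d_fine = Decimal("38.0000")
print(d_coarse == d_fine)
print(d_coarse.as_tuple().exponent)  # 0 digits after the point
print(d_fine.as_tuple().exponent)    # -4 means four digits after the point
```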