At the moment, the array module of the standard library lets you
create arrays of different numeric types and initialize them from
an iterable (e.g., another array).
What's missing is the possibility to specify the final size of the
array (number of items) up front, which matters especially for large arrays.
I'm thinking of suffix arrays (a text indexing data structure) for
large texts, eg the human genome and its reverse complement (about 6
billion characters from the alphabet ACGT).
The suffix array is a long int array of the same size (8 bytes per
number, so it occupies about 48 GB memory).
At the moment I am extending an array in chunks of several million
items at a time, which is slow and not elegant.
The function below also initializes each item in the array to a given
value (0 by default).
Is there a reason why the array.array constructor does not allow you
to simply specify the number of items that should be allocated? (I do
not really care about the contents.)
Would this be a worthwhile addition to / modification of the array module?
My suggestion is to modify array construction so that you could still
pass an iterable (as now) as the second argument, but if you pass a
single integer value, it would be treated as the number of items to
allocate.
Here is my current workaround (which is slow):
def filled_array(typecode, n, value=0, bsize=(1<<22)):
    """Return a new array with the given typecode
    (e.g., "l" for long int, as in the array module)
    and n entries, initialized to the given value (default 0).
    """
    a = array.array(typecode, [value] * bsize)
    x = array.array(typecode)
    r = n
    while r >= bsize:
        x.extend(a)
        r -= bsize
    x.extend([value] * r)
    return x
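For what it's worth, a much faster workaround (a sketch, not part of the proposal) is to exploit sequence repetition, which allocates the whole result in one step rather than in chunks:

```python
import array

def filled_array(typecode, n, value=0):
    """Return an array of n items of the given typecode, all set to value."""
    # Repeating a one-element array allocates the full buffer in one
    # C-level operation, avoiding the chunked extend() loop entirely.
    return array.array(typecode, [value]) * n

a = filled_array("l", 10)
```

This still has to initialize every item, but it avoids the Python-level loop and the repeated reallocation of extend().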

I just spent a few minutes staring at a bug caused by a missing comma
-- I got a mysterious argument count error because instead of foo('a',
'b') I had written foo('a' 'b').
This is a fairly common mistake, and IIRC at Google we even had a lint
rule against this (there was also a Python dialect used for some
specific purpose where this was explicitly forbidden).
Now, with modern compiler technology, we can (and in fact do) evaluate
compile-time string literal concatenation with the '+' operator, so
there's really no reason to support 'a' 'b' any more. (The reason was
always rather flimsy; I copied it from C but the reason why it's
needed there doesn't really apply to Python, as it is mostly useful
inside macros.)
Would it be reasonable to start deprecating this and eventually remove
it from the language?
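To illustrate the pitfall (foo is a hypothetical function, stand-in for any two-argument call):

```python
def foo(a, b):
    return (a, b)

# Intended call: two arguments.
foo('a', 'b')

# With the comma missing, the literals are implicitly concatenated
# and foo receives ONE argument -- hence the mysterious error:
#     foo('a' 'b')   # TypeError: foo() missing 1 required positional argument
assert 'a' 'b' == 'ab'

# Explicit concatenation says the same thing without the trap
# (CPython folds adjacent literal '+' at compile time):
assert 'a' + 'b' == 'ab'
```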
--
--Guido van Rossum (python.org/~guido)

Hello,
PEP 397 says that any Python script is able to choose the language
version that will run it, from all the versions installed on the
computer, using (on Windows) a launcher in the "C:\Windows" folder.
Can the IDLE version be chosen like this too, or can IDLE's "Run" command
do it?
(Unless it already works like this and I simply have a problem.)
Thank you, and have a nice day/evening!
(And sorry if this sounds irritating.)

Hi folks,
After much discussion on this list, I have written up a PEP, and it is
ready for review (see below).
It is also here: https://www.python.org/dev/peps/pep-0485/
That version is not quite up to date just yet, so please refer to the one
enclosed in this email for now.
I am managing both the PEP and a sample implementation and tests on
GitHub here:
https://github.com/PythonCHB/close_pep
Please go there if you want to try it out, add some tests, etc. Pull
requests welcomed for code, tests, or PEP editing.
A quick summary of the decisions I made, and what I think are the open
discussion points:
The focus is on relative tolerance, but with an optional absolute
tolerance, primarily to be used near zero, but it also allows it to be used
as a plain absolute difference check.
It uses an asymmetric test -- that is, the tolerance is computed
relative to one of the arguments. It is perhaps surprising and confusing
that you may get a different result if you reverse the arguments, but in
this discussion it became clear that there are some use cases where it is
helpful to know exactly what the tolerance is computed relative to, and
that in most use cases it just doesn't matter. I hope this is adequately
explained in the PEP. We could add a flag to select a symmetric test (I'd
go with what Boost calls the "strong" test), but I'd rather not -- it just
confuses things, and I expect users will tend to use the defaults anyway.
It is designed to work mostly with floats, but it also supports int,
Decimal, Fraction, and complex. I'm not really thrilled with that, though;
it turns out to be not quite as easy to duck-type as I had hoped. To
really do it right, there would have to be more switching on type in the
code, which I think is ugly to write -- contributions and opinions welcome
on this.
I used 1e-8 as a default relative tolerance -- arbitrarily because that's
about half of the decimal digits in a python float -- suggestions welcome.
Other than that, of course, we can bike-shed the names of the function and
the parameters. ;-)
Fire away!
-Chris
PEP: 485
Title: A Function for testing approximate equality
Version: $Revision$
Last-Modified: $Date$
Author: Christopher Barker <Chris.Barker(a)noaa.gov>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Jan-2015
Python-Version: 3.5
Post-History:
Abstract
========
This PEP proposes the addition of a function to the standard library
that determines whether one value is approximately equal or "close"
to another value.
Rationale
=========
Floating point values have limited precision, which makes them unable
to exactly represent some values and lets error accumulate with
repeated computation. As a result, it is common advice to use an
equality comparison only in very specific situations.
Often an inequality comparison fits the bill, but there are times
(often in testing) where the programmer wants to determine whether a
computed value is "close" to an expected value, without requiring them
to be exactly equal. This is common enough, particularly in testing,
and not always obvious how to do, so it would be a useful addition to
the standard library.
Existing Implementations
------------------------
The standard library includes the
``unittest.TestCase.assertAlmostEqual`` method, but it:
* Is buried in the unittest.TestCase class
* Is an assertion, so you can't use it as a general test (easily)
* Uses number of decimal digits or an absolute delta, which are
particular use cases that don't provide a general relative error.
The numpy package has the ``allclose()`` and ``isclose()`` functions.
The test suite for the statistics module also includes an
implementation of its own.
One can also find discussion and sample implementations on Stack
Overflow, and other help sites.
These existing implementations indicate that this is a common need,
and not trivial to write oneself, making it a candidate for the
standard library.
Proposed Implementation
=======================
NOTE: this PEP is the result of an extended discussion on the
python-ideas list [1]_.
The new function will have the following signature::
    is_close_to(actual, expected, tol=1e-8, abs_tol=0.0)
``actual``: is the value that has been computed, measured, etc.
``expected``: is the "known" value.
``tol``: is the relative tolerance -- it is the amount of error
allowed, relative to the magnitude of the expected value.
``abs_tol``: is a minimum absolute tolerance level -- useful for
comparisons near zero.
Modulo error checking, etc, the function will return the result of::
    abs(expected-actual) <= max(tol*expected, abs_tol)
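Ignoring the special-value handling described in the next section, a minimal sketch of the proposed semantics (with ``abs()`` applied to the scaled tolerance, as in the "Behavior near zero" section):

```python
def is_close_to(actual, expected, tol=1e-8, abs_tol=0.0):
    """Return True if actual is within a relative tolerance tol of
    expected, or within the absolute tolerance abs_tol (sketch only)."""
    return abs(expected - actual) <= max(tol * abs(expected), abs_tol)

assert is_close_to(1.0 + 1e-9, 1.0)           # within the default 1e-8
assert not is_close_to(1.1, 1.0)              # 10% off is not close
assert not is_close_to(1e-12, 0.0)            # near zero needs abs_tol...
assert is_close_to(1e-12, 0.0, abs_tol=1e-9)  # ...like this
```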
Handling of non-finite numbers
------------------------------
The IEEE 754 special values of NaN, inf, and -inf will be handled
according to IEEE rules. Specifically, NaN is not considered close to
any other value, including NaN. inf and -inf are only considered close
to themselves.
Non-float types
---------------
The primary use-case is expected to be floating point numbers.
However, users may want to compare other numeric types similarly. In
theory, it should work for any type that supports ``abs()``,
comparisons, and subtraction. The code will be written and tested to
accommodate these types:
* ``Decimal``: for Decimal, the tolerance must be set to a Decimal type.
* ``int``
* ``Fraction``
* ``complex``: for complex, ``abs(z)`` will be used for scaling and
comparison.
Behavior near zero
------------------
Relative comparison is problematic if either value is zero. In this
case, the tolerance is scaled by zero, so no non-zero difference can
ever fall within it. To handle this case, an optional parameter,
``abs_tol`` (default 0.0), can be used to set a minimum absolute
tolerance for comparisons near zero. That is, the values will be
considered close if::
    abs(actual-expected) <= abs(tol*expected) or abs(actual-expected) <= abs_tol
If the user sets the tol parameter to 0.0, then only the absolute
tolerance will affect the result, so this function provides an
absolute tolerance check as well.
A sample implementation is available (as of Jan 22, 2015) on GitHub:
https://github.com/PythonCHB/close_pep/blob/master/is_close_to.py
Relative Difference
===================
There are essentially two ways to think about how close two numbers
are to each-other: absolute difference: simply ``abs(a-b)``, and
relative difference: ``abs(a-b)/scale_factor`` [2]_. The absolute
difference is trivial enough that this proposal focuses on the
relative difference.
Usually, the scale factor is some function of the values under
consideration, for instance:
1) The absolute value of one of the input values
2) The maximum absolute value of the two
3) The minimum absolute value of the two
4) The arithmetic mean of the two
Symmetry
--------
A relative comparison can be either symmetric or non-symmetric. For a
symmetric algorithm:
``is_close_to(a,b)`` is always equal to ``is_close_to(b,a)``
This is an appealing consistency -- it mirrors the symmetry of
equality, and is less likely to confuse people. However, often the
question at hand is:
"Is this computed or measured value within some tolerance of a known
value?"
In this case, the user wants the relative tolerance to be specifically
scaled against the known value. It is also easier for the user to
reason about.
This proposal uses this asymmetric test to allow this specific
definition of relative tolerance.
Example:
For the question "Is the value of a within 10% of b?", using b to
scale the percent error clearly defines the result.
However, this approach is not symmetric: a may be within 10% of b,
but b may not be within 10% of a. Consider the case::
    a = 9.0
    b = 10.0
The difference between a and b is 1.0. 10% of a is 0.9, so b is not
within 10% of a. But 10% of b is 1.0, so a is within 10% of b.
Casual users might reasonably expect that if a is close to b, then b
would also be close to a. However, in the common cases, the tolerance
is quite small and often poorly defined, i.e. 1e-8, defined to only
one significant figure, so the result will be very similar regardless
of the order of the values. And if the user does care about the
precise result, s/he can take care to always pass in the two
parameters in sorted order.
This proposed implementation uses asymmetric criteria with the scaling
value clearly identified.
Expected Uses
=============
The primary expected use case is various forms of testing -- "are the
results computed near what I expect as a result?" This sort of test
may or may not be part of a formal unit testing suite.
The function might be used also to determine if a measured value is
within an expected value.
Inappropriate uses
------------------
One use case for floating point comparison is testing the accuracy of
a numerical algorithm. However, in this case, the numerical analyst
ideally would be doing careful error propagation analysis, and should
understand exactly what to test for. It is also likely that ULP (Unit
in the Last Place) comparison may be called for. While this function
may prove useful in such situations, it is not intended to be used in
that way.
Other Approaches
================
``unittest.TestCase.assertAlmostEqual``
---------------------------------------
(
https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlm…
)
Tests that values are approximately (or not approximately) equal by
computing the difference, rounding to the given number of decimal
places (default 7), and comparing to zero.
This method was not selected for this proposal, as the use of decimal
digits is a specific, not generally useful or flexible test.
numpy ``isclose()``
--------------------
http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.isclose.html
The numpy package provides the vectorized functions ``isclose()`` and
``allclose()``, for use cases similar to this proposal:
``isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)``
Returns a boolean array where two arrays are element-wise equal
within a tolerance.
The tolerance values are positive, typically very small numbers.
The relative difference (rtol * abs(b)) and the absolute
difference atol are added together to compare against the
absolute difference between a and b
In this approach, the absolute and relative tolerances are added
together, rather than combined with ``or`` as in this proposal. This is
computationally simpler, and if the relative tolerance is larger than
the absolute tolerance, the addition has no effect. But if the
absolute and relative tolerances are of similar magnitude, then the
allowed difference will be about twice as large as expected.
Also, if the values passed in are small compared to the absolute
tolerance, then the relative tolerance will be completely swamped,
perhaps unexpectedly.
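The difference can be seen with scalar stand-ins for the two combination rules (function names hypothetical; numpy itself is not required):

```python
def additive_close(a, b, rtol=1e-5, atol=1e-8):
    # numpy-style: the two tolerances are added together
    return abs(a - b) <= atol + rtol * abs(b)

def or_close(a, b, rtol=1e-5, atol=1e-8):
    # this proposal's style: the larger tolerance wins
    return abs(a - b) <= max(rtol * abs(b), atol)

# When rtol*|b| == atol (here both are 1e-8), the additive rule allows
# a difference of up to 2e-8 -- about twice what either tolerance says:
b = 1e-3
a = b + 1.5e-8
assert additive_close(a, b)   # passes: 1.5e-8 <= 2e-8
assert not or_close(a, b)     # fails:  1.5e-8 >  1e-8
```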
This is why, in this proposal, the absolute tolerance defaults to zero
-- the user will be required to choose a value appropriate for the
values at hand.
Boost floating-point comparison
-------------------------------
The Boost project ( [3]_ ) provides a floating point comparison
function. It is a symmetric approach, with both "weak" (larger of the
two relative errors) and "strong" (smaller of the two relative errors)
options.
It was decided that a method that clearly defined which value was used
to scale the relative error would be more appropriate for the standard
library.
References
==========
.. [1] Python-ideas list discussion thread
(https://mail.python.org/pipermail/python-ideas/2015-January/030947.html)
.. [2] Wikipedia page on relative difference
(http://en.wikipedia.org/wiki/Relative_change_and_difference)
.. [3] Boost project floating-point comparison algorithms
(
http://www.boost.org/doc/libs/1_35_0/libs/test/doc/components/test_tools/fl…
)
Copyright
=========
This document has been placed in the public domain.
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker(a)noaa.gov

The subprocess module provides some nice tools to control the details of
running a process, but it's still rather awkward for common use cases where
you want to execute a command in one go.
* There are three high-level functions: call, check_call and check_output,
which all do very similar things with different return/raise behaviours
* Their naming is not very clear (check_output doesn't check the output, it
checks the return code and captures the output)
* You can't use any of them if you want stdout and stderr separately.
* You can get stdout and returncode from check_output, but it's not exactly
obvious:
try:
    stdout = check_output(...)
    returncode = 0
except CalledProcessError as e:
    stdout = e.output
    returncode = e.returncode
I think that what these are lacking is a good way to represent a process
that has already finished (as opposed to Popen, which is mostly designed to
handle a running process). So I would:
1. Add a CompletedProcess class:
* Attributes stdout and stderr are bytes if the relevant stream was piped,
None otherwise, like the return value of Popen.communicate()
* Attribute returncode is the exit status
* ? Attribute cmd is the list of arguments the process ran with (not sure
if this should be there or not)
* Method cp.check_returncode() raises CalledProcessError if returncode !=
0, inspired by requests' Response.raise_for_status()
2. Add a run() function - like call/check_call/check_output, but returns a
CompletedProcess instance
3. Deprecate call/check_call/check_output, but leave them around
indefinitely, since lots of existing code relies on them.
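A rough sketch of what this could look like, built on the existing Popen machinery (class and function names as proposed above, implementation details mine):

```python
import subprocess

class CompletedProcess:
    """A process that has finished running (sketch of the proposal)."""
    def __init__(self, args, returncode, stdout=None, stderr=None):
        self.args = args
        self.returncode = returncode
        self.stdout = stdout      # bytes if the stream was piped, else None
        self.stderr = stderr

    def check_returncode(self):
        # Mirrors requests' Response.raise_for_status()
        if self.returncode != 0:
            raise subprocess.CalledProcessError(
                self.returncode, self.args, self.stdout)

def run(args, **kwargs):
    """Run a command to completion and return a CompletedProcess."""
    with subprocess.Popen(args, **kwargs) as proc:
        stdout, stderr = proc.communicate()
    return CompletedProcess(args, proc.returncode, stdout, stderr)
```

With this, the awkward try/except above collapses to ``cp = run(...)`` followed by ``cp.check_returncode()``, and stdout and stderr can be captured independently by piping both streams.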
Thanks,
Thomas

Hello,
The other day, I was looking for an "atexit" equivalent at the function
level. I was hoping to replace code like this:
def my_function(bla, fasel):
    with ExitStack() as cm:
        ...
        cm.push(...)
        ...
with something like:
@handle_on_return
def my_function(bla, fasel, on_return):
    ...
    on_return.push(...)
    ...
It seems that contextlib *almost* offers a way to do this with ExitStack
and ContextDecorator - but as far as I can tell the final piece is
missing, because ContextDecorator does not offer a way to pass the
context manager to the decorated function. However, the following
decorator does the job:
def handle_on_return(fn):
    @functools.wraps(fn)
    def wrapper(*a, **kw):
        with contextlib.ExitStack() as on_return:
            kw['on_return'] = on_return
            return fn(*a, **kw)
    return wrapper
It's not a lot of code, but my feeling is that not everyone who might be
interested in an "on_return" functionality would be able to come up with
this right away.
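For illustration, a self-contained version with a tiny example (the decorator is the one above; the demo names are mine):

```python
import contextlib
import functools

def handle_on_return(fn):
    """Pass a fresh ExitStack to fn as the 'on_return' keyword argument."""
    @functools.wraps(fn)
    def wrapper(*a, **kw):
        with contextlib.ExitStack() as on_return:
            kw['on_return'] = on_return
            return fn(*a, **kw)
    return wrapper

events = []

@handle_on_return
def work(on_return):
    # Registered callbacks fire (in LIFO order) when work() returns,
    # even if it raises.
    on_return.callback(events.append, 'cleaned up')
    events.append('working')
    return 'done'
```

Calling ``work()`` appends 'working', returns 'done', and then the stack unwinds and appends 'cleaned up'.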
Would it make sense to add something along these lines to contextlib?
Maybe instead of a new decorator, ContextDecorator could also take an
additional keyword argument that tells it to pass the context manager to
the decorated function?
Best,
-Nikolaus
--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F
»Time flies like an arrow, fruit flies like a Banana.«

Hi,
postgreSQL supports infinity for datetime:
http://www.postgresql.org/docs/current/static/datatype-datetime.html#AEN6027
{{{
infinity date, timestamp later than all other time stamps
-infinity date, timestamp earlier than all other time stamps
}}}
Mapping this to Python is not possible at the moment.
See:
http://initd.org/psycopg/docs/usage.html#infinite-dates-handling
{{{
PostgreSQL can store the representation of an “infinite” date, timestamp, or interval. Infinite dates are not available
to Python, so these objects are mapped to date.max, datetime.max, interval.max. Unfortunately the mapping cannot be
bidirectional so these dates will be stored back into the database with their values, such as 9999-12-31.
}}}
I don't know the internals of the datetime module. I guess it is not possible to support infinity.
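The datetime type itself has no room for special values, but one possible workaround (a sketch, not a standard API) is a sentinel object that orders after every real datetime:

```python
import datetime
import functools

@functools.total_ordering
class InfiniteDateTime:
    """Sentinel that compares later than every datetime (hypothetical)."""
    def __eq__(self, other):
        return isinstance(other, InfiniteDateTime)
    def __gt__(self, other):
        # Later than everything except another infinity.
        return not isinstance(other, InfiniteDateTime)

INFINITY = InfiniteDateTime()
```

This works for comparisons because datetime's ordering operators return NotImplemented for unknown types, so Python falls back to the sentinel's reflected methods; it is not a real datetime, though, so arithmetic and formatting would need extra handling.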
What do you think?
Thomas Güttler

Hi all, maybe the best list would be python-dev, but I don't dare
wake the sleeping lions there :)
So my question is this: is there a coherent mobile strategy among the
core dev people? I mean, you release Python for Linux/macOS/Windows, and
the question is whether there are any plans to do the same for a mobile
platform. It doesn't have to be Android or iOS, just anything that the
core dev team chooses and sticks with.
I've been developing Python apps for smartphones (mostly hobby
projects, though) using SL4A, but that seems to be dead. Now people
suggest I use Kivy, which seems to be alive, but who knows for how long.
There are some other projects (QPython, etc.) which are small and
equally unreliable, at least viewed from the outside. Is Kivy now
an officially blessed distribution? Since Google was so integral to
both Python (through employing Guido) and Android, I'd think it would
make sense for Google to have an official Python environment for
Android, in cooperation with the Python dev team.
Does the PSF have an opinion on this? It would be great if there were
something for mobile phones that we could rely on not going away,
just as with Linux/macOS/Windows.
Or are there issues which preclude this from the start?
Cheers,
Daniel
--
Psss, psss, put it down! - http://www.cafepress.com/putitdown

OOPS, sent to only Paul by mistake the first time.
-Chris
On Sun, Jan 25, 2015 at 11:07 PM, Paul Moore <p.f.moore(a)gmail.com> wrote:
> And to many users (including me, apparently - I expected the first one
> to give False), the following is "floating point arcana":
>
> >>> 0.1*10 == 1.0
> True
> >>> sum([0.1]*10)
> 0.9999999999999999
> >>> sum([0.1]*10) == 1
> False
Any of the approaches on the table will do something reasonable in this
case:
In [4]: is_close_to.is_close_to(sum([0.1]*10), 1)
testing: 0.9999999999999999 1
Out[4]: True
Note that the 1e-8 default I chose (which I am not committed to) is not
ENTIRELY arbitrary -- it's about half the digits carried by a python float
(double) -- essentially saying the values are close to about half of the
precision available. And we are constrained here: the options are between
0.1 (which would be crazy, if you ask me!) and 1e-14 -- any larger and it
would be meaningless, and any smaller and it would surpass the precision of
a python float. Picking a default near the middle of that range seems quite
sane to me.
This is quite different than setting a value for an absolute tolerance --
saying something is close to another number if the difference is less than
1e-8 would be wildly inappropriate when the smallest numbers a float can
hold are on order of 1e-300!
> This does seem relatively straightforward, though. Although in the
> second case you glossed over the question of X% of *what* which is the
> root of the "comparison to zero" question, and is precisely where the
> discussion explodes into complexity that I can't follow, so maybe
> that's precisely the bit of "floating point arcana" that the naive
> user doesn't catch on to.
Arcana, maybe, but it's not a floating point issue -- X% of zero is zero,
absolutely precisely.
But back to a point made earlier -- the idea here is to provide something
better than naive use of
x == y
for floating point. A default for the relative tolerance provides that.
There is no sane default for absolute tolerance, so I don't think we
should set one.
Note that the zero_tolerance idea provides a way to let
is_close_to(something, 0.0) work out of the box, but I'm not sure we can
come up with a sane default for that either -- in which case, what's the
point?
> I'm not sure what you're saying here - by "not setting defaults" do
> you mean making it mandatory for the user to supply a tolerance, as I
> suggested above?
I for one think making it mandatory to set one would be better than just
letting the zeros get used.
I've used the numpy version of this a lot for tests, and my workflow is
usually:
write a test with the defaults
if it passes, I'm done.
If it fails, then I look and see if my code is broken, or if I can accept a
larger tolerance.
So I'm quite happy to have a default.
> I really think that having three tolerances, once of which is nearly
> > always ignored, is poor API design. The user usually knows when they are
> > comparing against an expected value of zero and can set an absolute
> > error tolerance.
>
> Agreed.
>
Also agreed -- Nathaniel -- can you live with this?
Note that Nathaniel found a LOT of examples of people using
assertAlmostEqual to compare to zero -- I think that's why he thinks it's
important to have defaults that do something sane for that case. However,
that is an absolute comparison function -- inherently different anyway --
it's also tied to "number of digits after the decimal place", so it's only
appropriate for values near 1 anyway -- so you can define a sensible
default there; not so here.
> - Absolute tolerance defaults to zero (which is equivalent to
> > exact equality).
>
yup
> > - Relative tolerance defaults to something (possibly zero) to be
> > determined after sufficient bike-shedding.
>
> Starting the bike-shedding now, -1 on zero. Having is_close default to
> something that most users won't think of as behaving like their naive
> expectation of "is close" (as opposed to "equals") would be confusing.
1e-8 -- but you already know that ;-) -- anything between 1e-8 and 1e-12
would be fine with me.
> Just make it illegal to set both. What happens when you have both set
> is another one of the things that triggers discussions that make my
> head explode.
Sorry about the head -- but setting both is a pretty useful use case -- you
have a bunch of values you want to do a relative test on. Some of them may
be exactly zero (note -- it could be either expected or actual) -- and you
know what "close to zero" means to you, so you set that for abs_tolerance.
Done.
And I actually didn't think anyone objected to that approach -- though
maybe the exploded heads weren't able to write ;-)
In fact, an absolute tolerance is so easy that I wouldn't write a function
for it:
abs(expected - actual) <= abs_tolerance
I ended up adding it to my version to deal with the zero case.
> Having said that, I don't think the name "is_close" makes the
> asymmetry clear enough. Maybe "is_close_to" would work better (there's
> still room for bikeshedding over which of the 2 arguments is implied
> as the "expected" value in that case)?
I now have "expected" and "actual", but I think those names are too unclear
-- I like "expected" -- anyone have a better idea for the other one?
> > abs(actual - expected) <= relative_tolerance*expected
> >
> > Now if expected is zero, the condition is true if and only if
> > actual==expected.
>
> I would call out this edge case explicitly in the documentation.
It is called out, but I guess I need to make that clearer.
> Overall, I think it would be better to simplify the proposed function
> in order to have it better suit the expectations of its intended
> audience, rather than trying to dump too much functionality in it on
> the grounds of making it "general".
right -- that is why I've been resisting adding flags for all the various
options
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker(a)noaa.gov

Sorry,
this slipped off the list -- bringing it back.
On Mon, Jan 26, 2015 at 12:40 PM, Paul Moore <p.f.moore(a)gmail.com> wrote:
> > Any of the approaches on the table will do something reasonable in this
> > case:
> >
> > In [4]: is_close_to.is_close_to(sum([0.1]*10), 1)
> > testing: 0.9999999999999999 1
> > Out[4]: True
>
> Yes, but that's not my point. I was responding to Steven's comment
> that having 2 different types of tolerance isn't "arcana", by pointing
> out that I find even stuff as simple as multiplication vs cumulative
> addition confusing. And I should note that I was (many years ago!) a
> maths graduate and did some numerical maths courses, so this stuff
> isn't completely unknown to me.
Right, it can be arcane -- which is why I want this function, and why we
want it to do something "sane" most of the time, by default.
> Note that the 1e-8 default I chose (which I am not committed to) is not
> > ENTIRELY arbitrary -- it's about half the digits carried by a python
> float
> > (double) -- essentially saying the values are close to about half of the
> > precision available. And we are constrained here, the options are between
> > 0.1 (which would be crazy, if you ask me!) and 1e-14 -- any larger an it
> > would meaningless, and any smaller, and it would surpass the precision
> of a
> > python float. PIcking a default near the middle of that range seems quite
> > sane to me.
>
> Sorry, that means nothing to me. Head exploding time again :-)
Darn -- I'll try again -- with a relative tolerance, two values are only
going to be close if their exponents are within one of each other. So what
you are setting is how many digits of the mantissa you care about. A
tolerance of 0.1 would be about one digit, and a tolerance of 1e-15 would
be 15 digits. Python floats carry about 15 digits -- so the relative
tolerance has to be between 1e-1 and 1e-15 -- nothing else is useful or
makes sense. So I put it in the middle: 1e-8.
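Concretely (a toy relative-only check, not the full proposed function):

```python
def rel_close(actual, expected, tol=1e-8):
    return abs(expected - actual) <= tol * abs(expected)

# tol = 1e-8 roughly means "agree to about 8 significant digits":
assert rel_close(1.000000001, 1.0)    # differ in the 9th digit: close
assert not rel_close(1.0000001, 1.0)  # differ in the 7th digit: not close
```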
> > This is quite different than setting a value for an absolute tolerance --
> > saying something is close to another number if the difference is less
> than
> > 1e-8 would be wildly inappropriate when the smallest numbers a float can
> > hold are on order of 1e-300!
>
> On the other hand, I find this completely obvious. (Well, mostly -
> don't the gaps between the representable floats increase as the
> magnitude gets bigger, so an absolute tolerance of 1e-8 might be
> entirely reasonable when the numbers are sufficiently high?)
Sure it would -- that's the point: what makes sense as an absolute
tolerance depends entirely on the magnitude of the numbers. Since we
don't know the magnitude of the numbers someone may use, we can't set a
reasonable default.
> arcana, maybe, not it's not a floating point issue -- X% of zero is zero
> > absolutely precisely.
>
> But the "arcana" I was talking about is that a relative error of X%
> could be X% of the value under test, of the expected value, of their
> average, or something else.
Ahh! -- which is exactly the point I think some of us are making --
defining X% error relative to the "expected" value is the simplest and most
straightforward to explain. That's the primary reason I prefer it.
> And only *one* of those values is zero, so
> whether X% is a useful value is entirely dependent on the definition.
>
Not sure what you meant here, but actually relative error goes to heck if
either value is zero, with any of the definitions we are working with.
So X% is useful for any value except when one of the values is zero.
> And how relative errors are defined *is* floating point arcana (I can
> picture the text book page now, and it wasn't simple...)
>
Semantics here -- defining a relative error can be done with pure real
numbers -- computing it can get complicated with floating point.
> But back to a point made earlier -- the idea here is to provide something
> > better than naive use of
> >
> > x == y
>
<snip>
> I still wonder whether "naive use of equality" is much of a target,
> though. There are only two use cases that have been mentioned so far.
> Testing is not about equality, because we're replacing
> assertAlmostEqual. And when someone is doing an iterative algorithm,
> they are looking for convergence, i.e. within a given range of the
> answer. So neither use case would be using an equality test anyway.
>
well, the secondary target is a better (or more flexible)
assertAlmostEqual. It is not suitable for arbitrarily large or small
numbers, and particularly not numbers with a range of magnitudes -- a
relative difference test is much needed.
> I'm not sure I follow your point, but I will say that if Nathaniel has
> seen a lot of use cases for assertAlmostEqual that can't be easily
> handled with the new function, then something is badly wrong.
Well, I'm not suggesting that we replace assertAlmostEqual -- but rather
augment it. In fact, assertAlmostEqual is actually an absolute tolerance
test (expressed in terms of decimal places). That is the right thing, and
the only right thing, to use when you want to compare to zero.
What I'm proposing is a relative tolerance test, which is not the right
thing to use for comparing to zero, but is the right thing to use when
comparing numbers of varying magnitude.
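The contrast is easy to demonstrate with assertAlmostEqual's default of 7 decimal places:

```python
import unittest

tc = unittest.TestCase()

# assertAlmostEqual rounds the difference to 7 decimal places and
# compares to zero -- an absolute test, so comparisons to zero just work:
tc.assertAlmostEqual(1e-8, 0.0)

# But the same absolute window is meaningless for large magnitudes:
# 1e10 and 1e10 + 10.0 agree to 9 significant digits, yet fail here.
try:
    tc.assertAlmostEqual(1e10, 1e10 + 10.0)
    raised = False
except AssertionError:
    raised = True
assert raised
```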
> There
> aren't enough good use cases that we can reasonably decide to reject
> any of them as out of scope,
I've lost track of what we might be rejecting.
The whole symmetric vs. non-symmetric argument really is bike-shedding --
in the context of "better than ==" or "different but as good as
assertAlmostEqual" -- any of them are just fine.
So really all we are left with is defaults -- also bike-shedding, except
for the default for the zero test, and there are really two options there:
use 0.0 for abs_tolerance, and have it fail for any test against zero
unless the user specifies something appropriate for their use case.
or
use SOME default, either for abs_tolerance or zero_tolerance, and make an
assumption about the order of magnitude of the likely results, so that it
will "just work" for tests against zero. Maybe something small relative to
one (like 1e-8) would be OK, but that concerns me -- I think you'd get
false positives for small numbers, which is worse than false negatives for
all comparisons to zero.
> 1e-8 -- but you already know that ;-) -- anything between 1e-8 and 1e-12
> > would be fine with me.
>
> TBH all I care about in this context is that there must be 2 values x
> and y for which is_close(x,y) == True and x != y.
everything on the table will do that.
> I'm tempted to
> strengthen that to "for all y there must be at least 1 x such that..."
> but it's possible that's *too* strong and it can't be achieved.
>
I think we could say that for all y except 0.0 -- and even for zero, if
an abs_tolerance greater than zero is set.
> Basically, the default behaviour needs to encompass what I believe is
> most people's intuition - that "close" is a proper superset of
> "equal".
>
A good reason not to have all defaults be zero -- I don't think we need a
function that doesn't work at all with default values.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker(a)noaa.gov