For a finite difference method approximating a $d$-th order derivative, with $p$-th order convergence, the error depends on $h$ as

$$E(h) \approx E_{\text{truncation}}(h) + E_{\text{floating point}}(h).$$

Or equivalently,

$$E(h) \approx C_1 h^p + \frac{C_2}{h^d}.$$
$C_1$ depends on a bound of the $(d+p)$-th derivative, while $C_2$ depends linearly on the machine precision $\epsilon$ and on the conditioning of the function. Setting the two errors equal to one another gives (approximately) the lowest error, revealing how the best mesh scales:

$$C_1 h^p = \frac{C_2}{h^d}.$$
E.g., for a first-order derivative (with the usual second-order centered scheme, $d = 1$, $p = 2$), the optimal mesh is

$$h_{\text{opt}} = \left(\frac{C_2}{C_1}\right)^{\frac{1}{3}} \propto \epsilon^{1/3},$$

and in the more general case,

$$h_{\text{opt}} = \left(\frac{C_2}{C_1}\right)^{\frac{1}{p+d}} \propto \epsilon^{\frac{1}{p+d}}.$$
Changing from 32 bit to 64 bit Floating Point representations is like multiplying the optimal $h$ by the factor $\left(\epsilon_{64}/\epsilon_{32}\right)^{\frac{1}{p+d}}$ (the constants should cancel out). See the table below for a break-down of the change to the optimal $h$ that comes from this switch. Just squaring $h_{\text{opt}}$ (since $\epsilon_{64}$ is roughly $\epsilon_{32}^2$) would unjustly include the constants twice, but still shows the general trend.
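As a quick sanity check of this scaling, here is a minimal Python sketch (assuming NumPy is available; the problem-dependent constants are dropped) that evaluates $\epsilon^{1/(p+d)}$ at both precisions and the resulting multiplier on the optimal $h$:

```python
import numpy as np

# Machine epsilons for single and double precision.
eps32 = np.finfo(np.float32).eps  # ~1.2e-7
eps64 = np.finfo(np.float64).eps  # ~2.2e-16

d, p = 1, 2  # first derivative, second-order (centered) method

# The optimal h scales like eps**(1/(p+d)); constants are omitted here.
h32 = eps32 ** (1.0 / (p + d))
h64 = eps64 ** (1.0 / (p + d))

print(f"h_opt scale, 32 bit: {h32:.1e}")
print(f"h_opt scale, 64 bit: {h64:.1e}")
print(f"switching 32 -> 64 bit multiplies the optimal h by ~{h64 / h32:.1e}")
```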
Floating Point Error Details
We wish to estimate $f^{(d)}(x_0)$. Due to floating point error, we assume we only have access to

$$\hat{f}(x) = f(x)\,\big(1 + \delta(x)\big),$$

where $|\delta(x)| \le \epsilon$. Denoting the true finite difference (no floating point error) as

$$D_h f(x_0) = \frac{\sum_i c_i\, f(x_0 + i h)}{h^d},$$

and using the floating point values $\hat{f}$ in the numerator, what we actually compute is

$$\hat{D}_h f(x_0) = \frac{\sum_i c_i\, \hat{f}(x_0 + i h)}{h^d}.$$

The resulting error is a combination of truncation and floating point error:

$$\big|\hat{D}_h f(x_0) - f^{(d)}(x_0)\big| \;\le\; \underbrace{C_1 h^p}_{\text{truncation}} + \underbrace{\frac{C_2\,\epsilon \max_x |f(x)|}{h^d}}_{\text{floating point}},$$

where $C_1$ comes from the Taylor remainder and $C_2 = \sum_i |c_i|$; for instance, $C_1 = \max_\xi |f'''(\xi)|/6$ and $C_2 = 1$ for the 3-point centered method ($d = 1$, $p = 2$). Taking

$$E(h) = C_1 h^p + \frac{C_2\,\epsilon\,M}{h^d},$$

where $M$ bounds $|f|$, the minimum error occurs at

$$h_{\text{opt}} = \left(\frac{d\,C_2\,\epsilon\,M}{p\,C_1}\right)^{\frac{1}{p+d}}.$$
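For instance, plugging in the 3-point centered first-derivative formula ($d = 1$, $p = 2$, $C_1 = \max|f'''|/6$, $C_2 = 1$) gives

$$h_{\text{opt}} = \left(\frac{1 \cdot 1 \cdot \epsilon\,M}{2 \cdot \max|f'''|/6}\right)^{\frac{1}{3}} = \left(\frac{3\,\epsilon\,M}{\max|f'''|}\right)^{\frac{1}{3}} \;\sim\; \epsilon^{1/3},$$

recovering the $\epsilon^{1/3}$ scaling from the introduction.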
Practical Floating Point Error
Float32 vs. Float64 / double
How much does the best-case error decrease by switching from Float32 to Float64? Deep Learning commonly relies on Float32 (or even lower precision, via Automatic Mixed Precision) to reduce memory requirements or allow more parameters. This decision often trickles into scientific machine learning, but at what accuracy cost? For a given precision, what is the optimal $h$?
We can continue, and rearrange $h_{\text{opt}} = K\,\epsilon^{\frac{1}{p+d}}$, where $K = \left(\frac{d\,C_2\,M}{p\,C_1}\right)^{\frac{1}{p+d}}$ collects the problem-dependent constants. Taking the ratio of $h_{\text{opt}}$ between the two precisions, $K$ cancels, and we have

$$\frac{h_{\text{opt},32}}{h_{\text{opt},64}} = \left(\frac{\epsilon_{32}}{\epsilon_{64}}\right)^{\frac{1}{p+d}}.$$

For $d = 1$, $p = 2$, this would mean $h_{\text{opt},32} \propto \epsilon_{32}^{1/3}$ vs. $h_{\text{opt},64} \propto \epsilon_{64}^{1/3}$.
Let's suppose, roughly, $\epsilon_{32} \approx 10^{-8}$ and $\epsilon_{64} \approx 10^{-16}$. Then, for $d = 1$, $p = 2$, the ratio would be about $\left(10^{8}\right)^{1/3} \approx 500$. In other words, by using double precision, we can use roughly $500\times$ as many points in our mesh as using single precision. On the other hand, for higher derivative or truncation orders ($p + d$ larger), this factor shrinks (see the table below).
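A minimal sketch (plain Python, using the rough $\epsilon_{32} \approx 10^{-8}$ and $\epsilon_{64} \approx 10^{-16}$ from above) that tabulates this predicted factor for a few derivative and truncation orders:

```python
# Rough single / double precision levels (order of magnitude only).
eps32, eps64 = 1e-8, 1e-16

# (derivative order d, truncation order p) pairs matching the table below.
for d, p in [(1, 2), (2, 2), (3, 2), (1, 4), (2, 4)]:
    factor = (eps32 / eps64) ** (1.0 / (p + d))
    print(f"d={d}, p={p}: predicted h_opt ratio (32 bit / 64 bit) ~ {factor:.0f}")
```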
Example results, for a smooth test problem:
| Derivative Order ($d$) | Truncation Order ($p$) | Predicted Factor | Observed Factor | 32bit … |
|---|---|---|---|---|
| 1 | 2 | 500 | 924 | 727 |
| 2 | 2 | 100 | 150 | 126 |
| 3 | 2 | 40  | 42  | 92  |
| 1 | 4 | 40  | 41  | 92  |
| 2 | 4 | 22  | 29  | 41  |
Here, these "Factors" are such that $h_{\text{opt},32} = \text{Factor} \times h_{\text{opt},64}$, which we can either predict analytically based on the error form and the values of $\epsilon$ ("Predicted"), or observe by testing a range of $h$ values on a smooth example problem and comparing the optimal $h$ between the two precisions ("Observed").
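Here is a minimal sketch of the "Observed" measurement for the $d = 1$, $p = 2$ case, assuming NumPy and a hypothetical test function ($f = \sin$ at a single point, so the exact derivative is known): it sweeps candidate $h$ values in each precision, keeps the one with the smallest error, and reports the ratio.

```python
import numpy as np

def observed_optimal_h(dtype, hs, x=0.7):
    """Optimal centered-difference step for f = sin at x, with all arithmetic in `dtype`."""
    best_h, best_err = None, np.inf
    true_deriv = np.cos(np.float64(x))
    for h in hs:
        xh, hh = dtype(x), dtype(h)
        approx = (np.sin(xh + hh) - np.sin(xh - hh)) / (dtype(2) * hh)  # centered difference in `dtype`
        err = abs(np.float64(approx) - true_deriv)                      # error vs. the exact derivative
        if err < best_err:
            best_h, best_err = h, err
    return best_h

hs = np.logspace(-8, -1, 400)  # candidate step sizes to sweep
h32 = observed_optimal_h(np.float32, hs)
h64 = observed_optimal_h(np.float64, hs)
print(f"optimal h: {h32:.1e} (32 bit) vs. {h64:.1e} (64 bit), observed factor ~ {h32 / h64:.0f}")
```

Averaging over several sample points, or refining the grid of $h$ values near the minimum, would give a more stable estimate of the observed factor.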