Finite Differences

Resources

Main Idea

TODO

Error Analysis

For a finite difference method approximating a $d$th-order derivative with $k$th-order convergence, the error depends on the number of mesh points $n_x$ as

$$\underbrace{\frac{C_1}{n_x^k}}_{\text{Truncation Error}} + \underbrace{C_2\, n_x^d}_{\text{Floating Point Error}}.$$

Or equivalently,

$$\underbrace{C_1\,\Delta x^k}_{\text{Truncation Error}} + \underbrace{\frac{C_2}{\Delta x^d}}_{\text{Floating Point Error}}.$$

$C_1$ depends on a bound on the $(d+k)$th derivative, while $C_2$ depends linearly on the machine precision and on the conditioning of the function. Setting the two errors equal to one another (approximately) minimizes the total error, revealing how the best mesh scales:

$$\frac{C_1}{n_x^k} = C_2\,\varepsilon_{\text{mach}}\,n_x^d
\;\;\Longrightarrow\;\;
\frac{C_1}{C_2}\,\varepsilon_{\text{mach}}^{-1} = n_x^{d+k}
\;\;\Longrightarrow\;\;
n_x \propto \varepsilon_{\text{mach}}^{-1/(d+k)}.$$

E.g., for a first-order derivative, the optimal mesh is

$$n_x \propto \varepsilon_{\text{mach}}^{-1/(k+1)}, \qquad\text{or}\qquad \Delta x \propto \varepsilon_{\text{mach}}^{1/(k+1)},$$

and in the more general case,

$$n_x \propto \varepsilon_{\text{mach}}^{-1/(k+d)}, \qquad\text{or}\qquad \Delta x \propto \varepsilon_{\text{mach}}^{1/(k+d)}.$$

Changing from 32-bit to 64-bit floating point representations multiplies the optimal $n_x$ by roughly $(\varepsilon_{32}/\varepsilon_{64})^{1/(d+k)}$ (the constants should cancel out). See the table below for a breakdown of the change to the optimal $n_x$ that comes from this switch. Simply squaring $n_x$ (using $\varepsilon_{64} \approx \varepsilon_{32}^2$) would unjustly include the constants twice, but still shows the general trend.
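As a quick sanity check of this scaling, the factor can be computed directly from the two precisions. This is a minimal sketch using the rounded values $10^{-8}$ and $10^{-16}$ (as in the worked example below) rather than the exact IEEE machine epsilons, and assuming the constants cancel:

```python
# Predicted change in the optimal n_x when switching 32-bit -> 64-bit floats,
# assuming n_x ~ eps**(-1/(d+k)) and that the constants C1, C2 cancel.
# Rounded eps values are used here; the exact IEEE values are
# np.finfo(np.float32).eps and np.finfo(np.float64).eps.
eps32, eps64 = 1e-8, 1e-16

def nx_factor(d, k):
    return (eps32 / eps64) ** (1.0 / (d + k))

for d, k in [(1, 2), (2, 2), (3, 2)]:
    print(f"d={d}, k={k}: optimal n_x grows by a factor of ~{nx_factor(d, k):.0f}")
```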

Floating Point Error Details

We wish to estimate $f'(x)$. Due to floating point error, we assume

$$\mathrm{fl}(x) = (1+\varepsilon)\,x,$$

where $|\varepsilon| \le \varepsilon_{\text{mach}} \approx 10^{-16}$ (the machine precision). Denote the true finite difference (no floating point error) by

$$\delta(x) = \frac{f(x+h) - f(x)}{h}.$$

Thus,

$$f'(x) = \delta(x) + \underbrace{\tau_f(h)}_{O(h)}.$$
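To see both error contributions in practice, one can sweep $h$ for a forward difference and watch the error shrink like $O(h)$ until the $\varepsilon/h$ rounding term takes over. A sketch, using $f = \exp$ at $x = 1$ so the exact derivative is known (the details here are illustrative, not from the source):

```python
import numpy as np

x = 1.0
for h in 10.0 ** -np.arange(1, 15):
    delta = (np.exp(x + h) - np.exp(x)) / h   # forward difference
    err = abs(delta - np.exp(x))              # true derivative of exp is exp
    print(f"h = {h:.0e}   error = {err:.2e}")
# The error is smallest near h ~ sqrt(eps) ~ 1e-8, then grows again.
```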

Recalling the relative Condition Number,

$$\kappa_f(x) = \left|\frac{x\, f'(x)}{f(x)}\right|,$$

and using a finite difference to approximate the numerator,

$$\kappa_f(x) \approx \frac{|x|}{|f(x)|}\left|\frac{f(x+h)-f(x)}{h}\right|.$$
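A sketch of that estimate (the helper name `rel_condition` and the default step size are illustrative assumptions):

```python
import numpy as np

def rel_condition(f, x, h=1e-6):
    """Estimate kappa_f(x) = |x f'(x) / f(x)| with a forward difference."""
    fprime = (f(x + h) - f(x)) / h
    return abs(x * fprime / f(x))

print(rel_condition(np.exp, 2.0))    # exp has kappa = |x|, so ~2
print(rel_condition(np.log, 1.001))  # log is ill-conditioned near x = 1 (~1000)
```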

The resulting error is a combination of truncation and floating point error:

$$e(h) = \left|\partial_x f(x) - \Delta^h_x f(x)\right| \lesssim \frac{\varepsilon}{h} + \tau_f(h),$$

where $\Delta^h_x f(x)$ is the difference quotient evaluated in floating point and, for instance, $\tau_f(h) = O(h^2)$ for the 3-point centered method.

Taking $\tau_f(h) = \frac{M h^2}{6}$, where $M$ bounds $|f'''|$, the minimum error occurs at

$$h = \left(\frac{3\varepsilon}{M}\right)^{1/3}.$$
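A quick numerical check of this optimum, sketched for the 3-point centered difference of $\sin$ at $x = 1$ (so $M = 1$ bounds $|f'''|$); the grid of trial step sizes is an arbitrary choice:

```python
import numpy as np

x, eps, M = 1.0, np.finfo(np.float64).eps, 1.0     # |sin'''| <= 1
h_pred = (3 * eps / M) ** (1 / 3)                  # predicted optimal step

hs = 10.0 ** np.linspace(-12, -1, 200)
errs = np.abs((np.sin(x + hs) - np.sin(x - hs)) / (2 * hs) - np.cos(x))
print(f"predicted h ~ {h_pred:.1e},  observed best h ~ {hs[np.argmin(errs)]:.1e}")
```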

Practical Floating Point Error

Float32 vs. Float64 / double

How much does the best-case error decrease by switching from Float32 to Float64? Deep Learning commonly relies on Float32 (or even Automatic Mixed Precision) to reduce memory requirements or allow more parameters. This decision often trickles into scientific machine learning, but at what accuracy cost? For a given precision, what is the optimal $n_x$?

We can continue and rearrange to $h = (c\,\varepsilon)^{1/3}$, where $c = \kappa_f(x)\, f(x)$. Taking $c = 1$, we have

$$\varepsilon = 10^{-16}: \quad h = 10^{-16/3} = 10^{-5.\overline{3}} \approx 4.64158883 \times 10^{-6}$$
$$\varepsilon = 10^{-8}: \quad h = 10^{-8/3} = 10^{-2.\overline{6}} \approx 2.15443469 \times 10^{-3}$$

For $[-1,1]$, this would mean $n_x = 430886$ vs. $n_x = 928$.

$$n_x = \frac{2}{h} = \frac{2}{c^{1/3}}\,\varepsilon^{-1/3}$$
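Taking $c = 1$ on $[-1, 1]$, a few lines reproduce the mesh sizes above (sketch):

```python
# n_x = 2 / h with h = (c * eps)**(1/3), c = 1, on the interval [-1, 1]
for eps in (1e-16, 1e-8):
    h = eps ** (1 / 3)
    print(f"eps = {eps:.0e}:  h = {h:.3e},  n_x = {int(2 / h)}")
```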

Let's suppose that with $\varepsilon_f = 10^{-8}$, the optimal mesh is $n_x^f = 100$. Then what would $c$ be, and what would $n_x^d$ be for $\varepsilon_d = 10^{-16}$?

$$h_f = \frac{2}{n_x^f}, \qquad \varepsilon_d = \varepsilon_f^2$$
$$c = \left(\frac{2}{n_x^f}\right)^3\frac{1}{\varepsilon_f} = \left(\frac{2}{n_x^d}\right)^3\frac{1}{\varepsilon_d}$$
$$\frac{8}{(n_x^f)^3}\,\frac{1}{\varepsilon_f} = \frac{8}{(n_x^d)^3}\,\frac{1}{\varepsilon_f^2}$$
$$\frac{1}{(n_x^f)^3} = \frac{1}{(n_x^d)^3}\,\frac{1}{\varepsilon_f}$$
$$(n_x^f)^3 = (n_x^d)^3\,\varepsilon_f$$
$$n_x^f = n_x^d\,\varepsilon_f^{1/3} \approx \frac{n_x^d}{500}, \qquad n_x^d = n_x^f\,\varepsilon_f^{-1/3} \approx 500\, n_x^f$$

In other words, by using double precision we can use roughly $500\times$ as many points in our mesh as with single precision. On the other hand, if $\varepsilon_f = 10^{-5.3}$, this factor is only about $58.4\times$.
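The factor itself is just $\varepsilon_f^{-1/3}$ under the assumption $\varepsilon_d = \varepsilon_f^2$, which a one-liner confirms (the first value, about 464, is what the text rounds to 500):

```python
# Factor n_x^d / n_x^f = eps_f**(-1/3), assuming eps_d = eps_f**2
for eps_f in (1e-8, 10 ** -5.3):
    print(f"eps_f = {eps_f:.1e}:  factor ~ {eps_f ** (-1 / 3):.1f}")
```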

Example results for $\sin(x)$ on $[-1,1]$:

| Derivative Order | Truncation Order | Predicted Factor | Observed Factor | 32-bit $n_x$ ($n_x^f$) |
|---|---|---|---|---|
| 1 | 2 | 500 | 924 | 727 |
| 2 | 2 | 100 | 150 | 126 |
| 3 | 2 | 40  | 42  | 92  |
| 1 | 4 | 40  | 41  | 92  |
| 2 | 4 | 22  | 29  | 41  |

Here, these "Factors" are $c_{\text{fac}}$ such that $n_x^d = c_{\text{fac}}\, n_x^f$. We can either predict them analytically from the error form and the values of $\varepsilon$ ("Predicted"), or observe them by testing a range of $n_x$ on a smooth example problem and comparing the optimal $n_x$ between the two precisions ("Observed").
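A sketch of how such an "observed" factor might be measured; the centered first difference, max-norm error, and $n_x$ sweep below are illustrative choices and not necessarily the setup behind the table:

```python
import numpy as np

def best_nx(dtype):
    """Optimal n_x for a centered first derivative of sin on [-1, 1]."""
    best, best_err = None, np.inf
    for nx in np.unique(np.logspace(1, 6, 300).astype(int)):
        x = np.linspace(-1, 1, nx, dtype=dtype)
        h = x[1] - x[0]
        d = (np.sin(x[2:]) - np.sin(x[:-2])) / (2 * h)      # interior points
        err = np.max(np.abs(d - np.cos(x[1:-1])))
        if err < best_err:
            best, best_err = nx, err
    return best

nf, nd = best_nx(np.float32), best_nx(np.float64)
print(f"float32 n_x ~ {nf},  float64 n_x ~ {nd},  observed factor ~ {nd / nf:.0f}")
```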