(This post is translated from my Chinese blog https://zhuanlan.zhihu.com/p/30938921562)
Background: I recently returned to the Julia community because I am brainstorming several interface designs for Julia (one such attempt is presented at https://zhuanlan.zhihu.com/p/30356390469). To avoid disrupting the flow of the main article, I’ve placed my motivations at the end [1], as non-Julia readers may not be interested in them.
After reading several threads on Discourse, I have become convinced of one point: the correctness of generic numeric programs cannot be verified universally. Therefore, the only feasible way to avoid errors is to avoid running generic numeric programs on untested input types. (By a generic program, I mean a function f that can accept different input types.)
Specifically, I want to assert the following proposition: if f is a generic numeric program, then there is no systematic constraint or proof (e.g., one based on types, interfaces, or any check that does not examine f’s specific implementation) that can establish that f is correct for every input type.
Roughly speaking, this is because
a) the correctness of a numeric program is really a statement about the gap between the numerical approximation it computes and the theoretical definition, and that gap depends on too many factors (representation, rounding, overflow) to be captured by interfaces alone.
b) a generic program’s correctness specification is itself generic, and discharging it requires a nontrivial *proof*. That is, knowing that your program is correct for some input type S does not automatically imply that it is correct for another type S’.
This impossibility is neither a problem of program design (such as a lack of interfaces) nor a coding issue, but a purely semantic and logical impossibility. It is similar to how a formal proof cannot verify the correctness of the specification itself. For example, suppose you accidentally write a wrong specification: you want to implement an addition function, but your specification actually declares subtraction, so whatever you verify against it is not what you intended. This sounds unlikely, but when doing formal verification it is easy to get confused about whether a seemingly complex abstract declaration actually declares what we want.
Returning to numerical computation, let me illustrate this impossibility with an actual example from the Julia community:
Mean of integers overflows — bug or expected behaviour? https://discourse.julialang.org/t/mean-of-integers-overflows-bug-or-expected-behaviour/34767
This example concerns the mean function. You might say: isn’t mean just calculating the average of an array? What is there to discuss? And an algorithm comes to mind easily: mean(x) = sum(x) / length(x), that is, first sum, then divide by the length. For floating-point numbers this might be numerically imprecise, but that is not a major problem. But let’s think about it: is this approach correct for an integer array?
Let me be more specific: if integers have only a finite representation (unlike Python’s arbitrary-precision integers), summation might overflow. Is that acceptable behavior? For example, the mean of [INT_MAX, INT_MAX] is mathematically INT_MAX, but INT_MAX + INT_MAX overflows, producing a strange result.
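To see the overflow concretely, here is a minimal Julia sketch. The naive_mean helper below is my own illustrative transcription of the definition, not the standard library’s Statistics.mean (whose exact behavior may vary across versions):

```julia
# A literal transcription of the textbook definition.
naive_mean(x) = sum(x) / length(x)

x = [typemax(Int64), typemax(Int64)]   # the true mean is typemax(Int64) ≈ 9.22e18

naive_mean(x)
# The Int64 sum wraps around to -2, so this returns -1.0 instead of ~9.22e18.
```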
Obviously, there are many different perspectives on this issue (each option is sketched in code after the list):
1. Still use sum(x)/length(x), but sum with arbitrary-precision integers, as Python does
- Benefits: Integers never overflow
- Drawbacks: Arbitrary-precision arithmetic is very slow; for large integer arrays, this implementation is practically unusable
2. Still use sum(x)/length(x), and let integers silently overflow (treating machine integers as the ring of integers modulo 2^n, i.e., wraparound arithmetic) or treat overflow as UB
- Benefits: Implementation is fast
- Drawbacks: Results can be wrong; users might silently get incorrect values
3. Still use sum(x)/length(x), but make integer overflow an error
- Benefits: Users won’t silently get incorrect results; the implementation is simple, just add overflow checking
- Drawbacks: Slower than unchecked summation, though still faster than arbitrary precision
4. Convert the elements of x to floating point before summing, then compute the mean as for a floating-point array
- Benefits: No overflow; results are correct up to floating-point rounding
- Drawbacks: The extra conversion has a performance cost, and the program is no longer generic (the result is forced to floating point)
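For concreteness, here are rough Julia one-liners corresponding to the four options. The helper names are hypothetical, and the sketches ignore edge cases (empty arrays, non-integer element types); they only illustrate the trade-offs above:

```julia
# 1. Promote every element to BigInt before summing: never overflows, but slow.
mean_bigint(x)   = sum(big, x) / length(x)

# 2. Sum in the machine integer type and let it wrap silently: fast, possibly wrong.
mean_wrapping(x) = sum(x) / length(x)

# 3. Sum with checked arithmetic, so overflow throws an error instead of wrapping.
mean_checked(x)  = foldl(Base.checked_add, x) / length(x)

# 4. Convert each element to Float64 before accumulating: no overflow,
#    but rounding error, and the result type is no longer generic.
mean_float(x)    = sum(Float64, x) / length(x)
```

On the [typemax(Int64), typemax(Int64)] example, mean_bigint and mean_float return (approximately) the right value, mean_wrapping returns -1.0, and mean_checked throws an OverflowError.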
None of these solutions is strictly better than the others, but that’s not my main point. What I want to emphasize is how deceptive the mathematically exact definition mean(x) = sum(x) / length(x) is: this elegant definition is nearly useless as an actual numerical algorithm, because a literal translation of it fails to produce meaningful results; it is either wrong due to overflow or numerically unstable.
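The floating-point half of that claim is just as easy to demonstrate. The snippet below only illustrates how accumulation order affects the error on a long Float32 array; it is not a claim about any particular library’s mean:

```julia
x = fill(0.1f0, 10_000_000)    # ten million Float32 copies of 0.1; the true mean is 0.1

foldl(+, x) / length(x)        # strict left-to-right accumulation: visibly far from 0.1,
                               # because each add is rounded at the magnitude of the running sum

sum(x) / length(x)             # sum on arrays uses a pairwise reduction,
                               # so the rounding error stays far smaller
```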
This is obvious in the deep learning community: neural networks using different precisions (FP32, FP16, BF16, FP8) behave completely differently. FP16/BF16 require mixed precision to avoid divergence (normalization layers accumulate in FP32), and FP8 kernels must be written manually to ensure performance and prevent