NASA has a list of 10 rules for software development
Read the full document (local copy) or the original.
Read it first, before reading these comments.
Those rules were written from the point of view of people writing
embedded software for extremely expensive spacecraft, where tolerating
a lot of programming pain is a good tradeoff for not losing a mission.
I do not know why someone in that situation does not use the SPARK
subset of Ada, which subset was explicitly designed for verification,
and is simply a better starting point for embedded programming than C.
I am criticising them from the point of view of people writing
programming language processors (compilers, interpreters, editors)
and application software.
We are supposed to teach critical thinking. This is an example.
- How have Gerard J. Holzmann’s and my different contexts affected our judgement?
- Can you blindly follow his advice without considering your context?
- Can you blindly follow my advice without considering your context?
- Would these rules necessarily apply to a different/better programming language? What if function pointers were tamed? What if the language provided opaque abstract data types as Ada does?
1. Restrict all code to very simple control flow constructs — do not use goto statements, setjmp() or longjmp() constructs, and direct or indirect recursion.
Note that setjmp() and longjmp() are how C does exception handling, so this rule bans any use of exception handling.
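For anyone who has not met the idiom, here is a minimal sketch of what the rule bans (the names parse_input and on_error are invented for illustration):

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf on_error;          /* saved context to "throw" back to */

    static void parse_input(const char *s)
    {
        if (s[0] == '\0')
            longjmp(on_error, 1);     /* raise: unwind straight back to setjmp() */
        /* ... normal processing ... */
    }

    int main(void)
    {
        if (setjmp(on_error) != 0) {  /* handler: setjmp() returns nonzero after longjmp() */
            fprintf(stderr, "parse failed\n");
            return 1;
        }
        parse_input("");              /* the "try" part: work that may fail deep down */
        return 0;
    }

This is C’s only portable way to transfer control non-locally out of a deep call chain, which is why banning it amounts to banning exception handling outright.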
It is true that banning recursion and jumps and loops without
explicit bounds means that you know your program is
going to terminate. It is also true that recursive functions
can be proven to terminate about as often as loops can, with
reasonably well-understood methods. What’s more important here is
that “sure to terminate” does not imply
“sure to terminate in my lifetime”:
int const N = 1000000000;
for (x0 = 0; x0 != N; x0++)
 for (x1 = 0; x1 != N; x1++)
  for (x2 = 0; x2 != N; x2++)
   for (x3 = 0; x3 != N; x3++)
    for (x4 = 0; x4 != N; x4++)
     for (x5 = 0; x5 != N; x5++)
      for (x6 = 0; x6 != N; x6++)
       for (x7 = 0; x7 != N; x7++)
        for (x8 = 0; x8 != N; x8++)
         for (x9 = 0; x9 != N; x9++)
          /* do something */;
This does a bounded number of iterations. The bound is N^10.
In this case, that’s 10^90. If each iteration of the loop body
takes 1 nsec, that’s 10^81 seconds, or about 3.2×10^73
years. What is the practical difference between “will stop
in 32,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
years” and “will never stop”?
Worse still, taking a problem that is naturally expressed
using recursion and contorting it into something that manipulates an
explicit stack, while possible, turns clear maintainable code into
buggy spaghetti. (I’ve done it, several times. There’s an example
on this web site. It is not a good idea.)
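To make the contrast concrete, here is a sketch, with an invented node type: a tree sum written naturally with recursion, then the same computation forced to manage its own stack.

    #include <stdlib.h>

    struct node { int value; struct node *left, *right; };

    /* Natural recursive form: the shape of the code mirrors the data. */
    long sum_tree(const struct node *t)
    {
        if (t == NULL) return 0;
        return t->value + sum_tree(t->left) + sum_tree(t->right);
    }

    /* Same computation with an explicit stack: more code, more state to
       get wrong, and a depth bound that must now be policed by hand. */
    #define STACK_MAX 64

    long sum_tree_explicit(const struct node *t)
    {
        const struct node *stack[STACK_MAX];
        int top = 0;
        long total = 0;

        if (t != NULL) stack[top++] = t;
        while (top > 0) {
            const struct node *n = stack[--top];
            total += n->value;
            if (n->left != NULL) {
                if (top == STACK_MAX) abort();   /* our hand-made stack overflowed */
                stack[top++] = n->left;
            }
            if (n->right != NULL) {
                if (top == STACK_MAX) abort();
                stack[top++] = n->right;
            }
        }
        return total;
    }

Note that the explicit-stack version has not eliminated the resource bound; it has merely moved it somewhere the compiler and checking tools can no longer see it.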
2. All loops must have a fixed upper-bound. It must be trivially
possible for a checking tool to prove statically that a preset
upper-bound on the number of iterations of a loop cannot be exceeded.
If the loop-bound cannot be proven statically, the rule is considered
violated.
This is an old idea. As the example above shows, it is not enough
by itself to be of any practical use. You have to try to make the
bounds reasonably tight, and you have to regard hitting an
artificial bound as a run-time error.
By the way, note that putting depth bounds on recursive procedures
makes them every bit as safe as loops with fixed bounds.
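For instance, threading a depth parameter through the recursion gives the same guarantee as a fixed loop bound, with the same proviso that hitting the artificial bound must be treated as a run-time error (MAX_DEPTH and the node type here are invented for illustration):

    #include <stdlib.h>

    #define MAX_DEPTH 200   /* sized to fit the statically known stack budget */

    struct node { int value; struct node *left, *right; };

    long sum_tree_bounded(const struct node *t, int depth)
    {
        if (t == NULL) return 0;
        if (depth >= MAX_DEPTH)
            abort();        /* hitting the artificial bound is a run-time error */
        return t->value
             + sum_tree_bounded(t->left,  depth + 1)
             + sum_tree_bounded(t->right, depth + 1);
    }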
3. Do not use dynamic memory allocation after initialization.
This is also a very old idea. Some languages designed for embedded
work don’t even have dynamic memory allocation. The big
thing, of course, is that embedded applications have a fixed amount of
memory to work with, are never going to get any more, and should not
crash because they couldn’t handle another record.
Note that the rationale actually supports a much stronger rule:
don’t even simulate dynamic memory allocation. You can of
course manage your own storage pool:
typedef struct Foo_Record *foo;
struct Foo_Record { foo next; ... };
#define MAX_FOOS ...

static struct Foo_Record foo_zone[MAX_FOOS];
foo foo_free_list = 0;

void init_foo_free_list() {
    for (int i = MAX_FOOS - 1; i >= 0; i--) {
        foo_zone[i].next = foo_free_list;
        foo_free_list = &foo_zone[i];
    }
}

foo malloc_foo() {
    foo r = foo_free_list;
    if (r == 0) report_error();   /* report_error() must not return */
    foo_free_list = r->next;
    return r;
}

void free_foo(foo x) {
    x->next = foo_free_list;
    foo_free_list = x;
}
This technically satisfies the rule, but it
violates the spirit of the rule. Simulating malloc()
and free() this way is worse than using the real
thing, because the memory in foo_zone is permanently tied up
for Foo_Records, even if we don’t need any of those at the
moment but do desperately need the memory for something else.
What you really need to do is to use a memory allocator
with known behaviour, and to prove that the amount of memory
in use at any given time (data bytes + headers) is bounded
by a known value.
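One way to get such a bound, sketched here with invented names (cap_malloc, MEM_BUDGET) and an assumed fixed per-block header cost, is to route every allocation through a wrapper that charges each request against a fixed budget and refuses to exceed it:

    #include <stdlib.h>

    #define MEM_BUDGET  (64 * 1024)   /* the proven ceiling: data bytes + headers */
    #define HEADER_COST 16            /* assumed per-block overhead of the allocator */

    static size_t bytes_in_use = 0;

    void *cap_malloc(size_t n)
    {
        void *p;
        if (bytes_in_use + n + HEADER_COST > MEM_BUDGET)
            return NULL;              /* refuse rather than break the bound */
        p = malloc(n);
        if (p != NULL)
            bytes_in_use += n + HEADER_COST;
        return p;
    }

    void cap_free(void *p, size_t n)  /* caller supplies the size it requested */
    {
        free(p);
        bytes_in_use -= n + HEADER_COST;
    }

Unlike the foo_zone pool, this keeps the budgeted memory usable for whatever the program needs next, while still making “total in use is bounded by MEM_BUDGET” something a tool can check.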
Note also that Splint can verify at compile time that
the errors NASA speak of do not occur.
One of the reasons given for the ban is that the performance
of malloc() and free() is unpredictable. Are these the only
functions we use with unpredictable performance? Is there
anything about malloc() and free() which makes them
necessarily unpredictable? The existence of
hard-real-time garbage collectors suggests not.
The rationale for this rule says that
Note that the only way
to dynamically claim memory in the absence of memory allocation from the
heap is to use stack memory. In the absence of recursion (Rule 1), an
upper bound on the use of stack memory can be derived statically, thus
making it possible to prove that an application will always live within
its pre-allocated memory means.
Unfortunately, the sunny optimism shown here
20 Comments
jmclnx
nice, saved because these days who knows how long the page will be available :)
readthenotes1
Someone in NASA once told me that it was easier to teach a mechanical engineer to program than to teach a software developer mechanical engineering.
From that perspective, the avoidance of recursion is more compelling. Plus, Fortran didn't support it…
dooglius
Title should indicate that this is a _criticism_ of the rules.
einpoklum
Well, I was expecting to see:
Rule 1: Don't write any code that can make the ship go ka-boom.
Rule 2: We need to consistently use the f'ing metric system and SI units – and don't you forget it.
Rule 3: … You haven't already forgotten about rule #2, have you?
nealabq
No recursion means no Erlang. Which means no RabbitMQ?
toolslive
from the original document: "critical code is written in C." The document is not dated, but it's probably quite old (I'm guessing 30-something years). Writing critical code in C is probably a mistake, but once you find yourself in that situation, you will find these rules are too tight (which is also what the criticism is about). You should probably read them as "try to avoid …".
So I would just prepend the document with "Rule 0: Try to avoid writing critical code in C."
zoogeny
> People using Ada, Pascal (Delphi), JavaScript, or functional languages should also declare types and functions as locally as possible.
My own personal approach in JavaScript is to avoid defining functions in a nested manner unless I explicitly want to capture a value from the enclosing scope.
This is probably due to an outdated mental model I had where it was shown in performance profiling that a function would be redefined every time the enclosing function was called. I doubt this is how any reasonable modern JavaScript interpreter works, although I haven't kept up. Since the introduction of arrow functions (a long time ago now in relative terms) their prolific use has probably led to deep optimizations that render this old mental model completely useless.
But old habits die hard I guess, and now I keep any named function that does not capture local variables at a file/module scope.
A lot of the other notes are interesting and very nit-picky in the "technically correct is the best kind of correct" way that older engineers eat up. I feel the general tone of carefulness that the NASA rules are trying to communicate is very good, and I would support most of them in the contexts where they are enforced.
jsrcout
I work with a lot of embedded and embedded-adjacent software, and even I think several of these rules are too much. Having said that, Holzmann's rules are from 2006, and embedded / space qualified hardware has improved quite a bit since then. These days planetary probes run C++, and JWST even uses JavaScript to run command scripts. Things are changing.
raylus
Note this is NASA/JPL, not NASA-wide, JPL is a NASA center (FFRDC).
matu3ba
Example 1 is a deficit of C with missing computed goto and switch continue. Example 2 review is ambiguous on practicality. Example 3 reads very odd, an over-approximated upper stack bound is very possible https://github.com/ziglang/zig/issues/157#issuecomment-76395… and tighter computation with advanced analysis as well. Example 4,5,6,7,8,9,10 yes.
Overall good read.
layer8
The rule about recursion is likely also to ensure a statically known bound on needed stack space, in addition to a statically known runtime bound (in conjunction with the other rules).
While the criticism of rule 3 is right in that there is a dependence on the compiler, it is still a prerequisite for deriving upper bounds for the runtime by static analysis on the binary. This is actually something that is done for safety-critical systems that require a guaranteed response time, based on the known timing characteristics of the targeted microprocessor.
AndyKelley
If I made my own criticism of these rules it would be very different from OP. It was difficult to take the article seriously from the get-go when it defended setjmp/longjmp. That pattern is so obviously broken to anyone who has ever had to go near it. The article makes an argument like this:
1. setjmp/longjmp is exception handling
2. exception handling is good
and I take serious issue with that second premise.
Also the loop thing obviously means to put a max iteration count on every loop like this:
for (0..N) |_| {
}
where N is a statically determined max iteration count. The 10^90 thing is silly and irrelevant. I didn't read the article past this point.
If I were to criticize those rules, I'd focus on these points:
* function body length does not correlate to simplicity of understanding, or if anything it correlates in the opposite way the rules imply
* 2 assertions is completely arbitrary, it should assert everything assertable, and sometimes there won't be 2 assertable things
dehrmann
Friendly reminder that you are most likely not NASA and have different goals, so you should approach the problem differently. The cost of a stack overflow (even at FAANG-scale) for you is likely many orders of magnitude cheaper than for NASA, and the cost of delaying a project for a year is much worse.
Unless you work on commercial aircraft avionics. NASA flying a probe into Mars makes them look dumb. Flying an aircraft into a mountain haunts people for the rest of their lives.
bumby
Just for context, these aren’t really “rules” as much as proposed practices. Note that official “rules” are in documents with names like “NPR” aka “NASA procedural requirements.”[1] So, while someone may use the document in the featured article to frame a discussion, a developer is not bound to comply (or alternatively waive) those “rules” and could conceivably just dismiss them.
[1] e.g. https://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPR&c=7150&s=2…
manmal
> All loops must have a fixed upper-bound
Things like spinlocks or CAS (Compare-And-Swap) are elegant and safe solutions for concurrency, and AFAIK you can’t really limit their upper bound. Others in the thread have pointed out that those are more guidelines than rules – still, not sure about this one.
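(For concreteness: you can cap the retries, as in this C11 sketch where MAX_TRIES is an invented, arbitrary number, but then the caller needs a story for the failure case, which is exactly the awkwardness.)

    #include <stdatomic.h>
    #include <stdbool.h>

    #define MAX_TRIES 1000   /* arbitrary cap, just to satisfy the rule */

    bool bounded_increment(atomic_int *counter)
    {
        int old = atomic_load(counter);
        for (int i = 0; i < MAX_TRIES; i++) {
            /* on failure, old is reloaded with the current value and we retry */
            if (atomic_compare_exchange_weak(counter, &old, old + 1))
                return true;
        }
        return false;        /* bound hit under contention: now what? */
    }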
redtriumph
My ex-boss completed his PhD thesis with the author of this doc as his advisor. So some of the ideas seem relevant since they popped up in a few of the work-related conversations with my boss.
gorfian_robot
now raise your hand if you are actually in the business of writing code being deployed on JPL missions …..
bueller? bueller???
wileydragonfly
The physical constraints of a sheet of paper are pleasing to the eye. Otherwise the dimensions of paper would be different. Oh well, it was good for a chuckle.
jph
Read the original and it explains the purpose of each item:
https://spinroot.com/gerard/pdf/P10.pdf
The original clearly describes that the coding rules primarily target C and attempt to optimize the ability to
more thoroughly check the reliability of critical applications written in C. The original author clearly understands what they're doing, and explains lots of other ways to verify C code.
For what it's worth, the rationales in the original all make perfect sense to me. Perhaps this is because I learned C on tiny systems? I learned C for hardware for implanted medical devices, and our lab did similar kinds of guidelines.
fallingmeat
fun fact, the same guy (G. Holzmann) also made the Spin model checker