NASA has a list of 10 rules for software development
Read the full document (local copy) or the original.
Read it first, before reading these comments.
Those rules were written from the point of view of people writing
embedded software for extremely expensive spacecraft, where tolerating
a lot of programming pain is a good tradeoff for not losing a mission.
I do not know why someone in that situation does not use the SPARK
subset of Ada, which subset was explicitly designed for verification,
and is simply a better starting point for embedded programming than C.
I am criticising them from the point of view of people writing
programming language processors (compilers, interpreters, editors)
and application software.
We are supposed to teach critical thinking. This is an example.
- How have Gerard J. Holzmann’s and my different contexts affected our judgement?
- Can you blindly follow his advice without considering your context?
- Can you blindly follow my advice without considering your context?
- Would these rules necessarily apply to a different/better programming language? What if function pointers were tamed? What if the language provided opaque abstract data types as Ada does?
1. Restrict all code to very simple control flow constructs — do not use goto statements, setjmp() or longjmp() constructs, and direct or indirect recursion.
Note that setjmp() and longjmp() are how C does exception handling, so this rule bans any use of exception handling.
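For anyone who has not met the idiom, here is a minimal sketch of what the rule bans (the names parse_input and on_error are invented for illustration):

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf on_error;          /* saved context to "throw" back to */

    static void parse_input(const char *s)
    {
        if (s[0] == '\0')
            longjmp(on_error, 1);     /* raise: unwind straight back to setjmp() */
        /* ... normal processing ... */
    }

    int main(void)
    {
        if (setjmp(on_error) != 0) {  /* handler: setjmp() returns nonzero after longjmp() */
            fprintf(stderr, "parse failed\n");
            return 1;
        }
        parse_input("");              /* the "try" part: work that may fail deep down */
        return 0;
    }

This is C’s only portable way to transfer control non-locally out of a deep call chain, which is why banning it amounts to banning exception handling outright.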
It is true that banning recursion and jumps and loops without
explicit bounds means that you know your program is
going to terminate. It is also true that recursive functions
can be proven to terminate about as often as loops can, with
reasonably well-understood methods. What’s more important here is
that “sure to terminate” does not imply
“sure to terminate in my lifetime”:
int const N = 1000000000;
for (x0 = 0; x0 != N; x0++)
 for (x1 = 0; x1 != N; x1++)
  for (x2 = 0; x2 != N; x2++)
   for (x3 = 0; x3 != N; x3++)
    for (x4 = 0; x4 != N; x4++)
     for (x5 = 0; x5 != N; x5++)
      for (x6 = 0; x6 != N; x6++)
       for (x7 = 0; x7 != N; x7++)
        for (x8 = 0; x8 != N; x8++)
         for (x9 = 0; x9 != N; x9++)
          /* do something */;
This does a bounded number of iterations. The bound is N^10.
In this case, that’s 10^90. If each iteration of the loop body
takes 1 nsec, that’s 10^81 seconds, or about 3.2×10^73
years. What is the practical difference between “will stop
in 32,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000
years” and “will never stop”?
Worse still, taking a problem that is naturally expressed
using recursion and contorting it into something that manipulates an
explicit stack, while possible, turns clear maintainable code into
buggy spaghetti. (I’ve done it, several times. There’s an example
on this web site. It is not a good idea.)
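To make the contrast concrete, here is a sketch, with an invented node type: a tree sum written naturally with recursion, then the same computation forced to manage its own stack.

    #include <stdlib.h>

    struct node { int value; struct node *left, *right; };

    /* Natural recursive form: the shape of the code mirrors the data. */
    long sum_tree(const struct node *t)
    {
        if (t == NULL) return 0;
        return t->value + sum_tree(t->left) + sum_tree(t->right);
    }

    /* Same computation with an explicit stack: more code, more state to
       get wrong, and a depth bound that must now be policed by hand. */
    #define STACK_MAX 64

    long sum_tree_explicit(const struct node *t)
    {
        const struct node *stack[STACK_MAX];
        int top = 0;
        long total = 0;

        if (t != NULL) stack[top++] = t;
        while (top > 0) {
            const struct node *n = stack[--top];
            total += n->value;
            if (n->left != NULL) {
                if (top == STACK_MAX) abort();   /* our hand-made stack overflowed */
                stack[top++] = n->left;
            }
            if (n->right != NULL) {
                if (top == STACK_MAX) abort();
                stack[top++] = n->right;
            }
        }
        return total;
    }

Note that the explicit-stack version has not eliminated the resource bound; it has merely moved it somewhere the compiler and checking tools can no longer see it.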
2. All loops must have a fixed upper-bound. It must be trivially
possible for a checking tool to prove statically that a preset
upper-bound on the number of iterations of a loop cannot be exceeded.
If the loop-bound cannot be proven statically, the rule is considered
violated.
This is an old idea. As the example above shows, it is not enough
by itself to be of any practical use. You have to try to make the
bounds reasonably tight, and you have to regard hitting an
artificial bound as a run-time error.
By the way, note that putting depth bounds on recursive procedures
makes them every bit as safe as loops with fixed bounds.
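For instance, threading a depth parameter through the recursion gives the same guarantee as a fixed loop bound, with the same proviso that hitting the artificial bound must be treated as a run-time error (MAX_DEPTH and the node type here are invented for illustration):

    #include <stdlib.h>

    #define MAX_DEPTH 200   /* sized to fit the statically known stack budget */

    struct node { int value; struct node *left, *right; };

    long sum_tree_bounded(const struct node *t, int depth)
    {
        if (t == NULL) return 0;
        if (depth >= MAX_DEPTH)
            abort();        /* hitting the artificial bound is a run-time error */
        return t->value
             + sum_tree_bounded(t->left,  depth + 1)
             + sum_tree_bounded(t->right, depth + 1);
    }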
3. Do not use dynamic memory allocation after initialization.
This is also a very old idea. Some languages designed for embedded
work don’t even have dynamic memory allocation. The big
thing, of course, is that embedded applications have a fixed amount of
memory to work with, are never going to get any more, and should not
crash because they couldn’t handle another record.
Note that the rationale actually supports a much stronger rule:
don’t even simulate dynamic memory allocation. You can of
course manage your own storage pool:
typedef struct Foo_Record *foo;
struct Foo_Record { foo next; ... };
#define MAX_FOOS ...

static struct Foo_Record foo_zone[MAX_FOOS];
foo foo_free_list = 0;

void init_foo_free_list() {
    for (int i = MAX_FOOS - 1; i >= 0; i--) {
        foo_zone[i].next = foo_free_list;
        foo_free_list = &foo_zone[i];
    }
}

foo malloc_foo() {
    foo r = foo_free_list;
    if (r == 0) report_error();   /* report_error() must not return */
    foo_free_list = r->next;
    return r;
}

void free_foo(foo x) {
    x->next = foo_free_list;
    foo_free_list = x;
}
This technically satisfies the rule, but it
violates the spirit of the rule. Simulating malloc()
and free() this way is worse than using the real
thing, because the memory in foo_zone is permanently tied up
for Foo_Records, even if we don’t need any of those at the
moment but do desperately need the memory for something else.
What you really need to do is to use a memory allocator
with known behaviour, and to prove that the amount of memory
in use at any given time (data bytes + headers) is bounded
by a known value.
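One way to get such a bound, sketched here with invented names (cap_malloc, MEM_BUDGET) and an assumed fixed per-block header cost, is to route every allocation through a wrapper that charges each request against a fixed budget and refuses to exceed it:

    #include <stdlib.h>

    #define MEM_BUDGET  (64 * 1024)   /* the proven ceiling: data bytes + headers */
    #define HEADER_COST 16            /* assumed per-block overhead of the allocator */

    static size_t bytes_in_use = 0;

    void *cap_malloc(size_t n)
    {
        void *p;
        if (bytes_in_use + n + HEADER_COST > MEM_BUDGET)
            return NULL;              /* refuse rather than break the bound */
        p = malloc(n);
        if (p != NULL)
            bytes_in_use += n + HEADER_COST;
        return p;
    }

    void cap_free(void *p, size_t n)  /* caller supplies the size it requested */
    {
        free(p);
        bytes_in_use -= n + HEADER_COST;
    }

Unlike the foo_zone pool, this keeps the budgeted memory usable for whatever the program needs next, while still making “total in use is bounded by MEM_BUDGET” something a tool can check.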
Note also that Splint can verify at compile time that
the errors NASA speak of do not occur.
One of the reasons given for the ban is that the performance
of malloc() and free() is unpredictable. Are these the only
functions we use with unpredictable performance? Is there
anything about malloc() and free() which makes them
necessarily unpredictable? The existence of
hard-real-time garbage collectors suggests not.
The rationale for this rule says that
Note that the only way
to dynamically claim memory in the absence of memory allocation from the
heap is to use stack memory. In the absence of recursion (Rule 1), an
upper bound on the use of stack memory can be derived statically, thus
making it possible to prove that an application will always live within
its pre-allocated memory means.
Unfortunately, the sunny optimism shown here
20 Comments
jmclnx
nice, saved because these days who knows how long the page will be available :)
readthenotes1
Someone in NASA once told me that it was easier to teach a mechanical engineer to program than to teach a software developer mechanical engineering.
From that perspective, the avoidance of recursion is more compelling. Plus, Fortran didn't support it…
dooglius
Title should indicate that this is a _criticism_ of the rules.
einpoklum
Well, I was expecting to see:
Rule 1: Don't write any code that can make the ship go ka-boom.
Rule 2: We need to consistently use the f'ing metric system and SI units – and don't you forget it.
Rule 3: … You haven't already forgotten about rule #2, have you?
nealabq
No recursion means no Erlang. Which means no RabbitMQ?
toolslive
from the original document: "critical code is written in C." The document is not dated, but it's probably quite old (I'm guessing 30-something years). Writing critical code in C is probably a mistake, but once you find yourself in that situation, you will find these rules are too tight (which is also what the criticism is about). You should probably read them as "try to avoid …".
So I would just prepend the document with "Rule 0: Try to avoid writing critical code in C."
zoogeny
> People using Ada, Pascal (Delphi), JavaScript, or functional languages should also declare types and functions as locally as possible.
My own personal approach in JavaScript is to avoid defining functions in a nested manner unless I explicitly want to capture a value from the enclosing scope.
This is probably due to an outdated mental model I had where it was shown in performance profiling that a function would be redefined every time the enclosing function was called. I doubt this is how any reasonable modern JavaScript interpreter works, although I haven't kept up. Since the introduction of arrow functions (a long time ago now in relative terms) their prolific use has probably led to deep optimizations that render this old mental model completely useless.
But old habits die hard I guess, and now I keep any named function that does not capture local variables at a file/module scope.
A lot of the other notes are interesting and very nit-picky in the "technically correct is the best kind of correct" way that older engineers eat up. I feel the general tone of carefulness that the NASA rules are trying to communicate is very good, and I would support most of them in the contexts where they are enforced.
jsrcout
I work with a lot of embedded and embedded-adjacent software, and even I think several of these rules are too much. Having said that, Holzmann's rules are from 2006, and embedded / space qualified hardware has improved quite a bit since then. These days planetary probes run C++, and JWST even uses JavaScript to run command scripts. Things are changing.
raylus
Note this is NASA/JPL, not NASA-wide, JPL is a NASA center (FFRDC).
matu3ba
Example 1 is a deficit of C with missing computed goto and switch continue. Example 2 review is ambiguous on practicality. Example 3 reads very odd, an over-approximated upper stack bound is very possible https://github.com/ziglang/zig/issues/157#issuecomment-76395… and tighter computation with advanced analysis as well. Example 4,5,6,7,8,9,10 yes.
Overall good read.
layer8
The rule about recursion is likely also to ensure a statically known bound on needed stack space, in addition to a statically known runtime bound (in conjunction with the other rules).
While the criticism of rule 3 is right in that there is a dependence on the compiler, it is still a prerequisite for deriving upper bounds for the runtime by static analysis on the binary. This is actually something that is done for safety-critical systems that require a guaranteed response time, based on the known timing characteristics of the targeted microprocessor.
AndyKelley
If I made my own criticism of these rules it would be very different from OP. It was difficult to take the article seriously from the get-go when it defended setjmp/longjmp. That pattern is so obviously broken to anyone who has ever had to go near it. The article makes an argument like this:
1. setjmp/longjmp is exception handling
2. exception handling is good
and I take serious issue with that second premise.
Also the loop thing obviously means to put a max iteration count on every loop like this:
for (0..N) |_| {
}
where N is a statically determined max iteration count. The 10^90 thing is silly and irrelevant. I didn't read the article past this point.
If I were to criticize those rules, I'd focus on these points:
* function body length does not correlate to simplicity of understanding, or if anything it correlates in the opposite way the rules imply
* 2 assertions is completely arbitrary, it should assert everything assertable, and sometimes there won't be 2 assertable things
dehrmann
Friendly reminder that you are most likely not NASA and have different goals, so you should approach the problem differently. The cost of a stack overflow (even at FAANG-scale) for you is likely many orders of magnitude cheaper than for NASA, and the cost of delaying a project for a year is much worse.
Unless you work on commercial aircraft avionics. NASA flying a probe into Mars makes them look dumb. Flying an aircraft into a mountain haunts people for the rest of their lives.
bumby
Just for context, these aren’t really “rules” as much as proposed practices. Note that official “rules” are in documents with names like “NPR” aka “NASA procedural requirements.”[1] So, while someone may use the document in the featured article to frame a discussion, a developer is not bound to comply (or alternatively waive) those “rules” and could conceivably just dismiss them.
[1] e.g. https://nodis3.gsfc.nasa.gov/displayDir.cfm?t=NPR&c=7150&s=2…
manmal
> All loops must have a fixed upper-bound
Things like spinlocks or CAS (Compare-And-Swap) are elegant and safe solutions for concurrency, and AFAIK you can’t really limit their upper bound. Others in the thread have pointed out that those are more guidelines than rules – still, not sure about this one.
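(For concreteness: you can cap the retries, as in this C11 sketch where MAX_TRIES is an invented, arbitrary number, but then the caller needs a story for the failure case, which is exactly the awkwardness.)

    #include <stdatomic.h>
    #include <stdbool.h>

    #define MAX_TRIES 1000   /* arbitrary cap, just to satisfy the rule */

    bool bounded_increment(atomic_int *counter)
    {
        int old = atomic_load(counter);
        for (int i = 0; i < MAX_TRIES; i++) {
            /* on failure, old is reloaded with the current value and we retry */
            if (atomic_compare_exchange_weak(counter, &old, old + 1))
                return true;
        }
        return false;        /* bound hit under contention: now what? */
    }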
redtriumph
My ex-boss completed his PhD thesis with the author of this doc as his advisor. So some of the ideas seem relevant since they popped up in a few of the work-related conversations with my boss.
gorfian_robot
now raise your hand if you are actually in the business of writing code being deployed on JPL missions …..
bueller? bueller???
wileydragonfly
The physical constraints of a sheet of paper are pleasing to the eye. Otherwise the dimensions of paper would be different. Oh well, it was good for a chuckle.
jph
Read the original and it explains the purpose of each item:
https://spinroot.com/gerard/pdf/P10.pdf
The original clearly describes that the coding rules primarily target C and attempt to optimize the ability to
more thoroughly check the reliability of critical applications written in C. The original author clearly understands what they're doing, and explains lots of other ways to verify C code.
For what it's worth, the rationales in the original all make perfect sense to me. Perhaps this is because I learned C on tiny systems? I learned C for hardware for implanted medical devices, and our lab did similar kinds of guidelines.
fallingmeat
fun fact, the same guy (G. Holzmann) also made the Spin model checker