The first entry in this series shows how to use the new DIP1000 rules to have slices and pointers refer to the stack, all while staying memory safe. The second entry teaches about the ref storage class and how DIP1000 works with aggregate types (classes, structs, and unions).
So far, the series has deliberately avoided templates and auto functions. This kept the first two posts simpler, as they did not have to deal with function attribute inference, which I have referred to as “attribute auto inference” in earlier posts. However, both auto functions and templates are very common in D code, so a series on DIP1000 can’t be complete without explaining how those features work with the language changes. Function attribute inference is our most important tool for avoiding so-called “attribute soup”, where a function is decorated with so many attributes that readability suffers.
We will also dig deeper into unsafe code. The previous two posts focused on the scope attribute, but this post is about attributes and memory safety more generally. Since DIP1000 is ultimately about memory safety, we can’t get around discussing those topics.
Avoiding repetition with attributes
Function attribute inference means that the language will analyze the body of a function and automatically add the @safe, pure, nothrow, and @nogc attributes where applicable. It will also attempt to add scope or return scope attributes to parameters, and return ref to ref parameters, where the code would not otherwise compile. Some attributes are never inferred. For instance, the compiler will not insert any ref, lazy, out, or @trusted attributes, because where those are left out, they are very likely deliberately not wanted.
There are many ways to turn on function attribute inference. One is to omit the return type in the function signature. Note that the auto keyword is not required for this; auto is just a placeholder keyword used when no return type, storage class, or attribute is specified. For example, the declaration half(int x) { return x/2; } does not parse, so we use auto half(int x) { return x/2; } instead. But we could just as well write @safe half(int x) { return x/2; }, and the rest of the attributes (pure, nothrow, and @nogc) will be inferred just as they are with the auto keyword.
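One way to see what was inferred is to compare the function’s type against an expected signature. This is a minimal sketch; the static assert reflects what current compilers infer for the half example above, and attribute order in the type is irrelevant to the comparison:

```d
@safe half(int x) { return x/2; }

// The remaining attributes are inferred; the function pointer type
// carries them all, so this compiles when inference succeeded.
static assert(is(typeof(&half) == int function(int) @safe pure nothrow @nogc));
```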
The second way to enable attribute inference is to templatize the function. With our half example, it can be done this way:
```d
int divide(int denominator)(int x) { return x/denominator; }

alias half = divide!2;
```
The D spec does not require a template to have any parameters, so an empty parameter list can be used to turn attribute inference on: int half()(int x) { return x/2; }. Calling this function doesn’t even require template instantiation syntax at the call site; half!()(12) is not needed, as half(12) will compile.
Another way to turn on attribute inference is to declare the function inside another function; these are called nested functions. Inference is enabled not only for functions nested directly inside another function but also for most things nested in a type or a template inside the function. Example:
```d
@safe void parentFun()
{
    // This is attribute-inferred.
    int half(int x) { return x/2; }

    class NestedType
    {
        // This is attribute-inferred.
        final int half1(int x) { return x/2; }

        // This is not attribute-inferred: it's a
        // virtual function, and the compiler can't
        // know whether it has an unsafe override
        // in a derived class.
        int half2(int x) { return x/2; }
    }

    int a = half(12);       // Works. Inferred as @safe.
    auto cl = new NestedType;
    int b = cl.half1(18);   // Works. Inferred as @safe.
    int c = cl.half2(26);   // Error.
}
```
A downside of nested functions is that they can only be used in lexical order (the call site must be below the function declaration) unless both the nested function and the call are inside the same struct, class, union, or template that is in turn inside the parent function. Another downside is that they don’t work with Uniform Function Call Syntax.
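The lexical-order restriction can be sketched in a few lines (the identifiers here are illustrative):

```d
@safe void parent()
{
    // int early = half(10); // Error: undefined identifier `half`
    int half(int x) { return x/2; }
    int late = half(10);     // OK: the call site is below the declaration
}
```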
Finally, attribute inference is always enabled for function literals (a.k.a. lambda functions). The halving function would be defined as enum half = (int x) => x/2; and called exactly like a normal function. However, the language does not consider this declaration a function; it considers it a function pointer. This means that at global scope it’s important to use enum or immutable instead of auto. Otherwise the lambda can be reassigned from anywhere in the program, and it cannot be accessed from pure functions. In rare cases such mutability can be desirable, but most often it is an antipattern (like global variables in general).
Limits of inference
Aiming for minimal manual typing isn’t always wise. Neither is aiming for maximal attribute bloat.
The primary problem of auto inference is that subtle changes in the code can lead to inferred attributes turning on and off in an uncontrolled manner. To see when it matters, we need to have an idea of what will be inferred and what will not.
In general, the compiler will go to great lengths to infer the @safe, pure, nothrow, and @nogc attributes. If your function can have them, it almost always will. The specification names recursion as an exception: a function that calls itself should not be inferred @safe, pure, or nothrow unless explicitly annotated as such. But in my testing, I found that those attributes actually are inferred for recursive functions. It turns out there is an ongoing effort to get recursive attribute inference working, and it already works in part.
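For instance, in my testing a simple recursive template function like the following does receive the full set of inferred attributes (this may vary by compiler version; the names are illustrative):

```d
// The empty template parameter list enables attribute inference.
int factorial()(int n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// This compiles only if the recursion did not block inference:
@safe pure nothrow @nogc int fac4() { return factorial(4); }
```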
Inference of scope and return on function parameters is less reliable. It works in the most mundane cases, but the compiler gives up pretty quickly. The smarter the inference engine is, the more time compilation takes, so the current design is to infer those attributes in only the simplest of cases.
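As a sketch of a “mundane case” that does work under -preview=dip1000 in my testing (names are illustrative):

```d
// `p` never escapes the function, so `scope` is inferred for it.
auto deref(int* p) { return *p; }

@safe int caller()
{
    int x = 42;
    return deref(&x); // OK only because `p` was inferred `scope`
}
```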
Where to let the compiler infer?
A D programmer should get into the habit of asking, “What will happen if I mistakenly do something that makes this function unsafe, impure, throwing, garbage-collecting, or escaping?” If the answer is “immediate compiler error”, auto inference is probably fine. On the other hand, the answer could be “user code will break when updating this library I’m maintaining”. In that case, annotate manually.
In addition to the risk of silently losing attributes the author intends to apply, there is another pitfall:
```d
@safe pure nothrow @nogc firstNewline(string from)
{
    foreach(i; 0 .. from.length) switch(from[i])
    {
        case '\r':
            if(from.length > i+1 && from[i+1] == '\n')
            {
                return "\r\n";
            }
            else return "\r";

        case '\n':
            return "\n";

        default:
            break;
    }

    return "";
}
```
You might think that since the author is manually specifying the attributes, there’s no problem. Unfortunately, that’s wrong. Suppose the author decides to rewrite the function so that all the return values are slices of the from parameter rather than string literals:
```d
@safe pure nothrow @nogc firstNewline(string from)
{
    foreach(i; 0 .. from.length) switch(from[i])
    {
        case '\r':
            if (from.length > i + 1 && from[i + 1] == '\n')
            {
                return from[i .. i + 2];
            }
            else return from[i .. i + 1];

        case '\n':
            return from[i .. i + 1];

        default:
            break;
    }

    return "";
}
```
Surprise! The para