Can we have a faster FFI for CRuby? Yes.
I love programming in Ruby, and I advocate for people to write as much Ruby as possible.
But sometimes you really, really must call out to native code.
Even in those cases, I encourage people to write as much Ruby as possible, especially because YJIT can optimize Ruby code but not C code.
Taken to its logical extreme, this guidance means that if you want to call a native library, you should write a native extension with a very, very limited API where most work is done in Ruby.
Any native code would be a very thin wrapper around the function we actually want to call, one that just converts Ruby types into the types required by the native function.
Of course such a simplistic API would be well suited to work with a library like FFI.
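To make the "thin wrapper" idea concrete, here is a minimal sketch. It uses Fiddle, which ships with Ruby, rather than the FFI gem, purely so that it runs without installing anything; the module name ThinStrlen and the constant RAW_STRLEN are made up for illustration and are not from the post. The only native surface is a single function handle, and the argument checking stays in Ruby.

```ruby
require "fiddle"

# Hypothetical module name, used only for this illustration.
module ThinStrlen
  # The only native surface: libc's strlen, looked up among the symbols
  # already loaded into the current process.
  RAW_STRLEN = Fiddle::Function.new(
    Fiddle::Handle::DEFAULT["strlen"],
    [Fiddle::TYPE_VOIDP], # const char *
    Fiddle::TYPE_SIZE_T   # size_t
  )

  # Everything else stays in Ruby: argument checking happens here, and
  # Fiddle converts the Ruby String to a pointer when the call is made.
  def self.strlen(str)
    raise TypeError, "expected a String" unless str.is_a?(String)

    RAW_STRLEN.call(str)
  end
end

puts ThinStrlen.strlen("foo")
```

Note that strlen stops at the first NUL byte, so for strings with embedded NUL bytes it will not agree with String#bytesize.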
Now, usually I steer clear of FFI, and to be honest the reason is simply that it doesn’t provide the same performance as a native extension.
Let’s take a look at a very simple example benchmark to better understand what I mean.
In this benchmark, we’re going to wrap the strlen C function with FFI.
We’ll compare the FFI implementation with a C extension that does the same thing (using the strlen Ruby Gem that yours truly wrote just for this post).
We’ll also include a comparison with indirectly calling String#bytesize, as well as directly calling String#bytesize.
require "ffi"
require "strlen"
require "benchmark/ips"

# FFI binding to the C library's strlen
module A
  extend FFI::Library
  ffi_lib 'c'
  attach_function :strlen, [:string], :int
end

# Pure-Ruby wrapper that calls String#bytesize indirectly
module B
  def self.strlen(x)
    x.bytesize
  end
end

str = "foo"

Benchmark.ips do |x|
  x.report("strlen-ffi") { A.strlen(str) }
  x.report("strlen-ruby") { B.strlen(str) }
  x.report("strlen-cext") { Strlen.strlen(str) }
  x.report("ruby-direct") { str.bytesize }
  x.compare!
end
Here is the output from the benchmark:
ruby 3.5.0dev (2025-02-11T16:42:26Z master 4ac75f6f64) +PRISM [arm64-darwin24]
Warming up --------------------------------------
strlen-ffi 1.557M i/100ms
strlen-ruby 2.875M i/100ms
strlen-cext 3.047M i/100ms
ruby-direct 4.048M i/100ms
Calculating -------------------------------------
strlen-ffi 15.682M (± 0.5%) i/s (63.77 ns/i) - 79.398M in 5.063141s
strlen-ruby 28.697M (± 0.3%) i/s (34.85 ns/i) - 143.747M in 5.009135s
strlen-cext 30.661M (± 0.8%) i/s (32.61 ns/i) - 155.406M in 5.068838s
ruby-direct 39.879M (± 0.6%) i/s (25.08 ns/i) - 202.412M in 5.075857s
Comparison:
ruby-direct: 39878845.7 i/s
strlen-cext: 30661398.4 i/s - 1.30x slower
strlen-ruby: 28697184.3 i/s - 1.39x slower
strlen-ffi: 15681971.0 i/s - 2.54x slower
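As a sanity check on the output above, the per-iteration times and the slowdown factors both follow from the iterations-per-second figures. The sketch below recomputes them; the names IPS, ns_per_iteration, and slowdown are made up for illustration, and the numbers are copied from the "Comparison" section.

```ruby
# Iterations per second, copied from the "Comparison" section of the output.
IPS = {
  "ruby-direct" => 39_878_845.7,
  "strlen-cext" => 30_661_398.4,
  "strlen-ruby" => 28_697_184.3,
  "strlen-ffi"  => 15_681_971.0
}.freeze

# Nanoseconds per iteration is the reciprocal of iterations per second.
def ns_per_iteration(ips)
  1_000_000_000.0 / ips
end

# Slowdown relative to the fastest entry.
def slowdown(ips, baseline)
  baseline / ips
end

baseline = IPS.fetch("ruby-direct")

IPS.each do |name, ips|
  printf("%-12s %6.2f ns/i  %.2fx\n", name, ns_per_iteration(ips), slowdown(ips, baseline))
end

# Extra cost of one Ruby-level wrapper method around String#bytesize.
wrapper_overhead = ns_per_iteration(IPS["strlen-ruby"]) - ns_per_iteration(IPS["ruby-direct"])
# Extra cost of going through FFI instead of the C extension.
ffi_overhead = ns_per_iteration(IPS["strlen-ffi"]) - ns_per_iteration(IPS["strlen-cext"])

printf("wrapper overhead: %.2f ns\n", wrapper_overhead)
printf("ffi vs cext:      %.2f ns\n", ffi_overhead)
```

On these figures the extra Ruby method call costs roughly 10 ns per iteration, and going through FFI instead of the C extension costs roughly 31 ns.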
First, directly calling String#bytesize is the fastest, and we can think of it as our baseline.
Any indirection we add will necessarily add more overhead, and we probably can’t “beat” this number.
Calling strlen via the C extension is second fastest, followed by indirectly calling String#bytesize, and finally the FFI implementation is slowest.
These benchmark results can teach us a couple of interesting things.
First, the difference between the “ruby-direct” benchmark and the “strlen-ruby” benchmark shows that there definitely is overhead in pushing and popping stack frames: the extra method call costs roughly 10 ns per iteration here (34.85 ns versus 25.08 ns).
Eliminating this overhead is one of the things that JIT compilers like YJIT specialize in.
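The frame overhead can be observed without any native code at all. The sketch below is a rough stand-in for the “ruby-direct” and “strlen-ruby” cases, using a monotonic clock instead of benchmark-ips; the names Indirect and time_calls are made up for illustration, and the absolute numbers will vary by machine and Ruby version.

```ruby
# Hypothetical wrapper, equivalent in shape to the "strlen-ruby" case.
module Indirect
  def self.strlen(x)
    x.bytesize
  end
end

# Time `iterations` calls of the given block using a monotonic clock.
# The cost of invoking the block itself is included in both measurements.
def time_calls(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

str = "foo"
n = 1_000_000

direct   = time_calls(n) { str.bytesize }
indirect = time_calls(n) { Indirect.strlen(str) }

printf("direct:   %.1f ns/call\n", direct / n * 1e9)
printf("indirect: %.1f ns/call\n", indirect / n * 1e9)
```

Running it once with the interpreter alone and once with `ruby --yjit` gives a feel for how much of the gap a JIT can close.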
Second, the difference between the “strlen-cext” benchmark and the “strlen-ffi” benchmark shows that there is significant overhead incurred when calling a native function via FFI: roughly 31 ns per call here (63.77 ns versus 32.61 ns).
Calling the C extension is slower than directly calling String#bytesize, but calling strlen via FFI adds even more overhead than the C extension does.
In other words, if Ruby provides a method to do something you need, then just use the method that Ruby provides.
But if you need to call a foreign function, a small C extension wrapper will generally have less overhead than an FFI wrapper.
I’ve not avoided FFI because I think it’s intrinsically worse than a C extension.
Rather, paying the FFI tax is just a reality I’
7 Comments
pestatije
FFI – Foreign Function Interface, or how to call C from Ruby
poisonta
I can sense why it didn’t go to tenderlovemaking.com
internetter
"write as much Ruby as possible, especially because YJIT can optimize Ruby code but not C code"
I feel like I'm not getting something. Isn't ruby a pretty slow language? If I was dipping into native I'd want to do as much in native as possible.
shortrounddev2
Does ruby have its equivalent to typescript, with type annotations? The language sounds interesting but I tend not to give dynamically typed languages the time of day
chris12321
Between Rails At Scale and byroot's blogs, it's currently a fantastic time to be interested in in-depth discussions around Ruby internals and performance! And with all the recent improvements in Ruby and Rails, it's a great time to be a Rubyist in general!
nialv7
isn't this exactly what libffi does?
haberman
> Rather than calling out to a 3rd party library, could we just JIT the code required to call the external function?
I am pretty sure this is the basis of the LuaJIT FFI: https://luajit.org/ext_ffi.html
I think LuaJIT's FFI is very fast for this reason.