In a previous post, we reviewed how SEI Novus has optimized its use of Redis for its portfolio analytics product, Alpha Platform. In this article, we’ll dig deeper into an aspect of our implementation: scripting Redis with Lua.
Before embarking on this task, I was apprehensive about learning Lua—let alone running it embedded in Redis. As it turns out, it wasn’t bad at all; I documented my journey in this article to help you get started and bring similar benefits to your applications.
Use Case Overview
Before getting into the details of how we use Lua, let’s review our use case. To optimize the memory profile of the data we store in Redis, we devised a normalization scheme in which we substitute integers for string values (column and group names).
So, a Scala object with values like:
com.novus.analytics.core.database.api.redis.APIRedisRecord(
columns = List(
"Asset",
"AssetTypeTag",
"AssetTypeTagCode",
"ClientProvidedBeta",
"CustomIndex",
"DefaultSymbol",
"Issuer",
"PnL_userConfigured",
"Pnl",
"PositionName",
"PositionPnl",
"PositionPnlExFx"), ...)
is serialized to Redis as a binary object built from a much more compact tuple of Ints:
scala.Tuple8(List(1,2,3,4,5,6,7,8,9,10,11,12))
…where the numbers correspond to keys in a Hash “dictionary.” Because our application is distributed, with worker nodes processing these tuples, we need to store the hash dictionary centrally so the workers can fetch it and decode the Ints back to string values. Our central dictionary store? Of course, Redis.
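The encoding and decoding described above can be sketched in a few lines of Python. This is only an illustration: a plain dict stands in for the Redis Hash, and the names `dictionary`, `encode_columns`, and `decode_columns` are assumptions made for the example, not part of our implementation.

```python
# A plain dict stands in for the Redis Hash "dictionary" (id -> column name).
dictionary = {
    "1": "Asset",
    "2": "AssetTypeTag",
    "3": "AssetTypeTagCode",
    "4": "ClientProvidedBeta",
}


def encode_columns(names, dictionary):
    """Replace each column name with its integer id (hypothetical helper)."""
    reverse = {name: int(key) for key, name in dictionary.items()}
    return [reverse[name] for name in names]


def decode_columns(ids, dictionary):
    """Map integer ids back to column names, as a worker would after fetching the hash."""
    return [dictionary[str(i)] for i in ids]


encoded = encode_columns(["Asset", "ClientProvidedBeta"], dictionary)
decoded = decode_columns(encoded, dictionary)
```

The worker only ever ships the short integer list around; the strings are recovered at the edges by a lookup against the shared dictionary.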
The Redis Hash for each “dictionary” looks something like this:
"1" -> "Asset"
"2" -> "AssetTypeTag"
"3" -> "AssetTypeTagCode"
"4" -> "ClientProvidedBeta"
"5" -> "CustomIndex"
"6" -> "DefaultSymbol"
"7" -> "Issuer"
"8" -> "PnL_userConfigured"
"9" -> "Pnl"
"10" -> "PositionName"
"11" -> "PositionPnl"
"12" -> "PositionPnlExFx"
This approach provides enormous benefits in terms of space savings. However, it also introduces considerable complexity: our hash string values can’t be known in advance, so we have to map them to Int keys on the fly. Furthermore, for a single user request, we might have hundreds of workers processing different chunks of data, each with distinct string values that need mapping, yet all of them must share a single hash dictionary and contend for updates to it. We need to update the hash efficiently while avoiding collisions between entries.
Put another way: we have hundreds of worker nodes processing a user request, all trying to update the shared hash dictionary for that request. One worker might seek to register “1” as “Asset” while another wants “1” to be “DefaultSymbol.” Obviously, we can’t have one update stomp on the other; we have to assign key/value mappings as they come in. How do we solve this problem?
First Cut: Brute Force with HGETALL
As described in our previous post, our first pass at this problem was to use the Redis HGETALL operation. The logic was as follows:
- HGETALL all the fields for a hash.
- Check if the value we want is in the hash.
- If so, use the existing int → value mapping.
- If not, add-and-use a new int → value mapping with HSETNX.
- Finally, check the result of HSETNX—if it returned “false” (value not set), retry from the beginning.
This approach certainly works, but it suffers from poor performance. HGETALL is O(N) in the number of fields, which makes it a known CPU hog for large hashes, and we saw large, sustained CPU spikes and request timeouts during peak usage. Furthermore, if a hash key collision does occur (i.e., HSETNX returned false), resolving it requires additional round trips, which is not ideal. And aside from the latency of those network round trips, the retry logic is difficult to reason about.
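The brute-force loop described above can be sketched as follows. This is a simulation under stated assumptions rather than our production code: `FakeRedisHash` is a hypothetical in-memory stand-in for a single Redis Hash, `get_or_assign_id` is an illustrative name, and the rule for picking the next id (current size plus one) is assumed for the example.

```python
class FakeRedisHash:
    """In-memory stand-in for one Redis Hash (assumption: no real Redis here)."""

    def __init__(self):
        self._fields = {}

    def hgetall(self):
        # Like HGETALL: returns a copy of every field -> value pair.
        return dict(self._fields)

    def hsetnx(self, field, value):
        # Like HSETNX: sets the field only if it does not exist yet.
        if field in self._fields:
            return False
        self._fields[field] = value
        return True


def get_or_assign_id(redis_hash, value, max_retries=100):
    """Brute-force lookup: full scan, then HSETNX, retrying on collision."""
    for _ in range(max_retries):
        entries = redis_hash.hgetall()            # round trip 1: fetch everything
        for key, existing in entries.items():
            if existing == value:
                return int(key)                   # value already has an id
        candidate = str(len(entries) + 1)         # next unused id (assumed scheme)
        if redis_hash.hsetnx(candidate, value):   # round trip 2: try to claim it
            return int(candidate)
        # Another worker claimed `candidate` first; start over.
    raise RuntimeError("could not assign an id after %d retries" % max_retries)


shared = FakeRedisHash()
first = get_or_assign_id(shared, "Asset")
second = get_or_assign_id(shared, "DefaultSymbol")
again = get_or_assign_id(shared, "Asset")
```

Even in this toy form, every call pays for a full copy of the hash, and a lost HSETNX race sends the caller back for another full copy.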
We need a solution that adds a hash entry and assigns it a key in a single step. This is where Lua scripting comes in.
Devising our Script Logic
Before writing a line of Lua code, we need to understand our logic. Conceptually it’s very simple—a classic “get-or-put if-absent” algorithm. Rather than multiple steps with HGETALL and HSETNX, we need a function that looks something like this pseudocode:
function hash_id_get_or_put_if_absent(value)
    if hash_contains(value)
        return hash_get(value)
    else
        id = hash_put(value)
        return id
    end if
end function
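To see the pseudocode behave under contention, here is a minimal Python simulation. It is not the Lua script and does not talk to Redis: `IdDictionary` is a hypothetical class, the value-to-id dict and the "size plus one" id rule are assumptions, and the lock simply stands in for whatever guarantee makes the whole function run as one indivisible step.

```python
import threading


class IdDictionary:
    """In-memory sketch of get-or-put-if-absent (not backed by Redis)."""

    def __init__(self):
        self._ids = {}                  # value -> id
        self._lock = threading.Lock()   # stands in for atomic execution

    def get_or_put_if_absent(self, value):
        with self._lock:
            if value in self._ids:          # hash_contains(value)
                return self._ids[value]     # hash_get(value)
            new_id = len(self._ids) + 1     # hash_put(value): next free id
            self._ids[value] = new_id
            return new_id


ids = IdDictionary()
results = []


def worker(values):
    # Each worker registers its own values and records the ids it was given.
    for v in values:
        results.append((v, ids.get_or_put_if_absent(v)))


threads = [
    threading.Thread(target=worker, args=(["Asset", "Issuer", "Pnl"],)),
    threading.Thread(target=worker, args=(["Pnl", "Asset", "DefaultSymbol"],)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever worker gets there first decides the id for a value; every later caller sees the same id, and no retry loop is needed.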
We want this to be treated as a transaction: o