History of Berkeley DB (CACM 2022) by shehjar

Share This Article

Sed ut perspiciatis unde.

Kirk Mckusick sat down with Margo Seltzer and Mike Olson to discuss the history of Berkeley DB, for which they won the ACM Software System Award in 2021. McKusick has spent his career as a BSD and FreeBSD developer. Margo Seltzer has spent her career as a professor of computer science and as an entrepreneur of database software companies. Mike Olson started his career as a software developer and later started and managed several open source software companies. Berkeley DB is a production-quality, scalable, NoSQL, open source platform for embedded transactional data management.

KIRK MCKUSICK: Berkeley DB came out of the University of California at Berkeley Computer Systems Research Group’s work to create a version of Unix unencumbered by AT&T’s ownership rights to the original version of Unix. To do that, we needed a new kernel, written without using any of the AT&T code. We also needed all the applications and libraries that shipped with the operating system.

My colleague on the Berkeley BSD Project, Mike Karels, and I were in charge of getting a clean version of the kernel—that’s another story! But Keith Bostic took on the task of getting all the apps and libraries done. He solicited volunteers for much of that work. I know he worked with you two on that. Why don’t you start the story there?

MARGO SELTZER: This started with a standard stupid grad-student trick. I had taken Mike Stonebraker’s graduate database course at Berkeley, and I thought Litwin’s extensible linear hashing¹ was really cool. Keith [Bostic], recognizing “stupid grad-student syndrome,” said, “Hey, how would you like to implement it? I need a replacement for ndbm, dbm, and hsearch.” I innocently said, “Sure!”

He introduced me to Ozan Yigit, who had written gdbm, which was specifically a ndbm replacement. We brainstormed together and figured out a way to have a single hash package that would support both the persistent (n)dbm, as well as the in-memory hsearch replacement. That became what was known as, cleverly enough, hash.

Then, Keith’s ulterior motive kicked in, which was that he really wanted a record package that he could put under a replacement vi, because vi was another big piece of userland that needed to be rewritten. He had it in his head that he wanted a record manager underneath it.

This required an interface that would let him access records by record number—that is, “I want line 59” returns the 59th line in a database, which for his purposes was a text file. He had seen a Stonebraker paper with Heidi Stettner that showed how to implement this record-number interface on top of B-trees.³

And that’s when Mike came along.

MIKE OLSON: I was Margo’s officemate at this point. We were both Stone-braker students. I was on the Postgres project, where I was responsible for a bunch of the storage code, including B-trees.

I’d already written B-tree implementations two or three times in my life, so when Keith began to pester me with this project, I refused. I had written B-trees over and over again, and it just seemed like pointless work to do it again.

At the time, I was convening a pretty regular Friday afternoon study group for the research group, which was code for, “Let’s go down to the local brewpub and throw back some pints.” Keith started crashing those and really leaning on me to do this.

Finally, to shut him up, I agreed that I would write the B-tree code. And since Margo was doing hash, we were a team.

I also had an ulterior motive: I was going to replace the B-tree code in Postgres with the Berkeley DB B-tree code, which was cleaner and better. I never got around to doing that work. I’m sorry about that, because I think it would have been an improvement for the Postgres code as well.

MCKUSICK: So, the two of you coded up a clean version of ndbm, no AT&T code involved. That got added to the Berkeley Software Distribution. And that’s basically Berkeley DB today?

SELTZER: There’s another chapter. Rewind to before I went to graduate school. I had worked at a company called Sequoia Systems, where we were building a transactional storage system to support COBOL programs.

We had designed what we called Sequoia’s Transaction-Oriented Record Manager, otherwise known as STORM. We started building it but never finished it. Both my boss and I left the company. So, in the back of my head, I had always had this idea that you could build a transaction-oriented record manager.

As another side project, I somehow convinced Mike that once he built the B-tree, we should make it transaction-al. We could take the hash and B-tree code, put them under a common API, and have this cool transactional thing.

That became libtp, which was another paper that Mike and I wrote.²

MCKUSICK: Was libtp part of the BSD releases as well? Did you ship that code?

SELTZER: No, the libtp code was what I like to refer to as a “graduate-student code.” Keith never brought it up to production-code quality.

MCKUSICK: All right, now you have this graduate-student code that supports transactions, and some other code shipping with BSD. How did those two threads get woven together?

OLSON: Let me give a little bit of background. Margo and Keith did almost all of the work on the software, starting in 1992 and continuing into the late 1990s. Berkeley DB, the DB 1.85 library, got shipped all over the planet with Berkeley Unix. It got used in a huge number of projects.

In particular, a group of researchers at the University of Michigan built an LDAP (Lightweight Directory Access Protocol) engine to look up records by key, over the network, in a really fast way. That eventually became Open-LDAP, and that’s a key part of a lot of authentication systems today. They used Berkeley DB, and they did a really good job, except the DB 1.85 code didn’t support transactions.

God forbid two people should do something at the same time, or your computer should go down in the middle of an update—your database could be corrupted.

Netscape (the company) wanted a directory server as part of its product line. It hired the entire University of Michigan team into the company to bring along the code and get the work done.

I wasn’t around at the time. I’ll let Margo tell it from there.

SELTZER: The LDAP folks had seen the libtp paper, and they went looking for the code. They contacted Keith and me, asking, “Hey, we’re using DB 1.85; we see this libtp. Where’s the code?”

And we said, “Uhhhhh, graduate-student code; you don’t really want to use that for production.” And they said, “Well, what would it take to make the libtp code real?”

Keith and I had been talking about building a production-quality transactional library for some time. We knew it was going to be a fair bit of work, and we had day jobs. Netscape said, “Well, you know, we’d pay money for it.”

Keith and I thought, “Well that’s novel. Who would have thought of that?”

We managed to structure a deal with Netscape such that we built out the production transactional version, Netscape would pay us for it, but we would retain the rights to sell it to other people.

We decided that we wanted to do the work, and if all we ever did was build this for Netscape—and they were happy with it—that would be fine.

But once we did build it, we figured we would hang a shingle out. A

History of Berkeley DB (CACM 2022) by shehjar

History of Berkeley DB (CACM 2022) by shehjar

Share This Article

Newsletter

HackTech

Leave a comment Cancel reply

Editor's Choice

History of Berkeley DB (CACM 2022) by shehjar

History of Berkeley DB (CACM 2022) by shehjar

Share This Article

Newsletter

HackTech

Leave a comment Cancel reply

Editor's Choice

Sign Up to Our Newsletter