Console #115 - Interview with Jonathan of BonsaiDb

Featuring Locutus, Hatchet, and BonsaiDb

Jul 24, 2022

🏗️ Projects

Locutus

Locutus is a software platform that makes it easy to create decentralized alternatives to today's centralized tech companies. These decentralized apps will be easy to use, scalable, and secured through cryptography.

language: Rust, stars: 1316, issues: 19, last commit: July 22, 2022
repo: github.com/freenet/locutus
site: freenet.org

hatchet

Hatchet is a tool to help you manage/prune your Email Inbox. As it processes your inbox, it will keep track of the unique sender email addresses and the number of emails from each sender. It will also search the email headers and body for "unsubscribe" links.

language: Go, stars: 284, issues: 3, last commit: June 29, 2022
repo: github.com/AnalogJ/hatchet

BonsaiDb

A developer-friendly document database that grows with you, written in Rust

language: Rust, stars: 576, issues: 84, last commit: July 05, 2022
repo: github.com/khonsulabs/bonsaidb
site: bonsaidb.io

🎤 Interview With Jonathan of BonsaiDb

Hey Jon! Thanks for joining us! Let us start with your background. Where have you worked in the past, where are you from, how did you learn to program, and what languages or frameworks do you like?

I am a life-long, mostly-self-taught coder. My first exposure to programming was when I was in first grade. My mother, a school teacher, would often need to help work with my older brother on his math, which he wasn’t a fan of. We would compete for pennies on math problems, which meant I was nearly three years ahead of my math level at school. The principal of my school, Tanya Channel, took notice of this and found a professor at Fort Hays State University who happily tutored me free of charge.
It was during these sessions that I noticed his computer. I don’t remember exactly how we got on the subject, but he started teaching me BASIC. We wrote a simple game together, although to be clear, he did most of the work.
The principal was very passionate about getting technology into her school, and I was the kid that took to it the most – mostly because my mom would have me with her after school or on the weekends while grading papers or preparing her lessons. I can’t stress how lucky I was to grow up in this environment. I learned HyperCard and HyperTalk thoroughly enough to write a few silly games, including an RPG with turn-based combat featuring Snoop Dogg as the final boss. I even helped write lesson plans for students to create simple games of their own in HyperCard.
Eventually, I discovered REALbasic on a CD in MacWorld. Despite being inspired by BASIC, REALbasic was an object-oriented, compiled language with an IDE that was significantly more powerful than HyperCard, but was still approachable to learn without much guidance. After high school, I found myself in Austin, TX looking for someplace to work for the summer. I discovered that the developers of REALbasic were located there, and I persistently pursued an internship. They took me on as a QA intern for the summer, and at the end of it, I found myself wishing I could just stay on rather than go to college.
After a semester at Kansas State University where I was attempting to double major in Piano Performance and Computer Science, I found myself stressing about whether a scholarship was going to be renewed due to a high GPA requirement. I noticed an opening for a full-time QA position, and I emailed the CEO and asked if they would have me full-time after the semester finished. They accepted, and I started my career.
Over the next five years, I learned so much from incredibly talented people. The compiler maintainer, Mars Saxman, helped me learn about compiler design and helped me eventually contribute to various areas of the compiler.
By 2009, I found myself chatting with a work colleague about starting up a mobile app business for trade shows. He and a partner from another business were just at CES and were shocked that there wasn’t a native app with the schedule and maps. In 2010, CES had a native app – ours.
The next 10 years at Core-apps involved me growing from being the sole developer in a three-person partnership to managing a multi-team technical department. After selling the company in 2019, I wanted to take a break, and I had a desire to write a community-driven 2D MMORPG. I didn’t have confidence in succeeding, so the other goal I had was to learn Rust, as I saw it as an interesting up-and-coming technology.
Today, I find myself only interested in writing Rust code. I love how helpful the compiler is, and it often feels like I’m writing code in a high-level language despite being a “systems language.”

Who or what are your biggest influences as a developer?

While I have had countless colleagues and employees that have helped shape my approach to development, at the core of it all is the scientific method and being curious about learning the “why’s” behind whatever I’m investigating or learning about.
Having a start in Quality Assurance for a cross-platform programming language was an incredibly valuable experience. I learned quickly that the process of forming a hypothesis, testing it, and evaluating the results was crucial in trying to reproduce and simplify issues.
I try to always remain open to learning that there are better ways to do something that I’ve always done another way.
One example is my stance on unit testing changing throughout my career. Early in my career, I used it incredibly sparingly. I found it overly difficult to do well in most of the code I was writing, so I would just skip it. Over time, I hired employees that were more passionate about automated testing, and eventually, I found myself an active supporter of unit testing and deploying CI with automated testing and code coverage for our actively developed projects.
Today, I don’t think I could make BonsaiDb reliable without a solid approach to unit testing. SQLite is an excellent example of this. Dr. Richard Hipp, the original author of SQLite, has spoken about how important testing was in helping stabilize SQLite in its early years. I’m proud that BonsaiDb’s current code coverage hovers around 85% of lines of code, but I also plan to continue to improve it over time.

What are you currently learning?

I’m learning how to write a database (lol). Each day is an opportunity for me to learn something. The last few months have been focused on learning how I might be able to improve BonsaiDb’s performance. I wrote a blog post at the start of May that takes a deep look at the mistakes I made in my original benchmarking and testing of BonsaiDb and its storage format.
I learned that Linux has tmpfs these days, which doesn’t actually persist files to disk. I set up a series of tests and benchmarks trying to learn the fastest way to ensure bytes being written to disk are persisted.
The results of all of this learning are promising – I wrote a new storage format based on all of my experimentation and research that is showing very promising early benchmark results.
Away from the computer, I enjoy learning and experimenting with cooking and baking. It’s rewarding to eat a delicious meal, and it’s fun to try new recipes and techniques. I also still love playing the piano, which can be a good escape when I’m pondering a tough problem. I’m not working on any particular piece of music, which means each day I am often trying something new. As a result, I am learning how to sight read much better.

What resources were the most useful in helping you to learn how to write a database?

The CouchDB documentation for their map/reduce views does such an incredible job of explaining how the view indexing system works that I understood enough to try building an equivalent implementation on top of any other underlying database. That's how BonsaiDb began -- an API built atop another database.
Last October, I ended up replacing the underlying database with my own key-value database inspired by Couchstore, the C implementation of the CouchDB storage format. The documentation in that project was very helpful in helping me understand how to design an append-only B+Tree -- quite an interesting project!
I have also looked through other databases’ documentation, mailing list discussions, and articles. I rarely dive into other databases’ source code. I generally prefer to understand how other databases work from a high level and come up with my own approach inspired by my understanding of their implementations while still being customized to my own vision. This ensures that I fundamentally understand the hows and whys behind each part of BonsaiDb.

What have you been listening to lately?

There are a few podcasts that I generally listen to every episode of. Coffee with Butterscotch is a podcast featuring three brothers who run a game studio together. They will often talk about what they’re doing to improve their development processes, and as a former small business owner, I find it refreshing to hear their takes on many topics we also faced when growing our company. I also love listening to CoRecursive and Rustacean Station.
I listen to more hours of audiobooks each week than I listen to podcasts, however. I love diving into epic fantasy, including multiple listens of The Wheel of Time by Robert Jordan and The Stormlight Archive by Brandon Sanderson. I’m currently revisiting the still-incomplete series The Kingkiller Chronicle by Patrick Rothfuss, and prior to that I finished my first trip through Cradle by Will Wight.

What type of music do you most like to play?

It's quite an eclectic blend ranging from romantic-style composers such as Rachmaninoff and Chopin all the way to modern-day transcriptions of songs from Game of Thrones or some of my favorite music from video games I played over the years.
Playing the piano has become a great way to take a break. Because I am not constantly practicing the same piece over and over, my brain has to focus on the music fully while I play. I can't think about the problem I left at the computer, otherwise my playing will suffer. As many developers know, sometimes the best thing you can do when churning on a hard problem is take a break and come back to it.

Why was BonsaiDb started?

I set out to write an MMORPG and ended up writing a database. After a year of development on my 2D game engine, including a UI library, I grew frustrated that I seemingly never made any progress on the actual game. While mourning the loss of my dog, I took a break and re-evaluated what I wanted to accomplish with this chapter of my life. I realized I didn’t want to work alone, but I didn’t believe in my project enough to want to hire other developers for it. So, the obvious answer to me was: open source it.
So, I changed the project to public on GitHub, and the next morning I woke up to a Discord message from Daxpedda who was looking for an open-source Rust game project to contribute to. We began to have multi-hour long Discord calls where we discussed what we believed our ideal architecture would look like. The most common design constraint was: what database are we using?
My goal for this game was not to make millions of dollars. I wanted it to be a passion project that could someday fund a very small team. I wanted the infrastructure to be affordable and maintainable by that small team.
The game we wanted to build was like EVE Online: a single universe for players, but we wanted each star system to have its own high-availability guarantees, and we wanted to be able to move star systems between servers in a cluster as load demanded. When I thought about the complexity of trying to deploy a highly-available PostgreSQL cluster per star system, it made me realize that PostgreSQL really wasn’t the best tool for this particular job.
At Core-apps, I picked CouchDB for our primary data storage – a RESTful document database with map-reduce views. Its design is very straightforward, and while the design can be limiting in some ways, it can be used in very powerful and creative ways. One day during a chat with Daxpedda, it dawned on me that CouchDB’s core design wouldn’t be that tricky to build atop one of the existing low-level database libraries already available for Rust. The rest, as they say, is history.

Where did the name for BonsaiDb come from?

Originally, I named the project PliantDb. CouchDB and BonsaiDb do not allow arbitrarily querying data that is contained within a document. The only way to retrieve part of a document is to implement a view, which is created by implementing a map function and optionally a reduce function. I started calling this a “programmable database.”
Well, one day while Daxpedda and I were stumbling over pronouncing PliantDb, we decided maybe it was worth considering a name change. The word “pliant” looks so similar to “plant” that Daxpedda quickly brainstormed BonsaiDb. I came up with the tagline, “a database that grows with you.”
The practice of bonsai involves carefully trimming and training a tree over time. This seemed like a very apt analogy for what most developers experience with their data model over time. While we may hope for our data to grow in specific patterns, developers are truly at the mercy of how their applications are used, which often means adjusting indexes or refactoring how data is stored. This seems similar to how someone caring for a bonsai tree can try to train a tree to one shape, but it may grow unexpectedly.

Are there any overarching goals of BonsaiDb that drive design or implementation? If so, what trade-offs have been made in BonsaiDb as a consequence of these goals?

One of my overarching goals is to make BonsaiDb have one of the best developer experiences available. One challenge with writing code that interacts with a database is unit testing. From the beginning, I wanted BonsaiDb to offer an offline, SQLite-esque mode as well as a networked server version. Furthermore, I wanted code to be able to be written that would work regardless of whether the database was a remotely networked database or if it was a local database.
This led to a design centered around Rust’s Trait system. One of the benefits of this design was that I was able to create a generic unit testing suite that would work on any mode of access for BonsaiDb. This means that each time I add a new way to connect to a BonsaiDb database, an entire unit testing suite is able to be run against that connection to verify it’s working correctly.
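To illustrate the idea of one test suite shared by every mode of access, here is a minimal sketch. The `Connection` trait, `LocalStore`, `FakeRemote`, and `run_suite` are hypothetical names invented for this example; they are not BonsaiDb's actual API, and the "remote" type only simulates a network hop by counting requests.

```rust
use std::collections::HashMap;

/// Hypothetical trait standing in for "any way of reaching the database".
/// Not BonsaiDb's real API.
trait Connection {
    fn set(&mut self, key: &str, value: Vec<u8>);
    fn get(&mut self, key: &str) -> Option<Vec<u8>>;
}

/// A local, in-process store.
#[derive(Default)]
struct LocalStore {
    data: HashMap<String, Vec<u8>>,
}

impl Connection for LocalStore {
    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.data.insert(key.to_string(), value);
    }

    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }
}

/// Simulates a networked client: forwards to a "server" and counts requests.
#[derive(Default)]
struct FakeRemote {
    server: LocalStore,
    requests: usize,
}

impl Connection for FakeRemote {
    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.requests += 1;
        self.server.set(key, value);
    }

    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        self.requests += 1;
        self.server.get(key)
    }
}

/// The suite is written once against the trait and reused for every implementation.
fn run_suite<C: Connection>(conn: &mut C) -> bool {
    conn.set("a", b"hello".to_vec());
    let stored = conn.get("a") == Some(b"hello".to_vec());
    let missing = conn.get("missing").is_none();
    stored && missing
}

fn main() {
    assert!(run_suite(&mut LocalStore::default()));
    let mut remote = FakeRemote::default();
    assert!(run_suite(&mut remote));
    println!("suite passed for both connection types");
}
```

Adding a new connection type in this sketch only requires implementing the trait; the existing suite then exercises it without modification.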
A general design philosophy I have is starting with the most basic implementation and building on top of it. To that end, I recognized that I didn’t want to use JSON for documents, but each serialization format has its own tradeoffs and benefits. Rather than forcing the user into a particular storage format for their documents, I designed BonsaiDb’s core to treat documents as arbitrary bytes.
This means, however, that building a generic admin interface for BonsaiDb is going to be more challenging, as each collection of documents may have its own serialization format. This also directly impacts what strategies can be used to develop a high-level query language.
I believe the choice I made provides meaningful options for developers and leads to an easier-to-understand architecture for BonsaiDb.
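A small sketch may make the "documents are arbitrary bytes" trade-off more concrete. Everything here is an assumption made for illustration: `ByteStore`, the `Collection` trait, the `Player` type, and its hand-rolled encoding are hypothetical and do not reflect BonsaiDb's real types or storage format.

```rust
use std::collections::HashMap;

/// Hypothetical core store: it keeps bytes and never interprets them.
#[derive(Default)]
struct ByteStore {
    documents: HashMap<u64, Vec<u8>>,
}

/// Hypothetical per-collection trait: each collection picks its own encoding.
trait Collection: Sized {
    fn to_bytes(&self) -> Vec<u8>;
    fn from_bytes(bytes: &[u8]) -> Option<Self>;
}

#[derive(Debug, PartialEq)]
struct Player {
    level: u32,
    name: String,
}

impl Collection for Player {
    /// Hand-rolled format: 4 bytes of little-endian level, then the UTF-8 name.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = self.level.to_le_bytes().to_vec();
        out.extend_from_slice(self.name.as_bytes());
        out
    }

    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 4 {
            return None;
        }
        let level = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        let name = String::from_utf8(bytes[4..].to_vec()).ok()?;
        Some(Player { level, name })
    }
}

impl ByteStore {
    /// Serializes with the collection's own format before storing.
    fn insert<C: Collection>(&mut self, id: u64, doc: &C) {
        self.documents.insert(id, doc.to_bytes());
    }

    /// Only a caller who names the collection type can decode the bytes.
    fn get<C: Collection>(&self, id: u64) -> Option<C> {
        C::from_bytes(self.documents.get(&id)?)
    }
}

fn main() {
    let mut store = ByteStore::default();
    let player = Player { level: 7, name: "Ferris".to_string() };
    store.insert(1, &player);
    let loaded: Option<Player> = store.get(1);
    assert_eq!(loaded, Some(player));
}
```

The sketch also shows why a generic admin interface is harder under this design: `ByteStore` alone cannot display or edit a document, because only the collection type knows how to decode it.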
One other design decision is to prioritize the Rust experience. There is nothing stopping BonsaiDb from being able to be used with other languages other than an investment of time. I’m such a happy Rustacean right now that I have no current plans to prioritize adding support for other languages. I would be happy to work with other contributors who are interested in helping make BonsaiDb work with other languages.

What is the most challenging problem that’s been solved in BonsaiDb, so far?

Designing the API to take advantage of Rust’s type system is what I’m the proudest of so far. One of the examples in the repository shows how map/reduce views can use a third-party histogram library as the View’s value type. This allows fully type-safe map/reduce queries.
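As a rough illustration of what a type-safe map/reduce view can look like, the sketch below uses associated types so that the key and value types are fixed by the view definition. The `View` trait, `Post`, `WordsByCategory`, and `query` are hypothetical names for this example, not BonsaiDb's actual API, and the query runs over an in-memory slice rather than a persisted index.

```rust
use std::collections::BTreeMap;

/// Hypothetical document type.
struct Post {
    category: String,
    words: u32,
}

/// Hypothetical view trait: key and value types are part of the definition,
/// so a query result cannot be read back as the wrong type.
trait View {
    type Document;
    type Key: Ord;
    type Value;

    /// Emits zero or more key/value pairs for one document.
    fn map(doc: &Self::Document) -> Vec<(Self::Key, Self::Value)>;
    /// Combines all values that were emitted under the same key.
    fn reduce(values: &[Self::Value]) -> Self::Value;
}

struct WordsByCategory;

impl View for WordsByCategory {
    type Document = Post;
    type Key = String;
    type Value = u32;

    fn map(doc: &Post) -> Vec<(String, u32)> {
        vec![(doc.category.clone(), doc.words)]
    }

    fn reduce(values: &[u32]) -> u32 {
        values.iter().sum()
    }
}

/// Runs map over every document, groups by key, then reduces each group.
fn query<V: View>(docs: &[V::Document]) -> BTreeMap<V::Key, V::Value> {
    let mut grouped: BTreeMap<V::Key, Vec<V::Value>> = BTreeMap::new();
    for doc in docs {
        for (key, value) in V::map(doc) {
            grouped.entry(key).or_default().push(value);
        }
    }
    grouped
        .into_iter()
        .map(|(key, values)| (key, V::reduce(&values)))
        .collect()
}

fn main() {
    let docs = vec![
        Post { category: "rust".to_string(), words: 100 },
        Post { category: "rust".to_string(), words: 50 },
        Post { category: "db".to_string(), words: 7 },
    ];
    let totals = query::<WordsByCategory>(&docs);
    assert_eq!(totals.get("rust"), Some(&150));
    assert_eq!(totals.get("db"), Some(&7));
}
```

Because `Value` is an associated type, swapping `u32` for a richer type (such as a histogram from another crate) would change the result type of `query` at compile time.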
Originally, BonsaiDb used another embedded database library (Sled). For many reasons, I replaced it last October with my own library, Nebari, inspired by Couchstore, the storage format CouchDB uses. I do not have a background in database architecture, so it was a fun challenge to learn how to implement a B+Tree in an append-only format.
In May, I discovered some flaws in my original benchmarking that forced me to go back to the drawing board in an effort to make BonsaiDb perform similarly to other popular ACID-compliant databases. The result is a new library, Sediment, which replaces the append-only file format that Nebari uses with one that can reuse disk space and intelligently preallocate space. This has been the most challenging problem, but I’m not ready to say it’s solved yet.

What were the flaws in your benchmarking?

I was benchmarking against temporary files, and I misunderstood the documentation of Rust's std::io::Write::flush(). I wrote in-depth about my mistakes and learning on BonsaiDb's blog.
Benchmarking IO against temporary files doesn't sound like a flaw, unless you're aware of tmpfs. On modern distributions, /tmp is often mounted using tmpfs as the filesystem. This is a special filesystem implementation that uses virtual memory for the filesystem. This means that fsync, normally a slow operation, is practically instantaneous.
My approach to multi-tree/multi-table transactions required two fsync calls. Once the benchmarks were run on a real filesystem, the performance characteristics of all embedded databases benchmarked drastically changed.
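The distinction between flushing and syncing can be sketched with the Rust standard library. In this example, `write_durably` and the file name are hypothetical; `Write::flush` and `File::sync_all` are real standard library calls. The sketch writes to the system temporary directory only so that it runs anywhere; as described above, a benchmark would need a path on a real filesystem rather than tmpfs.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Writes `bytes` to `path` and asks the OS to persist them before returning.
fn write_durably(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let file = File::create(path)?;
    let mut writer = BufWriter::new(file);
    writer.write_all(bytes)?;

    // flush() hands BufWriter's in-memory buffer to the operating system.
    // On its own it does not promise that the bytes have reached the disk.
    writer.flush()?;

    // sync_all() asks the OS to write file data and metadata to the storage
    // device (fsync on Unix-like systems). On a real disk this is the slow step.
    writer.get_ref().sync_all()?;
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical file name, placed in the temp directory for portability.
    let path = std::env::temp_dir().join("durability-sketch.bin");
    write_durably(&path, b"committed")?;
    assert_eq!(fs::read(&path)?, b"committed".to_vec());
    fs::remove_file(&path)?;
    Ok(())
}
```

A design that needs two such sync calls per transaction pays the sync cost twice, which is why results measured on tmpfs can look very different from results on a real filesystem.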

Are there any competitors or projects similar to BonsaiDb? If so, what were they lacking that made you consider building something new?

I’m not aware of any open-source and free-to-use database that is attempting to offer all of the features I’m aiming for out of the box. In addition to ACID compliant storage and role-based access control, BonsaiDb also has a high-performance delayed persistence key-value store similar to Redis. I’ve also begun planning a job queuing and scheduling system a la Sidekiq or Amazon SQS. I am looking to add support for other common data problems such as time-series data, metrics, logging, and more. And all of these features will work embedded in an application, over a network connection to a single server, and eventually, over a network connection to a cluster of servers.
Each of these features has been implemented before in other databases, but I’m trying to build a comprehensive database platform that most projects could be built on. Purpose-built databases will always be able to outperform general purpose databases, so if users need the highest raw performance, BonsaiDb may not be suitable for those applications. However, for many developers, a database that is simple to deploy and is free, open-source, lightweight, reasonably performant, and scalable will be a clear win.
The closest comparison to what I’m building might be RavenDB, a commercial open-source database written on the .NET platform. It’s a feature-rich database that I haven’t personally used, but have heard great things about. While it’s open-source and has a community edition, the community edition has several limitations that prevent it from scaling too large without purchasing a license.

What was the most surprising thing you learned while working on BonsaiDb?

When trying to benchmark my new storage format’s performance against SQLite on Mac OS X, I was shocked to discover that the built-in version of SQLite isn’t able to provide ACID compliant transactions. It turns out that Apple replaces the “fullfsync” pragma’s implementation with a weaker guarantee. Benchmarking databases fairly has been challenging with these types of gotchas lurking.

What is your typical approach to debugging issues filed in the BonsaiDb repo?

I start by trusting that the reporter experienced an issue, even if I decide there isn’t a bug in BonsaiDb itself. If BonsaiDb behaved differently than expected, I try to think about how I could improve the documentation or API to make it more likely to meet future users’ expectations.
If there is a misbehavior of some sort, I may ask follow up questions to try to understand what all functionality might be involved in recreating the issue. Once I have some ideas of ways to try to break BonsaiDb, I try to create a unit test to reproduce the issue. If I can’t reproduce the issue, I’ll ask if the user can try to narrow it down further, and let them know what I’ve tried. By telling the user what I’ve tried, it can sometimes help the user identify what is unique about their setup that I didn’t try.
Once I can reproduce the issue, the strategy for debugging varies greatly based on the type of issue. I use Visual Studio Code with Rust Analyzer and CodeLLDB which allow an IDE-like debugging experience. But many times, I will do log-based debugging by printing out intermediate values that I want to verify meet my expectations.

What is the release process like for BonsaiDb?

Right now, the process is semi-automated but manually driven. Our repositories use the cargo-xtask pattern, which allows extending cargo with code written inside of the repository. To release the currently checked out code, I would execute `cargo xtask publish` to run my custom deployment process.
BonsaiDb’s repository currently contains 6 published crates and several more that are pending release with the next version. These crates may depend on one another, which means publishing must be done in a specific order. This requirement is also present in other repositories we manage as part of creating BonsaiDb. I share the same custom deployment process between all of my Rust repositories, and I can customize the behavior in each repository (e.g., in BonsaiDb).
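The ordering requirement is a topological sort of the crate dependency graph. The sketch below is one simple way to compute such an order and is not taken from the actual `cargo xtask publish` implementation; `publish_order` and the crate names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns crate names ordered so that every crate appears after all of its
/// dependencies. Returns None on a cycle or on a dependency missing from `deps`.
fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    let mut published: BTreeSet<&str> = BTreeSet::new();
    let mut order = Vec::new();

    while order.len() < deps.len() {
        // Collect every unpublished crate whose dependencies are all published.
        let ready: Vec<&str> = deps
            .iter()
            .filter(|(name, needs)| {
                !published.contains(*name) && needs.iter().all(|dep| published.contains(dep))
            })
            .map(|(name, _)| *name)
            .collect();

        if ready.is_empty() {
            return None; // nothing can make progress
        }
        for name in ready {
            published.insert(name);
            order.push(name.to_string());
        }
    }
    Some(order)
}

fn main() {
    // Hypothetical workspace layout, not BonsaiDb's real crate graph.
    let mut deps: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    deps.insert("example-core", vec![]);
    deps.insert("example-local", vec!["example-core"]);
    deps.insert("example-client", vec!["example-core"]);
    deps.insert("example-server", vec!["example-core", "example-local"]);

    match publish_order(&deps) {
        Some(order) => println!("publish in this order: {}", order.join(", ")),
        None => println!("dependency cycle or unknown dependency"),
    }
}
```

Using a `BTreeMap` keeps the output deterministic, since crates that become ready in the same pass are emitted in name order.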
I also wrote a README management tool to facilitate sharing the README content with the Crate’s main documentation page with correct links on GitHub, Crates, and docs.rs. It also allows pulling in centralized files such as the licenses and code of conduct. If I ever need to update one of those files, I can update the central file and run “rustme” to get the updated versions in each repository.
All of this tooling is very early in development, and I tend to improve it a little bit each time I do a release.
From a source control perspective, I ensure that each release is tagged so that users can easily browse the source code for the specific version that they are using.

Is BonsaiDb intended to eventually be monetized if it isn’t monetized already?

It is not monetized beyond my GitHub Sponsors which is linked at the bottom of the homepage and on the repositories that I am the primary contributor on. There are several potential ways to monetize an open-source database, but the honest truth is that I’m not that interested in directly monetizing the database. I would prefer to foster this into a true community-driven project that isn’t built mostly by one person.
If I were actively trying to profit off of it, I feel like I would be less likely to attract the type of passionate contributors I hope to find over time. I would rather try to monetize an app or game built with BonsaiDb than I would want to try to monetize BonsaiDb itself.
To that end, I want to begin dogfooding BonsaiDb with a larger project towards the end of this year.

Any ideas about what that project will be? Back to the MMO?

I'm definitely not going back to my original MMO concept, but I have been daydreaming of designing a much smaller MMO concept instead. Whatever project I pick, I only know one thing so far: I want it to be the first heavy user of an administration GUI.
BonsaiDb needs an administrative interface, but its underlying design presents some unique challenges. BonsaiDb doesn't know anything about the data stored within it, not even whether it's Pot, JSON, or some other format. This makes providing a generic editor for a document hard -- plus a generic JSON editor wouldn't enforce any data constraints the underlying model contained.
What if BonsaiDb's administrative interface were extensible, so that the way built-in administration sections are implemented is the same way any section could be implemented for any schema or collection? Specifically for the game idea, I want to be able to do my game content administration tasks in the BonsaiDb admin rather than create a separate set of tooling.
While it may sound daunting to tackle both an admin project and another project at the same time, it can be hard to design a flexible and extensible architecture without being forced to use it along the way. My goal is to use this next project as an "ideal example" of how the admin interface could be used to power back office administration tools.

If you plan to continue developing BonsaiDb, where do you see the project heading next?

I have two competing priorities for BonsaiDb: high availability and general usability. It’s very important to me to have at least one viable high availability strategy before leaving the alpha phase. Yet, I also won’t leave the alpha phase without a well-rounded set of functionality that is intuitive, reasonably performant, and reliable.

What motivates you to continue contributing to BonsaiDb?

Now that I’ve envisioned what I want BonsaiDb to be, I don’t want to use another database, and I know some users already feel the same way. That reason is strong enough on its own for me to continue working on BonsaiDb.
Because of my desire to use BonsaiDb in anything I want to build, the biggest motivator for me is that I am currently “funemployed.” I cannot work on BonsaiDb forever without earning some money, but I have enough runway for now that I am enjoying the freedom to build something I’m passionate about.
Earlier this year, I also recognized that I enjoy helping others solve interesting problems. This was similar to my years working on REALbasic, where the product was used to make other products. I’ve always enjoyed knowing the software I worked on was used to create various other products and projects.

Are there any other projects besides BonsaiDb that you’re working on?

I’m currently focused on BonsaiDb and its related projects. Last fall I decided I needed to focus on BonsaiDb until it at least had some form of high availability before I could even consider building a business powered by BonsaiDb.

Do you have any other project ideas that you haven’t started?

It’s pretty rare for me to have an idea for a project and not to start working on it in some limited fashion. I enjoy experimenting, and it can be a good activity to refuel myself if I’m starting to feel burned out on a task.
I still want to try my hand at building my vision of a social, sandbox MMORPG that could be developed and maintained by a very small team. I want to try to make such a game succeed with no other form of monetization beyond a small subscription fee ($5 or less per month).
I would love to create a privacy-focused, open-source, self-hostable social network with end-to-end encryption that could be plugged into games but be accessible outside of the game. When I played EVE Online, it was disappointing to not be able to chat with people in game without launching the full game client. This leads to players needing to use third-party tools which breaks immersion. If I end up developing part of this for my own game(s), it seems like it would be a great candidate to modularize and release separately.

Where do you see software development heading next?

As someone who has professionally used a wide variety of languages over the years, I landed on Rust for several important reasons. With dynamically typed languages, your code is largely verified by how good your test suite is. It isn’t uncommon to need to run the application and trigger a workflow before you might catch a small typo. On the other hand, statically typed languages can prevent many common errors before the code is even executed.
Rust extends those simple type checking benefits with extra rules that prevent common memory safety issues. While this can be a friction point for new users of the language, I can assure readers it’s worth the effort to learn. The refactoring experience in Rust is incredible. Never in any other language have I been able to do day-long refactoring sessions and have the code just work once it compiles. By removing a whole class of possible errors, it makes it more likely that when your code compiles, it might actually work the way you expected it to.
While I’m very optimistic about Rust’s future, I don’t think that any one tool will ever “be the future” of software development. I don’t pay enough attention to the ongoing research being done by people much smarter than myself, so I don’t have any specific predictions of where software development will be in 5, 10, or 20 years. The only thing I am confident of is that we will continue seeing better tools that help developers be more productive and improve the quality of their projects.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

Try not to be intimidated, regardless of whether the contribution you’re making is in your comfort zone or not. Many maintainers will be overjoyed to have a potential contributor and will be happy to help.
For a first contribution, I would recommend finding a small task or feature in their issue tracker. Some repositories, including BonsaiDb, have a tag identifying “Good First Issues” for potential contributors. If you’re able to solve the issue, check the repository’s contribution guidelines to see if there are any extra steps to take before submitting a pull request.
If you aren’t able to solve it, don’t hesitate to comment on the issue asking follow up questions. Again, most maintainers will be ecstatic to have a new contributor and will be happy to answer some questions to help you complete your task.
Most importantly: have fun! It’s a great feeling when you’ve helped a project you love get even better.

Console by CodeSee.io