Console 57

The Art of Command Line, Permafrost Engine, and RonDB

Sponsorships

Product Disrupt

Get freshly curated resources to learn product design and create digital products, twice every month in your inbox. Join over 4,000 creators from the likes of Figma, Google, WordPress, Apple, Adobe and more…


rondb

RonDB is a stable distribution of NDB Cluster, a key-value store with SQL capabilities. It is based on a release of MySQL, an SQL database server.

language: C++, stars: 228, watchers: 7, forks: 18, issues: 2

last commit: June 3, 2021, first commit: July 31, 2000 (this includes MySQL commits)

https://twitter.com/mikael_ronstrom

permafrost-engine

Permafrost Engine is an OpenGL RTS game engine written in C.

language: C, stars: 2168, watchers: 50, forks: 70, issues: 5

last commit: May 12, 2021, first commit: October 25, 2017

https://twitter.com/everglorygame

the-art-of-command-line

Master the command line, in one page.

stars: 91701, watchers: 2686, forks: 10045, issues: 197

last commit: September 07, 2020, first commit: May 20, 2015

https://twitter.com/ojoshe

terraform-provider-factorio

“Infrastructure as Code” for your factory.

language: Go, stars: 121, watchers: 3, forks: 0, issues: 0

last commit: May 28, 2021, first commit: May 09, 2021


An Interview With Mikael Ronström of Logical Clocks

Hey Mikael! Thanks for joining us! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn to program, what languages or frameworks do you like, etc?

I was born in Stockholm, Sweden, and I still live there. In the late 70s I studied at the Gymnasium (similar to high school), where the school had a computer from 1973. We played computer games (Lunar Landing) on a teleprinter. I did my first programming in Basic on an ABC 80 (yes, it came out in 1980).

Most of my education in programming has been on-the-job training. The only formal course in programming I took taught Simula. I view myself more as a Software Architect who does a lot of programming. I spent 5 years in the second half of the 1980s working as a programmer in various positions, among other things developing a Local Area Network product, which involved a lot of assembler programming. In 1990 I joined Ericsson, where I had the opportunity to study for a Ph.D in Technical Information Systems. This is where NDB Cluster was born; it is the base for RonDB, which I currently work on.

I use very simple frameworks; I want the framework to be so simple that I can focus on the actual programming task. Thus I use Vim together with CMake and Make. I like using large screens: this means I can have 10 different files and windows open in parallel on one screen, and with screen switching I can have 100 things on my screens to work on in parallel.

Where did you get your Ph.D?  Do you feel it was worth the time and effort you spent in obtaining it?

I got my Ph.D at the University of Linköping. The Ph.D studies have definitely been worth the effort over time. I learned a lot about the issues involved in building database software, and Ph.D studies helped me become a better writer (both of text and software).

Who or what are your biggest influences as a developer?

I had a lot of influences, but the most important came from all the reviews I did while working at MySQL. The person I probably learnt most about programming from is Monty Widenius; I spent a week in his home reviewing the MySQL partitioning feature.

He had an acute sense for detail in programming, which I learnt a lot from. Personally, I have been more focused on the architecture; the program is simply a means of implementing it.

What?! You worked with Monty? How did the two of you meet and become close enough for you to spend time at his house?

At that time Monty was CTO of MySQL and I was developing a new key feature for MySQL. Monty was keenly interested in all the details of the code going into the MySQL Server. I became a part of MySQL when MySQL acquired the NDB Cluster team from Ericsson in 2003 and created the product MySQL Cluster.

What's an opinion you have that most people don't agree with?

I got a lot of influence from the AXE System at Ericsson. This system used asynchronous programming with modules where no data sharing was possible between modules.

Most developers strive to use programming languages with lots of frameworks. When developing applications, this obviously makes sense. But when developing a Database Machine, you are judged by performance, latency and availability. Thus it makes sense to invest more time in building an efficient architecture.

Actually, most developers of key-value databases have realised this, so today this is a fairly popular approach.

What’s your most controversial programming opinion?

Currently I have focused my debate on the advantages of using thread pipelines, that is, breaking up a task into several steps and letting each step be performed by a specialised thread. This provides very good instruction cache behaviour, and I have shown that it can improve throughput by up to 30-40%.
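To make the thread-pipeline idea concrete, here is a minimal C++ sketch, assuming a task that splits into two stages (parse, then execute) connected by a blocking queue. The names `BlockingQueue` and `run_pipeline` are hypothetical, the "execute" step is placeholder work, and this is not how RonDB implements its pipelines; it only shows the shape of "one specialised thread per step".

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A minimal unbounded blocking queue connecting two pipeline stages.
template <typename T>
class BlockingQueue {
 public:
  void push(T value) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      items_.push(std::move(value));
    }
    cv_.notify_one();
  }
  // Signals that no more items will be pushed.
  void close() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      closed_ = true;
    }
    cv_.notify_all();
  }
  // Blocks until an item is available; returns std::nullopt once closed and drained.
  std::optional<T> pop() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return !items_.empty() || closed_; });
    if (items_.empty()) return std::nullopt;
    T value = std::move(items_.front());
    items_.pop();
    return value;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<T> items_;
  bool closed_ = false;
};

// Hypothetical two-stage pipeline. Each stage runs on its own thread, so each
// thread keeps executing the same small piece of code (the intended benefit is
// instruction-cache locality).
std::vector<int> run_pipeline(const std::vector<std::string>& requests) {
  BlockingQueue<int> parsed;
  std::vector<int> results;

  std::thread parser([&] {
    for (const auto& r : requests) parsed.push(std::stoi(r));  // stage 1: parse
    parsed.close();
  });
  std::thread executor([&] {
    // stage 2: "execute" (placeholder work: double the parsed key)
    while (auto key = parsed.pop()) results.push_back(*key * 2);
  });

  parser.join();
  executor.join();
  return results;
}
```

With one producer and one consumer on a FIFO queue, results come out in request order; a real system would use bounded, lock-free queues between stages instead of a mutex.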

I think most people focus on writing code that is short and concise and on using lots of frameworks. I find it extremely hard to debug software where I can’t easily follow the code. Thus I prefer writing code that uses more lines.

RonDB also allocates all its memory at startup and manages all memory itself after that, which disqualifies most frameworks. Most of the code in RonDB is written as asynchronous programs, which means that one cannot relinquish control of the code to a framework that can make all sorts of blocking system calls. Here is a link to the scheduler and here is a link to a code example executing the message TCKEYREQ (look for execTCKEYREQ; this is the signal that receives a key lookup request).
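The style described above, where code runs as short handlers for incoming messages and memory is fixed at startup, can be sketched in C++ roughly as follows. This is a simplified illustration under our own assumptions, not RonDB's scheduler: the `Signal` struct, the `SignalId` values and the `Scheduler` class are hypothetical, and only the name TCKEYREQ is borrowed from the text above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical signal numbers; TCKEYREQ is borrowed from the interview text.
enum class SignalId : std::uint8_t { TCKEYREQ, TCKEYCONF, COUNT };

struct Signal {
  SignalId id;
  std::uint32_t key;
};

// A single-threaded scheduler whose job buffer is a fixed-size ring allocated
// up front: no heap allocation happens while signals are being executed.
class Scheduler {
 public:
  static constexpr std::size_t kCapacity = 1024;
  using Handler = std::function<void(Scheduler&, const Signal&)>;

  // Handlers are registered once, at startup.
  void register_handler(SignalId id, Handler h) {
    handlers_[static_cast<std::size_t>(id)] = std::move(h);
  }

  // Enqueue a signal; handlers call this instead of making blocking calls.
  void send(const Signal& s) {
    if (count_ == kCapacity) throw std::runtime_error("job buffer full");
    buffer_[(head_ + count_) % kCapacity] = s;
    ++count_;
  }

  // Run until no signals remain; each handler runs to completion.
  std::size_t run() {
    std::size_t executed = 0;
    while (count_ > 0) {
      Signal s = buffer_[head_];
      head_ = (head_ + 1) % kCapacity;
      --count_;
      handlers_[static_cast<std::size_t>(s.id)](*this, s);
      ++executed;
    }
    return executed;
  }

 private:
  std::array<Signal, kCapacity> buffer_{};
  std::array<Handler, static_cast<std::size_t>(SignalId::COUNT)> handlers_{};
  std::size_t head_ = 0;
  std::size_t count_ = 0;
};
```

In this sketch a TCKEYREQ handler does its work and then `send`s a reply signal rather than waiting for anything, which is what keeps each execution step short and non-blocking.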

I am not sure this is necessarily a programming opinion; it is more based on analysis of the requirements. Actually, most debates I have had in my career about how to build software come down to prioritising different requirements.

What is one app on your phone that you can’t live without that you think others should know about?

I only use my phone to make phone calls, log in to banks and so forth. I spend 10 hours per day in front of the computer, so there is no need to also spend time on a smartphone :) On my computer I spend most of my time in Gmail, Slack, Zoom, Microsoft Teams and, obviously, terminal windows and Vim, where the development happens.

If you could dictate that everyone in the world should read one book, what would it be?

As a religious person, it would definitely be the Book of Mormon, Another Testament of Jesus Christ.

I'm originally from Utah and I have to say I'm surprised to hear someone from Sweden is a Latter-day Saint! How did you come to be a Latter-day Saint, and what about the religion drew you to it?

I became a member of the Church of Jesus Christ of Latter-day Saints in 1988. I always had a strong faith in God as my heavenly Father and have always been interested in hearing about what people believe in. After meeting with some missionaries from the church, I decided to venture to ask God about their faith. Eventually I received an answer to my prayers. The year after, I also received some personal inspiration on what to work on, and I am still at it and having lots of fun doing it.

If you had to suggest 1 person developers should follow, who would it be?

Google is your friend and can answer most questions :)

If you could teach every 12 year old in the world one thing, what would it be and why?

Interesting question; I have 5 kids, so it is a soul-searching question :)

I think patience is the most important thing to learn. The longer you can wait for your reward, the happier you will end up.

If I gave you $10 million to invest in one thing right now, where would you put it?

I guess I would be selfish and invest it in RonDB; the money would be spent on building both a larger sales team and a larger development team. I usually have ideas on how to develop the product for at least the next 10 years.

In more general terms, I would invest in clean energy.

I’m curious what those 10 year ideas are, care to elaborate?

What I meant is that I have 10 years' worth of development ideas. However, the priorities change, and every time one implements a new feature it leads to a new set of possible developments.

But just to give a flavor, here are a few ideas for future development:

  1. Use query threads also for Locked Reads and Write Operations (leads to more flexibility and improved use of all CPUs) (work in progress)

  2. Poolification of Schema Memory, including a malloc implementation usable in our asynchronous programs (work in progress)

  3. Special malloc variant for long signals, enabling signals of up to a few MBytes (work in progress)

  4. Variable sized rows in disk columns

  5. Support BLOBs of MBytes in size that are stored as part of the row, both in-memory and on-disk (BLOBs are already supported in RonDB, but are implemented as multiple rows)

  6. Improved interpreter to handle row changes and row reads

  7. Parallel Query Aggregation

  8. Drop Node

  9. Support up to 8k API Nodes

  10. Range-based Partitioning

  11. Completely Local Checkpoints

  12. New APIs

.....

Granted, these short texts don't explain all the details of what these features do. I usually try to do features with large impact and low effort first, but every now and then it is necessary to make complex changes as well.
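Several items above (poolification of schema memory, a malloc usable in asynchronous programs) revolve around allocating from memory that was reserved up front. A common way to do that is a fixed-size object pool with an intrusive free list; the C++ sketch below shows that general technique under our own assumptions and is not RonDB's allocator. `FixedPool` is a hypothetical name.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Fixed-capacity pool of equally sized slots, reserved once at startup.
// Free slots form a singly linked list threaded through the slots themselves,
// so allocate() and release() are O(1) and never call the system allocator.
class FixedPool {
 public:
  FixedPool(std::size_t slot_size, std::size_t slot_count)
      : slot_size_(round_up(slot_size)), storage_(slot_size_ * slot_count) {
    // Seed the free list with every slot, once, at construction time.
    for (std::size_t i = 0; i < slot_count; ++i) {
      release(storage_.data() + i * slot_size_);
    }
  }

  // O(1); returns nullptr when the pool is exhausted instead of growing.
  void* allocate() {
    if (free_head_ == nullptr) return nullptr;
    FreeNode* node = free_head_;
    free_head_ = node->next;
    --free_slots_;
    return node;
  }

  // O(1); p must be a slot of this pool (from allocate(), or the constructor).
  void release(void* p) {
    free_head_ = new (p) FreeNode{free_head_};
    ++free_slots_;
  }

  std::size_t free_slots() const { return free_slots_; }

 private:
  struct FreeNode {
    FreeNode* next;
  };

  // Slots must fit a FreeNode and keep every slot suitably aligned.
  static std::size_t round_up(std::size_t n) {
    const std::size_t a = alignof(std::max_align_t);
    if (n < sizeof(FreeNode)) n = sizeof(FreeNode);
    return (n + a - 1) / a * a;
  }

  std::size_t slot_size_;
  std::vector<std::byte> storage_;
  FreeNode* free_head_ = nullptr;
  std::size_t free_slots_ = 0;
};
```

Returning `nullptr` on exhaustion, rather than falling back to the system allocator, matches the "allocate everything at startup" constraint; the caller then has to handle the out-of-memory case explicitly.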

What are you currently learning?

I don’t focus on learning a single thing, but I read about history and science, and I read newspapers. I am very interested in modern computer hardware: others are interested in fast cars, I am interested in fast computers. I read a lot about hardware development to understand how software architecture will change, for example Intel Persistent Memory, AMD Zen 3 and 4, Intel Ice Lake, Sapphire Rapids, Apple ARM development, ARM Graviton, ARM Ampere, NVMe drives and so on.

I usually find my reading through Google searches, e.g. searching for news related to ARM, Intel Optane, Intel Persistent Memory, AMD Zen, Ice Lake, Sapphire Rapids.

What have you been listening to lately?

When I listen to something, it is either music or TV. I am very interested in sports; I have spent the last 10 years as a soccer coach. So I watch a lot of soccer, but also cycling, Formula 1 and hockey.

I also watch a lot of movies. My wife often complains that I have already seen every movie :) I like a lot of different types of movies but the best ones are the sentimental ones, particularly movies with inspiration from real life where people overcome hardship of all sorts.

Who is your favorite team?

I tend to favor watching Arsenal. I like how they play soccer, and they don’t win every game, which makes a game exciting to watch: you don’t know the outcome.

How do you separate good project ideas from bad ones?

In a database, what matters most is Latency, Availability, Throughput and Scalability. These are all measurable, so most ideas can be judged good or bad based on how they affect these requirements. First I make a quick mental study, and later I also test the ideas after implementing them.

Most ideas are positive for some workloads, and negative for other workloads. Thus it is important to decide how to prioritise the different workloads.

Why was RonDB started?

In the 1990s I started doing a very thorough requirements analysis for telecom databases. Based on these requirements, I selected the database algorithms I thought would work best for telecom databases. In some cases new algorithms were invented. One of those was a Non-blocking 2-phase commit protocol; others were algorithms to handle Online Schema Changes.

To start with, the idea was to partner with a database developer to reach those goals. We found some interesting companies to cooperate with, but the negotiations on how to cooperate failed. So instead I started implementing the ideas from the requirements analysis.

NDB Cluster was developed for the telecom industry. Over a long period it has been adapted to modern applications such as Fintech, AI, Health Care and Gaming. With the introduction of RonDB we were able to drop all backwards-compatibility requirements and focus a very successful product on new, modern applications.

RonDB was also started as a database product focused on use in the Cloud.

Who, or what was the biggest inspiration for RonDB?

The biggest inspiration for RonDB is the automation of database algorithms. I’ve spent most of the last 5-10 years automating the configuration of NDB/RonDB. During my studies, Jim Gray and C. Mohan were the biggest influences on me.

Are there any overarching goals of RonDB that drive design or implementation?

One overarching goal is No Downtime. This means that all replicas must be available to take over instantly, which has the consequence that you cannot replicate in the background. This enables RonDB to reach less than 30 seconds of downtime per year in a perfect setup.
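To put the 30-second figure in perspective, the corresponding availability works out as follows (a back-of-the-envelope calculation assuming a 365-day year):

```latex
\[
\text{availability} = 1 - \frac{30\,\mathrm{s}}{365 \times 24 \times 3600\,\mathrm{s}}
= 1 - \frac{30}{31\,536\,000}
\approx 1 - 9.5 \times 10^{-7}
\approx 99.99990\%
\]
```

For comparison, "six nines" (99.9999%) allows about 31.5 seconds of downtime per year, so staying under 30 seconds is slightly better than six nines.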

Another overarching requirement is latency. This is handled by using asynchronous programming and very strict rules on how the database code is scheduled. Each execution unit must complete within around 10 microseconds, and preferably within 1-2 microseconds.
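One way to respect a per-step budget of a few microseconds is to never let a handler loop over an unbounded amount of work: it processes a fixed batch and then re-queues itself for the remainder, so other work can run in between. The C++ sketch below illustrates that pattern under our own assumptions; the batch size of 64 rows and the names `ScanJob`, `run_scan_step` and `run_all` are hypothetical choices, not RonDB's actual scheduling rules.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical job: sum a large table without monopolising the thread.
struct ScanJob {
  const std::vector<std::uint32_t>* rows;
  std::size_t next_row = 0;
  std::uint64_t sum = 0;
};

constexpr std::size_t kRowsPerStep = 64;  // assumed batch size per execution unit

// Processes at most kRowsPerStep rows; returns true if work remains.
bool run_scan_step(ScanJob& job) {
  std::size_t end = job.next_row + kRowsPerStep;
  if (end > job.rows->size()) end = job.rows->size();
  for (; job.next_row < end; ++job.next_row) job.sum += (*job.rows)[job.next_row];
  return job.next_row < job.rows->size();
}

// Round-robin loop: an unfinished job goes to the back of the queue, so no
// single job runs for more than one bounded step at a time.
std::size_t run_all(std::deque<ScanJob*>& queue) {
  std::size_t steps = 0;
  while (!queue.empty()) {
    ScanJob* job = queue.front();
    queue.pop_front();
    ++steps;
    if (run_scan_step(*job)) queue.push_back(job);
  }
  return steps;
}
```

With a 200-row scan and a 10-row scan queued together, the short scan finishes after the long scan's first step instead of waiting for all 200 rows, which is the latency property the bounded step is meant to provide.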

What is the most challenging problem that’s been solved in RonDB, so far?

I spent 2.5 years almost exclusively implementing a new Local Checkpoint algorithm. Before this new algorithm, every checkpoint had to be a full checkpoint, which doesn't make much sense with TBytes of data. Thus a checkpoint algorithm that frequently makes partial checkpoints is required. Implementing this turned out to be a lot more complicated than I expected. I had to write a formal proof to convince myself that the algorithm works correctly. This proof is documented in the RonDB tree in storage/ndb/src/kernel/block/Backup.cpp. There are a lot of adaptive parts in this implementation, and if I had time I would definitely write a research paper about it.

With this new algorithm each data node can handle up to 16 TBytes of in-memory data and at least 100 TBytes of on-disk columns.
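The core idea of a partial checkpoint, writing only what changed since the previous checkpoint instead of the whole data set, can be illustrated with dirty-page tracking. The C++ sketch below is a deliberately simplified illustration under our own assumptions; RonDB's actual partial Local Checkpoint algorithm (documented in Backup.cpp, as mentioned above) is considerably more involved, and the names `PageStore` and `checkpoint` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified in-memory store split into fixed-size pages, with one dirty bit
// per page recording whether it changed since the last checkpoint.
class PageStore {
 public:
  // Every page starts dirty, so the first checkpoint is a full one.
  PageStore(std::size_t pages, std::size_t page_size)
      : data_(pages, std::vector<std::uint8_t>(page_size, 0)), dirty_(pages, true) {}

  void write(std::size_t page, std::size_t offset, std::uint8_t value) {
    data_[page][offset] = value;
    dirty_[page] = true;
  }

  // Partial checkpoint: copy only pages modified since the previous checkpoint
  // into `image`, clear their dirty bits, and return how many were written.
  std::size_t checkpoint(std::vector<std::vector<std::uint8_t>>& image) {
    if (image.size() != data_.size()) image.resize(data_.size());
    std::size_t written = 0;
    for (std::size_t p = 0; p < data_.size(); ++p) {
      if (!dirty_[p]) continue;
      image[p] = data_[p];
      dirty_[p] = false;
      ++written;
    }
    return written;
  }

 private:
  std::vector<std::vector<std::uint8_t>> data_;
  std::vector<bool> dirty_;
};
```

This sketch ignores writes that arrive while a checkpoint is in progress and recovery from a sequence of partial checkpoints, both of which a real implementation has to handle.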

Are there any competitors or projects similar to RonDB? If so, what were they lacking that made you consider building something new?

When NDB Cluster was started in the 1990s everyone considered database research a done deal. Now there are tens of database companies with valuations beyond a billion dollars.

RonDB has a unique combination of Latency, Availability, Throughput and Scalable Storage.

A master's thesis student at Spotify benchmarked RonDB against Aerospike (a key-value database known for good performance) and found that RonDB had 40% better latency in a feature store benchmark.

What was the most surprising thing you learned while working on RonDB?

Interesting question: what has surprised me? I was surprised by how many difficult software issues turned out to have very simple solutions. For example, when we created RonDB I wanted to make its configuration fully automated. I had been thinking about this problem for a few years, and NDB Cluster had had partial support for it for a few years. I was expecting the last step to be a year-long effort, but then I found a solution that made it possible to complete the task in a few weeks instead.

Another problem in NDB Cluster was changing the number of replicas, a problem I had been aware of for more than 10 years. But a few months ago I realised that, in a sense, we had already solved it. So I was able to find a solution, implemented in a couple of weeks, for a problem I had also expected to require a complex effort.

So very often there is a simple solution to the complex problems you are facing. But it could take years of contemplation before the solution dawns on you.

What is your typical approach to debugging issues filed in the RonDB repo?

First I make a quick assessment of the issue; sometimes this is enough to point to where in the code the problem comes from. Most of the time it is necessary to find a repeatable test case. Actually, most issues are found in internal testing, which means that there already is a repeatable test case.

RonDB has a trace file that shows exactly the last 1000 jumps and the last 4000 messages executed before a crash. This is very useful for debugging problems. These trace files are generated even by production binaries.

In addition, there are obviously the usual tools: core files, debug printouts and so forth. Recovery code in particular requires immense amounts of printouts, since the bug happens before a failure and is only detected when the node recovers.
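A trace that always holds "the last N" events, like the trace file described above, is naturally a fixed-size ring buffer that overwrites its oldest entry; recording is cheap enough to leave enabled in production builds. The C++ sketch below assumes that design; `TraceRing` and `MessageTrace` are hypothetical names, and the capacity of 4000 simply mirrors the message count mentioned above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-size ring buffer keeping the most recent N entries. Recording is O(1)
// and never allocates, so it can stay enabled in production binaries.
template <typename T, std::size_t N>
class TraceRing {
 public:
  void record(const T& entry) {
    entries_[next_ % N] = entry;
    ++next_;
  }

  // Returns the retained entries from oldest to newest, e.g. for a crash dump.
  std::vector<T> dump() const {
    std::vector<T> out;
    const std::size_t count = next_ < N ? next_ : N;
    for (std::size_t i = next_ - count; i < next_; ++i) out.push_back(entries_[i % N]);
    return out;
  }

 private:
  std::array<T, N> entries_{};
  std::size_t next_ = 0;  // total number of entries ever recorded
};

// Hypothetical per-message record; real trace entries would carry more detail.
struct MessageTrace {
  std::uint32_t signal_id;
  std::uint32_t block_id;
};

using MessageTraceRing = TraceRing<MessageTrace, 4000>;
```

Only `dump()` allocates, and it would run once, when the crash is being handled.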

What is the release process like for RonDB?

We are currently preparing a new release of RonDB, version 21.04.1. This means that we run through a very large set of test programs: functional test programs and a lot of automated recovery test programs. In addition, we run a wide range of benchmark tests.

When we consider the software ready for a release we build the software on a special build machine that takes about 2 hours to create a tarball for Linux. Next we move this tarball to our download server.

The next step is to create images on AWS and Azure using the new release. 

The final steps are to write documentation for the new features and the release notes, and lastly to write blogs and post on Twitter, LinkedIn, and other social media.

How is RonDB monetized?

RonDB is a part of the Machine Learning platform Hopsworks, which contains RonDB, HopsFS (a file system implemented on top of RonDB) and Feature Store (implemented on top of RonDB).

Hopsworks is sold by Logical Clocks and is the current main source of revenue. The intention is to also monetize RonDB as a standalone product.

How did Logical Clocks start?

Logical Clocks is a spinoff from KTH where the CEO Jim Dowling spent more than 10 years leading a research effort on distributed systems. Jim and I were colleagues at MySQL 15 years ago and I assisted him earlier in his research using NDB Cluster as a platform to build a highly scalable file system.

How do you balance your work on open-source with your day job and other responsibilities?

My work is open-source development; it is what I do. Obviously it has to be balanced with my personal life, but all my kids are now grown-ups, so I am fairly free to spend a lot of time on RonDB.

Do you think any of your projects do more harm than good?

I view all developments as neither harmful nor good. It is up to the user to use it in a harmful way or a good way. Obviously a database can be used for good and it can be used for bad things.  I focus my interest on the good things the technology can bring.

What is the best way for a new developer to contribute to RonDB?

Often the easiest way into a project is to work on various OS issues or HW issues. Currently ARM is an important development for RonDB. So fixing things running on ARM is a nice way into the RonDB project. Another similar project is to support FreeBSD.

Those interested in implementing virtual machines could consider extending the virtual machine in DbtupExecQuery.cpp. The program is set up in the NDB API and used in the Dbtup module.
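For readers unfamiliar with the idea, a virtual machine of this kind runs a small program, built on the client side, against rows inside the data node, for example to filter rows before they are sent back. The C++ sketch below is a toy stack-based interpreter written under that assumption; its instruction set (`Op`, `Instr`, `eval_filter`) is entirely hypothetical and does not reflect the actual instructions in DbtupExecQuery.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes for a toy row-filter program.
enum class Op { PushCol, PushConst, Lt, Eq, And };

struct Instr {
  Op op;
  std::int64_t arg = 0;  // column index for PushCol, value for PushConst
};

// Evaluates `program` against one row and returns whether the row matches.
bool eval_filter(const std::vector<Instr>& program, const std::vector<std::int64_t>& row) {
  std::vector<std::int64_t> stack;
  auto pop = [&] {
    if (stack.empty()) throw std::runtime_error("stack underflow");
    std::int64_t v = stack.back();
    stack.pop_back();
    return v;
  };
  for (const Instr& in : program) {
    switch (in.op) {
      case Op::PushCol:
        stack.push_back(row.at(static_cast<std::size_t>(in.arg)));
        break;
      case Op::PushConst:
        stack.push_back(in.arg);
        break;
      case Op::Lt: {  // pops b, then a; pushes a < b
        std::int64_t b = pop();
        std::int64_t a = pop();
        stack.push_back(a < b);
        break;
      }
      case Op::Eq: {
        std::int64_t b = pop();
        std::int64_t a = pop();
        stack.push_back(a == b);
        break;
      }
      case Op::And: {
        std::int64_t b = pop();
        std::int64_t a = pop();
        stack.push_back(a && b);
        break;
      }
    }
  }
  return pop() != 0;
}
```

For example, the program `PushCol 0, PushConst 30, Lt, PushCol 1, PushConst 1, Eq, And` encodes the condition "column 0 < 30 and column 1 == 1".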

If you plan to continue developing RonDB, where do you see the project heading next?

The plan is definitely to continue developing RonDB. There are many areas to improve, a big area we have worked on for a while is handling parallel query execution for the real-time data in RonDB.

What motivates you to continue contributing to RonDB?

That I am developing a tool that can be used to solve all sorts of human challenges. One challenge that I personally take a lot of interest in is genealogy: building a global genealogical tree for humanity.

Where do you see software development in-general heading next?

Data volumes are constantly increasing, so more and more tools for data analysis and all sorts of data processing will be important. I think this means that traditional software development and high-performance computing will merge more and more.

Where do you see open-source heading next?

Open-source will continue to be very important for tools that many people use. For example, tools to handle massive volumes of data are an area where open-source is likely to continue to thrive.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

Focus on tasks that you do often and write some tools automating those things. My own career was as a programmer, and we decided to turn the software I developed into open-source. So the simplest way is probably to get hired by a company doing open-source development.


Not subscribed to Console? Subscribe now to get a list of new open-source projects curated by an Amazon engineer in your email every week.

Already subscribed? Refer 10 friends to Console and we’ll donate $100 to an open-source project of your choice!