Console

CyberChef, Continuous Reforestation, and Git Split Diffs

Sponsorship

The Reshape

Stay up to date with AI 🤖 & Data Science 📊 by reading top-notch articles. We are the first to spot hot news (data proven!). At the same time, we scour the Internet to find the most overlooked publications. Get a package of both in a concise form delivered straight to your inbox for free 📬.


Not subscribed to Console? Subscribe now to get a list of new open-source projects curated by an Amazon engineer in your email every week.

Already subscribed? Refer 10 friends to Console and we’ll donate $100 to an open-source project of your choice and send you a Console sticker pack!

Share Console


Projects

CyberChef

CyberChef, open-sourced by GCHQ, is a simple, intuitive web app for carrying out all manner of “cyber” operations within a web browser.

language: JavaScript, stars: 12446, watchers: 315, forks: 1631, issues: 251

last commit: March 26, 2021, first commit: November 28, 2016

https://twitter.com/GCHQ

git-split-diffs

git-split-diffs brings GitHub-style split diffs with syntax highlighting to your terminal.

language: TypeScript, stars: 1923, watchers: 8, forks: 18, issues: 4

last commit: July 07, 2021, first commit: April 10, 2021

https://twitter.com/banga_shrey

continuous-reforestation

continuous-reforestation is a GitHub Action for planting trees within your development workflow using the Reforestation as a Service (RaaS) API developed by DigitalHumani.

language: Python, stars: 130, watchers: 7, forks: 3, issues: 4

last commit: March 27, 2021, first commit: December 09, 2019

https://twitter.com/protontypes


An Interview With Jesse Smith of DistroWatch

Hey Jesse! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like, etc?

This question covers a lot of ground and I'll try to keep my answer from rambling on too long. In the past I've worked in a variety of locations and positions. I've been a janitor, convenience store clerk, support desk IT guy, server admin. I've been a roaming IT guy that goes into different businesses all the time to try to fill in or fix issues. I've also done product and music reviews for a few publications.

Location-wise, I'm from eastern Canada. I grew up in a rural environment in a very scenic region of Nova Scotia. I was lucky, as a kid, to have relatively early access to computers, both at school and at home. I learned to program by accidentally typing "list" instead of "run" one day while trying to run a BASIC program and was fascinated by the gibberish that appeared on the screen. Being a curious lad of about nine I started making minor adjustments to the code on the screen to see how it would impact how the program would run. For the first six or seven years I was mostly self taught.

Later, I was able to learn Pascal in high school and then took a more in-depth set of programming, sysadmin, and database courses in college. While I was there I learned UNIX and started running Linux at home.

As for which languages and frameworks I enjoy, I'm pretty flexible and happy to use whichever tool seems most convenient at the time. I mostly use C/C++ for local applications and JavaScript or PHP when working on websites. While I picked up COBOL, Assembly, Java, Delphi, Python, and a handful of other languages over the years I've rarely ever had a real-world opportunity to use them.

Typically the level of code I work with doesn't lend itself to using frameworks much, but I have learned to appreciate Qt for desktop programming. The "signals and slots" approach takes a little while to get used to, but I appreciate how clean and powerful Qt can be.

What sort of music were you reviewing?

I mostly reviewed metal music. I used to write for We Love Metal (they've since shut down). I'd often get assigned the more obscure or unusual bands, stuff the other reviewers weren't interested in. So I ended up listening to symphonic metal and some instrumental-only groups. I really enjoyed Lullacry, Benedictum, and Kevin M Thomas.

I also reviewed some pop and rock albums. I reviewed Sarah Jackson Holman early on, Yusif, and Snakecharmer.

One of my favourite outcomes from a review was I was the first person to review an up and coming artist named Natasha Mira Todd, probably about ten years ago. She's gone on to bigger and better things, but at the time she was entirely self-published and working out of her home. She and her family were so thrilled at the review - just getting more exposure I suppose - her mother sent me a thank-you for the article. It was really sweet of them.

Who or what are your biggest influences as a developer?

The "who" part of this question is tricky. I haven't had many mentors, guides, or people I looked up to in my journey as a developer. I have an uncle who provided me with some programming books and answered questions for me when I was younger and I appreciated that. In a fun twist, his son (my cousin) interviewed me about what it's like to be a developer for one of his college courses last year. So the relationship came full circle.

When I was in college I had two instructors in particular who were a big influence. My Pascal and Assembly language professor taught me a lot about efficient code and squeezing the most out of the hardware. My UNIX professor certainly put me on the path of running Linux and BSD. His approach to things like security - balancing pragmatism versus idealism - had a strong influence on my work as a system administrator.

As for "what" is my biggest influence... It's probably my situation, the context of what I'm working on or my perceived need at the time. Early on in my developer journey I was often working with some tight constraints - whether it was time, computing resources, or tools. As a result I have tended to take the approach which gives me the best result for the least amount of resources/time/patching.

Two of my earlier projects were prime examples. One had to run on DOS machines, on processors of any speed, and be easy to upgrade by non-techie people in remote areas without a network connection. My first open source project had some uncomfortable requirements of being graphical, cross-platform, it had to be super easy to use and install, and I didn't have access to the target platforms for testing purposes. I ended up writing the whole thing in structured JavaScript to create what we'd probably call a web-app these days, but at the time (late 2002) it was just a fancy web page. I checked earlier this year and, 18 years later, the thing still runs flawlessly on modern browsers, which I feel is a good sign for something I slapped together fresh out of college.

What's an opinion you have that most people don't agree with?

Attack Of The Clones was my favourite Star Wars prequel, does that count?

In technology circles, I think the opinion (or maybe it's better to call it an outlook) I have that many people don't seem to share is finding a practical balance. Oftentimes people get caught up in their quest for something: the perfect design, unbreakable security, using the best programming language, the distro with the most up to date software, or one ultimate distro that can do everything. Call me either practical or lazy, but I'm inclined to see computers as tools. Very interesting tools, to be sure, but ultimately just tools. So while I have preferences, I'll try to find the best tool for the job in front of me.

For example, I prefer open source software, but I'll use non-free firmware if it gets my wireless network running. I use some basic security approaches to lock down my machines, but I'm more interested in being able to detect and recover from an intrusion or theft than I am in preventing one. I have a directory full of scripts on my workstation for accomplishing or automating all sorts of little tasks and they're written in a mix of bash, tcsh, awk, and PHP. I don't have a strong preference (some might say loyalty) to any one language or approach, I'll use whatever seems to best fit the job at the moment.

What’s your most controversial programming opinion?

Well, I just mentioned using and liking both PHP and JavaScript, so those might qualify. Honestly, I think both of those languages (and COBOL for that matter) have an undeserved poor reputation because they're beginner-friendly and that results in a lot of awkward, poor code being written in them. But the languages themselves I find quite straightforward and consistent to work with and I've never had a serious problem with them.

Another outlook I have is that I regard C++ as a perfectly good language, as long as you ignore most of its features. I like writing C++ as it seemed to be originally intended: C with classes. I think too many programmers see the many things a language _can_ do and dive in, not stopping to think how unreadable their code is likely to become. I'm getting off track here, but my unpopular opinion is that more programmers should focus on writing source code that is readable and usable by other programmers rather than writing the coolest or most powerful code they can.

If you could dictate that everyone in the world should read one book, what would it be?

One of my favourite books, which I happened to just finish reading again, is A Wizard Of Earthsea by Ursula K. Le Guin. It explores a young man wrestling with pride, fear, and his demons (inner and outer) in a fantasy setting. The book has a simple flow and language to it, making it quite accessible, while sharing perspectives on calmness, self-knowledge, the importance of friendship, and the dangers of ego-centric pride. Plus there's a cool fight between a wizard and a dragon, so that's pretty exciting.

If you had to suggest 1 person developers should follow, who would it be?

I feel I don't have a good answer for this as I tend not to focus much on any one developer's work or opinions. So I'm going to cheat and recommend two whose work I respect. Jim Hall, the creator of FreeDOS (@FreeDOS_Project on Twitter), is an outstanding and super friendly guy who always seems to have positive and insightful things to say; and Jonas Termansen (@sortiecat on Twitter) is doing some really cool work through Sortix. Even if you don't have any interest in these open source operating systems, I'd recommend listening to their views on clean design and standards.

If you could teach every 12 year old in the world one thing, what would it be and why?

You have a lot of tough questions. One thing? Someone once asked me if I thought honesty or compassion was more important. Then they pointed out that it's almost never an either/or choice. You can be honest and compassionate. You can disagree without being cruel. You can speak the truth without being a jerk and be kind without lying.

Too often I hear people respond to having their behaviour called mean by saying they're "just being honest" or justify lying by saying they "don't want to hurt the other person's feelings." Good communicators can be kind and honest, firm and compassionate. I wish more people knew that.

What are you currently learning?

Right now I'm working on some projects which are testing the limits of my knowledge (and patience) with Shopify and WordPress. Both are solid platforms which these projects are trying to push in unexpected directions.

Care to get into specifics?

With the Shopify situation we're looking at expanding the "shop" to include more blog-like and community-oriented options. Shopify can be expanded this way, but it's not really designed for it and the available plugins are mostly commercial. So we're looking at ways to extend the social, search, and blog options without spending too much money.

The WordPress account is a different beast. WordPress is pleasantly extendable, and that's all well and good. But my client wants to merge the WordPress user accounts with an existing database of customers so they can interact with the new WordPress website and their old shop seamlessly.

How do you separate good project ideas from bad ones?

There are a few things which come to mind. One of the big questions is whether the project's scope is too big. People who come to me with projects which will revolutionize something or be used by millions of people probably aren't being realistic. Another consideration is whether the project is sustainable long-term. I could write software which will calculate income taxes or insurance rates, but it would need to be updated and tested against new rules every year. Does the person or team involved have the time to do that consistently on a reasonable timetable?

Finally, I try to take on projects I personally find interesting and useful. As much as possible I want to be engaged. Often I hear new developers asking, "I'm looking for a project. What should I work on?" To me this is like a painter asking what they should paint or a writer asking what the topic of their first novel should be. Find what motivates you, what you want to share, what you want to make better.

Who, or what was the biggest inspiration for DistroWatch?

I don't want to speak for Ladislav, but I think the inspiration was just trying to make sense of the expanding Linux ecosystem and the lack of consistency between the various distributions. Each project has its own release schedule, package versions, package names, set of features, filesystem organization, etc. It was difficult to make an informed comparison between any two Linux distributions. DistroWatch strives to gather up the data and present it in a consistent format, making an "apples to apples" comparison easier.

Are there any overarching goals of DistroWatch that drive design or implementation?

Our goal has always been to try to provide useful information to our readers. Whether that's through reviews, offering up a glossary of terms, making it easier to find documentation, answering questions, making download links easier to find, or helping them compare features between distributions. We want to gather and share information in a way that makes sense to people.

Whether we implement a new feature or provide a new information page often depends on how useful we think it will be to our readership. Is it something people will find useful or will it be one more button or clutter that gets in the way?

What trade-offs have been made in DistroWatch as a consequence of these goals?

I think the trade-offs we tend to see the most are in the Search page and in how information gets added to DistroWatch. We get a lot of suggestions for search features people want to see or tags people want to have added to distributions in our database. Many of these don't get added because, well, the site has been active for 20 years. If just one person in that 20 years wants us to track a specific package or make it possible to search for how many people are on a development team, then it might not be worth looking into. But if a dozen people all ask us to make it possible to search for a specific init implementation then we tend to take notice and add it.

With regards to how information gets added to DistroWatch, earlier on in my involvement with the website I wanted to make it more open, more wiki-like. However, it quickly became apparent through some tests that we'd end up getting a lot of spam and a lot of inaccurate information that way. It would take longer for us to verify and curate data from volunteers than it would to just track and manage the information ourselves. This means it's somewhat less convenient for people to get new information or projects added to DistroWatch, but it means what has been added has been verified.

What is the most challenging problem that’s been solved in DistroWatch, so far?

Earlier I mentioned that each distribution organizes itself differently. Linux distributions use different package managers, use different names when packaging the same software, or place information in different locations. It is both our mission and greatest challenge to sort through all that diversity (some might say chaos) and try to extract the information to organize it into a set format. This makes it easier for visitors to our site to search for and compare packages and features between projects.

The tricky part is learning how a distribution organizes its data and then trying to automate extracting the information. I have dozens of scripts for cracking open a project's install media (ISO file), finding key bits of information and then sorting it into tables. Almost every distro needs its own script, or tweak to an existing script, to gather up all the data we organize and share. I guess it's fair to say there isn't one specific bit of code that stands out, it's more the process of regularly writing and tweaking scripts to harvest the diverse information from hundreds of distributions.
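The harvesting scripts themselves aren't published, but the general idea can be sketched. Here is a hypothetical, simplified example of the kind of extraction such a script performs: pulling key/value facts out of an os-release-style file found on a distro's install media and normalising them into one fixed record format (the sample data and function name are illustrative, not DistroWatch's actual code):

```python
# Hypothetical sketch: normalise a distro's os-release style metadata
# (KEY="value" lines) into a plain dict so many distros can be compared
# in one consistent format.
def parse_os_release(text):
    """Parse os-release style KEY="value" lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        info[key] = value.strip('"')
    return info

sample = '''
NAME="Debian GNU/Linux"
VERSION_ID="11"
ID=debian
'''
record = parse_os_release(sample)
print(record["NAME"], record["VERSION_ID"])  # → Debian GNU/Linux 11
```

In practice each distribution lays this information out differently (different package managers, file locations, naming schemes), which is why nearly every distro ends up needing its own variant of such a script.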

Are there any competitors or projects similar to DistroWatch?

As far as I know, there weren't any similar products or websites around when DistroWatch was created. I think that's why it gained so much attention. I don't think anyone was doing anything similar, at least at the time, so DistroWatch was providing an unusual, maybe unique, service. The closest project to it I can think of was FreshMeat.net which tracked open source releases and version information for specific programs. However, I don't know of anything else which did the same for distributions.

These days there still really isn't anything truly similar. There are some websites like Repology which track repository information for various projects, and others which review distributions or announce new product releases. Still others provide torrents for new distributions. But in each case they are basically each focusing on one small part of what we do. DistroWatch covers a lot of bases - linking to hardware vendors, new releases, package information, tutorials, an overview of key features, searches for key components, providing command line examples for dozens of popular programs, etc. I don't know of anyone else that does this.

From time to time someone will come along and decide they want to duplicate what we do, with a slightly different approach, and try to make The Next DistroWatch. However, they usually don't last beyond the planning stage because people greatly underestimate the amount of time it takes to run a website like this. Even once all the website code is written, which could take weeks or months, and all the infrastructure is in place, it can take an hour to process and publish a single new release. We usually do about two of those a day, on average. Plus writing articles, answering questions, and adding new projects to our database. We can get a few dozen to a few hundred e-mails and notifications a day, almost every day. People interested in making a DistroWatch-like website usually aren't prepared for that kind of time commitment. As a result we tend not to see many websites trying to do the same thing.

What was the most surprising thing you learned while working on DistroWatch?

One thing which regularly stuns me is both how resilient and how fragile open source software can be. The open nature of Linux distributions, and their components, means anyone can improve them, anyone can fork a dormant project to keep it running, anyone can fix a security bug. Anyone can come along and make a better mousetrap to improve the ecosystem.

At the same time, some critical, key pieces of infrastructure are barely maintained or abandoned. It's surprising to me how often it is revealed that some important piece of software, used on millions of computers, is maintained by one person in their spare time, or even effectively discontinued. When I stepped in to help maintain SysV init, I don't think anyone had worked on it in nearly five years. When OpenSSL caused a panic over the state of its code, it was basically a part-time project for a few developers. Yet both projects had been running on millions of machines. So much of the open source landscape is like that - amazing diversity, flexibility and power mixed with barely maintained keystones.

Where do you see the project heading next?

My ideas for the site have shifted slightly over the years. I used to be interested in more open contributions, more contributors, more interviews, more features! These days I'm more interested in small, polishing adjustments. Clearing up bugs, keeping things running smoothly, getting processes into place that make the day to day tasks run better. I'm more interested in providing curated, accurate information rather than having more information of unverified quality.

That was the key insight behind this newsletter as well. Plenty of open-source newsletters out there round up a bunch of projects, but not many curate them carefully.

I agree. I think the "Bazaar" approach is great for giving projects exposure, but it's not great at helping people find specific projects or reliable information. Most of the "top 10 alternative applications" or "top 10 distro" lists I see often include projects that either aren't really open source or aren't maintained anymore.

Are there any other projects besides DistroWatch that you’re working on?

Several! I also maintain a handful of open source projects, including SysV init, the Bftpd file server, LimitCPU (a fork of cpulimit), the Dungeons and Dragons Character Generator, Sopwith SDL, the cross-platform port of doas which is used by FreeBSD and several Linux distributions, and I'm thinking of returning to the 2-D Atomic Tanks game which I maintained from 2005 to 2009. People can keep up with my work and new releases on my Patreon page.

Do you have any other project ideas that you haven’t started?

A few hundred. Most of them though are not realistic due to cost or time constraints. I sometimes toy with some code, or look at projects other people have started with similar ideas - open source competitors to Facebook or Twitter, for example, income tax calculators, I used to love coding for a MUD called Age Of Legacy and that siren song often calls to me. I'd love to make a multiple-player, networked version of Star Trek 25th Anniversary, a peer-to-peer open source, security-first chat program, a lightweight on-demand media player for the Raspberry Pi, a modern port of Legends of Valour, ... However, time is precious and limited. I do my best to balance my open source work with my home life and other hobbies. Most of these will probably never go beyond the design board unless someone hires me to work on them.

Where do you see software development heading next?

Probably more abstraction and more AI-style contributions. I think we're about due for another step up in terms of higher level languages or the way code is written. Something that makes it easier to snap together modules of code or ways to automate more of the code creation. For my generation Python and Visual BASIC were sort of a step up the modular, scripted abstraction ladder for computer languages. I suspect we're due for another step up where the developer gives some guidelines to the development environment and it uses an AI-like parser to try to figure out what the developer wants, then writes some basic code and module framework the developer can tweak and fill in as needed. We are close to that now with tools like Qt Creator, and I'm thinking the next step will be for the development environment to automate more of the "grunt work" in the coding process.

Along those lines, what are your thoughts on the no code movement?

I have mixed feelings about it. I like the idea of reducing the barrier to entry for people who want to create. Anything that encourages people to solve problems they see and makes finding solutions easier is good, in my opinion.

However, I also think there is a trade-off to this. More people creating programs without deeper knowledge likely means more bugs, more issues, more security holes. Unless, of course, we're very careful to make sure the tools people will be using are safe.

As an example, I occasionally get called in to fix projects, usually websites, which were created by "no code" tools. What You See Is What You Get editors, that sort of thing. When the site owners want to expand and discover the initial website isn't up to the challenge, they need someone who can code to come in and rewrite or fix things.

So I think "no code" tools and approaches have their place, especially for solving simple problems. But I think there will always be a place for people like me who work more behind the scenes, a level down from the "no code" folks.

Where do you see open-source heading next?

I think more big companies are seeing the value of not only using open source, but being active open source contributors. Microsoft not only supports Linux on its cloud platforms, but runs it on Windows via the WSL and contributes to the Linux kernel. Netflix gives back financially and through development hours to FreeBSD. Google builds Chrome from the open source Chromium. I suspect more and more companies will do most of their development in the open, but save a little special customization or optimization for their final build. It makes good business sense to do most of the work in the open where the developer community can help and then sell a polished version to consumers.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

Two things, I suppose. One is to find a project which interests you. Find the bug you want fixed or the feature you want to see implemented. Then dive in. Pick something that you are motivated to work on, the result is very rewarding.

The other suggestion is to talk with the people who run the project you want to work on. Some of them don't want help, some want a specific patch style, some may like your idea but feel it is out of scope for their project. Talk with the other developers first before you do a lot of work which may not be accepted. They can either guide your efforts to areas that will be helpful or help teach you what you need to know about their project.

Console

The Book of Secret Knowledge, Slides, and Supertokens

Sponsorship

If you, or someone you know, is interested in sponsoring the newsletter, please reach out at console.substack@gmail.com




Projects

the-book-of-secret-knowledge

The Book of Secret Knowledge is a collection of inspiring lists, manuals, cheatsheets, blogs, hacks, one-liners, cli/web tools and more.

stars: 46579, watchers: 1638, forks: 4871, issues: 15

last commit: July 13, 2021, first commit: June 23, 2018

https://twitter.com/trimstray

slides

slides is a terminal based presentation tool.

language: Go, stars: 4404, watchers: 50, forks: 112, issues: 9

last commit: July 16, 2021, first commit: April 05, 2021

https://twitter.com/maaslalani

supertokens-core

SuperTokens is an open core alternative to proprietary login providers like Auth0 or AWS Cognito.

language: Java, stars: 2417, watchers: 39, forks: 79, issues: 91

last commit: July 06, 2021, first commit: January 05, 2020

https://twitter.com/supertokensio

PynamoDB

PynamoDB is a Pythonic interface to Amazon’s DynamoDB.

language: Python, stars: 1659, watchers: 42, forks: 363, issues: 226

last commit: June 29, 2021, first commit: January 20, 2014

https://www.linkedin.com/in/ikonst


An Interview With Rishabh Poddar of SuperTokens

Hey Rishabh! Thanks for joining us! Let’s start with your background. Where are you from, how did you learn how to program and what languages or frameworks do you like?

Graduated from Imperial College London in 2015 with a BSc in CS. Originally from Mumbai, India.

Learnt programming when I was 15 by myself, and then formalised it via a college education - so a lot of unlearning, and learning again.

I have worked with Node (express), PHP, Spring, reactJS, react-native, iOS and Android. Out of these, I prefer Node the most, due to its simplicity. Though I never use JS directly, always use TypeScript.

What are you currently learning?

About web components and compilers like StencilJS. We may use it for our frontend SDK to support React, Angular, and Vue + VanillaJS.

Why was SuperTokens started?

SuperTokens was initially a secure session management solution - we had started it because we saw that session management was one of the most misunderstood topics (judging by the endless arguments online), and that its importance in user security is generally overlooked. For example, even if an app has 2FA, if a user's session tokens are compromised, that user's account is compromised. How easy it is to compromise the tokens, and the effect of that, is a function of how it's implemented - so we wanted to provide an out-of-the-box, very secure solution.

Once we got into YC, we quickly realised that auth in general still has several pain points. Almost all the companies we spoke to (~100) complained about its complexity, limitations, pricing, etc. So we expanded our scope from sessions to auth in general.

What value did you get out of being in YC?

The biggest value that YC brought us was a large pool of other devs / companies we could go talk to, to understand their auth related pain points. Before YC, it would be hard to get on calls with other devs, but during and after YC, we got tons of inbound! So the biggest learning here is that talking to users / potential users is the best activity early stage founders can do.

Are there any overarching goals of SuperTokens that drive design or implementation?

So we aim to be as developer-friendly as possible. This means allowing for customisability and control, and natively supporting as many frameworks / languages as possible.

For example, we have architected SuperTokens so that the frontend talks to your backend API layer and not SuperTokens directly. Our backend SDK automatically exposes all the APIs required by our frontend widgets. This allows users to customise all those APIs, within their own backend very easily. They don’t need to create webhooks, or upload source code to the SuperTokens dashboard.
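A rough sketch of that architecture, using hypothetical function names rather than the real SuperTokens API: the frontend only ever calls routes on the app's own backend, which forwards to the auth core, so each route can be customised inline in the app's own code.

```python
# Hypothetical sketch of the frontend -> your backend -> auth core flow.
# Names here are illustrative, not SuperTokens' actual API.
def core_create_session(user_id):
    # Stands in for a network call to the auth core service.
    return {"userId": user_id, "accessToken": f"token-for-{user_id}"}

def sign_in_route(request):
    # A route like this is auto-exposed on the app's own API layer by the
    # backend SDK; because it lives in your backend, you can customise it
    # in place - no webhooks, no uploading code to a dashboard.
    session = core_create_session(request["userId"])
    session["plan"] = "free"  # app-specific customisation point
    return session

print(sign_in_route({"userId": "alice"})["accessToken"])  # → token-for-alice
```

The key design point is that the frontend never talks to the core directly; every auth API it needs already exists on the application's own backend.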

What trade-offs have been made in SuperTokens as a consequence of these goals?

The trade-off of this is that the speed at which we can release new features is slower - because we need to build those backend APIs for each language out there, as opposed to just once in the SuperTokens core.

It also implies that some of the features other auth providers ask users to pay for, we have to make free. For example, some providers limit the number of social login providers in their free tier, or limit the kind of social login allowed. In our case, because the whole OAuth dance happens via the user's frontend and backend (not involving the SuperTokens core), we can't have any of those limitations - this is a positive from a customer's point of view, but may be a negative from our business' point of view.

What is the most challenging problem that’s been solved in SuperTokens, so far?

I think the session management that we have built works incredibly well. We use rotating refresh tokens as a security measure, and have had to solve several complex edge cases such as resilience to network failures, or syncing refresh API calls across multiple browser tabs. 
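The rotating-refresh-token idea can be illustrated with a minimal sketch (this is an assumption-laden toy, not SuperTokens' actual implementation): every refresh invalidates the old token and issues a new one, so a stolen token replayed after the legitimate client has already refreshed is detected, and the session can be revoked.

```python
# Toy sketch of rotating refresh tokens (not SuperTokens' real code):
# each refresh rotates the token; reuse of a rotated-out token is treated
# as possible theft and kills the whole session.
import secrets

class SessionStore:
    def __init__(self):
        self.current = {}  # session_id -> currently valid refresh token

    def create_session(self, session_id):
        token = secrets.token_hex(16)
        self.current[session_id] = token
        return token

    def refresh(self, session_id, token):
        if self.current.get(session_id) != token:
            # An old token was replayed: revoke the session entirely.
            self.current.pop(session_id, None)
            raise PermissionError("token reuse detected; session revoked")
        new_token = secrets.token_hex(16)
        self.current[session_id] = new_token  # rotate
        return new_token

store = SessionStore()
t1 = store.create_session("s1")
t2 = store.refresh("s1", t1)  # normal rotation succeeds; t1 is now dead
```

The edge cases mentioned above (network failures mid-refresh, several browser tabs refreshing at once) are exactly where a naive version of this scheme breaks, since a legitimate client can end up holding a rotated-out token.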

Are there any competitors or projects similar to SuperTokens? If so, what were they lacking that made you consider building something new?

There are tons of other auth providers - both open and closed source.

I am biased here, but other solutions lack the ability to make lots of customisations easily. Plus there is a very steep learning curve to implementing them - especially for other open source auth providers. I believe the different architecture that our solution employs allows for easier customisations and prevents developers from having to understand all the various OAuth flows and jargon - making it easier to understand.

What was the most surprising thing you learned while working on SuperTokens?

It was surprising for me when I realised that our product is, first, our documentation and, second, the actual solution / code. As developers, we tend to focus way more on the code than on user communication… but if our docs are “bad”, then no matter how good the code is, it won’t matter. Whilst the other way around is not true (as long as the code is correct).

What is your typical approach to debugging issues filed in the SuperTokens repo?

The first step is to reliably replicate the issue. Sometimes this can be really difficult because of the number of various frameworks and deployment options (serverless or not, using SSR or not). I remember a frontend issue that a user of ours was experiencing - an issue that I could not replicate locally. So I used Chrome’s override feature to change the frontend code on their site directly whilst debugging.

Once the issue is replicated, we go about finding the root cause of it and fixing it. Sometimes the fix is easy, other times, it leads to breaking changes… we of course try and minimise the number of API / breaking changes, but sometimes, it’s inevitable.

Finally, we add unit / integration tests to make sure that the issue never comes back.

What is the release process like for SuperTokens?

For each SDK, and for the core, we start by adding a dev tag to the commit that needs to be released. The tag looks like dev-vX.Y.Z, where X.Y.Z is the current version number.

Adding this tag starts a CI/CD process which runs all the unit tests for the SDK, and all the integration tests.

For the integration tests, we set up the whole SuperTokens stack in CI/CD and run tests for each compatible version. For example, if we release a change to the frontend SDK, we set up a Node backend and the SuperTokens core for integration tests. If the frontend SDK is compatible with multiple versions of the Node backend, then we run integration tests with each of those versions.

Once all the tests pass, we add a release tag (vX.Y.Z) to the commit and delete the dev tag. A release tag can only be added to a commit that has a dev tag, and only if all the tests have passed.

In terms of versioning, we follow the Semantic Versioning guidelines.
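The dev-tag-to-release-tag promotion described above can be sketched as a small helper. This is illustrative only; `releaseTagFor` is an invented name, not SuperTokens' actual tooling:

```typescript
// Illustrative sketch of the tag convention described above: a dev tag
// "dev-vX.Y.Z" is promoted to a release tag "vX.Y.Z" once tests pass.
const DEV_TAG = /^dev-v(\d+)\.(\d+)\.(\d+)$/;

function releaseTagFor(devTag: string, testsPassed: boolean): string {
  if (DEV_TAG.exec(devTag) === null) {
    throw new Error(`not a valid dev tag: ${devTag}`);
  }
  if (!testsPassed) {
    throw new Error("release tags may only follow a green test run");
  }
  // Strip the "dev-" prefix to get the Semantic Versioning release tag.
  return devTag.slice("dev-".length);
}
```

Encoding both preconditions (valid dev tag, green tests) in one place mirrors the invariant stated above: a release tag can never exist without a preceding, fully tested dev tag.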

What are you using for CI/CD?

We are using CircleCI.

How do you intend to monetize SuperTokens?

We follow GitLab’s buyer-based model, so we (will) have different pricing tiers for different feature sets, aimed at different sizes of companies.

Within each tier, we will change the pricing based on the usage / number of users.

The pricing will also be different for our managed service vs self hosted.

What is the current main source of revenue?

At the moment, because we still need to build out many features that “larger” companies need, our main source of revenue is support.

Where do you see the project heading next?

We have a product roadmap page that highlights the next set of features we will be building.

We also want to expand the team to be able to support several more tech stacks - both frontend and backend.

Finally, we have a lot of work to do to make our docs better - starting with moving to Docusaurus v2.

Is there anything in particular you're looking for in a developer?

Excellent core CS skills - so an understanding of operating systems, threading, cryptography, networking, compilers, etc. Someone who can work with a variety of languages, or who is capable of picking one up in a few weeks.

Where did you hear about Docusaurus?

So ReactJS's docs use Docusaurus. That's where we first found out about it.

What motivates you to continue contributing to SuperTokens?

Seeing sites that implement us - even when the people behind those sites don’t pay us.

Where do you see software development heading next?

I see a lot of people using serverless and the ecosystem around that.

With the release of tools like Github copilot, I guess that in a few decades, a lot of the boilerplate / grunt work in programming will be 100% automated. Devs will be focusing way more on app / code architecture, code reviews (generated by machines) and stitching different tools together to achieve what they want - development time will reduce significantly.

Where do you see open-source heading next?

A lot more people will want to choose open-source solutions over closed-source ones (that’s one of our theses for why we built SuperTokens). The reason is that a whole generation of devs have “grown up” and learnt programming by reading and contributing to open source - so the natural instinct is to go with an OS alternative.

The above in turn implies more scope for contributions, which in turn implies more features and “better” projects.

The one topic I am most unclear on, but very curious about, is how crypto / blockchain will affect open-source developers - specifically for OS code that’s running on blockchains like Ethereum. For example, if I write an OS library that’s ultimately used in an app that runs on Ethereum, I can imagine there being a system where the author(s) of the library get paid each time their code runs… That would be pretty cool.

Console

Unison, Rustpad, and Hypercore



Projects

unison

Unison is a new programming language, currently under active development. It’s a modern, statically typed, purely functional language, similar to Haskell, but with the ability to describe entire distributed systems with a single program.

language: Haskell, stars: 3950, watchers: 116, forks: 190, issues: 615

last commit: July 10, 2021, first commit: September 19, 2013

https://www.unisonweb.org/slack

rustpad

Rustpad is an efficient and minimal collaborative code editor that is self-hosted, with no database required.

language: Rust, stars: 1492, watchers: 19, forks: 36, issues: 3

last commit: June 24, 2021, first commit: May 31, 2021

https://www.ekzhang.com/

hypercore

Hypercore is a secure, distributed append-only log.

language: JavaScript, stars: 1877, watchers: 65, forks: 154, issues: 66

last commit: May 05, 2021, first commit: December 20, 2015

https://twitter.com/hypercoreproto

rustc_codegen_gcc

The Rust GCC back-end was officially accepted into the Rust compiler this week!


An Interview With Eric Zhang of Rustpad

Hey Eric! Thanks for joining us! Let’s start with your background. How did you learn how to program and where have you worked in the past?

I started programming about 10 years ago with Python, immediately took a liking to building fun toy apps in middle school, and have been learning since then. My current interests are in programming languages, graphics, machine learning, and systems. Previously I researched computer vision at Nvidia, built infrastructure for Scale AI’s machine learning team, and did programming languages research at Harvard.

What was it like working at a start-up and how did it compare with research or academia?

Scale moves very fast, and on the ML team, there was a lot more emphasis on finishing pressing features to improve metrics and reduce manual labor. In academia you care about writing papers; in industry it’s more about training functional models & building the data pipelines around them.

Who or what are your biggest influences as a developer?

Open source has had a tremendous impact on my life, both in using the software and learning from it. I’ve personally gained a ton of insight from reading into tricky parts of open source code, taking inspiration from the design of popular projects, and interacting with successful project governance models.

What's an opinion you have that most people don't agree with?

I started programming before there was any social or monetary incentive (didn’t even know what having a “career” meant, back then). I’d continue building, even if all programming jobs disappeared tomorrow.

If I gave you $10 million to invest in one thing right now, where would you put it?

Developer tools, research, anything that accelerates progress in the industry.

How do you separate good project ideas from bad ones?

Good projects, for me, are ones that I can finish quickly and produce a meaningfully distinct result from everything that currently exists out there. I want to have different ideas and inspire others to come up with their own.

Why was Rustpad started?

I needed a better centralized collaborative text editor than the pre-packaged ones currently available.

Who, or what was the biggest inspiration for Rustpad?

Firepad, ot.js, and ShareDB.

Are there any overarching goals of Rustpad that drive design or implementation?

Low-latency editing, seamless collaboration (live cursors), correctness, horizontal/vertical scalability, and efficiency (low server cost).

If so, what trade-offs have been made in Rustpad as a consequence of these goals?

Rustpad does not persist state. In a production deployment you would definitely want persistence, but that is something that would need to be added on top of the app. For the short-term, one-off editing sessions that Rustpad targets, this is a fine trade-off, and since everything is stored in RAM, it is very fast.

What is the most challenging problem that’s been solved in Rustpad, so far?

The biggest unknown factor for me, when starting the project, was the operational transformation control algorithm. This is tricky because it involves concurrent distributed-systems logic, communicating with multiple clients, as well as async Rust (which is difficult). See https://github.com/ekzhang/rustpad/blob/main/rustpad-server/src/rustpad.rs for the eventually consistent algorithm. There are a lot of race conditions and subtle liveness issues that could arise without thinking carefully about this.
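The kind of transformation involved can be shown with a deliberately simplified sketch (single-character inserts only; the real Rustpad algorithm handles full operation sequences, deletions, and many clients):

```typescript
// Minimal sketch of the operational-transformation idea: two concurrent
// inserts are transformed against each other so both replicas converge.
interface Insert {
  pos: number; // index in the document
  ch: string;  // character being inserted
}

// Transform `a` against a concurrent `b` so that applying b then a'
// yields the same document as applying a then b'.
function transform(a: Insert, b: Insert): Insert {
  // If b inserted at or before a's position, a's index shifts right.
  // Ties are broken deterministically so both sides agree.
  if (b.pos < a.pos || (b.pos === a.pos && b.ch < a.ch)) {
    return { pos: a.pos + 1, ch: a.ch };
  }
  return a;
}

function apply(doc: string, op: Insert): string {
  return doc.slice(0, op.pos) + op.ch + doc.slice(op.pos);
}
```

The convergence property (apply a then transformed b, or b then transformed a, and land on the same document) is exactly what becomes hard to preserve once operations arrive out of order over async channels.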

Are there any competitors or projects similar to Rustpad? If so, what were they lacking that made you consider building something new?

Firepad was tied to Firebase, not customizable, and used outdated web programming practices, so it was not a viable alternative for me. ShareDB was perhaps a little too heavyweight and aimed at a more general problem.

What was the most surprising thing you learned while working on Rustpad?

For me, it was that I could buy a tiny $10 DigitalOcean droplet and still handle all of the WebSocket traffic from thousands of users clicking through from the front page of Hacker News, editing documents simultaneously, while using less than 3% CPU capacity. Rust is really fast. Even if Rustpad grew a ton, I could still vertically scale really easily (just click a button) before having to set up more complex horizontal options like sharding, which would require writing code.

What is your typical approach to debugging issues filed in the Rustpad repo?

Reproduce, assign myself, then fix. For example, one recent issue had to do with support for emojis, which include some of the rare Unicode code points that are encoded as UTF-16 surrogate pairs. It turns out that Rust encodes strings as UTF-8 and can extract characters as scalar values, while JavaScript takes the approach of saying that the length of a string is its UTF-16 length. This leads to client/server inconsistencies in surrogate pairs (very exotic: emojis, ancient Greek text, cuneiform), and I had to fix this and write good tests.
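The mismatch is easy to reproduce, since it follows directly from JavaScript's UTF-16 string semantics:

```typescript
// A character outside the Basic Multilingual Plane (like most emojis) is
// stored as a UTF-16 surrogate pair in JavaScript, so .length reports 2
// code units, while a scalar-value view (the way Rust counts chars)
// sees a single character.
const emoji = "😀"; // U+1F600, encoded as a surrogate pair in UTF-16

const utf16Length = emoji.length;       // UTF-16 code units: 2
const scalarLength = [...emoji].length; // Unicode scalar values: 1
```

A server and client that index the same document with these two different length notions will drift apart precisely at surrogate pairs, which is why the bug only surfaced with "exotic" input.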

What is the release process like for Rustpad?

Continuous, done through CI, which is possible and quite simple because the app is stateless (so we don’t have to worry about backwards compatibility). This is good for me as a single maintainer who doesn’t want to spend too much time worrying about release correctness.

Is Rustpad intended to eventually be monetized if it isn’t monetized already?

No, not intended to be monetized.

Do you have any other project ideas that you haven’t started?

Yep, I’m constantly thinking about new project ideas and letting them stew!

Care to elaborate?

Right now I’m building a collaborative math typesetting system called Slate, beta is at https://slate.rs. :) Other ideas for smaller web projects: an interactive playground for creating new parsers and languages, web dashboards for exploratory data analysis that can import from arbitrary JSON files and include shareable visualizations (powered by SQL in the browser), and a CLI tool to easily broadcast your terminal with other people in a pinch.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

Make sure you talk to the maintainer first! Most open source maintainers have poured many hours into their projects and would be happy to chat. By talking to them, you front-load the effort of aligning values and gain clarity into how the project is structured.

Console

Internet in a Box, Nushell, and Handsfree

Sponsorship

Baronfig

Did you know that you generate 4x more ideas with pen to paper than typing? Use Baronfig's all-in-one setup, the Idea Toolset, and get your ideas flowing. Visit baronfig.com »




Projects

nushell

nushell is…you guessed it…a new type of shell.

language: Rust, stars: 14398, watchers: 147, forks: 648, issues: 504

last commit: July 01, 2021, first commit: May 10, 2019

https://twitter.com/jntrnr & https://twitter.com/andras_io

handsfree

Handsfree.js allows you to quickly integrate face, hand, and/or pose tracking to your frontend projects.

language: JavaScript, stars: 1977, watchers: 26, forks: 76, issues: 33

last commit: March 13, 2021, first commit: November 05, 2019

https://twitter.com/goinghandsfree

iiab

internet-in-a-box is a “learning hotspot” that brings the Internet’s crown jewels (Wikipedia in any language, thousands of Khan Academy videos, zoomable OpenStreetMap, electronic books, WordPress journaling, Toys from Trash electronics projects, etc.) to those without Internet. internet-in-a-box can be hosted on a Raspberry Pi.

language: Jinja, stars: 427, watchers: 14, forks: 47, issues: 226

last commit: July 01, 2021, first commit: May 27, 2017

http://internet-in-a-box.org/


An Interview With Oz Ramos of Handsfree

Hey Oz! Thanks for joining us! Let’s start with your background. How did you first learn how to program?

I started learning to code in the summer between grades 5 and 6. At the time my father was working on a Bulletin Board System that specialized in ASCII based games (as I remember it). I asked him to show me how to make games and he gave me a book titled “BASIC: For Dummies”.

As he often called me a dummy, I took this as an insult as if he was calling me a “basic dummy” and I tore into that book! By the end of the summer I had made a pixel art app: I didn’t know how to capture arrow keys in BASIC so I used the - and + keys to select pixels and then a number to put a colored pixel.

Who or what are your biggest influences as a developer?

My heroes growing up were Bruce Lee, Rodney Mullen, and Isaac Newton and from them I’ve adopted a kind of “endurance” for working hard on projects for long stretches with little or no personal gain.

Later I was inspired by the CodingHorror blog and the explosive rise of jQuery. Johnny Lee’s wiimote demo was also massively influential. Recently, Golan Levin showed me how to take a more creative approach with code.

If you had to suggest 1 person developers should follow, who would it be?

Definitely and without hesitation: Charlie Gerard @devdevcharlie

How do you separate good project ideas from bad ones?

Excluding projects that hurt someone or something, I don’t think there are bad project ideas...just ideas that fulfill your current goals more than others.

I organize my ideas on Notion into different tiers: long, short, and silly. The long term ideas help me steer Handsfree.js, and my short term ideas help me execute. The “fun” ideas are there for when I need a break or want to get creative.

Why was Handsfree started?

Handsfree.js was started in 2018 during my 2nd year of homelessness after I was inspired by a resident at the shelter who suffered a severe stroke. Originally the idea was to help him control a mouse pointer with his face so that he could use the web to do things like go on Facebook and watch YouTube.

However, as I used the tool myself to explore different sites I became inspired by the idea of expressively browsing the web...that is to say, to browse the web in the way you felt.

The project originated as SeeClarke.js after Arthur C Clarke who said: “Any sufficiently advanced technology is indistinguishable from magic”. The goal was to create a library that was magical to use - from reading the docs to actually building with it - for the purpose of creating assistive technologies.

But the more I used the library myself the more fun I had, and I began exploring other modalities like hand tracking and even full-body pose tracking. Eventually, I renamed the project Handsfree.js to represent the fact that it could help you use the web (and by extension devices and services connected to it) hands-free.

Are there any overarching goals of Handsfree that drive design or implementation?

The goals change as myself and others discover new ways to use the library, but currently I’d like to complete #100DaysofCode where I use the library in a radically new way each day and then at the end create an interactive NFT called “Playing the web”.

The purpose of this goal is to demonstrate how easy the library is to use while also demonstrating its potential to help people make money hands-free. At the same time, the challenge of having to build one thing a day will hopefully improve the library by making it more succinct to use.

One much longer term goal is to position Handsfree.js in such a way that it becomes like a kind of “jQuery for Mixed Reality” or “jQuery for the next web”. I also have an ambition for the docs to be regarded as one of the most interactive and “best” docs out there.

What is the most challenging problem that’s been solved in Handsfree, so far (code links to any particularly interesting sections are welcomed)?

The plugin system! I’ve had to rewrite Handsfree.js from scratch 8 times to get this right. The plugin system was inspired by WordPress and frameworks like Vue and React, and it makes Handsfree.js so much easier to maintain.

My favorite thing about plugins is that anyone can create them without making Pull Requests, and anyone can use them without updating Handsfree.js! Then if you need extra specialized functionality you can npm install, link to, or just copy+paste the functionality!

The plugin system is what I’m most proud of, and I don’t think any other computer vision library supports plugins in this way. The really cool thing about plugins is that you can bulk enable/disable them by tag, for example: handsfree.enablePlugins('browser') to use the browser hands-free...how cool is that!?
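A toy registry in the spirit of that tag-based toggling might look like the following. The names and structure here are illustrative, not Handsfree.js's actual internals:

```typescript
// Toy sketch of tag-based plugin toggling: plugins carry tags, and a
// whole group can be switched on or off by tag before each frame of
// tracking data is dispatched.
interface Plugin {
  name: string;
  tags: string[];
  enabled: boolean;
  onFrame: (data: unknown) => void;
}

class PluginRegistry {
  private plugins = new Map<string, Plugin>();

  use(plugin: Plugin): void {
    this.plugins.set(plugin.name, plugin);
  }

  // Bulk-enable every plugin carrying the given tag.
  enablePlugins(tag: string): void {
    for (const p of this.plugins.values()) {
      if (p.tags.includes(tag)) p.enabled = true;
    }
  }

  disablePlugins(tag: string): void {
    for (const p of this.plugins.values()) {
      if (p.tags.includes(tag)) p.enabled = false;
    }
  }

  // Run one frame of tracking data through all enabled plugins.
  runFrame(data: unknown): void {
    for (const p of this.plugins.values()) {
      if (p.enabled) p.onFrame(data);
    }
  }
}
```

Because plugins are just registered objects, third parties can ship them independently of the core library, which is the maintainability win described above.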

Are there any competitors or projects similar to Handsfree? If so, what were they lacking that made you consider building something new?

Handsfree.js has a Client Mode designed to incorporate its plugin system with any computer vision library, so I don’t like to think of them as competitors. Instead, here are some libraries Handsfree.js can augment:

  • ml5.js - This is the model for Handsfree.js, they have excellent documentation and a great community! They share many of the same models and an API to help you build your own

  • human.js - A newer library with a strong focus on human analysis, with more models than Handsfree.js currently

  • face-api.js and tracking.js - These are the OG’s in my opinion for multi-modal computer vision in the browser

What was the most surprising thing you learned while working on Handsfree?

It was very clear to me from the beginning that the project would become useful, but I didn’t realize how long it would take for me to convince other people 😅 It took me 3 years to get 500 stars and then only a few days to triple that!

How do you balance your work on open-source with your day job and other responsibilities?

Handsfree.js is my full time jam and has received significant support from:

  • The Studio for Creative Inquiry at Carnegie Mellon

  • Open Source Software Toolkits for the Arts (OSSTA)

  • The Clinic for Open Source Arts (COSA)

  • Glitch.com

  • Google PAIR

  • crowdfunding

  • and many others!

Do you think any of your projects do more harm than good?

Oh what an interesting question!

I believe that any open sourced project has the potential for doing harm and Handsfree.js is no exception. For example, imagine an ad company using the library to covertly, or even overtly, detect facial expressions to better serve ads. Or another company using it in a discriminatory way, like for job selection or as a requirement to access a service.

Another risk is that people deploy the library in risky environments without considering security (it is webcam-driven after all), like using the library on HTTP sites instead of HTTPS, or using it in a browser extension without considering which domain the webcam permission is given to (although I’ve published The Handsfree Browser Extension Starter Kit to address these issues).

What is the best way for a new developer to contribute to Handsfree?

The best way is to just start experimenting and creating things with it! Documenting, creating examples, writing tutorials, and even translating would be huge!

I think this is generally true of any open source project. The thing you use is typically only one part of a greater effort.

Where do you see software development in-general heading next?

This is the question that keeps me up at night!

First, I think that the mouse and keyboard will go out of style. I don’t think we will be using the mouse and keyboard much once AR becomes as ubiquitous as smart phones, and certainly not when Brain Computer Interfaces become practical.

Secondly, I think the future of software development will be much less about physically typing code and more about verbally describing what you want - conversing with a computer. There was a mind-blowing GPT-3 demo last year that gives you a peek into this future.

Of course at some point I believe we’ll have super-intelligence, and I think the term “software” will become fuzzy if not archaic.

How do you plan to monetize Handsfree?

My dream is to become a tech “influencer” but by influencing my own work. The cool thing about working on an open source project like this is that there are a lot of content opportunities that can be monetized, like YouTube tutorials and online courses.

Console

Dolt, NymphCast, and Repro

Sponsorship

If you, or someone you know, is interested in sponsoring the newsletter, please reach out at console.substack@gmail.com


Projects

modern-unix

Modern Unix is a collection of modern/faster/saner alternatives to common unix commands.

stars: 9467, watchers: 193, forks: 179, issues: 14

last commit: June 23, 2021, first commit: January 12, 2021

https://stackoverflow.com/users/8858995/ibraheem-ahmed

Dolt

Dolt is a SQL database that you can fork, clone, branch, merge, push and pull just like a git repository.

language: Go, stars: 9033, watchers: 97, forks: 231, issues: 294

last commit: June 25, 2021, first commit: June 03, 2015

https://twitter.com/dolthub

NymphCast

NymphCast is an open-source alternative to Chromecast you can run on a Raspberry Pi.

language: C++, stars: 1854, watchers: 69, forks: 65, issues: 15

last commit: June 08, 2021, first commit: February 08, 2019

https://twitter.com/MayaPosch

repro

repro allows you to automatically create and manage scripts from history.

language: Shell, stars: 26, watchers: 1, forks: 1, issues: 0

last commit: June 05, 2021, first commit: June 05, 2021

https://twitter.com/adamsidiali




An Interview With Zach Musgrave of Dolt

Hey Zach! Thanks for joining us! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like?

I learned to program when I was 12 or so on my Macintosh LCII, making point and click adventure games in Hypercard, basically through trial and error. I took my first programming class in college. Before joining DoltHub I spent 8 years at Amazon and then 5 at Google and learned a ton at both. I grew up in the Seattle area and went to school at the UW, which made me one of the only locals working at Amazon.

In terms of languages, I was a Java dev most of my career and have now switched to Golang. I will never program in Java again if I have the option of Golang. For scripting, I love Perl and hate Python. I can’t live without JetBrains IDEs.

What’s your most controversial programming opinion?

Code quality doesn’t actually matter. Customers don’t see code, they only see results. Engineers obsess over code quality because we like pretty code, but its relationship to product quality is very weak. All the processes people put in place to try to improve code quality are a huge tax on development with very mixed results. Google had incredibly restrictive policies around code quality that cost thousands of engineering years to enforce annually, and yet it was trivial to find terrible code in the repository. And Google can’t ship software to save its life.

I’m not saying code quality isn’t a good thing, or that bad code quality doesn’t make it harder to ship software. Both these things are true. But it doesn’t make your product good. As an industry we invest in code quality far past the point of diminishing returns, and still end up with bad code.

What are you currently learning?

Related to Dolt I’ve been learning the intricacies of MySQL’s transaction model so that Dolt can reproduce it bug for bug. The interesting thing is that MySQL does a lot of locking to guarantee consistency among clients, while Dolt does no locking and tries to sort out everything via merge at commit time.

Outside work, I’m learning about dog training because we’re getting a family dog soon, and learning to play some classic rock songs on the guitar. I kind of only know how to play mid 2000s indie rock songs and it’s a little obnoxious.

What have you been listening to lately?

I know when I’m working really hard because I go to the same few instrumental albums and listen to them on repeat. Today it’s Archipelago by Hidden Orchestra. Also all of El Ten Eleven’s oeuvre.

How do you separate good project ideas from bad ones?

It’s hard. It’s pretty easy to identify bad technical ideas, but hard to know a bad product idea.

For Dolt, our job is somewhat easier, because we’ve chosen to copy two separate products: MySQL for the database side, and Git for the versioning side. A lot of our project planning is simply deciding in what order to build things. We know we’re eventually going to build 100% of each.

Why was Dolt started?

Dolt was originally intended to make sharing data on the internet easier. It’s hard today because people either mail CSV files around, or rely on APIs, making it really hard to collaborate effectively. We thought if we could get people to think of their data the same way they think of their source code, and use tools that encourage collaboration like Git does, then we could bootstrap a data sharing community. That’s why we wrote Dolt.

We still believe in this vision, but we think it’s going to take a long time. We have to convince people that version control for data matters, and that Dolt is the right way to do it. It took Git over 5 years to reach a critical mass, and people were already sold on the concept of version control for source code. So we’re going to have to be patient. In the meantime, our customers want to use Dolt as an application database server. So we now see Dolt becoming successful first as an application database, and then as a means of publishing and sharing data second, probably much later.

It seems like a lot of your customers would be ML projects. By “critical mass”, do you mean even outside ML use cases, or are you having difficulty making inroads there as well?

ML is one of our biggest use cases, and we’re getting good traction in the ML space. Those customers want to version their training data and model outputs so their workflows are reproducible. But what they’re not doing is sharing their data. That’s what we think will take a long time to materialize.

We’re trying to bootstrap data sharing and collaboration by paying people to do it via our data bounties program. We’ve paid out around $50k in bounty money so far and gotten some really great open datasets produced for that money. But we just think it’s going to take a long time for this idea to catch on, and we’re OK with that.

Who, or what was the biggest inspiration for Dolt?

Definitely Git. We named Dolt to pay homage to how Linus Torvalds named Git.

Torvalds sarcastically quipped about the name git (which means "unpleasant person" in British English slang): "I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'."

We wanted a word meaning "idiot", starting with D for Data, short enough to type on the command line, and not taken in the standard command line lexicon. So, Dolt.

Dolt’s command line copies Git’s exactly, so if you know how to use Git you know how to use Dolt.

Are there any overarching goals of Dolt that drive design or implementation?

The obvious answer here is that Dolt storage has to be a commit graph. Without this, it’s not possible to implement branching and merging the way Git does, which was our most important design goal. This requirement drives every other technical decision in the product.

From the other direction, we want to be a 100% compatible, drop-in MySQL replacement, so that if you have a MySQL based application, you can port it to Dolt by just changing the connection string. The SQL layer is built mostly on top of the storage layer, but some of these requirements do find their way all the way down to the bottom layer.

What would the syntax for something like branching look like in SQL?

To get versioning features off the command line and into SQL, we introduce a bunch of custom SQL functions and system tables. E.g. you can examine the diff on a table named myTable with this query:

SELECT * FROM dolt_diff_myTable where to_commit IS NULL;

To switch to a different branch you can set some special session variables, use a different connection string, or use a special SQL function:

SELECT DOLT_CHECKOUT('-b', 'myBranch');

And you can always query any revision of a table with the AS OF syntax:

SELECT * FROM myTable AS OF 'feature-branch';

We have a documentation site that covers all of this in depth.

What is the most challenging problem that’s been solved in Dolt, so far?

The most challenging technical aspect of Dolt is probably the storage format itself. It uses a novel data structure called a ProllyTree to get structural sharing across revisions, so that you can keep multiple versions of the data around without blowing up storage costs. It also makes diff and merge performant. We’ve published a bunch of technical articles about it, e.g.:

https://www.dolthub.com/blog/2020-06-16-efficient-diff-on-prolly-trees/

Are there any competitors or projects similar to Dolt? If so, what were they lacking that made you consider building something new?

Dolt is the only SQL database that you can branch and merge, fork and clone, and nobody else is building a direct competitor right now.

A lot of products call themselves “Git for data,” but they’re not, not really. What they mean is that they’re a data product that has some version control features. But most of them version only the schema of a database, not the actual data, and the rest can’t branch or merge the data, or even diff two revisions. The exception here is TerminusDB, which does branch and merge. But it’s a graph database, not SQL.

We wrote a roundup on all the products calling themselves “Git for data” over a year ago, and the landscape hasn’t changed much.

https://www.dolthub.com/blog/2020-03-06-so-you-want-git-for-data/

What was the most surprising thing you learned while working on Dolt?

I love Golang, but I was very surprised to learn that a panic in any goroutine (of which there can be thousands) will kill your entire process. Not just the goroutine that panicked, but the entire process. There’s no top-level mechanism to catch these, no way to install a global panic handler at the program’s entry point. You have to handle every possible source of panic individually, not only in your own code but in every third-party library you use. It’s a serious problem with the language runtime that I hope they address.

Why was Go chosen for the Dolt implementation?

Golang is a great language and we’re generally very happy with it, but we chose it for practical business reasons.

We built Dolt on top of a fork of an open source graph database called noms, written in golang. Noms implements the ProllyTree data storage and commit graph, and we built Dolt’s table and schema semantics on top of that. Building on top of noms saved us at least a year of engineering work and let us get to market much faster.

Picking golang also enabled us to adopt the go-mysql-server project, also written in pure golang, to build our SQL engine implementation. We’re really fortunate to have found these two great golang projects lying around for us to extend.

What is your typical approach to debugging issues filed in the Dolt repo?

My favorite way to debug an issue is by pure vibes, where I let my intuition guide me to where I just know the source of the problem must be even if I can’t explain why. Feels good man.

But often I have no idea what the problem is, which means I get a repro set up and put some breakpoints in GoLand to see what’s going on.

What motivates you to continue contributing to Dolt?

Dolt is my full-time job and I have equity in the company, so it would be really nice for it to succeed and make me rich. But beyond that, contributing to Dolt is satisfying because we ship all the time. We go from whiteboard discussion to feature launch in a couple weeks. When customers find a bug or ask for a feature, we usually get them a release the same week. Working at a place that values moving fast and putting tangible results in customers’ hands on a continual basis feels great.

What is the release process like for Dolt?

We have a great release engineer, Dustin Brown, who has automated everything for us. We do continuous integration and deployment with thousands of automated tests and performance benchmarks on every PR. Cutting a release is as simple as clicking a button on GitHub. I wrote a janky perl script to generate release notes including changes in dependent projects because none of the other release note generators I could find had this feature.

How are you currently monetizing Dolt?

We make money by selling support contracts to companies using Dolt as their application database, similar to other database companies. We also make money using the same private repository model that GitHub does, where people using DoltHub can pay us $50 a month to get private repositories. Eventually we’ll sell database server hosting as well.

What is the best way for a new developer to contribute to Dolt?

The best way to contribute to Dolt is to start using it, and find out what Git or MySQL features it doesn’t have that you need. Then start implementing them! We want them all, and we’re implementing them in the order people ask (paying customers first). There are a ton of things left to implement, and most of them aren’t hard, just not urgent for us.

If you plan to continue developing Dolt, where do you see the project heading next?

Dolt is headed in several exciting directions next. 

One big push is making Dolt as performant as MySQL for the OLTP use case. Right now we’re about 4-8x slower on average, depending on the query. Then we need to benchmark and improve our numbers on concurrent transactions. Lots of work to do there, but we’re confident we can pull even with MySQL on performance in the next year.

Our big planned feature launch is a hosted solution for Dolt databases. The idea is if you are using DoltHub as your remote, then you can click a button on DoltHub to spin up a VM running your database as a server, and we give you the connection string to it. Then whenever you push to DoltHub, your running database gets updated with the data you just pushed.

We’re also going to build a cloud-native version of Dolt that can scale to any amount of data (petabytes) and separates storage from processing, like data warehouses do.

A lot of people have been moving in this direction lately, but it had never occurred to me to ask about the technical details. I’m imagining the use of Kubernetes to achieve this, or will you use something else?

Right, we’re going to deploy a Dolt container into Kubernetes for every hosted server. Our whole stack is deployed on Kubernetes, so we already have the infrastructure to make this happen.

What are the current technical challenges you're having with this now, and do you see any more arising in the future?

The technical challenge there is managing fleets of these server containers at scale and having sufficient monitoring and automation to keep them alive and responsive. Kubernetes helps a lot there, but it’s not magic. You still have to do the work.

Another technical challenge is implementing hosted read replicas, which people are going to want for performance. We’re still designing how those will work with Dolt, and we have a bunch of ideas.
