Console 49

Coqui, Biff, and Flashlight

Sponsorships

SafeBase

SafeBase helps companies organize and share their security program information to close deals faster. Their security status page stores security program information for easy sharing and provides insights about the security review process to help you streamline your workflow. Companies like Crossbeam and ReadMe are using SafeBase to automate their interactions with prospective customers during the security review process. Save time and reduce complexity with SafeBase today.


PhoneInfoga

PhoneInfoga is an information gathering & OSINT framework for phone numbers.

language: Go, stars: 4276, watchers: 389, forks: 1264, issues: 20

last commit: April 14, 2021, first commit: October 25, 2018

https://twitter.com/sundowndev

TTS

TTS is a deep learning toolkit for Text-to-Speech, battle-tested in research and production.

language: Python, stars: 1146, watchers: 41, forks: 64, issues: 14

last commit: April 16, 2021, first commit: January 22, 2018

https://twitter.com/coqui_ai

Flashlight

Flashlight is a fast, flexible machine learning library written entirely in C++ from the Facebook AI Research Speech team and the creators of Torch and Deep Speech.

language: C++, stars: 2077, watchers: 87, forks: 248, issues: 57

last commit: April 16, 2021, first commit: December 21, 2018

https://twitter.com/facebookai

Biff

Biff is a web framework + self-hosted deployment solution for Clojure.

language: Clojure, stars: 190, watchers: 10, forks: 6, issues: 37

last commit: March 05, 2021, first commit: March 27, 2020

https://twitter.com/obryant666


An Interview With Jacob O’Bryant of Biff

Hey Jacob! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like?

I was first introduced to programming (Visual Basic 6) by my dad when I was 10 or 11, and I got into Linux when I was 15, which accelerated things. I finished my CS degree about 3 years ago, worked for a year at Lucid Software, and then quit to become a full-time startup founder. I work on recommender systems, applications that try to introduce you to new things you might like. More broadly, I’m interested in the field of information discovery (or at least that’s what I’ve started calling it), which (I’ve decided) encompasses recommender systems, search engines, social networks, etc.

What's an opinion you have that most people don't agree with?

I think a lot of people give algorithms too much credit for bad things in society. For example, filter bubbles are a popular topic, even though the empirical evidence indicates they’re not much of an issue. But it makes a good story, so the idea persists.

What are you currently learning?

How on earth to get a startup off the ground, especially from the perspective of a solo technical founder. I think I’m almost there, and objectively, the numbers are improving. But along the way I have discovered some unexpected downsides of doing a startup (besides just “it’s really hard,” which was expected).

I often feel like society just doesn’t have a good place for me, and being a founder is simply the least bad option. One of my dreams is to create some kind of organization for people like that--I’m envisioning something like a mix between Y Combinator and grad school.

What would you have considered an expected downside, and what do you consider an unexpected downside?

The most common reason people give for why they think you shouldn't do a startup is roughly "it's really hard and it probably won't work." However, if you're ambitious, that's a positive signal--better that than doing something easy with guaranteed success, know what I mean?

However, being hard doesn't necessarily mean it's worth your time, and that was basically the surprise: being a founder is not necessarily a high leverage activity, even if you think of yourself as being smart, ambitious, etc. If Linus Torvalds had been obsessed with starting a company, would we have Linux?

The question is, do you spend your time chasing customers and figuring out what's providing value to them (a la Paul Graham), or do you instead focus on interesting ideas (a la Linus)?

For customers/users vs. interesting ideas, it's a mix. I am trying to build a business, and I want to do that by working on things I'm interested in (otherwise what's the point?). Feedback from people influences but doesn't drive my decisions. I make most important decisions with intuition. Looking back, I wish that when I started as a founder I had focused entirely on interesting ideas and switched to "business mode" only after I found something that gained some traction.

Psychological pressure is an issue. Quitting your job feels amazing for the first few months. After that, I constantly felt weighed down by "how in the world will I ever make any money doing this, what if I have to go back to square one and get a job again," which is incredibly distracting. Maybe I should've just told myself "I've got one year to work on whatever I want without trying to build a startup, and then I'll decide where to go from there." Alternatively, maybe part-time work is the answer. But that has downsides too.

I've written a few relevant essays on this topic, which you can read here: "The trade-offs of being a startup founder" and "What I wish I could've done instead of college."

Why was Biff started?

Biff was the result of spending a year and a half trying to figure out the best way for me to do web dev in Clojure. Clojure doesn’t have a default web framework like Rails or Django; rather you pick and choose various libraries and effectively build your own framework. I think that’s a fine way to do things, but I also think if I’ve gone through all the effort of making my own framework, I might as well polish it and write some documentation so other people have the option of reusing my work.

Where did the name Biff come from?

I spent approximately ten seconds on it. I wanted something short because I knew I would be typing it a lot. I thought of the punching sounds that Batman makes in the old TV shows, like “wham,” “pow,” and (I guess) “biff.” I retroactively attribute it to Biff Tannen from Back to the Future. I like the humorous unpretentiousness of it.

Are there any overarching goals of Biff that drive design or implementation?

Biff is designed for myself, a solo, early-stage startup founder. Everything flows from that. For example, speed in the early stages of a project is important; I want to be able to try out new ideas quickly. Thus Biff is end-to-end: it covers the back end, front end and deployment. Scale, on the other hand, is not an immediate concern, so I haven’t optimized for that--though I don’t want to have to throw everything out if (when!) my startup takes off, so I try to avoid doing anything naive.

What trade-offs have been made in Biff as a consequence of these goals?

An important part of Clojure culture is the idea of “simple” vs. “easy,” which have specific, separate definitions in this context. Simple means “untangled,” the opposite of complex, where you have a mess of things all tied together. As a system grows, complexities tend to multiply, so in the long term, one of the highest-ROI things you can do is to keep things simple. It’s like how in calculus you want to simplify a function as much as possible before taking the derivative. Simplicity often trades off against ease: the more you try to automate something, the more opportunities there are to get it wrong and introduce complexity. As Rich Hickey once put it at RailsConf: “gem install hairball.”

However, ease (or “speed at the beginning,” as I think of it) is also really important in certain contexts, like startups. And this gets at the crux of Biff: how do you design a system that’s both simple and easy? I don’t always get it right; I’ve often accidentally introduced complexity as I’ve pursued ease. So every now and then I have to backtrack and fix my mistakes.

Is Biff intended to eventually be monetized if it isn’t monetized already?

Plan A is for my startup to take off, in which case I’ll do the opposite of monetizing Biff: I’ll pay people to work on it. Plan B is consulting. If I could make a living using Biff to write web apps for people, that would be a pretty sweet gig. In particular, it might work to help SMBs who want to try Clojure/functional programming for a greenfield project but are wary about going off the beaten path. (If there are any Console readers in that position, feel free to email me. I've been thinking about doing this part-time soon).

If you plan to continue developing Biff, where do you see the project heading next?

I'd like to improve the deployment story. Right now the Biff project template includes some Terraform and Packer config for DigitalOcean, along with a handful of Bash scripts for provisioning the server image. But I've been exploring more deployment options lately, and I think a better path is Dokku for small/hobby projects and then one of the various Heroku alternatives for larger projects (in particular, Render, DO's App Platform, or Porter). To accommodate that, I've been refactoring Biff to make it 12-factor compliant. That'll also make it easier to deploy using any other method besides whichever ones I provide documentation for.
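For readers unfamiliar with the term, the "Config" factor of the twelve-factor methodology says that settings which vary between deploys belong in environment variables rather than in the codebase. Biff is written in Clojure and its actual configuration code is not reproduced here; the following is only a minimal, language-neutral sketch of the idea, written in JavaScript. The variable names `PORT`, `DATABASE_URL` and `NODE_ENV` and the fallback port of 8080 are assumptions for illustration, although Heroku-style platforms typically inject a `PORT` variable in this way.

```javascript
// Sketch of the twelve-factor "config" rule: everything that differs between
// a laptop, a Dokku box and a hosted platform is read from the environment.
// Variable names and the default port are assumptions for this example.
function loadConfig(env = process.env) {
  const rawPort = env.PORT ?? '8080'; // hosting platforms usually set PORT
  const port = Number.parseInt(rawPort, 10);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`PORT must be a positive integer, got "${rawPort}"`);
  }
  return {
    port,
    // Secrets and service locations come from the environment too, so the
    // same build can be promoted from staging to production unchanged.
    databaseUrl: env.DATABASE_URL ?? null,
    production: env.NODE_ENV === 'production',
  };
}

console.log(loadConfig({ PORT: '3000', NODE_ENV: 'production' }));
```

Passing the environment in as a parameter (defaulting to `process.env`) keeps the function easy to test and makes it obvious that nothing deploy-specific is hard-coded.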

Where do you see software development in general heading next?

Here’s where I hope it goes. Right now on the web, data is tightly coupled to software. Among other things, that harms extensibility. I think each individual should have their own database with some kind of access layer, and then all the web apps they use plug into that database somehow. Then it would be much easier to have many, small, interchangeable web apps (“do one thing and do it well”) rather than these huge platforms that you can’t really tinker with. There are a bunch of different initiatives that are aiming for similar things (like Solid), but usually privacy and data ownership are the main motivations, not extensibility.

(There’s an interesting analogy to object-oriented vs. functional programming here: “better to have 100 apps that operate on one database than 10 apps that operate on 10 databases”).

This idea has been slowly evolving in my head for the past four years now, and I finally started writing a prototype for something in this space last week. So maybe I’ll have a release for that out soon.

This sounds similar to Urbit’s vision. Have you heard of them? If so, do you have any thoughts on the project?

I have, though I haven't investigated it deeply. There are quite a few projects aiming for something like this actually. To be frank, I think many of them suffer from the "architecture astronaut" problem (though I shan't name any names). I'm trying to go for something that's very small, pragmatic and useful, with a focus on adoption. There won't be much point if no one actually uses it.

Where do you see open-source heading next?

I think self-hosted web apps becoming prevalent is a really interesting possibility. I think Replit will drive this: you can run your own instance of someone else’s app in about two clicks. That’s huge for adoption. Imagine this combined with that last idea: we could have a whole ecosystem of open-source web apps that all operate on your own database. Open-source strikes back.

I keep hearing about Replit but haven’t looked into it deeply enough myself. How did you hear about it, and what made you dive so deeply into it?

I heard about it from Paul Graham's Twitter feed a couple years ago; he was tweeting about them heavily at the time (also Triplebyte). I'm a big fan of making things easy, and I've long been into Linux system administration/dev ops, so it seemed like an exciting possibility. I played with it a bit off and on. I started using it more recently since they added "always on", which lets you make web services that don't go to sleep. Currently I find it's great for making one-off web services in Python.


Like what you saw here? Why not share it!


Or, better yet, share Console!


Also, don’t forget to subscribe to get a list of new open-source projects curated by an Amazon engineer directly in your email every week.

Console 48

GUN, DVC, and Tinyland

Sponsorships

Important, Not Important

As Richard Hamming, titan of American mathematics and pillar of Bell Labs, recounted in his 1986 talk “You and Your Research,” he once wandered over to the chemistry table in the lunch room and asked: “What are the most important problems in your field?”

The next week he asked, “What important problems are you working on?”

And the next week he asked, “If what you are doing is not important, and if you don’t think it is going to lead to something important, why are you at Bell Labs working on it?”

There’s never been a better time to work on the world’s hardest problems, whether you’re an open-source software engineer, an epidemiologist, or a wind turbine technician.

And that’s where the Important, Not Important newsletter comes in. 

We curate the most vital science news of the week, add contextual analysis, and give you data-driven Action Steps you can take to feel better, and drive transformational change.

From climate to clean energy, AI to antibiotics, food & water to whatever other fresh hell could either kill us all, or turn this place into Star Trek, we’ve got it covered.

Once a week. 10 minutes or less. Get it for free at importantnotimportant.com


GUN

GUN is an open source cybersecurity protocol for syncing decentralized graph data.

language: JavaScript, stars: 12750, watchers: 317, forks: 833, issues: 216

last commit: April 06, 2021, first commit: March 17, 2014

https://twitter.com/marknadal

DVC

DVC is Git for data and ML models.

language: Python, stars: 7684, watchers: 124, forks: 729, issues: 561

last commit: April 09, 2021, first commit: March 04, 2017

https://twitter.com/rkuprieiev

Tinyland

Tinyland is a very small version of Bret Victor’s Dynamicland project.

language: Python, stars: 65, watchers: 7, forks: 8, issues: 5

last commit: October 15, 2019, first commit: September 27, 2019

https://twitter.com/emmazsmith

Wistalk

Wistalk allows you to analyze a Wikipedia user’s activity.

language: Python, stars: 7, watchers: 3, forks: 1, issues: 0

last commit: March 29, 2021, first commit: March 20, 2021

https://twitter.com/altilunium


An Interview With Mark Nadal of GUN

Hey Mark! Let’s start with your background. How did you learn how to program and what languages or frameworks do you like?

I’m a mathematician at heart and a rebel in practice, so I dropped out of academics into what I thought was math’s cousin: programming. Lo and behold, there is rarely anything pure or logical about programming. But it was too late; code changes the world, not theories, so I keep trying to build the ideal with broken strings. My belief is to choose the most open and user-facing technology tools, so while I do not like them, browser-compatible JavaScript is the sword for my revolution.

Who or what are your biggest influences as a developer?

Claude Shannon and Richard Feynman. They built certainty out of uncertainty; that’s marvelous. Don’t get too hung up on trends; build tools that will make being human move from tolerable to marvelous.

What's an opinion you have that most people don't agree with?

Most of my opinions. I’ll only mention one that will make people hate me rather than scorn me. “If your variable names are longer than a mathematical symbol, then your code will become bit rot.”

What’s your most controversial programming opinion?

I got a lot of flak for this one.

What is one app on your phone that you can’t live without that you think others should know about?

I hate apps, so I disagree with the question. A few I do have, though, are SMS Backup+, SwiftKey, and Video Collage. And a life-saving tip: this hack to reduce robocall spam.

If you could dictate that everyone in the world should read one book, what would it be?

I don’t read books. Write them.

If you could teach every 12 year old in the world one thing, what would it be and why?

First-principles reasoning. We need more people who can independently invent without being seduced by head-nodding expertism.

If I gave you $10 million to invest in one thing right now, where would you put it?

Other than my own projects? Very few things in tech excite me, but Rik Arends’ Makepad, HARC’s Ohm, various AI projects, etc. are important for software to improve. For returns, obviously Elon Musk and Bitcoin.

What are you currently learning?

When I have time? Rust and writing my own AI trained physics engine.

What resources do you use to stay up to date on software engineering?

If you have to stay up to date on something, that’s an indication it will fall out of fashion as fast as it came in.

Do you listen to music while coding? If so, what have you been listening to lately?

No. But anything retrowave or 331Erock would do.

How do you separate good project ideas from bad ones?

Whether they reduce complexity in the world or add to it.

Why was GUN started?

I had written a driver for MongoDB in 2010, but later I needed a graph database, so in 2013 I started writing a driver for Neo4j and was terrified to discover that everything was serialized as a table with duplication. It would take weeks to implement, would not support real-time updates, and would be locked into a bad master-slave ideology. So I had a choice: spend that time implementing yet another database driver, or use it to build my dream database? Six years later, we did a quarter billion downloads in 2020 alone.

Does GUN stand for anything? If so, what?

Graph. Universe. Node.

Who, or what was the biggest inspiration for GUN?

Kinda like in the show Silicon Valley, my servers caught fire when my previous company was highlighted in the Wall Street Journal. Waking up to 3am pager calls is no fun, especially when you find that everything is working except the database. As a mathematician I had worked on plenty of applied distributed systems problems, so it was sad to see all the major databases at the time being designed around centralized solutions. It just didn’t make sense: master-slave systems are not as scalable or resilient, and a database is exactly the sort of thing you want to be fault tolerant. So I wanted to make the future a little more egalitarian. And while many disagree with this analogy, historically the best way to prevent slavery has been to empower every man and woman with the ability to defend themselves; an alternative acronym for GUN is “Governed Under None”.

What was your previous company?

My previous company was a collaborative web design tool that used video game controls. I have a few videos and demos here.

Are there any overarching goals of GUN that drive design or implementation? If so, what tradeoffs have been made in GUN as a consequence of these goals?

GUN is what is considered a “highly available, strongly eventually consistent” database, which, according to the CAP theorem, means you should not use it for banking. If you’d like to know more about this subject, check out our cartoon explainer (https://gun.eco/distributed/matters.html). The goal of GUN is to be maximally efficient with as little code and as few dependencies as possible, yet still be robust enough to survive nuclear apocalypses. A simple way to think about that first part is to imagine billions of people using a metaverse like the one in Ready Player One: I won’t stop improving GUN until it can handle that kind of performance and scale. The implication from an engineering standpoint is that I’m trying to model and replicate the same algorithms that nature and reality itself use for transmitting information. The second part is about creating a fault-tolerant system that is not only easy for developers to build apps with, but also keeps working indefinitely without any maintenance, even in the face of external catastrophes. This is important from a human perspective, not an engineering one, because it allows people to have ownership over their own data and memories, in perpetuity, without being locked into somebody else’s platform. It is also a perfectly reasonable goal: the pyramids of Egypt have stood for thousands of years, built with materials that erode, yet the logic of Pythagoras and Euclid has not decayed by a single granule over millennia. So why shouldn’t apps, built with platonic algorithms, outlast the silicon sand they run on?
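To make “strongly eventually consistent” a little more concrete, here is a minimal sketch of one common way to get that property without a master: a last-writer-wins merge per field, where every field carries a numeric state (for example a timestamp) and ties are broken deterministically. This is a simplified illustration, not GUN’s actual conflict-resolution code; the node shape `{ field: { state, value } }` and the tie-break rule are assumptions made for this example.

```javascript
// Simplified last-writer-wins merge, used here only to illustrate strong
// eventual consistency. Each node maps field names to { state, value }.
// The shape and the tie-break rule are assumptions, not GUN's wire format.

// Pick the winning version of a single field.
function mergeField(a, b) {
  if (a.state !== b.state) {
    return a.state > b.state ? a : b; // the newer state wins
  }
  // Same state on both sides: compare serialized values so that every
  // peer deterministically picks the same winner.
  return JSON.stringify(a.value) >= JSON.stringify(b.value) ? a : b;
}

// Merge two replicas of a node field by field. Because mergeField always
// picks the maximum under one fixed ordering, it is commutative, associative
// and idempotent, so peers that exchange updates in any order converge.
function mergeNode(x, y) {
  const out = { ...x };
  for (const [key, field] of Object.entries(y)) {
    out[key] = key in out ? mergeField(out[key], field) : field;
  }
  return out;
}

// Two peers edited the same node while disconnected.
const peerA = { name: { state: 1, value: 'Alice' }, city: { state: 5, value: 'Paris' } };
const peerB = { name: { state: 3, value: 'Alicia' }, city: { state: 5, value: 'Oslo' } };

console.log(mergeNode(peerA, peerB));
console.log(mergeNode(peerB, peerA));
```

Both merge orders end up with `Alicia` for `name` (newer state) and `Paris` for `city` (deterministic tie-break), which is the convergence guarantee; what such a scheme gives up is any notion of a single authoritative, linearizable history, hence the advice against using it for banking.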

This seems to be a very similar design goal to Bitcoin's. Were you at all influenced by those design goals?

No. But they're good design principles; it isn't hard to find a lot of p2p/decentralized people before and after Bitcoin.

What is the most challenging problem that’s been solved in GUN, so far?

The entire protocol can be re-implemented in as little as 30 minutes, like I did on stage in this talk (https://gun.eco/docs/Porting-GUN). All the database stuff turns out to be surprisingly easy. But the thing that, 7 years later, still gives me hassle, headaches, and nauseating pain? The hardest thing to implement is an easy, elegant API. Developers in the community swear by the API, gushing about how magically simple and powerful it is. But this is pure illusion, created by abstracting away complexity, and that is what a good API does: it is deceptive. Solving timeless distributed systems problems is a piece of pancake compared to making those same solutions appetizing to use.

What is your typical approach to debugging issues filed in the GUN repo?

First, replicate it, then use this secret debugging technique the whole world should know about. Write a wrapper around your println/console.log that only logs a message if it matches a particular sequence number. Now put “1, 2, 3…” log statements where you think the code flow must go to hit the bug. Even if the same function is called 20 times, the log will only print for the isolated pass relevant to the bug, or it will not print at all (or not with the data you expected), which incrementally shows where your mental model of the code deviates from the actual call paths, and thus probably where the bug is (or at least gets you closer to it, ghosthunter!). It is hard to explain this process in text; if anybody wants to record a screencast of it with me, just ping me.

This isn't anywhere in the GUN code as a utility, is it?

This shouldn't be in the source code; it gets deleted when I stop debugging, haha. And then `console.only(1, 'here')` ... `console.only(2, 'next?')` ... and so on. Somebody in my community also wrote a tracer for it that generates flow charts for a particular command. But this is all very hacky, done by hand, and not something easily or generally usable by others yet.
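Mark doesn't spell out the wrapper itself, so the following is only a minimal sketch of one plausible reading of the technique. It assumes `console.only(step, ...args)` acts as a gate: it prints only when `step` is the next number in the sequence 1, 2, 3…, and silently ignores every other call. The `next` property, the boolean return value and the `handle` function are hypothetical, invented for this illustration rather than taken from GUN.

```javascript
// Hypothetical sketch of a sequence-gated logger, one reading of the
// `console.only(n, ...)` technique described in the interview.
// console.only(step, ...args) prints only if `step` is the next expected
// number (1, then 2, then 3, ...); all other calls are ignored, so a
// function that runs 20 times still produces a single "[1]" line.
console.only = function only(step, ...args) {
  if (step !== only.next) {
    return false; // not the step we are waiting for: stay silent
  }
  only.next += 1; // open the gate for the following step
  console.log(`[${step}]`, ...args);
  return true;
};

// Where the gate currently stands; set back to 1 to start a new trace.
console.only.next = 1;

// Hypothetical handler that is called many times; only one path matters.
function handle(msg) {
  console.only(1, 'handle() entered with', msg.id);
  if (msg.ack) {
    console.only(2, 'ack branch taken for', msg.id);
  }
}

handle({ id: 'a' });
handle({ id: 'b', ack: true });
handle({ id: 'c', ack: true });
```

With the three calls above, `[1]` prints for message `a` and `[2]` prints for message `b`; every later call is swallowed. If `[2]` never appeared, that would point to the `ack` branch never running after the first entry, which is exactly the kind of gap between mental model and actual call path the technique is meant to expose.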

What an interesting technique!  I feel this would be born of necessity while debugging distributed systems bugs since they're inherently more difficult to debug than non-distributed systems.

Actually no, this is just for single-thread/process code flow; we built panic-server for simulating distributed systems correctness & load testing.

What is the release process like for GUN?

We strictly adhere to the religious diet of “Do I feel like it? OK.” release schedule. I openly admit I’m wrong about this.

Is GUN intended to eventually be monetized?

I strongly disagree with “open-core” (crippleware) and other licenses, especially (A)GPL, etc., and am committed to MIT/Zlib/Apache2/BSD-style licenses. Open source is about creating value for everyone. This means I will never be able to monetize GUN directly, and that is a good thing: it keeps it transparent, accountable, and free of adverse incentives or scammy schemes. If a tool is truly valuable, it will make other services that can be monetized exponentially easier to create and maintain. So someday I’ll charge (or, more specifically, be the payment processing layer) for a DBaaS service like Firebase, but federated, for anyone who is too lazy to 1-click deploy their own self-hosted solution.

Any idea when “someday” will come?

6 months in my dreams. So... 6 years by over-estimating engineering standards? :P My hard deadline is 2.5 years. Hopefully half that.

How do you balance your work on open-source with your day job and other responsibilities?

I’m blessed with the opportunity to work on this full time, thanks to donors and investors who believe in long term value creation and who are dedicated to fixing the mess that has taken over the internet. This also means, after a year like 2020, making sure you have the mental health to ground you for such challenges. At the end of the day, it means doing the right thing that will make you last and survive, not necessarily what is trendy or sexy.

Do you think any of your projects do more harm than good?

The technology I create is apolitical, though it (even its name) is deeply saturated with ideology. The name reminds good people to use something with care and caution: a database is a loaded gun when it contains something as sensitive as users’ data, so don’t act or tread upon it lightly. Bad people will figure out how to destroy things, regardless of the tool or its warnings. It deeply saddens me that there are, no doubt, people who have used it for hateful things. My belief, though, is that good will outgrow the bad, especially if technology and systems are designed to prevent monopoly control. And through and through, the essence of GUN is that it is code that enables techno-egalitarianism: no masters, no slaves, no monopolies, no dictators, no tyrants. It gives everyone the power, even if some are terrified that such power exists in the first place.

What is the best way for a new developer to contribute to GUN?

Jump into our chat room and start asking questions, complaining about bugs or problems, and share what ideas and projects you are passionate about! Being human is the best way to start. :) http://chat.gun.eco/

If you plan to continue developing GUN, where do you see the project heading next?

We crossed 40 million downloads in one month, and now I’m desperately trying to get the next version out and work to scale up to 100M monthly users.

What motivates you to continue contributing to GUN?

If I was passionate about something, that could change and the project would be dropped. But if I encounter a pain point, a problem so overwhelming that I cannot proceed or make progress, then I’m forced to keep working on it regardless of whether I’m passionate or not. GUN is not what I want to be doing, but it is the tunnel that must be bored through an insurmountable problem, that will let me achieve the dreams I actually seek.

Have you always been this intellectually fearless, or was there some kind of inflection point in your life that led to this?

Why thank you. Both? It took me till last year to realize that most people don't understand anything I say because communication is primarily done through tribe signaling, not first principles explanations. So in one sense, I've really screwed it all up too, because I didn't know I was triggering. Oops.

Well, so, maybe you've fixed your communication style such that it's hitting closer to home for people, but I'm asking a more fundamental question. Not everyone wakes up and thinks "the world should be more egalitarian and I'm going to work on that". It isn't enough to do that technically; you also need to change the zeitgeist of the population, which is non-trivial. So you have to be somewhat fearless to wake up and do that every day when, as a skilled developer, you could go get a job at a mega-corp and enrich yourself from the moat they've created.

Good point. Thank you. It's cause and effect: I'm not a developer because I was looking for a job; it's the opposite, I'm a developer because I was looking for the highest-impact, lowest-friction effort. The best way to change minds is to create tools people use regardless of whether they ideologically agree with them or not. Humans feel the urge to fulfill the purpose of what is in their hands, so if you can put more hammers than swords in them, more paint brushes than guns, you may get a more creative society. This does not override your agency; it's just a matter of directing probabilities. Let economics work at scale, and you'll see specializations develop, because at the end of the day, we're human, we love to play, and having a sculpt in our hands begs us to poke and prod at the universe, to discover how we can mold it. Interaction is our way to dance with reality and to communicate beyond the now, across time and age. It shows animals we can face the void, and have created monuments. We were here for more than just our paychecks.



Console 47

OrbitDB, Infracost, and Deep Daze

Sponsorships

Day One Fellowship Cohort 3

Starting a business is hard, lonely, confusing, and too many founders don't make it through the early stages or have the foundations to succeed.

Enter: The Day One Fellowship, a community and 10-week cohort-based program where founders develop startup ideas, validate market interest, and build products and services that customers actually want. With Day One you receive 1:1 coaching and accountability from experienced startup mentors, along with a tight-knit peer group and a network of hundreds of founders, mentors and investors.

Learn more and apply now for Cohort 3, which kicks off April 25th.


OrbitDB

OrbitDB is a peer-to-peer database for the decentralized web.

language: JavaScript, stars: 5517, watchers: 157, forks: 371, issues: 121

last commit: March 20, 2021, first commit: December 26, 2015

https://twitter.com/orbit_db

brotab

brotab allows you to control your browser’s tabs from the command line.

language: Python, stars: 139, watchers: 9, forks: 13, issues: 23

last commit: June 02, 2020, first commit: August 10, 2017

infracost

Infracost allows you to get cloud cost estimates for Terraform in your CLI and pull requests.

language: Go, stars: 2759, watchers: 39, forks: 106, issues: 70

last commit: April 01, 2021, first commit: June 19, 2020

https://twitter.com/aliscott

deep-daze

Deep Daze is a simple command line tool for text to image generation using OpenAI’s CLIP and Siren.

language: Python, stars: 2872, watchers: 62, forks: 180, issues: 41

last commit: April 02, 2021, first commit: January 17, 2021

https://twitter.com/lucidrains

https://twitter.com/advadnoun


Help Wanted

If you’re interested in posting a help wanted ad for your project to thousands of open-source developers, send an email to console.substack@gmail.com


An Interview With Mark of OrbitDB

Hey Mark! Let’s start with your background. How did you learn how to program and what languages or frameworks do you like?

I’m a musician by education, and now 15 years later I’m a self-taught programmer and engineering manager by trade. Luckily for me, my father slammed K&R’s book on C in front of me because I was playing too many video games in middle school.

I like JavaScript a lot, and lately I’ve become very in love with Rust. I also like anything adjacent to the “permaweb” class of software like IPFS, libp2p, and of course OrbitDB.

I'm guessing by the Nuno Bettencourt comment below that your primary instrument is guitar?

Haha yup. Classical guitar in school. Ended up with a bachelor of music performance with a dual emphasis in music business and sound recording technology.

It seems to be fairly common for musicians to transition to computer science. The similarities between the two domains have always been interesting to me. Why do you think it is that musicians so frequently make the transition into programming?

The lack of music jobs 😂

I wonder why we all looked around and thought "alright, if not this, then programming", though?

Yeah, I'm not sure. I came up through web design, so I think it was a calculus of “well, it's still sorta creative, and pays the rent...” But you could also go into things like pattern recognition and structured discrete mathematics... But that's above my pay grade.

Or even carpentry, or hospitality, right? It doesn't even have to be something intellectual that you decide to do for money. It just seems like an overwhelming majority choose programming.

It's very true. There was something hanging in the music building at school about how the guy that designed part of the original Mac was a programmer and he designed it specifically for some pre-MIDI sequencing / composition. So, maybe this meme is older than us.

I guess it will forever be a mystery. Anyway, back to software, who or what are your biggest influences as a developer?

I have a very close friend who is always my go-to for side projects, Jordyn Bonds aka @skybondsor. We have a unique balance that’s hard to find, and we always end up creating great stuff, like TallyLab.

Lately, I’m just amazed by the team I get to work with at Equilibrium - they’re miracle workers and better at software engineering than I ever was. We’re hiring, by the way!

What’s your most controversial programming opinion?

Plain HTML and CSS are just fine, 99% of the time :)

What is one app on your phone that you can’t live without that you think others should know about?

MetaMask

If you could dictate that everyone in the world should read one book, what would it be?

Finite and Infinite Games.

If you could teach every 12 year old in the world one thing, what would it be and why?

This came to me from two people - Nuno Bettencourt’s dad (yes, the guitarist from Extreme), and an artist named Jon Sarkin: Just start making stuff. Don’t worry if it’s gonna make money, don’t worry if it’s the “right way.” Make it, fix it, get better, repeat. Everything else is a by-product of that.

If I gave you $10 million to invest in one thing right now, where would you put it?

Non-Fungible Tokens

What are you currently learning?

How to be a better person to work with, every day.

What resources do you use to stay up to date on software engineering?

Our #random channel at Equilibrium does the trick right now. Did I mention we’re hiring?

How do you separate good project ideas from bad ones?

I’m a person with ADHD, so I’ve resigned myself to the idea that if I don’t remember an idea I had, it was probably not a keeper.

Why was OrbitDB started?

OrbitDB was started by Samuli aka @haadcode to solve the problem of performant, mutable data on IPFS.

Who, or what was the biggest inspiration for OrbitDB?

If necessity is the mother of all invention, what is the father?

Are there any overarching goals of OrbitDB that drive design or implementation? If so, what trade-offs have been made in OrbitDB as a consequence of these goals?

Right now we’re working towards a 1.0 release. This means aggressive performance refactoring, benchmarking, testing, and more testing. OrbitDB works. It works well. There are still things it can do better and that’s what we’re tackling right now.

The trade-off here is that to do it well, we need dedicated resources. To get dedicated resources, we need funding. To get funding in the space right now you’re looking at development grants and corporate sponsorship. Things are in the works and we’re right at the starting line.

What is the most challenging problem that’s been solved in OrbitDB, so far?

OrbitDB’s “magic” is its implementation of the Conflict-free Replicated Data Type (CRDT). Samuli would be able to tell you more about this, but CRDTs are fantastic data structures that use logical clocks in place of a consensus protocol, giving strong eventual consistency without the use of a blockchain.

You can’t protect against, say, double-spend with OrbitDB, but you can do most anything else.

Can you point us to where in the OrbitDB code the CRDTs are being used?

Sure, it's all in this package here: https://github.com/orbitdb/ipfs-log. Basically, it creates a blockchain-like linked list of entries that refer backwards by hash. Each entry has something called a "Lamport Clock," which contains a node ID (usually your public key) and a logical clock value like 1, 2, 3, 4. After you retrieve all the entries, you sort them by clock value, then by node ID. That gives you total ordering.
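To make that ordering rule concrete, here is a toy Python sketch of a grow-only, hash-linked log with Lamport clocks. It is not ipfs-log's API (that package is JavaScript); the `Entry` and `Log` classes, their fields, and the string encoding used for hashing are all hypothetical. Each replica stamps a new entry with one more than the largest clock value it has seen, merging is a plain union of entries, and sorting by (clock value, node ID) gives every replica the same total order.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    payload: str
    time: int              # logical clock value: 1, 2, 3, ...
    node_id: str           # writer ID (e.g. a public key); used to break ties
    prev: tuple[str, ...]  # hashes of the entries this one points back to

    def digest(self) -> str:
        # Illustrative encoding only; a real log would use a canonical serialization.
        raw = f"{self.payload}|{self.time}|{self.node_id}|{','.join(self.prev)}"
        return hashlib.sha256(raw.encode()).hexdigest()


class Log:
    """Toy grow-only log: entries are only ever added, so merging is a set union."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.entries: dict[str, Entry] = {}  # entry hash -> entry
        self.heads: set[str] = set()         # hashes that no other entry points back to yet

    def append(self, payload: str) -> Entry:
        # Lamport rule: the new clock value is one more than the largest value seen so far.
        time = 1 + max((e.time for e in self.entries.values()), default=0)
        entry = Entry(payload, time, self.node_id, tuple(sorted(self.heads)))
        h = entry.digest()
        self.entries[h] = entry
        self.heads = {h}
        return entry

    def merge(self, other: Log) -> None:
        # Union of entries; the new heads are entries nobody references.
        self.entries.update(other.entries)
        referenced = {p for e in self.entries.values() for p in e.prev}
        self.heads = set(self.entries) - referenced

    def ordered(self) -> list[str]:
        # Total order: clock value first, node ID as the tie-breaker.
        ordered_entries = sorted(self.entries.values(), key=lambda e: (e.time, e.node_id))
        return [e.payload for e in ordered_entries]
```

Because a merge only ever adds entries, two replicas that have exchanged the same entries end up with identical logs and identical orderings, whichever direction they merged in. For example, if replica "A" appends `x` and replica "B" appends `y` concurrently, both get clock value 1, and the node-ID tie-breaker puts `x` before `y` on every replica.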

Do you think there was a particular insight Samuli had with respect to how he used CRDTs that allowed for this performant mutable data on IPFS?

I can't speak for Samuli but I believe that it came out of necessity and the idea of the CRDT was formalized a bit later.

What was the most surprising thing you learned while working on OrbitDB?

Did you know that if you fly into the EU, and then have a layover and a flight to another EU airport, you can just WALK out of that airport and skip customs entirely? I had no idea, I thought I was going to get tackled but nope, you can just waltz on out.

What is the release process like for OrbitDB?

OrbitDB is split into about two dozen modules, essentially npm packages. This makes development fairly streamlined because they can be versioned independently.

The trade-off is that we do something called the “publish dance” every time we want to make a release. We make RC versions of every package that changed, test OrbitDB as a whole, then test it in a few of our users’ projects, like 3Box.

What is the main source of revenue for OrbitDB?

We have an Open Collective that’s just kinda sitting there right now. We also apply for development grants and seek corporate sponsorship.

How do you balance your work on open-source with your day job and other responsibilities?

I don’t know, ask me when my state’s COVID score is back to green :(

Do you think any of your projects do more harm than good?

I’m sensitive to the ideas that any of these decentralized + distributed technologies are capable of doing harm, and I truly think about it every day. There’s no way around it. I still believe that once all this is said and done the good will outweigh the harm, and by a large margin. 

What is the best way for a new developer to contribute to OrbitDB?

Be loud on the open PRs until you browbeat people into finishing them. No, seriously - there’s a lot of really good improvements waiting to be reviewed, documented, and tested. Grab a shovel and get contribution cred.

If you plan to continue developing OrbitDB, where do you see the project heading next?

On the Road to 1.0

Where do you see open-source heading next?

Development grants have risen sharply, and seem to be here to stay. If your project can improve, implement, or augment another popular project with a foundation behind it, you can get a lot of mileage out of helping them out and getting paid for it.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

Click the “Create a PR” button. It’ll be OK! We’re not all jerks in open-source, despite what you’ve been told.


Like what you saw here? Why not share it!

Or, better yet, share Console!

Also, don’t forget to subscribe to get a list of new open-source projects curated by an Amazon software engineer directly in your email every week.

Console 46

Graphtage, Crums, and Outrun

Sponsorships

If you, or someone you know, is interested in sponsoring the newsletter, please reach out at console.substack@gmail.com


crums-pub

crums-pub includes the data model, parsers, and utilities for crums.io.

language: Java, stars: 0, watchers: 1, forks: 0, issues: 0

last commit: February 13, 2021, first commit: January 04, 2021

outrun

Execute a local command using the processing power of another Linux machine.

language: Python, stars: 2743, watchers: 34, forks: 50, issues: 10

last commit: March 22, 2021, first commit: July 14, 2020

https://twitter.com/overv

awesome-compose

awesome-compose is a list of Docker Compose samples curated by Docker.

stars: 9390, watchers: 246, forks: 1064, issues: 25

last commit: March 22, 2021, first commit: February 12, 2020

https://twitter.com/Docker

graphtage

Graphtage is a semantic diff utility and library for tree-like files such as JSON, JSON5, XML, HTML, YAML, and CSV, open-sourced by Trail of Bits.

language: Python, stars: 1904, watchers: 39, forks: 30, issues: 15

last commit: February 28, 2021, first commit: March 21, 2020

https://twitter.com/trailofbits


Help Wanted

If you’re interested in posting a help wanted ad for your project to thousands of open-source developers, send an email to console.substack@gmail.com


An Interview With Babak of Crums.io

Hey Babak! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like, etc?

I’ve been coding for well over 20 years now, but I came late to programming professionally (I’m that old). In the 80s, when I was in school (Applied & Engineering Physics), programming was already becoming a requirement to graduate. I didn’t take programming seriously then: the computer (a mainframe) seemed like just a powerful calculator that shipped with a manual I wasn’t about to spend much time on, and inside, the manual described this thing called a programming language. I’d learn just enough to get the job done (I’m lazy that way). Then, in the mid-90s, as the web was taking off, my view changed entirely. Whatever I’ve learned since, I learned online.

Professionally, I’ve coded mostly backend systems in C++, Java, some Scala. 

How do you feel about Scala?

Mixed feelings.

Pluses:

  • Incredibly cool, powerful, and expressive

  • Great respect for the concept and the engineering

  • Great for writing DSLs

Minuses:

  • Cognitive load, the price of expressiveness

  • Software is as much about telling the machine what to do as it is about communicating an idea to the community.

What’s your most controversial programming opinion?

Java is not a terrible language.

If you could teach every 12 year old in the world one thing, what would it be and why?

OK, this is a fun challenge (they’re 12-year-olds!): You have many superpowers. To achieve a superpower, you must hone it, work it. Find yours; it’s most likely different from your friends’. (If these superpowers were the same in everyone, they wouldn’t be superpowers, would they?)

If I gave you $10 million to invest in one thing right now, where would you put it?

I’d put it in Crums, obviously :) I have a backlog of products and features to implement. I chip away at these regardless, but I recognize I can’t (I mean, shouldn’t) do everything: a small dedicated team would pick up the pace. Then there’s the business development side, strategy, etc. I have some ideas, and I’m talking to prospects, exploring possible verticals, etc. Though I wear this hat as well, it doesn’t fit me well. I need help here too.

What are you currently learning?

I’m learning abstract algebra. I’m not very good at it, but I want to understand the DL papers. I’m also learning LaTeX since I’m writing a paper.

What are you writing the paper on?

On hashpointer-based data structures. So, a blockchain is really a linked list, and a Merkle tree is really just a binary tree, each built with hashpointers instead of ordinary pointers. It's not a search tree; it's kinda the opposite. The paper explores the properties of common data structures when their pointers and references are replaced by hashpointers.
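A minimal Python sketch of the first idea: a linked list whose "pointers" are hashes of the previous node. The `Node`, `build_chain`, and `verify` names and the string encoding are illustrative assumptions, not Crums code. Because each node's hash covers its predecessor's hash, a verifier that trusts only the final (head) hash can detect a change anywhere earlier in the chain.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def sha_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Node:
    value: str
    prev_hash: Optional[str]  # a hashpointer: the hash of the previous node, not its address

    def hash(self) -> str:
        # Illustrative encoding; the hash covers both the value and the previous hash.
        return sha_hex(f"{self.value}|{self.prev_hash}".encode())


def build_chain(values: List[str]) -> List[Node]:
    chain: List[Node] = []
    prev: Optional[str] = None
    for v in values:
        node = Node(v, prev)
        chain.append(node)
        prev = node.hash()
    return chain


def verify(chain: List[Node], head_hash: str) -> bool:
    """Walk the chain checking every hashpointer, then compare the end to a trusted head."""
    prev: Optional[str] = None
    for node in chain:
        if node.prev_hash != prev:
            return False
        prev = node.hash()
    return prev == head_hash
```

Usage: `chain = build_chain(["genesis", "tx1", "tx2"])`, then publish `chain[-1].hash()`. Swapping in a forged `tx1` changes that node's hash, so the next node's hashpointer no longer matches and `verify` returns `False`.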

What resources do you use to stay up to date on software engineering?

I tend to focus on topics first, then on general software engineering. I think if you do this, you automatically become aware of the new tools and technologies in that problem domain. So, for example, back when I was interested in (and made a living from) search and IR, I knew about and used the Apache Lucene project early on. Ditto with ML: Hadoop and Spark. I haven’t been focused on ML/AI lately, so I’ve missed the cool Python ecosystem that has evolved in recent years.

Where did you work on search?

I worked at a company whose flagship product, Alchemy, was a line-of-business application built to ingest data and save it to "cold storage" (CDs). It sported an inverted index for search. Very cool product for its time.

Later, I worked at a startup that managed email gateways for enterprises. I worked on a hosted service that saved and managed email content with a retention policy (when to delete) and search.

How do you separate good project ideas from bad ones?

I think a good indication of whether a project has legs is if it can garner mindshare. But, I’d phrase it as “separate the so-so ideas from the really good ones”, because a project is seldom a bad idea, imo: at the very minimum you learn something and have something interesting to show and talk about in the next job interview. Projects are a bit like programming. Iterative. I aim to build on the successes and avoid the mistakes of the previous round.

Why was Crums started?

I started the project because I perceived a growing need to establish the provenance of data. We are about to be buried under a deluge of dis- and misinformation, deep fakes, etc. and one way to combat it, my thinking went, is to systemize a way that allows one to distinguish which version came first.

That was the original impetus. But the more I work on it, the more things I discover that can be usefully anchored to digital timestamps. To get a sense of how it works, see this overview and the REST API documentation.

Who, or what was the biggest inspiration for Crums?

Bitcoin. About 6 years ago I started learning about blockchain technologies, not so much as a user but to satisfy my curiosity about how they work. I’m into algorithms and data structures, and became immediately enamored with these powerful, new (actually old) concepts I was learning: hashpointers, tamper proof structures, Merkle trees, hash proofs, and the like. Since then, I keep finding use for these same techniques in everyday code.
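Merkle trees and hash proofs can be sketched in a few lines of Python. The code below builds a root over a list of leaf values and an inclusion proof for one leaf, made up of the sibling hashes on the path to the root. It assumes SHA-256 and the convention of duplicating the last node on odd-sized levels; both are illustrative choices, not necessarily what the Crums Merkle library does, and production trees usually also domain-separate leaf and interior hashes. The function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, True if the sibling sits on the left)


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: List[bytes], index: int) -> Tuple[bytes, Proof]:
    """Return the Merkle root of a non-empty leaf list and an inclusion proof for leaves[index]."""
    level = [sha(leaf) for leaf in leaves]
    proof: Proof = []
    i = index
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels (one convention)
        sibling = i ^ 1              # the other child of the same parent
        proof.append((level[sibling], sibling < i))
        level = [sha(level[k] + level[k + 1]) for k in range(0, len(level), 2)]
        i //= 2
    return level[0], proof


def verify_proof(leaf: bytes, proof: Proof, root: bytes) -> bool:
    """Recompute the path from the leaf up to the root using only the sibling hashes."""
    acc = sha(leaf)
    for sibling, sibling_is_left in proof:
        acc = sha(sibling + acc) if sibling_is_left else sha(acc + sibling)
    return acc == root
```

The proof holds one sibling hash per tree level, so it grows logarithmically with the number of leaves, and anyone holding a leaf, its proof, and a trusted root can check inclusion without contacting whoever built the tree.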

Are there any overarching goals of Crums that drive design or implementation?

1) The service must scale and never go offline. 

2) The digital records the service vends must be verifiable even if the service does go offline.

3) The service must maintain an audit trail of everything it does.

What trade-offs have been made in Crums as a consequence of these goals?

Since there’s a virtually unlimited supply of user-submitted hashes for the service to potentially witness, the service cannot remember every hash it has ever seen indefinitely. Instead, it employs a cookie data model: it puts the onus on the user to save the hash’s crumtrail (the witness record).

What is the most challenging problem that’s been solved in Crums, so far?

The backend is on its 2nd iteration. The first version was more of a PoC; this second version scales better. It’s composed of multiple independent daemons that can be deployed in fewer or greater numbers to meet demand. This server-side code is not open-sourced at this time.

Why isn’t it open sourced?

Because it invites trouble :D. I mean, I aim to write solid code, but I'm sure there's an attack vector in there I haven't thought about. Generally, I feel you don't share production server code, especially if it's a one-off and not widely replicated.

I noticed you did have some Crums code on GitHub, though. How did you decide what could be open-sourced and what couldn’t?

I'm assuming you're asking what we keep private and what goes public, because it's actually all on GitHub. Crums is composed of multiple submodules, each in its own GitHub repo.

Some, like the Merkle tree library, are sort of independent and can be configured for other use cases (different hashing algo, different leaf byte length, etc) than Crums. So these have their own repo.

The repos are gathered here: https://github.com/crums-io

My general sense is that you should make as much of the code open-source as is practically possible. Mindshare, if you can get it, enjoys a premium over IP.

It sounds like the individual components of the server are open-sourced, but putting them all together into 1 repo is not. Is that correct?

Yes. But note, the real reason the components have their own repos is that some of those components are modular enough to have a life of their own.

The io-util lib, for instance, is a general-purpose library I've built, used, and improved over the years. Kinda like the way traditional woodworkers would build their own carpentry tools (https://flylib.com/books/en/1.315.1.29/1/), except that as programmers we understand our toolbox gets better when we share.

What was the most surprising thing you learned while working on Crums?

How well-suited the techniques of blockchain are to concurrent environments such as the server side. Once you can efficiently prove correctness (a side effect of tamper-proofness), many of the challenges of handling concurrency fall by the wayside.

What is your typical approach to debugging issues filed in the Crums repo?

The usual: write up the simplest unit test that would have captured the bug. If necessary, instrument the code to see what’s going on. (Stepping through the debugger on the server side code is often not practical.) Fix and document the fix. Not necessarily in that exact order.

What is the release process like for Crums?

I’ve done only one release. The “process” needs work: it’s evolving :) The only thing I can say: it’ll be often.

Is Crums intended to eventually be monetized, if so, how?

The goal is to somehow monetize. Not just because we love money (we do), but because it’s the most viable path to sustaining and growing a project. Presently, we’re beginning to do consulting work for businesses that are required to maintain audit trails (for regulatory, financial, or other reasons). We don’t derive much revenue from this (we sent out our first bill last month!).

I don’t see consulting work as our long term business strategy: the goal now is to sign up early adopters, and explore how best to align the product to meet their needs. Longer term, I envision offering hosted, multi-tenant, cloud-based solutions: I think this scales better from a business standpoint. That said, I’m no expert in business development, recognize I’m not very good at selling, and am open to listening and changing course.

How do you balance your work on open-source with your day job and other responsibilities?

I steal time from other things. I’m not sure you’d call it balanced. I push it forward, even if it’s a little bit, every day. The nice thing about software is that it is cumulative. I’m in a bit more of a hurry, but I picture it like Andy Dufresne carrying his pocketful of rock dust to the prison yard every day (Shawshank Redemption reference).

Do you think any of your projects do more harm than good?

I hope not: the intent is the opposite, kinda to weed out harm and falsehoods by making it easy to prove certain truths. But then there’s the “law of unintended consequences”.

What is the best way for a new developer to contribute to Crums?

Connect with me on GitHub, or shoot me an email (babak at crums .io)

Where do you see the project heading next?

I have a working whitepaper on a broader vision. It works out how a cooperating collective of businesses can trust and verify “intra-collective” transactions by managing their business ledgers using tamper proof methods. This too is an iterative process, with ideas fleshed out in code, and then code informing the paper.

Towards this end, the next release will include a client-side library for tamper proof ledgers. It models an append-only table in a flexible way, with verifiable history. It’ll have a demo app, a Journal, that does this for a text file. It allows its owner to tear out and make public any specific line from the journal, while keeping the rest of its contents private.
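One plausible way to get "tear out one line, keep the rest private" is sketched below in Python, assuming a hash chain over salted line hashes. The `ToyLedger` class and `verify_reveal` function are hypothetical and are not the design of the Crums Journal. Each row hash commits to the previous row hash and to a salted hash of its line; to disclose line k, the owner hands over that line, its salt, the previous row hash, and the salted hashes of the later lines, and a verifier recomputes the published head.

```python
import hashlib
import os
from typing import List, Tuple


def H(*parts: bytes) -> bytes:
    d = hashlib.sha256()
    for p in parts:
        d.update(len(p).to_bytes(8, "big"))  # length-prefix each part so boundaries are unambiguous
        d.update(p)
    return d.digest()


class ToyLedger:
    """Hypothetical append-only ledger: each row commits to a salted line hash and the previous row."""

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.salts: List[bytes] = []
        self.line_hashes: List[bytes] = []
        self.row_hashes: List[bytes] = []

    def append(self, line: str) -> bytes:
        salt = os.urandom(16)  # random salt so hidden lines can't be guessed from their hashes
        line_hash = H(salt, line.encode())
        prev = self.row_hashes[-1] if self.row_hashes else b""
        self.lines.append(line)
        self.salts.append(salt)
        self.line_hashes.append(line_hash)
        self.row_hashes.append(H(prev, line_hash))
        return self.row_hashes[-1]  # the current head: publish or timestamp this value

    def reveal(self, k: int) -> Tuple[str, bytes, bytes, List[bytes]]:
        """Disclose line k; every other line appears only as a row hash or salted line hash."""
        prev = self.row_hashes[k - 1] if k > 0 else b""
        return self.lines[k], self.salts[k], prev, self.line_hashes[k + 1:]


def verify_reveal(line: str, salt: bytes, prev_row: bytes,
                  later_line_hashes: List[bytes], head: bytes) -> bool:
    # Rebuild the disclosed row, then fold in the later rows to reach the head.
    row = H(prev_row, H(salt, line.encode()))
    for line_hash in later_line_hashes:
        row = H(row, line_hash)
    return row == head
```

The salt matters: without it, a short or predictable hidden line could be recovered by hashing guesses and comparing them with the disclosed line hashes.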

What motivates you to continue contributing to Crums?

I feel like I’ve discovered a pasture sitting behind a nearby hill. There’s a lot of low-hanging fruit.

Where do you see software development in-general heading next?

I think software development will creep into every profession. It will be an imperative necessity even if you're a surgeon.

Where do you see open-source heading next?

I think it’s fast becoming the de facto way to distribute and release core tools and products. There’s little opportunity (business-wise) in tools: it’s in services.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

Find a project you have, or will have, use for. Use it. You’ll find many things that could be better or are broken. Fix the easiest. Contribute. Repeat.


Like what you saw here? Why not share it!

Or, better yet, share Console!

Also, don’t forget to subscribe to get a list of new open-source projects curated by an Amazon software engineer directly in your email every week.

Console 45

Banned, Bevy, and Airbyte

Sponsorships

Code on the Table

Code on the Table is an online event about open-source business models happening on 03/24. Prominent open-source speakers will discuss the following topics:

  • How has FOSS changed?

  • Can open-core survive Amazon?

  • Is there more pressure for developer tools to be free?

  • What are enterprise companies looking for when they choose open-source software?

Reserve your spot today!


airbyte

Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

language: Java (core platform) + language-agnostic connectors, stars: 1940, watchers: 61, forks: 158, issues: 489

last commit: March 16, 2021, first commit: July 28, 2020

https://twitter.com/AirbyteHQ

banned.h

banned.h is a header file in the Git source repository that lists C functions banned from use in the Git codebase.

language: C, stars: 37271, watchers: 2306, forks: 21120, issues: 58

last commit: March 19, 2021, first commit: April 07, 2005

Plausible Analytics

Simple, open-source, lightweight, and privacy-friendly web analytics alternative to Google Analytics.

language: Elixir, stars: 6932, watchers: 87, forks: 300, issues: 25

last commit: March 18, 2021, first commit: September 02, 2019

https://twitter.com/PlausibleHQ

bevy

Bevy is a “refreshingly simple” data-driven game engine built in Rust.

language: Rust, stars: 7312, watchers: 191, forks: 576, issues: 401

last commit: March 20, 2021, first commit: November 13, 2019

https://twitter.com/bevyengine


Help Wanted

If you’re interested in posting a help wanted ad for your project to thousands of developers, send an email to console.substack@gmail.com


An Interview With Michel of Airbyte

Hey Michel! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like, etc?

I’ve been working in data engineering for 15 years. Originally from France, I came to the US in 2011 to join a small startup named LiveRamp. As the company grew, I became Head of Integrations and Director of Engineering, where my team built and scaled over 1,000 data ingestion and distribution connectors to replicate hundreds of TB worth of data every day. 

After LiveRamp’s acquisition and later IPO (NYSE:RAMP), I wanted to go back to an early stage startup. So I joined rideOS as Director of Engineering, again deep in data engineering. While there, I realized that companies were always trying to solve the same problem over and over again. This problem should be solved once and for all. 

This is when I decided to start a new company, and Airbyte was born. 

Who or what are your biggest influences as a developer?

Over my career, I have always been battling against complexity, whether in code, infrastructure, processes or organization. It is always possible to do complex things in a simple way, and that has always been my North Star wherever I’ve been and whatever I’ve done. 

What’s your most controversial programming opinion?

An amazing software engineer is the one who writes the least code. 

Let me explain. Programming as we know it today has been around for 40 years. That means it is very rare to be the first person to encounter a particular problem. Before going into any project, everyone should be thinking about using an existing solution. For every spec and decision, there should be a rationale as to why you are not using an existing solution. You should be able to explain how it is an asset vs. just re-building something. Applying this in your day-to-day has many impacts. First, you can get more features out of the box. Second, if you encounter an issue, it is likely that someone outside of your company faced it and will fix it, and if not, then you can fix it for everyone. Third, it means all the code you’re writing is going to be valuable code for the product and will be an actual asset instead of a reinvention of the wheel.

This is the reason we decided to build Airbyte. There are existing solutions (Fivetran, Stitch, etc.), but they are all closed source and cloud based. There is no existing solution that can be used out of the box, in the safety of your cloud, that can be extended and customized, and that can support the long tail of connectors. We wanted to make sure that the next time an engineer builds a connectivity layer into a product or an analytics infrastructure, they won’t have to spend countless hours building and managing connectors, and will instead have access to a stable, community-maintained, full-featured solution.

If you could dictate that everyone in the world should read one book, what would it be?

Sapiens. It is one of the best written and approachable books about how we got to where we are as Humans.

If you could teach every 12 year old in the world one thing, what would it be and why?

Learn to not be great at everything and instead force yourself to be good/decent in 90% of things and the best in the last 10%. It is a waste of time to try to be great at everything, because it is not possible. But it is possible to be AMAZING at one thing, and if you focus on this early, it will compound over time. For the remaining 90%, other people will be amazing at it; you should find them and rely on them.

If I gave you $10 million to invest in one thing right now, where would you put it?

Tesla (obviously don’t take that as investment advice :)). They will win the self-driving car war. To get autonomous driving technology on the street, it is required to have a “Safety Driver” (someone who has been trained to take over if the car misbehaves). 

Tesla is the ONLY company that has solved how to get safety drivers at scale while making money out of it: they sell cars. Tesla customers have become safety drivers and data probes for the company, and there are millions of these customers. Most other companies have 10-20k cars. Because of that Tesla has more data than anyone, and for solving this particular problem, data is the most important asset. 

How do you separate good project ideas from bad ones?

During the first 6 months of Airbyte, we had our fair share of pivots and explorations. 

We learned that you start with an intuition, but that intuition is very hard to evaluate unless you do some customer discovery interviews. You’ll detect the bad projects very quickly, as they can be dismissed within 2-3 days. The hard part is in distinguishing the great projects from the good ones. To do that, you need to pay a lot of attention to pattern matching and biases (trying to remove them as much as possible from your potential clients, and also from yourself). One thing that helped us was doing a series of five interviews, diving deeper with each series, until we reached a point where we weren’t learning anything new from the interviews. 

Why was Airbyte started?

From July to August 2020, we reached out to 250 of Fivetran’s and StitchData’s clients. To do so, we took the list of all the public customers listed by these companies and automated outreach on LinkedIn. In the end, we managed to talk to 45 of them. What we learned during those interviews is that all these closed-source cloud-based approaches didn’t actually solve the data integration problem. All the companies still had to build their own connectors on the side, either because they were not supported, or they were supported but not in the way they needed. This problem can easily be addressed with open source, if you make it simpler to build and maintain connectors with the open-source tool. That’s exactly what we have in mind with Airbyte.

In addition to this, we started to see companies that couldn’t use those tools, as they couldn’t use cloud-based vendors because of data security concerns. Again, an open-source product would fix this problem. 

The last point is around their volume-based pricing, which is unpredictable. Who knows how many rows you will replicate this month? An open-source self-hosted solution could address that point, too. 

That’s why we started Airbyte. 

Who, or what was the biggest inspiration for Airbyte?

In all honesty, I would say there were two inspirations. 

The first one was Fivetran. Their shortcomings were our inspiration. We will have more and more data, more and more tools (and therefore data silos), and more and more requirements in terms of data privacy and security. The data integration problem will only get bigger with time. Closed-source solutions can’t address the long tail of connectors. They will always have an ROI consideration in regard to building and maintaining connectors that are used by only a few customers. Airbyte can address this. We’re building some abstraction to make it low-code to build connectors, and as all those connectors will be standardized, it will be easier for us and the community to help with their maintenance.

The second inspiration was Singer.io’s failure and slow death. Singer could have been great if Stitch had been a lot more involved in the community. It was never their focus; it was more of an afterthought to increase the number of connectors they could sell. But above all, they didn’t plan how to standardize the connectors so they would be a lot easier to maintain. Any contributor could build a connector with only their use case in mind, and in the end, they were the only ones able to maintain that connector. The impact of this is that most Singer taps are out of date today. That’s why, at Airbyte, we have our own data protocol with standardization in mind. This protocol is compatible with Singer, by the way. It’s a way for us to help Singer users migrate to our new standard, so that all their work doesn’t go to waste.
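As an illustration of what Singer compatibility can involve, here is a hedged Python sketch that translates Singer's line-delimited JSON messages into Airbyte-style message dicts. The field names reflect the two protocols as they were documented around 2021 (Singer `RECORD` with `stream`/`record`, `STATE` with `value`; Airbyte `RECORD` with `stream`/`data`/`emitted_at`, `STATE` with `data`) and are assumptions worth checking against the current specifications. The `singer_to_airbyte` function is hypothetical and is not Airbyte's own adapter.

```python
import json
import time
from typing import Optional


def singer_to_airbyte(line: str, now_ms: Optional[int] = None) -> Optional[dict]:
    """Translate one Singer message (a JSON line) into an Airbyte-style message dict, or None."""
    msg = json.loads(line)
    kind = msg.get("type")
    if kind == "RECORD":
        # Airbyte-style records carry an emission timestamp in milliseconds.
        emitted = now_ms if now_ms is not None else int(time.time() * 1000)
        return {
            "type": "RECORD",
            "record": {"stream": msg["stream"], "data": msg["record"], "emitted_at": emitted},
        }
    if kind == "STATE":
        return {"type": "STATE", "state": {"data": msg["value"]}}
    # SCHEMA messages describe streams; this sketch drops them instead of building a catalog.
    return None
```

A real bridge would also map Singer `SCHEMA` messages into stream definitions, which this sketch deliberately leaves out.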

What are the overarching goals of Airbyte that drive design or implementation, and what trade-offs have been made as a consequence of these goals?

We need Airbyte to work for any company whatever their data stack, and whatever their use case and data volume. With that in mind, we still have a lot to do! But we’re doing it step by step while anticipating the architecture we need to address all those use cases. In the end, data integration is a thousand-paper-cut problem. 

So, for instance, we started with only full refresh. We added the incremental append in December 2020. We will add CDC support, integration with Airflow, DBT, OAuth, etc. in the very near future. Our goal is to be able to address 90% of use cases with the community open-source edition by the end of 2021. After all, we want to become the open-source standard to replicate data, and to commoditize data integration. 
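The difference between full refresh and incremental append can be sketched in a few lines of Python. The function names, the `updated_at` cursor field, and the dict-based state are assumptions for illustration, not Airbyte's connector API. Full refresh re-copies everything on every sync; incremental append copies only rows whose cursor value is newer than the saved cursor, appends them to the destination, and advances the cursor.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def full_refresh(source_rows: List[Row]) -> List[Row]:
    """Re-copy the entire source; the destination table is replaced wholesale."""
    return [dict(r) for r in source_rows]


def incremental_append(source_rows: List[Row], destination: List[Row],
                       state: Dict[str, Any], cursor_field: str = "updated_at"
                       ) -> Tuple[List[Row], Dict[str, Any]]:
    """Append rows newer than the saved cursor, then advance the cursor."""
    last = state.get(cursor_field)
    new_rows = [r for r in source_rows if last is None or r[cursor_field] > last]
    destination.extend(dict(r) for r in new_rows)  # copy rows so the destination is independent
    if new_rows:
        state[cursor_field] = max(r[cursor_field] for r in new_rows)
    return destination, state
```

Two consequences are worth noticing. An updated row is appended again rather than overwritten, so the destination keeps history and deduplication becomes a separate step. And a strict `>` comparison can miss rows that share the last cursor value but arrive late, which is why choosing a cursor field takes care.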

What is your typical approach to debugging issues filed in the Airbyte repo?

We identify the blockers and try to prioritize them as soon as possible—usually on the same day. Apart from those high-priority bugs, we have a weekly sprint process and go over all the issues week after week to prioritize them. We use GitHub to keep track of our issues & milestones. All of this is public.

What is the release process like for Airbyte?

There are two parts to Airbyte: the core platform and the connectors. We release the core platform on a weekly basis. For connectors, we work in bi-weekly sprints but release continuously. 

How do you intend to monetize Airbyte?

Here’s what we have in mind in terms of business model. 

There will be a community edition that will remain open-source forever. Everything we’ve built right now is part of that open-source edition. It will include all the features that an individual contributor needs to perform their integration, i.e., connectors, integration with the data stack (DBT, Airflow, etc.), incremental/change data capture, etc.

Then, there will be a licensed edition with two plans:

  1. A standard plan: hosting & management (premium support + SLA)

  2. An enterprise plan, with data quality & privacy compliance features, and SSO & user access management features

We will also work on a hosted version in the future. 

The last business model we have in mind is what we call “Powered by Airbyte.” We’d empower you to offer integrations to your own clients on your platform, using our white-labeled connectors through our API. 

What is the best way for a new developer to contribute to Airbyte?

The first thing would be to join our Slack. Then, you can check our documentation and understand the architecture of Airbyte. Finally, you could check out the good first issues we have tagged specifically for new contributors. 

If you plan to continue developing Airbyte, where do you see the project heading next?

Well, we want to become the open-source standard for data integration, and be agnostic in terms of sources and destinations, first and foremost. Only after we’ve become the standard will we start focusing on premium features, too. 

What motivates you to continue contributing to Airbyte?

The whole team is passionate about our mission to change how data is managed within companies. When you look at the data infrastructure value chain, the data warehouses / lakes and everything downstream of them are pretty mature now. That includes transformation with DBT and data analytics / visualization / business intelligence. However, everything upstream of the warehouse is not yet mature. For data integration, data lineage, data quality, privacy compliance, and data cataloging and discovery, we still need standards. We feel that data integration sits at the heart of all this, and that’s exciting to us. 



Also, don’t forget to join thousands of engineers in subscribing to a weekly roundup of the latest in open-source software, curated by an Amazon engineer.
