Scarf, CutiePi, and Signald
Jackson Kelley | May 2
Nanos is a free/open source unikernel that runs Linux applications faster and safer than Linux itself. Deploy your first unikernel in just a few minutes!
language: Haskell, stars: 113, watchers: 11, forks: 6, issues: 2
last commit: April 21, 2021, first commit: April 29, 2019
free-python-books is a list of Python books in English that are free to read online or download.
stars: 2140, watchers: 99, forks: 241, issues: 1
last commit: April 10, 2021, first commit: March 01, 2019
signald is a daemon that facilitates communication via Signal Private Messenger. Unofficial, unapproved, and not nearly as secure as the real Signal clients.
language: Java, stars: 79, issues: 70
last commit: May 1, 2021, first commit: December 26, 2017
CutiePi is an open-source Raspberry Pi tablet.
stars: 283, watchers: 28, forks: 30, issues: 5
last commit: May 01, 2021, first commit: April 18, 2019
Hey Avi! Thanks for joining us. Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, and what languages or frameworks do you like?
I grew up in Long Beach, CA but I’ve been living in the Bay Area for the last 11 years. I learned to code in college, I didn’t do any programming before that. I definitely had an interest in it for a long time, but it wasn’t until I got to college that I took my first computer science class. First I learned to code in Scheme, and then moved on to Python and Java, making my way through the UC Berkeley curriculum.
These days I love coding in Haskell, and I’m really into the ideas of the functional programming paradigm, as well as strong static type systems. I find that to be very fun. Developer tooling and web programming are currently my favorite domains.
What are your biggest influences as a developer?
My biggest influence is the Haskell community as a whole, which has really shaped the way I think about programming. Some key ideas: that we can have the computer check a lot of our mistakes ahead of time, before we push code out to production or even before we execute it, and that we can push those ideas as far as possible. The approach of taking ideas from category theory and applying them to programming design patterns really resonated with me as well. For me, this makes coding very fun, and the language has had a very big impact on how I think about coding overall.
What's an opinion you have that most people don't agree with?
One of the main ones is really foundational to Scarf. Many people hold that data collection in open source is inherently bad or something to avoid. This is a view I do not share.
I believe that the OSS community as a whole would benefit from having more data-sharing initiatives amongst ourselves. Having a better view into how different pieces of open source software are used can help maintainers do their maintenance much more effectively. When end users share info with maintainers about how they are using that software, it can inform and improve the software being delivered. My take is that the OSS community would be well-served by having open opinions about how we share usage data with each other, and that’s not something that a lot of people agree with yet. Over time we’re seeing that maintainers benefit from having better usage metrics about their software, and that’s something that can benefit everyone downstream from them.
What is one app on your phone that you can’t live without that you think others should know about?
One app that comes to mind is Living Worlds. It cost a dollar. Best dollar I ever spent on an app. It has this really nice procedurally generated pixel art for your phone background, and there’s one for every month. It’s live and it will update with the weather and time of day, etc. It’s all 8-bit pixel graphics, but it’s very beautiful, and I really like it.
If you could teach every 12 year old in the world one thing, what would it be and why?
I think I would teach all the ins and outs of using Google search really effectively, teaching people how to learn things on their own. I don’t know what the average Google-fu of a 12 year old is these days. I’d like to teach them to identify mistakes and misinformation in graphs and data. I’d teach kids data literacy in general, so they can learn how to learn on their own.
If I gave you $10 million to invest in one thing right now, where would you put it?
I’m compelled to say, put it in Scarf so we can continue changing open-source for the better!
I think the Covid situation in India is very alarming, and putting as many resources as we can into that right now would go the longest way towards saving lives.
What are you currently learning?
I’m learning a ton every day about open source as I dig into how we can help maintainers, open source projects, and companies. Aside from that, I’m always trying to learn more about cooking and baking. I especially love baking bread; there’s so much hidden complexity to it. And lately, I’m spending an inordinate amount of time learning to play chess.
What resources do you use to stay up to date on software engineering?
Largely Twitter. My Twitter feed is mostly people in the programming community, so I see what people are talking about. When people post a random new project or library, I’ll go check things out from there. Reddit is pretty helpful for this kind of thing as well, but I’d say Twitter is the main place I get my new programming stuff from.
What have you been listening to lately?
Music-wise, I listen to a lot of lo-fi hip-hop because I can listen to that while I work, and a lot of mathy-rock. One band that comes to mind that I’ve been listening to recently is a Japanese band called Toe. I also listen to a lot of economics podcasts these days - Planet Money, the Indicator, and Freakonomics are all very good.
How do you separate good project ideas from bad ones?
Often, the projects I’m working on are largely to scratch my own itches. And that’s a really good gut check, “is this something I want to exist?”, and if so, that’s how I know I want to work on it. The other thing I’d add is, if I’m spinning on a project, I’m quick to just give it a try. I just jump right in and see where it takes me. Sometimes it doesn’t lead places, sometimes it does, and I think just doing is the best way.
Who, or what was the biggest inspiration for Scarf?
Scarf started from my own pain points as an open source maintainer. I had projects that were out there and being used, but I didn’t know how, or how much, or by whom, unless they actually told me. End users were running into issues that were affecting them this way or that. Scarf was born from the awareness that open source maintainers don’t know who’s using their projects, or how they are being used, until problems arise. Often companies are using our code, or relying on it, and, with the correct incentive structures in place, they might be willing to pay for predictable maintenance of critical software, for instance. We see commercial opportunities for many open source projects that are not being taken advantage of because maintainers have no idea who’s using their code.
Are there any overarching goals of Scarf that drive design or implementation?
It’s all about connecting open source developers to their users, especially when those users are companies, but even when they’re not. Getting usage data into the hands of maintainers is an overarching goal. For instance, “What versions of your software are being adopted over time?” This information is super useful for maintenance, even separately from monetization. If we have more informed maintainers, we have better open source software for everyone.
The tradeoff is that we need to be very responsible and respectful of people’s privacy in how we do that data collection. We must be careful as we develop standards for these processes. Scarf is GDPR-compliant out of the box, because we don’t store PII like IP addresses, but we do store the metadata about them. These are the kind of tradeoffs that we think about: getting maintainers more data while protecting end-user privacy. We’ve prioritized getting the basics to maintainers first: basics that can help them with maintenance, and help them identify commercial companies that are using their work, without sacrificing any of that privacy. That’s the line that we are walking.
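As a rough illustration of keeping metadata about an IP address without storing the address itself, here is a minimal Python sketch. It is not Scarf's implementation: the `DownloadEvent` record, the `record_download` function, and the choice of which metadata to keep (IP version and whether the address is on a private network) are assumptions made for this example only.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadEvent:
    """Hypothetical stored record: coarse metadata only, never the raw IP."""
    package: str
    ip_version: int
    is_private_network: bool


def record_download(package: str, raw_ip: str) -> DownloadEvent:
    # Parse the address in memory, derive the metadata we want to keep,
    # and let the raw address go out of scope without persisting it.
    addr = ipaddress.ip_address(raw_ip)
    return DownloadEvent(
        package=package,
        ip_version=addr.version,
        is_private_network=addr.is_private,
    )


event = record_download("example-pkg", "192.168.1.10")
print(event)
```

The stored record carries enough information to answer aggregate questions while containing nothing that can be traced back to the original address.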
We keep our ear to the ground and listen to our users. When end users told us they didn’t care for the method of analytics we developed in one of our early projects, scarf-js, we adapted, and released a new project that was more in line with user expectations. Documentation Insights provides analytics for tracking how people are viewing your docs, even if they aren’t on your website. We also built the Scarf Gateway, which is based on the metadata around registries and the downloads to your packages.
What is the most challenging problem that’s been solved in Scarf, so far?
The Scarf Gateway has been quite a technical undertaking! The Gateway provides a central access point to your containers and packages on any registry, so you can seamlessly switch between them. The Gateway also provides maintainers with rich download statistics. Keeping that available and largely redirecting so we don’t have a big footprint on the install was a huge technical challenge to solve. When we started, we didn’t know the details of how these registries work internally and that’s what makes me really proud of the product we’ve released. You can see the details of the Scarf Gateway on our website. It’s definitely the most technically challenging thing that we’ve built to date.
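To illustrate the redirect-and-count idea described above, here is a minimal Python sketch. It is not how the Scarf Gateway is built: the `UPSTREAMS` table, the `registry.example.org` URL, and the `handle_request` function are all hypothetical. The point is only that a gateway can record a download and answer with a redirect, so the artifact bytes never pass through it.

```python
from collections import Counter
from typing import NamedTuple


class Redirect(NamedTuple):
    status: int
    location: str


# Hypothetical routing table: package name -> where it is really hosted.
UPSTREAMS = {
    "example-pkg": "https://registry.example.org/packages/example-pkg",
}

# In-memory download statistics, keyed by (package, version).
download_counts = Counter()


def handle_request(package: str, version: str) -> Redirect:
    base = UPSTREAMS.get(package)
    if base is None:
        return Redirect(404, "")
    # Count the download, then send the client to the upstream registry.
    download_counts[(package, version)] += 1
    return Redirect(302, f"{base}/{version}")


print(handle_request("example-pkg", "1.2.0"))
print(download_counts)
```

Switching registries in this sketch means editing one entry in `UPSTREAMS`; clients keep using the same gateway address.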
But in the long run, we’re also working on a new project called Nomia, which generalizes some of the ideas that have come out of the Nix community: things like content addressing, reproducibility in builds, and package management. Generalizing those concepts is also a technically challenging problem.
I’m also not aware of how these registries work internally. Would you be able to give a brief explanation?
Registries are the services where developers upload and host their bundled software packages so they can be downloaded, searched, etc. They are ultimately what’s responsible for turning a package identifier, e.g. ‘email@example.com’, into a downloadable artifact and associated metadata that your package manager can understand to build your project and its dependencies. Registries vary between programming languages and operating systems in their implementation of everything from package formats, auth, validation, security, API functionality and more, but they are largely accomplishing the same set of tasks. Packages can be as simple as .zip files, or more complex, like a Docker container with many layers that might need to be fetched from different places.
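The core job described above, turning an identifier into an artifact plus metadata, can be sketched in a few lines of Python. This is a toy model, not any real registry's API: the `name@version` identifier format, the `INDEX` contents, the `registry.example.org` URL, and the `resolve` function are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    url: str             # where the package manager downloads the bundle
    dependencies: tuple  # identifiers the package manager must resolve next


# Hypothetical registry index keyed by (name, version).
INDEX = {
    ("example-pkg", "1.2.0"): Artifact(
        url="https://registry.example.org/example-pkg-1.2.0.zip",
        dependencies=("example-dep@0.4.1",),
    ),
}


def parse_identifier(identifier: str) -> tuple:
    # Split on the last "@" so a name that itself starts with "@" still works.
    name, sep, version = identifier.rpartition("@")
    if not sep or not name or not version:
        raise ValueError(f"expected name@version, got {identifier!r}")
    return name, version


def resolve(identifier: str) -> Artifact:
    # Raises KeyError if the registry does not know this package version.
    return INDEX[parse_identifier(identifier)]


print(resolve("example-pkg@1.2.0"))
```

A real registry layers auth, validation, and search on top of this lookup, but the identifier-to-artifact mapping is the part every package manager depends on.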
Are there any competitors or projects similar to Scarf? If so, what were they lacking that made you consider building something new?
In terms of the monetization of open source, I see some similarity with Tidelift. However, Scarf itself is not focusing on the support marketplace side of things at this time, though it’s something we will do more of in the future.
For me, Tidelift could not address any of my code, because I was writing developer tools that are installed on developers’ machines rather than checked into a project manifest file. My command line tool would never make it into a package.json of a project, or anything like that, so Tidelift would never really see it. I love what Tidelift is doing, but I would not be able to make money from Tidelift for the tools I was building myself.
The more immediate problem that Scarf is focusing on—in the space of software distribution and observability—is something that no one is working on in the way that we are. The very best registry, in terms of sharing data with maintainers, is probably crates.io from the Rust crowd. It shows version adoption, and things like that. And yet, there’s so much data the registries see that they don’t share with maintainers, and that fundamental problem is really what got us started on Scarf. Distribution channels are a black box for people building the software, and that’s something no one else is addressing as we are.
What was the most surprising thing you learned while working on Scarf?
The fact that people were ok with the kinds of data that Google Analytics collects but not ok with the kind of data that scarf-js or npm collects was totally unexpected for us. Scarf never collects personally identifying information! We learned to never make assumptions about people’s expectations when it comes to privacy. Still, it was surprising to learn that people were more ok with what became our Documentation Insights product than with scarf-js. This was definitely something we didn’t anticipate.
Again, we are pushing the envelope with the kind of data sharing initiatives in open source, and we will continue to listen to what people say. We want to make products that support maintainers and respect end users, and that’s what’s landed us with the set of tools we currently offer.
What is your typical approach to debugging issues filed in the Scarf repo?
These days we have introduced a lot of observability tools. Good old print statements take you a long way in troubleshooting something like that. We use other observability tools as well. For instance, we just started using Honeycomb.io for backend observability, which has been really great; our team loves it. We use whatever instrumentation we can to gain more observability for better debugging. One of the reasons we love coding in Haskell and having really strong type systems is that we catch a lot of errors ahead of time, and we can catch many head-scratching bugs before we even make them in the first place. This is not to say we don’t have bugs, but when we do, observability tooling gives us a picture of what’s going on and what needs to be fixed.
What is the release process like for Scarf?
Currently we’re deploying Scarf about once a week, but we’re working toward fully continuous deployments. We use Terraform to do our infrastructure provisioning. We use tools like Nix for our build and CI pipelines. Our process: whenever we want to release, we test whatever is on our main line of code in our staging environment, and we have a separate production branch.
These processes are likely to change very soon, as the team grows. Right now, with our manual deployment and release process, we are typically testing in the staging environment, looking at the diff into prod, and then deploying it with Terraform. And then Terraform tells our Kubernetes cluster what versions of all the different containers of our services to deploy, and then the code is pushed that way.
Do you eventually intend to monetize Scarf, and if so, how?
Yes, Scarf is a for-profit business and we’ll definitely be monetizing aspects of the things we build. The Scarf Gateway is very amenable to traditional SaaS business models. We will be charging for premium features on top of the Gateway, such as automatic registry-mirroring and failover. We’ll have copies of all your containers cached and ready to go, and we can serve those in the event that your registry goes down, and that’s a service we’ll charge for.
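The mirroring-and-failover feature mentioned above follows a familiar pattern: try the primary registry first, and serve from a cached copy if it is unreachable. Here is a minimal Python sketch of that pattern. The `fetch_with_failover` function and the two fake sources are hypothetical stand-ins, not Scarf code, and treating `ConnectionError` as the signal for "registry is down" is an assumption.

```python
from typing import Callable, Sequence


def fetch_with_failover(
    sources: Sequence[Callable[[str], bytes]], artifact: str
) -> bytes:
    """Try each source in order and return the first successful result."""
    for fetch in sources:
        try:
            return fetch(artifact)
        except ConnectionError:
            # This source is unavailable; fall through to the next one.
            continue
    raise ConnectionError(f"all {len(sources)} sources failed for {artifact}")


def primary_registry(artifact: str) -> bytes:
    # Simulates an outage at the upstream registry.
    raise ConnectionError("primary registry is down")


def cached_mirror(artifact: str) -> bytes:
    # Simulates serving a previously cached copy.
    return b"cached bytes for " + artifact.encode()


print(fetch_with_failover([primary_registry, cached_mirror], "example-pkg:1.2.0"))
```

Because the caller only sees the returned bytes, an outage at the primary stays invisible to whoever is installing the package.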
We envision charging for additional tooling and features on top of the data that you have in your Scarf account. In the long term, our main goal is connecting open source companies and maintainers to their commercial users. We can help with support agreements, facilitation, license verification and enforcement, and more. Our goal is to be there from delivery to maintenance, for support and commercialization for any given piece of software. The Scarf toolchain will be the platform facilitating every aspect of distribution, and that’s how Scarf will make money in the long run.
How do you balance your work on open-source with your day job and other responsibilities?
This is really hard to do. The more popular your project gets, the more of your time it’s likely going to take. One solution is to bring contributors on board and build out your community so that it’s not just on you. It’s always helpful to spell out on your project’s README the best ways for people to contribute to your project. Make it as easy for them as you can. Label the open issues that are best for new contributors so they are easy to find.
The other way - this is what we’re addressing with Scarf - is to have more observability into what’s actually happening, so you can be more proactive. If you have a better sense of how your users are interacting with your project you can prioritize better. Prioritization is key: make sure you’re focusing on the right things for your open source project and that you’re not building stuff that won’t be used. In general, ruthlessly cutting scope and doing less work are great ways to balance the workload.
Or, start a company. If you’re looking for balance in your life, definitely start a company. /s
Do you think any of your projects do more harm than good?
No, I don’t think so. At Scarf, we are very, very careful with the data we collect, the data we store, and the data we expose. I assert that none of the information we’re exposing to maintainers does harm. This data does not expose anyone personally, and I believe it is a very good force in open source for maintainers to understand how their projects are being used.
People sometimes make the argument that introducing money into open source at all is something that can lead to misaligned incentives. Where we come from is that work should always be compensated, one way or another. People should own their work and people should be compensated for their work. That’s an ideal I hold true.
Where do you see open-source heading next?
I see a few big changes on the way. One that I’ve been underlining for this whole time: Parts of the software stack have been so far very ‘black-boxy’ to people who are building on it. The distribution layer of open source is one of them. I think we’re moving in the direction of greater openness.
Software development in general is spreading around the world more rapidly than it has in years past. A recent GitHub blog post shows that their biggest developer growth is in India. As we move forward, I expect to see more diversity in software development, which will be great. Also, as we build and ship applications, the need for globalized infrastructure will become more important. In the post-pandemic world, your customers and users may become a lot more globally distributed than you might think.
We’ll continue to see more participation from non-engineers in software development, with the rise of no-code and low-code tools. I’m thinking about how no-code builders and actual developers interact, and how we get designers and all other creators more integrated into the actual development of software. Open source is a lot more than just the source code, and integrating more of the people involved with the actual life cycle of the software is going to go a long way. That’s definitely the direction we’re headed.
Do you have any suggestions for someone trying to make their first contribution to an open-source project?
Just go for it! Even if you’re submitting a fix to a typo in a readme, just send the pull request! The barrier to entry is relatively low for fixing documentation - a lightweight way to get started that people appreciate. Another good approach is getting in and talking to people: showing up at a new project that’s interesting to you and saying, “I’m new, is there anything I can help with?” From a maintainer’s perspective, people are thrilled to help new folks get involved - one of the best parts of working in open source is getting new developers in, and helping them with their first contribution. People are happy to help when you’re wanting to learn and get involved.