Console 46

Graphtage, Crums, and Outrun

Mar 28, 2021

Sponsorships

If you, or someone you know, is interested in sponsoring the newsletter, please reach out at console.substack@gmail.com

crums-pub

crums-pub includes the data model, parsers, and utilities for crums.io.

language: Java, stars: 0, watchers: 1, forks: 0, issues: 0

last commit: February 13, 2021, first commit: January 04, 2021

outrun

Execute a local command using the processing power of another Linux machine.

language: Python, stars: 2743, watchers: 34, forks: 50, issues: 10

last commit: March 22, 2021, first commit: July 14, 2020

https://twitter.com/overv

awesome-compose

awesome-compose is a list of Docker Compose samples curated by Docker.

stars: 9390, watchers: 246, forks: 1064, issues: 25

last commit: March 22, 2021, first commit: February 12, 2020

https://twitter.com/Docker

graphtage

Graphtage is a semantic diff utility and library for tree-like files such as JSON, JSON5, XML, HTML, YAML, and CSV, open-sourced by Trail of Bits.

language: Python, stars: 1904, watchers: 39, forks: 30, issues: 15

last commit: February 28, 2021, first commit: March 21, 2020

https://twitter.com/trailofbits

Help Wanted

If you’re interested in posting a help wanted ad for your project to thousands of open-source developers, send an email to console.substack@gmail.com

An Interview With Babak of Crums.io

Hey Babak! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like, etc?

I’ve been coding for well over 20 years now, but I came late to programming professionally (I’m that old). In the 80s when I was in school (Applied & Engineering Physics), programming was already becoming a requirement to graduate. I didn’t take programming seriously then: the computer (mainframe) seemed like just this powerful calculator that shipped with a manual I wasn’t about to dedicate much time to; and inside, the manual described this thing called a programming language. I’d learn just enough to get the job done (I’m lazy that way). Then in the mid 90s as the web was taking off my view changed entirely. Whatever I’ve learned since, I learned online.
Professionally, I’ve coded mostly backend systems in C++, Java, some Scala.

How do you feel about Scala?

Mixed feelings.
Pluses:
Incredibly cool, powerful, and expressive
Great respect for the concept and the engineering
Great for writing DSLs
Minuses:
Cognitive load, the price of expressiveness
Software is as much about telling the machine what to do, as it is communicating an idea to the community.

What’s your most controversial programming opinion?

Java is not a terrible language.

If you could teach every 12 year old in the world one thing, what would it be and why?

Ok, this is a fun challenge (they’re 12-year-olds!): You have many superpowers. To achieve a superpower, you must hone it, work it. Find yours, it’s most likely different from your friends’. (If these superpowers were the same in everyone, they wouldn’t be superpowers, would they?)

If I gave you $10 million to invest in one thing right now, where would you put it?

I’d put it in Crums, obviously :) I have a backlog of products and features to implement. I chip away at these regardless, but I recognize I can’t (I mean, shouldn’t) do everything: a small dedicated team would pick up the pace. Then there’s the business development side, strategy, etc. I have some ideas, talking to prospects, exploring possible verticals, etc. Though I wear this hat as well, it doesn’t fit me well. I need help here too.

What are you currently learning?

I’m learning abstract algebra. I’m not very good at it, but I want to understand the DL papers. Also learning LaTeX since I’m writing a paper.

What are you writing the paper on?

On hashpointer-based data structures. So, a blockchain is really a linked list and a Merkle tree is really just a binary tree using hashpointers. It's not a search tree, it's kinda the opposite. The paper explores the properties of common data structures, where pointers and references are replaced by hashpointers.
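To make the "blockchain is really a linked list" remark concrete, here is a minimal Java sketch of a list whose back-pointers are hashes instead of references. It is an illustration only, not code from the paper or from Crums: the class name HashChain, the methods append, tamper, and verify, and the choice of SHA-256 with an all-zero genesis pointer are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a linked list whose "pointers" are SHA-256 hashes. */
final class HashChain {

    /** A node holds its payload and the hash of the node before it (the hashpointer). */
    static final class Block {
        final byte[] prevHash;
        final byte[] data;

        Block(byte[] prevHash, byte[] data) {
            this.prevHash = prevHash;
            this.data = data;
        }

        /** Hash over the fixed-length hashpointer followed by the payload. */
        byte[] hash() {
            try {
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                md.update(prevHash);
                md.update(data);
                return md.digest();
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException("SHA-256 unavailable", e);
            }
        }
    }

    private final List<Block> blocks = new ArrayList<>();

    /** Appends a block that points (by hash) at the current last block. */
    void append(String text) {
        byte[] prev = blocks.isEmpty()
                ? new byte[32] // assumed convention: all-zero pointer for the first block
                : blocks.get(blocks.size() - 1).hash();
        blocks.add(new Block(prev, text.getBytes(StandardCharsets.UTF_8)));
    }

    /** Demo only: rewrites a payload without repairing the later hashpointers. */
    void tamper(int index, String text) {
        Block old = blocks.get(index);
        blocks.set(index, new Block(old.prevHash, text.getBytes(StandardCharsets.UTF_8)));
    }

    /** Walks the list and checks that every hashpointer matches its predecessor. */
    boolean verify() {
        byte[] expected = new byte[32];
        for (Block b : blocks) {
            if (!Arrays.equals(b.prevHash, expected)) {
                return false;
            }
            expected = b.hash();
        }
        return true;
    }

    public static void main(String[] args) {
        HashChain chain = new HashChain();
        chain.append("first");
        chain.append("second");
        chain.append("third");
        System.out.println("intact chain verifies: " + chain.verify());
        chain.tamper(0, "forged");
        System.out.println("tampered chain verifies: " + chain.verify());
    }
}
```

Changing an early block changes its hash, so the pointer stored in the next block no longer matches. Note that in this sketch a change to the very last block goes unnoticed unless the hash of the head is recorded somewhere else.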

What resources do you use to stay up to date on software engineering?

I tend to focus on topics first, then on general software engineering. I think if you do this, you automatically become aware of the new tools and technologies in that problem domain. So, for example, back when I was interested in (and made a living from) search and IR, I knew about and used the Apache Lucene project early on. Ditto with ML: Hadoop and Spark. I haven’t been focused on ML/AI lately, so I’ve missed the cool Python ecosystem that has evolved in recent years.

Where did you work on search?

I worked at a company whose main product, Alchemy, was a line-of-business application whose main purpose was to ingest data, and save it to "cold-storage" (CDs). It sported an inverted index for search. Very cool product for its time.

Later, I worked at a startup that managed email gateways for enterprises. I worked on a hosted service that saved and managed email content with a retention policy (when to delete) and search.

How do you separate good project ideas from bad ones?

I think a good indication of whether a project has legs is if it can garner mindshare. But, I’d phrase it as “separate the so-so ideas from the really good ones”, because a project is seldom a bad idea, imo: at the very minimum you learn something and have something interesting to show and talk about in the next job interview. Projects are a bit like programming. Iterative. I aim to build on the successes and avoid the mistakes of the previous round.

Why was Crums started?

I started the project because I perceived a growing need to establish the provenance of data. We are about to be buried under a deluge of dis- and misinformation, deep fakes, etc. and one way to combat it, my thinking went, is to systemize a way that allows one to distinguish which version came first.
That was the original impetus. But the more I work on it, the more things I discover that can be usefully anchored to digital timestamps. To get a sense of how it works, see this overview and the REST API documentation.

Who, or what was the biggest inspiration for Crums?

Bitcoin. About 6 years ago I started learning about blockchain technologies, not so much as a user but to satisfy my curiosity about how they work. I’m into algorithms and data structures, and became immediately enamored with these powerful, new (actually old) concepts I was learning: hashpointers, tamper proof structures, Merkle trees, hash proofs, and the like. Since then, I keep finding use for these same techniques in everyday code.

Are there any overarching goals of Crums that drive design or implementation?

1) The service must scale and never go offline.
2) The digital records the service vends must be verifiable even if the service does go offline.
3) The service must maintain an audit trail of everything it does.

What trade-offs have been made in Crums as a consequence of these goals?

Since there’s a virtually unlimited supply of user-submitted hashes for the service to potentially witness, the service cannot remember every hash it has ever seen indefinitely. Instead, it employs a cookie data model: it puts the onus on the user to save the hash’s crumtrail (the witness record).
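One common way to let a user hold a self-verifying record is a Merkle proof: the service publishes a single root hash, and each user keeps the short path of sibling hashes linking their own item to that root. The Java sketch below illustrates that general technique only. It is not the actual crumtrail format or the Crums API; the names MerkleSketch, Step, build, and verify, the 0x00/0x01 prefixes for leaf and node hashes, and the rule of promoting an unpaired node unchanged are all assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a Merkle proof; not the real crumtrail format. */
final class MerkleSketch {

    /** One proof step: a sibling hash and which side of the path it sits on. */
    static final class Step {
        final byte[] sibling;
        final boolean siblingOnLeft;

        Step(byte[] sibling, boolean siblingOnLeft) {
            this.sibling = sibling;
            this.siblingOnLeft = siblingOnLeft;
        }
    }

    private static byte[] sha256(byte prefix, byte[]... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(prefix); // keeps leaf hashes and node hashes in separate domains
            for (byte[] p : parts) {
                md.update(p);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    static byte[] leafHash(byte[] item) {
        return sha256((byte) 0x00, item);
    }

    static byte[] nodeHash(byte[] left, byte[] right) {
        return sha256((byte) 0x01, left, right);
    }

    /** Returns the root hash and fills proofOut with the path for items[index]. */
    static byte[] build(List<byte[]> items, int index, List<Step> proofOut) {
        if (items.isEmpty() || index < 0 || index >= items.size()) {
            throw new IllegalArgumentException("bad leaf count or index");
        }
        List<byte[]> level = new ArrayList<>();
        for (byte[] item : items) {
            level.add(leafHash(item));
        }
        int pos = index;
        while (level.size() > 1) {
            int sib = pos ^ 1; // neighbor in the same pair
            if (sib < level.size()) {
                proofOut.add(new Step(level.get(sib), sib < pos));
            }
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                next.add(i + 1 < level.size()
                        ? nodeHash(level.get(i), level.get(i + 1))
                        : level.get(i)); // unpaired node is promoted unchanged
            }
            pos /= 2;
            level = next;
        }
        return level.get(0);
    }

    /** Recomputes the root from the item and its proof; needs no server. */
    static boolean verify(byte[] item, List<Step> proof, byte[] root) {
        byte[] h = leafHash(item);
        for (Step s : proof) {
            h = s.siblingOnLeft ? nodeHash(s.sibling, h) : nodeHash(h, s.sibling);
        }
        return Arrays.equals(h, root);
    }

    public static void main(String[] args) {
        List<byte[]> items = new ArrayList<>();
        for (String s : new String[] {"a", "b", "c", "d", "e"}) {
            items.add(s.getBytes(StandardCharsets.UTF_8));
        }
        List<Step> proof = new ArrayList<>();
        byte[] root = build(items, 2, proof);
        System.out.println("proof for item 2 verifies: " + verify(items.get(2), proof, root));
    }
}
```

The proof grows with the logarithm of the number of items, which is what makes it practical for the user, rather than the service, to store it.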

What is the most challenging problem that’s been solved in Crums, so far?

The backend is on its 2nd iteration. The first version was more a PoC; this second version scales better. It’s composed of multiple independent daemons that can be deployed in fewer or greater numbers to meet demand. This server side code is not open sourced at this time.

Why isn’t it open sourced?

Because it invites trouble :D. I mean, I aim to write solid code, but I'm sure there's an attack vector in there I haven't thought about. Generally, I feel you don't share production server code, especially if it's a one-off and not widely replicated.

I noticed you did have some Crums code on GitHub though. How did you decide what could be open-sourced and what couldn’t?

I'm assuming you're asking what we keep private, what goes public, because it's actually all on GitHub. Crums is composed of multiple submodules, each in their own GitHub repo.

Some, like the Merkle tree library, are sort of independent and can be configured for other use cases (different hashing algo, different leaf byte length, etc) than Crums. So these have their own repo.

The repos are gathered here https://github.com/crums-io

My general sense is to make as much of the code open-source as practically possible. Mindshare, if you can get it, enjoys a premium over IP.

It sounds like the individual components of the server are open-sourced, but putting them all together into 1 repo is not. Is that correct?

Yes. But note, the real reason the components have their own repos is that some of those components are modular enough to have a life of their own.

The io-util lib, for instance, is a general purpose library I've built, used, and improved over the years. Kinda like the way traditional woodworkers would build their own carpentry tools (https://flylib.com/books/en/1.315.1.29/1/), except as programmers we understand our toolbox gets better when we share.

What was the most surprising thing you learned while working on Crums?

How well-suited the techniques of blockchain are to concurrent environments such as the server side. Once you can efficiently prove correctness (a side effect of tamper-proofness), many of the challenges of handling concurrency fall by the wayside.

What is your typical approach to debugging issues filed in the Crums repo?

The usual: write up the simplest unit test that would have captured the bug. If necessary, instrument the code to see what’s going on. (Stepping through the debugger on the server side code is often not practical.) Fix and document the fix. Not necessarily in that exact order.

What is the release process like for Crums?

I’ve done only one release. The “process” needs work: it’s evolving :) The only thing I can say: it’ll be often.

Is Crums intended to eventually be monetized, if so, how?

The goal is to somehow monetize. Not only because we love money (we do), but also because it’s the most viable path to sustaining and growing a project. Presently, we’re beginning to do consulting work for businesses that are required to maintain audit trails (for regulatory, financial, or other reasons). We don’t derive much revenue from this (we sent out our first bill last month!).
I don’t see consulting work as our long term business strategy: the goal now is to sign up early adopters, and explore how best to align the product to meet their needs. Longer term, I envision offering hosted, multi-tenant, cloud-based solutions: I think this scales better from a business standpoint. That said, I’m no expert in business development, recognize I’m not very good at selling, and am open to listening and changing course.

How do you balance your work on open-source with your day job and other responsibilities?

I steal time from other things. I’m not sure you’d call it balanced. I push it forward, even if it’s a little bit, every day. The nice thing about software is that it is cumulative. I’m in a bit more of a hurry, but I picture it like Andy Dufresne carrying his pocketful of rock dust to the prison yard every day (Shawshank Redemption reference).

Do you think any of your projects do more harm than good?

I hope not: the intent is the opposite, kinda to weed out harm and falsehoods by making it easy to prove certain truths. But then there’s the “law of unintended consequences”.

What is the best way for a new developer to contribute to Crums?

Connect with me on GitHub, or shoot me an email (babak at crums .io)

Where do you see the project heading next?

I have a working whitepaper on a broader vision. It works out how a cooperating collective of businesses can trust and verify “intra-collective” transactions by managing their business ledgers using tamper proof methods. This too is an iterative process, with ideas fleshed out in code, and then code informing the paper.
Towards this end, the next release will include a client-side library for tamper proof ledgers. It models an append-only table in a flexible way, with verifiable history. It’ll have a demo app, a Journal, that does this for a text file. It allows its owner to tear out and make public any specific line from the journal, while keeping the rest of its contents private.

What motivates you to continue contributing to Crums?

I feel like I’ve discovered a pasture sitting behind a nearby hill. There’s a lot of low hanging fruit.

Where do you see software development in general heading next?

I think software development will creep into every profession. It will be a necessity even if you're a surgeon.

Where do you see open-source heading next?

I think it’s fast becoming the de facto way to distribute and release core tools and products. There’s little opportunity (business-wise) in tools: it’s in services.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

Find a project you have, or will have, use for. Use it. You’ll find many things that could be better or are broken. Fix the easiest. Contribute. Repeat.

Like what you saw here? Why not share it!

Or, better yet, share Console!


Also, don’t forget to subscribe to get a list of new open-source projects curated by an Amazon software engineer directly in your email every week.

Console by CodeSee.io
