Console #25

Chaos, Dendrons, and Helixes

Jackson Kelley

Nov 01, 2020

TwitGrid

TwitGrid is a simple, broadsheet layout for reading Twitter feeds.

last commit: 2 days ago, first commit: Oct 19, 2020

https://twitter.com/vilimpoc

Typesense

Typesense is a fast, typo-tolerant search engine for building delightful search experiences.

last commit: 5 days ago, first commit: November 10, 2015

https://twitter.com/typesense

Rocketcrab

rocketcrab is a lobby service and launcher for mobile web party games.

last commit: 1 day ago, first commit: Jul 9, 2020

https://twitter.com/tannerkrewson

Dendron

Dendron is an open-source, local-first, markdown-based, note-taking tool built on top of VSCode.

last commit: 19 minutes ago, first commit: Apr 10, 2020

Apache Helix

Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes. Helix automates reassignment of resources in the face of node failure and recovery, cluster expansion, and reconfiguration.

https://twitter.com/TheASF

Awesome Chaos Engineering

Awesome Chaos Engineering is a curated list of awesome Chaos Engineering resources.

last commit: 7 days ago, first commit: Jul 26, 2017

Grafana Tempo

Grafana Tempo is an open source, easy-to-use and high-scale distributed tracing backend.

last commit: yesterday, first commit: January 16, 2020

https://twitter.com/grafana

Pressure Vessel

Pressure Vessel is Valve’s newly open sourced container code for its Linux games.

last commit: 5 days ago, first commit: Feb 14, 2015

https://twitter.com/valvesoftware

Pygame 2.0

Pygame is a free and open source cross-platform library for the development of multimedia applications like video games in Python. It released its 2.0 version this week 😊.

last commit: 6 hours ago, first commit: Nov 7, 2000

https://twitter.com/pygame_org

An Interview With Jason Bosco of Typesense

What is your background?

I’m Jason Bosco, currently Co-Founder of Typesense. I’m based out of Los Angeles.
Before Typesense, I worked at an e-commerce startup called Verishop, heading Engineering, Product and Design. Before that I worked at Dollar Shave Club - started out as the 2nd engineer, built v1 of some of the core systems like subscription billing, marketing automation, etc, then built teams around these systems and was VP of Engineering when I left.
I learnt to program when I was 11 years old! My first language was C, then picked up a bit of Java, Visual Basic, and C++. I then discovered web development through PHP and stuck with it for a few years. Then picked up Erlang out of necessity for a massively multiplayer game server we were working on (and loved it!). Finally stumbled on Ruby/Rails and have been primarily a Rubyist since 2012. Meanwhile the JS revolution exploded in the last couple of years and I finally found a chance to pick up some modern ES6 while working on the JS client for Typesense.
For web development, I’ve tried a couple of frameworks in Node, PHP and Ruby and I always find myself coming back to Rails. What I love about it is that I can focus on building the business logic rather than having to deal with plumbing different libraries together in other frameworks. Rails seems to come “pre-plumbed” which increases my productivity. I also like Erlang (though my knowledge is probably outdated now) because it exposed me to a whole new programming paradigm. No variables, only constants - that blew my mind! Pattern matching, spawning processes for everything, message passing - pretty cool features.

Why was Typesense started?

My co-founder, Kishore, and I have worked on search-related products in the past and we repeatedly saw first-hand the effort and complexity of using the various search engines that were available at the time - Elasticsearch, Solr, etc. So just out of intellectual curiosity, Kishore started looking into what goes into building a search engine from scratch and why it’s so complex. Turns out, search need not be that complex for 80% of the use cases. This effort slowly took the shape of a product with an API and we figured more people might find this useful. And Typesense was born. We put up our project on GitHub towards the end of 2015 and have been chipping away at it since.
In the meantime, Algolia started out as a venture-backed company, solving similar problems, but with a proprietary (and expensive-at-scale) search product. As a happy coincidence, Typesense ended up becoming a free and open source alternative to Algolia.

What would you say is the biggest difference between Typesense and Algolia?

Besides price, we’ve also solved a few key pain points that Algolia users typically run into. For example: Algolia requires separate indices for each sort order (which eats into your usage), whereas with Typesense you can configure sorting dynamically when you query. This allows for more flexible use cases. In general, many settings that can only be configured at the index level in Algolia can be configured dynamically at search time in Typesense.
Then there’s the flexibility you get with software you can download and run yourself anywhere. Unlike Algolia, you can run Typesense locally on your development machine while you develop against it. You can run it in your CI environment for integration tests if you need to. You can also deploy the same Docker image to your production k8s cluster. There is no proprietary vendor lock-in, since the entire codebase is open source.
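To make the point about query-time sorting concrete, here is a minimal sketch that only builds search URLs and does not contact a server. It assumes Typesense's documented search endpoint and its `q`, `query_by` and `sort_by` parameters; the host, the `products` collection and the `name`, `price` and `rating` fields are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, collection, query, query_by, sort_by):
    """Build a search URL where the sort order is chosen per request."""
    params = {"q": query, "query_by": query_by, "sort_by": sort_by}
    return f"{base_url}/collections/{collection}/documents/search?{urlencode(params)}"


# Hypothetical local server and collection; one collection serves both orders.
BASE_URL = "http://localhost:8108"
cheapest_first = build_search_url(BASE_URL, "products", "shoe", "name", "price:asc")
top_rated_first = build_search_url(BASE_URL, "products", "shoe", "name", "rating:desc")

print(cheapest_first)
print(top_rated_first)
```

Both URLs target the same collection; only the `sort_by` value differs, which is the contrast with keeping one index per sort order. A real request would also need an API key header.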

How did you and Kishore meet?

Kishore and I met during our CS undergrad!

Are there any overarching goals of Typesense that drive design or implementation?

The overarching goal with Typesense is to offer sub-50ms search and developer productivity. So with every feature we build, we pore over search performance on one side and also make sure we make it an out-of-the-box and intuitive experience for developers who use that feature. We also want to ensure that deploying Typesense to production is a simple and straightforward process.

What tradeoffs have been made in Typesense as a consequence of these goals?

To make search queries fast, Typesense holds the entire index in memory. But the tradeoff is that for petabyte-scale data (like log data), you’d need a ton of RAM to index it, which might make it cost prohibitive and so Typesense wouldn’t be a fit for these datasets.
With developer productivity, the tradeoff we find ourselves making is how configurable a feature should be. Too much flexibility and we end up with one too many config parameters (like Elasticsearch with a couple of thousand parameters); too little flexibility and we end up being useful only in a particular set of circumstances. Balancing this tradeoff with every feature is a nice challenge.

What is the most challenging problem that’s been solved in Typesense so far?

We spent about 4 months earlier this year focusing on improving indexing performance and concurrency. And we had to keep an eye on performance at every step in the pipeline, since an additional per-record latency of even 0.25ms can add up to several hundred seconds of lag in a large enough dataset.
All of this was challenging in that we had to go through about 25 iterations before we were satisfied with what we saw. We also had to put a couple of other key feature requests on hold while we worked on this, which was hard.
Overall, I’m really happy with where we landed. I tested a dataset with ~3 million records (Amazon product data) that was ~13GB on disk and was able to get a throughput of 250 concurrent search queries per second on a 16GB, 8-vCPU 3-node Typesense cluster. I was able to ingest this dataset in about 20 minutes into the cluster.
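The figures quoted above can be checked with a little arithmetic. This sketch uses only numbers from the interview (~3 million records, 0.25 ms of extra per-record latency, a 20-minute ingest) and assumes, as a simplification, that the extra latency is paid once per record with no overlap between records.

```python
RECORDS = 3_000_000        # "~3 million records" from the interview
EXTRA_LATENCY_MS = 0.25    # additional per-record latency
INGEST_MINUTES = 20        # reported ingest time for the dataset

# Assumption: records are processed one after another, so delays add up.
extra_seconds = RECORDS * EXTRA_LATENCY_MS / 1000
ingest_rate = RECORDS / (INGEST_MINUTES * 60)

print(f"extra indexing time: {extra_seconds:.0f} s")
print(f"average ingest rate: {ingest_rate:.0f} records/s")
```

Under that assumption, a quarter of a millisecond per record costs 750 seconds on this dataset, and the reported ingest works out to an average of about 2,500 records per second.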

Any interesting insights related to the concurrency optimizations?

We switched our memory allocator from malloc to jemalloc, improved our existing lock-free concurrency mechanism (using shards) to take advantage of all CPU cores, switched to Raft-based clustering so all nodes in the cluster can service reads and writes, and switched data ingests to use streams to handle large volumes of data.
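As a rough illustration of the general idea behind sharding work across cores, the sketch below partitions documents by a stable hash so that each worker owns one shard exclusively and no locks are needed while indexing. It is not Typesense's implementation: the shard count, the function names and the toy tokenizer are all assumptions made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical; for example one shard per CPU core


def shard_for(doc_id: str) -> int:
    """Map a document id to a shard with a stable hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def index_shard(docs):
    """Build one shard's index; no other worker touches this dict."""
    index = {}
    for doc_id, text in docs:
        index[doc_id] = text.lower().split()
    return index


def index_all(documents):
    # Partition first, so each worker owns its shard exclusively.
    partitions = [[] for _ in range(NUM_SHARDS)]
    for doc_id, text in documents:
        partitions[shard_for(doc_id)].append((doc_id, text))
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        return list(pool.map(index_shard, partitions))


shards = index_all([(f"doc-{i}", f"Title number {i}") for i in range(100)])
print([len(shard) for shard in shards])
```

In standard CPython, threads do not run CPU-bound work in parallel, so this only demonstrates the ownership pattern. A native implementation would map each shard to a thread that can actually use its own core.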

What were you using prior to migrating to Raft?

We previously had a primary-replica model where writes could only be sent to one node.

Did you implement Raft yourself or are you using a library?

Phew, no! Thankfully there's a battle-tested Raft library that we use. The library provides hooks into various life cycle events that you integrate with your application.

How is Typesense currently monetized?

We started monetizing Typesense this year. As an open source project, we wanted to make sure that it is also a sustainable revenue-generating business to ensure its longevity.
We initially tried the model of open-core and paid premium features. But we quickly realized that this model hurts adoption. We also had to repeatedly make the hard decision of whether a new feature will be part of the open core or if it will be a premium feature.
Based on feedback from users, we have now pivoted to offering a hosted SaaS version of Typesense, called Typesense Cloud (in public beta since Sep 2020), for those of us who would rather not manage any servers. We run the same open source version in our managed offering, so users can choose to either self host or let us manage their Typesense cluster for them. We have also open-sourced all previously-premium features.
The nice thing about the Open Source + Cloud model is that incentives are aligned well. We dog food our own product on Typesense Cloud and it is in our best interest to make it as easy as possible to operate our product.
In addition to Typesense Cloud which is a paid product, we have also started offering paid prioritized support for companies that need it. We help with best practices around deployment and use, troubleshooting, etc. given our experience running Typesense Cloud.

How do you balance your work on open source with your day job and other responsibilities?

I look at it as having 2-3 projects going in parallel and I try to maintain some variety in the type of work I do in each of those projects. For example, if I’m working on the visual designs in one project, I try to work on the dev pieces in another project, and then on customer support in a 3rd project. Having this variety helps keep me motivated on all projects.

What is the best way for a new developer to contribute to Typesense?

The best way would be to contribute an API library in your favorite language. We do have a REST API, but we also have official libraries in JS, PHP, Python and Ruby. We’d love to support libraries in more languages, but we’re not experts in all of them! So we welcome contributions.
Another way would be to use Typesense with the datasets you already have, see how it performs and share benchmarks and feedback with us.

Where do you see the project heading next?

We want it to be a self-sustaining revenue-generating open-source business, that provides powerful yet affordable search solutions for both the solo developer working on their side project, and the large team working on building an instant search experience in their product. So the next big goal is to get the word out that Typesense exists to our peers in the industry.

Do you have any suggestions for someone trying to make their first contribution to an open source project?

If you want to contribute to open source and don’t know where to start, just look at your project’s dependencies file (package.json, Gemfile, etc.). Pick a project from there that seems like a relatively small one. Look through their issue tracker for things that seem like low-hanging fruit and offer to help. If you can’t find one, start a conversation with the authors and ask them where you can help.
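For a Node project, the first step of that advice can be automated in a few lines. The sketch below lists every package named in a package.json so you can skim it for small projects worth a closer look. The sample manifest is hypothetical, and only the standard `dependencies` and `devDependencies` sections are read.

```python
import json


def list_dependencies(package_json_text: str) -> list:
    """Return the sorted names of all packages a package.json depends on."""
    manifest = json.loads(package_json_text)
    names = set()
    for section in ("dependencies", "devDependencies"):
        names.update(manifest.get(section, {}))
    return sorted(names)


# Hypothetical manifest used only for illustration.
SAMPLE = """
{
  "name": "my-app",
  "dependencies": {"left-pad": "^1.3.0", "express": "^4.17.1"},
  "devDependencies": {"mocha": "^8.2.0"}
}
"""

candidates = list_dependencies(SAMPLE)
print(candidates)
```

In a real project you would read the file from disk instead of using the inline sample, then look up each candidate's repository and issue tracker.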

An Interview With Chris of Stealth

What is your background?

Phew, my background is mostly in the field of co-evolutionary AI Engineering and/or Data Science, depending on the projects I am working on.

Back in the day I taught myself how to program. When we got our first Internet connection I visited a list of websites that my uncle wrote for me on a Post-It note. The first website was AltaVista, the second one was MIT OpenCourseWare. Well, then the “5GB flat-rate” got cancelled after around 6 days and my parents got mad but couldn’t blame me for wanting to study more online.

Soon I got into phreaking a bit, and then cybersecurity in general. I guess I was kind of a little script kiddie at times, too.

My passion for Software Architecture developed much much later when I was working at Zynga’s STG (Shared Tech Group), where we were building the game engine for our game studios. The engine was mostly built for isomorphic purposes, but supported all kinds of arcade games, too.

For me, the easiest language to write software in is ECMAScript, and then Rust for everything that needs native integration (e.g. via neon-bindings). I do value a good and strict linter, though, as (seemingly) opposed to the overall ecosystem in NPM. I kind of don’t like how NPM handles the headers/sources/ABI metadata problem; that’s why most of my software written in Node is dependency-free.

Why was Stealth started?

I had the idea to build my own web browser for a while now, and I built a couple of prototypical UIs in order to find out “how” I wanted it to be built. The initial inspiration for the architecture actually came from the UI animations in the series “Person of Interest”, aka my “research” prototype that’s buried somewhere in my GitHub profile.

When Google decided to change the manifest for a static list of URLs for AdBlockers (and @gorhill made a request to undo it) around February 2019, I decided that now it’s time to build my browser.

I also think that what people understand a browser to be is the wrong solution to the problem at hand.
A browser should be more like an assistant that helps its users to (re-)find knowledge that they tried to find or have found in the past. I always have literally thousands of bookmarks on every machine, and it’s hard to keep track of them - because the omnibars always find stuff online, but never bookmarked articles… which also annoys me a lot.

I also thought of scraping the bookmarks into an offline archive with some sort of web interface for it, but scraping is broken a lot, too. And I think that can be done better.

Are there any overarching goals of Stealth that drive design or implementation, and if so, what tradeoffs have been made in Stealth as a consequence of these goals?

I don’t want Stealth to be a browser that’s as stupid as others. And I definitely don’t want automation-by-programming, because I do not think this is a valuable goal for the future.

I didn’t want to go the IFTTT route, because usually those solutions tend to evolve into a “Fass ohne Boden” (a bottomless pit), as we Germans would say.

When it comes to trade-offs I currently skipped the whole underlying CSS/HTML/browser engine thing due to lack of time - and decided to instead go the headless browser route, which actually helped me a lot for the automation goals that I had in mind.

The current idea of Stealth is that the browser will allow users to record their own interactions with websites, such as keyboard, mouse, navigation, extraction and labelling of content with the idea that later on, something “bigger” can be built with that technology.

Imagine what people could build if the whole programming part of gathering knowledge on the internet isn’t necessary anymore, and say, an AI could read you bedtime stories from articles on your favorite blogs...

What is the most challenging problem that’s been solved in Stealth so far?

Personally, I would say the UI. But time-wise this definitely was networking and HTTP.

Rational people would assume that HTTP/1.1 is specified and every server uses the same mechanisms and handshakes and implements the protocol by-the-book. Well, turned out, this assumption is very wrong.

Transfer-Encoding: chunked and 206 Partial Content streams are implemented incorrectly on every single server out there. If you request 2 or 3 different chunks other than “0-”, for example, you will end up in nightmarish scenarios, because even nginx and Apache send non-spec-compliant responses. That’s the case for DNS over HTTPS servers, too.
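For readers unfamiliar with the wire format being discussed, here is a minimal sketch of decoding a chunked HTTP/1.1 body as the specification describes it: a hexadecimal size line, the chunk data, a CRLF, and a zero-size chunk at the end. It is not Stealth's code, it skips trailer fields, and it rejects anything that deviates, which is exactly where real servers cause trouble.

```python
def decode_chunked(body: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body (trailer fields are ignored)."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extensions".
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            return bytes(decoded)  # last chunk; any trailers are skipped
        chunk = body[pos:pos + size]
        if len(chunk) != size or body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        decoded += chunk
        pos += size + 2


print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```

A strict decoder like this fails on the first malformed chunk. A client that has to survive real-world servers needs to decide, case by case, how tolerant to be.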

I probably spent 5 months just on partial content streams, reading cURL in parallel to figure out what the common issues are - and trying to adapt the concept to my peer-to-peer “everything is a tunnel” concept.
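The partial content requests mentioned above rest on two headers: the client sends `Range` and a single-range 206 response answers with `Content-Range`. The sketch below formats the former and strictly parses the latter, following the standard header syntax. The helper names are hypothetical and this is not Stealth's implementation.

```python
import re


def build_range_header(ranges):
    """Format (first, last) byte offsets; last=None means "to the end"."""
    parts = [f"{first}-{'' if last is None else last}" for first, last in ranges]
    return "bytes=" + ",".join(parts)


CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Parse 'bytes first-last/total' into ints; total is None when '*'."""
    match = CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = match.groups()
    if int(last) < int(first):
        raise ValueError("last byte precedes first byte")
    return int(first), int(last), None if total == "*" else int(total)


print(build_range_header([(0, None)]))            # the simple "0-" request
print(build_range_header([(0, 99), (200, 299)]))  # a multi-range request
print(parse_content_range("bytes 200-299/1234"))
```

A multi-range request is normally answered with a multipart/byteranges body in which each part carries its own Content-Range, so a client has to validate every part separately.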

What’s your typical approach to debugging issues filed in the Stealth repo?

Due to networking being so challenging, I decided to write my own testrunner that is also implemented with ES6 modules.

Turns out, in the whole NPM ecosystem, there’s not a single testrunner that is both ESM compatible and doesn’t transpile itself via Babel before you execute it - which makes a “debugger;” statement pretty useless, especially in asynchronous code.

So in order to be able to test modules not only on a project-level basis, but also be able to inspect tests for single methods, I decided to build a testrunner that has stateless tests. This means that inside the “describe(...)” calls there have to be zero branches, and all “assert()” calls have to be stateless as well.

This allows very flexible debugging scenarios that help a lot with networking problems or when, say, a peer-to-peer service behaves differently than expected due to a state-related bug that appears only in some occurrences (e.g. TCP RST packets were injected, which happens a lot, too).
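One way to picture the "stateless tests" idea is a registry where every test is a flat sequence of assertions that can be run on its own by name. The sketch below is written in Python rather than ES6 and is not Stealth's testrunner; `describe`, `run` and `parse_port` are hypothetical names chosen for illustration.

```python
TESTS = {}


def describe(name):
    """Register a test under a name so it can be run on its own."""
    def register(func):
        TESTS[name] = func
        return func
    return register


def run(name):
    """Run one test in isolation and return its list of assertion results."""
    results = []
    TESTS[name](lambda actual, expected: results.append(actual == expected))
    return results


def parse_port(url):
    # Hypothetical function under test.
    return int(url.rsplit(":", 1)[1])


@describe("parse_port/http")
def _(assert_equal):
    # No branches, no shared state: every assertion is evaluated on every run.
    assert_equal(parse_port("http://localhost:8080"), 8080)
    assert_equal(parse_port("http://example.com:65432"), 65432)


print(run("parse_port/http"))
```

Because the test body has no branches, the number and order of assertion results is the same on every run, which makes a single failing entry easy to locate and step through in a debugger.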

Currently, I don’t have a plan for UI testing yet, because pretty much all UI testing frameworks need information about the DOM structure and work on a queryselector basis, including Puppeteer based ones… which I kind of don’t like.

Is Stealth intended to eventually be monetized if it isn’t monetized already?

Stealth is currently unfunded as a project. I thought a really long while about whether I should make Stealth itself proprietary or open-source, but finally decided to go with the GPL3 license.

My current idea for a potential business model is that I can provide a benefit for Stealth’s “Swarm Intelligence”.
So the idea here is to have something like “Knowledge Trackers” that are similar to the Torrent tracker concept. They don’t track the content itself, but only the metadata. Imagine a web service that allows users to share/seed and synchronize/leech the metadata of websites.

As “metadata” I see workflow automations, labeling instructions, etc. - so that companies or people that want to do web scraping on a large scale have access to the metadata that describes “how to extract data from a website” and in return could save them a lot of work for building e.g. scrapy-based scripts all the time.

How do you balance your work on open source with your day job and other responsibilities?

I would say that currently my priorities are set with Stealth and the Tholian Network, but of course, reality doesn’t allow that. Pre-COVID I was freelancing a lot to pay the bills, but post-COVID the situation is a bit different because the market is kind of crushed over here in Germany.

I applied for a lot of research funding grants, but most of them focus on very different areas and don’t see the value of a web browser that tries to automate the web, so there was no luck (yet).

When it comes to how I’m balancing all of it, my girlfriend would probably say that I’m not balancing it at all. I try to do a lot of sports and outside activities when I’m stuck with coding (so I go outside and hike the mountains a bit).

But usually when I have an idea for a solution I need to get it done before I can sleep again; that’s kind of a consequence of having ADHD.

What is the Tholian Network?

The Tholian Network is the peer-to-peer network I'm trying to build with the Stealth and Radar instances as infrastructure, where the idea is to enable users to share what they've automated on the web in a peer-to-peer manner. This is currently mostly in my head, as I've only begun to implement Radar as software and it's very very prototypical at this point.
Later there will be swarm intelligence based algorithms to verify authenticity of resources and to identify censorship or malicious behaviors among peers. Basically Radar tries to be a gateway/tracker of sorts that can show e.g. popular articles, popular topics, a censorship index, most contributing peers etc. and will very likely aim to be something like a decentralized way for communities to share access to knowledge.

What is the best way for a new developer to contribute to Stealth?

Currently, the most pressing issues are the parsing and filtering of content (such as the CSS and HTML parser).

But, in general, I would just invite people to the Tholian Beta group on Telegram. I promise we’re nice, and we don’t bite :)

Other than that, there are a lot of things left to do in the project. A lot of tests need to be implemented, and a lot of the Browser’s Internal Pages (“stealth:...”) are rough around the edges.

If you plan to continue developing Stealth, where do you see the project heading next?

Stealth as a project tries to enable users to automate their own activities on the web.

After Stealth understands how to do web scraping, the next task is workflow automation (including something like scheduled downloads, exports, searches, analytics etc.) with the idea that the user can teach the browser while not having to program it.

After that I always wanted to pursue the idea of “Stutter”, a simple programming language free of syntax clutter that uses normal sentences, with the underlying idea of being understandable and predictable for non-programmers.

I wanted to develop it mostly to enable voice-based programming. I have had a prototype for that lying around for a while that’s still using Tacotron, but I think in the context of web assistants, this can be a huge thing. It might be its own product, or part of Stealth, I don’t know yet.

Where do you see software development in general heading next?

With my past hobby project, lychee.js, I tried to solve programming by creating a backpropagated ES/HyperNEAT based AI that was able to learn how Composite Pattern (and Entity/Component concepts) work, and this was pretty successful in the ABB Research project I did where it was able to fully re-program and optimize a virtual and real factory with all its ~200 robot cells and literally millions of states, vectors, and methods it learned.

I’m not sure whether this is where the future is headed, and whether the “Logic of Logic” can be automated (it probably can’t) but I think that hybrid systems that have both supervised and unsupervised parts are a very likely scenario.

In the case of Stealth, I think this is where I want to go: using AI/ML techniques to ease its usage, to classify and label content, and to help the user out.

Where do you see open source heading next?

Phew, this is a tough question. I think that the majority of programmers will always use centralized solutions because they are more convenient to use and “to grasp”. We’ve seen Google Code, Savannah, SourceForge and others come and go (and come again), but I think that a real future-proof solution would be to decentralize it. But decentralization implies lack of a business model; that’s why we don’t have it (yet).

The gittorrent prototype of 2015 was pretty cool, but I don’t know for sure whether this can be done in a manner where it’s accessible and more importantly - constantly accessible in the future when the seeders stop seeding.

The web, just as everything in the digital world, has a huge amnesia problem. And it gets worse when thinking about the books, blogs, and articles I’ve assimilated in my youth to learn my profession - that have disappeared from the web already.

Even when you know the exact time when you read it and know the website’s domain it was hosted on - you’ll have a very hard time clicking through the thousands of snapshots on the web archive, just to find out that it was dynamically served content with some “?postid=123” in the URL, and therefore was archived as a 301 redirect error.

Do you have any suggestions for someone trying to make their first contribution to an open source project?

I don’t know, honestly, how to give a good recommendation on this. I got stuck with Arch (and AUR) not just because I like the project - but rather because I can fix things real quick once they break again.

If you find something that’s annoying you (maybe on a daily basis?) why not just try to figure out whether you can fix it or not?

In most cases it isn’t that complicated, it’s just that people need help for the little things, too. A lot of projects in the open source world are private projects that are unpaid, and a lot of maintainers would love to get more help and people that support their projects with their time.

A small contribution goes a long way. Docs missing? Tests missing? A little bug that they didn’t have the time to fix?

And this doesn’t have to be code only. A lot of times projects would love to have UI/UX designers on their team, too.