Featuring Dub, Whisper, and Typesense
Discover hundreds of cool open source projects on opensourcehub.io and list your own project to connect with the Open Source Hub community!
Dub is an open-source link shortener with built-in analytics and free custom domains. Dub is built with Vercel Edge Functions and Upstash Redis.
Whisper is a general-purpose speech recognition model trained and open sourced by OpenAI on a large dataset of diverse audio. It can perform multilingual speech recognition as well as speech translation and language identification.
Typesense is a fast, typo-tolerant search engine for building delightful search experiences. It is an open source alternative to Algolia and Elasticsearch.
Join thousands of other open-source enthusiasts and developers in the Open Source Hub Discord server to continue the discussion on the projects in this week's email!
Not subscribed to Console? Subscribe now to get a list of cool open-source projects and an interesting interview every week!
Hey Jason! Thanks for joining us! Let us start with your background. Where are you from, where have you worked in the past, how did you learn to program, and what languages or frameworks do you like?
I’m Jason Bosco, currently Co-Founder of Typesense. I’m based out of Los Angeles.
Before Typesense, I worked at an e-commerce startup called Verishop, heading Engineering, Product and Design. Before that I worked at Dollar Shave Club, where I started out as the 2nd engineer, built v1 of some of the core systems like subscription billing, marketing automation, etc., then built teams around these systems, and was VP of Engineering when I left.
I learnt to program when I was 11 years old! My first language was C, then picked up a bit of Java, Visual Basic, and C++. I then discovered web development through PHP and stuck with it for a few years. Then picked up Erlang out of necessity for a massively multiplayer game server we were working on (and loved it!). Finally stumbled on Ruby/Rails and have been primarily a Rubyist since 2012. Meanwhile the JS revolution exploded in the last couple of years and I finally found a chance to pick up some modern ES6 while working on the JS client for Typesense.
For web development, I’ve tried a couple of frameworks in Node, PHP and Ruby and I always find myself coming back to Rails. What I love about it is that I can focus on building the business logic rather than having to deal with plumbing different libraries together in other frameworks. Rails seems to come “pre-plumbed” which increases my productivity. I also like Erlang (though my knowledge is probably outdated now) because it exposed me to a whole new programming paradigm. No variables, only constants - that blew my mind! Pattern matching, spawning processes for everything, message passing - pretty cool features.
Why was Typesense started?
My co-founder, Kishore, and I have worked on search-related products in the past, and we repeatedly saw first-hand the effort and complexity of using the various search engines that were available at the time: Elasticsearch, Solr, etc. So just out of intellectual curiosity, Kishore started looking into what goes into building a search engine from scratch and why it’s so complex. Turns out, search need not be that complex for 80% of the use cases. This effort slowly took the shape of a product with an API, and we figured more people might find this useful. And Typesense was born. We put up our project on GitHub towards the end of 2015 and have been chipping away at it since.
In the meantime, Algolia started out as a venture-backed company, solving similar problems, but with a proprietary (and expensive-at-scale) search product. As a happy coincidence, Typesense ended up becoming a free and open source alternative to Algolia.
What would you say is the biggest difference between Typesense and Algolia?
Besides price, we’ve also solved a few key pain points that Algolia users typically run into. For example, Algolia requires a separate index for each sort order (which eats into your usage), whereas with Typesense you can configure sorting dynamically when you query. This allows for more flexible use cases. In general, many settings that can only be configured at the index level in Algolia can be configured dynamically at search time in Typesense.
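To make the query-time sorting point concrete, here is a minimal sketch that only builds search URLs and does not contact a server. The host, the collection name (`products`) and the field names (`title`, `price`, `rating`) are hypothetical; `q`, `query_by` and `sort_by` are Typesense search parameters.

```python
from urllib.parse import urlencode

# Hypothetical local server and collection, used only for illustration.
BASE_URL = "http://localhost:8108"
COLLECTION = "products"


def search_url(query: str, sort_by: str) -> str:
    """Build a search URL whose sort order is chosen per request."""
    params = {
        "q": query,
        "query_by": "title",  # assumed text field in the collection
        "sort_by": sort_by,   # e.g. "price:asc" or "rating:desc"
    }
    path = f"/collections/{COLLECTION}/documents/search"
    return f"{BASE_URL}{path}?{urlencode(params)}"


# Same collection, two sort orders, no extra index to maintain.
cheapest_first = search_url("shoes", "price:asc")
best_rated_first = search_url("shoes", "rating:desc")
```

A real request would also need to send an API key in the `X-TYPESENSE-API-KEY` header.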
Then there’s the flexibility you get with software you can download and run yourself anywhere. Unlike Algolia, you can run Typesense locally on your development machine while you develop against it. You can run it in your CI environment for integration tests if you need to. You can also deploy the same Docker image to your production k8s cluster. There is no proprietary vendor lock-in, since the entire codebase is open source.
How did you and Kishore meet?
Kishore and I met during our CS undergrad!
Are there any overarching goals of Typesense that drive design or implementation?
The overarching goal with Typesense is to offer sub-50ms search and developer productivity. So with every feature we build, we pore over search performance on one side and also make sure it is an intuitive, out-of-the-box experience for developers who use that feature. We also want to ensure that deploying Typesense to production is a simple and straightforward process.
What tradeoffs have been made in Typesense as a consequence of these goals?
To make search queries fast, Typesense holds the entire index in memory. The tradeoff is that for petabyte-scale data (like log data), you’d need a ton of RAM to index it, which might make it cost-prohibitive, so Typesense wouldn’t be a fit for those datasets.
With developer productivity, the tradeoff we find ourselves making is how configurable a feature should be. Too much flexibility and we end up with one too many config parameters (like Elasticsearch with a couple of thousand parameters); too little flexibility and we end up being useful only in a particular set of circumstances. Balancing this tradeoff with every feature is a nice challenge.
What is the most challenging problem that’s been solved in Typesense so far?
We spent about 4 months earlier this year focusing on improving indexing performance and concurrency. And we had to keep an eye on performance at every step in the pipeline, since an additional per-record latency of even 0.25ms can add up to several hundred seconds of lag in a large enough dataset.
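The arithmetic behind that remark is easy to check. The sketch below assumes records are indexed one after another, so a fixed per-record overhead simply multiplies by the record count; the dataset sizes are illustrative (the 3 million figure matches the test dataset mentioned later in this interview).

```python
def added_lag_seconds(num_records: int, extra_ms_per_record: float) -> float:
    """Total extra indexing time caused by a fixed per-record overhead."""
    return num_records * extra_ms_per_record / 1000.0


# 0.25 ms of extra work per record, at two illustrative dataset sizes.
lag_1m = added_lag_seconds(1_000_000, 0.25)  # 250.0 seconds
lag_3m = added_lag_seconds(3_000_000, 0.25)  # 750.0 seconds
```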
All of this was challenging in that we had to go through about 25 iterations before we were satisfied with what we saw. We also had to put a couple of other key feature requests on hold while we worked on this, which was hard.
Overall, I’m really happy with where we landed. I tested a dataset with ~3 million records (Amazon product data) that was ~13GB on disk and was able to get a throughput of 250 concurrent search queries per second on a 16GB, 8-vCPU 3-node Typesense cluster. I was able to ingest this dataset in about 20 minutes into the cluster.
Any interesting insights related to the concurrency optimizations?
We switched our memory allocator from malloc to jemalloc, improved our existing lock-free concurrency mechanism (using shards) to take advantage of all CPU cores, switched to Raft-based clustering so that all nodes in the cluster can service reads and writes, and switched data ingestion to use streams to handle large volumes of data.
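As a rough illustration of why sharding helps avoid locks, the sketch below assigns each document to exactly one shard and lets each worker build only its own shard's index. This is not Typesense's actual implementation; the function names, the CRC32 hash and the shard count are all assumptions made for the example, and Python threads are used only to show the ownership pattern, not to demonstrate real multi-core speedups.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # illustrative; in practice this might track the CPU core count


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document ID to a shard using a stable hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(docs: list, num_shards: int = NUM_SHARDS) -> list:
    """Split documents so that each shard receives a disjoint subset."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc["id"], num_shards)].append(doc)
    return shards


def index_shard(docs: list) -> dict:
    """Build one shard's private index; no state is shared with other workers."""
    return {doc["id"]: doc for doc in docs}


def index_all(docs: list) -> list:
    """Index every shard in its own worker. No locks are needed because
    no two workers ever write to the same shard."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        return list(pool.map(index_shard, partition(docs)))


docs = [{"id": str(i), "title": f"item {i}"} for i in range(100)]
indexes = index_all(docs)
```

A lookup by ID then only needs to consult `indexes[shard_for(doc_id)]`.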
What were you using prior to migrating to Raft?
We previously had a primary-replica model where writes could only be sent to one node.
Did you implement Raft yourself or are you using a library?
Phew, no! Thankfully there's a battle-tested Raft library that we use. The library provides hooks into various life cycle events that you integrate with your application.
How is Typesense currently monetized?
We started monetizing Typesense this year. As an open source project, we wanted to make sure that it is also a sustainable revenue-generating business to ensure its longevity.
We initially tried the model of open-core and paid premium features. But we quickly realized that this model hurts adoption. We also had to repeatedly make the hard decision of whether a new feature will be part of the open core or if it will be a premium feature.
Based on feedback from users, we have now pivoted to offering a hosted SaaS version of Typesense, called Typesense Cloud (in public beta since Sep 2020), for those of us who would rather not manage any servers. We run the same open source version in our managed offering, so users can choose to either self host or let us manage their Typesense cluster for them. We have also open-sourced all previously-premium features.
The nice thing about the Open Source + Cloud model is that incentives are aligned well. We dogfood our own product on Typesense Cloud and it is in our best interest to make it as easy as possible to operate our product.
In addition to Typesense Cloud, which is a paid product, we have also started offering paid prioritized support for companies that need it. Given our experience running Typesense Cloud, we help with best practices around deployment and use, troubleshooting, etc.
How do you balance your work on open source with your day job and other responsibilities?
I look at it as having 2-3 projects going in parallel, and I try to maintain some variety in the type of work I do in each of those projects. For example, if I’m working on the visual designs in one project, I try to work on the dev pieces in another project, and then on customer support in a 3rd project. Having this variety helps keep me motivated on all projects.
What is the best way for a new developer to contribute to Typesense?
The best way would be to contribute an API library in your favorite language. We do have a REST API, but we also have official libraries in JS, PHP, Python and Ruby. We’d love to support libraries in more languages, but we’re not experts in all of them! So we welcome contributions.
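For a sense of what the core of such a library might look like, here is a hypothetical, minimal wrapper around the REST API. It is written in Python only for illustration (an official Python library already exists) and it builds requests without sending them. The class and method names are invented for this sketch; the `X-TYPESENSE-API-KEY` header and the `POST /collections` endpoint come from the Typesense REST API.

```python
import json
import urllib.request
from typing import Optional


class TinyTypesenseClient:
    """Hypothetical minimal wrapper; not one of the official libraries."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def _request(
        self, method: str, path: str, body: Optional[dict] = None
    ) -> urllib.request.Request:
        """Build (but do not send) an authenticated JSON request."""
        data = json.dumps(body).encode("utf-8") if body is not None else None
        return urllib.request.Request(
            url=f"{self.base_url}{path}",
            data=data,
            method=method,
            headers={
                "X-TYPESENSE-API-KEY": self.api_key,
                "Content-Type": "application/json",
            },
        )

    def create_collection(self, schema: dict) -> urllib.request.Request:
        """Request that would create a collection from the given schema."""
        return self._request("POST", "/collections", schema)


# Placeholder server address, key and schema.
client = TinyTypesenseClient("http://localhost:8108", "xyz")
req = client.create_collection(
    {"name": "books", "fields": [{"name": "title", "type": "string"}]}
)
# urllib.request.urlopen(req) would send it to a running server.
```

A real library would add error handling, retries and the remaining endpoints on top of this shape.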
Another way would be to use Typesense with the datasets you already have, see how it performs and share benchmarks and feedback with us.
Where do you see the project heading next?
We want it to be a self-sustaining, revenue-generating open-source business that provides powerful yet affordable search solutions for both the solo developer working on their side project and the large team building an instant search experience in their product. So the next big goal is to get the word out to our peers in the industry that Typesense exists.
Do you have any suggestions for someone trying to make their first contribution to an open source project?
If you want to contribute to open source and don’t know where to start, just look at your project’s dependencies file (package.json, Gemfile, etc). Pick a project from there that seems like a relatively small one. Look through their issue tracker for things that seem like low hanging fruit and offer to help. If you can’t find one, start a conversation with the authors and ask them where you can help.
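As a small starting point for that advice, this sketch lists the package names declared in a `package.json` so you can look up their repositories. The function name and the sample manifest are made up for the example; `dependencies` and `devDependencies` are the standard npm manifest sections.

```python
import json


def list_dependencies(package_json_text: str) -> list:
    """Return sorted package names from the dependency sections of a package.json."""
    manifest = json.loads(package_json_text)
    names = set()
    for section in ("dependencies", "devDependencies"):
        names.update(manifest.get(section, {}))
    return sorted(names)


# Made-up manifest for illustration.
example = '{"dependencies": {"react": "^18.0.0"}, "devDependencies": {"jest": "^29.0.0"}}'
candidates = list_dependencies(example)  # ['jest', 'react']
```

In a real project you would read the file from disk, for example with `open("package.json").read()`.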