Sponsorships
The Sample
Tired of social media? The Sample is a newsletter curated by machine learning. Our recommendation algorithm forwards a different newsletter to you each morning, based on your interests. Subscribe now
jina
Jina is an easier way to build neural search on the cloud.
language: Python, stars: 2465, watchers: 67, forks: 374, issues: 77
last commit: March 05, 2021, first commit: February 13, 2020
legacy-cc
The earliest versions of the c compiler known to exist, written by Dennis Ritchie.
language: C, stars: 2751, watchers: 127, forks: 281, issues: 1
last commit: October 22, 2017, first commit: March 01, 2013
Clone-Wars
Clone-Wars is a collection of 70+ open-source clones of popular sites like Airbnb, Amazon, Instagram, Netflix, TikTok, Spotify, WhatsApp, YouTube, etc.
stars: 2661, watchers: 91, forks: 141, issues: 7
last commit: March 12, 2021, first commit: December 02, 2020
github-elements
github-elements is GitHub’s collection of web components.
language: JavaScript, stars: 345, watchers: 91, forks: 12, issues: 1
last commit: March 12, 2021, first commit: June 26, 2018
Help Wanted
If you’re interested in posting a help-wanted ad for your project to thousands of open-source developers, send an email to console.substack@gmail.com.
An Interview With Maximilian of Jina.ai
Hey Maximilian! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like, etc?
I studied mathematics but have focused on software engineering for several years. Soon after, I discovered my passion for bringing machine learning into production with Python. Doing this taught me that plain language features are often much more powerful and easier to maintain than complex frameworks.
Who or what are your biggest influences as a developer?
Mostly, the people I work with. When I see someone doing something great, I talk to them and try to move something from their toolbox into mine. Second, reading the book Clean Code changed the way I approach software development every day. And lastly, I really enjoy absorbing best-practice guides that lay out the reasons behind the practices very clearly.
What's an opinion you have that most people don't agree with?
There should be more liability for software. Saying “Oh, it is a software problem” is mostly wrong. The real problem is bringing software to production too quickly.
What’s your most controversial programming opinion?
Print-line debugging is proper debugging. I often debug with print statements way faster than my peers do with proper debuggers.
If you could teach every 12 year old in the world one thing, what would it be and why?
Embrace the power of positive words. If you like what someone is doing or saying, say so out loud. Most likely you will see it again, and you will make them happy too. This is especially true if the other person is above you in the hierarchy (a teacher, team lead, or manager). They seldom hear when they do something good, only when something goes wrong. You can steer them with just a simple “Thank you” or “That was great”.
If I gave you $10 million to invest in one thing right now, where would you put it?
I love the idea of local robotic gardening. Most likely, I would build a local community where we try to get a farming robot up and running for vegetable production.
What are you currently learning?
Time management and organizing my work as a lead. I recently discovered Obsidian as a note-taking tool and started my first serious attempt at note-taking.
What resources do you use to stay up to date on software engineering?
For Python itself, I mostly read the release logs. Apart from that, I am in the luxurious position of working with a team that discovers new things literally every day and posts them in our internal Slack. I also follow Raymond Hettinger on Twitter. I love his talks and would recommend them to any Python beginner, especially “Transforming Code into Beautiful, Idiomatic Python” and “Beyond PEP 8 -- Best practices for beautiful intelligible code”, even though they are a little old.
How do you separate good project ideas from bad ones?
That is super hard. I am a very positive person and usually start with a “Yes, and” reaction to any idea. We then ideate about it as a team and let it rest for a bit afterwards. If the idea persists and comes back repeatedly, it is most likely a real winner and should be implemented.
Why was Jina started?
When Jina was founded, there was no existing framework for proper Neural Search. Even though a new AI model is announced almost every day, a lot of companies struggle to get actual value from them. Jina is the solution for easily using machine learning models in search. Our CEO Han Xiao has researched and implemented Neural Search systems for several years already.
Who, or what was the biggest inspiration for starting Jina?
I believe Jina was a natural step after Han accumulated a vast amount of knowledge in Neural Search over the years. It was definitely inspired by GNES, one of Han’s former projects.
Are there any overarching goals of Jina that drive design or implementation?
While Jina does Neural Search, we aim for scalability and extensibility. It is not a database; it is a framework.
If so, what trade-offs have been made in Jina as a consequence of these goals?
For scalability, each building block can be seen as a microservice, which can potentially run on its own machine, including replication. Furthermore, we allow users to store data in different stores within their search Flows. This introduces huge challenges, e.g. when implementing CRUD for data management operations. The data can be not only in physically different units but also in logically different components, which makes ACID operations even harder than they usually are.
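The CRUD difficulty described above can be made concrete with a toy model. This is a minimal sketch under stated assumptions: two in-memory stores stand in for separate backends in a Flow, and `Store`, `naive_update`, and `compensating_update` are hypothetical names for illustration, not Jina APIs. A naive update that writes each store in turn leaves them inconsistent when one write fails. The compensating rollback is a swapped-in technique, not something described in the interview; it restores consistency only on a best-effort basis and offers none of the isolation guarantees of a real ACID transaction.

```python
from typing import Dict, List, Optional


class Store:
    """Hypothetical in-memory stand-in for one storage backend in a Flow."""

    def __init__(self, name: str, fail_on_write: bool = False) -> None:
        self.name = name
        self.fail_on_write = fail_on_write  # simulate an unavailable backend
        self.data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if self.fail_on_write:
            raise IOError(f"{self.name} is unavailable")
        self.data[key] = value


def naive_update(stores: List[Store], key: str, value: str) -> None:
    # Writes each store in turn; a failure midway leaves earlier stores changed.
    for store in stores:
        store.put(key, value)


def compensating_update(stores: List[Store], key: str, value: str) -> bool:
    # Best-effort rollback: restore earlier stores if a later write fails.
    # This is a simple compensation scheme, not a distributed transaction.
    previous: List[Optional[str]] = [s.data.get(key) for s in stores]
    written: List[int] = []
    try:
        for i, store in enumerate(stores):
            store.put(key, value)
            written.append(i)
        return True
    except IOError:
        for i in written:
            if previous[i] is None:
                stores[i].data.pop(key, None)
            else:
                stores[i].data[key] = previous[i]
        return False


# Naive update: the vector store is updated, the metadata store is not.
vectors, metadata = Store("vectors"), Store("metadata", fail_on_write=True)
try:
    naive_update([vectors, metadata], "doc-1", "v2")
except IOError:
    pass
print(vectors.data, metadata.data)  # {'doc-1': 'v2'} {}

# Compensating update: the partial write is rolled back.
vectors2, metadata2 = Store("vectors"), Store("metadata", fail_on_write=True)
ok = compensating_update([vectors2, metadata2], "doc-1", "v2")
print(ok, vectors2.data, metadata2.data)  # False {} {}
```

Even the compensating version has a window in which a reader could see the half-applied update, which hints at why spreading data across physically and logically separate components makes ACID guarantees so hard.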
When you say microservice, is Docker used? Or something else?
You can use Docker, but it isn’t required; that is up to the user. That said, most users will typically run Docker containers.
What is the most challenging problem that’s been solved in Jina, so far (code links to any particularly interesting sections are welcomed)?
We had quite a few iterations on our Document data representation. We needed a system powerful enough to deal with several layers of depth (e.g. whole text, paragraph, sentence and word). At the same time, when you implement algorithms as building blocks, you never know which layer is actually coming in. The granularity must be flexible enough to support a wide range of use cases. Yet the structure should feel like working with natural Python objects, to make onboarding to Jina easier.
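To illustrate the design problem described above, here is a minimal sketch of a recursively nested document, assuming a simple tree in which every chunk sits exactly one granularity level below its parent. `Document`, `segment`, and `at_granularity` are hypothetical names for this illustration, not Jina's actual API, and the numbering (0 = whole text, 1 = paragraph, 2 = sentence, 3 = word) is likewise an assumption. The point is that a building block like `segment` works the same way no matter which layer it receives.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Document:
    """Hypothetical nested document: each node may hold finer-grained chunks."""
    text: str
    granularity: int = 0  # assumed: 0 = whole text, 1 = paragraph, 2 = sentence, 3 = word
    chunks: List[Document] = field(default_factory=list)

    def add_chunk(self, text: str) -> Document:
        # A chunk always sits exactly one level deeper than its parent.
        chunk = Document(text=text, granularity=self.granularity + 1)
        self.chunks.append(chunk)
        return chunk

    def traverse(self) -> Iterator[Document]:
        # Depth-first walk over this node and all nested chunks.
        yield self
        for chunk in self.chunks:
            yield from chunk.traverse()


def segment(doc: Document, sep: str) -> Document:
    """Hypothetical building block: split text into chunks, whatever the input layer."""
    for part in doc.text.split(sep):
        part = part.strip()
        if part:
            doc.add_chunk(part)
    return doc


def at_granularity(root: Document, level: int) -> List[str]:
    # Collect the text of every node at the requested depth.
    return [d.text for d in root.traverse() if d.granularity == level]


root = Document("Jina is a framework. It does neural search.\n\nIt scales out.")
for paragraph in segment(root, "\n\n").chunks:
    for sentence in segment(paragraph, ".").chunks:
        segment(sentence, " ")

print(at_granularity(root, 2))
# ['Jina is a framework', 'It does neural search', 'It scales out']
print(len(at_granularity(root, 3)))  # 11
```

Using a plain dataclass with a list of children keeps the structure close to "natural Python objects", while the explicit `granularity` field lets downstream blocks filter by layer without knowing how deep the tree is.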
What is your typical approach to debugging issues filed in the Jina repo?
Currently, most issues we get are actually feature requests, where we need to carefully weigh the complexity of adding and maintaining a feature inside the framework against the benefit it brings.
How is Jina intended to be monetized?
We do plan to commercialize Jina in the long run. At the current stage, however, we focus only on improving the developer experience.
We are looking at various open-source business models. Our core product, Jina, as well as our other existing open-source projects, will remain open source. At a later stage, we plan to implement enterprise features on top of the core product.
How do you balance your work on open-source with your day job and other responsibilities?
I am in a luxurious position: contributing to open source is my day job, since Jina is open source.
Do you think any of your projects do more harm than good?
I hope not. Obviously, whenever you create a technology that is open source, people can use it in harmful ways. Advances in the field of AI are no exception.
What is the best way for a new developer to contribute to Jina?
Any issue opened on our GitHub or raised in our Slack helps us detect bugs and understand how Jina is used. If you want to make your own model or algorithm usable by any Jina user, you will soon also be able to provide a module to our Hub.
And finally: join us from anywhere in the world! We are always looking for talented people who embrace challenges, pair programming, learning, and a multicultural environment.
If you plan to continue developing Jina, where do you see the project heading next?
We just released version 1.0 with all the features we envisioned as necessary for Neural Search. We are looking forward to having industry partners and the community use it. The next steps will be making it even more robust in bigger environments and developing a bigger supporting infrastructure and a front-end.
Where do you see software development in-general heading next?
I believe machine learning is now at the stage of development that software engineering was at in the mid-90s. People are struggling to really bring it into production, and more and more best practices are emerging. It is developing from a research field into an actual production field. Practices like Clean Code, Test-Driven Development, etc. will come into existence in the field of machine learning.
Where do you see open-source heading next?
To be honest, I am pretty new to contributing to open source and have no real clue. That said, I am worried about big cloud players milking open-source projects without contributing back serious amounts of money. The open-source community will come up with a solution, but it might take some years to become really effective.
Do you have any suggestions for someone trying to make their first contribution to an open-source project?
Fix a typo. Typos are easy to find, and you learn the process of contributing without having to worry about code quality and the like. You can really concentrate on the contribution experience, and it is hard to fail at fixing a typo.
Like what you saw here? Tell your friends about Console and win free swag!
Also, don’t forget to subscribe to get a list of new open-source projects curated by an Amazon software engineer directly in your email every week.