Console #96 -- AppFlowy, Flowistry, and Ploomber
An Interview with Ido of Ploomber
AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations. Built with Flutter and Rust.
language: Rust, stars: 18759, watchers: 192, forks: 930, issues: 111
last commit: March 07, 2022, first commit: June 16, 2021
Flowistry is a Rust IDE tool that analyzes the information flow of Rust programs. It understands whether it's possible for one piece of code to affect another.
last commit: March 09, 2022, first commit: March 10, 2021
Ploomber is the fastest way to build data pipelines. Use your favorite editor (Jupyter, VSCode, PyCharm) to develop interactively and deploy without code changes (Kubernetes, Airflow, AWS Batch, and SLURM).
language: Python, stars: 1429, watchers: 17, forks: 99, issues: 150
last commit: February 28, 2022, first commit: January 21, 2021
Hey Ido! Thanks for joining us! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn to program, what languages or frameworks do you like?
I've always worked with data, even before my MS at Columbia University. During the degree, I focused on improving networks with ML, and I was a research assistant in a VR/AR lab. The goal was to find the best route in physical optical networks, and to do that I had to build a neural network. After that, I led multiple data teams at AWS and Samsung. My first deep dive into programming was with Java and Scala, then I slowly shifted to Python and have stayed with it ever since.
What's an opinion you have that most people don't agree with?
That notebooks can run in production. We really believe this (and it's backed by data, of course). Using notebooks in production increases collaboration within teams and allows data scientists to build faster.
What is your favorite software tool?
I obviously love Ploomber but I’m biased. It’s such a simple idea but a powerful one, allowing data scientists to build data pipelines faster with their favorite environments.
If I gave you $100 million to invest in one thing right now, where would you put it?
I’d invest in the MLOps space. In the past, ML and DS seemed to be used mostly for pet projects, but that perception is shifting: ML can and will produce business insights, and it needs to run at scale and reliably.
What are you currently learning?
I’m constantly learning how difficult it is for data scientists to perform their work. There are a lot of MLOps tools out there, and most of them force data scientists out of their natural environment (Jupyter notebooks). It’s also impressive to see how ingenious some of the solutions that support notebooks are.
Why was Ploomber started?
There are two angles to this question. The first is my personal experience at AWS. I was leading data teams as part of their consultancy group, and I saw firsthand that data science teams and projects spend 30% of their time refactoring code, cleaning it for production, and only then starting to test it. It seemed absurd to me that so much time is wasted and that we have to rewrite most of our work. The second angle is Eduardo’s, from the data science perspective. He couldn’t find a tool to ease and improve his work with SQL, Git, and other common tools. It frustrated him so much that he decided to join forces with me and build Ploomber.
Where did the name for Ploomber come from?
Are there any overarching goals of Ploomber that drive design or implementation?
Yes, Ploomber is 100% community-driven. More than 90% of the existing features were requested by users. We make a point of building a feature only when there are enough requests for it. We’re also focusing on enhancing the experience with the cloud version, which we’re currently running with users in a private beta.
What is the most challenging problem that’s been solved in Ploomber, so far?
There are lots of challenges we’re solving, and we blog about them too! For instance: how to create reproducible Python environments, how to write seamless SQL pipelines, and how to test your ML code. If you’re building a modular, interactive pipeline, these tasks are much easier to perform. We’ve also been working on monitoring for data pipelines and on single-click cloud deployment with parallel execution, as part of Ploomber Cloud.
What is your typical approach to debugging issues filed in the Ploomber repo?
We love this process, and it’s pretty much instantaneous. Users show up in the community Slack channel and start asking questions or requesting features. Usually, within minutes we understand where to guide them and whether there’s an issue that requires an improvement or bug fix, or whether it’s just a misunderstanding. These conversations really help us create a better experience for users and improve the product.
Is Ploomber intended to eventually be monetized if it isn’t monetized already?
The open-source repo is already a pretty awesome product that saves ~40% of the development and testing time for data science teams. It will remain our core offering and will continue to be the center of Ploomber.
We are currently working on a private beta of Ploomber Cloud. It will focus on premium features that complement the repository and support teams and enterprises, allowing our customers to operationalize their work even faster. In the cloud version, we plan to include monitoring for data pipelines, one-click cloud execution, wider deployments, storage for artifacts (reproducibility), and CI/CD.
What is the best way for a new developer to contribute to Ploomber?
Great question. As with any other big open-source project, we have contributing guides and open issues in the GitHub repository. The best place to start is by finding an issue that interests you and submitting a PR; the guide explains how to do it. We also have a very active Slack channel where you can ask questions. On top of that, we have a core contributors program in which veteran community members mentor newcomers and help with some of the initial obstacles of contributing to open-source code.
What motivates you to continue contributing to Ploomber?
Some of the feedback we get from users is amazing. They tell us we’re saving them so much time and headache, or that until they found Ploomber they were hacking together shell scripts to run things. It really gives our work a purpose.
Where do you see software development heading next?
MLOps today feels like the web stack of 5-6 years ago. There are still old tools that don't do the job in an ideal way, and there are also a lot of tools out there that focus on the wrong things. We believe this domain will shift and start focusing on notebooks and faster development cycles.
Where do you see open-source heading next?
Open source is awesome. It keeps growing every day, and I think it’ll only get bigger. There’s already an understanding that it’s not a passing trend, and big companies are putting a lot of resources into contributing to and developing open source.