Console #120 -- Interview with Phillip of Ibis - a Python library to write expressive analytics

Featuring Wallabag, Ballerina, and Sampler

Aug 28, 2022

🏗️ Projects

Wallabag

Wallabag is a self hostable application for saving web pages for later reading. It extracts content so that you won't be distracted by pop-ups.

language: PHP, stars: 7285, issues: 588, last commit: August 26, 2022
repo: github.com/wallabag/wallabag
site: wallabag.org

Ballerina

Ballerina is a statically typed programming language for the cloud that makes it easier to use, combine, and create network services.

language: Ballerina, stars: 3133, issues: 1704, last commit: August 26, 2022
repo: github.com/ballerina-platform/ballerina-lang
site: ballerina.io

Sampler

Visualization for any shell command, right from your terminal. Sampler is a tool for shell commands execution, visualization and alerting. Configured with a simple YAML file.

language: Go, stars: 10432, issues: 43, last commit: April 25, 2022
repo: github.com/sqshq/sampler
site: sampler.dev

Ibis

Ibis is a Python library to help you write expressive analytics at any scale, small to large. Its goal is to simplify analytical workflows and make you more productive.

language: Python, stars: 1961, issues: 71, last commit: August 28, 2022
repo: github.com/ibis-project/ibis
site: ibis-project.org

🎤 Interview With Phillip of Ibis

Hey Phillip! Thanks for joining us! Let us start with your background. Where are you from, where have you worked in the past, how did you learn to program, and what languages or frameworks do you like?

I’ve spent about half my life in Texas, mainly Austin, and the rest in different parts of New York City.
I’ve worked at a number of places, varying in size. I worked at a three-person startup right out of graduate school, then at Continuum Analytics (now Anaconda), then at Facebook, then Two Sigma Investments, followed by an automated checkout startup called Standard Cognition, and now I’m at a startup called Voltron Data.
I guess I’d say I’m a self-taught programmer, though I think that’s somewhat of a misnomer in most cases. Almost everyone gets feedback from _someone_ when they’re learning.
I got interested in programming while working in the lab of the late Josh Wallman, studying human saccadic eye movements. We programmed in MATLAB and LabVIEW (!) and I was hooked as soon as I got a taste. I then spent a bunch of time that I should’ve spent on grad school work writing broken Python code, and loved every minute.
Since then, I’ve worked on a wide variety of things. In my last role, I dove deep into Rust and spent a bunch of time in the cloud. I really enjoyed writing Rust.
I have a side obsession with dependency management and a desire to see humanity converge on a solution to that problem. Over the past two years, I’ve spent a bunch of time learning Nix, and so far it’s the only solution for dependency management that I think even has a shot at solving the problem.

Who or what are your biggest influences as a developer?

I think my inspiration for different kinds of development work comes from different people. I try to find work where the people around me can influence me in a deep way.
Python has been a huge influence on me as a developer. It was the primary vehicle through which I learned how to write software.
I think learning Lisp was a huge factor in my programming interests and my fascination with DSLs and programming languages.

What's an opinion you have that most people don't agree with?

Databases should scale down. The tradeoff gap between scale and convenience isn’t wide because of technical challenges. Many of the databases that scale to the most massive workloads feel huge to use. I don’t think that’s a given. It’s possible to make the bigger systems feel much more like SQLite than they currently do.
Dependency management can be solved.

What is your favorite software tool?

It changes, but right now I’m really loving VisiData.
I had a moment of clarity when looking at US political data from Wikipedia when I realized that I don’t need to write code to get to insights _and_ I don’t have to sacrifice the power and concision of the command line.

If you could dictate that everyone in the world should read one book, what would it be?

Mountains Beyond Mountains by Tracy Kidder.

Who are your favorite people in the development community, and why?

I think Dave Beazley is great. He spends a lot of time coming up with ways to drive people bananas with Python, and I think it’s a very effective teaching tool. It’s very entertaining to watch him speak and, if you’re at a live talk, to watch people’s faces crumple up when their brains melt.

If you could teach every 12 year old in the world one thing, what would it be and why?

It’s okay to fail.

What are you currently learning?

How to meditate.

What have you been listening to lately?

Woody Shaw’s Stepping Stones: Live at the Village Vanguard. I’m really into jazz, in particular the post-bop period from the 50s to the late 60s.

Why was Ibis started?

Ibis was started to make it easy to do exploratory data analysis with complex analytics tools such as massively parallel databases. At the same time, there turns out to be a good deal of API complexity with smaller systems such as PostgreSQL, and ibis fills that niche as well.

Who, or what, was the biggest inspiration for Ibis?

Wes McKinney is the biggest inspiration for ibis. I think he saw the need early on for tools like ibis and drove the initial vision of the project.
Technologically I would say that ibis has roughly four major influences: Python, pandas, SQL, and dplyr (from the R community). This is reflected in the API and spirit of the project in many places.

Are there any overarching goals of Ibis that drive design or implementation? If so, what trade-offs have been made in Ibis as a consequence of these goals?

One design goal for ibis was to maximize the likelihood that your code is going to run before it ever hits the database. One challenge with engines that live closer to the data is that failure modes are more expensive. Ibis does more and deeper validation of your code than tools like dplyr or even SQLAlchemy.
A consequence of this is that it requires more end-user code to extend ibis from outside the library. It’s certainly supported and possible, but it’s less convenient than if we performed no type checking or validation at all.
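
As a rough illustration of validating an expression before it reaches the database, here is a toy sketch using only the Python standard library. It is not ibis's implementation or API: the `SCHEMA` mapping, the `Column` and `Sum` classes, and the `column` helper are all hypothetical names made up for this example. The point is only that the type error surfaces while the expression is being built, before any SQL is generated or sent anywhere.

```python
from dataclasses import dataclass

# Hypothetical schema (column name -> type name); not ibis's real data model.
SCHEMA = {"user_id": "int64", "country": "string", "amount": "float64"}

NUMERIC_TYPES = ("int64", "float64")


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str

    def sum(self) -> "Sum":
        # Validate while building the expression, before any SQL exists.
        if self.dtype not in NUMERIC_TYPES:
            raise TypeError(
                f"cannot sum column {self.name!r} of type {self.dtype}"
            )
        return Sum(self)


@dataclass(frozen=True)
class Sum:
    arg: Column

    def to_sql(self, table: str) -> str:
        # Only expressions that passed validation can ever be compiled.
        return f"SELECT SUM({self.arg.name}) FROM {table}"


def column(name: str) -> Column:
    # Unknown columns are rejected up front as well.
    if name not in SCHEMA:
        raise KeyError(f"unknown column {name!r}")
    return Column(name, SCHEMA[name])
```

In this sketch, `column("amount").sum().to_sql("orders")` compiles to a query, while `column("country").sum()` raises a `TypeError` immediately. The trade-off described above shows up even here: anyone adding a new operation from outside has to supply its validation rule too, not just its SQL.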

What is the most challenging problem that’s been solved in Ibis, so far?

Correlated subqueries continue to be a thorn in the side of maintainers, though I wouldn’t say they are solved.

I really like your docs. What advice do you have for projects looking to improve their documentation?

Explore documentation of projects outside of the primary tool of the ecosystem that you work in. If you work in Python, look at Rust’s rustdoc. If you work in Rust, look at mkdocs, and so on. There’s a lot of inspiration living outside of the communities in which people typically work.

What is your typical approach to debugging issues filed in the Ibis repo?

Reproduce the issue
Set a breakpoint and start spelunking

What is the release process like for Ibis?

Almost entirely automated. The only manual step right now is to click a button that runs a GitHub action to kick off the release. All the tagging, artifact publishing, docs building and file-updating is done automatically.

How do you balance your work on open-source with your day job and other responsibilities?

I’m very fortunate that I spend the majority of my time working on open source, so balancing my day job and open-source work is not challenging for me at the moment.
In the past when this wasn’t the case, most of my open source work became personal projects or I contributed very little. There’s no shame in checking out if you want to.

If you plan to continue developing Ibis, where do you see the project heading next?

Saul Pwanson has brought the power of ibis to VisiData in a package called vdsql. We’re making sure that VisiData can scale to the capital-B Big databases using ibis.
We’re investing in the next major release of ibis having a more stable, long-term representation that’ll be a foundation for the next 5+ years. We really want people to be able to build on ibis without having to worry too much about the core making big breaking changes to the internal representation. This is going to unlock some powerful techniques for the project as well, like the ability to implement the simpler algebraic rewrites and optimizations embedded in most relational database optimizers. This will allow us to generate better SQL from both a UX and execution perspective.
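
To make "simpler algebraic rewrites" a little more concrete, here is a toy sketch of one such rewrite: fusing stacked filters into a single filter so the generated SQL has no nested subquery. This is an assumption about the kind of rewrite meant, not ibis's internal representation or optimizer; `Table`, `Filter`, `fuse_filters`, and `to_sql` are hypothetical names invented for the example, and predicates are kept as plain strings for brevity.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Filter:
    source: "Node"
    predicate: str  # plain SQL text, to keep the sketch short


Node = Union[Table, Filter]


def fuse_filters(node: Node) -> Node:
    """Rewrite Filter(Filter(t, p), q) into Filter(t, "(p) AND (q)")."""
    if isinstance(node, Table):
        return node
    source = fuse_filters(node.source)  # rewrite bottom-up
    if isinstance(source, Filter):
        fused = f"({source.predicate}) AND ({node.predicate})"
        return Filter(source.source, fused)
    return Filter(source, node.predicate)


def to_sql(node: Node) -> str:
    """Naive compiler: each unfused Filter layer becomes a subquery."""
    if isinstance(node, Table):
        return f"SELECT * FROM {node.name}"
    if isinstance(node.source, Table):
        return f"SELECT * FROM {node.source.name} WHERE {node.predicate}"
    return f"SELECT * FROM ({to_sql(node.source)}) AS t WHERE {node.predicate}"


expr = Filter(Filter(Table("orders"), "amount > 10"), "country = 'US'")
print(to_sql(expr))                # nested subquery
print(to_sql(fuse_filters(expr)))  # single WHERE clause
```

A rewrite like this only works if the expression tree it operates on has a stable shape, which is one reason a long-lived internal representation matters for this kind of optimization.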

What motivates you to continue contributing to Ibis?

Interesting design and implementation problems and the people that work with me on the project. Krisztián Szűcs and more recently Gil Forsyth (from xonsh!) are really wonderful to work with on ibis.

Are there any other projects besides Ibis that you’re working on?

I regularly contribute to and maintain packages in NixOS/nixpkgs. I wrote a tool to rewrite protocol buffers Python imports called protoletariat.

Do you have any other project ideas that you haven’t started?

Plenty, but there’s only so much time in the day!

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

Find a project that interests you. The mechanics of contribution are less important than the desire to contribute. If the project interests you and the maintainers are reasonable people, then you’ll figure out a way to do it. That latter point is important: not all open source projects are approachable.
The first major open source project I contributed to was pandas, and the first “contribution” I made was this issue https://github.com/pandas-dev/pandas/issues/1920 where I pasted the thing I wanted as a code snippet into the issue body. This is not an ideal way to contribute, but it’s fine as a starting point. The hardest part can be summoning the courage to interact, but once you do that, the rest feels doable.

Console by CodeSee.io
