Console #123 -- Interview with Michael of Seaborn - statistical graphics in Python

Featuring Libreddit, Fiber, and seaborn

Sep 18, 2022

Libreddit

An alternative private front-end to Reddit - Libreddit hopes to provide an easier way to browse Reddit, without the ads, trackers, and bloat.

language: Rust, stars: 3409, issues: 114, last commit: June 24, 2022
repo: github.com/spikecodes/libreddit
site: libreddit.spike.codes

Fiber

Fiber is an Express-inspired web framework built on top of Fasthttp, the fastest HTTP engine for Go. Designed to ease things up for fast development with zero memory allocation and performance in mind.

language: Go, stars: 22375, issues: 22, last commit: 1 day ago
repo: github.com/gofiber/fiber
site: gofiber.io

Seaborn

Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.

language: Python, stars: 9799, issues: 94, last commit: 2 days ago
repo: github.com/mwaskom/seaborn
site: seaborn.pydata.org

Check out more projects on Open Source Hub and join thousands of other open-source enthusiasts and developers in our Discord server to continue the discussion on the projects in this week's email!

🎤 Interview With Michael of Seaborn


Hey Michael! Thanks for joining us! Let us start with your background. Where are you from, where have you worked in the past, how did you learn to program, and what languages or frameworks do you like?

Hi, thanks for having me. I’m currently a data scientist at a health tech company in New York City. My current day job involves building machine learning systems for oncology data, although my background is in computational neuroscience.
I started programming relatively early: I was always writing little functions on my TI-83 to make math homework less tedious. But I didn’t really expect it to be part of a career until I graduated college and started working in a neuroscience lab at MIT.
I had the great fortune of landing in one of the few environments where people were using Python for scientific programming, as MATLAB was otherwise completely dominant at the time. In the intervening decade or so, Python has grown from an odd iconoclasm to the obvious choice for anyone doing machine learning. Nearly all of my professional work uses what is now a standard set of libraries: numpy, pandas, matplotlib, scikit-learn, and pytorch. So I was very much in the right place at the right time.

Who or what are your biggest influences as a developer?

Asking myself “what would scikit-learn do?” has rarely led me astray.

What’s your most controversial programming opinion?

I don’t find programming hot takes to be that interesting; everything has tradeoffs, and it’s easy to mistake a different set of objectives or constraints for a unique insight. But everyone should probably chill out about doing `import *` for interactive work.

What is your favorite software tool?

I’ve had Jupyter running on localhost:8888 more or less continuously for the past decade. Nearly every criticism that people raise about notebooks is correct in some sense, and yet they’ve been incredibly transformational for data science.

What is your favorite book and why?

Moby Dick is a great adventure tale, contains some of the most wondrous and improbable sentences in the English language, and is filled with incredible facts about whales that just happen to be completely untrue.

Have you ever experienced burnout? How did you deal with it?

I would be surprised if any maintainer of an even moderately-successful open source project could say no to this question. It is not a restorative hobby. But I didn’t always appreciate the universality of open-source burnout — or some of the experiences that lead to it — until reading Working in Public by Nadia Eghbal. Among other takeaways, this book contains some good lessons on setting boundaries and not feeling guilty for enforcing them.

If you had to suggest 1 person developers should follow, who would it be?

I can’t think of anyone better than Paul Ford (@ftrain).

If you could teach every 12-year-old in the world one thing, what would it be and why?

I believe that we can do a much better job teaching good intuitions about probability and statistics by using interactive visualizations, and 12 doesn’t seem too early to start.

If I gave you $10 million to invest in one thing right now, where would you put it?

S&P 500 index funds.

If I gave you $100 million to invest in one thing right now, where would you put it?

Commonwealth Fusion Systems.

How do you separate good project ideas from bad ones?

To a certain extent, I’m just developing tools for myself. I use seaborn nearly every day for work. So my intuition about what is a good or bad idea is probably better than it would be if I didn’t really know what it felt like to actually use the software. At the same time, using a tool as an expert is not representative of the median user experience. I do try to keep tabs on what people find confusing or difficult by paying attention to the StackOverflow feed and other channels.

What’s the funniest GitHub issue you’ve received?

This wasn’t a GitHub issue, but it does come to mind. There’s a longstanding gag in seaborn where it will error out with a somewhat terse message if you try to apply the infamous “jet” colormap. There are lots of reasons not to use jet, but this is really just a joke, and it’s easy to get around if you know what you’re doing. Still, the humor is a bit lost on some people, including one Reddit user who responded with multiple rants about how restricting user behavior in this manner violates the core principles of open-source software and betrays the legacy of Richard Stallman. I thought that was a little over the top.
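
For readers who have not run into the gag, here is a minimal sketch of what it looks like in practice. It assumes a seaborn version in which `sns.color_palette` still rejects the name "jet". The exact message may differ between releases, so the sketch prints whatever comes back instead of relying on specific wording. The helper `try_palette` is hypothetical and exists only for this illustration.

```python
import seaborn as sns


def try_palette(name):
    """Return the palette for `name`, or the ValueError seaborn raised instead.

    Hypothetical helper for illustration; it is not part of seaborn.
    """
    try:
        return sns.color_palette(name)
    except ValueError as err:
        return err


# Assumption: this seaborn version refuses the "jet" colormap by name
jet_result = try_palette("jet")
print(type(jet_result).__name__, jet_result)

# A perceptually uniform colormap such as viridis is accepted as usual
viridis_result = try_palette("viridis")
print(type(viridis_result).__name__)
```

The check is applied to the name passed as a string, which is consistent with the remark that the restriction is easy to get around.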

Why was Seaborn started?

It was originally just an attempt to keep my personal code organized and usable across projects. It was up on GitHub because, well, why not? Eventually, people found it, used it, and told their friends about it. I’d had experience developing other open source packages at that point, so it’s not that I couldn’t imagine this happening. But I wasn’t immediately encouraging, and I certainly never expected it to become as widely used as it is.

Where did the name for Seaborn come from?

You’d have to ask Aaron Sorkin.

Are there any overarching goals of Seaborn that drive design or implementation? If so, what trade-offs have been made in Seaborn as a consequence of these goals?

Originally, it was reasonable to assume that anyone using seaborn already knew how to use matplotlib (the lower-level graphics library that seaborn is based on). So seaborn was designed to fit into the normal flow of making matplotlib plots, meaning that it inherited (or at least had to abide by) some of matplotlib’s quirks. And by design, seaborn offered only a higher-level API, delegating a lot of fine-grained configuration to matplotlib. If you already know matplotlib, this offers a fairly powerful usage pattern: seaborn will get you 90% of the way there (which is often enough), and then matplotlib can take you the rest of the way when necessary.
But as seaborn developed a more distinctive API and gained a reputation for being somewhat easier to work with, people started recommending that new users should learn it first. This is sometimes to their benefit, but it can also impose a very discontinuous learning curve, where things feel pretty easy until suddenly they feel very hard.
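
The usage pattern described in this answer, where seaborn draws the plot and matplotlib handles the remaining fine-tuning, might look roughly like the sketch below. The data frame, column names, and labels are hypothetical and were invented for illustration. The `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere

import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "dose": [1, 2, 3, 4, 5, 6],
    "response": [2.1, 3.9, 6.2, 7.8, 10.1, 12.2],
    "group": ["a", "b", "a", "b", "a", "b"],
})

# seaborn covers the high-level part: mapping columns to x, y, and color
ax = sns.scatterplot(data=df, x="dose", y="response", hue="group")

# The function returns a regular matplotlib Axes, so plain matplotlib
# calls can take over for the details seaborn does not expose
ax.set_title("Response by dose")
ax.set_xlabel("Dose (mg)")
ax.axhline(5.0, linestyle="--", color="gray")
```

The last three calls are ordinary matplotlib, which is also where the learning curve gets steeper for users who started with seaborn alone.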

What is the release process like for Seaborn?

Because seaborn is pure Python, it’s a relatively simple process, especially as I recently switched to using flit. But I do put a lot of work into updating and improving the documentation in the period leading up to a major release. Documentation can always be better, so that can feel a bit like running towards a finish line that never really gets closer. At some point, you’ve just gotta hit the big button.

Is Seaborn intended to eventually be monetized if it isn’t monetized already?

Nope, I expect it will always remain a hobby project. I did get a free security key from PyPI recently, which was nice.

Do you think any of your projects do more harm than good?

No, trust and safety is very important: for instance, as mentioned above, seaborn prevents users from applying the jet colormap.

If you plan to continue developing Seaborn, where do you see the project heading next?

In the past few weeks, I’ve put out a release that basically reinvents the library with a new interface. Among other things, the new interface eliminates constraints from the original design that made it difficult to grow in certain directions. So it’s now possible to implement some long-desired features that weren’t previously a good fit. The new interface also aims to solve the discontinuous learning curve problem mentioned above, and it presents other opportunities to make seaborn even easier to use.

What motivates you to continue contributing to Seaborn?

It’s always rewarding to get pinged with a citation alert for the seaborn paper and to see that someone has used it to do some cool science.

Are there any other projects besides Seaborn that you’re working on?

One downside to having a popular library to maintain is that it can feel difficult to justify spending time on diversionary projects, even if they would be greenfield and potentially more fun.

Where do you see software development heading next?

I guess the big question is what impact tools like GitHub Copilot will have. It’s possible that generative language models will fundamentally transform the act of programming, but I’m a little doubtful. I’m very interested in technology that can enhance human creativity and productivity, but “tool that generates chunks of code” feels like a local minimum that we arrived at mostly because we had the right training data for it. So I hope we don’t get stuck there.

Where do you see open-source heading next?

Academia is gradually coming around to the idea that building tools can be “real work” too and that good scientific software should be funded and rewarded. I think that’s good news for everyone.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

If you’re a knowledgeable user of a library, an underrated way to contribute is to write good answers to questions on StackOverflow. That directly benefits other users, helps you develop a deeper understanding of a tool’s use cases and rough edges, and can be a strong signal to maintainers that you are someone who knows what they are doing. In fact, I believe that Thomas Caswell originally got recruited into matplotlib development because he was writing good StackOverflow answers, and now he’s the project lead.

Want to join the conversation about one of the projects featured this week? Drop a comment, or see what others are saying!

Console by CodeSee.io
