Console 54

Sherlock, Send, and DataFusion

Sponsorships

A·Team: A collective for top independent developers + product builders

A·Team is a stealth, invite-only collective of top independent builders: mostly UX/UI designers, software developers, and product folks. We match you with high-paying work from vetted clients, generally $110-$190/hr. We're a bit different from other platforms, though: we're quite small and selective, have a community at the core, and try to meet everyone 1:1 to better understand what they're looking for. We also match companies with teams rather than individuals, which means a) higher hourly rates and b) it's easy to work alongside friends and other insanely talented builders. We launched ~14 months ago and have paid more than $4 million to A·Team members since.

We're staying small, but looking for a few more independent builders; if you mention "console" under who referred you, we'll fast-track you in. Here's the link to join (takes <90 seconds).


sherlock

Sherlock is a tool to hunt down social media accounts by username across social networks.

language: Python, stars: 25218, watchers: 861, forks: 2567, issues: 152

last commit: January 15, 2021, first commit: December 24, 2018

https://twitter.com/sidheart

send

Send is a fork of Mozilla’s Firefox Send, which offered simple, private file sharing.

language: JavaScript, stars: 1147, watchers: 13, forks: 42, issues: 7

last commit: May 19, 2021, first commit: May 24, 2017

https://twitter.com/likecaffeinated

arrow-datafusion

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

language: Rust, stars: 437, watchers: 37, forks: 51, issues: 207

last commit: May 23, 2021, first commit: February 05, 2016

https://twitter.com/TheASF

readmelater

Read Me Later is a bookmark with a snooze button. Bookmark, buffer and complete your reading list.

language: Vue, stars: 28, watchers: 3, forks: 3, issues: 0

last commit: May 18, 2021, first commit: May 19, 2019

https://twitter.com/vettijoe


An Interview With Andy Grove of Ballista

Hey Andy! Thanks for joining us! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like, etc?

I’m originally from the UK and started learning how to code when I was around 13 years old. I knew right away that this is what I wanted to do as a career, so I finished secondary school when I was 16 and went straight into a career as a junior programmer for a financial services company in London. My first job was working with databases (dBase III and Clipper). I eventually moved onto C++, and then Java, and have primarily been working with the JVM for the past 20 years. Around 5 years ago, I decided that I needed to start learning some new skills and this is when I began to invest in learning Rust. I was keen to get back to a compiled language and I was intrigued by Rust’s unique approach to memory management.

Who or what are your biggest influences as a developer?

Apache projects: Hadoop, Spark, Catalyst, Arrow, and the people behind those projects.

What’s your most controversial programming opinion?

I’m not often controversial, so I’ll take this opportunity to give you two controversial views: 1) Hand-written parsers are better than parser generators! Since discovering the Pratt parser paper, I have been a huge fan of this approach. 2) When coding in Rust, it is often fine to avoid explicit lifetimes and just store data on the heap using Box. You’re still better off than with a garbage-collected language. Optimization can come later, as you get more comfortable with the language.
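Andy doesn't show code here, so what follows is our own minimal Rust sketch of the Pratt (top-down operator precedence) technique, not code from Ballista or DataFusion. The `Token` and `Expr` types, the binding-power numbers in `infix_bp`, and the `parse`/`eval` helpers are all hypothetical, chosen for illustration. It also reflects his second point: the AST owns its children through `Box<Expr>` instead of borrowing from the input, so no lifetime annotations appear anywhere.

```rust
// Minimal Pratt parser for + - * / with unary minus and parentheses.
// AST children are owned via Box<Expr>, so no lifetime annotations are needed.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Op(char),
    LParen,
    RParen,
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Neg(Box<Expr>),
    Binary(char, Box<Expr>, Box<Expr>),
}

fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '0'..='9' => { // a run of digits forms one integer literal
                let mut n = c.to_digit(10).unwrap() as i64;
                while let Some(d) = chars.next_if(|d| d.is_ascii_digit()) {
                    n = n * 10 + d.to_digit(10).unwrap() as i64;
                }
                tokens.push(Token::Num(n));
            }
            '+' | '-' | '*' | '/' => tokens.push(Token::Op(c)),
            '(' => tokens.push(Token::LParen),
            ')' => tokens.push(Token::RParen),
            _ => {} // whitespace and any other character are skipped
        }
    }
    tokens
}

// (left, right) binding powers: higher binds tighter; right > left gives left associativity.
fn infix_bp(op: char) -> (u8, u8) {
    match op {
        '+' | '-' => (1, 2),
        '*' | '/' => (3, 4),
        _ => panic!("unknown operator {op}"),
    }
}

fn parse_expr(tokens: &[Token], pos: &mut usize, min_bp: u8) -> Expr {
    // Prefix position: a number, a parenthesised group, or unary minus.
    let tok = tokens.get(*pos).cloned();
    *pos += 1;
    let mut lhs = match tok {
        Some(Token::Num(n)) => Expr::Num(n),
        Some(Token::LParen) => {
            let inner = parse_expr(tokens, pos, 0);
            assert_eq!(tokens.get(*pos), Some(&Token::RParen), "expected ')'");
            *pos += 1;
            inner
        }
        // Unary minus uses power 5, so it binds tighter than any infix operator.
        Some(Token::Op('-')) => Expr::Neg(Box::new(parse_expr(tokens, pos, 5))),
        other => panic!("unexpected token {other:?}"),
    };
    // Infix loop: absorb operators that bind at least as tightly as min_bp.
    loop {
        let op = match tokens.get(*pos) {
            Some(Token::Op(op)) => *op,
            _ => break,
        };
        let (l_bp, r_bp) = infix_bp(op);
        if l_bp < min_bp {
            break;
        }
        *pos += 1;
        let rhs = parse_expr(tokens, pos, r_bp);
        lhs = Expr::Binary(op, Box::new(lhs), Box::new(rhs));
    }
    lhs
}

fn parse(src: &str) -> Expr {
    parse_expr(&tokenize(src), &mut 0, 0)
}

fn eval(e: &Expr) -> i64 {
    match e {
        Expr::Num(n) => *n,
        Expr::Neg(inner) => -eval(inner),
        Expr::Binary('+', l, r) => eval(l) + eval(r),
        Expr::Binary('-', l, r) => eval(l) - eval(r),
        Expr::Binary('*', l, r) => eval(l) * eval(r),
        Expr::Binary(_, l, r) => eval(l) / eval(r), // the tokenizer only leaves '/'
    }
}

fn main() {
    let expr = parse("1 + 2 * 3 - (4 - 1)");
    println!("{:?} = {}", expr, eval(&expr));
}
```

Each operator gets a pair of binding powers. Giving the right side the higher number makes `10 - 4 - 3` parse as `(10 - 4) - 3`, and adding an operator is mostly a matter of adding a row to `infix_bp`. Running `main` parses `1 + 2 * 3 - (4 - 1)` and evaluates it to 4.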

What are you currently learning?

I’m not learning anything specific right now, but I am continuing to learn about query engines and distributed computing, both in my day job, and with my involvement with Apache Arrow. I’m getting the sense that I could continue learning about these topics for the rest of my career.

How do you separate good project ideas from bad ones?

That’s a tough one. What is a “good” project idea? Is it the one you learn the most from? Is it the one that gets the most GitHub stars? Also, it can be very subjective. I once built a tool called FireStorm/DAO which was a simple tool for reverse-engineering database schemas and then generating Java code based on the schema. Some people loved the tool and some people hated it. That was definitely a good learning experience for me. There are different audiences with different needs and opinions. These days, I tend to work on side projects just to explore ideas or learn a particular skill. They are usually very short-lived. Ballista was an unusual one because it gained a lot of traction early on and grew into something larger.

Why was Ballista started?

Partly for the same bad reasons that many projects are started: I wanted to learn something new, I had a new favorite programming language, and I needed a real project to build. But also for some (hopefully) well-thought-out reasons: a natively compiled, systems-level language like Rust has a lot of benefits over a garbage-collected language for large-scale data processing. I was particularly excited about the performance of compiled code and the efficient memory usage. Early experiments demonstrated that Ballista would use an order of magnitude less memory than Spark in some cases. The lack of GC also means that performance is consistent.

Who, or what was the biggest inspiration for Ballista?

Apache Spark is by far the biggest inspiration for Ballista. Apache Spark is a brilliant platform but I always felt that it was implemented in the wrong language. However, rather than being a direct port or rewrite of Spark, Ballista is a reimagining of Spark based on the current state of the art, which has moved on a bit since Spark was started in 2009.

Are there any overarching goals of Ballista that drive design or implementation?

Yes, language-neutrality.

What trade-offs have been made in Ballista as a consequence of these goals?

If this had been a pure Rust solution, there would have been no need to write serde code for serializing query plans, but because of the language-neutral architecture, we have had to write lots of tedious code to translate between Rust structs and Google Protobuf format.
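To give a feel for that translation layer, here is a minimal, std-only Rust sketch. It is not Ballista's code: `LogicalPlan`, `PlanNodeProto`, and the `KIND_*` tags are hypothetical, and `PlanNodeProto` is a hand-written stand-in for a struct that a protobuf code generator would normally emit from a `.proto` file. It is flattened here for brevity, with the plan variant stored as an `i32` tag and the child plan as an optional sub-message.

```rust
// Hypothetical Rust-side query plan (not Ballista's real types).
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    Scan { table: String, projection: Vec<String> },
    Filter { predicate: String, input: Box<LogicalPlan> },
    Limit { n: usize, input: Box<LogicalPlan> },
}

// Hand-written stand-in for a protobuf-generated message: a flat struct
// where the variant is an i32 tag and the child plan is an optional sub-message.
#[derive(Debug, Clone, PartialEq, Default)]
struct PlanNodeProto {
    kind: i32,
    table: String,
    projection: Vec<String>,
    predicate: String,
    limit: u64,
    input: Option<Box<PlanNodeProto>>,
}

const KIND_SCAN: i32 = 0;
const KIND_FILTER: i32 = 1;
const KIND_LIMIT: i32 = 2;

// Rust -> wire format: infallible, one match arm per plan variant.
impl From<&LogicalPlan> for PlanNodeProto {
    fn from(plan: &LogicalPlan) -> Self {
        match plan {
            LogicalPlan::Scan { table, projection } => PlanNodeProto {
                kind: KIND_SCAN,
                table: table.clone(),
                projection: projection.clone(),
                ..Default::default()
            },
            LogicalPlan::Filter { predicate, input } => PlanNodeProto {
                kind: KIND_FILTER,
                predicate: predicate.clone(),
                input: Some(Box::new(PlanNodeProto::from(&**input))),
                ..Default::default()
            },
            LogicalPlan::Limit { n, input } => PlanNodeProto {
                kind: KIND_LIMIT,
                limit: *n as u64,
                input: Some(Box::new(PlanNodeProto::from(&**input))),
                ..Default::default()
            },
        }
    }
}

// Wire format -> Rust: fallible, because a message built by another language
// may carry an unknown tag or omit a required child.
impl TryFrom<&PlanNodeProto> for LogicalPlan {
    type Error = String;

    fn try_from(p: &PlanNodeProto) -> Result<Self, Self::Error> {
        let child = || -> Result<Box<LogicalPlan>, String> {
            let input = p.input.as_deref().ok_or("missing input")?;
            Ok(Box::new(LogicalPlan::try_from(input)?))
        };
        match p.kind {
            KIND_SCAN => Ok(LogicalPlan::Scan {
                table: p.table.clone(),
                projection: p.projection.clone(),
            }),
            KIND_FILTER => Ok(LogicalPlan::Filter {
                predicate: p.predicate.clone(),
                input: child()?,
            }),
            KIND_LIMIT => Ok(LogicalPlan::Limit { n: p.limit as usize, input: child()? }),
            other => Err(format!("unknown plan kind {other}")),
        }
    }
}

fn main() {
    let plan = LogicalPlan::Limit {
        n: 10,
        input: Box::new(LogicalPlan::Scan {
            table: "orders".to_string(),
            projection: vec!["id".to_string()],
        }),
    };
    let proto = PlanNodeProto::from(&plan);
    let back = LogicalPlan::try_from(&proto).expect("round trip should succeed");
    assert_eq!(plan, back);
    println!("{proto:?}");
}
```

Every new plan variant needs an arm in both directions, and decoding has to be fallible because a plan serialized by a client in another language may use an unknown tag or omit a child. In a pure-Rust design, this is the kind of code that would not need to be written by hand.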

What is the best way for a new developer to contribute to Ballista?

Ballista is now part of the Apache Arrow DataFusion repository, and we have a number of GitHub issues tagged with “good first issue”, so I would recommend finding an issue that looks interesting and then asking questions there, or on the Arrow mailing list.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

My advice would be to start small and pick a trivial issue to work on, even a documentation issue. This is a good way to get to know the tools and processes and start to interact with the other developers that are working on the project.


Like what you saw here? Why not share it!

Or, better yet, share Console!

Also, don’t forget to subscribe to get a list of new open-source projects curated by an Amazon software engineer directly in your email every week.