Console #78 -- pyWhat, laser_control, and deptoolkit

An Interview With Basile Simon of The Digital Evidence Preservation Toolkit

Sponsorship

Axolo

Still waiting to get your pull request reviewed?

Axolo is a GitHub-Slack app that helps tech teams resolve pull requests faster by creating one channel per code review. Oh, and the best part? We don’t require any code access! Get started in 2 minutes for free with no credit card at axolo.co


Not subscribed to Console? Subscribe now to get a list of new open-source projects curated by an Amazon engineer in your email every week.

Already subscribed? Why not spread the word by forwarding Console to the best engineer you know?


Projects

30-seconds-of-code

30-seconds-of-code is a collection of short JavaScript code snippets for all your development needs.

language: JavaScript, stars: 87864, watchers: 2383, forks: 9314, issues: 6

last commit: November 03, 2021, first commit: November 29, 2017

social: https://twitter.com/30secondsofcode

pyWhat

Identify anything. pyWhat easily lets you identify emails, IP addresses, and more. Feed it a .pcap file or some text and it’ll tell you what it is!

language: Python, stars: 4729, watchers: 60, forks: 224, issues: 19

last commit: November 01, 2021, first commit: March 19, 2021

social: https://twitter.com/bee_sec_san

Laser_control

Laser_control is an in-progress laser device for neutralizing mosquitoes, weeds, and other pests.

language: Python, stars: 324, watchers: 11, forks: 11, issues: 1

last commit: October 26, 2021, first commit: December 23, 2020

social: https://twitter.com/ildarRakhmatul1

deptoolkit

DEPToolkit is proof-of-concept software for researchers and small teams sifting through online material. With a single click, material is archived in a framework that demonstrates chain of custody, and stored durably.

language: TypeScript, stars: 6, watchers: 5, forks: 3, issues: 6

last commit: August 23, 2021, first commit: May 12, 2021

social: https://twitter.com/deptoolkit


Console is powered by donations. We use your donations to grow the newsletter readership via advertisement. If you’d like to see the newsletter reach more people, or would just like to show your appreciation for the projects featured in the newsletter, please consider a donation 😊

Donate To Console


An Interview With Basile Simon of The Digital Evidence Preservation Toolkit

Hey Basile! Thanks for joining us! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like?

My background is in journalism. I studied law and political science, but I've worked in news organisations of different kinds: the BBC (a large broadcaster), the Times & Sunday Times (newspapers) and Reuters (a gigantic news agency). My thing is data journalism, which I mean in a really broad sense; I think the industry calls it creative technology, or “data hyphen something”.

I'm self-taught: I picked up some JS first and wrote imperative D3 for a long time. Then React came along, and I went down the R and tidy data route (look it up if this doesn't ring a bell). Recently, I've gone freelance and got to learn a huge amount from the declarative and, dare I say, functional world, and I've rediscovered JS through TypeScript and Svelte. This plays amazingly well with a D3-less way of writing bespoke data visualisations.

What are you currently learning?

For fun, I'm picking up a couple of projects in Clojure/ClojureScript, which I find fascinating because of the way writing code in them feels. It's concise and encourages thinking about isolating (or concentrating and deferring) risky steps. "Does this function do this one thing and this one thing only?" If yes, I can fold it away and stop worrying about it.

And with the REPL, you can test and play as you write. A fresh perspective!

What’s the funniest GitHub issue you’ve received?

Not an issue per se, but a bug report. It’s election night in the UK in 2017, and I’m working at The Times and The Sunday Times. It's the big show: all the work, planned and built in only a few weeks, goes out as the results start coming in. It's fairly high-traffic, and it's the moment when the digital teams across news orgs look at each other's work very closely.

In the thick of it, we get word from the project's editor that his name had been changed to some kind of pun. Or at least that’s what the man says his mates are telling him. That’s at the top of the page, at its busiest. Some folks say they saw it, but I’m not sure I did. To this day we're still not quite sure what happened, and kind of think it was some sort of git history trick...

Why was the DEPToolkit started?

A buddy of mine who was outside Minsk in October last year during the protests asked me what they should set up to archive some of the content they had access to, so that Lukashenko would pay the price for his actions. I didn't know what to tell them. I knew of things that were a bit difficult to set up, or not really geared towards individual researchers, or unable to capture content behind authentication (e.g. private groups).

Archiving with a view to prosecution is tricky because preserving the chain of custody of a digital item requires care from the get-go. You can't move it about and make copies, because at some point there will be a human error and the original file will be lost.

In 2014, when the West went in all guns blazing in Iraq and then Syria to bomb ISIS, which the public was just discovering, I worked on setting up an NGO/reporting project called Airwars.org, which assessed claims of civilian harm resulting from these airstrikes. The project was really, really successful in changing the way the military proactively reports these mistakes.

But I really felt we were handling this "open source" material (as in, freely available; it's weird lingo but has somehow stuck) in a bit of a funny way. This was fine for talking to the media, but I always wondered what the probative value of this stuff would be in court.

Can you give some examples of some of these “difficult to set up” tools?

Hunchly, which is expensive; SugarCube, which is the motherlode; or the recently Console’d ArchiveBox, which we couldn’t run on their machine.

Who or what was the biggest inspiration for the DEPToolkit?

Without a doubt, the Syrian Archive. I met Jeff in 2015 when they were getting started properly, and their work is just stellar. They're vacuuming up social media platforms at a huge scale and verifying so much material... Their North Star is to support everyone who has a claim in an eventual international prosecution for what's been happening in Syria.

Are there any competitors or projects similar to the DEPToolkit? If so, what were they lacking that made you consider building something new?

Yeah, there's direct competition, and quite frankly some duplication. The Digital Evidence Vault, out of Carnegie Mellon University, does quite a few of the same things we do, but... it's closed source (or rather, soon to be open source), you can't get to the guy (at least, I couldn’t), and you know, that's something that comes up time and time again when talking to their users.

There's real value in the openness of open source software. For small projects like this, particularly, there's nothing you can hide. So far, it seems that this has an appeal.

How do you intend to monetize the DEPToolkit?

Monetized is a bit too strong a word in this instance. The project got off the ground thanks to a grant from a German federal ministry and the Open Knowledge Foundation, and I feel a responsibility to that support to keep things out there, and free. That said, improvements and fixes ideally shouldn't depend on someone's willingness to throw some spare time at them; that's not sustainable.

On the other hand, I've got first-hand experience of the philanthropy, grants and fellowships dance: reporting, matching funders' objectives, and so on. That's not my profession, and it's too much work for me, so I won't do that.

The leads I'm exploring right now are to fund the open source side through commercial engagements with enterprise software providers, some of whom appear to be interested in integrating some of what the Toolkit does into their existing workflows. More on that later (I really hope).

What was the most surprising thing you learned while working on the DEPToolkit?

There's still a huge amount of interest in anything that mentions blockchain or crypto (be it -graphy or -currency). I got the weirdest people reaching out of absolutely nowhere because this project does involve Merkle trees, which are the basis of cryptographically verifiable, immutable chains.

I can't attest to the quality of these leads but yeah, buzzwords do have an effect.
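For readers who haven't met Merkle trees, here is a minimal Python sketch of the general idea Basile mentions: each archived item is hashed into a leaf, pairs of hashes are hashed together level by level, and the result is a single root that changes if any item is altered. This is not DEPToolkit's implementation (the toolkit is written in TypeScript, and its code is not reproduced here). SHA-256, the 0x00/0x01 leaf and node prefixes (borrowed from Certificate Transparency, RFC 6962), carrying an unpaired node up unchanged, and the function names are all assumptions made for illustration.

```python
import hashlib

# Assumed domain-separation prefixes (borrowed from RFC 6962) so a leaf
# hash can never be confused with an interior-node hash.
LEAF_PREFIX = b"\x00"
NODE_PREFIX = b"\x01"


def leaf_hash(item: bytes) -> bytes:
    """Hash one archived item into a leaf digest."""
    return hashlib.sha256(LEAF_PREFIX + item).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash two child digests into their parent digest."""
    return hashlib.sha256(NODE_PREFIX + left + right).digest()


def merkle_root(items: list[bytes]) -> bytes:
    """Fold a list of items into a single Merkle root, level by level."""
    if not items:
        raise ValueError("need at least one item")
    level = [leaf_hash(item) for item in items]
    while len(level) > 1:
        # Pair up adjacent digests and hash each pair into a parent.
        next_level = [node_hash(level[i], level[i + 1])
                      for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # carry an unpaired node up unchanged
        level = next_level
    return level[0]


if __name__ == "__main__":
    # Hypothetical archived items; in practice these would be file contents.
    archive = [b"screenshot-001.png", b"post-42.html", b"video-7.mp4"]
    root = merkle_root(archive)
    print(root.hex())

    # Editing any single item changes the root, so tampering is detectable.
    tampered = [archive[0], b"post-42.html (edited)", archive[2]]
    print(merkle_root(tampered) == root)  # False
```

Publishing or timestamping only the root is enough to commit to the whole collection: anyone holding the items can recompute it later and check that nothing was changed, which is what makes the structure useful for demonstrating chain of custody.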