Console #159 -- Interview with Alessandro of MediaMTX: Zero-dependency server for live video and audio streams

Featuring tinygrad, EasySpider, and MediaMTX

May 28, 2023

🤝 Sponsor

This space is reserved for sponsors that support us to keep the newsletter going! Want to support Console? Send us a note at osh@codesee.io

🏗️ Projects

Browse through open source projects on OpenSourceHub.io, add your project to get more exposure and connect with other maintainers and contributors!

tinygrad

Tinygrad is a deep learning framework that sits somewhere between PyTorch and micrograd. It's simple, with a sub-1000 line core. It aims to be the easiest framework to add new accelerators to, with support for both inference and training.

language: Python, stars: 13089, last commit: today
repo: github.com/geohot/tinygrad
site: tinygrad.org

EasySpider

A visual, no-code web crawler/spider for designing and running crawlers. It also has a command-line interface.

language: JS, stars: 7416, last commit: yesterday
repo: github.com/NaiboWang/EasySpider

MediaMTX

Ready-to-use RTSP / RTMP / LL-HLS / WebRTC server and proxy that lets you read, publish and proxy video and audio streams.

language: Go, stars: 6269, last commit: today
repo: github.com/bluenviron/mediamtx

Join thousands of other open-source enthusiasts and developers in the Open Source Hub Discord server to continue the discussion on the projects in this week's email!

🎙️ Interview With Alessandro of MediaMTX: Zero-dependency server for live video and audio streams

Hey Alessandro! Thanks for joining us! Let us start with your background.

I’m from Monza, Italy (home of the Formula One circuit, which I attend whenever possible), I hold an M.Sc. in Mechatronic Engineering, and I’m a software architect at a major IT corporation.
I started programming when I was 12 by trying to edit the source code of my favorite website with right click, “view source”, and I haven’t stopped since.
I’ve built something in almost every major language (PHP, HTML/CSS/SASS, JS, TypeScript, Python, C/C++, C#, x86 Assembly, Objective C, Scala, Java/Spring, Rust, Bash and of course Golang) at the backend, frontend, architecture and DevOps level.
Regarding preferences, Golang is my choice when building bandwidth-intensive backends (although Rust is slightly more optimized), React for the frontend, InfluxDB and Mongo for time-series and general-purpose databases respectively, DeepStream and PyTorch for machine learning, and GStreamer for video processing.

Who or what are your biggest influences as a developer?

Without a doubt, Linus Torvalds (author of Linux). He managed to create software that runs on everything from microcontrollers to supercomputers, and he was a pioneer of open source, although Linux’s modularity could be improved and has been a major point of discussion since the beginning.

What’s your most controversial programming opinion?

I’ll answer by considering the reaction of my colleagues that followed the opinion 😂
Personally, I don’t like class inheritance, a concept that is present in most programming languages. I once had a lengthy discussion with a colleague after I told him that “Animal must be a member of Dog and not its parent”, and he started yelling “nono, it’s not possible”... programmers aren’t known for their open-mindedness (sorry!)
Furthermore, in recent months I’ve come to realize that a microservice architecture can be really inefficient when there are too many microservices, and I’m trying to reduce their number even if it means producing simpler diagrams. This can be seen as a step backwards, even if in my opinion it’s not.

What is your favorite software tool?

Wireshark. I’ve used it daily since I was fifteen, and it’s the only way to understand networking. I know a lot of engineers who have never opened a network dump; I honestly don’t know how they manage to get their stuff working.
Docker, Kubernetes, Helm, Skaffold, OpenShift. They made deployment a piece of cake.

Why was MediaMTX started?

In 2019, I was working on two separate projects: an autonomous rover that had to be controlled underground in an 8 km gallery of a dam in north-west Italy, and an automatic surveillance system for multiple industrial plants.
Both projects had a common issue: live video routing. Video had to be watched or processed by multiple entities at the same time. And existing solutions were eating most of the CPU. For the rover, we were using Live555 (video proxy), while for the surveillance system we were using ROS (Robot Operating System), which was routing raw, uncompressed frames over TCP, a nightmare, justified only by the fact that ROS offers ready-to-use components that process frames with Machine Learning.
Other solutions were either not portable (Wowza) or not feasible.
I was using Wireshark to look at the content of the video stream ingested by Live555. The protocol (RTSP) seemed simple and really similar to HTTP, and I didn’t understand why Live555 was consuming so much CPU to handle it.
In a short timespan, I wrote two separate programs: rtsp-simple-server, which allowed clients to publish video streams, and rtsp-simple-proxy, which pulled video streams from existing servers. Both were less than 500 lines of code and required little CPU. Then I merged them together, creating a server / proxy hybrid, which is the base of the current project.
I immediately started receiving a lot of traffic and feedback from all around the world, and that allowed me to continue the development.
The project currently makes it possible to route thousands of video streams (I manage enterprise servers for cities with 2000 streams each), to convert video streams from one format to another and even to automatically heal video streams. All of this under the constraint of using as little CPU as possible, which was the main bottleneck of video streaming until some time ago.

How does MediaMTX work?

I like to think of MediaMTX as a message broker for video streams, hence a “media broker”. MediaMTX is able to receive live video streams from multiple sources and multiple protocols, and to broadcast them to anyone that needs them, with a protocol of choice.
Internally, it relies on four specialized libraries to handle the supported protocols (RTSP, HLS, RTMP, UDP/MPEG-TS and WebRTC): gortsplib and gohlslib (I’m the author of both), go-astits and pion/webrtc. Using specialized libraries provides modularity and helps developers look in the right place when they need to analyze code and send pull requests.
MediaMTX decodes only the minimum necessary, in order to minimize resource consumption.
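To make the “media broker” idea concrete, here is a minimal Go sketch of a publish/subscribe fan-out. It is not MediaMTX’s code: the `Broker` type, its methods and the `cam1` path are hypothetical names chosen for illustration, and the choice to skip a reader whose queue is full is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// Broker is a hypothetical, simplified "media broker": it fans out each
// published packet to every reader subscribed to the same path.
type Broker struct {
	mu      sync.Mutex
	readers map[string][]chan []byte
}

// NewBroker returns an empty broker.
func NewBroker() *Broker {
	return &Broker{readers: make(map[string][]chan []byte)}
}

// Subscribe registers a reader on a path and returns its receive channel.
// The buffer argument is the number of packets the reader may lag behind.
func (b *Broker) Subscribe(path string, buffer int) <-chan []byte {
	ch := make(chan []byte, buffer)
	b.mu.Lock()
	b.readers[path] = append(b.readers[path], ch)
	b.mu.Unlock()
	return ch
}

// Publish forwards a packet, untouched, to all readers of the path.
// It returns how many readers accepted it; a reader whose queue is full
// is skipped (assumption of this sketch) so it cannot block the publisher.
func (b *Broker) Publish(path string, pkt []byte) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for _, ch := range b.readers[path] {
		select {
		case ch <- pkt:
			delivered++
		default:
		}
	}
	return delivered
}

func main() {
	b := NewBroker()
	r1 := b.Subscribe("cam1", 4)
	r2 := b.Subscribe("cam1", 4)
	n := b.Publish("cam1", []byte("frame-0"))
	fmt.Println(n, string(<-r1), string(<-r2))
}
```

In this sketch the packet payload is never inspected, which mirrors the idea of decoding only the minimum necessary.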

Why did you pick Go?

Parallelism has always been an issue for me (and not only me). I used to spend 70% of the development time fixing race conditions, allocating additional threads for every single detail, or messing with inter-thread communication systems.
Event-based languages like JavaScript didn’t improve the situation, since they are single-threaded and can’t be scaled.
Golang offers a native, event-based, multithreaded goroutine system that combines the efficiency of event-based programming with that of multithreading. Basically, threads are created according to the available resources and events are distributed among them, all transparently.
Furthermore, Golang offers native communication between goroutines through channels.
Finally, Golang offers native cross compilation and allows building dependency-free binaries for every operating system (Linux, Windows, macOS) and architecture without any hassle.
That’s all I’ve ever wanted.
I have a lot of respect for Rust too, but its syntax is a little more complex.
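The goroutine and channel model described in this answer can be illustrated with a small worker pool. This is a generic Go sketch rather than anything taken from MediaMTX; the function name `sumSquares` and the workload are made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares distributes the inputs across a fixed number of goroutines
// and collects the results over a channel, with no explicit thread handling:
// the Go runtime schedules the goroutines onto OS threads.
func sumSquares(inputs []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then close the channel so that workers terminate.
	go func() {
		for _, n := range inputs {
			jobs <- n
		}
		close(jobs)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	total := 0
	for r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 3)) // prints 30
}
```

The channels carry both the data and the synchronization, so the example needs no locks around the shared total.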

Where did the name for MediaMTX come from?

MTX stands for “Media Transmission”. I spent a month thinking about a name that had to be protocol-free and “developer-compatible”, since developers always find a way to compress names that are more than 8 characters long (even “Kubernetes” is too long, and it has become k8s).
That was what came out.

Who, or what, was the biggest inspiration for MediaMTX?

The idea of building a server / proxy hybrid came from the need to merge two programs in order to maintain a single one. There were no particular inspirations for that 😂
The proxy was certainly inspired by the live555 proxy.
The publisher-subscriber model was inspired by the Robot Operating System (ROS), whose model is similar to that of MQTT or Kafka, although at the time I didn’t know them, since I came from the mechatronics world.
Nowadays, I’m really inspired by the Pion project, which is a set of media libraries that all together implement the WebRTC protocol. I try to contribute to it as much as possible and to structure my libraries (gortsplib and gohlslib) in a similar way.

Are there any overarching goals of MediaMTX that drive design or implementation? If so, what trade-offs have been made in MediaMTX as a consequence of these goals?

I’ve been a student for many years, and affordable things like the Raspberry Pis allowed me to develop my skills. Therefore, one of the goals of this project (and all my projects) is to be compatible with any hardware, from enterprise servers to microcontrollers.
The drawback is that I left out of the project some advanced features that would have required more computational power, even though they would have been useful to many people.

What is the most challenging problem that’s been solved in MediaMTX, so far?

One of the challenges was decoupling publishers (clients that are sending a stream) from readers (clients that are reading a stream). This is important since a single laggy reader could either slow down all the others or fill up the RAM to the point of exhaustion. It was done by developing a custom ring buffer that allows publishers to push data and readers to pull data asynchronously. The buffer makes use of synchronization primitives and unsafe pointers in order to maximize throughput, which can reach 10,000 data units per second:
https://github.com/bluenviron/gortsplib/blob/main/pkg/ringbuffer/ringbuffer.go
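For readers who want the general shape of such a buffer, here is a deliberately simplified Go sketch. It is not the gortsplib implementation linked above: it uses a mutex and a condition variable instead of unsafe pointers, and the names `RingBuffer`, `Push`, `Pull` and `Close`, as well as the drop-when-full policy, are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// RingBuffer is a hypothetical fixed-size queue. Push never blocks, so a
// slow reader can neither stall the publisher nor grow memory without bound.
type RingBuffer struct {
	mu     sync.Mutex
	cond   *sync.Cond
	buf    [][]byte
	head   int // index of the oldest item
	count  int // number of items currently stored
	closed bool
}

// NewRingBuffer allocates a buffer that holds up to size items.
func NewRingBuffer(size int) *RingBuffer {
	r := &RingBuffer{buf: make([][]byte, size)}
	r.cond = sync.NewCond(&r.mu)
	return r
}

// Push stores an item. It returns false, dropping the item, when the
// buffer is full or closed (a policy assumed by this sketch).
func (r *RingBuffer) Push(item []byte) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.closed || r.count == len(r.buf) {
		return false
	}
	r.buf[(r.head+r.count)%len(r.buf)] = item
	r.count++
	r.cond.Signal()
	return true
}

// Pull blocks until an item is available. It returns false once the
// buffer has been closed and fully drained.
func (r *RingBuffer) Pull() ([]byte, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for r.count == 0 && !r.closed {
		r.cond.Wait()
	}
	if r.count == 0 {
		return nil, false
	}
	item := r.buf[r.head]
	r.buf[r.head] = nil // release the reference
	r.head = (r.head + 1) % len(r.buf)
	r.count--
	return item, true
}

// Close marks the buffer as closed and wakes up any waiting reader.
func (r *RingBuffer) Close() {
	r.mu.Lock()
	r.closed = true
	r.mu.Unlock()
	r.cond.Broadcast()
}

func main() {
	rb := NewRingBuffer(2)
	fmt.Println(rb.Push([]byte("a")), rb.Push([]byte("b")), rb.Push([]byte("c")))
	item, ok := rb.Pull()
	fmt.Println(string(item), ok)
}
```

Giving each reader its own bounded buffer is what keeps one laggy client from affecting the others; a lock-free variant would trade this simplicity for throughput.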
Another challenge was finding a way to route video frames independently of the protocol, since the server supports multiple protocols and each of them has its own way to encode video frames. Popular libraries like GStreamer and FFmpeg have solved the issue by decoding video frames down to their elementary units and using these as the basic data unit, but I don’t find this mechanism very efficient.
Therefore, I chose to route video frames in their original format and to decode them if and only if they are requested in another format.
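The “convert only on demand” choice can be sketched in a few lines of Go. Everything here is hypothetical and only illustrates the control flow: the `Unit` and `Converter` types, the `Route` function and the format labels `rtp` and `mpegts` are invented for the example and do not reflect MediaMTX’s internal types.

```go
package main

import "fmt"

// Unit is a hypothetical data unit tagged with the format it arrived in.
type Unit struct {
	Format  string
	Payload []byte
}

// Converter is a hypothetical function that turns a unit into another format.
type Converter func(Unit) Unit

// Route returns the unit unchanged when the reader wants the original
// format, and invokes the converter only when a different one is requested.
func Route(u Unit, wanted string, convert Converter) Unit {
	if u.Format == wanted {
		return u
	}
	return convert(u)
}

func main() {
	conversions := 0
	toMPEGTS := func(u Unit) Unit {
		conversions++ // count how often the expensive path is taken
		return Unit{Format: "mpegts", Payload: u.Payload}
	}

	in := Unit{Format: "rtp", Payload: []byte{0x01}}
	same := Route(in, "rtp", toMPEGTS)
	other := Route(in, "mpegts", toMPEGTS)
	fmt.Println(same.Format, other.Format, conversions)
}
```

Only the second reader pays for a conversion; the first one receives the unit exactly as it was published.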
Another gigantic challenge, although less technical, is ensuring compatibility with most devices, something that can be done only by reverse-engineering the smallest details of every protocol.
For example, supporting Apple devices is always a challenge, since a single bit is enough for them to discard a WebRTC stream or an HLS stream for no apparent reason.
Last week I spent an entire day finding out why a video stream generated by OBS Studio was causing a blank screen on Chrome: it was because of a single bit in a header, which had to be set to 1 when the current video frame is an I-frame (a frame that can be read independently of the others).
Not a single line of log was coming from Chrome. It was like moving in the dark, but after tons of tries the issue was fixed.

Are there any projects similar to MediaMTX? If so, what were they lacking that made you consider building something new?

This project was started before the pandemic, and I have already discussed the state of the art of that period and the reasons behind the project. The pandemic caused the video streaming sector to flourish, and nowadays there are a lot of open source media servers available, each with its own peculiarities.
Nonetheless, MediaMTX is still appreciated for its speed, versatility and compatibility. Personally, I use it as a building block of more complex architectures, something that other solutions can’t offer since they’re either over-engineered or under-engineered.
There’s a solid community that uses MediaMTX for a wide range of needs, and it’s seen as an established tool.

What was the most surprising thing you learned while working on MediaMTX?

Certainly the fact that nowadays Open Source has a critical role in every company, small to big.
I published the server from my bedroom and in less than two years I was contacted by companies like NASA (and a lot more that I can’t write about) and government agencies in Europe and Australia.
The Shodan Search Engine lists thousands of installations of my servers on all continents.
This is both thrilling and worrying. I think that open source must be used with more care, since trusting it too much could result in a huge security threat.

What is your typical approach to debugging issues filed in the MediaMTX repo?

When filing issues, users are guided to provide data that allows maintainers to replicate their problem. Invalid issues are automatically discarded.
Every valid issue is reviewed and never closed until solved. I don’t like repositories that automatically close issues after a certain period, regardless of whether they have been solved or not.
Issues are split into two categories: bugs and feature requests.
Bugs are reviewed immediately if multiple users confirm them; otherwise, they are reviewed with a priority that depends on their content. If the user provides enough data to replicate the issue (and most do), everything can be solved in a matter of minutes or hours; otherwise, the user is asked for more data until the maintainer is able to replicate the issue.
Feature requests are another matter. First, there must be support and consensus from the community regarding new features. Second, if the feature is trivial, I try to encourage users to contribute the feature themselves. Major features are implemented by following an internal strategic plan.

What is the release process like for MediaMTX?

There are two kinds of releases: minor and major releases.
Minor releases are mostly for bug fixing and don’t contain major improvements. An in-depth testing procedure is generally not needed, and these are published as soon as the automated tests pass.
Major releases are for introducing major improvements. After all automated tests have passed, binaries are usually deployed on servers with thousands of video streams for some days, and then released. This procedure is often not enough to avoid regressions, but the community usually reports regressions within 24 hours, and this results in the publishing of a minor release in the following days.
Binaries are compiled by GitHub Actions without human intervention, minimizing security risks, and published in parallel on GitHub and Docker Hub.

Is MediaMTX intended to eventually be monetized if it isn’t monetized already? If so, how? If it’s already monetized, what is your main source of revenue?

MediaMTX is currently not monetized.
I’m employed in the private sector, and I don’t plan to make open source my main source of revenue in the near future.

What are you most proud of?

I’m certainly proud of, but also scared by, the attention from big companies, for the reasons I explained above.
When I joined my current company, there was a manager who already knew my name because he had used the server to route his private cameras. He came to me and asked, “Was it you?” It was really funny.
A lot of users have thanked me over the years, and this has cheered me up in difficult times.

How do you balance your work on open-source with your day job and other responsibilities?

I learned to give things the right priority. Job comes first, my private life comes second (I can’t say “first” or I may get fired 😂) and then there’s Open Source.
When I’m lucky, job and open source become the same thing: there are projects that are based on open source components, and fixing or improving open source components becomes critical. Sometimes we’re even obliged to release everything on GitHub because it’s part of the agreement with the client.

Have you ever experienced burnout? How did you deal with it?

Yes. It happened when I was renovating my apartment, dealing with a high-priority task at my job, and finalizing a major release of MediaMTX. All at the same time 🎉
I have solutions for everything: running (10 to 15 kilometers), mindfulness meditation and drumming (14 years). They are my strength. They are not universal, though, and I strongly encourage everyone to find their own activities for both body and mind.

Do you think any of your projects do more harm than good?

Since the start of the Russia-Ukraine war the world has changed, and I’m not sure that my open source libraries, which include connectors for communicating with drones (gomavlib), autonomous vehicles (goroslib) and cameras (gortsplib and MediaMTX), are always used for peaceful purposes. I’m not at ease with this idea.

What is the best way for a new developer to contribute to MediaMTX?

Issues in the server and its dependencies are kept organized and categorized. It’s easy to spot small tasks and start developing them.

Where do you see the project heading next?

The project has to support next-generation streaming protocols like SRT and RIST.
Since MediaMTX already offers automatic conversion from one protocol to another, this will allow users to transition from legacy protocols to newer ones without changing their infrastructure or hardware.
This is the main feature that I’m working on.

What motivates you to continue contributing to MediaMTX?

I’d like to provide a stable building block for architectures of any scale, in the same way as Kafka or PostgreSQL do.
I also think that new features should be limited in number and that the focus should be on existing ones. As Smith said to Neo in Matrix 3: “everything that has a beginning has an end”. I think the same, and when all the objectives of this project have been fulfilled, I’d like it to enter a maintenance-only mode.

Are there any other projects besides MediaMTX that you’re working on?

Over the years I’ve released Golang-based connectors for a series of protocols, including Mavlink (drones) and ROS (autonomous vehicles), as well as standalone libraries for interfacing with cameras (gortsplib, gohlslib). They are all in the bluenviron organization:
https://github.com/bluenviron

Where do you see software development heading next?

Generative AI, no doubt.

Where do you see open-source heading next?

I’d like to see a general-purpose, Generative AI-based framework for creating applications or improving parts of them.

Do you have any suggestions for someone trying to make their first contribution to an open-source project?

After you create your fix or new feature, take a look at the existing code in order to adapt the style of your work to the one of the project. This second step is often missing and is the reason why most pull requests get rejected.

What is one question you would like to ask another open-source developer that I didn’t ask you?

“What percentage of your time do you spend writing automated tests? Do you use fuzz testing?”
(Personally, I spend 50% of my time writing automated tests, and I’ve set up fuzz testing on all primitives.)


Console by CodeSee.io
