Sponsorship
Distributed team? Multi-cloud? Servers behind NAT? Create secure, private, cloud native networks between developers, servers, databases, containers, Kubernetes and GitHub Actions across clouds, between datacentres and even behind NAT.
Serverless, direct p2p & end-to-end encrypted connections (no VPN servers)
Private access to apps like GitLab, Grafana, Redis, MongoDB, K8s etc.
No ingress traffic (close the firewall, no open ports to discover, target or attack)
Built-in DNS, load balancing and static addresses
Zero config, works with systems on dynamic IPs
Forget exposed VPN servers, NAT, ACLs, IP addresses, complex configs, proxies, subnets, routing tables, VLANs, certificates & secret keys 💥 close your firewalls, say goodbye to the 1990s and build your own private, dark network with Enclave 🚀
Projects
janusgraph
JanusGraph is a highly scalable graph database optimized for storing and querying large graphs with billions of vertices and edges distributed across a multi-machine cluster.
language: Java, stars: 4110, watchers: 240, forks: 1007, issues: 416
last commit: August 04, 2021, first commit: February 26, 2012
https://twitter.com/janusgraph
langjam
Lang Jam is a weekend coding jam. In Lang Jam, you and your teammates will create a programming language based on the theme for that jam. The first Lang Jam will be held on Friday the 20th of August, starting at 7pm UK time.
stars: 587, watchers: 81, forks: 11, issues: 3
last commit: July 31, 2021, first commit: July 09, 2021
gba-remote-play
gba-remote-play streams games from a Raspberry Pi to a Game Boy Advance through its Link Port.
language: C, stars: 235, watchers: 6, forks: 5, issues: 0
last commit: July 11, 2021, first commit: May 21, 2021
An Interview With Oleksandr Porunov of JanusGraph
Hey Oleksandr! Thanks for joining us! Let’s start with your background. Where have you worked in the past, where are you from, how did you learn how to program, what languages or frameworks do you like, etc?
Originally I'm from Ukraine. I got AS, BS, and MS degrees in Computer Science (2008-2017). I worked as an individual Android developer in 2012-2014. I also participated in quite a few programming olympiads from 2013 to 2016. I have been working as a co-founder / backend developer of a scalable social network project since 2017.
I started contributing to open source in 2016, mostly to infrastructure-related projects. In 2017 I started to actively use JanusGraph, and in 2018 I started contributing to it.
In 2019 I was accepted as a committer on JanusGraph, and later as a member of its Technical Steering Committee (TSC).
Since July 2020 I've been working as a Principal Software Engineer at Mapped.
I’ve used quite a few languages before. I was using C# mainly for a while, and mostly used C++ in programming olympiads. I used Python (just a little) / Bash when I was contributing to infrastructure related projects, and used JavaScript in different areas. Right now I mostly use Java for my main work and my open-source contributions.
The technologies I like and have worked with for quite some time are Java, JanusGraph, TinkerPop, Spring, SQL, GraphQL, Docker, and Linux.
What does being on the TSC for JanusGraph entail?
TSC members are responsible for the project direction and vote on any major decisions. Under our policy, a proposed decision needs at least three "+1" votes and zero "-1" votes to be approved. TSC members can also invite other contributors to become committers or to join the TSC.
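The approval rule described above can be sketched in a few lines of Python. This is only an illustration of the stated policy (at least three "+1" votes and no "-1" votes); the function name `is_approved` and the encoding of votes as integers are assumptions made for this example, not part of any JanusGraph tooling.

```python
def is_approved(votes):
    """Return True if a proposal passes under the rule described above.

    `votes` is an iterable of integers cast by TSC members:
    +1 (approve), 0 (abstain), or -1 (veto). This encoding is an
    assumption made for the sketch.
    """
    votes = list(votes)
    plus_ones = sum(1 for v in votes if v == 1)
    minus_ones = sum(1 for v in votes if v == -1)
    # At least three +1 votes and no -1 votes at all.
    return plus_ones >= 3 and minus_ones == 0


if __name__ == "__main__":
    print(is_approved([1, 1, 1]))       # three approvals, no veto
    print(is_approved([1, 1, 1, -1]))   # a single -1 blocks the proposal
    print(is_approved([1, 1, 0]))       # not enough +1 votes
```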
What sort of things have you voted on in the past?
After I became a TSC member I voted on each release we had. I also voted to add several new contributors to the committers team and the TSC. In December I voted to bring JanusGraph under the LF AI & Data umbrella.
Why was JanusGraph started?
JanusGraph started as a fork of TitanDB because the main developers were acquired by DataStax and they were not able to continue working on the project. As only they had direct access to the project, the development of TitanDB was stopped.
After a while JanusGraph was created as a replacement for TitanDB so that development of the project could continue.
Where did the name JanusGraph come from?
Kelvin Lawrence gave a great comment about the name of the JanusGraph project. I will just quote his words here:
"The name Janus was picked as it has two appropriate meanings. Janus is a moon which aligned nicely with Titan also a moon. Janus was also in Greek Mythology the name of a god of new beginnings. This was a new project building on prior work so the name aligned well people felt."
Are there any competitors or projects similar to JanusGraph? If so, what were they lacking that made you consider building something new?
There are many graph databases that exist. You can check the ranking of well known graph databases here.
I believe JanusGraph has the following main advantages over other graph databases:
Choice
The beauty of JanusGraph is that it doesn't determine low level storage details on its own. Low level storage is provided by the underlying database which is used to store data. Instead of spending time reinventing the wheel, JanusGraph developers can focus on the graph implementation itself and we just reuse the low level storage implementation instead of thinking of how to store the necessary data on disk or in memory.
JanusGraph allows you to use quite a few storage databases underneath it: Cassandra, HBase, Bigtable, BerkeleyDB, ScyllaDB, Aerospike, DynamoDB, FoundationDB, or an in-memory backend. For advanced search capabilities, it's very easy to plug in search-index engines like Elasticsearch, Solr, or Lucene.
Moreover, you can develop your own storage adapter if you think your data will be better served with different storage.
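To make the idea of a pluggable storage layer concrete, here is a minimal Python sketch. It is not JanusGraph's actual storage adapter interface: the names `StorageBackend`, `InMemoryBackend`, and `TinyGraph` are hypothetical, and the row/column layout is a simplification chosen for illustration. The point is only that the graph layer talks to a small storage contract, so the store underneath can be swapped.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical minimal storage contract (NOT JanusGraph's real API)."""

    @abstractmethod
    def put(self, key: str, column: str, value: str) -> None:
        """Store `value` under (`key`, `column`)."""

    @abstractmethod
    def columns(self, key: str) -> dict:
        """Return all column -> value pairs stored for `key`."""


class InMemoryBackend(StorageBackend):
    """Simplest possible backend: a dict of rows, each a dict of columns."""

    def __init__(self):
        self._rows = {}

    def put(self, key, column, value):
        self._rows.setdefault(key, {})[column] = value

    def columns(self, key):
        return dict(self._rows.get(key, {}))


class TinyGraph:
    """Graph layer that only knows the StorageBackend contract."""

    def __init__(self, backend: StorageBackend):
        self.backend = backend

    def add_edge(self, src: str, label: str, dst: str) -> None:
        # Each outgoing edge becomes one column in the source vertex's row.
        self.backend.put(src, f"{label}:{dst}", dst)

    def out(self, src: str, label: str) -> list:
        # Read the vertex row and keep only the columns for this edge label.
        prefix = f"{label}:"
        row = self.backend.columns(src)
        return sorted(v for c, v in row.items() if c.startswith(prefix))


if __name__ == "__main__":
    g = TinyGraph(InMemoryBackend())
    g.add_edge("alice", "knows", "bob")
    g.add_edge("alice", "knows", "carol")
    g.add_edge("alice", "likes", "pizza")
    print(g.out("alice", "knows"))
```

Swapping in a different store would mean writing another `StorageBackend` subclass; `TinyGraph` itself would not change.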
Scalability
JanusGraph probably has the biggest scalability potential of any graph database to date, because it reuses the scalability of the underlying database (let's say Cassandra / ScyllaDB with multi-datacenter replication). Quite a few graph databases have out-of-the-box replication and data sharding, and that's great, but global scalability with multi-datacenter clusters is a different thing. Having sharding and replication available in a graph database doesn't mean that it will be able to scale globally. Quite a few graph databases which I tested fail to work normally with high latencies between data centers. So even though they are usable for many use cases, they quickly become unusable (or hardly usable) at global scale. In this respect JanusGraph is a step ahead of other graph databases, as it allows you to scale globally.
Open source
I know that there are some graph databases which are also fully open sourced, but if you take a look at the majority of graph database licenses, you will quickly find that most of them are either proprietary or with mixed licenses (half free / half paid). JanusGraph is fully open sourced and fully free.
Can you give some examples of when a graph database would be superior to just using one of these databases?
The thing is that you can build anything with probably any database, but the complexity of building a product with the wrong tool rises dramatically.
For quite some time relational databases (like MySQL, Oracle, Postgres, etc.) were used in most situations. They work great for some situations, but not for all of them.
When there are too many connections in your data and you need to traverse it to compute some result, it becomes extremely difficult to work with relational databases but super easy to work with graph databases.
Usually it's recommended to use graph databases for social networking, recommendation engines, fraud detection, IoT, or other projects with a lot of connections in the data.
Typical queries in graph databases (let's say using Gremlin) become much easier to write, and the good thing is that graph databases (JanusGraph in particular) optimize the execution of your queries. The data is stored in such a way that those queries become much easier and faster to execute compared to databases which are not optimized for such a workload. I'm not saying that you can't create a good recommendation engine using MySQL, I'm just saying that it will be much easier using JanusGraph, as you won't need to reinvent the wheel.
Using MySQL you would typically create a separate table to represent a connection between different entities, and then use a lot of joins to traverse your data. When you find yourself creating too many tables to represent your connections and using too many joins (or "select" statements with deep traversals on the application side), I would strongly recommend looking at graph databases. Of course you can proceed with development on a relational database, but the development cost will most likely be very high, because you would typically spend a lot of time reinventing all the optimizations which graph databases give you naturally.
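As a rough illustration of why traversals feel natural on a graph, the Python sketch below computes "friends of friends" recommendations by walking an adjacency structure two hops out. In a relational schema the same question would usually be a self-join on a connection table. The data, the `FOLLOWS` edge list, and the function names are invented for this example; a real graph database would express this as a traversal query rather than application code.

```python
from collections import Counter

# Toy edge list. In a relational schema this would be a join table
# with columns like (user_id, friend_id). The data is made up.
FOLLOWS = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "alice"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("carol", "erin"),
]


def build_adjacency(edges):
    """Index edges by source vertex so each hop is a direct lookup."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
    return adjacency


def recommend(adjacency, user):
    """Two-hop traversal: people followed by the people `user` follows.

    Candidates are ranked by how many of the user's direct connections
    lead to them; ties are broken alphabetically.
    """
    direct = adjacency.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in adjacency.get(friend, set()):
            # Skip the user and anyone they already follow.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    adjacency = build_adjacency(FOLLOWS)
    print(recommend(adjacency, "alice"))
```

Each extra hop here is one more nested lookup. With join tables, each extra hop is another join, which is where the cost mentioned above starts to add up.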
Are there any commercial products that you're aware of that are using graph databases that could only exist because of their graph database usage?
There are plenty of them, but I won't be able to list all the use cases. Some of the companies which have deployed JanusGraph in production are eBay, Red Hat, Target, Netflix, G Data, Crédit Agricole CIB, FiNC, TimesInternet, Zeotap, etc.
I can't say they wouldn't exist without JanusGraph, because most of those companies were already successful long before JanusGraph was created, but they adopted this database to improve their existing products. That said, many new startups also choose JanusGraph as their main database (ubirch, NowMatters, Mapped, etc.; Mapped is the company I’m currently working at).
What is your typical approach to debugging issues filed in the JanusGraph repo?
We typically ask for a reproduction scenario. If there is a reproduction, it becomes much easier to locate the issue. In most cases, it's easy to write a reproducible test and debug the execution of the query (or other use case) to understand and fix a problem.
Debugging issues is quite easy with JanusGraph as there are many integration tests which are executed against different storage backends.
We typically use Docker containers to test compatibility with other storage / index engines.
What is the release process like for JanusGraph?
1) Discuss the release;
2) Make GitHub release artifacts and a Sonatype staging release;
3) Hold the release vote, which takes at least 72 hours. There should be zero -1 votes and at least three +1 votes from TSC members.
The detailed release process is described here: https://github.com/JanusGraph/janusgraph/blob/master/RELEASING.md
If you plan to continue developing JanusGraph, where do you see the project heading next?
I definitely plan to continue working on JanusGraph. TSC members are discussing the next steps JanusGraph should take to increase community involvement in the project, and we will announce them later.
For now, JanusGraph has just started to accept donations. If we are lucky enough, we could potentially hire people to work on the project or improve the development flow.
We are very close to the 0.6.0 release, which is the next major release in our cycle and has a lot of improvements compared to 0.5.x releases.
As for short-term new features (probably the next release after 0.6.0), I think we will work on adding a dedicated ScyllaDB driver to the project, updating Cassandra dependencies to 4.0.0, and further improving integration with ElasticSearch / Solr / Lucene.
As for long-term features, I plan to work on implementing iterative vertex rebalancing for better data locality when I'm ready. I discussed some ideas about this feature here.
What motivates you to continue contributing to JanusGraph?
The project itself motivates me to do so. I fell in love with the data model which is used by JanusGraph as it is very scalable and extensible.
I believe that the right data model is crucial for almost any project. I also love that JanusGraph has pluggable storage and index backends, which give you freedom over your data. You can read more about the JanusGraph data model here.
How do you balance your work on open-source with your day job and other responsibilities?
Typically I have some time in the evening or morning to work on JanusGraph, and I often have some time on weekends as well.
What is the best way for a new developer to contribute to JanusGraph?
I believe the best way to start contributing to JanusGraph is to read the "CONTRIBUTING.md" document, which describes how to contribute, and then pick up interesting open issues on GitHub and try to solve them. The more time you spend with JanusGraph, the easier it becomes to understand how to solve different issues.
The contribution document can be found here.
JanusGraph is the silent workhorse of the graph database market. While there are many others out there, I have been continually amazed when I discover new use cases and the incredible scales they are achieving. Looking forward to hearing more about the shard-aware binding for Scylla! Should make JanusGraph even zoomier!