Millions of GitHub repositories are potentially vulnerable to RepoJacking. New research by Aqua Nautilus sheds light on the extent of RepoJacking, which if exploited may lead to code execution on organizations’ internal environments or on their customers’ environments. As part of our research, we found an enormous source of data that allowed us to sample a dataset and find some highly popular targets.
Among the repositories found vulnerable to this attack we discovered organizations such as Google, Lyft and some that requested to remain anonymous. All were notified of this vulnerability and promptly mitigated the risks. In this blog we will show how an attacker can exploit this at scale and share the PoC we ran on popular repositories.
In contrast to past studies, our research emphasizes the security implications and severity of this database if exploited by attackers, many of whom can find within it numerous high-quality targets susceptible to RepoJacking. In this blog we delve deeper into the exploitation scenarios of this attack and illustrate each scenario with real-life examples.
What is RepoJacking?
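RepoJacking is an attack in which an attacker registers a GitHub user or organization name that its previous owner gave up, for example by renaming the account, and recreates a repository under the old name, so that references to the old repository URL resolve to attacker-controlled code instead of being redirected to the renamed repository.
To read more, you can find additional information in Appendix A.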
RepoJacking Restrictions and Bypasses:
GitHub places some restrictions on an attacker’s ability to register an old repository name (the restricted names are called retired names). However, these restrictions apply only to repositories that were popular before the rename, and researchers have recently found many bypasses that allow attackers to register any repository name they want.
If you want to read more about the restrictions and bypasses, you can find information in Appendix B.
These bypasses show that organizations should not rely on retired names as a security mechanism. In this research, therefore, we consider a repository vulnerable if its URL gets redirected and the organization name it was originally registered under no longer exists.
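The definition above (the repository URL redirects, and the old organization name no longer exists) can be sketched as a small check. This is a minimal illustration, not the tooling used in the research: the function names `github_status` and `is_vulnerable` are hypothetical, and it assumes that github.com answers a renamed repository with an HTTP redirect status and a non-existent account with 404.

```python
import http.client
from typing import Callable

# A status function maps a github.com path (e.g. "/owner/repo") to an HTTP status code.
StatusFn = Callable[[str], int]

REDIRECT_CODES = (301, 302, 307, 308)


def github_status(path: str) -> int:
    """Return the HTTP status for https://github.com<path>.

    http.client never follows redirects, so a redirect status is visible to the caller.
    """
    conn = http.client.HTTPSConnection("github.com", timeout=10)
    try:
        conn.request("HEAD", path, headers={"User-Agent": "repojacking-check-sketch"})
        return conn.getresponse().status
    finally:
        conn.close()


def is_vulnerable(owner: str, repo: str, status_of: StatusFn = github_status) -> bool:
    """Apply the definition used in this post: redirected repo AND missing owner."""
    if status_of(f"/{owner}/{repo}") not in REDIRECT_CODES:
        return False  # the old URL is not redirected, so nothing to hijack
    # Assumption: a 404 on the account page means the name is free to register.
    return status_of(f"/{owner}") == 404
```

Passing the status function as a parameter keeps the decision logic testable offline; for example, `is_vulnerable("some-org", "some-repo", {"/some-org/some-repo": 301, "/some-org": 404}.__getitem__)` exercises the vulnerable path without touching the network. Any real scan would also need to respect GitHub’s rate limits.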
Are You Exposed to RepoJacking?
You may ask yourself: do I own repositories that are directly or indirectly vulnerable to RepoJacking?
The quick answer is that the possibilities of exposure are endless. There are a few basic questions you should ask if you think you may be exposed.
- What do you know about your organization?
- What are all the GitHub organization names you have used before?
- Was your organization involved in any mergers or acquisitions?
- Are there any dependencies in your code that lead to a GitHub repository vulnerable to RepoJacking?
- Is there guidance somewhere (documentation, guides, Stack Overflow answers, etc.) that suggests you should use a GitHub repository vulnerable to RepoJacking?
As noted above, the possibilities of exposure are endless, and depending on the answers to any of these questions you may find that your organization is vulnerable.
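To start answering the dependency and guidance questions, one option is to collect every GitHub `owner/repo` reference that appears in your manifests, scripts and documentation, and then review each one. The sketch below is only an illustration under simple assumptions: the regular expression and the function name `github_references` are hypothetical, it lowercases names because GitHub treats them case-insensitively, and it will miss references that are assembled at runtime.

```python
import re
from typing import Set, Tuple

# Matches "github.com/owner/repo" and the SSH form "github.com:owner/repo".
GITHUB_REF = re.compile(
    r"github\.com[/:]"
    r"([A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?)"  # owner: user or organization name
    r"/([A-Za-z0-9._-]+)"                           # repository name
)


def github_references(text: str) -> Set[Tuple[str, str]]:
    """Return the set of (owner, repo) pairs referenced anywhere in the text."""
    refs = set()
    for owner, repo in GITHUB_REF.findall(text):
        repo = repo.rstrip(".")      # drop sentence-ending punctuation
        if repo.endswith(".git"):
            repo = repo[:-4]         # "widget.git" -> "widget"
        if repo:
            refs.add((owner.lower(), repo.lower()))
    return refs


# Hypothetical input mixing a Go module line, a pip URL, a curl command and an SSH remote.
example = """
require github.com/example-org/widget v1.2.3
git+https://github.com/Example-Org/widget.git@main
curl -L https://github.com/other/tool/releases/download/v1/tool.tar.gz
git@github.com:third/lib.git
"""
references = github_references(example)
```

Each pair found this way is a candidate to check against the redirect-and-missing-owner definition given earlier in this post.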
Compiling a Dataset for Our Research
Attackers don’t need to do all this hard work. They aren’t bound to a specific organization. They can scan the internet and find any victim they’d like, and if they sense there’s profit behind the attack, they will continue until they maximize their gain. Websites such as the GHTorrent project provide invaluable data for this purpose.
The GHTorrent project records any public event (commit, PR, etc.) that happens on GitHub and saves it in a database. Anyone can download a database dump of a specific timeframe. By utilizing this dataset, malicious actors can uncover the historical names of various organizations and broaden their potential attack surface.
In the image below you can see how easy it is to find a specific timeframe and download it.
Essentially, the entire history of usernames and organizations’ names on GitHub since 2012 is easily accessible to anyone.
It’s important to note that the website ghtorrent.org was available during our research. It is currently offline, but the dataset is still available at http://ghtorrent-downloads.ewi.tudelft.nl/mysql.
Our research started from a data sample we found on this website. We downloaded all the logs from a random month (June 2019) and compiled a list of 125 million unique repository names. Next, we sampled 1% (1.25 million repository names) and checked each one to see if it was vulnerable to RepoJacking.
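The deduplicate-then-sample step can be sketched as follows. This is a simplified illustration rather than the pipeline we ran: the function name `sample_unique_names` is hypothetical, the input is assumed to be an iterable of `owner/repo` strings already extracted from the logs, and at the scale of 125 million names an in-memory set would likely need to be replaced with a streaming or on-disk approach.

```python
import random
from typing import Iterable, List


def sample_unique_names(
    names: Iterable[str], fraction: float = 0.01, seed: int = 0
) -> List[str]:
    """Deduplicate repository names and draw a uniform random sample of them."""
    # Normalize case so "Org/Repo" and "org/repo" count as one repository.
    unique = sorted({name.strip().lower() for name in names if name.strip()})
    sample_size = round(len(unique) * fraction)
    # A seeded generator makes the sample reproducible between runs.
    return random.Random(seed).sample(unique, sample_size)


# Hypothetical log extract: 5,000 entries that contain only 1,000 distinct names.
log_names = [f"org{i % 500}/repo{i % 1000}" for i in range(5000)]
sampled = sample_unique_names(log_names, fraction=0.01)
```

Sorting before sampling is a design choice that keeps the result independent of set iteration order, so the same seed always yields the same sample.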
We found that 36,983 repositories were vulnerable to RepoJacking! That is a 2.95% success rate. If we extrapolate the result from this sample to GitHub’s entire repository base (over 300 million repositories according to GitHub publications), there are potentially millions of vulnerable repositories!
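The extrapolation is simple proportional scaling, shown here as a worked calculation. It uses only the figures quoted above and assumes, as a rough approximation, that the June 2019 sample is representative of all of GitHub; the function name `extrapolate` is hypothetical.

```python
def extrapolate(vulnerable_in_sample: int, sample_size: int, total: int) -> int:
    """Scale the number of hits in a sample up to the whole population."""
    # Integer arithmetic avoids floating-point rounding in the estimate.
    return vulnerable_in_sample * total // sample_size


SAMPLE_SIZE = 1_250_000           # 1% of the 125 million unique names
VULNERABLE_IN_SAMPLE = 36_983     # repositories found vulnerable in the sample
TOTAL_REPOSITORIES = 300_000_000  # lower bound quoted from GitHub publications

estimated_vulnerable = extrapolate(VULNERABLE_IN_SAMPLE, SAMPLE_SIZE, TOTAL_REPOSITORIES)
print(f"Estimated vulnerable repositories: {estimated_vulnerable:,}")
```

Under these assumptions the estimate comes to roughly 8.9 million repositories, which is the basis for the "millions" figure.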
Exploitation Scenarios
Now that we know how widespread RepoJacking is, the remaining question is how an attacker can actually exploit a vulnerable repository.
The attacker can exploit it when there is a reference somewhere on the public internet to the previous name of the repository.
We divided the exploitation scenarios into two categories:
- An automated download from a RepoJacking vulnerable repository is when the user doesn’t willingly or knowingly download any resources from ano