If organizations want to get serious about software security, they need to empower their engineers to play a defensive role against cyberattacks as they craft their code.
The problem is, developers haven’t had the most inspiring introduction to security training over the years, and anything that can be done to make their experience more engaging, productive, and fun is going to be a powerful motivator in helping them gain valuable secure coding skills.
And after dedicating precious time to mastering new abilities that can help beat attackers at their own game, a safe environment in which to test these new powers is not easy to find.
So, what is a battle-hardened, security-aware engineer to do?
A new feature released on the Secure Code Warrior platform, named ‘Missions,’ is a challenge category that elevates users from the recall of learned security knowledge to the application of it in a real-world simulation environment.
This scaffolded, microlearning approach builds strong, secure coding skills that are job-relevant and much more entertaining than (vertically) watching endless training videos in the background of a workday.
The first available ‘Mission’ is a simulation of the GitHub Unicode breach. It’s not as simple as it might appear on the surface, and it’s a really clever vulnerability that is fun to dissect. Security researcher 0xsha did a comprehensive case study on how this same bug can be used to exploit Django by way of case transformations while also showing how vulnerability behavior can change between programming languages.
There’s a lot more to discover about this security issue, and here is a great place to start.
GitHub’s Head-On (Case Mapping) Collision
In a blog post from November 28, 2019, security research group Wisdom reported on a security bug they discovered on GitHub. They outlined how they were able to utilize a Case Mapping Collision in Unicode to trigger a password reset email delivery to the wrong email address (or if you were thinking like an attacker, an email address of the threat actor’s choosing).
While a security vulnerability is never good news, security researchers who rock a whitehat do provide some mercy — not to mention the opportunity to avert disaster — if they discover potentially exploitable errors in a codebase. Their blogs and reports often make for great reading, and it’s kind of cool to learn about a new vulnerability and how it works.
To move to the next level of secure coding prowess, it is super powerful not just to find common vulnerabilities, but also to have a safe, hands-on environment in which to understand how they are exploited.
Keep reading to discover how a Case Mapping Collision in Unicode can be exploited, how it looks in real-time, and how you can take on the mindset of a security researcher and try it out for yourself.
Unicode: More Than Just Emojis
“Unicode” may not be on the radar of the average person, but the chances are good that most people use it in some form every day. If you’ve used a web browser, any Microsoft software, or sent an emoji, then you’ve been up close and personal with Unicode.
It’s a standard for consistent encoding and handling of text from most of the world’s writing systems, ensuring that everybody can (digitally) express themselves using a single character set.
As it stands, there are over 143,000 characters, so you’re covered whether you’re using the Icelandic þ, or the Turkish dotless ı, or anything in between.
Due to the sheer volume of characters Unicode has in its set, a way of converting characters to another “equivalent” character is needed in many cases. For instance, it seems sensible that if you convert a Unicode string containing a dotless “ı” to ASCII, it should simply turn into an “i,” right?
With a great volume of character encoding comes great potential for disaster.
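To make the idea concrete, here is a minimal Python sketch (Python is used purely for illustration) that checks whether a non-ASCII character turns into plain ASCII when its case is changed. The helper name `collides_with_ascii` is hypothetical, and the sample characters are just a few well-known cases, not a complete list.

```python
import unicodedata


def collides_with_ascii(ch: str) -> bool:
    """Return True if a non-ASCII character upper- or lowercases to pure ASCII."""
    if ch.isascii():
        return False
    # A collision means the case-mapped form differs from the input
    # and consists only of ASCII characters.
    return any(m != ch and m.isascii() for m in (ch.upper(), ch.lower()))


# Dotless i, long s, Kelvin sign, and the Icelandic thorn as a non-colliding control.
for ch in ("\u0131", "\u017f", "\u212a", "\u00fe"):
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"upper={ch.upper()!r} lower={ch.lower()!r} "
        f"collides={collides_with_ascii(ch)}"
    )
```

The point to take away is that two visually and technically different strings can become identical after a case transformation, which matters whenever the transformed value is used as a lookup key.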
A case mapping collision in Unicode is a business logic flaw, and it can lead to the takeover of accounts not protected by 2FA. The bug is easiest to see in a vulnerable password reset flow.
The logic goes something like this:
- It accepts the user-provided email address and uppercases it for consistency.
- It checks if the email address already exists in the database.
- If it does, then it will set a new temporary password. (This isn’t best practice, by the way. Instead, use a link with a token that enables a password reset.)
- It then sends an email containing the temporary password to the address the user provided in step 1. (This is very poor practice, for so many reasons. Yikes.)
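The four steps above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the code from the original report: the in-memory `USERS` dictionary, the `OUTBOX` list, and the functions `send_email` and `reset_password_vulnerable` are all hypothetical stand-ins for a real database and mail service.

```python
import secrets

# Hypothetical in-memory "database", keyed by the uppercased email address.
USERS = {"JOHN@GITHUB.COM": {"email": "John@GitHub.com", "password": "old"}}

# Hypothetical stand-in for a mail service: a list of (recipient, body) tuples.
OUTBOX = []


def send_email(to: str, body: str) -> None:
    OUTBOX.append((to, body))


def reset_password_vulnerable(user_email: str) -> bool:
    # Step 1: uppercase the user-provided address "for consistency".
    normalized = user_email.upper()

    # Step 2: check whether the address exists in the database.
    account = USERS.get(normalized)
    if account is None:
        return False

    # Step 3: set a new temporary password (not best practice).
    temp_password = secrets.token_urlsafe(12)
    account["password"] = temp_password

    # Step 4: BUG - the email goes to the address the user typed,
    # not to the address stored for the account.
    send_email(user_email, f"Your temporary password is {temp_password}")
    return True
```

Calling `reset_password_vulnerable("John@G\u0131thub.com")` in this sketch finds John’s account, because the dotless ı uppercases to a plain I, yet the message in `OUTBOX` is addressed to the dotless-ı domain.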
Let’s see what happens with the example provided in the original blog post, where a user requests a password reset for the email John@Gıthub.com (note the Turkish dotless ı):
- The logic converts John@Gıthub.com to JOHN@GITHUB.COM
- It looks that up in the database and finds the user JOHN@GITHUB.COM
- It generates a new password and sends it to John@Gıthub.com
Note that this process ends up sending the highly sensitive email to the wrong email address. Oops!
How to Cast Out This Unicode Demon
The interesting aspect of this specific vulnerability is that two factors combine to make the code vulnerable:
- The actual Unicode case mapping behavior,
- The logic determining which email address to use, i.e., the user-provided email address instead of the one that already exists in the database.
In theory, you can fix this specific issue in two ways, as identified in the blog post from Wisdom:
- Convert the email to ASCII with Punycode conversion,
- Use the email address from the database rather than the one provided by the user.
When it comes to hardening software, it’s a great idea to leave nothing to chance and to put as many layers of defense in place as possible. For all you know, there may be other ways to exploit this encoding that you’re just not aware of yet. Anything you can do to decrease risk and close windows that may be left open for an attacker is valuable.
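As a rough illustration of layering both fixes, the Python sketch below converts the domain to its ASCII (Punycode) form before the lookup and always sends mail to the address stored for the account. It is a sketch under assumptions rather than a definitive implementation: `USERS`, `OUTBOX`, `send_email`, `to_ascii_email`, and `reset_password_fixed` are hypothetical names, rejecting non-ASCII local parts is our own simplification, and Python’s built-in `idna` codec is used for the domain conversion.

```python
import secrets

# Hypothetical in-memory "database", keyed by the uppercased email address.
USERS = {"JOHN@GITHUB.COM": {"email": "John@GitHub.com", "reset_token": None}}

# Hypothetical stand-in for a mail service: a list of (recipient, body) tuples.
OUTBOX = []


def send_email(to: str, body: str) -> None:
    OUTBOX.append((to, body))


def to_ascii_email(address: str) -> str:
    """Return the address with its domain converted to ASCII via the idna codec."""
    local, _, domain = address.rpartition("@")
    # Simplifying assumption: only ASCII local parts are accepted.
    if not local or not local.isascii():
        raise ValueError("unsupported local part")
    # Non-ASCII domain labels become "xn--..." labels instead of
    # silently collapsing into look-alike ASCII letters.
    return f"{local}@{domain.encode('idna').decode('ascii')}"


def reset_password_fixed(user_email: str) -> bool:
    try:
        # Layer 1: convert to ASCII before any case transformation.
        normalized = to_ascii_email(user_email).upper()
    except (ValueError, UnicodeError):
        return False

    account = USERS.get(normalized)
    if account is None:
        return False

    # Issue a one-time reset token rather than a temporary password.
    token = secrets.token_urlsafe(32)
    account["reset_token"] = token

    # Layer 2: send to the address stored in the database,
    # never to the one supplied in the request.
    send_email(account["email"], f"Use this token to reset your password: {token}")
    return True
```

With both layers in place, a request for the dotless-ı address no longer matches John’s record, and even if some other collision slipped through the lookup, the message would still go to the address on file.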
Ready To Pilot Your Own Mission?
It’s time to take your secure coding and awareness skills to the next level. Experience this GitHub vulnerability in an immersive, safe simulation, where you can see the impact of bad code in both frontend and backend contexts. Attackers have an advantage, so let’s even the playing field and apply real skills with a whitehat counter-punch.