You can hardly read anything about cryptography on the Internet without running into the common refrain, “don’t roll your own crypto”. The exhortation is to use a library someone else made rather than trying to do it yourself, because you’ll inevitably screw up horribly. And it’s a fine sentiment, to be sure. There are many things that can go wrong when trying to encrypt data to keep it from prying eyes – many more than can fit in a single post – but uncritically grabbing the first library that promises you “security”, without knowing how to use it correctly, will lead you down a bad path just as surely.
Let’s see how you can safely implement secure communications by working through securing a hypothetical chat app. We want to let users talk to each other without revealing their messages to anyone else (including us), and ideally we want to leak as little information about them as possible to eavesdroppers. As we’ll soon find out, this is a much more complicated problem than “just” using a library. We’ll start with the very core of the system: the cipher.
The cipher is the mathematical algorithm used to encrypt the data. You start with the data you want protected (called the plaintext) and a piece of secret information (called the key), put the two of them into the algorithm, and out the other end comes the ciphertext, which hopefully cannot be decrypted by anyone without the key. An initial point that I’ll return to later is that encryption is really a transfer of secrecy from one data blob to another: we assume the eavesdropper can read everything sent on the network, and that’s fine – but if they get the key it’s game over. Protecting the key is of paramount importance.
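To make the plaintext/key/ciphertext relationship concrete, here’s a toy sketch in Python using plain XOR. (With a truly random key as long as the message, used exactly once, this is the classic one-time pad – but it is emphatically not something to build a real system on, and the function name is my own.)

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte of the data with the matching key byte.
    # Running it again with the same key reverses the operation.
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"meet me at noon"
key = secrets.token_bytes(len(plaintext))   # the secret: a random key

ciphertext = xor_cipher(plaintext, key)     # unreadable without the key
recovered = xor_cipher(ciphertext, key)     # the same key decrypts it
assert recovered == plaintext
```

The whole point is that `ciphertext` can travel over a hostile network; only someone holding `key` can turn it back into the message.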
The basic attack that all ciphers are vulnerable to is simply guessing the key. Every modern cipher has a finite number of possible keys, so with enough computing power you could try them all until you find the right one. This is called a brute-force attack, and it is defeated simply by using a key with enough bits. (Key lengths are more complicated for public-key cryptography, which I’ll get to later, but for the ciphers discussed here 128 bits is considered the gold standard: infeasible to brute-force. If quantum computers ever become practical, 256 bits will have to be used instead, but for now that’s not thought to be a serious problem.) While a 128-bit key is necessary for security, it’s not sufficient. “128-bit security” on its own means nothing unless the cipher itself is also secure, so let’s get back to the ciphers.
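The arithmetic behind “infeasible” is easy to check yourself. Assuming a hypothetical attacker who can test a trillion keys per second (a generous figure I’ve picked for illustration):

```python
GUESSES_PER_SECOND = 10**12          # hypothetical, very fast cracking rig
SECONDS_PER_YEAR = 365 * 24 * 3600

for bits in (40, 56, 128):
    # Worst case: the attacker has to try every key in the space.
    worst_case_years = 2**bits / GUESSES_PER_SECOND / SECONDS_PER_YEAR
    print(f"{bits}-bit key: up to {worst_case_years:.3g} years")
```

Even this attacker clears a 40-bit keyspace in about a second and a 56-bit one in under a day, while a 128-bit keyspace takes on the order of 10^19 years – vastly longer than the age of the universe.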
The first widely available cipher of the computer age was the Data Encryption Standard (DES), initially created by IBM and made a US government standard in 1977. For many years it was effectively the only game in town: if you wanted strong encryption, that’s what you used. For much of its lifetime, however, it had a cloud over it due to the National Security Agency, the notorious US intelligence agency with the dual mission of both strengthening cryptosystems and breaking them. The NSA was involved in the standardization process and made some tweaks to the cipher IBM submitted. One tweak – changing the substitution tables, called S-boxes, used in part of the algorithm – was suspected by some to be a backdoor that would let the NSA decrypt DES communications. As it turned out, the tweak actually strengthened the algorithm against differential cryptanalysis, a cipher-breaking technique the NSA knew about at the time but that would not be discovered publicly for over a decade. The other tweak – reducing the key length from 128 bits to 56 bits – was likely not so noble. If the NSA didn’t have the computing power to break 56-bit keys then, it would have not long after.
The problem becomes clearer when you look at how crypto changed in the 90s with the rise of the Internet. At the time, US export regulations forbade exporting any crypto with a key length above 40 bits – short enough that even personal computers of the era could brute-force it. These restrictions finally went away in the late 90s, after it became clear they were pointless to keep enforcing, but they point to the US government’s continued desire to be able to break any crypto it wants.
Today there are a couple of major ciphers in use. One of the biggest is the Advanced Encryption Standard (AES), standardized by the US government as the successor to DES. AES has no brute-force problem – key lengths range from 128 to 256 bits – and no computationally feasible attacks are publicly known either. There’s no true proof that it’s secure, but a lot of smart people have thought about it really hard and been unable to crack it, and that’s as good as things get when it comes to ciphers.1
While AES is used in a great many places, another cipher seeing increasing use is ChaCha20. It has no government standard behind it; it was created by the academic cryptographer Daniel J. Bernstein. Is it secure? Again, as far as we know, yes.
Using your cipher
So AES and ChaCha20 are both popular, well-vetted ciphers. They’re both secure – probably. So just randomly generate a key and throw your data into the cipher, right? No! There are many complications. Rather than enumerating them all, I’ll point at two resources. The first is libsodium, a crypto library written to make it easy to do things right. The second is cryptopals: if you want to see all the horrible pitfalls you may fall into when applying ciphers to data, it’s a series of coding challenges where you work through different vulnerabilities and exploit them yourself. Working through them is an excellent way to see how these encryption schemes actually operate, and it should give you a healthy paranoia about writing code that even remotely touches cryptography.
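To give a flavor of those pitfalls, here’s one classic from the cryptopals family: reusing a nonce with a stream cipher. The toy keystream below is built from SHA-256 purely for illustration – it is not a real cipher, and all the names are my own – but the failure mode it demonstrates is exactly how real stream ciphers (ChaCha20 included) break when a key/nonce pair is used twice.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: hash the key, nonce, and a counter, then
    # concatenate the digests until we have enough bytes.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, nonce: bytes, msg: bytes) -> bytes:
    return bytes(m ^ k for m, k in zip(msg, keystream(key, nonce, len(msg))))

key = secrets.token_bytes(32)
nonce = b"\x00" * 8            # the bug: a fixed nonce, reused for every message

m1, m2 = b"attack at dawn!", b"retreat at once"
c1, c2 = encrypt(key, nonce, m1), encrypt(key, nonce, m2)

# An eavesdropper XORs the two ciphertexts: the shared keystream cancels
# out, leaving the XOR of the two plaintexts -- no key required.
leaked = bytes(a ^ b for a, b in zip(c1, c2))
assert leaked == bytes(a ^ b for a, b in zip(m1, m2))
```

From that XOR of plaintexts, standard frequency analysis recovers both messages. The cipher was never “broken”; it was merely misused.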
Passing around the keys
In my opinion, this is where the real danger lies for people trying to write secure software. As I mentioned earlier, all of the secrecy in the system resides in the key. Going back to our secure chat app, for two users to communicate they need to share a key between them.
One way to do this is to generate keys on the server and hand them to users whenever they start a new conversation. But then you would have access to everyone’s conversations – hardly good security. The standard solution is something called public-key cryptography. This uses special algorithms where instead of one key there are two: each user generates a private key, which is kept secret, and a public key, which is sent to the other user. Things encrypted with the public key can only be decrypted with the private key, and from there a key-agreement algorithm can derive the keys used for messages. libsodium has an example of how you’d do it if you’re using that library.
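The key-agreement idea is easiest to see in classic finite-field Diffie-Hellman, sketched below with a deliberately toy-sized prime. Real systems – libsodium among them – use elliptic-curve variants such as X25519 and much larger parameters; this is only to show how two parties end up with the same secret without ever sending it.

```python
import secrets

p = 2**127 - 1   # a Mersenne prime; far too small for real-world use
g = 3            # the public generator both sides agree on

# Each side picks a random private value and never reveals it.
alice_private = secrets.randbelow(p - 2) + 1
bob_private = secrets.randbelow(p - 2) + 1

# Only these public values cross the network.
alice_public = pow(g, alice_private, p)
bob_public = pow(g, bob_private, p)

# Each side combines its own private value with the other's public one:
# (g^a)^b = (g^b)^a mod p, so both arrive at the same shared secret.
alice_shared = pow(bob_public, alice_private, p)
bob_shared = pow(alice_public, bob_private, p)
assert alice_shared == bob_shared
```

An eavesdropper sees `g`, `p`, and both public values, but recovering the private values from them is the discrete logarithm problem, believed intractable at real-world sizes.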
This method of key exchange generally works well, but it has one severe drawback. Say an attacker wants to eavesdrop on your users’ messages, and can not only read things sent over the Internet but alter them as well. If they intercept the packets in which your users exchange public keys and substitute their own, the attacker can establish two keys – one with each user – and relay messages between them while reading everything! This is the classic man-in-the-middle attack.
There’s no perfect defense against this. A common pattern for encrypted chat is called trust on first use: whenever you start chatting with someone new, the software simply accepts the public key the other user presents, and throws up big warnings if that key ever changes. An attacker who can permanently eavesdrop on and alter everything will remain undetected, but one who can only do so temporarily will be found out when the key changes. And for added security, if the users ever have a more trustworthy channel between them, they can compare keys directly (if the app makes this easy to do). That way they can ensure no adversary got between them.
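Trust on first use is only a few lines of bookkeeping. Here’s a minimal sketch – the function names, return strings, and in-memory store are all my own invention; a real app would persist the pins and surface the warning prominently in the UI:

```python
import hashlib

known_keys = {}   # user id -> pinned key fingerprint

def fingerprint(public_key: bytes) -> str:
    # A short hash of the key, suitable for storing and for
    # users to compare out-of-band.
    return hashlib.sha256(public_key).hexdigest()[:16]

def check_key(user: str, public_key: bytes) -> str:
    fp = fingerprint(public_key)
    pinned = known_keys.get(user)
    if pinned is None:
        known_keys[user] = fp        # first contact: trust and remember
        return "trusted (first use)"
    if pinned == fp:
        return "ok"                  # same key as before
    return "WARNING: key changed - possible eavesdropper!"

assert check_key("bob", b"key-one") == "trusted (first use)"
assert check_key("bob", b"key-one") == "ok"
assert check_key("bob", b"key-two").startswith("WARNING")
```

The same `fingerprint` value is what users would read to each other over that more trustworthy channel to verify no one is in the middle.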
What about the metadata?
Now things get really tricky. All the data your users send may be encrypted, but someone who can eavesdrop on the entire network can still see who’s using the app, how often, and how much data they’re sending. They could even figure out who’s talking to whom by correlating messages sent and received at around the same time. This is metadata harvesting, and it is much harder to defend against. The usual defense is generating fake traffic that looks real, to bury the signal in noise. Unfortunately, I don’t have many good resources to point to here.
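One small, concrete piece of the defense is padding: if every message is padded up to the next multiple of a fixed bucket size before encryption, an eavesdropper sees only a handful of possible lengths instead of exact ones. A minimal sketch (the scheme, names, and 256-byte bucket are my own choices for illustration):

```python
def pad_to_bucket(msg: bytes, bucket: int = 256) -> bytes:
    # Prefix the real length so padding can be stripped after decryption,
    # then pad with zeros up to the next multiple of `bucket` bytes.
    framed = len(msg).to_bytes(4, "big") + msg
    padded_len = -(-len(framed) // bucket) * bucket   # ceiling division
    return framed + b"\x00" * (padded_len - len(framed))

def unpad(padded: bytes) -> bytes:
    n = int.from_bytes(padded[:4], "big")
    return padded[4:4 + n]

assert len(pad_to_bucket(b"hi")) == 256          # short messages all look alike
assert len(pad_to_bucket(b"x" * 300)) == 512     # longer ones jump a bucket
assert unpad(pad_to_bucket(b"hello")) == b"hello"
```

This hides exact message lengths but not timing or who is talking; those need cover traffic and are much harder to get right.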
A word on libraries
So far I haven’t said much about the actual libraries you should use to implement the secure chat app, apart from pointing out libsodium, and this is intentional. The crypto libraries out there are like the ciphers discussed above: we know about the problems that have been found, but we can’t prove anything secure. And libraries have had a much poorer track record than the modern ciphers. Two examples.
OpenSSL was the primary crypto library in the open-source world for a long time, even though many people grumbled about the state of its code and the difficulty of its API. Eventually, after many security vulnerabilities – most notably Heartbleed – forks of it appeared, as did independent open-source libraries such as, again, libsodium. Using OpenSSL isn’t insecure per se, but the risk of an as-yet-undiscovered vulnerability is much higher than with competing libraries, and it’s also much harder to use correctly.
Another well-known crypto library is RSA BSAFE, which has a far more disturbing history. It was a commercial product, and for a long time it had a good reputation, providing the security layer in many commercial products. That lasted until Reuters revealed that the NSA had paid RSA, the library’s vendor, $10 million to make Dual_EC_DRBG – a random-number generator standard widely suspected of containing an NSA backdoor – part of BSAFE. RSA had to advise its customers not to use the suspect code, and BSAFE is no longer sold.
Anyway, the point is that while libsodium is widely used and trusted now, it may not be state-of-the-art – or trusted at all – by the time you read this. There’s no substitute for doing your own research on libraries and keeping whichever one you use up to date with security patches.
Getting to secure
Writing secure software is indeed possible. It’s not a black art that only the “crypto wizards” can do. The “don’t roll your own crypto” mantra is correct as far as it goes, but it’s woefully incomplete and discourages learning about how securing data works. And having knowledge is important because secure software has to be secure from top to bottom. No library can do everything for you, nor can software be “secured” and then left to stagnate while the world changes around it.
- For CS nerds, this is related to the still-unsolved P vs. NP problem. ↩