Transcript from Cryptography for [Web]Application Developers

The following is a transcript from Sean Comeau's talk at CodeCode Community Week. You can find the video and slides here on our blog.

Who am I?

My background is in ops, programming, and pentesting. I'm not a cryptographer. I'm a security and cryptography enthusiast. I can give you sound advice on how to handle common, well understood scenarios. If your business requires anything outside the scope of this talk then you would be wise to consult with an actual cryptographer. Or better yet, consult with the entire community of cryptographers, in public. Whatever you do, don't try to make up a new scheme on your own and rely on it to protect anything real. SSL, WEP, WPA, and CSS were all broken and these came from the smart guys who brought us web browsers, wireless Ethernet and DVD players.

Why should you care?

Software is fallible. Pitfalls are numerous. SQL and XML External Entity Injection, XSS and XSRF, Broken Authentication or Session Management, Insecure Object References, Missing Access Controls, Unvalidated redirects and forwards, buffer overflows, or the use of components containing any such vulnerabilities are just a few examples of over a hundred common things that may lead to compromise.

I'm not going to talk about any of those vulnerabilities today. Just keep in mind that every piece of software you have ever built and everything you will build in the future is vulnerable in some way. Properly deployed encryption is like seatbelts and airbags in a high speed car crash. In all likelihood it's the only thing that will save you from total disaster and even then it's only going to help if you bothered to use it.

What crypto can and can't do

    Encryption can't just make all your security problems go away. It is a tool that can give you three things:
  1. Encryption /Confidentiality - It can render data unreadable to your adversary
  2. Authentication /Integrity - It can prevent an adversary from modifying data without your knowledge
  3. Identification /Authenticity - It can provide a mechanism to know where data is coming from Other words could be used to describe these capabilities. By carefully ordering or combining these basic capabilities into protocols, we can create more interesting capabilities. For example, we can make use of encryption or authentication primitives to create a commitment scheme which could be used to play a game of chance over a network without allowing either player to cheat.

What encryption can't do is make your software bug free. It can't prevent one party to a conversation from recording it and sharing it with others. Cloud providers count as a "party to a conversation" if one of the endpoints is hosted there. Host on EC2, Amazon can read your data if they like. It can't also prove who is using a computer, nor can stop someone from beating you with a $5 wrench until you give up your passwords. Even for the things crypto can accomplish, one mistake can destroy the security of the whole system.

Algorithms and Protocols

I'm sure everyone here has heard of things like TLS, RSA, AES, or MD5. Often we're told about key sizes as well. eg. "Your data is secured with 256 bit AES". Such claims are usually meaningless. By itself 256 bit AES can't even be used to secure a text message for the simple reason that AES operates on exactly 16 bytes, no more, no less. A text message could be as small as a single character and up to 160 bytes, so we would need more than just AES itself to get the job done. In any system that uses cryptography there's a lot more going on than the main encryption algorithm and the size of its key. Real systems are implementations of protocols which make use of primitives such as AES as building blocks.

Remember the three properties: confidentiality, integrity, and authenticity. Most of the time, at least two of them are going to be required to produce a useful system.

    Today we're going to look at the following primitives:
  • One-way hash functions
  • Private key cryptography
  • Public key cryptography
  • Digital signatures
  • Secure pseudorandom number generation

Hash functions

A hash function takes an arbitrary length input and produces a fixed length output. Hash functions are deterministic. Given the same input they always produce the same output. Hash functions are one way. Given the output there's no way to get the input back.

  • md5("hello") = 5d41402abc4b2a76b9719d911017c592
  • sha1("hello") = aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
  • sha256("hello") = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
      The ideal cryptographic hash function has four main properties:
    1. easy to compute the hash value for any given message
    2. infeasible to generate a message that has a given hash
    3. infeasible to modify a message without changing the hash
    4. infeasible to find two different messages with the same hash

Why are these properties important?

If you know the input then calculating the output is fast. For message authentication, or integrity checking, speed is highly desirable. Computationally, it's much, much more efficient to use a hash to implement message authentication than it is to use a public key algorithm. So, for that usage, speed is a real benefit. For other types of applications however, speed can be a serious liability. Password hashing is an example, which I'll discuss in more detail later. The possible inputs are infinite and the possible outputs are finite, so obviously, all hash functions have collisions. A collision is two different inputs that produce the same output. If you change even a single bit of the input then it's critical that the output be completely different. It's also critical that it be necessary to run the entire hash function to know what the output for any given input will be.

The last two properties really depend on the second one. If an attacker can generate a message that results in a given hash, then all he has to do is generate one that starts with the message he wants to send. If he can do that, then the hash is broken.

This is all very abstract. How about a practical example?

Let's say the message happens to be a program. A typical example of this would be an add-on downloaded by another program which is already installed, such as a FreeBSD port. The system already has a stored copy of the hash. It downloads the program, hashes it and then compares the output to the stored hash. If the two match then installation will proceed. If they don't match installation is aborted because the downloaded program is not what was expected.

This check prevents attackers from getting malicious code onto the system by modifying the program being downloaded, but it only works insofar as the hash algorithm makes it infeasible to generate a message that has a given hash.

If the attacker can take his malicious program and find some data that can be added to the end of it which will make the hash of the whole file collide with the stored hash then he wins.

How can you avoid this type of attack?

Avoid weak hashes such as MD5, SHA1, and RIPEMD. Instead use SHA-256. Also consider SHA-3, which should be standardized soon. You can also use multiple hash functions and check to see if they all match.

I'd also like to point out that while hashes can be used to check integrity they can't be used to check authenticity. In other words, you can't trust a hash that you didn't get from a trusted source. Storing the hash of a file in another file in the same directory doesn't accomplish anything. It's useful to make sure the file isn't accidently corrupted, but if an attacker can modify one file he can also modify the other.

Let's talk about hashing secrets. I'm talking about things like user passwords, answers to security questions and stuff like that. This is information the user supplies in order to prove who they are. You don't need to store this in the database in readable form. If you're thinking "but I need to be able to email the user his password in case he forgets" just stop that right now. Users reuse their passwords constantly, so if your user database is ever leaked you don't want to lose everyone's actual passwords.

https://pwnedlist.com/stats/account-imports-area-chart-per-month

Do not, ever, even think about simply taking the users secrets and hashing them. That is a recipe for disaster. There are two problems with it. First, users with the same password will have the same hash and it will be obvious. Second, it's vulnerable to rainbow table attacks so the passwords will be broken in short order. Rainbow tables are pre-computed, pre-sorted hashes stored with the associated input. The attacker does all the work up front, and then storage space is traded for CPU time in the future. These problems have been well known for decades, but that hasn't stopped a lot of developers from continuing to make this mistake. Most of the verified answers to this problem you'll find on StackExchange are bad, so don't just Google "encrypt passwords " and copy whatever you find.

Secrets need to be combined with a random value called a SALT prior to hashing in order to defeat the rainbow table attacks. Even with a SALT a single pass through a hash function isn't even close to good enough. The job requires a one way function that takes a lot of time to run. It only has to run once for a legitimate user, but an attacker has to run it millions of times so we want something that takes hundreds of milliseconds in order to make cracking really difficult. It's also desirable that the function require some memory so that it's not easy or cheap to produce dedicated cracking hardware.

The best solution right now is called scrypt. There's support for pretty much every language now, so just use it. Make sure you put that in there before you go live. Don't put it off, and tell yourself that for now I'll just MD5 the passwords and fix it later. You won't. If you go live with some weak hashing scheme you won't want to change it later because you'll have to reset, crack, or log user passwords to do so. And there will always be something better to do, so just do it right up front. Some of you probably use frameworks that abstract this away. If you do, check to see how it stores passwords and make sure it's using scrypt.

Private key Crypto

Private key crypto uses the same key to encrypt and decrypt. Also known as symmetric cryptography.

Examples: AES (Rijindal), DES, 3DES, Blowfish, Twofish, RC2, RC4, RC6, Camellia, IDEA, Serpent, MARS, A5/1, A5/2 Cleartext = the unencrypted, readable data we care about. Ciphertext = the message after encryption, the data the adversary gets to see. Key = the secret required to encrypt and decrypt the message Encryption: ciphertext = f(key, cleartext) Decryption: cleartext = f(key, ciphertext)

The most important thing to remember about private key algorithms is that if the attacker gets ahold of the key you're screwed. You cannot store the key in the same place that you store the ciphertext. You can't transmit the key over the same channel as you transmit the ciphertext. Doing so defeats the whole point. For some reason developers keep trying to get away with this, doing stupid things like storing secrets inside of mobile apps. Don't even think about trying to obfuscate the key and hide it somewhere in your code. It will be found.

Generally, symmetric keys should not be stored for long. They should either be derived from a passphrase using a key derivation function such as scrypt or established using a Diffie-Hellman exchange then destroyed after use.

Block vs. Stream

Symmetric ciphers belong to one of two general classes: Block or Stream.

Stream

Stream ciphers encrypt the plaintext one bit at a time. They do so by creating a key stream from the key. Examples: RC4, A5/1, A5/2 The nice thing about stream ciphers is that the length of the cleartext doesn't need to be known in advance. They're fast and require very little memory, which makes them attractive to implement in hardware. Don't use them. Creating a secure key stream is actually really hard. If you think you have to use a stream cipher then you should consult a cryptographer otherwise you will probably produce an insecure system.

Block

As the name implies, block ciphers operate on fixed length groups of bits called blocks. Examples: AES (Rijindal), DES, 3DES, Blowfish, Twofish, RC2, RC5, RC6, Camellia, IDEA, Serpent, MARS Typical block sizes are 64 and 128 bits. Recommendation: Use AES with 256 bit keys. It's fast, and many CPUs now support it explicitly making it even faster. This is what you want to use for encrypting bulk data. Modes AES operates on 128 bit blocks. That's only 16 bytes. That's not really useful. So we have modes of operation which enable block ciphers to work more like stream ciphers. We need to split up the data into 16 byte blocks, padding the last one if necessary. Now we could encrypt each block with the same key. This scheme is called Electronic Codebook or ECB. It's insecure. It doesn't matter if we use a really secure cipher such as AES. The result is just not good. The problem is cleartext blocks that are the same will result in the same ciphertext.

Here's what a bitmap image looks like: Remember this next time someone tells you that your data is secure because it uses AES-256. It matters how AES-256 is used. Note the similarity between this and the problem with hashing passwords. If the output of any crypto operation retains any kind of pattern that can be correlated with the data you're trying to protect, it spells trouble. There's a whole alphabet soup of block cipher modes. Most of them I don't understand. I'm only going to talk about a couple of them. The first one is called Cipher Block Chaining, or CBC. What happens here is the first block is XORed with a random value called an initialization vector, and then it is encrypted. The second block is XORed with the ciphertext from the first block then is encrypted. This repeats for the rest of the blocks. The last block needs to be padded, adds some complexity. The next one is called Counter Mode, or CTR. For some reason block cipher modes must have shorthand code names like airports or stock ticker symbols. Counter mode turns the block cipher into a stream cipher by creating a unique key to encrypt each block. It does so by taking a random value called a nonce, which is no different than the initialization vector I just mentioned, and appending an increment by one counter to it. This value is then encrypted, and the ciphertext from that operation is used as the key to encrypt the plaintext. CTR mode doesn't require any padding, and is fully parallelizable, while CBC mode is not. So when you're looking for a solution to encrypt a whole file or arbitrary length message with a secret key, use AES-256 in CTR mode. If you're looking for something to encrypt a filesystem then you definitely do not want to use CBC or CTR mode though. There's a whole other set of issues that comes into play for encrypted filesystems. In that case you're going to want to use XTS mode. That stands for XOR-encrypt-XOR-based tweaked-codebook mode with ciphertext stealing. It's widely supported by various disk encryption packages. You're probably not going to deploy filesystem encryption in your web applications because your server wouldn't be able to start automatically. The only reason I'm bringing it up is because I must point out that neither AES-CBC nor AES-CTR ensures the encrypted data you're trying to decrypt isn't tampered with. And you really must avoid trying to decrypt data given to you by an attacker. I'm going to talk about why when we get to message authentication codes.

Public key crypto

Public key crypto, also known as asymmetric encryption uses different keys for encryption and decryption. It can also be used to ensure message integrity and authenticity. Unlike private key crypto keys can be transmitted over insecure channels. This is not without problems. Making sure we get the key belonging to the party we want to talk to rather than an imposter's key is easier said than done. Describing the solutions to those problems and the new problems they in turn create, will take more time than I have. This is a big and complex topic, so I'm just going to cut to some recommendations and talk about use cases. Use RSA. Use 4096 bit keys. In RSA (and all other public key algorithms) the encryption key is public and the decryption key is private. This makes RSA perfect for automatic encryption. For example, let's say you're processing credit cards on your web site and you would like to store a copy of the card details in case you need to access them later. What you can do is generate a key pair offline. Put a copy of the public key on your web server. Have your web server encrypt the card details with the public key and store the ciphertext in your database. When you need to access the encrypted records you copy them to the machine that has the private key and decrypt them there. Do not generate the key pair on the web server and then remove the private key. Do not copy the private key to the web server and perform decryption there. The private key should never, ever be on the server. Putting it there needlessly exposes it to compromise. This practice has saved me once before. Seeing logs of my compromised web server on pastebin was not a good feeling, but I was so glad to know none of the credit cards leaked. RSA requires the use of a padding scheme, which has been a source of problems in the past. You must also never use the same key that you use to encrypt to sign. I highly recommend avoiding the risk of implementing this stuff at a low level. Instead, just invoke GnuPG to do all the dirty work. There are plenty of wrappers available in various languages. If you're planning on doing anything other than securely logging data with public key algorithms you should almost certainly involve professional cryptographers in the design and implementation so you don't shoot yourself in the foot.

Digital Signatures

Digital Signatures are how we provide authentication. Has this message been tampered with? They also provide authenticity. Where did this message come from? Public The first kind is asymmetric authentication. It's asymmetric encryption in reverse. In this case the private key is used for signing and the public key for verification of signatures. The private key is used to transform some plaintext into some ciphertext. Both the plaintext and the ciphertext are the signature. The public key is used to attempt to transform the ciphertext back into the plaintext. If successful, then the signature is verified. If not, then the signature is invalid. In practice the public key algorithm is only used to sign a hash of the actual data you want to sign.

Again, Use RSA with SHA-256.

Private

The second kind I want to talk about is symmetric authentication, or Message Authentication Codes. Message authentication codes require both sender and receiver to have a shared secret. Because both parties have the shared secret, either of them can sign a message. With RSA only one party has the private key so it's possible to determine exactly who signed a message. When used by only two parties to exchange messages this doesn't really matter. MACs are significantly faster than RSA signatures, both to sign and to verify. They can be constructed from block ciphers or hash functions. MACs based on hash functions are called Hash Based Message Authentication Codes, or HMACs. Use HMAC-SHA256. HMACs are guaranteed not to reveal any information about the plaintext as long as the underlying hash function is secure. It's critical that you use HMAC to verify every message you receive prior to decryption and that you HMAC every message you encrypt prior to transmission. Following this rule will keep invalid messages out of your code. Of course, this doesn't absolve of your responsibility perform input validation on all cleartext you receive. It prevents malleability attacks. Random numbers Without good random numbers you're screwed. You need them to generate keys, setup IVs and so forth. Don't use the time or process IDs or anything like that. Those aren't good enough. Don't use anything which relies on the rand() or srand() functions either. There are cryptographically secure PRNGs out there, with names like Yarrow and Fortuna, but I wouldn't advise touching any of that stuff. Just read from /dev/urandom. You can use a block cipher in counter mode to generate more, or you can just keep reading from /dev/urandom.

Written By

Thumb 963926 10152263078340288 893249867 o  1

Twitter
Linkedin
Google plus

Edit