Badly implementing encryption: Part IV–keyed hash function

I’m trying to not get too deep into the theory of encryption. I’m happy to say that so far I was able to avoid any math whatsoever and hopefully this is an interesting series. I do have to touch on an important topic.

I’m using MD5 here for the purpose of generating a random bitstream to be used as a stream cypher. In the previous post, we looked into a key issue. If we don’t do things properly, we can easily get to the point where a single 16 bytes block that we guess can allow us to decrypt the entire message.

I “solved” that in the previous post by adding the key back again into the MD5 computation. That works, but it isn’t ideal. There are all sorts of subtle issues that you have to take into account (length extensions, for example) and probably other stuff that I’m not even aware of. There is the HMAC family of functions. That is a keyed hash function that has far stronger security properties. Wikipedia does a great job explaining it. Note that there is a cost, HMAC is more expensive than the underlying hash function it uses.

The best practical explanation, by the way, I found here. Our previous method of adding a key to the mix was to concatenate it in front of the message. The problem is that if md5(msgA) == md5(msgB), then md5(key || msgA) == md5(key || msgB). And that isn’t something that we want. I (personally) can’t think of a way to abuse that property to get something nasty going on with the way we use it in this encryption algorithm, but I’m very much not an expert. The HMAC model, on the other hand, would use: md5( key1 || md5(key2 || msgA)). And there is no way to get that to collide, even if the messages generate collisions for MD5.

Let’s see what we need to do to switch to HMAC-MD5 instead of MD5. Here is what the code now looks like:

There are a few things that are worth noting here. We changed the size of the key (since HMAC-MD5 uses 32 bytes, not 16 bytes). Again, I have no comment on the actual security of such a scheme, mostly because I wouldn’t know where to even begin doing this analysis.

We are also no longer using the previous block as the input to the next block. Instead, we use a counter mode, where we hash the nonce and an ever incrementing counter using the provided key. That gives us some additional safety from the previously seen issue where we could figure out what the rest of the key stream would be.

Of course, better than before doesn’t mean that this is actually good. There are several other problems that I (as a non cryptographer) still need to fix, and probably more that I’m not seeing.

Another aspect of using this sort of construction is the additional cost that is involved. The HMAC computation isn’t that much more expensive. Looking at some benchmarks, this is about 3% slower, which is quite reasonable.

Next, we are going to see how we can abuse the malleability of this encryption system for fun and nefarious people’s profit.

RavenDB

RavenDB Cloud

Try

Experience interactive demos and playground server

RavenDB Docs

RavenDB Cloud Docs

Documentation Guide

Download

Features

Performance

Comparison

What’s New

Demo

Bootcamp

Webinars

Workshops

Inside RavenDB Book

GitHub

StackOverflow

Articles

Whitepapers

Events

Promotional Materials

Unlock your business potential

Use Cases

Articles

Whitepapers

Press Releases

Industry Reports

Performance

Comparison

Proof of Concept Program

Academic Program

Events

What’s New

Roadmap

On-premise Pricing

Cloud Pricing

Support

Proof of Concept Program

Academic Program

Badly implementing encryption: Part IV–keyed hash function

Woah, already finished? 🤯

Related Articles

CollabTalk Podcast | Episode 123 with Oren Eini–Building a business with Open Source foundations

RavenDB’s storage engine: Voron–unlocking the secret

Certificates from the Ground Up

Watch Live Demo

RavenDB

RavenDB Cloud

Try

RavenDB Docs

RavenDB Cloud Docs

Documentation Guide

Download

Features

Performance

Comparison

What’s New

Demo

Bootcamp

Webinars

Workshops

Inside RavenDB Book

GitHub

StackOverflow

Articles

Whitepapers

Events

Promotional Materials

Use Cases

Articles

Whitepapers

Press Releases

Industry Reports

Performance

Comparison

Proof of Concept Program

Academic Program

Events