Application Security in .NET Succinctly®
by Stan Drapkin


CHAPTER 2

Hashes and MACs


Hash functions (hashes) map values from domain A to values from domain B. Domain A is typically a large, variable-length domain, while domain B is typically a much smaller, fixed-length domain. B can often be a subdomain of A (for example, a “string-to-string” mapping). Hash functions have various important properties which you need to be aware of to be able to judge the suitability of any given hash for a particular purpose.

  • Determinism: A deterministic hash function must consistently return the same domain B value given the same domain A value. Determinism is a property that you almost always want, with the exception of few special cases that we will cover later.
  • Uniformity: A hash function with good uniformity characteristics is expected to produce domain B values that are uniformly spread among all possible domain B values; any domain B value should have an equal probability of being returned.
  • Perfection: A perfect (injective) hash function maps each domain A value to a different domain B value. Perfection quality minimizes the number of domain B collisions.

Cryptographic hash functions have the following important properties:

  • First-preimage resistance: Given any domain B value, it is computationally infeasible to find a domain A value that produced it. (Preimage is a simple mathematical concept.)
  • Second-preimage resistance: Given any domain A value, it is computationally infeasible to find another domain A value that produces the same domain B value.
  • Collision resistance: It is computationally infeasible to find any two domain A values that map to the same domain B value. Collision resistance implies second-preimage resistance, but does not imply first-preimage resistance.

Object.GetHashCode

Most .NET developers have to choose and implement a hash function when overriding the Object.GetHashCode() method. The only requirement for GetHashCode() is determinism, with “sameness” of domain A values handled by the Object.Equals() method. Uniformity is not required, but is recommended to improve performance (and, as we will see later, security). Most .NET Framework types have GetHashCode() implementations that are fast and reasonably uniform for general use. Many trivial input scenarios, however, can cause a serious uniformity degradation of built-in .NET GetHashCode() implementations. For example, generating 10 million GUID strings with Guid.NewGuid().ToString() leads to about 11.5k GetHashCode() collisions (0.1%). Doing the same test for the GUID string concatenated with itself leads to about 750k GetHashCode() collisions (7.5%)—an unexpected 75x increase in the number of collisions. One MSDN blog called it a “curious property.”
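The collision test above is easy to reproduce at a smaller scale. The sketch below (a hypothetical CountCollisions helper, not from the book) counts GetHashCode() collisions over GUID strings and over GUID strings concatenated with themselves; exact counts vary by runtime and scale, and randomized string hashing (covered later) changes the picture entirely:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    public static int CountCollisions(int n, Func<int, string> makeInput)
    {
        var seen = new HashSet<int>();
        int collisions = 0;
        for (int i = 0; i < n; ++i)
            if (!seen.Add(makeInput(i).GetHashCode())) // Add returns false on a repeat hash
                ++collisions;
        return collisions;
    }

    static void Main()
    {
        // Far smaller than the book's 10-million-GUID test; counts are illustrative only.
        int plain = CountCollisions(1000 * 1000, _ => Guid.NewGuid().ToString());
        int doubled = CountCollisions(1000 * 1000, _ =>
        {
            string s = Guid.NewGuid().ToString();
            return s + s; // the "curious property" input: GUID string doubled
        });
        Console.WriteLine($"plain: {plain}, doubled: {doubled}");
    }
}
```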

You might want to use a different GetHashCode() implementation in order to avoid unexpected surprises of default implementations. Most .NET containers that rely heavily on uniqueness characteristics of GetHashCode() can also take an IEqualityComparer<T> instance as part of their constructor (Dictionary, HashSet, SortedSet, KeyedCollection, ConcurrentDictionary, and others, as well as most LINQ operations). Providing a custom implementation of IEqualityComparer<T> is your chance to override the Equals() and GetHashCode() methods with your custom logic. The next important consideration is what to override GetHashCode() with, so let’s cover a few good options you should be aware of.
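As a sketch of that approach, here is a hypothetical Fnv1aStringComparer (the name and UTF-8 choice are assumptions, not from the book) that keeps ordinal equality but substitutes its own hash logic, mirroring the FNV-1a body shown in Code Listing 2:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Ordinal string equality with a custom hash: containers that accept an
// IEqualityComparer<T> will use this instead of string.GetHashCode().
sealed class Fnv1aStringComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) =>
        string.Equals(x, y, StringComparison.Ordinal);

    public int GetHashCode(string s)
    {
        const uint fnv32Offset = 2166136261;
        const uint fnv32Prime = 16777619;
        byte[] bytes = Encoding.UTF8.GetBytes(s); // UTF-8 is an assumption; pick your encoding
        unchecked
        {
            uint hash = fnv32Offset;
            for (int i = 0; i < bytes.Length; ++i)
                hash = (hash ^ bytes[i]) * fnv32Prime;
            return (int)hash;
        }
    }
}

class Program
{
    static void Main()
    {
        // The container picks up the custom hash via its constructor.
        var dict = new Dictionary<string, int>(new Fnv1aStringComparer());
        dict["alpha"] = 1;
        Console.WriteLine(dict["alpha"]); // prints 1
    }
}
```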

FNV-1a hash

FNV-1a is a widely used non-cryptographic hash function that is fast, simple to implement, and exhibits low collision rates as well as good avalanche effect behavior (after a small modification). Here is a 32-bit FNV-1a implementation with an additional post-processing step for better avalanche behavior (based on tests by Bret Mulvey):

Code Listing 2

public static int GetHashCodeFNV1a(byte[] bytes)
{
      const uint fnv32Offset = 2166136261;
      const uint fnv32Prime = 16777619;
      
      unchecked 
      {
            uint hash = fnv32Offset;
            for (var i = 0; i < bytes.Length; ++i)
                  hash = (uint)(hash ^ bytes[i]) * fnv32Prime;
      
            hash += hash << 13;
            hash ^= hash >> 7;
            hash += hash << 3;
            hash ^= hash >> 17;
            hash += hash << 5;
            
            return (int)hash;
      }
}

Hash flooding

Hash flooding attacks are a form of algorithmic complexity attack. Hash functions are commonly used in hash table implementations, which have O(1) lookup and insertion on average but O(n) in the worst case, which is achieved when all inputs hash to the same value. An insert or lookup operation over n inputs can therefore degenerate into an O(n²) situation if worst-case behavior could somehow be triggered. Any hash function that has poor second-preimage resistance is thus vulnerable to hash flooding.
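The degenerate case is easy to simulate without any cryptanalysis. The sketch below (a hypothetical CollidingComparer, not from the book) forces every key into the same bucket, so each insert walks the entire collision chain — quadratic work in total, versus near-instant inserts with the default comparer:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Simulates attacker-chosen inputs: every key "hashes" to the same value.
sealed class CollidingComparer : IEqualityComparer<int>
{
    public bool Equals(int x, int y) => x == y;
    public int GetHashCode(int x) => 0; // worst case: all inputs collide
}

class Program
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        var flooded = new Dictionary<int, int>(new CollidingComparer());
        for (int i = 0; i < 20000; ++i) flooded[i] = i; // each insert scans the chain
        Console.WriteLine($"flooded: {sw.Elapsed}");

        sw.Restart();
        var normal = new Dictionary<int, int>();
        for (int i = 0; i < 20000; ++i) normal[i] = i;   // O(1) average per insert
        Console.WriteLine($"default: {sw.Elapsed}");
    }
}
```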

Algorithmic complexity attacks and, specifically, hash flooding attacks are old attacks that have lately come back with a vengeance due to widespread use of vulnerable default hash function implementations in many popular development frameworks (Python, Ruby, .NET, Java, etc.)—low-hanging fruit for denial-of-service attacks. Neither the FNV-1a hash, nor the default .NET hashes are second-preimage resistant, meaning they are all vulnerable to hash flooding.

Two defenses against hash flooding attacks are typically employed: entropy injection and switching to a second-preimage-resistant hash function.

Entropy injection is injecting some variability or randomness into an existing non-cryptographic hash function algorithm to make it more difficult to exploit the weak second-preimage resistance of the algorithm itself, as well as to prevent scalability of an attack.

Entropy injection is very similar in concept to the use of salt. It merely prevents attack scalability, and, perhaps, buys you some time, but does not address the weakness itself. The proper defense is switching to a second-preimage-resistant hash function, which can also be complemented with entropy injection.

The biggest challenge with using a proper second-preimage-resistant hash function for general-purpose 32-bit hashing is that most of the available well-known functions are a lot slower than the fast non-cryptographic hash functions. Part of the reason is that cryptographically strong hash functions are designed to provide first-preimage, second-preimage, and collision resistance, while we only care about second-preimage resistance for the purposes of hash flood protection. Slow performance of cryptographic hash functions (compared to the non-cryptographic ones) is likely the main reason for the default hash function choices of most popular development frameworks.

Microsoft has added a new .NET 4.5 configuration element, UseRandomizedStringHashAlgorithm, which changes the behavior of string.GetHashCode(). When the UseRandomizedStringHashAlgorithm element is enabled, string.GetHashCode() calls into a new, private, external (unmanaged) method “int InternalMarvin32HashString(string s, int sLen, long additionalEntropy)” and returns whatever that new method returns. Microsoft offers no insight whatsoever about design or implementation (or even existence) of this method. We can only hope that this information void is temporary (intended, perhaps, to buy Microsoft some time to produce an implementation that might actually survive peer review). As it stands, though, “Marvin32” has to be treated as security by obscurity.

At this point, however, it is very unlikely that the Marvin32 algorithm makes the already weak default .NET string-hashing logic any weaker than it already is. In fact, it is apparent—both from MSDN documentation of the new config element and from the Marvin32 signature and the invoking code calling it—that it uses AppDomain-specific entropy injection (a 64-bit static CSP-generated value), which can only be a good thing.

Other changes due to UseRandomizedStringHashAlgorithm include internal switching of Dictionary, HashSet, and Hashtable comparers to new, internal, entropy-injected RandomizedObjectEqualityComparer or RandomizedStringEqualityComparer (which internally call Marvin32) when the number of loop iterations during element insertion exceeds 100. This defensive switch will only be triggered if the starting comparer is a well-known (default) comparer; custom-provided comparers do not trigger it.

While we believe that enabling UseRandomizedStringHashAlgorithm is still a good idea (given the alternative weak default logic), these new Marvin32-related APIs are built as an internal .NET defense, which you cannot control or consume directly. The only direct access is via string.GetHashCode(), which will return new, Marvin32-based hash values when enabled in the config. This defense is turned off when you have to provide your own custom comparer, however. There are many other common scenarios when you need proper control over the hash code generating logic, none of which can benefit from the supposed benefits of Microsoft’s new Marvin32 hash function.

FNV-1a with entropy

You might want to inject some entropy into FNV-1a to make it a bit harder to exploit via hash flooding attacks. One simple way to do that is to replace the retrieval of a single byte of input within the loop, data[i], with a permutation table lookup permutationTable[data[i]], which could be constructed as follows.

Code Listing 3

static byte[] permutationTable = new byte[byte.MaxValue + 1];
...
for (int i = 0; i < permutationTable.Length; ++i) permutationTable[i] = (byte)i;
permutationTable = permutationTable.OrderBy(
b => Guid.NewGuid()
).ToArray();

permutationTable is a randomly shuffled one-to-one (bijective) mapping of every byte to some other byte. You could do it with full rigor by using CSP for randomization and Fisher-Yates for shuffling, but even a simple sort-by-GUID approach will produce a uniform shuffling with acceptable unpredictability. The FNV-1a algorithm itself does not change—only the input bytes change as they go through the permutationTable before being consumed by FNV-1a. permutationTable would be calculated once (for example, in a static constructor) and would retain its mapping over the life of AppDomain.

This approach will allow you to keep using FNV-1a while raising the difficulty barrier for mounting a successful hash flood attack against it. Naïve brute-forcing to guess the permutation is infeasible due to the 1684-bit permutation space [log2(256!)], so an attacker would have to do differential cryptanalysis.
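Putting the two pieces together, a self-contained sketch of the entropy-injected FNV-1a might look like this (the Fnv1aRandomized name is an assumption; note the table must cover all 256 byte values to be a bijection):

```csharp
using System;
using System.Linq;

static class Fnv1aRandomized
{
    // Randomly shuffled bijective byte-to-byte mapping, computed once and
    // retained for the life of the AppDomain.
    static readonly byte[] permutationTable = Enumerable.Range(0, 256)
        .Select(i => (byte)i)
        .OrderBy(b => Guid.NewGuid()) // simple sort-by-GUID shuffle, per the text
        .ToArray();

    public static int GetHashCodeFNV1a(byte[] bytes)
    {
        const uint fnv32Offset = 2166136261;
        const uint fnv32Prime = 16777619;
        unchecked
        {
            uint hash = fnv32Offset;
            // The only change from plain FNV-1a: input bytes go through the table.
            for (int i = 0; i < bytes.Length; ++i)
                hash = (hash ^ permutationTable[bytes[i]]) * fnv32Prime;

            hash += hash << 13;
            hash ^= hash >> 7;
            hash += hash << 3;
            hash ^= hash >> 17;
            hash += hash << 5;
            return (int)hash;
        }
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Fnv1aRandomized.GetHashCodeFNV1a(new byte[] { 1, 2, 3 }));
    }
}
```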

SHA-1 hash

Despite SHA-1 being cryptographically retired (collision attacks significantly faster than brute force are known), there are still compelling reasons to consider using it for certain purposes:

  • SHA-1 is still practically secure with respect to both first-preimage resistance (no practical preimage attacks have been published) and second-preimage resistance, which is the only property we care about for hash flooding defense. In fact, NIST’s own SP800-131A guidance declares SHA-1 to be acceptable indefinitely for all non-signature applications.
  • SHA-1 is the fastest SHA-family hash function, and, along with MD5, is probably one of the fastest among all other “standardized” cryptographic hash functions. In our non-scientific tests, SHA1Cng was about four to ten times slower than FNV-1a, depending on input length, which might be acceptable in your scenario (FNV-1a is very fast). You might also be able to cache the calculated SHA-1 hash in scenarios where your input is either immutable (strings, anonymous types) or changes infrequently.
  • SHA-1 implementation is available in the .NET Framework.

These reasons make us suggest SHA1Cng as an alternative general-purpose 32-bit hash function able to resist hash flood attacks. SHA-1 output is 160 bits, that is, 20 bytes. You can take any four of those bytes and convert them to an Int32. If you cannot overcome regulations, policies, or any other stigma associated with the use of SHA-1, you might as well use SHA512Cng. You can also substitute SHA-1 with HMAC-SHA-1 (covered later), which will be twice as slow as SHA-1 and on par with SHA-512, which is about twice as fast as SHA-1 in 64-bit environments.
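A minimal sketch of that truncation (the Sha1HashCode name is an assumption; SHA1.Create() is used here for portability, while the chapter's recommendation on .NET Framework is SHA1Cng):

```csharp
using System;
using System.Security.Cryptography;

static class Sha1HashCode
{
    // Derives a 32-bit hash code from the first four bytes of SHA-1's
    // 20-byte digest; any four bytes would do equally well.
    public static int GetHashCodeSha1(byte[] bytes)
    {
        using (var sha1 = SHA1.Create()) // prefer new SHA1Cng() on .NET Framework
        {
            byte[] digest = sha1.ComputeHash(bytes); // 20 bytes
            return BitConverter.ToInt32(digest, 0);
        }
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Sha1HashCode.GetHashCodeSha1(new byte[] { 1, 2, 3 }));
    }
}
```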

SipHash

You might be facing scenarios when neither compromising hash function security at the expense of performance (the status quo) nor compromising performance at the expense of proper security is acceptable. Indeed, why should it be acceptable? Why can’t cryptography experts come up with a design that is fast, relatively simple, and non-collision-resistant yet still second-preimage resistant? One such design is SipHash.

SipHash achieves speeds comparable to modern state-of-the-art non-cryptographic functions, yet does not compromise on security. It can even run as a message authentication code (MAC), with some injected entropy acting as a key, for example. Among the downsides of using SipHash is the fact that it is relatively new, and thus has not received as much scrutiny as older designs. A design needs to become popular in order to attract a lot of scrutiny, which is also a prerequisite for the design to become popular—it takes time to break out of this cycle. Another downside is that SipHash is not built into .NET, and you would be taking a risk trusting any implementation you can find out there by Googling “SipHash c#”, or by implementing it yourself. You would also be taking this risk with a custom FNV-1a implementation. Unlike FNV-1a, however, SipHash cannot be implemented in five lines of code.

Even though there is a lot to like about SipHash, we would recommend against using it for now due to a lack of trusted, time-tested, and maintained implementation in the .NET Framework. We sincerely hope that SipHash is as good as it claims to be, and that tested and vetted .NET implementations will come with time—perhaps even as part of .NET Framework (Perl, Ruby, Rust, and Haskell have already switched to SipHash). In the meantime, you should only consider SipHash if you must have your cake (hashing speed) and eat it too (second-preimage resistance).

SipHash is not the only lightweight cryptographic hash function design—there are other, arguably better options, such as BLAKE2 (also from 2012). They are all too new, and too understudied—user beware.

Modern non-cryptographic hashes

The MurmurHash family, SpookyHash, and CityHash are some examples of modern non-cryptographic hash functions. They are all fast, and are likely closer to perfection than FNV-1a due to extra complexity. None of them are designed to resist hash flooding attacks, however (CVE-2012-6051, oCERT), and none of them have built-in .NET Framework implementations, which leaves little reason to consider any of them over SipHash or BLAKE2.

Cryptographic hashes

It is hard to justify anything other than the SHA-2 hash function family for scenarios requiring full-fledged cryptographic hash properties. SHA-2 is time-tested, standardized, NIST-approved, and implemented in .NET. SHA-3 should be treated as a “backup” for SHA-2 in case some major SHA-2 weakness is discovered, but should not be the primary choice. Even when SHA-3 implementation becomes part of the .NET Framework, SHA-2 should still be preferred since SHA-3 software performance is on par with or worse than SHA-2.

You should prefer SHA-512 over SHA-256 because SHA-512 is typically 30–50% faster than SHA-256 in software on 64-bit systems (your typical deployment environment) while being a lot slower than SHA-256 on dedicated hardware, optimized for brute-forcing. The fact that SHA-512 is more secure than SHA-256 due to extra digest length and complexity is not really material, since SHA-256 is secure enough. Many frameworks with built-in support for the SHA‑2 hash family (such as .NET 4.5) use SHA-256 by default. We recommend that you change these defaults to use SHA-512 in 64-bit environments (not for security reasons, but for performance reasons).

The best general-purpose SHA-2 family hash, however, is not SHA-512, but SHA-384, which is a truncated version of SHA-512 (i.e. has the same performance as SHA-512) computed with different initial values. The truncated nature of SHA-384 protects it from a length-extension vulnerability affecting all non-truncated SHA-2 hashes.

Tip: Use SHA-384 as your default general-purpose cryptographic hash.
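Following that tip is a one-liner. A small sketch (SHA384.Create() is used for portability; new SHA384Cng() is the equivalent on .NET Framework):

```csharp
using System;
using System.Security.Cryptography;

class Program
{
    static void Main()
    {
        byte[] data = new byte[1024]; // placeholder input
        using (var sha384 = SHA384.Create()) // or new SHA384Cng() on .NET Framework
        {
            // SHA-512 speed, 48-byte output, and no length-extension exposure.
            byte[] digest = sha384.ComputeHash(data);
            Console.WriteLine(digest.Length); // prints 48
        }
    }
}
```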

HMAC (including hash/HMAC factories)

Hash-based message authentication code (HMAC) is a specific form of MAC, which, like all MACs, is used for verifying message authenticity. It can be used with any cryptographic hash function, and requires no initialization vector (IV). HMAC has three inputs: (1) the cryptographic hash function to use, (2) the message, and (3) the key.
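Those three inputs map directly onto the .NET API: the hash function is fixed by the concrete class you pick, and the message and key are the arguments. A minimal sketch using HMAC-SHA-384 (the 48-byte key length is an assumption, matching the digest size):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class Program
{
    static void Main()
    {
        // Input 1: the hash function -- chosen via the concrete class (SHA-384 here).
        // Input 3: the key -- a random 48-byte key (digest-sized, a common choice).
        byte[] key = new byte[48];
        using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(key);

        // Input 2: the message to authenticate.
        byte[] message = Encoding.UTF8.GetBytes("message to authenticate");

        using (var hmac = new HMACSHA384(key))
        {
            byte[] tag = hmac.ComputeHash(message); // 48-byte authentication tag
            Console.WriteLine(Convert.ToBase64String(tag));
        }
    }
}
```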

The MAC security model allows the attacker to observe lots of MAC(m,k) results for a fixed secret key k and any message m, which the attacker can choose arbitrarily. The MAC is secure if the attacker cannot compute MAC(M,k) where M is a new message distinct from all previous m messages. Such a new message, if computed, would be a forgery.

The MAC security model is not meant for message confidentiality. Some MAC algorithms might leak quite a lot of information about the message, while other MAC algorithms, such as HMAC, tend to be quite effective at preserving message confidentiality, as well as key confidentiality. It is prudent, however, to use HMAC within the security model it was designed for. Therefore, you should not assume that HMAC (or any other MAC) will provide message confidentiality (but if you screw it up and happen to be using HMAC, it might save you).

HMAC was proven in 2006 (Bellare) to be secure even when the underlying cryptographic hash function is weakly collision-resistant (i.e. collisions in an n-bit hash function could be found with less than n/2-bit complexity), which might explain why no serious weaknesses have been found in HMAC-MD5 and HMAC-SHA1 constructions so far, even though the underlying MD5 and SHA-1 hash functions have serious weaknesses.

HMAC in .NET

.NET has had built-in HMAC support since .NET 2.0. There is a base abstract HMAC class with the core HMAC transformation logic, and several derived classes—HMACMD5, HMACRIPEMD160, HMACSHA1, HMACSHA256, HMACSHA384, and HMACSHA512—that implement a particular HMAC-hash combination. This is a tightly coupled design—if Microsoft wants to add a new hash function implementation (e.g., SHA-3), they will have to create two classes: NewHash and HMACNewHash. Adding another MAC implementation, NewMAC, would necessitate a multitude of additional classes for every NewMAC-hash combination. If you accept this design, you will be in the same situation as Microsoft.

A better design would have been to have a concrete HMAC class that takes any HashFunction factory as part of its constructor. Such a hash factory could be smart enough to return either a FIPS-compliant implementation (when needed), or a managed implementation. Alternatively, the HMAC constructor could have been accepting two HashFunction factories—one that produces FIPS-compliant implementations, and another that produces managed implementations—with all the FIPS-checking logic (which never changes) within the HMAC class itself rather than within the factory.
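The second variant described above can be sketched in a few lines. This is a hypothetical HashFactory, not a .NET API: it mirrors the internal fallback logic using the public CryptoConfig.AllowOnlyFipsAlgorithms check, with the two factories injected by the caller:

```csharp
using System;
using System.Security.Cryptography;

static class HashFactory
{
    // The FIPS-or-managed decision lives in one place; callers inject both factories.
    public static HashAlgorithm Create(
        Func<HashAlgorithm> managedFactory,
        Func<HashAlgorithm> fipsFactory)
    {
        return CryptoConfig.AllowOnlyFipsAlgorithms
            ? fipsFactory()
            : managedFactory();
    }
}

class Program
{
    static void Main()
    {
        // Usage sketch: on .NET Framework the factories might be
        // () => new SHA512Managed() and () => new SHA512Cng() respectively.
        using (var hash = HashFactory.Create(
            () => SHA512.Create(),
            () => SHA512.Create()))
        {
            Console.WriteLine(hash.ComputeHash(new byte[8]).Length); // prints 64
        }
    }
}
```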

This factory-based design is, in fact, already inside a private, internal static HMAC method:

Code Listing 4

internal static HashAlgorithm GetHashAlgorithmWithFipsFallback(
      Func<HashAlgorithm> createStandardHashAlgorithmCallback,
      Func<HashAlgorithm> createFipsHashAlgorithmCallback);

This method is called by each deriving HMAC-hash class, which provides its own factories. Unfortunately, this method is only accessible to .NET’s own HMAC-hash implementations; Microsoft does not want you to use it. What Microsoft is effectively doing here is forcing you to use an inheritance-based implementation injection instead of a constructor-based injection. That is okay—we will try to make the best of what is available by using the conveniently provided HMACSHA256 class. There is a problem, however. Run the following test in your .NET environment (we are using LINQPad):

Code Listing 5

void Test(Func<HashAlgorithm> hashFactory)
{
      var data = new byte[1024];
      int cnt = 1000 * 100;
      var sw = Stopwatch.StartNew();
      for (int i = 0; i < cnt; ++i)
            using (var hash = hashFactory())
            { hash.ComputeHash(data); }
      sw.Elapsed.Dump(hashFactory().GetType().Name);
}

Test(() => new SHA512Managed());
Test(() => new SHA512Cng());
Test(() => new SHA512CryptoServiceProvider());
Typical results are:

SHA512Managed:               00:00:02.4827152
SHA512Cng:                   00:00:00.9306346
SHA512CryptoServiceProvider: 00:00:04.8959034

As you can see, the CSP-based hash construction is about five times slower than the CNG-based construction, and about two times slower than managed. All factories in .NET-provided HMAC hash implementations follow the following logic: if FIPS is not required, use managed, otherwise use CSP (and not CNG). Microsoft is being conservative by ensuring that their implementations are guaranteed to work on all legacy platforms. You, however, probably do not want to pay a five-times penalty for FIPS deployments, because you do not plan to run your production environment on a legacy Windows OS. In fact, .NET 4.5 requires Vista/Windows 2008 or higher, so paying this “legacy tax” makes no sense. HMAC requires two hash function instantiations; Microsoft’s HMAC implementation refers to them as hash1 and hash2. That doubles the actual construction time, while preserving the four-times performance penalty over the managed API.

How significant is this performance penalty in practical scenarios? It is very significant. Many applications of HMAC operate on short message sizes, and require large numbers of quick HMAC calculations. The invocation cost of any HMAC hash in such scenarios will be significant, or, perhaps, even greater than the cost of message processing and result generation. In an extreme example of hashing an empty byte array, 100,000 iterations of SHA-512 take 0.2, 0.5, and four seconds for managed, CNG, and CSP implementations respectively. You might recall similar trade-offs in our discussion of random number generation. Unlike the RNGCryptoServiceProvider class, which is thread safe, HMAC and its derived implementations are not thread safe, and thus have to be constructed for every HMAC computation, amplifying any high invocation costs to a significant level.

In order to get acceptable HMAC performance in FIPS deployments, you have no choice but to make your own HMAC-derived implementations that use improved hash factory logic. The moment you try to provide a concrete implementation of the abstract HMAC class, you will realize that HMAC cannot be sub-classed by external implementations in any meaningful way, because key internal fields such as hash1 and hash2 are internal and not accessible to you. In other words, the public-base HMAC class is built by Microsoft for Microsoft. .NET Framework authors are effectively saying “if you don’t like our HMAC design, build your own.”

Another issue with .NET-provided HMAC-hash implementations is that their default (parameterless) constructor creates a random, CSP-generated 512-bit key for all implementations using 256-bit or lower hash functions (e.g., HMACSHA1, HMACSHA256), or a 1024-bit key for all implementations using hash functions above 256 bits (e.g., HMACSHA384, HMACSHA512). This auto-generated key is then passed to the internal InitializeKey() method, which sets up internal buffers and does various checks. The key auto-generation and the InitializeKey() call are both initialization activities that require computation and time.

A common HMAC usage scenario, however, is using it as a cryptographic primitive in other, more complex cryptographic primitives, such as PBKDF2 or HKDF (covered later). In these scenarios, the HMAC-consuming primitives need to provide their own key to HMAC hash implementations. There are two approaches to achieve that: either the HMAC is instantiated with the default constructor and then re-keyed to the desired key, or HMAC is instantiated with another constructor that takes the key directly. The first approach results in a waste of computation and time spent on auto-generating a strong default key in order to immediately discard it and re-initialize with the provided key. The second approach does not waste computation and time, but it tightly couples the creation of the HMAC-hash primitive to its keying. Such tight coupling is simply a bad design for reusable cryptographic primitives that are supposed to act as building blocks for other, more complex constructs.
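The two approaches look like this in code (a sketch; the zero-filled providedKey stands in for whatever key the consuming primitive supplies). Both end up keyed identically, but the first pays for a discarded auto-generated key, while the second couples the call site to the concrete HMACSHA512 type:

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

class Program
{
    static void Main()
    {
        byte[] providedKey = new byte[64]; // key supplied by a consuming primitive
        byte[] message = Encoding.UTF8.GetBytes("msg");

        // Approach 1: default ctor auto-generates a strong key, which is then
        // immediately discarded by re-keying -- wasted computation.
        var hmac1 = new HMACSHA512();
        hmac1.Key = providedKey; // setting Key re-runs key initialization

        // Approach 2: key-accepting ctor -- no wasted work, but the creation
        // and keying of the primitive are now tightly coupled.
        var hmac2 = new HMACSHA512(providedKey);

        bool same = hmac1.ComputeHash(message).SequenceEqual(hmac2.ComputeHash(message));
        Console.WriteLine(same); // both paths produce the same MAC: True
    }
}
```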

In order to generically consume specific method signatures, they need to be either a part of an interface, a base class, or a default constructor (C#). The key-accepting constructor on .NET-provided HMAC hash classes fits neither of these cases, and thus cannot be generically consumed (without tight coupling the consuming code to the specific HMAC hash implementation). Always generating a strong HMAC key in the default constructor is a “security-by-default” approach; Microsoft is being conservative, which would normally be a good idea. In this case, however, the default HMAC constructor is the only way to decouple HMAC-hash creation and keying.

An improved HMAC2 class implementation can be found in the Inferno crypto library. HMAC2 is a lot faster in FIPS and non-FIPS deployments, and is a little faster when consumed by other, more complex primitives that provide their own key. It also avoids tight coupling and reimplementation of the HMAC class.
