Key derivation function: key-hash based computational extractor and stream based pseudorandom expander
- Academic Editor
- Leandros Maglaras
- Subject Areas
- Cryptography, Security and Privacy
- Keywords
- Key derivation function, Keyed-hash message authentication code based computational extractor, Stream cipher based pseudorandom expander, Stream ciphers, Keyed-hash message authentication codes, Block ciphers
- Copyright
- © 2024 Chuah et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
- Cite this article
- 2024. Key derivation function: key-hash based computational extractor and stream based pseudorandom expander. PeerJ Computer Science 10:e2249 https://doi.org/10.7717/peerj-cs.2249
Abstract
The key derivation function is a cryptographic algorithm that transforms a private string and public strings into one or more cryptographic keys. These cryptographic keys are essential for protecting electronic data during transmission over the internet. The function is designed around a computational extractor and a pseudorandom expander and is typically constructed using cryptographic primitives such as stream ciphers, keyed-hash message authentication codes, and block ciphers. Secure and efficient key derivation function designs are essential in the development of numerous security systems, since a vulnerable key derivation function could give attackers the ability to compromise an otherwise secure cryptosystem. This research proposes a different approach that combines two different cryptographic primitives to build a key derivation function. The findings demonstrate that a computational extractor based on keyed-hash message authentication codes combined with a pseudorandom expander based on stream ciphers maintains the highest level of security among the schemes compared, while also providing efficiency benefits in terms of execution time compared with existing key derivation function schemes.
Introduction
Key derivation functions are commonly applied in protocols like Transport Layer Security (Rescorla, 2018) and Host Identity Protocol (Moskowitz et al., 2015), making it crucial to develop secure key derivation function proposals to protect electronic data transmitted over insecure channels (Italis, Pierre & Quintero, 2023; Muthakshi & Mahesh, 2024; Venčkauskas et al., 2024; Saif, Migliorini & Spoto, 2024). In recent years, key derivation function proposals have typically been structured in two phases, involving a computational extractor and a pseudorandom expander (Krawczyk, 2010; Barker, Chen & Davis, 2020; Chuah, Dawson & Simpson, 2013; Saran, 2024). This two-phase approach allows researchers to design and analyze the security of the extractor and expander components separately. The key derivation function is a cryptographic algorithm that takes a private string and public strings of arbitrary length and generates one or more pseudorandom cryptographic keys. The entropy of the private string determines the security of the key derivation function.
Existing key derivation function schemes include the keyed-hash message authentication code based key derivation function (Krawczyk, 2010), the block cipher based key derivation function (Barker, Chen & Davis, 2020), and the stream cipher based key derivation function (Chuah, Dawson & Simpson, 2013; Chuah & Koh, 2017). Keyed-hash message authentication codes and block ciphers produce a fixed-length output from a variable-length input, so the output must be truncated when the desired cryptographic key length is not an exact multiple of the output block size. This results in wasted bits. To address this issue, a key derivation function based on stream ciphers has been suggested, allowing cryptographic keys of any length to be generated without discarding excess bits. It should be noted that key derivation functions based on keyed-hash message authentication codes offer higher security than those based on stream ciphers. The security of a key derivation function depends on the underlying primitive: the keyed-hash message authentication code using SHA512 offers a brute force complexity of $2^{512}$ and a collision complexity of $2^{256}$, while Trivium has a brute force complexity of $2^{80}$ and a collision complexity of $2^{40}$.
To date, the computational extractor and the pseudorandom expander of a key derivation function scheme have been constructed using the same cryptographic primitive. This approach has limitations: block ciphers and keyed-hash message authentication codes can only produce cryptographic keys in fixed-length blocks and are slower, but offer better security. Stream ciphers, on the other hand, can generate cryptographic keys of arbitrary length and execute faster, but offer lower security.
Our contribution
In this article, we introduce a keyed-hash based computational extractor and a stream-based pseudorandom expander. We denote this design as HMSKDF. The cryptographic primitive used to construct the keyed-hash based computational extractor is the keyed-hash message authentication code that uses the secure hash algorithm with an output block size of 512 bits. For the stream-based pseudorandom expander, the cryptographic primitive is Trivium. The output of HMSKDF has arbitrary length. The research focuses on evaluating the security, software performance, and hardware performance of this alternative approach. Overall, the proposed HMSKDF advances the design of key derivation functions by offering better security for ensuring the pseudorandomness of the generated cryptographic keys and by eliminating bit wastage.
Organization of the article
We provide the theoretical definition of the key derivation function in ‘Key Derivation Function’. ‘The proposed HMSKDF Scheme’ presents the HMSKDF scheme, its security analysis, the software performance evaluation in terms of execution time, and the hardware performance. ‘Conclusion’ concludes the article.
Key Derivation Function
A key derivation function is a cryptographic function. It generates a cryptographic key K of arbitrary length n from a private string p, a public salt s and public context information c, as shown in Fig. 1.
Definition 1. A key derivation function is defined as (Krawczyk, 2010; Chuah, Dawson & Simpson, 2013; Saran, 2024):
$$K = \mathrm{KDF}(p, s, c, n) \tag{1}$$
- p is a private string chosen from the private string space β, such that p ∈ β, with length $\mathfrak{l}_p$ and probability distribution 𝔓.
- s is a public random string chosen from the salt space τ, such that s ∈ τ, with length $\mathfrak{l}_s$ and probability distribution 𝔖.
- c is a public context string chosen from the context space ω, such that c ∈ ω, with length $\mathfrak{l}_c$ and probability distribution ℭ.
- K is a pseudorandom cryptographic key.
- n is a positive integer, the length of K.
Definition 2. (Computational extractor) (Krawczyk, 2010). Let β be the space of private strings p with min-entropy m, and let τ be the space of public random strings s. A computational extractor is defined as:
$$PRK = \mathrm{Extractor}(p, s) \tag{2}$$
Figure 1: General key derivation function.
The computational extractor is known as an $(m, a_X, q_X, \varepsilon_X)$-min-entropy computational extractor if, for every probabilistic polynomial time adversary $a_X$ that makes $q_X$ queries to the Extractor, the probability that the adversary correctly decides whether a given string is the PRK derived from the Extractor or a random string of the same length is not larger than $\frac{1}{2} + \varepsilon_X$, where $\varepsilon_X$ is deemed negligible.
Definition 3. (Pseudorandom expander) (Krawczyk, 2010). Let ω be the space of public context strings c. A pseudorandom expander is defined as:
$$K = \mathrm{Expander}(PRK, c, n) \tag{3}$$
The pseudorandom expander is known as a $(a_Y, q_Y, \varepsilon_Y)$-pseudorandom expander if, for every probabilistic polynomial time adversary $a_Y$ that makes $q_Y$ queries to the Expander, the probability that the adversary correctly decides whether a given string is the cryptographic key derived from the Expander or a random string of the same length is not larger than $\frac{1}{2} + \varepsilon_Y$, where $\varepsilon_Y$ is deemed negligible.
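The two-phase structure described by Definitions 1 to 3 can be sketched as a simple function composition. This is a minimal illustration, not the authors' implementation: the names `make_kdf`, `toy_extractor` and `toy_expander` are hypothetical, and HMAC-SHA256 is used for both phases only as a stand-in so that the sketch runs with the Python standard library.

```python
import hashlib
import hmac
from typing import Callable

Extractor = Callable[[bytes, bytes], bytes]       # (p, s) -> PRK
Expander = Callable[[bytes, bytes, int], bytes]   # (PRK, c, n_bytes) -> K


def make_kdf(extractor: Extractor, expander: Expander):
    """Compose an extractor and an expander into K = KDF(p, s, c, n)."""
    def kdf(p: bytes, s: bytes, c: bytes, n_bytes: int) -> bytes:
        prk = extractor(p, s)              # phase 1: derive PRK from p and s
        return expander(prk, c, n_bytes)   # phase 2: stretch PRK into K
    return kdf


def toy_extractor(p: bytes, s: bytes) -> bytes:
    # Stand-in (assumption): the salt keys the MAC, p is the message.
    return hmac.new(s, p, hashlib.sha256).digest()


def toy_expander(prk: bytes, c: bytes, n_bytes: int) -> bytes:
    # Stand-in (assumption): counter-mode HMAC truncated to n_bytes.
    out = b""
    counter = 0
    while len(out) < n_bytes:
        block = hmac.new(prk, c + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:n_bytes]


kdf = make_kdf(toy_extractor, toy_expander)
key = kdf(b"private string", b"salt", b"context", 48)
```

Because the two phases only meet at PRK, either stand-in can be swapped for a different primitive without touching the other, which is the property the article relies on when it mixes primitives.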
CAM security model
The security objective of a key derivation function is to ensure that, when the private string p is unknown and the public strings s and c are known, the cryptographic key K generated by the key derivation function is indistinguishable from a truly random binary string of the same length. Koh & Chuah (2020) proposed a robust security model for key derivation functions, namely CAM, which serves as the formal security framework for evaluating their security. The CAM security model employs a modern cryptographic security proof technique involving an indistinguishability game between two players, the challenger and the adversary. The game is performed in polynomial time t. The adversary is able to influence all the inputs of the key derivation function in order to conduct a bit-flipping attack; its objective is to identify any weakness in the key derivation function within polynomial time.
Definition 4. (CAM-secure) (Koh & Chuah, 2020). A key derivation function is CAM-secure if every probabilistic polynomial time adversary that makes q queries () to the key derivation function, with chosen bit positions of the private string p, wins the following indistinguishability game with probability not greater than $\frac{1}{2} + \varepsilon$, where ɛ is deemed negligible.
1. Challenger chooses p ← β.
2. For i = 1, …, q′ ≤ q:
   - Adversary chooses the bit position $z_i$ of p.
   - Adversary chooses $s_i$ ← τ and $c_i$ ← ω.
   - Adversary sends $z_i$, $s_i$ and $c_i$ to the challenger.
   - Challenger generates $K_i$; $t_i$ is the time used to generate $K_i$.
   - Challenger sends $K_i$ and $t_i$ to the adversary.
3. Adversary chooses z, s ← τ and c ← ω, subject to the restriction that this challenge query differs from the earlier queries. Adversary sends z, s and c to the challenger.
4. Challenger chooses b randomly, b ∈ {0, 1}.
   - If b = 0, challenger generates K′ with the key derivation function; $t_0$ is the execution time used to generate the cryptographic key K′.
   - Else challenger generates a random K′ of the same length; $t_1$ is the random time used to generate the random K′.
   - Challenger sends K′ to the adversary.
5. Adversary continues with the remaining q − q′ queries, following step 2 and subject to the same restriction.
6. Adversary outputs b′ and wins the game if b′ = b.
   - If the adversary outputs b′ = 0, the adversary believes K′ is a cryptographic key K derived from the key derivation function.
   - Else the adversary outputs b′ = 1.
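The challenge phase of the game above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the CAM model itself: the learning queries and the timing values are omitted, the bit position z is interpreted as a bit of p that is flipped before derivation, and all names (`play_round`, `broken_kdf`, `zero_detector`, `hmac_kdf`) are hypothetical.

```python
import hashlib
import hmac
import secrets


def flip_bit(data: bytes, position: int) -> bytes:
    """Return a copy of data with the bit at the given position flipped."""
    buf = bytearray(data)
    buf[position // 8] ^= 1 << (position % 8)
    return bytes(buf)


def hmac_kdf(p: bytes, s: bytes, c: bytes, n_bytes: int) -> bytes:
    # Stand-in KDF (assumption): one HMAC-SHA512 call, so n_bytes <= 64.
    return hmac.new(s, p + c, hashlib.sha512).digest()[:n_bytes]


def broken_kdf(p: bytes, s: bytes, c: bytes, n_bytes: int) -> bytes:
    # Deliberately insecure: ignores every input and returns zero bytes.
    return bytes(n_bytes)


def zero_detector(challenge_key: bytes) -> int:
    # Adversary's guess b': 0 means "derived key", 1 means "random string".
    return 0 if challenge_key == bytes(len(challenge_key)) else 1


def play_round(kdf, adversary, n_bytes: int = 32) -> bool:
    """One simplified challenge round; True if the adversary wins."""
    p = secrets.token_bytes(16)                 # challenger chooses p
    z = secrets.randbelow(len(p) * 8)           # adversary picks a bit position
    s, c = secrets.token_bytes(8), b"context"   # adversary picks s and c
    b = secrets.randbelow(2)                    # challenger's hidden bit
    if b == 0:
        challenge = kdf(flip_bit(p, z), s, c, n_bytes)
    else:
        challenge = secrets.token_bytes(n_bytes)
    return adversary(challenge) == b


def win_rate(kdf, adversary, rounds: int = 200) -> float:
    return sum(play_round(kdf, adversary) for _ in range(rounds)) / rounds
```

Against `broken_kdf` the `zero_detector` adversary wins essentially every round, while against the HMAC stand-in it does no better than guessing, which is the behaviour the bound $\frac{1}{2} + \varepsilon$ is meant to capture.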
Hash-based message authentication code based key derivation function
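Krawczyk (2010) proposed the hash-based message authentication code based key derivation function (HKDF). The HKDF is constructed using a computational extractor and a pseudorandom expander, and it has been proven to be CAM-secure (Koh & Chuah, 2020). The computational extractor and pseudorandom expander of HKDF are defined as:
$$PRK = \mathfrak{F}\big((s \oplus opad) \parallel \mathfrak{F}((s \oplus ipad) \parallel p)\big) \tag{4}$$
$$K(i) = \mathfrak{F}\big((PRK \oplus opad) \parallel \mathfrak{F}((PRK \oplus ipad) \parallel K(i-1) \parallel c \parallel i)\big) \tag{5}$$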
- 𝔉 is a hash function, either secure hash algorithm 1 (SHA1) or secure hash algorithm 2 (SHA2).
- ⨁ is exclusive OR.
- ∥ is string concatenation.
- opad is the outer padding, formed by repeating the byte 0x5c.
- ipad is the inner padding, formed by repeating the byte 0x36.
- The length of s, $\mathfrak{l}_s$, must equal $\mathfrak{l}_{PRK}$. Therefore, s is hashed using 𝔉 if $\mathfrak{l}_s > \mathfrak{l}_{PRK}$, or s is padded with zeros until $\mathfrak{l}_s$ equals $\mathfrak{l}_{PRK}$ if $\mathfrak{l}_s < \mathfrak{l}_{PRK}$. $\mathfrak{l}_{PRK}$ is the same as the length of the hash digest (SHA1 or SHA2). The first n bits of the derived cryptographic key K = K(1)||…||K(𝔏 − 1) are used as cryptographic keys, while the remaining bits are discarded.
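The nested-hash structure with ipad and opad can be checked against Python's `hmac` module. The sketch below follows RFC 2104, in which the key is hashed or zero padded to the block size of the hash function; that detail comes from the RFC rather than from the list above, and the function name `hmac_manual` is hypothetical.

```python
import hashlib
import hmac


def hmac_manual(key: bytes, message: bytes, hash_name: str = "sha512") -> bytes:
    """HMAC built directly from ipad and opad (RFC 2104 structure)."""
    block_size = hashlib.new(hash_name).block_size   # 128 bytes for SHA-512
    if len(key) > block_size:
        key = hashlib.new(hash_name, key).digest()   # long keys are hashed first
    key = key.ljust(block_size, b"\x00")             # short keys are zero padded
    ipad = bytes(b ^ 0x36 for b in key)              # inner padding
    opad = bytes(b ^ 0x5C for b in key)              # outer padding
    inner = hashlib.new(hash_name, ipad + message).digest()
    return hashlib.new(hash_name, opad + inner).digest()


# The salt s plays the role of the key and p the role of the message.
prk = hmac_manual(b"salt", b"private string")
```

The result matches `hmac.new(b"salt", b"private string", hashlib.sha512).digest()`, so in practice the library call can be used in place of the manual construction.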
One benefit of HKDF is its ability to handle inputs of arbitrary length, but a drawback is that it generates fixed-length blocks and discards any excess bits, causing wastage.
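The fixed-block behaviour can be seen in a short sketch of the extract and expand steps. It follows the HKDF structure standardized in RFC 5869 with HMAC-SHA512 as the default; the function names are illustrative, and the one-byte block counter is the RFC's choice rather than something stated in this article.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, p: bytes, hash_name: str = "sha512") -> bytes:
    """Extract step: PRK = HMAC(salt, p)."""
    return hmac.new(salt, p, hash_name).digest()


def hkdf_expand(prk: bytes, c: bytes, n_bytes: int, hash_name: str = "sha512") -> bytes:
    """Expand step: chain HMAC blocks, then keep only the first n_bytes."""
    digest_len = hashlib.new(hash_name).digest_size
    blocks = -(-n_bytes // digest_len)          # ceiling division
    if blocks > 255:
        raise ValueError("requested output is too long for a one-byte counter")
    output, previous = b"", b""
    for i in range(1, blocks + 1):
        previous = hmac.new(prk, previous + c + bytes([i]), hash_name).digest()
        output += previous
    return output[:n_bytes]                     # excess bytes are discarded


prk = hkdf_extract(b"salt", b"private string")
k20 = hkdf_expand(prk, b"context", 20)
k100 = hkdf_expand(prk, b"context", 100)
```

With SHA-512, requesting 20 bytes still computes a full 64-byte block, so 44 bytes are thrown away; requesting 100 bytes computes two blocks and discards 28 bytes.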
Definition 5. (HMAC-PRF) (Hirose, 2019; Gaži, Pietrzak & Rybár, 2014; Naik & Singh, 2024). Consider a hash-based message authentication code 𝔉: 𝕂 × 𝔻 → ℝ with key 𝕜 ∈ 𝕂. A keyed function with an input length restriction, $G: \{0,1\}^c \times \{0,1\}^{b*} \to \{0,1\}^c$, is (ɛ, t, q, 𝔩)-HMAC-PRF secure if, for every probabilistic polynomial time adversary A running in time t and making at most q queries, each of length at most 𝔩 (in b-bit blocks), a uniformly random function $R: \{0,1\}^{b*} \to \{0,1\}^c$ and a uniformly random key $\Bbbk \leftarrow \{0,1\}^c$,
$$\left|\Pr\left[A^{G(\Bbbk,\cdot)} = 1\right] - \Pr\left[A^{R(\cdot)} = 1\right]\right| \le \varepsilon,$$
where ɛ is deemed negligible.
Stream cipher based key derivation function
Chuah, Dawson & Simpson (2013) proposed a stream cipher based key derivation function (SCKDF). The SCKDF has been proven not to be CAM-secure (Koh & Chuah, 2020). The computational extractor of SCKDF is defined as: (6)
The inputs of the pseudorandom keystream generator (𝔖) consist of a key and an initial vector. We denote the length of the key as $\mathfrak{l}_{sk}$ and the length of the initial vector as $\mathfrak{l}_{iv}$. These inputs are substituted with the key derivation function input pair p and s (Italis, Pierre & Quintero, 2023). If s is null, p is divided into blocks of length $\mathfrak{l}_{sk} + \mathfrak{l}_{iv}$. If s is not null, the length of s is suggested to be the same as $\mathfrak{l}_{sk}$ and p is divided into blocks of length $\mathfrak{l}_{iv}$. If $\mathfrak{l}_p$ is greater than $\mathfrak{l}_{iv}$, the length of the first block of p is $\mathfrak{l}_{iv}$ and the length of each remaining block of p is $\mathfrak{l}_{sk} + \mathfrak{l}_{iv}$. The pseudorandom keystream generator processes all blocks of p and then outputs PRK. The length of PRK, $\mathfrak{l}_{PRK}$, is equal to $\mathfrak{l}_{sk}$, the key length of the pseudorandom keystream generator used in the pseudorandom expander phase.
The pseudorandom expander of SCKDF is defined as: (7)
The inputs of the pseudorandom keystream generator (Gaži, Pietrzak & Rybár, 2014) are substituted with the key derivation function input pair PRK and c (Chuah, Dawson & Simpson, 2013). The public string c can be null or not null. If c is null, c is set to $0^{\mathfrak{l}_{iv}}$. If c is not null, c is divided into blocks of length $\mathfrak{l}_{iv}$. The pseudorandom keystream generator processes all blocks of c and then outputs the n-bit key K.
The design of SCKDF allows different types of pseudorandom keystream generator to be combined for the computational extractor and the pseudorandom expander (Chuah, Dawson & Simpson, 2013). The pseudorandom expander can generate a cryptographic key of arbitrary length without discarding any excess bits, thus enhancing efficiency (Canniere & Preneel, 2008). However, a pseudorandom keystream generator cannot accommodate inputs of arbitrary length, and the modification made to process the key derivation function inputs in the computational extractor phase is the reason the SCKDF is not CAM-secure.
Definition 6. (PKG) (Katz & Lindell, 2014). A pseudorandom keystream generator is considered to pass all polynomial time statistical tests if no polynomial time algorithm can distinguish between the output sequence of the generator and a truly random sequence of equal length with probability significantly greater than $\frac{1}{2}$.
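A single statistical test is one concrete example of a polynomial time distinguisher in the sense of Definition 6. The sketch below implements the frequency (monobit) test in the form described in NIST SP 800-22; it is only an illustration, the function name is hypothetical, and passing this one test says nothing about passing all tests.

```python
import math


def monobit_p_value(data: bytes) -> float:
    """Frequency (monobit) test: p-value for 'ones and zeros are balanced'."""
    n_bits = len(data) * 8
    ones = sum(bin(byte).count("1") for byte in data)
    # Map bits to +1/-1, sum them, and normalize by sqrt(n).
    s_obs = abs(2 * ones - n_bits) / math.sqrt(n_bits)
    return math.erfc(s_obs / math.sqrt(2))


balanced_but_periodic = monobit_p_value(b"\xaa" * 100)   # bits 1010...
all_zero = monobit_p_value(bytes(100))
```

The all-zero sequence is rejected with a p-value close to 0, while the strictly alternating sequence obtains the best possible p-value of 1.0 despite being trivially predictable. This is why the definition quantifies over every polynomial time algorithm rather than over one fixed test.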
Block cipher based key derivation function
NIST SP 800-56C specifies a cipher-based message authentication code based key derivation function (BKDF) (Barker, Chen & Davis, 2020), which consists of a computational extractor and a pseudorandom expander. The BKDF has been proven CAM-secure (Koh & Chuah, 2020). The computational extractor of BKDF is defined as: (8)
- 𝔅 is the advanced encryption standard with a key length of 128 bits, 192 bits or 256 bits.
- $D_i$ is the i-th block of p, where p is divided into blocks of 128 bits.
- s is used as the key for the advanced encryption standard.
- The initial $PRK_0$ is $0^{128}$, and $\mathfrak{l}_{PRK}$ is 128 bits.
-
.
The pseudorandom expander of BKDF is defined as: (9)
- 𝔅 is the advanced encryption standard with a key length of 128 bits.
- $M_i$ is the i-th block of c, where c is divided into blocks of 128 bits.
- PRK is used as the secret key for the advanced encryption standard.
- The initial K(0) is $0^{128}$.
-
,
If n is greater than 128, the iterations that generate K continue until the required length is reached or exceeded. K consists of the first n bits, while the remaining bits are discarded.
The benefit of BKDF is its ability to handle inputs of arbitrary length. As with HKDF, a drawback is that it produces fixed-length blocks and discards any excess bits, leading to inefficiency. Another limitation is that the pseudorandom expander is fixed to the advanced encryption standard with a key length of 128 bits.
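The number of discarded bits follows directly from the block size of the expander. The following sketch works through the arithmetic for the key lengths used later in the experiments (64 bytes and 192 bytes) and for one length that is not a multiple of any block size; the function name and the dictionary of block sizes are illustrative.

```python
import math

# Output block sizes in bits of the fixed-block expanders discussed above.
BLOCK_BITS = {"AES128": 128, "HMAC_SHA1": 160, "HMAC_SHA512": 512}


def discarded_bits(n_bits: int, block_bits: int) -> int:
    """Bits generated beyond the requested key length n_bits."""
    blocks = math.ceil(n_bits / block_bits)
    return blocks * block_bits - n_bits


for n_bits in (512, 1536, 200):
    waste = {name: discarded_bits(n_bits, size) for name, size in BLOCK_BITS.items()}
    print(n_bits, waste)
```

For a 512-bit key, HMAC_SHA1 generates four 160-bit blocks and discards 128 bits, whereas AES128 and HMAC_SHA512 divide the length exactly. For a 200-bit key every fixed-block expander discards bits, while a stream cipher expander would stop after exactly 200 keystream bits.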
The proposed HMSKDF Scheme
We formalize our keyed-hash based computational extractor (Eq. (4), satisfying Definition 5) and stream based pseudorandom expander (Eq. (7), satisfying Definition 6) as a relatively straightforward function, as shown in Fig. 2. The computational extractor is based on HMAC_SHA512 and takes inputs p and s of arbitrary length. The Trivium based pseudorandom expander generates an n-bit cryptographic key of arbitrary length.
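The overall data flow of such a scheme can be sketched as follows. This is a rough illustration under several assumptions, not the authors' implementation: the Trivium core follows the published state update, but the bit ordering used to load the key and IV is an assumed convention and the code has not been validated against official test vectors; the 512-bit PRK is truncated to the 80-bit Trivium key because the text does not state how the lengths are matched; and c is limited to 80 bits and zero padded into the IV. All function names are hypothetical.

```python
import hashlib
import hmac


def _to_bits(data: bytes) -> list:
    # ASSUMPTION: the most significant bit of each byte is loaded first.
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def _to_bytes(bits: list) -> bytes:
    # Pack bits MSB first; a final partial byte is padded with zero bits.
    padded = bits + [0] * (-len(bits) % 8)
    return bytes(
        int("".join(map(str, padded[i:i + 8])), 2) for i in range(0, len(padded), 8)
    )


def trivium_keystream(key: bytes, iv: bytes, n_bits: int) -> list:
    """Return n_bits keystream bits from an 80-bit key and an 80-bit IV."""
    if len(key) != 10 or len(iv) != 10:
        raise ValueError("Trivium takes an 80-bit key and an 80-bit IV")
    # 288-bit state: key || 13 zeros, IV || 4 zeros, 108 zeros || 1 1 1.
    s = _to_bits(key) + [0] * 13 + _to_bits(iv) + [0] * 4 + [0] * 108 + [1, 1, 1]
    out = []
    for step in range(4 * 288 + n_bits):       # 1152 warm-up rounds, then output
        t1 = s[65] ^ s[92]
        t2 = s[161] ^ s[176]
        t3 = s[242] ^ s[287]
        if step >= 4 * 288:
            out.append(t1 ^ t2 ^ t3)
        t1 ^= (s[90] & s[91]) ^ s[170]
        t2 ^= (s[174] & s[175]) ^ s[263]
        t3 ^= (s[285] & s[286]) ^ s[68]
        s = [t3] + s[0:92] + [t1] + s[93:176] + [t2] + s[177:287]
    return out


def hmskdf_sketch(p: bytes, s: bytes, c: bytes, n_bits: int) -> bytes:
    if len(c) > 10:
        raise ValueError("this sketch only supports a context of at most 80 bits")
    prk = hmac.new(s, p, hashlib.sha512).digest()   # extractor: 512-bit PRK
    key = prk[:10]                # ASSUMPTION: truncate PRK to the 80-bit key
    iv = c.ljust(10, b"\x00")     # ASSUMPTION: zero pad c into the 80-bit IV
    return _to_bytes(trivium_keystream(key, iv, n_bits))


k64 = hmskdf_sketch(b"private string", b"salt", b"ctx", 64)
k100 = hmskdf_sketch(b"private string", b"salt", b"ctx", 100)
```

The expander produces exactly the requested number of keystream bits, so a 100-bit key needs 100 clock cycles after initialization and nothing is generated only to be thrown away.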
Security analysis
This section provides a security analysis of general attacks that can be used against the different types of computational extractors, followed by a formal security proof for the proposed HMSKDF scheme.
• Brute force attack and collision attack
Even assuming that no weaknesses are present in the key derivation functions, they remain subject to brute force attacks and collision attacks against the internal state of the computational extractor. If the internal state of the computational extractor is compromised, it can be used to generate all of the cryptographic keys. Once the cryptographic keys are compromised, they can no longer be used to protect electronic data.
In a brute force attack, the adversary systematically generates all possible internal states of the computational extractor, trying every combination of bits until the correct one is found, which is then used to generate the cryptographic key. If the length of the internal state is $\mathfrak{l}_{is}$, the complexity of brute forcing the internal state is $2^{\mathfrak{l}_{is}}$.
Figure 2: The proposed HMSKDF Scheme.
In a collision attack, the adversary uses the birthday paradox to find two or more inputs to the key derivation function that generate the same cryptographic key. With an internal state of length $\mathfrak{l}_{is}$, the birthday paradox gives approximately a 50% chance of an internal state collision at $2^{\mathfrak{l}_{is}/2}$.
Table 1 shows the findings for the brute force attack and the collision attack against the different types of computational extractors. In general, extractors based on the keyed-hash message authentication code (HMAC), both HMAC_SHA1 and HMAC_SHA512, offer heightened security against brute force and collision attacks compared to extractors based on stream ciphers or block ciphers. The extractor with the lowest complexity is Trivium-based, with a complexity of $2^{80}$ for the brute force attack and $2^{40}$ for the collision attack. Conversely, the extractor with the highest complexity is HMAC_SHA512-based, with a complexity of $2^{512}$ for the brute force attack and $2^{256}$ for the collision attack. It can be inferred that the proposed key derivation function scheme (HMSKDF), with an HMAC_SHA512 based computational extractor and a Trivium based pseudorandom expander, offers a high level of security, with a complexity of $2^{512}$ for the brute force attack and $2^{256}$ for the collision attack against the internal state of the extractor.
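The figures in Table 1 follow from the state length alone, which the short calculation below reproduces. The state lengths are the ones implied by Table 1, and the collision probability uses the usual birthday approximation $1 - e^{-k^2/(2N)}$; the function names are illustrative.

```python
import math

# Internal state lengths in bits, as implied by Table 1.
STATE_BITS = {"AES": 128, "Rabbit": 128, "Trivium": 80,
              "HMAC_SHA1": 160, "HMAC_SHA512": 512}


def attack_exponents(state_bits: int) -> tuple:
    """Exponents (brute force, collision): 2^l and 2^(l/2)."""
    return state_bits, state_bits // 2


def collision_probability(samples: int, state_bits: int) -> float:
    """Birthday approximation of the chance of at least one collision."""
    return 1.0 - math.exp(-(samples * samples) / (2 * 2 ** state_bits))


for name, bits in STATE_BITS.items():
    brute, collision = attack_exponents(bits)
    chance = collision_probability(2 ** collision, bits)
    print(f"{name}: 2^{brute} brute force, 2^{collision} collision, p = {chance:.2f}")
```

Under this approximation the collision chance after exactly $2^{\mathfrak{l}_{is}/2}$ samples is about 39%, and it reaches 50% at roughly $1.18 \cdot 2^{\mathfrak{l}_{is}/2}$ samples, so the exponent $\mathfrak{l}_{is}/2$ is the right order of magnitude.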
• Formal security analysis
HMSKDF is a two-phase key derivation function scheme constructed using the HMAC_SHA512 based computational extractor and the Trivium based pseudorandom expander. As presented in ‘The proposed HMSKDF Scheme’, the HMAC_SHA512 based computational extractor and the Trivium based pseudorandom expander satisfy Definition 5 and Definition 6, respectively.
The following theorem establishes the security of HMSKDF in the CAM security model. In this model, the polynomial time adversary plays the CAM game and makes at most q queries. The adversary can win this indistinguishability game with probability not greater than $\frac{1}{2} + \varepsilon$, where ɛ is deemed negligible. We now provide the formal security proof of the HMSKDF scheme.
Theorem 1: Suppose that HMAC_SHA512 functions as an ideal pseudorandom function satisfying Definition 5 and that Trivium operates as an ideal pseudorandom keystream generator satisfying Definition 6. If HMSKDF is constructed using the HMAC_SHA512 based computational extractor and the Trivium based pseudorandom expander, then the HMSKDF scheme is $(m, \min\{q_X, q_Y\}, \min\{t_X, t_Y\}, \varepsilon_X + \varepsilon_Y)$-CAM secure with respect to the private string p with entropy m.
Proof: To meet the requirements stated in Theorem 1, we must demonstrate that: (a) the HMAC_SHA512 based computational extractor is an $(m, q_X, t_X, \varepsilon_X)$-computational extractor; (b) the Trivium based pseudorandom expander is a $(q_Y, t_Y, \varepsilon_Y)$-pseudorandom expander.
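Table 1: Complexity of brute force and collision attacks on different computational extractors.
Computational extractor | Brute force | Collision |
---|---|---|
AES based extractor | $2^{128}$ | $2^{64}$ |
Rabbit based extractor | $2^{128}$ | $2^{64}$ |
Trivium based extractor | $2^{80}$ | $2^{40}$ |
HMAC_SHA1 based extractor | $2^{160}$ | $2^{80}$ |
HMAC_SHA512 based extractor | $2^{512}$ | $2^{256}$ |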
To prove (a), assume for contradiction that the HMAC_SHA512 based computational extractor is not an $(m, q_X, t_X, \varepsilon_X)$-computational extractor. Then there exists a probabilistic polynomial time adversary, running in time $t_X$ and making $q_X$ queries, that is given an $\mathfrak{l}_{PRK}$-bit string, which is either the PRK generated by the HMAC_SHA512 based computational extractor from an input p with entropy m or a truly random string, and that guesses correctly which one it is with probability greater than $\frac{1}{2} + \varepsilon_X$, where $\varepsilon_X$ is deemed negligible. This implies that the adversary can differentiate between the PRK and a truly random string of the same length, and hence distinguish the underlying HMAC-PRF from a random function, using a polynomial time method. This contradicts the assumption that HMAC_SHA512 functions as an HMAC-PRF satisfying Definition 5. Hence, statement (a) is true.
To prove (b), assume for contradiction that the Trivium based pseudorandom expander is not a $(q_Y, t_Y, \varepsilon_Y)$-pseudorandom expander. Then there exists a probabilistic polynomial time adversary, running in time $t_Y$ and making $q_Y$ queries, that is given an n-bit string, which is either the key K generated by the Trivium based pseudorandom expander or a random string, and that guesses correctly which one it is with probability greater than $\frac{1}{2} + \varepsilon_Y$, where $\varepsilon_Y$ is deemed negligible. This implies that the adversary can differentiate between K and a truly random string of the same length, and hence distinguish the output of the underlying Trivium pseudorandom keystream generator from a random sequence, using a polynomial time method. This contradicts the assumption that Trivium is a secure pseudorandom keystream generator satisfying Definition 6. Hence, statement (b) is true.
Hence, the HMSKDF built from the HMAC_SHA512 based computational extractor and the Trivium based pseudorandom expander is $(m, \min\{q_X, q_Y\}, \min\{t_X, t_Y\}, \varepsilon_X + \varepsilon_Y)$-CAM secure with respect to the private string p with entropy m, as stated in Theorem 1.
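The additive error term $\varepsilon_X + \varepsilon_Y$ is what a standard hybrid argument would give. The derivation below is a sketch under simplifying assumptions rather than the authors' proof: it ignores the query and time parameters, and the symbols D (a distinguisher with output 0 or 1), $U_{PRK}$ (a uniformly random string of length $\mathfrak{l}_{PRK}$), $U_n$ (a uniformly random n-bit string) and Δ are introduced here only for illustration. A guessing probability of at most $\frac{1}{2} + \varepsilon$ corresponds to a distinguishing gap Δ of at most $2\varepsilon$.

```latex
\begin{align*}
\Delta(A, B) &:= \left|\Pr[D(A) = 1] - \Pr[D(B) = 1]\right| \\
\Delta\big(\mathrm{Expander}(PRK, c, n),\, U_n\big)
  &\le \Delta\big(\mathrm{Expander}(PRK, c, n),\, \mathrm{Expander}(U_{PRK}, c, n)\big) \\
  &\quad + \Delta\big(\mathrm{Expander}(U_{PRK}, c, n),\, U_n\big) \\
  &\le 2\varepsilon_X + 2\varepsilon_Y \\
\Pr[b' = b] &\le \tfrac{1}{2} + \tfrac{1}{2}\,\Delta\big(\mathrm{Expander}(PRK, c, n),\, U_n\big)
  \le \tfrac{1}{2} + \varepsilon_X + \varepsilon_Y
\end{align*}
```

The first term is bounded using statement (a), since any distinguisher for the two expander outputs yields a distinguisher between PRK and $U_{PRK}$ by applying the Expander first. The second term is bounded using statement (b), since the expander is then keyed with a uniformly random string.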
Software performance evaluation
In this section, the software performance is evaluated by analyzing the execution time of 25 different combinations of computational extractors and pseudorandom expanders. The combinations are built from HMAC_SHA1 (Eastlake & Hansen, 2017), HMAC_SHA512 (Eastlake & Hansen, 2017), AES128 (Song et al., 2006), Rabbit (Boesgaard, Vesterager & Zenner, 2008) and Trivium (Robshaw, 2008). The simulation includes the existing key derivation function schemes, namely the hash-based message authentication code based key derivation function (Krawczyk, 2010), the block cipher based key derivation function (Barker, Chen & Davis, 2020) and the stream cipher based key derivation function (Chuah, Dawson & Simpson, 2013).
Two experiments were conducted to measure the running time required to generate the key K of length n using the parameters p, s and c. The lengths of p, s, c and n are based on the Host Identity Protocol (Moskowitz et al., 2015), as shown in Table 2. Each experiment is simulated 100 times, the resulting times are recorded, and the average simulation time is calculated. The execution time is captured in nanoseconds using a CLOCK system. All the simulations were performed on a machine with the following specifications: AMD Ryzen 7 5700U with Radeon Graphics, 1.80 GHz, 16.0 GB RAM, running a 64-bit Windows operating system.
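The measurement procedure (100 runs, average time in nanoseconds) can be sketched as a small timing harness. The sketch uses Python's `time.perf_counter_ns`, which is a choice made here and not necessarily the clock used by the authors; the function names are hypothetical, and the timed operation is only the HMAC_SHA512 extraction step with the input lengths of experiment 1.

```python
import hashlib
import hmac
import os
import time


def average_time_ns(func, runs: int = 100) -> float:
    """Average wall-clock time of func() over the given number of runs."""
    total = 0
    for _ in range(runs):
        start = time.perf_counter_ns()
        func()
        total += time.perf_counter_ns() - start
    return total / runs


# Experiment 1 input lengths from Table 2: p is 128 bytes, s is 8 bytes.
p = os.urandom(128)
s = os.urandom(8)


def extract_once() -> bytes:
    return hmac.new(s, p, hashlib.sha512).digest()


average_ns = average_time_ns(extract_once)
print(f"average extraction time: {average_ns:.0f} ns")
```

Absolute numbers from such a harness depend on the machine, the language runtime and the clock resolution, so they are only comparable between schemes measured in the same environment.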
Figure 3 displays the simulation results of the experiments, with experiment 2 featuring longer input lengths compared to experiment 1. For all key derivation functions schemes, the execution time shows an increase from experiment 1 to experiment 2.
The Trivium based key derivation function is faster when the input length is shorter, taking just 13,815 nanoseconds in experiment 1. It slows down as the input length $\mathfrak{l}_p$ increases (experiment 2). In comparison, the key derivation function using HMAC_SHA1 as the computational extractor and Trivium as the pseudorandom expander is faster in experiment 2, at 19,247 nanoseconds. The AES based key derivation function shows the slowest simulation times among the different schemes. The combination of HMAC_SHA512 as the computational extractor and Trivium as the pseudorandom expander shows an average simulation time of 36,014 nanoseconds for experiment 1 and 41,618 nanoseconds for experiment 2.
Hardware performance
Table 3 presents the hardware performance of the hash functions (SHA1 and SHA512), the stream ciphers (Trivium and Rabbit) and the block cipher (AES). The results indicate that AES has the lowest throughput, while Trivium requires the fewest gates and has the highest throughput, and SHA512 has the second highest throughput at 2909 Mb/s. Overall, these results suggest that a hardware-based HMSKDF using HMAC_SHA512 and Trivium offers medium throughput and efficiency compared to other key derivation function schemes.
Table 2: Lengths of the parameters p, s, c and n in the two experiments.
Experiment | p | s | c | n |
---|---|---|---|---|
Experiment 1 | 128 bytes | 8 bytes | 32 bytes | 64 bytes |
Experiment 2 | 256 bytes | 8 bytes | 32 bytes | 192 bytes |
Figure 3: Software performance (Time).
Table 3: Hardware performance of the underlying primitives.
Metric | SHA1 (Satoh & Inoue, 2007) | SHA512 (Satoh & Inoue, 2007) | Trivium (Good & Benaissa, 2007) | Rabbit (Boesgaard, Vesterager & Zenner, 2008) | AES (Satoh et al., 2001) |
---|---|---|---|---|---|
Gates | 9859 | 27297 | 4921 | 28000 | 5398 |
Technology (µm) | 0.13 | 0.13 | 0.13 | 0.18 | 0.11 |
Throughput (Mb/s) | 2006 | 2909 | 22300 | 473.6 | 311.09 |
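The rankings stated for the hardware results can be reproduced from the gate counts and throughputs in the table above. The sketch below also computes throughput per gate as a rough efficiency figure; that ratio is an illustration added here, and it should be read with care because the designs were synthesized in different process technologies.

```python
# Gate counts and throughputs (Mb/s) copied from the hardware table.
HARDWARE = {
    "SHA1": {"gates": 9859, "throughput": 2006.0},
    "SHA512": {"gates": 27297, "throughput": 2909.0},
    "Trivium": {"gates": 4921, "throughput": 22300.0},
    "Rabbit": {"gates": 28000, "throughput": 473.6},
    "AES": {"gates": 5398, "throughput": 311.09},
}

# Primitives ordered from highest to lowest throughput.
by_throughput = sorted(HARDWARE, key=lambda n: HARDWARE[n]["throughput"], reverse=True)
fewest_gates = min(HARDWARE, key=lambda n: HARDWARE[n]["gates"])

# Rough efficiency figure: throughput in Mb/s per gate.
per_gate = {n: v["throughput"] / v["gates"] for n, v in HARDWARE.items()}

print("by throughput:", by_throughput)
print("fewest gates:", fewest_gates)
for name in by_throughput:
    print(f"{name}: {per_gate[name]:.3f} Mb/s per gate")
```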
Conclusion
The key derivation function is an essential component in cryptographic systems, used to create cryptographic keys from non-uniformly random strings. This research examines different combinations of computational extractors and pseudorandom expanders, with emphasis on execution time, hardware performance and security analysis. It is crucial that the resulting cryptographic keys cannot be distinguished from random binary strings of equivalent length, as they play a vital role in protecting data during storage and transmission across unsecured channels. Hence, this study includes different cryptographic primitives, namely HMAC_SHA512, HMAC_SHA1, AES128, Trivium, and Rabbit. The findings indicate that the combination of an HMAC_SHA512 based computational extractor and a Trivium based pseudorandom expander, denoted as HMSKDF, is ideal for creating a secure key derivation function scheme. The HMSKDF demonstrates the highest security, with a brute force complexity of $2^{512}$ and a collision attack complexity of $2^{256}$, as well as average efficiency in terms of execution time and throughput. Overall, this article establishes that the HMSKDF is proven $(m, \min\{q_X, q_Y\}, \min\{t_X, t_Y\}, \varepsilon_X + \varepsilon_Y)$-CAM secure and is an efficient alternative key derivation function proposal that can be implemented in existing applications.