Key derivation function: key-hash based computational extractor and stream based pseudorandom expander

PeerJ Computer Science

Introduction

Key derivation functions are commonly applied in protocols such as Transport Layer Security (Rescorla, 2018) and the Host Identity Protocol (Moskowitz et al., 2015), which makes secure key derivation function designs crucial for protecting electronic data transmitted over insecure channels (Italis, Pierre & Quintero, 2023; Muthakshi & Mahesh, 2024; Venčkauskas et al., 2024; Saif, Migliorini & Spoto, 2024). In recent years, key derivation function proposals have typically been structured in two phases: a computational extractor followed by a pseudorandom expander (Krawczyk, 2010; Barker, Chen & Davis, 2020; Chuah, Dawson & Simpson, 2013; Saran, 2024). This two-phase approach allows researchers to design and analyze the security of the extractor and the expander separately. A key derivation function is a cryptographic algorithm that takes an arbitrary-length private string and public strings and generates one or more pseudorandom cryptographic keys. Its security is determined by the entropy of the private string.

Existing key derivation function schemes include the keyed-hash message authentication code based key derivation function (Krawczyk, 2010), the block cipher based key derivation function (Barker, Chen & Davis, 2020), and the stream cipher based key derivation function (Chuah, Dawson & Simpson, 2013; Chuah & Koh, 2017). Keyed-hash message authentication codes and block ciphers produce a fixed-length output from a variable-length input, so the output must be truncated when the desired cryptographic key length is not an exact multiple of the output block size, which wastes bits. To address this issue, key derivation functions based on stream ciphers have been proposed; they generate cryptographic keys of any length without discarding excess bits. However, key derivation functions based on keyed-hash message authentication codes offer higher security than those based on stream ciphers. The security of a key derivation function depends on its underlying cipher: a keyed-hash message authentication code using SHA512 has a brute force complexity of $2^{512}$ and a collision complexity of $2^{256}$, whereas Trivium has a brute force complexity of $2^{80}$ and a collision complexity of $2^{40}$.

To date, the computational extractor and the pseudorandom expander of a key derivation function scheme have been constructed from the same cryptographic primitive. This approach has limitations: block ciphers and keyed-hash message authentication codes offer better security but are slower and can only produce fixed-length blocks of key material, whereas stream ciphers execute faster and generate cryptographic keys of arbitrary length but offer lower security.

Our contribution

In this article, we introduce a key derivation function built from a keyed-hash based computational extractor and a stream-based pseudorandom expander, which we denote HMSKDF. The keyed-hash based computational extractor uses the keyed-hash message authentication code instantiated with the secure hash algorithm with a 512-bit output (HMAC_SHA512). The stream-based pseudorandom expander uses Trivium. The output of HMSKDF can be of arbitrary length. We evaluate the security, software performance and hardware performance of this alternative approach. Overall, the proposed HMSKDF advances key derivation function design by offering better security, which ensures the pseudorandomness of the generated cryptographic keys, while eliminating wasted bits.

Organization of the article

We provide the theoretical definition of a key derivation function in ‘Key Derivation Function’. ‘The proposed HMSKDF Scheme’ presents the HMSKDF scheme, its security analysis, and its software performance (in terms of execution time) and hardware performance. ‘Conclusion’ concludes the article.

Key Derivation Function

A key derivation function is a cryptographic function that generates a cryptographic key K of arbitrary length n from a private string p, a public salt s and public context information c, as shown in Fig. 1.

Definition 1. A key derivation function is defined as follows (Krawczyk, 2010; Chuah, Dawson & Simpson, 2013; Saran, 2024): $K \leftarrow \mathrm{KDF}(p, s, c, n)$, where

  • p is the private string, chosen from the private string space β, such that p ∈ β, with length 𝔩p and probability distribution 𝔓.

  • s is a public random string (the salt) chosen from the salt space τ, such that s ∈ τ, with length 𝔩s and probability distribution 𝔖.

  • c is a public context string chosen from the context space ω, such that c ∈ ω, with length 𝔩c and probability distribution ℭ.

  • K is a pseudorandom cryptographic key.

  • n is a positive integer, the length of K in bits.

Definition 2. (Computational extractor) (Krawczyk, 2010). Let β be the space of private strings p with min-entropy m, and τ the space of public random strings s. A computational extractor is defined as: $\mathrm{Extractor}: \{0,1\}^{\mathfrak{l}_p} \times \{0,1\}^{\mathfrak{l}_s} \rightarrow \{0,1\}^{\mathfrak{l}_{PRK}}$.


Figure 1: General key derivation function.

The computational extractor is called an $(m, a_X, q_X, \varepsilon_X)$-min-entropy computational extractor if, for every probabilistic polynomial-time adversary $a_X$ that makes $q_X$ queries to the Extractor, the probability of correctly deciding whether a given string is the PRK derived by the Extractor or a random string of the same length is not larger than $\frac{1}{2} + \varepsilon_X$, where $\varepsilon_X$ is negligible.

Definition 3. (Pseudorandom expander) (Krawczyk, 2010). Let ω be the space of public context strings. A pseudorandom expander is defined as: $\mathrm{Expander}: \{0,1\}^{\mathfrak{l}_{PRK}} \times \{0,1\}^{\mathfrak{l}_c} \rightarrow \{0,1\}^n$.

The pseudorandom expander is called an $(a_Y, q_Y, \varepsilon_Y)$-pseudorandom expander if, for every probabilistic polynomial-time adversary $a_Y$ that makes $q_Y$ queries to the Expander, the probability of correctly deciding whether a given string is the cryptographic key derived by the Expander or a random string of the same length is not larger than $\frac{1}{2} + \varepsilon_Y$, where $\varepsilon_Y$ is negligible.

CAM security model

The security objective of a key derivation function is that, while the private string p is unknown to the adversary and the public strings s and c are known, the cryptographic key K it generates is indistinguishable from a truly random binary string of the same length. Koh & Chuah (2020) proposed CAM, a robust security model for key derivation functions that serves as a formal framework for evaluating their security. The CAM security model uses the modern cryptographic proof technique of an indistinguishability game between two players, the challenger and the adversary, played in polynomial time t. The adversary can influence all inputs of the key derivation function, including mounting a bit-flipping attack, with the objective of exposing any weakness in the key derivation function within polynomial time.

Definition 4. (CAM-secure) (Koh & Chuah, 2020). A key derivation function is $(m, t, q, \varepsilon)$-CAM-secure if every probabilistic polynomial-time adversary making q queries ($q < \beta \times \gamma$) to the key derivation function, with chosen bit positions of the private string p, wins the following indistinguishability game with probability not greater than $\frac{1}{2} + \varepsilon$, where ε is negligible.

  1. The challenger chooses $p \leftarrow \beta$.

  2. For $i = 1, \ldots, q' \le q$:

    1. The adversary chooses the bit position $z_i$ of p.

    2. The adversary chooses $s_i \in \tau$ and $c_i \in \omega$.

    3. The adversary sends $z_i$, $s_i$ and $c_i$ to the challenger.

    4. The challenger computes $(K_i, t_i) \leftarrow \mathrm{KDF}(p, s_i, c_i, n)$, where $t_i$ is the time used to generate $K_i$.

    5. The challenger sends $K_i$ and $t_i$ to the adversary.

  3. The adversary chooses z, $s \in \tau$ and $c \in \omega$, subject to $(z, s, c) \notin \{(z_1, s_1, c_1), \ldots, (z_{q'}, s_{q'}, c_{q'})\}$, and sends z, s and c to the challenger.

  4. The challenger chooses a random bit $b \xleftarrow{R} \{0, 1\}$.

    1. If b = 0, the challenger computes $(K', t_0) \leftarrow \mathrm{KDF}(p, s, c, n)$, where $t_0$ is the execution time used to generate K′.

    2. Otherwise, the challenger draws a random $K' \xleftarrow{R} \{0,1\}^n$ and a random time $t_1$.

    3. The challenger sends K′ to the adversary.

  5. The adversary continues with the remaining q − q′ queries as in step 2, subject to $(z_i, s_i, c_i) \neq (z, s, c)$.

  6. The adversary outputs a guess b′ and wins the game if b′ = b.

    1. If the adversary outputs b′ = 0, it believes K′ is the cryptographic key derived by the key derivation function.

    2. Otherwise, it outputs b′ = 1, believing K′ is a random string.
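To make the game concrete, here is a minimal Python sketch of a simplified CAM game. Several choices are our assumptions rather than the source's: the bit position z is read as a bit of p that the challenger flips before deriving, the adversary's choices of z, s and c are drawn at random for brevity, step 5 (the remaining q − q′ queries) is omitted, and `toy_kdf`, `broken_kdf`, `parity_adversary` and `zero_detector` are hypothetical names. The sketch estimates the adversary's advantage (win rate minus 1/2) against an HKDF-style KDF and against a deliberately broken KDF.

```python
import hashlib
import hmac
import random
import time

def toy_kdf(p: bytes, s: bytes, c: bytes, n: int) -> bytes:
    """HKDF-style KDF (HMAC-SHA512 extract, counter-mode expand); n is in bytes."""
    prk = hmac.new(s, p, hashlib.sha512).digest()
    out, block, i = b"", b"", 1
    while len(out) < n:
        block = hmac.new(prk, block + c + bytes([i]), hashlib.sha512).digest()
        out += block
        i += 1
    return out[:n]

def broken_kdf(p: bytes, s: bytes, c: bytes, n: int) -> bytes:
    """Deliberately insecure: ignores every input and returns zeros."""
    return bytes(n)

def flip_bit(p: bytes, z: int) -> bytes:
    """Assumed meaning of z: flip bit z of the private string p."""
    buf = bytearray(p)
    buf[z // 8] ^= 1 << (z % 8)
    return bytes(buf)

def cam_game(kdf, adversary, rng: random.Random, p_len=16, n=32, q=4) -> bool:
    """One run of a simplified CAM game; True if the adversary guesses b."""
    p = rng.randbytes(p_len)                          # step 1: challenger picks p
    transcript = []
    for _ in range(q):                                # step 2: learning queries
        z, s, c = rng.randrange(8 * p_len), rng.randbytes(8), rng.randbytes(8)
        start = time.perf_counter_ns()
        k = kdf(flip_bit(p, z), s, c, n)
        transcript.append((z, s, c, k, time.perf_counter_ns() - start))
    seen = {(z_i, s_i, c_i) for z_i, s_i, c_i, _, _ in transcript}
    while True:                                       # step 3: fresh challenge query
        z, s, c = rng.randrange(8 * p_len), rng.randbytes(8), rng.randbytes(8)
        if (z, s, c) not in seen:
            break
    b = rng.randrange(2)                              # step 4: hidden bit
    k_challenge = kdf(flip_bit(p, z), s, c, n) if b == 0 else rng.randbytes(n)
    return adversary(transcript, (z, s, c), k_challenge) == b   # step 6

def advantage(kdf, adversary, trials=2000, seed=1) -> float:
    """Empirical win rate minus 1/2."""
    rng = random.Random(seed)
    wins = sum(cam_game(kdf, adversary, rng) for _ in range(trials))
    return wins / trials - 0.5

def parity_adversary(transcript, query, k):
    return k[0] & 1                          # guesses from one bit of K'

def zero_detector(transcript, query, k):
    return 0 if k == bytes(len(k)) else 1    # answers "derived" when K' is all zeros

print(f"toy_kdf vs parity guess:  {advantage(toy_kdf, parity_adversary):+.3f}")
print(f"broken_kdf vs zero check: {advantage(broken_kdf, zero_detector):+.3f}")
```

A sound KDF should leave a guessing adversary with an advantage near 0, while the broken KDF lets the zero detector win every game (advantage 0.5). A sampling experiment like this can expose gross failures, but it cannot prove CAM security.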

Hash-based message authentication code based key derivation function

Krawczyk (2010) proposed the hash-based message authentication code based key derivation function (HKDF), which is constructed from a computational extractor and a pseudorandom expander. HKDF has been proven CAM-secure (Koh & Chuah, 2020). Its computational extractor and pseudorandom expander are defined as:

$$PRK \leftarrow \mathfrak{F}\big((s \oplus opad) \,\|\, \mathfrak{F}((s \oplus ipad) \,\|\, p)\big)$$

$$K(i+1) \leftarrow \mathfrak{F}\big((PRK \oplus opad) \,\|\, \mathfrak{F}((PRK \oplus ipad) \,\|\, K(i) \,\|\, c \,\|\, i)\big)$$

where:

  • 𝔉 is a hash function, either secure hash algorithm 1 (SHA1) or secure hash algorithm 2 (SHA2).

  • ⨁ is exclusive OR.

  • ∥ is string concatenation.

  • opad is the outer padding, formed by repeating the byte 0x5c.

  • ipad is the inner padding, formed by repeating the byte 0x36.

  • $1 \le i < \mathfrak{L}$, where $\mathfrak{L} = \lceil n / \mathfrak{l}_{PRK} \rceil$.

The salt s serves as the HMAC key. As in standard HMAC, a salt longer than the block size of 𝔉 is first hashed with 𝔉, and the resulting key is then padded with zeros up to the block size before it is combined with ipad and opad. 𝔩PRK equals the digest length of the hash function (SHA1 or SHA2). The first n bits of K(1)‖…‖K(𝔏 − 1) are used as the cryptographic key K, and the remaining bits are discarded.
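The extractor formula above is HMAC keyed with s. As a check on the padding rules, the following sketch builds HMAC_SHA512 directly from the ipad/opad definition in RFC 2104 (ipad = 0x36 and opad = 0x5c, each repeated to the 128-byte SHA-512 block size) so that it can be compared against Python's standard `hmac` module. The function name `hmac_sha512` is our own.

```python
import hashlib
import hmac

BLOCK_SIZE = 128                    # SHA-512 processes 1024-bit (128-byte) blocks
IPAD = bytes([0x36]) * BLOCK_SIZE   # inner padding
OPAD = bytes([0x5C]) * BLOCK_SIZE   # outer padding

def hmac_sha512(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA512 computed from the ipad/opad definition."""
    if len(key) > BLOCK_SIZE:                   # over-long keys are hashed first
        key = hashlib.sha512(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")        # then zero-padded to the block size
    inner = hashlib.sha512(bytes(k ^ i for k, i in zip(key, IPAD)) + msg).digest()
    return hashlib.sha512(bytes(k ^ o for k, o in zip(key, OPAD)) + inner).digest()

# Extractor usage: the salt s is the key and the private string p is the message.
prk = hmac_sha512(b"salt s", b"private string p")
print(prk == hmac.new(b"salt s", b"private string p", hashlib.sha512).digest())
```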

One benefit of HKDF is its ability to handle inputs of arbitrary length, but a drawback is that it generates fixed-length blocks and discards any excess bits, causing wastage.
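A minimal sketch of the two HKDF phases with SHA512 as 𝔉, following the structure of RFC 5869, where the first expansion block uses an empty previous block and a one-byte counter starting at 1 (the indexing in the formulas above may differ by one). Lengths are in bytes rather than bits for simplicity, and `hkdf_sha512` is a hypothetical helper name, not a library API.

```python
import hashlib
import hmac

def hkdf_sha512(p: bytes, s: bytes, c: bytes, n_bytes: int) -> bytes:
    """HKDF with HMAC-SHA512 (RFC 5869 structure); lengths in bytes."""
    prk = hmac.new(s, p, hashlib.sha512).digest()        # extract: PRK = HMAC(s, p)
    num_blocks = -(-n_bytes // len(prk))                 # ceil(n / l_PRK)
    if num_blocks > 255:
        raise ValueError("RFC 5869 allows at most 255 output blocks")
    okm, prev = b"", b""
    for i in range(1, num_blocks + 1):
        # expand: next block = HMAC(PRK, previous block || c || counter byte)
        prev = hmac.new(prk, prev + c + bytes([i]), hashlib.sha512).digest()
        okm += prev
    return okm[:n_bytes]                                 # surplus bytes of the last block are dropped

key = hkdf_sha512(b"private string p", b"salt s", b"context c", 100)
print(len(key))
```

Requesting n = 100 bytes computes two 64-byte blocks and discards the last 28 bytes, which is the wastage described above.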

Definition 5. (HMAC-PRF) (Hirose, 2019; Gaži, Pietrzak & Rybár, 2014; Naik & Singh, 2024). Let $\mathfrak{F}: \mathbb{K} \times \mathbb{D} \rightarrow \mathbb{R}$ be a hash-based message authentication code with key 𝕜 ∈ 𝕂. A keyed function on inputs of a specific length, $G: \{0,1\}^c \times \{0,1\}^b \rightarrow \{0,1\}^c$, is $(\varepsilon, t, q, \mathfrak{l})$-HMAC-PRF secure if, for every probabilistic polynomial-time adversary running in time t and making at most q queries, each of length at most 𝔩 b-bit blocks, given a random function $R: \{0,1\}^b \rightarrow \{0,1\}^c$ and a uniformly random key $\mathbb{K} \leftarrow \{0,1\}^c$, it holds that $\Delta_{\mathrm{Adversary}}(G_{\mathbb{K}}, R) \le \varepsilon$, where ε is negligible.

Stream cipher based key derivation function

Chuah, Dawson & Simpson (2013) proposed the stream cipher based key derivation function (SCKDF), which has been shown not to be CAM-secure (Koh & Chuah, 2020). The computational extractor of SCKDF is defined as: $PRK \leftarrow \mathfrak{S}(p_1, s, p_2, p_3, \ldots, p_{\mathfrak{l}-1}, p_{\mathfrak{l}})$.

The inputs of the pseudorandom keystream generator 𝔖 are a key of length 𝔩sk and an initialization vector of length 𝔩iv. In SCKDF, these inputs are replaced by the key derivation function inputs p and s (Italis, Pierre & Quintero, 2023). If s is null, p is divided into blocks of 𝔩sk + 𝔩iv bits. If s is not null, its length should equal 𝔩sk, and p is divided as follows: if 𝔩p is greater than 𝔩iv, the first block of p is 𝔩iv bits long and the remaining blocks are 𝔩sk + 𝔩iv bits long. The pseudorandom keystream generator processes all blocks $p_1, \ldots, p_{\mathfrak{l}}$ of p and outputs PRK, whose length 𝔩PRK equals the key length 𝔩sk of the pseudorandom keystream generator used in the pseudorandom expander phase.

The pseudorandom expander of SCKDF is defined as: $K \leftarrow \mathfrak{S}(PRK, c_1, c_2, \ldots, c_{\mathfrak{l}})$.

The inputs of the pseudorandom keystream generator (Gaži, Pietrzak & Rybár, 2014) are replaced by the key derivation function inputs PRK and c (Chuah, Dawson & Simpson, 2013). The public string c may be null. If c is null, it is set to $0^{\mathfrak{l}_{iv}}$; otherwise, c is divided into blocks of 𝔩iv bits. The pseudorandom keystream generator processes all blocks of c and outputs the n-bit key K.
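The block-splitting rules of the SCKDF extractor and expander described above can be sketched as follows, assuming byte-aligned lengths and Trivium's 80-bit key and IV (10 bytes each) as defaults. The text does not say how a short final block is padded, so this sketch leaves it short; the function names are hypothetical.

```python
from typing import List, Optional

def split_extractor_input(p: bytes, s: Optional[bytes],
                          key_len: int = 10, iv_len: int = 10) -> List[bytes]:
    """Split the private string p into SCKDF extractor blocks."""
    size = key_len + iv_len
    if not s:                                  # null salt: (key + IV)-sized blocks
        return [p[i:i + size] for i in range(0, len(p), size)]
    if len(s) != key_len:
        raise ValueError("s is expected to be as long as the key")
    blocks = [p[:iv_len]]                      # first block: IV-sized, paired with s as key
    rest = p[iv_len:]
    blocks += [rest[i:i + size] for i in range(0, len(rest), size)]
    return blocks

def split_context(c: Optional[bytes], iv_len: int = 10) -> List[bytes]:
    """Split c into IV-sized expander blocks; a null c becomes one all-zero IV."""
    if not c:
        return [bytes(iv_len)]
    return [c[i:i + iv_len] for i in range(0, len(c), iv_len)]

p = bytes(range(30))
print([len(b) for b in split_extractor_input(p, b"k" * 10)])   # salted case
print([len(b) for b in split_context(bytes(32))])              # 32-byte context
```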

The design of SCKDF allows different types of pseudorandom keystream generators to be combined in the computational extractor and the pseudorandom expander (Chuah, Dawson & Simpson, 2013). Its pseudorandom expander can generate cryptographic keys of arbitrary length without discarding excess bits, which improves efficiency (Canniere & Preneel, 2008). However, a pseudorandom keystream generator cannot accept inputs of arbitrary length, and the modifications needed to process the key derivation function inputs in the computational extractor phase make SCKDF not CAM-secure.

Definition 6. (PKG) (Katz & Lindell, 2014). A pseudorandom keystream generator is considered to pass all statistical tests within polynomial time if no polynomial-time algorithm can distinguish between the output sequence of the generator and a truly random sequence of equal length with probability significantly greater than $\frac{1}{2}$.
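Definition 6 refers to statistical tests in general. As one concrete example, the sketch below implements the frequency (monobit) test from NIST SP 800-22, which checks whether ones and zeros are balanced. SHAKE-256 output stands in here for a keystream generator; that choice, like the function name, is an illustrative assumption. Passing this single test is necessary but far from sufficient for the indistinguishability that Definition 6 requires.

```python
import hashlib
import math

def monobit_p_value(data: bytes) -> float:
    """NIST SP 800-22 frequency (monobit) test; small p-values indicate bias."""
    n = 8 * len(data)
    ones = sum(bin(byte).count("1") for byte in data)
    s_n = 2 * ones - n                 # +1 for every one bit, -1 for every zero bit
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

keystream = hashlib.shake_256(b"stand-in PKG seed").digest(1 << 12)
print(f"SHAKE-256 stand-in: p = {monobit_p_value(keystream):.3f}")
print(f"all-zero stream:    p = {monobit_p_value(bytes(4096)):.3g}")
```

A common convention treats p < 0.01 as failing the test.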

Block cipher based key derivation function

NIST SP 800-56C specifies a cipher-based message authentication code based key derivation function (BKDF) (Barker, Chen & Davis, 2020), which consists of a computational extractor and a pseudorandom expander. BKDF has been proven CAM-secure (Koh & Chuah, 2020). The computational extractor of BKDF is defined as: $PRK(i) \leftarrow \mathfrak{B}_s\big(PRK(i-1) \oplus D_i\big)$, where:

  • 𝔅 is the Advanced Encryption Standard (AES) with a key length of 128, 192 or 256 bits.

  • $D_i$ is the i-th 128-bit block of p.

  • s is used as the AES key.

  • The initial value is $PRK(0) = 0^{128}$, and 𝔩PRK is 128 bits.

  • $1 \le i < t$, where $t = \lceil \mathfrak{l}_p / 128 \rceil$.

The pseudorandom expander of BKDF is defined as: $K(i) \leftarrow \mathfrak{B}_{PRK}\big(K(i-1) \oplus M_i\big)$, where:

  • 𝔅 is AES with a key length of 128 bits.

  • $M_i$ is the i-th 128-bit block of c.

  • PRK is used as the AES secret key.

  • The initial value is $K(0) = 0^{128}$.

  • $1 \le i < t$, where $t = \lceil \mathfrak{l}_c / 128 \rceil$.

If n is greater than 128, the iterations continue until $\lceil n / 128 \rceil$ blocks have been generated. K consists of the first n bits, and the remaining bits are discarded.

The benefit of BKDF is its ability to handle inputs of arbitrary length. Like HKDF, however, it produces fixed-length blocks and discards any excess bits, which is inefficient. Another limitation is that its pseudorandom expander is fixed to AES with a 128-bit key.
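The bit wastage of fixed-block expanders is simple arithmetic: producing n bits from b-bit blocks costs ⌈n/b⌉ blocks, and ⌈n/b⌉·b − n bits are thrown away. The sketch below computes this for the block sizes discussed above (160 bits for HMAC_SHA1, 512 bits for HMAC_SHA512, 128 bits for AES); a stream-based expander such as Trivium produces exactly n bits and wastes none. The dictionary labels and function name are our own.

```python
# Output block sizes (bits) of the fixed-block expanders discussed above.
BLOCK_BITS = {"HKDF (HMAC_SHA1)": 160, "HKDF (HMAC_SHA512)": 512, "BKDF (AES)": 128}

def wasted_bits(n: int, block_bits: int) -> int:
    """Bits generated and then discarded when n bits are cut from whole blocks."""
    blocks = -(-n // block_bits)      # integer ceil(n / block_bits)
    return blocks * block_bits - n

for name, b in BLOCK_BITS.items():
    print(f"{name}: n = 1000 bits -> {wasted_bits(1000, b)} bits discarded")
```

With the Table 2 lengths, the n = 512-bit key of Experiment 1 happens to be a whole number of SHA512 and AES blocks, while an HMAC_SHA1 expander would discard 128 bits.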

The proposed HMSKDF Scheme

We formalize our keyed-hash based computational extractor (Eq. (4), which satisfies Definition 5) and stream-based pseudorandom expander (Eq. (7), which satisfies Definition 6), both relatively straightforward functions, as shown in Fig. 2. The computational extractor is based on HMAC_SHA512 and takes arbitrary-length inputs p and s. The Trivium-based pseudorandom expander generates an n-bit cryptographic key of arbitrary length.
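The two phases of HMSKDF can be sketched in Python as below. Eq. (4), Eq. (7) and Fig. 2 are not reproduced in this text, so the way PRK and c are mapped onto Trivium's 80-bit key and 80-bit IV is a hypothetical choice of ours: the key is the first 80 bits of PRK and the IV is the first 80 bits of SHA-512(c). The Trivium routine follows the published round function (Canniere & Preneel, 2008), but its bit-loading order is also an assumption and has not been checked against the eSTREAM test vectors, so it should not be used as a reference implementation.

```python
import hashlib
import hmac
from typing import List

def _to_bits(data: bytes) -> List[int]:
    """Bytes to bits, least significant bit of each byte first (assumed order)."""
    return [(byte >> j) & 1 for byte in data for j in range(8)]

def trivium_keystream(key: bytes, iv: bytes, n_bits: int) -> List[int]:
    """Bit-level Trivium with an 80-bit key and 80-bit IV; returns n_bits keystream bits."""
    if len(key) != 10 or len(iv) != 10:
        raise ValueError("Trivium uses a 10-byte key and a 10-byte IV")
    # State s1..s288 held 0-indexed: s[0] is s1, s[287] is s288.
    s = _to_bits(key) + [0] * 13 + _to_bits(iv) + [0] * 4 + [0] * 108 + [1, 1, 1]
    warmup = 4 * 288                            # key/IV setup rounds without output
    out = []
    for r in range(warmup + n_bits):
        t1 = s[65] ^ s[92]
        t2 = s[161] ^ s[176]
        t3 = s[242] ^ s[287]
        if r >= warmup:
            out.append(t1 ^ t2 ^ t3)            # keystream bit z
        t1 ^= (s[90] & s[91]) ^ s[170]
        t2 ^= (s[174] & s[175]) ^ s[263]
        t3 ^= (s[285] & s[286]) ^ s[68]
        # Shift the three registers (93, 84 and 111 bits) by one position.
        s = [t3] + s[0:92] + [t2] + s[93:176] + [t1] + s[177:287]
    return out

def hmskdf(p: bytes, s: bytes, c: bytes, n_bits: int) -> List[int]:
    """Sketch: HMAC_SHA512 extractor followed by a Trivium expander."""
    prk = hmac.new(s, p, hashlib.sha512).digest()   # extract: PRK = HMAC_SHA512(s, p)
    key = prk[:10]                                  # HYPOTHETICAL: first 80 bits of PRK
    iv = hashlib.sha512(c).digest()[:10]            # HYPOTHETICAL: 80-bit digest of c
    return trivium_keystream(key, iv, n_bits)       # expand: exactly n_bits, none discarded

key_bits = hmskdf(b"p" * 128, b"s" * 8, b"c" * 32, 1000)
print(len(key_bits), "bits:", "".join(map(str, key_bits[:32])), "...")
```

Requesting 1,000 bits returns exactly 1,000 bits, with no truncation. In this sketch only 80 bits of the 512-bit PRK reach Trivium, because Trivium's key is 80 bits long.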

Security analysis

This section analyzes generic attacks against different types of computational extractors and gives a formal security proof for the proposed HMSKDF scheme.

•   Brute force attack and collision attack

Even if a key derivation function has no structural weaknesses, it remains exposed to brute force and collision attacks against the internal state of its computational extractor. If this internal state is compromised, it can be used to generate all of the derived cryptographic keys, and compromised keys can no longer protect the security of electronic data.

In a brute force attack, the adversary systematically tries every possible internal state of the computational extractor until the correct one is found, and then uses it to generate the cryptographic key. If the internal state is 𝔩is bits long, the complexity of a brute force attack on it is $2^{\mathfrak{l}_{is}}$.

Figure 2: The proposed HMSKDF Scheme.

In a collision attack, the adversary uses the birthday paradox to find two or more inputs to the key derivation function that produce the same internal state, and hence the same cryptographic key. For an internal state of 𝔩is bits, the birthday paradox gives a collision after about $2^{\mathfrak{l}_{is}/2}$ evaluations (a 50% chance is reached at roughly $1.18 \cdot 2^{\mathfrak{l}_{is}/2}$).
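The brute force and collision figures in Table 1 follow directly from the internal-state length: $2^{\mathfrak{l}_{is}}$ and about $2^{\mathfrak{l}_{is}/2}$. The sketch below evaluates the standard birthday approximation $1 - e^{-k^2 / 2^{\mathfrak{l}_{is}+1}}$ for k samples and runs a small empirical collision search on SHA-512 truncated to 24 bits, where a collision is expected after roughly five thousand hashes. The truncation length and the function names are illustrative choices.

```python
import hashlib
import itertools
import math

def collision_probability(k: float, bits: int) -> float:
    """Birthday approximation: P(some collision among k random bits-bit values)."""
    return -math.expm1(-k * k / (2.0 * 2.0 ** bits))

def queries_for_half(bits: int) -> float:
    """Samples giving ~50% collision probability: sqrt(2 ln 2) * 2^(bits/2)."""
    return math.sqrt(2 * math.log(2)) * 2.0 ** (bits / 2)

def find_collision(bits: int):
    """Two inputs whose SHA-512 digests agree on the first `bits` bits (bits % 8 == 0)."""
    seen = {}
    for i in itertools.count():
        msg = i.to_bytes(8, "big")
        tag = hashlib.sha512(msg).digest()[: bits // 8]
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg

for name, state_bits in [("Trivium", 80), ("AES", 128), ("HMAC_SHA512", 512)]:
    print(f"{name}: brute force 2^{state_bits}, collision ~2^{state_bits // 2}")
a, b, tries = find_collision(24)
print(f"24-bit truncated SHA-512 collision after {tries} hashes "
      f"(50% estimate: {queries_for_half(24):.0f})")
```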

Table 1 shows the complexity of brute force and collision attacks against the different types of computational extractors. In general, extractors based on keyed-hash message authentication codes (HMAC_SHA1 and HMAC_SHA512) offer higher security against brute force and collision attacks than extractors based on stream ciphers or block ciphers. The extractor with the lowest complexity is Trivium-based, at $2^{80}$ for a brute force attack and $2^{40}$ for a collision attack. Conversely, the extractor with the highest complexity is HMAC_SHA512-based, at $2^{512}$ for a brute force attack and $2^{256}$ for a collision attack. It follows that the proposed HMSKDF, with an HMAC_SHA512 based computational extractor and a Trivium based pseudorandom expander, offers a high level of security against these attacks on its extractor, with complexity $2^{512}$ for brute force and $2^{256}$ for collision.

•   Formal security analysis

HMSKDF is a two-phase key derivation function scheme constructed from an HMAC_SHA512 based computational extractor and a Trivium based pseudorandom expander. As presented in ‘The proposed HMSKDF Scheme’, these satisfy Definition 5 and Definition 6, respectively.

The following theorem establishes the security of HMSKDF in the CAM security model, in which a polynomial-time adversary plays the CAM game and makes at most q queries. HMSKDF is secure in this model if the adversary wins the indistinguishability game with probability not greater than $\frac{1}{2} + \varepsilon$, where ε is negligible. We now give the formal security proof of the HMSKDF scheme.

Theorem 1: Suppose that HMAC_SHA512 functions as an ideal pseudorandom function satisfying Definition 5 and Trivium operates as an ideal pseudorandom keystream generator satisfying Definition 6. If HMSKDF is constructed from an HMAC_SHA512 based computational extractor and a Trivium based pseudorandom expander, then the HMSKDF scheme is $(m, \min\{q_X, q_Y\}, \min\{t_X, t_Y\}, \varepsilon_X + \varepsilon_Y)$-CAM secure with respect to the private string p with entropy m.

Proof: To meet the requirements of Theorem 1, we must show that: (a) the HMAC_SHA512 based computational extractor is an $(m, q_X, t_X, \varepsilon_X)$-computational extractor; and (b) the Trivium based pseudorandom expander is a $(q_Y, t_Y, \varepsilon_Y)$-pseudorandom expander.

Table 1:
Complexity of brute force attack and collision attack against the different types of computational extractors.

| Computational extractor | Brute force | Collision |
|---|---|---|
| AES based extractor | $2^{128}$ | $2^{64}$ |
| Rabbit based extractor | $2^{128}$ | $2^{64}$ |
| Trivium based extractor | $2^{80}$ | $2^{40}$ |
| HMAC_SHA1 based extractor | $2^{160}$ | $2^{80}$ |
| HMAC_SHA512 based extractor | $2^{512}$ | $2^{256}$ |
DOI: 10.7717/peerjcs.2249/table-1

To prove (a), assume that the HMAC_SHA512 based computational extractor is not an $(m, q_X, t_X, \varepsilon_X)$-computational extractor. Then there exists a probabilistic polynomial-time adversary $t_X$ that, given an 𝔩PRK-bit string which is either the PRK generated by the extractor from an input p with min-entropy m or a truly random string, guesses which case holds with probability greater than $\frac{1}{2} + \varepsilon_X$ for a non-negligible $\varepsilon_X$. Such an adversary distinguishes the PRK from a truly random string of the same length in polynomial time, and therefore distinguishes the underlying HMAC-PRF from a random function. This contradicts the assumption that HMAC_SHA512 is an HMAC-PRF satisfying Definition 5. Hence, statement (a) holds.

To prove (b), assume that the Trivium based pseudorandom expander is not a $(q_Y, t_Y, \varepsilon_Y)$-pseudorandom expander. Then there exists a probabilistic polynomial-time adversary $t_Y$ that, given an n-bit string which is either the key K generated by the expander or a random string, guesses which case holds with probability greater than $\frac{1}{2} + \varepsilon_Y$ for a non-negligible $\varepsilon_Y$. Such an adversary distinguishes K from a truly random string of the same length in polynomial time, and therefore distinguishes the output of the underlying Trivium pseudorandom keystream generator from a random sequence. This contradicts the assumption that Trivium is a secure pseudorandom keystream generator satisfying Definition 6. Hence, statement (b) holds.

Since (a) and (b) hold, Theorem 1 follows: the HMSKDF built from the HMAC_SHA512 based computational extractor and the Trivium based pseudorandom expander is $(m, \min\{q_X, q_Y\}, \min\{t_X, t_Y\}, \varepsilon_X + \varepsilon_Y)$-CAM secure with respect to the private string p with entropy m.
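One standard way to see why the two error terms add in Theorem 1 is a two-step hybrid argument, sketched below under that reading (this is not the authors' proof). Here $U_k$ denotes a uniformly random k-bit string and $\mathcal{A}$ any polynomial-time distinguisher. The first difference is bounded by the extractor property, because a distinguisher for it can run the public Expander on its own challenge; the second is bounded by the expander property. The bound is stated for distinguishing advantage; under the usual convention, a winning probability of $\frac{1}{2} + \varepsilon$ in the CAM game corresponds to an advantage of $2\varepsilon$.

```latex
\begin{aligned}
\mathrm{Adv}(\mathcal{A})
  &= \bigl|\Pr[\mathcal{A}(\mathrm{Exp}(\mathrm{Ext}(p,s),c)) = 1] - \Pr[\mathcal{A}(U_n) = 1]\bigr| \\
  &\le \bigl|\Pr[\mathcal{A}(\mathrm{Exp}(\mathrm{Ext}(p,s),c)) = 1]
        - \Pr[\mathcal{A}(\mathrm{Exp}(U_{\mathfrak{l}_{PRK}},c)) = 1]\bigr| \\
  &\quad + \bigl|\Pr[\mathcal{A}(\mathrm{Exp}(U_{\mathfrak{l}_{PRK}},c)) = 1]
        - \Pr[\mathcal{A}(U_n) = 1]\bigr| \\
  &\le \varepsilon_X + \varepsilon_Y
\end{aligned}
```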

Software performance evaluation

In this section, software performance is assessed by measuring the execution time of 25 combinations of computational extractors and pseudorandom expanders (five extractors × five expanders) built from HMAC_SHA1 (Eastlake & Hansen, 2017), HMAC_SHA512 (Eastlake & Hansen, 2017), AES128 (Song et al., 2006), Rabbit (Boesgaard, Vesterager & Zenner, 2008) and Trivium (Robshaw, 2008). The simulation includes the existing key derivation function schemes: the hash-based message authentication code based key derivation function (Krawczyk, 2010), the block cipher based key derivation function (Barker, Chen & Davis, 2020) and the stream cipher based key derivation function (Chuah, Dawson & Simpson, 2013).

Two experiments were conducted to measure the running time required to generate an n-bit K from the parameters p, s and c. The lengths of p, s, c and n follow the Host Identity Protocol (Moskowitz et al., 2015), as shown in Table 2. Each experiment was run 100 times, the execution times were recorded in nanoseconds using the system clock (CLOCK), and the average was computed. All simulations were performed on a machine with an AMD Ryzen 7 5700U with Radeon Graphics at 1.80 GHz and 16.0 GB RAM, running a 64-bit Windows operating system.
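The measurement procedure (100 runs, averaged, timed in nanoseconds) can be sketched as a small harness. This is our illustration in Python, using `time.perf_counter_ns` and an HKDF-SHA512 workload with the Table 2 input lengths; it is not the authors' benchmarking code, the names are hypothetical, and Python timings are not comparable to the figures in Fig. 3.

```python
import hashlib
import hmac
import os
import time

def hkdf_sha512(p: bytes, s: bytes, c: bytes, n: int) -> bytes:
    """HKDF-SHA512 workload (RFC 5869 structure); n is in bytes."""
    prk = hmac.new(s, p, hashlib.sha512).digest()
    out, block, i = b"", b"", 1
    while len(out) < n:
        block = hmac.new(prk, block + c + bytes([i]), hashlib.sha512).digest()
        out += block
        i += 1
    return out[:n]

def mean_time_ns(kdf, p_len: int, s_len: int, c_len: int, n: int, runs: int = 100) -> float:
    """Average time of `runs` KDF calls on fresh random inputs (lengths in bytes)."""
    total = 0
    for _ in range(runs):
        p, s, c = os.urandom(p_len), os.urandom(s_len), os.urandom(c_len)
        start = time.perf_counter_ns()
        kdf(p, s, c, n)
        total += time.perf_counter_ns() - start
    return total / runs

# Lengths from Table 2, in bytes: (p, s, c, n)
EXPERIMENTS = {"Experiment 1": (128, 8, 32, 64), "Experiment 2": (256, 8, 32, 192)}
for name, params in EXPERIMENTS.items():
    print(f"{name}: {mean_time_ns(hkdf_sha512, *params):,.0f} ns")
```

Input generation is kept outside the timed region so that only the KDF call is measured.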

Figure 3 displays the simulation results; experiment 2 uses longer inputs than experiment 1. For all key derivation function schemes, the execution time increases from experiment 1 to experiment 2.

The Trivium based key derivation function is fast when the input is short, taking just 13,815 nanoseconds in experiment 1, but it slows down as the input length 𝔩p increases in experiment 2. In experiment 2, the key derivation function using HMAC_SHA1 as the computational extractor and Trivium as the pseudorandom expander is faster, at 19,247 nanoseconds. The AES based key derivation function is the slowest of the schemes. The combination of HMAC_SHA512 as the computational extractor and Trivium as the pseudorandom expander takes an average of 36,014 nanoseconds in experiment 1 and 41,618 nanoseconds in experiment 2.

Hardware performance

Table 3 shows the hardware performance of the hash functions (SHA1 and SHA512), the stream ciphers (Trivium and Rabbit) and the block cipher (AES). AES has the lowest throughput, while Trivium requires the fewest gates and has the highest throughput; SHA512 has the second-highest throughput at 2909 Mb/s. Overall, these results suggest that a hardware implementation of HMSKDF using HMAC_SHA512 and Trivium offers moderate throughput and efficiency compared with other key derivation function schemes.

Table 2:
Experiment parameters (lengths of p, s, c and n).

| | p | s | c | n |
|---|---|---|---|---|
| Experiment 1 | 128 bytes | 8 bytes | 32 bytes | 64 bytes |
| Experiment 2 | 256 bytes | 8 bytes | 32 bytes | 192 bytes |
DOI: 10.7717/peerjcs.2249/table-2

Figure 3: Software performance (Time).

Table 3:
Complexity of ideal computational extractor based on stream ciphers, HMAC and block ciphers.

| | SHA1 (Satoh & Inoue, 2007) | SHA512 (Satoh & Inoue, 2007) | Trivium (Good & Benaissa, 2007) | Rabbit (Boesgaard, Vesterager & Zenner, 2008) | AES (Satoh et al., 2001) |
|---|---|---|---|---|---|
| Gates | 9859 | 27297 | 4921 | 28000 | 5398 |
| Technology (µm) | 0.13 | 0.13 | 0.13 | 0.18 | 0.11 |
| Throughput (Mb/s) | 2006 | 2909 | 22300 | 473.6 | 311.09 |
DOI: 10.7717/peerjcs.2249/table-3

Conclusion

A key derivation function is an essential component of cryptographic systems, used to create cryptographic keys from non-uniformly random strings. This research examines different combinations of computational extractors and pseudorandom expanders, with emphasis on execution time, hardware performance and security. The resulting cryptographic keys must be computationally indistinguishable from random binary strings of the same length, since they protect data during storage and transmission over insecure channels. This study therefore covers the cryptographic primitives HMAC_SHA512, HMAC_SHA1, AES128, Trivium and Rabbit. The findings indicate that combining an HMAC_SHA512 based computational extractor with a Trivium based pseudorandom expander is well suited to building a secure key derivation function scheme, denoted HMSKDF. HMSKDF has the highest security among the compared schemes, with a brute force complexity of $2^{512}$ and a collision attack complexity of $2^{256}$, together with average efficiency in terms of execution time and throughput. Overall, this article shows that HMSKDF is $(m, q, t, \varepsilon)$-CAM secure and efficient, making it an alternative key derivation function that can be implemented in existing applications.
