Secure usage of the Cryptographic Library


This page gives some hints and recommendations on how to use the Cryptographic Library in a more secure manner.

1. Cache-Timing Attacks resistance

Timing Attacks

Timing attacks are attacks that exploit differences in computational timing in order to retrieve information on the private data used during the execution of a cryptographic algorithm. Vulnerability to this class of attacks is mainly due to:

  • Conditional branches in the code, involving secret data
  • Usage of low-level machine instructions whose execution time depends on the input data
  • Compiler optimizations

Regular code execution

The solutions to avoid timing attacks are the following:

  • Develop regular code, running for the same amount of time independently of the input secret data
  • Break the correlation between the secret input data and the computational timing by randomizing the secret information, so that an attacker is not able to recover the secret from the retrieved randomized data
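
As an illustration of the first point (a sketch, not code taken from the library), a secret-dependent branch such as `if (secret) x = a; else x = b;` can be replaced by a bitmask selection that executes the same instructions regardless of the secret:

```c
#include <stdint.h>

/* Branchless select: returns a if cond == 1, b if cond == 0,
   without any conditional jump depending on 'cond'. */
static uint32_t ct_select_u32(uint32_t cond, uint32_t a, uint32_t b)
{
    /* 0 -> all-zero mask, 1 -> all-one mask */
    uint32_t mask = (uint32_t)0 - cond;
    return (a & mask) | (b & ~mask);
}

/* Constant-time equality test: returns 1 if x == y, else 0,
   evaluated without data-dependent branches. */
static uint32_t ct_eq_u32(uint32_t x, uint32_t y)
{
    uint32_t d = x ^ y;                        /* 0 iff equal          */
    return (uint32_t)1 ^ ((d | (0u - d)) >> 31); /* MSB set iff d != 0 */
}
```

Note that the compiler may still rewrite such code; inspecting the generated assembly (or disabling aggressive optimizations for these routines) remains necessary, as pointed out in the list of vulnerability causes above.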

Cache-Timing Attacks (CTA)

Data leakages could occur even in a regular implementation, because of the presence of caches between the CPU and the memory.

Indeed, caches are fast memories used to store frequently used data: data loaded from the memory are stored in the cache before being passed to the CPU, and kept in the cache in order to be quickly reloaded the next time they are requested.

Therefore, if the data is in the cache, the load is faster, while if it is not, the load takes longer because the information must be retrieved from the main memory.

CTA exploit this property to steal secret data even from regular algorithm implementations, where secret data is used as an index to access a memory location (e.g. a table).

Countermeasures against CTA

To avoid CTA, a real constant-time implementation must be developed.

In addition to regular code, a correct management of the cache flow must also be taken into consideration.

Full parsing of the tables

A solution against these attacks is not to access the table directly using the secret data, but instead to perform a loop that loads every single memory location of the table and conditionally keeps only the desired data.

The conditional test that checks whether the current iteration is the desired one must be performed with regular code, to avoid branches on secret data.

In this way, the attacker is not able to determine the secret index, because every possible index is used and every memory location is loaded, resulting in a constant computational time.

The side effect of this countermeasure is an increase of the computational time, due to the need to load every element of the table instead of only the needed value.
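
A minimal sketch of this countermeasure (illustrative code, not the library's implementation) could look as follows: the loop reads every entry of the table and keeps only the one at the secret index, using a branch-free mask so no conditional jump depends on the secret:

```c
#include <stdint.h>
#include <stddef.h>

/* Constant-time table lookup: loads every element of the table and keeps
   only the one at 'secret_index', so the memory access pattern does not
   depend on the secret. Illustrative sketch, not library code. */
static uint8_t ct_table_lookup(const uint8_t *table, size_t table_len,
                               size_t secret_index)
{
    uint8_t result = 0;
    for (size_t i = 0; i < table_len; i++)
    {
        /* is_zero = 1 when i == secret_index, 0 otherwise,
           computed without a data-dependent branch */
        size_t d = i ^ secret_index;
        size_t is_zero = (size_t)1
                         ^ ((d | ((size_t)0 - d)) >> (sizeof(size_t) * 8 - 1));
        /* mask is 0xFF for the wanted entry, 0x00 for all the others */
        uint8_t mask = (uint8_t)((size_t)0 - is_zero);
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```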

Use non-cacheable memories

Another solution to address the cache vulnerability is to avoid the usage of caches altogether: often only the flash is cached, but some microcontrollers also include memories faster than flash/SRAM that are not served by a cache.

If those memories are big enough to hold the implementation tables, the advantage of using them is twofold:

  • no vulnerability to CTA;
  • faster memory loading, resulting in faster computational timing.

How to enable CTA resistance in the Cryptographic Library

The Cryptographic Library addresses CTA with constant-time code and using non-cacheable memories.

The library already has linker labels associated with the CTA-sensitive internal tables (usually residing in flash): the user only needs to map the objects labeled “CMOX_CTA_PROTECTED_DATA” onto the desired non-cacheable memory.

Let’s see some examples of how to configure the linker script:

IAR Embedded Workbench: (Linker configuration file, .icf)

initialize by copy { readwrite, section CMOX_CTA_PROTECTED_DATA};
...
define region SAFEMEM_region  = mem:[from 0x20030000   to 0x20034000];
...
place in SAFEMEM_region { readwrite section CMOX_CTA_PROTECTED_DATA };

Arm® KEIL µVision® IDE (Scatter file, .sct):

LR_IROM1 0x08000000 0x00200000 {   ; load region size_region
  ...
  SAFEMEM 0x20030000 0x00004000 {
    *(CMOX_CTA_PROTECTED_DATA)
  } 
  ...
}
Note: Arm® is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.

STM32CubeIDE (Linker script, .ld)

SECTIONS
{
  ...
  user_SAFEMEM 0x20030000:
  {
    . = ALIGN(8);
    *(CMOX_CTA_PROTECTED_DATA)
    . = ALIGN(8);
  }
  ...
}

In the linker file, add the lines shown above, checking your microcontroller's reference manual for the correct non-cacheable memory start address and size. Note that the label SAFEMEM is a simple string chosen by the user; it is not a specific value to be selected from a set of predefined values.

Depending on the microcontroller used, there are two possible scenarios:

  • only the flash is affected by a cache (e.g. ART cache): in this case, no further operation is required to secure the implementation against CTA;
  • the SRAM is also affected by a cache (e.g. the L1 cache on STM32F7/H7): in this case, only for the ECC/RSA modules, where a temporary buffer is required for the internal computations, the user also needs to declare the temporary buffer in the following way:
  CMOX_CTA_RESISTANT uint8_t membuf[3000];

instead of the standard way:

  uint8_t membuf[3000];

To be able to use that macro, it is necessary to include the cmox_cta.h file, located in the root of the include folder.

2. Robust comparison (against simple faults)

Fault Attacks

Faults are natural events for electronic devices, and can be caused by several factors, such as excessively high or low temperature, an unsupported supply voltage or current, strong electric or magnetic fields, and so on.

Fault attacks consist of intentionally causing faults in the device in order to influence the operation of the processor, with the purpose of retrieving secret information involved in the computation.

Simple Faults

The simple-fault scenario considers only attacks composed of a single fault, with limited possibilities:

  • skip one or more consecutive instructions;
  • set a register (or a memory location) value to 0 or 0xFF…FFF;
  • flip every (or some) bits in a register (or a memory location).

What is robust comparison

Robust comparison is a strong way to check if two buffers are equal or not, returning the appropriate comparison result (success or failure).

Indeed, no matter how securely a signature verification is implemented, an attacker could simply fault the final buffer comparison (or the final comparison result) in order to change the verification response to success.

How to enable robust comparison in the Cryptographic Library

To avoid this, a robust comparison has been implemented, and every verification function in the library has a specific parameter (called P_pFaultCheck) to manage it.

The parameter is an optional pointer, so it can be left NULL if robustness against simple faults is not needed in the current scenario.

If, instead, the parameter is provided to the verification function, the routine robustly checks against simple faults, not only returning the standard return value but also filling the pointed variable with a second return value. This second return value has to be checked against the main return value:

  1. if the main return value is FAIL, the function correctly failed the verification, so there is no need to check the second return value;
  2. if the main return value is SUCCESS, either the function correctly verified the inputs or it did not detect a fault in the verification process: for this reason, it is necessary to check the second return value (the one inside the P_pFaultCheck parameter) against the first return value:
    1. if it also has the SUCCESS value, the verification was really successful;
    2. otherwise, if it is different from the SUCCESS value, a fault was detected and the verification failed. Please note that, in case of a fault, the second return value could be a dirty value; therefore, the user must consider every value different from SUCCESS as a failure.

This robust comparison alone, however, is not sufficient to protect against simple faults.

Take, as example, the code below:

…
retval = cmox_rsa_pkcs1v15_verify(&rsa_ctx,
                                  pub_key,
                                  digest,
                                  CMOX_RSA_PKCS1V15_HASH_SHA256,
                                  signature,
                                  sizeof(signature),
                                  &fault_check);
if ((retval == CMOX_RSA_SUCCESS) && (fault_check == CMOX_RSA_SUCCESS))
{
  return OK;
}
else
{
  return FAIL;
}

Indeed, an attacker could simply fault the conditional branch after the verification, in order not to take the failure branch, but instead take the success branch.

For this reason it is important to securely check the two results of the robust verification process, for example as in the code below:

…
retval = cmox_rsa_pkcs1v15_verify(&rsa_ctx,
                                  pub_key,
                                  digest,
                                  CMOX_RSA_PKCS1V15_HASH_SHA256,
                                  signature,
                                  sizeof(signature),
                                  &fault_check);
if (retval == CMOX_RSA_SUCCESS)
{
  /* TODO: perform operations necessary to the final result of the developing functions */
  …
  if (fault_check == CMOX_RSA_SUCCESS)
  {
    return OK;
  }
  else
  {
     /* TODO: rollback previous operations computed after the first successful branch */
    …
    return FAIL;
  }
}
else
{
  return FAIL;
}

In this way, an attacker could try to fault the program in order to jump directly to the OK return, but this would skip the execution of the necessary operations (marked by the comment), and therefore the computation would end in some error or segmentation fault, thwarting the attack.

With this new code, the attacker is forced to fault the computation twice, at the exact positions corresponding to the two branches, but double faults are out of the scope of the simple-fault protections.

3. RSA countermeasures for Bellcore attack

Bellcore Attack

The Bellcore attack is a fault attack that aims to recover the RSA private exponent by faulting one or more RSA computations (more specifically, the internal modular exponentiation), depending on the algorithm used.

To recover the private exponent of an RSA using CRT (Chinese Remainder Theorem) the attacker needs to fault just one operation, while for a standard RSA implementation, the attacker needs more faulted operations, all performed with certain criteria.

Therefore, in the Cryptographic Library it has been decided to protect the CRT implementation only.

Private RSA operations are used both for decryption and for signature generation. However, since the Bellcore attack needs the output of the faulted operation, and since a faulted decryption results in a wrong padding, causing the output to be filled with zeros, the RSA decryption processes do not need to be protected: the attacker is not able to obtain the faulted output.

Moreover, because of the algorithm itself, RSA PKCS#1 v2.2 is not vulnerable to the Bellcore attack; therefore, only the RSA PKCS#1 v1.5 signature generation routine needs to be protected.

Countermeasure

For signature operations the countermeasure is simple: just verify that the signature is valid before outputting it.

Indeed, the signature verification (through the public key) is a very cheap operation compared to the signature generation process, resulting only in a slight increase of the computational time.
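
The verify-before-output idea can be sketched with toy numbers (illustrative only: p = 11, q = 13, e = 7 offer no security; real keys are thousands of bits): sign with CRT, then check the result with the public exponent before releasing it, so that a simulated fault in one CRT branch is caught:

```c
#include <stdint.h>

/* Toy RSA-CRT parameters, for illustration only:
   p = 11, q = 13, n = 143, e = 7,
   dp = d mod (p-1) = 3, dq = d mod (q-1) = 7, qinv = q^-1 mod p = 6. */

/* Square-and-multiply modular exponentiation on small integers. */
static uint32_t modexp(uint32_t base, uint32_t exp, uint32_t mod)
{
    uint32_t r = 1;
    base %= mod;
    while (exp > 0)
    {
        if (exp & 1) r = (r * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return r;
}

/* CRT signature with the verify-before-output countermeasure.
   'fault' flips a bit in the p branch to simulate a Bellcore fault.
   Returns the signature, or 0 if the internal check detects a fault. */
static uint32_t sign_crt_checked(uint32_t m, int fault)
{
    const uint32_t p = 11, q = 13, n = 143, e = 7;
    const uint32_t dp = 3, dq = 7, qinv = 6;

    uint32_t sp = modexp(m, dp, p);
    uint32_t sq = modexp(m, dq, q);
    if (fault) sp ^= 1;                 /* simulated fault in the p branch */

    /* Garner recombination: s = sq + q * (qinv * (sp - sq) mod p) */
    uint32_t h = (qinv * ((sp + p - (sq % p)) % p)) % p;
    uint32_t s = sq + q * h;

    /* Countermeasure: cheap public-key verification before output */
    if (modexp(s, e, n) != m % n)
        return 0;                       /* fault detected: withhold output */
    return s;
}
```

With an unfaulted run the check passes and the signature is released; with the simulated fault, the faulted signature never leaves the function, which is exactly what denies the attacker the material needed for the Bellcore attack.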

For decryption, there is no high-level countermeasure, as the re-encryption of the decrypted message is non-deterministic (new random material is used) and the values would differ. But, as previously written, since a faulted decryption would probably not have a valid padding, the faulted decrypted message is not available to the attacker, thus preventing the attack.

How to enable Bellcore Attack countermeasure in the Cryptographic Library

To enable the countermeasure, instead of calling the standard way to set the CRT key:

cmox_rsa_retval_t cmox_rsa_setKeyCRT(cmox_rsa_key_t *P_pPrivKey,
                                     size_t         P_ModulusBitLen,
                                     const uint8_t  *P_pExpP,
                                     size_t         P_ExpPLen,
                                     const uint8_t  *P_pExpQ,
                                     size_t         P_ExpQLen,
                                     const uint8_t  *P_pP,
                                     size_t         P_PLen,
                                     const uint8_t  *P_pQ,
                                     size_t         P_QLen,
                                     const uint8_t  *P_pIq,
                                     size_t         P_IqLen);

the user needs to call the following function:

cmox_rsa_retval_t cmox_rsa_setKeyCRTwithFACM(cmox_rsa_key_t *P_pPrivKey,
                                             size_t         P_ModulusBitLen,
                                             const uint8_t  *P_pExpP,
                                             size_t         P_ExpPLen,
                                             const uint8_t  *P_pExpQ,
                                             size_t         P_ExpQLen,
                                             const uint8_t  *P_pP,
                                             size_t         P_PLen,
                                             const uint8_t  *P_pQ,
                                             size_t         P_QLen,
                                             const uint8_t  *P_pIq,
                                             size_t         P_IqLen,
                                             const uint8_t  *P_pPubKey,
                                             size_t         P_PubKeyLen);

(FACM stands for Fault Attack CounterMeasure), which requires the user to also provide the public key in addition to the CRT private fields.

This call enables the countermeasure during the subsequent RSA computations; the corresponding function calls remain unchanged.

4. Secure usage of random material

Why random material is so important

Statements about the strength of the random number generator used by a security system are rarely found: power consumption and bit-generation speed are often considered more important than the randomness of the generated bits.

This is strange, considering that in most cryptographic systems, the quality of the random numbers used directly determines the security strength of the system.

Kerckhoffs' principle states that the security of the system must depend solely on the key material, and not on the design of the system. This means that all modern security algorithms and protocols have their cryptographic strength expressed in the number of (key) bits that an attacker needs to guess before he/she can break the system (the actual definition of cryptographic strength is a measure of the complexity of a computational attack against the algorithm, often expressed as the log2 of the number of times the attacker must invoke the algorithm before learning the value of the key used).

However, even if the key material for a cryptographic algorithm is not directly exposed, its generation from a bad source of randomness can compromise the overall security of the system. If there exists some sort of correlation among the “random” generated bits, an attacker can use statistical methods for deriving part or the whole key.

Some organizations, such as NIST, have defined statistical tests to be run on the output of a random number generator. Those tests should provide answers to some important problems, like determining how much unpredictability the random source generates, or how this unpredictability can be translated into random bits. The challenge is, however, that statistical tests can’t really prove that a source is random: at best, these tests can determine that a source is not, or only partially, unpredictable.

Indeed, it is entirely possible to create a module that produces a bit string passing all the available statistical tests, but still generates a completely predictable bit stream once the internal state of the module is known.

For this reason, standardization and certification organizations tend to require that a random number generator design is not tested by passing its output through a statistical test; rather, these organizations require that the design of the generator is reviewed, and its randomness generation properties explained and proven.

TRNG vs DRBG

A True Random Number Generator (TRNG) is a device able to generate random numbers from a physical process, and not by means of an algorithm. Those devices are based on microscopic phenomena generating low-level, statistically random signals, such as thermal noise, photoelectric effect and other quantum phenomena. These stochastic processes are, in theory, completely unpredictable, and the theory's assertions of unpredictability are subject to experimental test.

A Deterministic Random Bit Generator (DRBG), or Pseudorandom Number Generator (PRNG), is an algorithm able to generate a sequence of numbers whose properties approximate those of sequences of true random numbers. The DRBG-generated sequence is not truly random, because it is completely determined by an initial value, called the DRBG's seed or entropy seed.

Although TRNGs are more secure than DRBGs, the latter are very important for several reasons:

  • As long as a DRBG’s internal state is unknown to an attacker, the output it generates is unpredictable by the attacker.
  • A well-constructed DRBG only leaks a limited amount of data about its internal state through the output it produces. Therefore, as long as the DRBG’s internal state is refreshed by new unpredictable data (i.e. the DRBG is reseeded) often enough, its output remains unpredictable.
  • Once a DRBG is properly seeded, it can generate unpredictable output much faster than a TRNG.

How to get entropy from STM32

TRNG peripheral

Most STM32 microcontrollers include a TRNG peripheral, and the HAL/LL drivers provide an API to generate true random numbers, 32 bits at a time. Refer to the driver documentation.

Entropy without a TRNG

When dealing with security in microcontrollers, it is always preferable to choose a device that includes a TRNG peripheral. However, not all the available STM32 MCUs provide such a peripheral; in this case, it is possible to exploit some sources of randomness from other components of the microcontroller.

As an example, all the STM32 MCUs provide two internal oscillators that can be used as clock sources: the HSI (High Speed Internal) and the LSI (Low Speed Internal) oscillators. These two clock sources are not as accurate as crystal oscillators, in particular at startup time. Furthermore, the two sources are independent from each other and have different frequencies (e.g. in the STM32F401 the HSI frequency is ~16 MHz, while the LSI one is ~32 kHz, varying from 17 to 47 kHz). This can be exploited to generate some sort of randomness that can be used to seed a DRBG.

[Figure: random number generation without a TRNG — a timer clocked by the HSI measures LSI periods]

In the example shown above, a timer peripheral is configured with the HSI as clock source and is used to measure the LSI periods. In this way it is possible to count how many HSI periods fit into one LSI period, and since both clocks exhibit some uncertainty, the least significant bits of the measurements differ from one another. In order to obtain better results, follow this advice:

  • Restart the LSI oscillator at each measurement. After the startup phase, the oscillator becomes stable at a certain frequency; after a restart, the new stable frequency changes.
  • Increase the timer clock frequency by using the PLL together with the HSI oscillator: in this way the randomness measured from the LSI is more precise.
  • It is possible to increase the timer prescaler in order to generate better random bits, at the cost of more required time. In the picture above, the prescaler has been set to 8, meaning that each measurement takes 8 LSI periods.
  • The best result is achieved if only the least significant bit (LSB) is extracted from each measurement. This means that, in order to generate N random bits, N measurements are needed.
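
Only the bit-collection part can be shown in portable code, since the measurement itself is hardware-specific. The sketch below assumes a user-supplied `measure()` function (a hypothetical name) that returns one timer capture of an LSI period and restarts the LSI beforehand; the routine packs the LSB of each measurement into output bytes, so N measurements yield N bits:

```c
#include <stdint.h>
#include <stddef.h>

/* Collect entropy by keeping only the least significant bit of each jitter
   measurement. 'measure' is a user-supplied function (hypothetical here)
   returning one timer capture of an LSI period counted in HSI/PLL cycles. */
static void collect_entropy(uint8_t *out, size_t out_len,
                            uint32_t (*measure)(void))
{
    for (size_t i = 0; i < out_len; i++)
    {
        uint8_t byte = 0;
        for (int bit = 0; bit < 8; bit++)
        {
            /* one measurement yields exactly one bit: its LSB */
            byte = (uint8_t)((byte << 1) | (measure() & 1u));
        }
        out[i] = byte;
    }
}

/* Deterministic stub used only to demonstrate the packing; a real
   implementation would read the timer capture register instead. */
static uint32_t demo_count = 0;
static uint32_t demo_measure(void)
{
    return demo_count++;
}
```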
Note: The proposed solution is not theoretically sound and its usage is intended only in case the TRNG is not available in the MCU. Furthermore, the usage of this solution is discouraged for critical security or cryptographic purposes.

How to securely initialize the DRBG

Deterministic Random Bit Generators (DRBGs) are based on cryptographic operations; as an example, the ctrDRBG algorithm is based on the AES block cipher.

Before the DRBG is used to generate random material, it must be initialized with a seed. This operation determines the initial state of the DRBG.

It is also possible to periodically reseed the DRBG, in order to keep the random material generation strong over time.

The initial seed is composed of three fields:

  • An entropy input, which must be generated by a reliable source of randomness (e.g. a TRNG) and must be kept secret, otherwise the security of the DRBG is compromised. The security of the DRBG highly depends on the security strength of the randomness source used for the entropy generation.
  • A nonce, whose value is expected not to repeat across instantiations. It does not need to be secret.
  • A personalization string, optional but recommended, used to integrate some additional information into the instantiation of the DRBG. It does not need to be secret.

Some DRBG implementations provide a derivation function that is used internally to derive a more secure seed from the given seeding material. This means that, with these implementations, it is possible to provide an entropy input that is not at full entropy, as long as it is larger than the requested security strength, without compromising the overall security of the DRBG.

The seed used for reseeding is based on:

  • The current internal state value of the DRBG.
  • A new entropy input.
  • An optional additional input.

The Cryptographic Library provides an implementation of the ctrDRBG algorithm with derivation function, specified (together with other DRBG algorithms) in the NIST Special Publication 800-90A.

How to manage random material for ECC/RSA

The Cryptographic Library ECC/RSA APIs delegate to the user the responsibility of generating random bytes and providing them to the function call.

Indeed, the random data can be generated not only with the library DRBG, but also with a TRNG or any other source preferred by the user.

Therefore, the Cryptographic Library functions ask the user for random bytes, and return a specific error if the provided data does not fulfill the requirements.

A good way to manage random material with the Cryptographic Library is to implement a loop that generates random data and invokes the cryptographic function, iterating until the cryptographic function no longer returns a random-related error.

Hereafter is an example with RSA:

cmox_rsa_retval_t rv; /* CLv4 return value */
uint8_t random_data[256]; /* array that will be filled with random bytes */
/* generate random until it's valid for RSA encryption function */
do
{
  /* ToDo: fill 'random_data' with bytes generated by a specific source (DRBG, TRNG, etc.) */
  ...
  
  /* call the RSA encryption function */
  rv = cmox_rsa_pkcs1v15_encrypt(..., random_data, sizeof(random_data), ...);
}
while (rv == CMOX_RSA_ERR_WRONG_RANDOM);

Necessary random material for ECC API

Some ECC functionalities need random material in order to operate.

In particular, the following services need a random byte vector as input, and this byte array needs to be at least as long as the byte length of the chosen curve's N parameter (e.g. 32 bytes for SECP256R1, 66 bytes for SECP521R1):

  • ECDSA key generation
  • ECDSA signature generation
  • SM2 key generation
  • SM2 signature generation

Even if the random array has the correct length, a failure with return value CMOX_ECC_ERR_WRONG_RANDOM can still occur. Indeed, a security check is performed inside the ECC functions, ensuring not only that the byte length is correct but also that the value is compatible with the chosen curve's N parameter.

For NIST curves, whose N parameter starts with 0xFFFFFFFF…, this scenario is highly improbable, while for other curves (e.g. brainpoolP384r1) the probability can almost reach 50%.

The solution is to generate a new random byte array and call the cryptographic API again.
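
The check performed inside the library is not exposed, but its logic can be sketched as follows (illustrative code, not the library's implementation): a candidate byte array of the correct length is acceptable only if, read as a big-endian integer, it is non-zero and strictly smaller than the curve N parameter. A retry loop would regenerate the array whenever this function returns 0:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* A candidate random scalar 'k' is acceptable when 0 < k < n.
   Both 'k' and the curve order 'n' are big-endian, 'len' bytes long.
   For equal-length big-endian buffers, memcmp is a numeric comparison. */
static int scalar_is_valid(const uint8_t *k, const uint8_t *n, size_t len)
{
    /* k must be strictly smaller than n */
    if (memcmp(k, n, len) >= 0)
        return 0;
    /* and non-zero */
    for (size_t i = 0; i < len; i++)
        if (k[i] != 0)
            return 1;
    return 0;
}
```

This also explains the probabilities quoted above: when the leading bytes of N are 0xFF, a uniformly random array almost always compares smaller, while for curves whose N starts with smaller bytes the rejection rate grows toward 50%.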

Necessary random material for RSA API

Some RSA functionalities need random material in order to operate.

In particular, the following services need a random byte vector as input:

RSA PKCS#1 v1.5 Encryption
The array needs to contain at least (ModulusLength - InputLength - 3) non-NULL bytes (i.e. different from 0x00).
Indeed, only that number of bytes is used if no NULL bytes are present in the sequence; otherwise, more bytes are necessary to replace the NULL bytes.
RSA PKCS#1 v2.2 Signature Generation
The array length needs to be ≤ ModulusLength - HashLength - 2, where HashLength is the byte length of the digest generated by the chosen hash function.
All the given bytes will be processed and used inside an internal hashing process.
RSA PKCS#1 v2.2 Encryption
The array length needs to be ≥ HashLength, where HashLength is the byte length of the digest generated by the chosen hash function.
Indeed, only HashLength bytes will be used, while the others will be discarded.
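
One way to satisfy the PKCS#1 v1.5 requirement above with the minimum buffer length is to pre-filter zero bytes out of the random source, so every byte handed to the library is usable as padding. The sketch below is illustrative; `demo_rng` is a deterministic stand-in for a real DRBG/TRNG byte source:

```c
#include <stdint.h>
#include <stddef.h>

/* Fill 'out' with 'len' non-zero bytes drawn from 'rng' (a user-supplied
   byte source): zero bytes are discarded and redrawn, since PKCS#1 v1.5
   encryption padding must not contain 0x00 bytes. */
static void fill_nonzero(uint8_t *out, size_t len, uint8_t (*rng)(void))
{
    for (size_t i = 0; i < len; i++)
    {
        uint8_t b;
        do {
            b = rng();
        } while (b == 0u);   /* reject 0x00 and draw again */
        out[i] = b;
    }
}

/* Deterministic stub for demonstration only; a real source would be a
   DRBG or TRNG. Yields 0, 1, 2, 3, ... */
static uint8_t demo_state = 0;
static uint8_t demo_rng(void)
{
    return demo_state++;
}
```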

5. References

[TIMING-KOC96] P. Kocher, “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, CRYPTO 1996

[FAULT-BDL97] D. Boneh, R. DeMillo, R. Lipton, “On the importance of checking cryptographic protocols for faults”, CRYPTO 1997

[BELLCORE-BDL01] D. Boneh, R. DeMillo, R. Lipton, “On the Importance of Eliminating Errors in Cryptographic Computations”, Journal of Cryptology 14(2), 2001

[RND-RFC4086] D. Eastlake 3rd, J. Schiller, S. Crocker, “Randomness Requirements for Security”, RFC 4086, 2005

[RND-NIST.SP.800-90B] “Recommendation for the Entropy Sources Used for Random Bit Generation”, NIST Special Publication 800-90B, 2018