Deep Learning with GPAM
In this tutorial we use the new GPAM model introduced in "Generalized Power Attacks against Crypto Hardware using Long-Range Deep Learning" by Bursztein, Elie; Invernizzi, Luca; Král, Karel; Moghimi, Daniel; Picod, Jean-Michel; and Zhang, Marina. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2024.
The script docs/tutorials/sca/tiny_aes.py contains the training code.
The GPAM model has been introduced as a part of the SCAAML project (GPAM). In this tutorial we (the authors and you, the reader) work together to modernize the original A Hacker Guide To Deep Learning Based Side Channel Attacks tutorial.
Creating the Model
We first need to define the hyperparameters, using the same values as listed in the GPAM paper.
batch_size: int = 64  # hyperparameter
steps_per_epoch: int = 800  # hyperparameter
epochs: int = 750  # hyperparameter
target_lr: float = 0.0005  # hyperparameter
merge_filter_1: int = 0  # hyperparameter
merge_filter_2: int = 0  # hyperparameter
trace_len: int = 80_000  # hyperparameter
patch_size: int = 200  # hyperparameter
val_steps: int = 16
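GPAM cuts each trace into patches of length patch_size, so patch_size must divide trace_len (see also the exercises below). A quick sanity check, our suggestion rather than part of the original script, catches misconfiguration early:

# Suggested sanity check: patch_size must divide trace_len.
assert trace_len % patch_size == 0, "patch_size must divide trace_len"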
Then we can load the dataset and take note of the minimum and maximum trace values so that we can normalize each trace to the interval [-1, 1]. This helps deep learning since most of the time we initialize the weights under the assumption that the inputs are normally distributed with mean zero and variance 1.
# Load the dataset
from sedpack.io import Dataset

dataset = Dataset(dataset_path)

# Create the definition of inputs and outputs.
assert dataset.dataset_structure.saved_data_description[0].name == "trace1"
trace_min = dataset.dataset_structure.saved_data_description[0].custom_metadata["min"]
trace_max = dataset.dataset_structure.saved_data_description[0].custom_metadata["max"]
inputs = {"trace1": {"min": trace_min, "delta": trace_max - trace_min}}
outputs = {"sub_bytes_in_0": {"max_val": 256}}
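For intuition, the min/delta pair describes how a raw trace can be rescaled to [-1, 1]. The following standalone NumPy sketch (our illustration, not part of tiny_aes.py, using a random placeholder trace) shows the affine map:

import numpy as np

# Illustration only: rescale a trace from [trace_min, trace_max] to [-1, 1].
trace = np.random.uniform(trace_min, trace_max, size=trace_len)  # placeholder trace
normalized = 2.0 * (trace - trace_min) / (trace_max - trace_min) - 1.0
assert -1.0 <= normalized.min() and normalized.max() <= 1.0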
Then we can just import and create the model.
import keras

from scaaml.metrics import MeanRank  # assumption: import path of SCAAML's mean-rank metric
from scaaml.models import get_gpam_model

model = get_gpam_model(
    inputs=inputs,
    outputs=outputs,
    output_relations=[],
    trace_len=trace_len,
    merge_filter_1=merge_filter_1,
    merge_filter_2=merge_filter_2,
    patch_size=patch_size,
)

# Compile model
model.compile(
    optimizer=keras.optimizers.Adafactor(target_lr),
    loss=["categorical_crossentropy" for _ in range(len(outputs))],
    metrics={name: ["acc", MeanRank()] for name in outputs},
)
model.summary()
We need a one-hot representation of the target classes. Thus we add the following code, which transforms records into suitable input-output pairs for Keras training:
from typing import Any

def process_record(record: dict[str, Any]) -> tuple[Any, dict[str, Any]]:
    """Processing of a single record.

    The input is a dictionary of string and tensor, the output of this
    function is a tuple of the neural network's input (trace) and a
    dictionary of one-hot encoded expected outputs.
    """
    # The first neural network was using just the first half of the trace:
    inputs = record["trace1"]
    outputs = {
        "sub_bytes_in_0": keras.ops.one_hot(
            record["sub_bytes_in"][0],
            num_classes=256,
        ),
    }
    return (inputs, outputs)
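To make the one-hot encoding concrete, here is a tiny standalone example (illustration only, not part of the training script):

import keras

# A byte value of 3 becomes a vector of 256 zeros with a single 1.0 at index 3.
encoded = keras.ops.one_hot(3, num_classes=256)
print(keras.ops.shape(encoded))   # (256,)
print(keras.ops.argmax(encoded))  # 3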
And we train using:
train_ds = dataset.as_tfdataset(
    split="train",
    process_record=process_record,
    batch_size=batch_size,
)
validation_ds = dataset.as_tfdataset(
    split="test",
    process_record=process_record,
    batch_size=batch_size,
)

# Train the model.
_ = model.fit(
    train_ds,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=validation_ds,
    validation_steps=val_steps,
)
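Once training finishes, we can check the validation metrics. This short sketch (our addition, reusing the objects defined above) reports the loss, accuracy, and mean rank on the validation split:

# Evaluate on the validation split.
results = model.evaluate(validation_ds, steps=val_steps, return_dict=True)
print(results)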
Exercises for the Reader
Hypertuning
The values have not been hypertuned, yet we still achieve around 60% accuracy when predicting the byte value, and we already see some leakage after relatively few epochs (10 to 100). Here are some ideas of what could be tuned:
- The learning rate is rather small. What happens when we increase target_lr?
- The patch_size was chosen to be roughly the square root of the trace length. What do we get for different values?
- We know that patch_size must divide trace_len. What happens when during process_record we cut just a part of the trace so that it is a multiple of patch_size?
- We do not use merge_filter_1 and merge_filter_2. What is the influence of those?
You can leverage KerasTuner to find the right hyperparameters.
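A minimal KerasTuner sketch could look like the following (our suggestion; the search space, trial count, and epoch budget are illustrative, not values from the paper):

import keras_tuner

def build_model(hp):
    # Illustrative search space; adjust the ranges as needed.
    lr = hp.Choice("target_lr", [1e-4, 5e-4, 1e-3])
    patch = hp.Choice("patch_size", [100, 200, 400])  # all divide trace_len
    model = get_gpam_model(
        inputs=inputs,
        outputs=outputs,
        output_relations=[],
        trace_len=trace_len,
        merge_filter_1=merge_filter_1,
        merge_filter_2=merge_filter_2,
        patch_size=patch,
    )
    model.compile(
        optimizer=keras.optimizers.Adafactor(lr),
        loss=["categorical_crossentropy" for _ in range(len(outputs))],
        metrics={name: ["acc", MeanRank()] for name in outputs},
    )
    return model

tuner = keras_tuner.RandomSearch(build_model, objective="val_loss", max_trials=10)
tuner.search(
    train_ds,
    validation_data=validation_ds,
    steps_per_epoch=steps_per_epoch,
    validation_steps=val_steps,
    epochs=10,
)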
Cutting the Traces
In our SNR tutorial we saw that most of the leakage comes from a single point. Can we benefit from cutting the trace and possibly changing the hyperparameters? Do we get a training speedup? Do we get an improvement in accuracy? A possible starting point is sketched below.
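One way to start (a sketch; the window position and length are hypothetical and should come from your SNR analysis) is to slice the trace inside the record processing function:

# Hypothetical window around the leaking samples; pick it from the SNR peak.
window_start = 10_000
window_len = 4_000  # a multiple of patch_size (here 200)

def process_record_cut(record: dict[str, Any]) -> tuple[Any, dict[str, Any]]:
    """Like process_record, but keeps only a slice of the trace."""
    inputs = record["trace1"][window_start:window_start + window_len]
    outputs = {
        "sub_bytes_in_0": keras.ops.one_hot(
            record["sub_bytes_in"][0],
            num_classes=256,
        ),
    }
    return (inputs, outputs)

# Remember to create the model with trace_len=window_len in this case.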
Multiple Outputs
It is possible to train with multiple outputs. Can you output predictions of two or even all 16 bytes of S-BOX input values?
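As a hint, extending the outputs definition and the record processing to all 16 bytes could look like this sketch (it follows the naming convention used above; we have not tuned it):

# One output head per S-BOX input byte.
outputs = {f"sub_bytes_in_{i}": {"max_val": 256} for i in range(16)}

def process_record_all_bytes(record: dict[str, Any]) -> tuple[Any, dict[str, Any]]:
    """One-hot encode all 16 S-BOX input bytes."""
    return (
        record["trace1"],
        {
            f"sub_bytes_in_{i}": keras.ops.one_hot(
                record["sub_bytes_in"][i],
                num_classes=256,
            )
            for i in range(16)
        },
    )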