EmotionKD: A Cross-Modal Knowledge Distillation Framework for Emotion Recognition Based on Physiological Signals

ACM Multimedia 2023

Yucheng Liu1,   Ziyu Jia1,   Haichao Wang2
1Institute of Automation, Chinese Academy of Sciences
2Tsinghua-Berkeley Shenzhen Institute


Webpage | Code | Video | Paper | Poster

Fig. 1 Structure of the proposed EmotionKD. The knowledge in the multi-modal model is transferred to the unimodal model through knowledge distillation. The IMF feature is obtained from the Interactivity-based Modal Fusion (IMF) module.

We design a cross-modal knowledge distillation framework, EmotionKD, that improves the performance of a unimodal student model by transferring knowledge from a multi-modal teacher. Specifically, the EmotionKD framework includes the following modules (a minimal sketch of the first two follows the list):

  • A multi-modal teacher model called EmotionNet-Teacher;
  • A unimodal student model called EmotionNet-Student;
  • An adaptive feedback knowledge distillation mechanism.
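
As a rough illustration only, the sketch below shows how the first two components could be laid out in PyTorch. All class names, layer choices, and dimensions here are assumptions for readability, not the authors' implementation (see the released code for that); the adaptive feedback distillation is sketched separately after Fig. 3.

# Minimal PyTorch skeleton of the teacher and student (illustrative assumptions only).
import torch
import torch.nn as nn

class EmotionNetTeacher(nn.Module):
    """Multi-modal teacher: EEG + GSR encoders, IMF fusion, emotion classifier."""
    def __init__(self, eeg_dim=32, gsr_dim=1, d_model=64, n_classes=2):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, d_model)
        self.gsr_proj = nn.Linear(gsr_dim, d_model)
        self.eeg_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.gsr_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.imf = nn.Linear(2 * d_model, d_model)   # placeholder; a fuller IMF sketch follows Fig. 2
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, eeg, gsr):
        h_eeg = self.eeg_encoder(self.eeg_proj(eeg)).mean(dim=1)  # EEG heterogeneity feature
        h_gsr = self.gsr_encoder(self.gsr_proj(gsr)).mean(dim=1)  # GSR heterogeneity feature
        imf_feat = self.imf(torch.cat([h_eeg, h_gsr], dim=-1))    # fused IMF feature
        return self.classifier(imf_feat), imf_feat

class EmotionNetStudent(nn.Module):
    """Unimodal student: GSR only; learns to mimic the teacher's IMF feature."""
    def __init__(self, gsr_dim=1, d_model=64, n_classes=2):
        super().__init__()
        self.gsr_proj = nn.Linear(gsr_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, gsr):
        feat = self.encoder(self.gsr_proj(gsr)).mean(dim=1)
        return self.classifier(feat), feat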


Fig. 2 Structure of the proposed EmotionNet-Teacher network.

In the EmotionNet-Teacher model, the heterogeneity of each modality is extracted by a modality-specific transformer encoder. The interactivity is then extracted from the heterogeneity features by the IMF module. Finally, a classifier operates on the IMF feature to produce the emotion prediction.
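
To make the data flow concrete, here is a self-contained, hypothetical stand-in for the IMF module: it consumes the two heterogeneity features produced by the modality-specific transformer encoders and extracts an interactivity (IMF) feature, here via bidirectional cross-attention and a gated combination. The actual IMF design is described in the paper and released code; the names, head counts, and dimensions below are illustrative.

# Hypothetical IMF-style fusion: heterogeneity features -> interactivity (IMF) feature.
import torch
import torch.nn as nn

class IMFModule(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.eeg_to_gsr = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gsr_to_eeg = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, h_eeg, h_gsr):
        # each modality attends to the other, capturing cross-modal interactivity
        gsr_ctx, _ = self.eeg_to_gsr(h_gsr, h_eeg, h_eeg)   # GSR queries EEG
        eeg_ctx, _ = self.gsr_to_eeg(h_eeg, h_gsr, h_gsr)   # EEG queries GSR
        fused = torch.tanh(self.gate(torch.cat([gsr_ctx.mean(1), eeg_ctx.mean(1)], dim=-1)))
        return fused                                        # IMF feature

# Toy usage: heterogeneity features -> IMF feature -> class logits.
h_eeg, h_gsr = torch.randn(8, 128, 64), torch.randn(8, 128, 64)
classifier = nn.Linear(64, 2)
logits = classifier(IMFModule()(h_eeg, h_gsr))
print(logits.shape)  # torch.Size([8, 2])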


Fig. 3 The adaptive feedback knowledge distillation method.

We propose an adaptive feedback knowledge distillation method that allows the teacher model to adjust its output features during knowledge distillation according to the student's performance, which in turn guides the student model to better performance.
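
A hedged sketch of one training step under this scheme is given below, assuming the teacher/student interfaces from the earlier skeleton (each returns logits and a feature). The specific feedback rule here is an assumption for illustration, not the paper's exact formulation: the student's current classification error scales a corrective term that nudges the teacher's transferred feature, which is one simple way to realize "the teacher adjusts according to the student's performance".

# Illustrative adaptive-feedback distillation step (feedback rule is an assumption).
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, eeg, gsr, labels,
                      teacher_opt, student_opt, temperature=4.0, alpha=0.5):
    # ----- student update: hard labels + softened teacher logits + feature mimicry -----
    with torch.no_grad():
        t_logits, t_feat = teacher(eeg, gsr)
    s_logits, s_feat = student(gsr)
    hard_loss = F.cross_entropy(s_logits, labels)
    soft_loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=-1),
                         F.softmax(t_logits / temperature, dim=-1),
                         reduction="batchmean") * temperature ** 2
    feat_loss = F.mse_loss(s_feat, t_feat)
    student_loss = alpha * hard_loss + (1 - alpha) * soft_loss + feat_loss
    student_opt.zero_grad()
    student_loss.backward()
    student_opt.step()

    # ----- adaptive feedback (illustrative): the harder the batch is for the student,
    # the more the teacher is pushed to reshape its transferred feature toward
    # something the GSR-only student can imitate -----
    feedback_weight = hard_loss.detach()                 # scalar proxy for student difficulty
    t_logits, t_feat = teacher(eeg, gsr)                 # recompute with gradients enabled
    teacher_loss = (F.cross_entropy(t_logits, labels)
                    + feedback_weight * F.mse_loss(t_feat, s_feat.detach()))
    teacher_opt.zero_grad()
    teacher_loss.backward()
    teacher_opt.step()
    return student_loss.item(), teacher_loss.item()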

In summary, our contributions are as follows:

  • We propose a novel multi-modal EmotionNet-Teacher with an Interactivity-based Modal Fusion (IMF) module;
  • We design an adaptive feedback mechanism for cross-modal knowledge distillation;
  • To the best of our knowledge, this is the first application of cross-modal knowledge distillation in the field of physiological signal-based emotion recognition to transfer fused EEG and GSR features to the unimodal GSR model.

Experiments
We first compare our EmotionNet-Student with other methods. The results show that EmotionNet-Student achieves state-of-the-art performance compared to other unimodal and knowledge transfer-based baselines on the emotion recognition task, demonstrating that our enhanced unimodal model outperforms other unimodal methods.
We then compare our EmotionNet-Teacher with other baselines. The results show that the proposed EmotionNet-Teacher performs significantly better than other multi-modal baseline methods on the emotion recognition task, confirming that our proposed multi-modal method is effective.
Talk
Reference
            @inproceedings{10.1145/3581783.3612277,
                author = {Liu, Yucheng and Jia, Ziyu and Wang, Haichao},
                title = {EmotionKD: A Cross-Modal Knowledge Distillation Framework for Emotion Recognition Based on Physiological Signals},
                year = {2023},
                isbn = {9798400701085},
                publisher = {Association for Computing Machinery},
                address = {New York, NY, USA},
                url = {https://doi.org/10.1145/3581783.3612277},
                doi = {10.1145/3581783.3612277},
                abstract = {Emotion recognition using multi-modal physiological signals is an emerging field in affective computing that significantly improves performance compared to unimodal approaches. The combination of Electroencephalogram(EEG) and Galvanic Skin Response(GSR) signals are particularly effective for objective and complementary emotion recognition. However, the high cost and inconvenience of EEG signal acquisition severely hinder the popularity of multi-modal emotion recognition in real-world scenarios, while GSR signals are easier to obtain. To address this challenge, we propose EmotionKD, a framework for cross-modal knowledge distillation that simultaneously models the heterogeneity and interactivity of GSR and EEG signals under a unified framework. By using knowledge distillation, fully fused multi-modal features can be transferred to an unimodal GSR model to improve performance. Additionally, an adaptive feedback mechanism is proposed to enable the multi-modal model to dynamically adjust according to the performance of the unimodal model during knowledge distillation, which guides the unimodal model to enhance its performance in emotion recognition. Our experiment results demonstrate that the proposed model achieves state-of-the-art performance on two public datasets. Furthermore, our approach has the potential to reduce reliance on multi-modal data with lower sacrificed performance, making emotion recognition more applicable and feasible. The source code is available at https://github.com/YuchengLiu-Alex/EmotionKD},
                booktitle = {Proceedings of the 31st ACM International Conference on Multimedia},
                pages = {6122–6131},
                numpages = {10},
                keywords = {knowledge distillation, emotion recognition, galvanic skin response, affective computing, electroencephalogram},
                location = {Ottawa ON, Canada},
                series = {MM '23}
                }