EmotionKD: A Cross-Modal Knowledge Distillation Framework for Emotion Recognition Based on Physiological Signals
| Webpage | Code | Video | Paper | Poster |
We design EmotionKD, a cross-modal knowledge distillation framework that transfers fully fused multi-modal (EEG + GSR) knowledge to a unimodal GSR student model to improve its performance. Specifically, the EmotionKD framework includes the following modules:
In the Emotion-Teacher model, the heterogeneity of each modality is extracted by a modality-specific transformer encoder. The interactivity between modalities is then extracted from these heterogeneity features by the IMF module. Finally, a classifier predicts the emotion label from the IMF feature, yielding higher recognition performance. A minimal sketch of this architecture is given below.
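The following minimal PyTorch sketch illustrates the structure described above: modality-specific transformer encoders for heterogeneity, an IMF-style fusion step for interactivity (sketched here as cross-modal attention, which is an assumption), and a classifier. The class name `EmotionTeacherSketch`, all layer sizes, and the fusion form are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class EmotionTeacherSketch(nn.Module):
    """Illustrative sketch of the Emotion-Teacher (not the official code):
    per-modality transformer encoders extract heterogeneity features, an
    IMF-style fusion module models interactivity, and a classifier predicts
    the emotion label."""

    def __init__(self, eeg_dim=32, gsr_dim=1, d_model=64, n_heads=4, n_classes=2):
        super().__init__()
        # Project each modality to a shared embedding size.
        self.eeg_proj = nn.Linear(eeg_dim, d_model)
        self.gsr_proj = nn.Linear(gsr_dim, d_model)
        # Modality-specific transformer encoders (heterogeneity features).
        make_layer = lambda: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.eeg_encoder = nn.TransformerEncoder(make_layer(), num_layers=2)
        self.gsr_encoder = nn.TransformerEncoder(make_layer(), num_layers=2)
        # IMF-style fusion, sketched as cross-modal attention (assumption).
        self.fusion = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, eeg, gsr):
        # eeg: (batch, time, eeg_dim), gsr: (batch, time, gsr_dim)
        h_eeg = self.eeg_encoder(self.eeg_proj(eeg))   # heterogeneity (EEG)
        h_gsr = self.gsr_encoder(self.gsr_proj(gsr))   # heterogeneity (GSR)
        # Interactivity: GSR queries attend to EEG keys/values.
        fused, _ = self.fusion(h_gsr, h_eeg, h_eeg)
        pooled = fused.mean(dim=1)                     # temporal pooling
        return self.classifier(pooled), pooled         # logits, fused feature
```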
We also propose an adaptive feedback knowledge distillation method, which allows the teacher model to adjust its output features according to the student's performance during distillation. As a result, the student model achieves better performance. A sketch of one distillation step follows.
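Below is a hedged sketch of what one adaptive-feedback distillation step could look like, assuming the teacher from the sketch above (returning logits and a fused feature) and a unimodal GSR student. The student distills from softened teacher logits; the feedback term, assumed here to re-weight the teacher's loss by the student's per-sample error, is only one plausible realization of the mechanism described in the paper. The function name `adaptive_feedback_kd_step`, the optimizers, and all hyper-parameters are assumptions.

```python
import torch
import torch.nn.functional as F

def adaptive_feedback_kd_step(teacher, student, eeg, gsr, labels,
                              t_opt, s_opt, temperature=4.0, alpha=0.5):
    """One illustrative distillation step (assumptions, not the official code).
    `teacher(eeg, gsr)` is assumed to return (logits, fused_feature) and
    `student(gsr)` to return logits."""
    # --- Student update: task loss + distillation loss from softened logits ---
    with torch.no_grad():
        t_logits, _ = teacher(eeg, gsr)
    s_logits = student(gsr)
    kd = F.kl_div(F.log_softmax(s_logits / temperature, dim=-1),
                  F.softmax(t_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(s_logits, labels)
    s_loss = alpha * kd + (1.0 - alpha) * ce
    s_opt.zero_grad()
    s_loss.backward()
    s_opt.step()

    # --- Teacher feedback update (assumption): emphasise samples the student
    # currently gets wrong, so the teacher's output adapts during distillation.
    with torch.no_grad():
        idx = torch.arange(labels.shape[0])
        s_err = 1.0 - F.softmax(s_logits.detach(), dim=-1)[idx, labels]
    t_logits, _ = teacher(eeg, gsr)
    t_ce = F.cross_entropy(t_logits, labels, reduction="none")
    t_loss = (s_err * t_ce).mean()
    t_opt.zero_grad()
    t_loss.backward()
    t_opt.step()
    return s_loss.item(), t_loss.item()
```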
In summary, our contributions are as follows:

- We propose EmotionKD, a cross-modal knowledge distillation framework that jointly models the heterogeneity and interactivity of EEG and GSR signals under a unified framework.
- We propose an adaptive feedback mechanism that enables the multi-modal teacher to dynamically adjust according to the performance of the unimodal student during knowledge distillation.
- Experiments on two public datasets show that the proposed model achieves state-of-the-art performance, reducing reliance on multi-modal data with little sacrificed performance.
@inproceedings{10.1145/3581783.3612277,
  author    = {Liu, Yucheng and Jia, Ziyu and Wang, Haichao},
  title     = {EmotionKD: A Cross-Modal Knowledge Distillation Framework for Emotion Recognition Based on Physiological Signals},
  year      = {2023},
  isbn      = {9798400701085},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3581783.3612277},
  doi       = {10.1145/3581783.3612277},
  abstract  = {Emotion recognition using multi-modal physiological signals is an emerging field in affective computing that significantly improves performance compared to unimodal approaches. The combination of Electroencephalogram(EEG) and Galvanic Skin Response(GSR) signals are particularly effective for objective and complementary emotion recognition. However, the high cost and inconvenience of EEG signal acquisition severely hinder the popularity of multi-modal emotion recognition in real-world scenarios, while GSR signals are easier to obtain. To address this challenge, we propose EmotionKD, a framework for cross-modal knowledge distillation that simultaneously models the heterogeneity and interactivity of GSR and EEG signals under a unified framework. By using knowledge distillation, fully fused multi-modal features can be transferred to an unimodal GSR model to improve performance. Additionally, an adaptive feedback mechanism is proposed to enable the multi-modal model to dynamically adjust according to the performance of the unimodal model during knowledge distillation, which guides the unimodal model to enhance its performance in emotion recognition. Our experiment results demonstrate that the proposed model achieves state-of-the-art performance on two public datasets. Furthermore, our approach has the potential to reduce reliance on multi-modal data with lower sacrificed performance, making emotion recognition more applicable and feasible. The source code is available at https://github.com/YuchengLiu-Alex/EmotionKD},
  booktitle = {Proceedings of the 31st ACM International Conference on Multimedia},
  pages     = {6122--6131},
  numpages  = {10},
  keywords  = {knowledge distillation, emotion recognition, galvanic skin response, affective computing, electroencephalogram},
  location  = {Ottawa ON, Canada},
  series    = {MM '23}
}