NEUROCOMPUTING, vol.666, pp.1-23, 2026 (SCI-Expanded)
Emotion recognition plays a central role in advancing human-computer
interaction (HCI), healthcare, and affective computing. However, because
emotions are rapid and dynamic, traditional unimodal approaches often
fall short in capturing their complexity. As a result, multimodal emotion
recognition has gained significant attention, with electroencephalography
(EEG) emerging as a core modality owing to its non-invasiveness, high
temporal resolution, and direct measurement of neural activity. This
paper presents a comprehensive review of EEG-based
multimodal emotion recognition, focusing on sensors, fusion strategies,
and benchmark datasets employed in state-of-the-art studies. Unlike
previous surveys, this review makes three distinct contributions. First,
it introduces a sensor-level categorisation that highlights
device-specific constraints and opportunities for fusion design. Second,
it systematically maps fusion strategies, including sensor-, feature-,
and decision-level fusion (the latter two are illustrated in the sketch
following this abstract), using structured comparisons that detail their
assumptions, data requirements, and computational costs. It also traces
the methodological progression from handcrafted pipelines to deep
learning, hybrid, and transformer-based architectures, which has likewise
shaped multimodal emotion recognition approaches. Third, it provides an in-depth
benchmarking of widely used EEG-based multimodal datasets, offering
multidimensional comparisons across sample size, demographics, labelling
schemes, stimulation protocols, and evaluation strategies. To
contextualise these advances, a review of unimodal approaches is also
provided. Together, these contributions establish a practical reference
for designing robust EEG-based multimodal emotion recognition systems,
selecting appropriate datasets, and ensuring consistency in comparative
evaluations. The review also highlights key challenges and future
opportunities, including dataset standardisation, cross-subject
generalisation, and ethical considerations, to guide the next generation
of research in this rapidly evolving field.
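To make the fusion-level distinction concrete, below is a minimal sketch,
not taken from the paper itself, contrasting feature-level fusion
(concatenating EEG and peripheral feature vectors before a single
classifier) with decision-level fusion (averaging per-modality classifier
probabilities). All arrays, feature dimensions, and labels are
hypothetical placeholders; real features would come from a benchmark
dataset such as DEAP.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features: 200 trials of EEG band-power
# features (32 dims) and peripheral features, e.g. GSR/ECG statistics
# (8 dims), with binary valence labels. Random placeholders only.
eeg = rng.standard_normal((200, 32))
peripheral = rng.standard_normal((200, 8))
labels = rng.integers(0, 2, size=200)

# Feature-level fusion: concatenate modalities into one joint vector,
# then train a single classifier on the combined representation.
fused = np.concatenate([eeg, peripheral], axis=1)
feature_level_clf = LogisticRegression(max_iter=1000).fit(fused, labels)

# Decision-level fusion: train one classifier per modality and combine
# their predicted class probabilities (here, a simple unweighted average).
eeg_clf = LogisticRegression(max_iter=1000).fit(eeg, labels)
per_clf = LogisticRegression(max_iter=1000).fit(peripheral, labels)
avg_proba = (eeg_clf.predict_proba(eeg) +
             per_clf.predict_proba(peripheral)) / 2
decision_level_pred = avg_proba.argmax(axis=1)
```

Sensor-level fusion, by contrast, would combine raw synchronised signals
before any feature extraction, which is why the review treats the three
levels as imposing different assumptions and data requirements.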