Multimodal emotion analysis integrates information from multiple modalities to better understand human emotions. In this paper, we propose the Cross-modal Emotion Recognition based on multi-layer semantic fusion (CM-MSF) model, which aims to leverage the complementarity of important information between modalities and extract advanced features in an adaptive manner.