DenseRecognition of Spoken Languages

Conferences
ICPR 2020 MAIN CONFERENCE PS T5.8: Image, Signals and Audio processing (2021)
Available as
Online
Summary

In this study, we consider a large number (27) of Indian languages for recognition from speech signals drawn from different sources. A dense convolutional network architecture (DenseNet) is used for this classification task. As a preprocessing step, low-energy frames are dynamically eliminated from the input speech signal. The Mel-spectrogram of the preprocessed speech signal is fed as input to the DenseNet architecture. The language recognition performance of this architecture is compared with that of several state-of-the-art deep architectures, including a convolutional neural network (CNN), ResNet, and the CNN-BLSTM and DenseNet-BLSTM hybrid architectures. For further comparison, we also obtained the recognition performance of a stacked BLSTM architecture fed with different sets of handcrafted features. Simulations for both speaker-independent and speaker-dependent scenarios were performed on two standard datasets: (i) the IITKGP-MLILSC dataset of news clips in 27 Indian languages and (ii) the Linguistic Data Consortium (LDC) dataset of telephonic conversations in 5 Indian languages. In each case, the recognition performance of the DenseNet architecture with Mel-spectrogram features was found to be significantly better than that of all other frameworks implemented in this study.
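
The summary does not state how the low-energy frames are selected, so the following is only a minimal sketch of one plausible energy-based rule. The function name `drop_low_energy_frames`, the frame length, the hop size, and the threshold relative to the mean frame energy are all assumptions made for illustration, not the authors' settings.

```python
import math


def drop_low_energy_frames(signal, frame_len=400, hop=160, rel_threshold=0.1):
    """Return the frames whose energy is at least rel_threshold * mean frame energy.

    All parameter values are illustrative assumptions; the threshold adapts to
    each utterance because it is derived from that utterance's mean energy.
    """
    # Slice the signal into overlapping fixed-length frames.
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    if not frames:
        return []
    # Short-time energy of each frame: sum of squared samples.
    energies = [sum(x * x for x in frame) for frame in frames]
    threshold = rel_threshold * (sum(energies) / len(energies))
    return [frame for frame, e in zip(frames, energies) if e >= threshold]


# Synthetic input: 0.5 s of silence followed by 0.5 s of a 440 Hz tone at 16 kHz.
sr = 16000
silence = [0.0] * (sr // 2)
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]
kept = drop_low_energy_frames(silence + tone)
```

In a pipeline like the one summarized above, the retained frames would then be converted to a Mel-spectrogram before being passed to the classifier.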
