Audio classification and recognition is one of the more challenging tasks built into many recent applications and devices. Audio data is a crucial input for emotion detection after surgery, classification of voice sequences, classification of random voice data, surveillance, and speaker detection. Most audio data carries environmental or instrumental noise, so extracting distinctive features from it is important for identifying the speaker effectively. This work evaluates an approach of that kind. The research focuses on classifying TV broadcast audio by type. The design evaluates five categories of audio (advertisement, news, songs, cartoon, and sports) using data collected with a TV tuner card, and is implemented in Python. The audio samples are converted to spectrogram images, and transfer learning is then applied to the pretrained models ResNet50 and InceptionV3 to extract deep features and classify the audio. InceptionV3 is compared with ResNet50 to determine which gives the higher classification accuracy. The pretrained models were previously trained on the ImageNet data set and are reused here so that the audio classification model can be trained quickly on the training set with high accuracy. The proposed model reaches an accuracy of 94% with InceptionV3, compared with 93% with ResNet50.
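The audio-to-image step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, hop size, 16 kHz sample rate, 224×224 output size, and the helper names `log_spectrogram` and `to_rgb_image` are all assumptions chosen for illustration, and a synthetic tone stands in for a TV broadcast clip. The sketch uses only NumPy; the resulting three-channel array has the shape an ImageNet-pretrained network would typically accept (224×224 is the usual ResNet50 input size, while InceptionV3 is commonly fed 299×299).

```python
import numpy as np


def log_spectrogram(signal, frame_len=512, hop=256):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Magnitude of the real FFT of each frame, then log compression.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude).T


def to_rgb_image(spec, size=(224, 224)):
    """Scale a spectrogram to 0..255 and resize it to a 3-channel uint8 image."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo + 1e-12)
    # Nearest-neighbour resize by sampling row and column indices.
    rows = np.linspace(0, spec.shape[0] - 1, size[0]).round().astype(int)
    cols = np.linspace(0, spec.shape[1] - 1, size[1]).round().astype(int)
    resized = scaled[np.ix_(rows, cols)]
    gray = (resized * 255).astype(np.uint8)
    # Repeat the single channel three times to mimic an RGB image.
    return np.repeat(gray[:, :, None], 3, axis=2)


# Hypothetical input: one second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
image = to_rgb_image(spec)
```

In a transfer-learning setup of the kind the abstract describes, arrays like `image` would be passed through the pretrained network's convolutional layers, with a new five-class output layer trained on top for the advertisement, news, songs, cartoon, and sports categories.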
CITATION STYLE
B.*, K., & Dhanalakshmi, Dr. P. (2020). An Efficient Model for TV broadcast Audio Classification through InceptionV3 and ResNet50. International Journal of Innovative Technology and Exploring Engineering, 9(5), 2234–2238. https://doi.org/10.35940/ijitee.e2984.039520