Speech Recognition Research Topics and Ideas

By: Prof. Dr. Fazal Rehman | Last updated: February 3, 2024

List of Research Topics and Ideas of Speech Recognition for MS and Ph.D Thesis. 1. Self-training and Pre-training are Complementary for Speech Recognition 2. Development of the cuhk elderly speech recognition system for neurocognitive disorder detection using the dementiabank corpus 3. Internal language model estimation for domain-adaptive end-to-end speech recognition 4. Using Radio Archives for Low-Resource Speech Recognition: Towards an Intelligent Virtual Assistant for Illiterate Users 5. Improving speech recognition models with small samples for air traffic control systems 6. The accented english speech recognition challenge 2020: open datasets, tracks, baselines, results and methods 7. A GDPR-compliant Ecosystem for Speech Recognition with Transfer, Federated, and Evolutionary Learning 8. Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition 9. Representation transfer learning from deep end-to-end speech recognition networks for the classification of health states from speech 10. Simplified self-attention for transformer-based end-to-end speech recognition 11. Transformer-based online speech recognition with decoder-end adaptive computation steps 12. An evaluation of word-level confidence estimation for end-to-end automatic speech recognition 13. Aispeech-sjtu accent identification system for the accented english speech recognition challenge 14. Bayesian transformer language models for speech recognition 15. Streaming models for joint speech recognition and translation 16. Data augmentation for end-to-end code-switching speech recognition 17. Efficient neural architecture search for end-to-end speech recognition via straight-through gradients 18. Learning to count words in fluent speech enables online speech recognition 19. A Further Study of Unsupervised Pretraining for Transformer Based Speech Recognition 20. 
Learned transferable architectures can surpass hand-designed architectures for large scale speech recognition 21. Multimodal integration for large-vocabulary audio-visual speech recognition 22. Federated Acoustic Modeling for Automatic Speech Recognition 23. Directional ASR: A new paradigm for E2E multi-speaker speech recognition with source localization 24. Semi-supervised speech recognition via graph-based temporal classification 25. Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET 26. Interacting effects of frontal lobe neuroanatomy and working memory capacity to older listeners’ speech recognition in noise 27. A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition 28. Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries 29. Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data 30. Deformable TDNN with adaptive receptive fields for speech recognition 31. Lookup-Table Recurrent Language Models for Long Tail Speech Recognition 32. Evaluation of the effectiveness and efficiency of state-of-the-art features and models for automatic speech recognition error detection 33. Comparison of speech recognition and localization ability in single-sided deaf patients implanted with different cochlear implant electrode array designs 34. Memory-Efficient Speech Recognition on Smart Devices 35. Multi-Quartznet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion 36. Domain-aware Neural Language Models for Speech Recognition 37. Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion 38. Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages 39. 
Training augmentation with TANDEM acoustic modelling in Punjabi adult speech recognition system 40. Non-autoregressive Mandarin-English Code-switching Speech Recognition with Pinyin Mask-CTC and Word Embedding Regularization 41. A bandit approach to curriculum generation for automatic speech recognition 42. Fast offline Transformer-based end-to-end automatic speech recognition for real-world applications 43. Human-robot-interaction using cloud-based speech recognition systems 44. cif-based collaborative decoding for end-to-end contextual speech recognition 45. Mini-batch sample selection strategies for deep learning based speech recognition 46. Hierarchical Phoneme Classification for Improved Speech Recognition 47. Design and implementation of speech recognition system integrated with internet of things 48. Improving ultrasound-based multimodal speech recognition with predictive features from representation learning 49. Context-aware RNNLM Rescoring for Conversational Speech Recognition 50. Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End 51. Does neural activity in the auditory cortex predict speech recognition with CI? 52. The influence of stimulation levels on auditory thresholds and speech recognition in adult cochlear implant users 53. Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon 54. Concatenative Speech Recognition using Morphemes 55. Interactions among talker sex, masker number, and masker intelligibility in speech-on-speech recognition 56. The Performance Evaluation of Continuous Speech Recognition Based on Korean Phonological Rules of Cloud-Based Speech Recognition Open API 57. An Critical Analysis of Speech Recognition of Tamil and Malay Language Through Artificial Neural Network 58. Is Speech Recognition Software a Viable Future for Dysarthric Speakers? A Critical Review 59. 
Automatic Speech Recognition of Continuous Speech Signal of Gujarati Language Using Machine Learning 60. Visual Speech Recognition using VGG16 Convolutional Neural Network 61. EMOTION BIAS IN AUTOMATIC SPEECH RECOGNITION 62. Speech Recognition Using Spectrogram-Based Visual Features 63. A Study on Correlation between Automatic Speech Recognition Accuracy and Speech QoE 64. Implementation of The Speech Recognition System Using a real time web Server Based 65. Speech Recognition Using Neural Network for Mobile Robot Navigation 66. The presence of background noise reduces interlingual phonological competition during non-native speech recognition 67. Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks 68. Dynamic out-of-vocabulary word registration to language model for speech recognition 69. Speech recognition based on convolutional neural networks and MFCC algorithm 70. Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU) 71. Noise Robust Speech Recognition by Integration of MLLR Adaptation and Feature Extraction for Noise Reduced Speech 72. Automatic Communication Error Detection Using Speech Recognition and Linguistic Analysis for Proactive Control of Loss of Separation 73. On Portability of Automatic Speech Recognition: A Study Case 74. Final Report on Research Grant GR/L59566 ROPA: Phonetically-featured syllables for speech recognition 75. E cient Semantic Constraint for Speech Recognition 76. Anaphora Resolution in a speech recognition environment 77. Acceptability of collecting speech samples from the elderly via the telephone 78. Effects of Hearing Loss on School-Aged Children’s Ability to Benefit From F0 Differences Between Target and Masker Speech 79. Mismatch between objective measure and subjective perception of speech recognition in a patient with single-sided deafness and unilateral cochlear implant 80. 
A Speech Command Control-Based Recognition System for Dysarthric Patients Based on Deep Learning Technology 81. Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings 82. Language dialect based speech emotion recognition through deep learning techniques 83. Deep Neural Network Driven Speech Classification for Relevance Detection in Automatic Medical Documentation 84. IJERT-Communication Aiding System for People with Speech Impairment 85. EEG-based Speech Activity Detection 86. Low-activity supervised convolutional spiking neural networks applied to speech commands recognition 87. Protecting gender and identity with disentangled speech representations 88. Large-Scale Self-and Semi-Supervised Learning for Speech Translation 89. Contrastive Unsupervised Learning for Speech Emotion Recognition 90. Distortion-controlled training for end-to-end reverberant speech separation with auxiliary autoencoding loss 91. NeMo Toolbox for Speech Dataset Construction 92. Prototype Of Speech Translation System For Audio Effective Communication 93. Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation 94. ADL-MVDR: All deep learning MVDR beamformer for target speech separation 95. Aegan: Time-frequency speech denoising via generative adversarial networks 96. High Fidelity Speech Regeneration with Application to Speech Enhancement 97. Talk, Don’t Write: A Study of Direct Speech-Based Image Retrieval 98. Automatic detection of prosodic boundaries in spontaneous speech 99. Prior audio-visual learning facilitates auditory-only speech and voice-identity recognition in noisy listening conditions 100. Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation 101. Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN 102. A Noise Robust Speech Processing and Recognition Development System 103. 
Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients 104. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation 105. Analyzing Vocal Tract Parameters of Speech 106. Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation 107. Using Synthetic Audio to Improve the Recognition of Out-of-Vocabulary Words in End-to-End Asr Systems 108. Speaker normalization in speech perception 109. Speech Emotion Recognition: A Review 110. Improving Convolutional Recurrent Neural Networks for Speech Emotion Recognition 111. A Noise-Robust Speech Recogniser supported by a TMS320C31 Platform 112. Supervised Machine Learning Model for Accent Recognition in English Speech Using Sequential MFCC Features 113. Streaming simultaneous speech translation with augmented memory transformer 114. Cross corpus multi-lingual speech emotion recognition using ensemble learning 115. Performance of Forced-Alignment Algorithms on Children’s Speech 116. A Speech Recognized Dynamic Word Cloud Visualization for Text Summarization 117. A Simple Method for Speaker Recognition and Speaker Verification 118. LIS-Net: An end-to-end light interior search network for speech command recognition 119. Towards unsupervised learning of speech features in the wild 120. Articulatory-to-Acoustic Conversion of Mandarin Emotional Speech Based on PSO-LSSVM 121. Chunk-Level Speech Emotion Recognition: A General Framework of Sequence-to-One Dynamic Temporal Modeling 122. Automatic Speech Emotion Recognition using Mel Frequency Cepstrum Co-efficient and Machine Learning Technique 123. Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks 124. A Deep Learning Generative Approach for Speech-to-Scene Generation 125. 
The roles of cognitive abilities and hearing acuity in older adults’ recognition of words taken from fast and spectrally reduced speech 126. Anti-transfer learning for task invariance in convolutional neural networks for speech processing 127. Efficient Speech to Emotion Recognition Using Convolutional Neural Network 128. Real-time pre-processing for improved feature extraction of noisy speech 129. Computational Linguistics-Based Tamil Character Recognition System for Text to Speech Conversion 130. MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach 131. Reader: Speech Synthesizer and Speech Recognizer 132. Older Listeners’ Perception of Speech With Strengthened and Weakened Dynamic Pitch Cues in Background Noise 133. Audio-visual speech inpainting with deep learning 134. A Novel Approach to EEG Speech Activity Detection with Visual Stimuli and Mobile BCI 135. Teager Energy Cepstral Coefficients for Classification of Normal vs. Whisper Speech 136. Semi-supervised spoken language understanding via self-supervised speech and language model pretraining 137. Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition 138. Unsupervised low-rank representations for speech emotion recognition 139. Class-Conditional Defense GAN Against End-To-End Speech Attacks 140. Pre-training for low resource speech-to-intent applications 141. Dilated U-net based approach for multichannel speech enhancement from First-Order Ambisonics recordings 142. Data augmenting contrastive learning of speech representations in the time domain 143. End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection 144. Progressive Co-Teaching for Ambiguous Speech Emotion Recognition 145. TransMask: A Compact and Fast Speech Separation Model Based on Transformer 146. Bionic optimization of MFCC features based on speaker fast recognition 147. 
Speech Perception Across The Lifespan by Means of Artificial Intelligence 148. Encoding and decoding of meaning through structured variability in intonational speech prosody 149. Introducing the Talk Markup Language (TalkML): Adding a little social intelligence to industrial speech interfaces 150. No interaction between fundamental-frequency differences and spectral region when perceiving speech in a speech background 151. 1D CNN based approach for speech emotion recognition using MFCC 152. Analysis of Emotion Recognition from Cross-lingual Speech: Arabic, English, and Urdu 153. Self-Supervised Learning for Personalized Speech Enhancement 154. Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition 155. Don’t shoot butterfly with rifles: Multi-channel continuous speech separation with early exit transformer 156. A Conditional Cycle Emotion Gan for Cross Corpus Speech Emotion Recognition 157. Speech Enhancement for Wake-Up-Word detection in Voice Assistants 158. DeepLPC: A deep learning approach to augmented Kalman filter-based single-channel speech enhancement 159. Modification of misarticulated fricative/s/in cleft lip and palate speech 160. Implementation of audio recognition using mel frequency cepstrum coefficient and dynamic time warping in wirama praharsini 161. Bilateral and bimodal cochlear implant listeners can segregate competing speech using talker sex cues, but not spatial cues 162. A Method of Speech Signal Analysis Using Multi-level Wavelet Transform 163. Arabic Part of Speech Tagging by Using the Stanford System: Prepositions as a Case Study 164. Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN 165. Coarse-to-fine speech emotion recognition based on multi-task learning 166. Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training 167. 
Research on Speech Changes Due to Environmental Noise 168. Unsupervised feature selection and NMF de-noising for robust Speech Emotion Recognition 169. Utterance Verification-based Dysarthric Speech Intelligibility Assessment using Phonetic Posterior 170. Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors 171. Recent developments on espnet toolkit boosted by conformer 172. A Study of F0 Modification for X-Vector Based Speech Pseudonymization Across Gender 173. Study on Automatic Speech Therapy System for Patients 174. Medicare Adds Audiology, Speech-Language Pathology Codes to Temporary Telehealth Coverage 175. Subtitle Automatic Generation System using Speech to Text 176. Sequence-Level Self-Teaching Regularization 177. Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset 178. Extending a Japanese Speech- to- Gesture Dataset Towards Building a Pedagogical Agent for Second Language Learning 179. Adversarial attack and defense strategies for deep speaker recognition systems 180. Generalized RNN beamformer for target speech separation 181. Practical Speech Re-use Prevention in Voice-driven Services 182. The effect of speech and noise levels on the quality perceived by cochlear implant and normal hearing listeners 183. Individuals With Mild Cognitive Impairment and Alzheimer’s Disease Benefit From Audiovisual Speech Cues and Supportive Sentence Context 184. Clear Speech Perception: Linguistic and Cognitive Benefits 185. Phonemic restoration of interrupted locally time-reversed speech 186. VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency 187. Privacy and utility of x-vector based speaker anonymization 188. Statistical corpus-based speech segmentation 189. Local discriminant preservation projection embedded ensemble learning based dimensionality reduction of speech data of Parkinson’s disease 190. 
Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain 191. Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques 192. Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit 193. Research on Implementation of User Authentication Based on Gesture Recognition of Human 194. Construction of a Large-Scale Japanese ASR Corpus on TV Recordings 195. A modified feature selection method based on metaheuristic algorithms for speech emotion recognition 196. MFFCN: Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement 197. Accent and gender recognition from English language speech and audio using signal processing and deep learning 198. Speech Decomposition Based on a Hybrid Speech Model and Optimal Segmentation 199. Attention-Based Multi-Encoder Automatic Pronunciation Assessment 200. Multiresolution Cochleagram Speech Enhancement Algorithm Using Improved Deep Neural Networks with Skip Connections 201. Speaker Recognition Based on Fusion of a Deep and Shallow Recombination Gaussian Supervector 202. Transforming imagined thoughts into speech using a covariance-based subset selection method 203. Cascaded encoders for unifying streaming and non-streaming ASR 204. Cross lingual speech emotion recognition via triple attentive asymmetric convolutional neural network 205. Evaluating synthetic speech workload with oculo-motor indices: preliminary observations for Japanese speech 206. Towards more efficient DNN-based speech enhancement using quantized correlation mask 207. A Survey on Dynamic Sign Language Recognition 208. Speech stress recognition using semi-eager learning 209. Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings 210. Hybrid phonetic-neural model for correction in speech recognition systems. 211. 
Speech spectrum analyses for estimating operator functionality 212. Lithuanian speech-to-text Transcriber 213. AI TTS Smartphone App for Communication of Speech Impaired People 214. Distributed speech separation in spatially unconstrained microphone arrays 215. Speech Processing: MFCC Based Feature Extraction Techniques-An Investigation 216. Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit. Brain Sci. 2021, 11, 49 217. Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario 218. Sapaugment: Learning a sample adaptive policy for data augmentation 219. PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing 220. Radial Basis Function Neural Network Based Speech Enhancement System Using SLANTLET Transform Through Hybrid Vector Wiener Filter 221. A data layout method suitable for workflow in a cloud computing environment with speech applications 222. Complex Spectral Mapping With Attention Based Convolution Recurrent Neural Network for Speech Enhancement 223. RNN-T models fail to generalize to out-of-domain audio: Causes and solutions 224. Wavelet feature selection of audio and imagined/vocalized EEG signals for ANN based multimodal ASR system 225. Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition 226. Transfer of Learning from Vision to Touch: A Hybrid Deep Convolutional Neural Network for Visuo-Tactile 3D Object Recognition 227. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data used for AI in drug discovery 228. Symmetric Sub-graph Spatio-Temporal Graph Convolution and its application in Complex Activity Recognition 229. NUTQNI ANIQLASH VA SINTEZLASH TIZIMLARI TAHLILI 230. Voice-Based Railway Station Identification Using LSTM Approach 231. 
Two-Layer Fuzzy Multiple Random Forest for Speech Emotion Recognition 232. Neural Text Normalization in Speech-to-Text Systems with Rich Features 233. Feasibility of remote assessment of the binaural intelligibility level difference in school-age children 234. Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction 235. Speech based Depression Severity Level Classification Using a Multi-Stage Dilated CNN-LSTM Model 236. Ieee slt 2021 alpha-mini speech challenge: Open datasets, tracks, rules and baselines 237. Language Identification—A Supportive Tool for Multilingual ASR in Indian Perspective 238. Reliability and critical differences for an implementation of the coordinate response measure in speech-shaped noise 239. Impact of Visual Representation of Audio Signals for Indian Language Identification 240. Can deep learning beat numerical weather prediction? 241. Speech discrimination impairment of the worse-hearing ear in asymmetric hearing loss 242. Speech enhancement based on perceptually motivated guided spectrogram filtering 243. Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation 244. Evaluation of working memory in relation to cochlear implant consonant speech discrimination 245. Voice-controlled quantum chemistry 246. Evaluation of error-and correlation-based loss functions for multitask learning dimensional speech emotion recognition 247. Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation 248. Phone Calls Speech-to-Text: A Comparison Between APIs for the Portuguese Language 249. Federated Marginal Personalization for ASR Rescoring 250. FastTalker: A neural text-to-speech architecture with shallow and group autoregression 251. The Generalized Bayes Method for High-Dimensional Data Recognition with Applications to Audio Signal Recognition 252. 
Lexical stress representation in spoken word recognition 253. Probing Acoustic Representations for Phonetic Properties 254. Developing a Framework for Acquisition and Analysis of Speeches 255. Phonotactics in Spoken-Word Recognition 256. Patient Emotion Recognition in Human Computer Interaction System Based on Machine Learning Method and Interactive Design Theory 257. Identification of Food Quality Descriptors in Customer Chat Conversations using Named Entity Recognition 258. A comparative analysis of active learning for biomedical text mining 259. Towards Realizing Sign Language to Emotional Speech Conversion by Deep Learning 260. Gender Identification Over Voice Sample Using Machine Learning 261. Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations 262. Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing 263. Reducing Spelling Inconsistencies in Code-Switching ASR Using Contextualized CTC Loss 264. Personalized speech enhancement through self-supervised data augmentation and purification 265. LSTM-convolutional-BLSTM encoder-decoder network for minimum mean-square error approach to speech enhancement 266. Mean field analysis of deep neural networks 267. BRDS: An FPGA-based LSTM Accelerator with Row-Balanced Dual-Ratio Sparsification 268. Autonomy Voice Assistant for NPAS (NASA Platform for Autonomous Systems) 269. CDPAM: Contrastive learning for perceptual audio similarity 270. Read my lips! Perception of speech in noise by preschool children with autism and the impact of watching the speaker’s face 271. Improving deep speech denoising by Noisy2Noisy signal mapping 272. MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection 273. Audio albert: A lite bert for self-supervised learning of audio representation 274. An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement 275. 
Compressive Sensing and Contourlet Transform Applications in Speech Signal 276. Depressed Patients Intelligent Recognition in Smart Home Environment 277. An integrated multi-channel approach for joint noise reduction and dereverberation 278. Detection of replay spoof speech using teager energy feature cues 279. 11 TOPS photonic convolutional accelerator for optical neural networks 280. Multi-channel adaptive loudness compensation algorithm based on noise tracking in digital hearing aids 281. How Does the Brain Represent Speech? 282. Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies 283. A BRIEF INTRODUCTION TO HANDWRITTEN CHARACTER RECOGNITION 284. Adjuvant migraine medications in the treatment of sudden sensorineural hearing loss 285. Multi-channel target speech extraction with channel decorrelation and target speaker adaptation 286. Smart Non-intrusive Device Recognition Based on Deep Learning Methods 287. Cortical tracking of speech in delta band relates to individual differences in speech in noise comprehension in older adults 288. An Automatic Sound Classification Framework with Non-volatile Memory 289. Brain electrical dynamics in speech segmentation depends upon prior experience with the language 290. Discriminant Analysis of Voice Commands in the Presence of an Unmanned Aerial Vehicle 291. Transcripts and Accessibility: Student Views from Using Webinars in Built Environment Education 292. Understanding Speech Amid the Jingle and Jangle: Recommendations for Improving Measurement Practices in Listening Effort Research 293. Cross-Silo Federated Training in the Cloud with Diversity Scaling and Semi-Supervised Learning 294. Icassp 2021 deep noise suppression challenge 295. Is talker variability a critical component of effective phonetic training for nonnative speech? 296. Progressive loss functions for speech enhancement with deep neural networks 297. 
Autokws: Keyword spotting with differentiable architecture search 298. A Review of Intelligent Smartphone-Based Object Detection Techniques for Visually Impaired People 299. Listening Effort in School-Age Children With Normal Hearing Compared to Children With Limited Useable Hearing Unilaterally 300. The Impact of Musical Training on Understanding Dysarthric Speech: A Preliminary Study of Transcription Errors 301. From perception to action using observed actions to learn gestures 302. Validation of an intelligibility assessment tool in an Indian language for perceptual speech analysis in oral cancer patients 303. Neural Network-based Virtual Microphone Estimator 304. Jointly trained transformers models for spoken language translation 305. Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder 306. Conversational transfer learning for emotion recognition 307. Adversarial defense for automatic speaker verification by cascaded self-supervised learning models 308. Exposing Speech Transsplicing Forgery with Noise Level Inconsistency 309. Dualformer: a unified bidirectional sequence-to-sequence learning 310. The phonology of parent-child speech 311. BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers 312. SVM and GMM based Speech/music Classification using SBC 313. Spoken Language Identification in Unseen Target Domain Using Within-Sample Similarity Loss 314. Introduction to Apple ML Tools 315. Kurtosis-based, data-selective affine projection adaptive filtering algorithm for speech processing application 316. Design of Artificial Intelligence Converged Media Experimental System 317. An Evaluation into Deep Learning Capabilities, Functions and Its Analysis 318. Neural mos prediction for synthesized speech using multi-task learning with spoofing detection and spoofing type classification 319. Deep audio-visual learning: A survey 320. 
Association between the auditory profile and speech-language-hearing diagnosis in children and adolescents 321. Correlation of Visual Perceptions and Extraction of Visual Articulators for Kannada Lip Reading 322. SWav 0.1 User Manual 323. Neural tracking of the speech envelope is differentially modulated by attention and language experience 324. Aging Effects on Categorical Perception of Mandarin Lexical Tones in Noise 325. Detection of heterogeneous parallel steganography for low bit-rate VoIP speech streams 326. A study on Arabic sign language recognition for differently abled using advanced machine learning classifiers 327. Active listening 328. Micaugment: One-Shot Microphone Style Transfer 329. The demand for AI skills in the labor market 330. Cvt: Introducing convolutions to vision transformers 331. Dichotic listening performance with cochlear-implant simulations of ear asymmetry is consistent with difficulty ignoring clearer speech 332. Convolutional neural network 333. Audio fingerprint for automatic Balinese rindik music identification using gaussian mixture model 334. A comparative study of acoustic and linguistic features classification for alzheimer’s disease detection 335. Semi-supervised Multichannel Speech Separation Based on a Phone-and Speaker-Aware Deep Generative Model of Speech Spectrograms 336. Facial Expression Recognition Using Kernel Entropy Component Analysis Network and DAGSVM 337. Two-Stage Fuzzy Fusion Based-Convolution Neural Network for Dynamic Emotion Recognition 338. Robustness of on-device Models: Adversarial Attack to Deep Learning Models on Android Apps 339. Adversarial attacks on audio source separation 340. Facial expression recognition based on facial part attention mechanism 341. Computer Vision-based Intelligent Bookshelf System 342. Cross-cultural emotion recognition and in-group advantage in vocal expression: A meta-analysis 343. Fastpitch: Parallel text-to-speech with pitch prediction 344. 
Interspeech 2021 Deep Noise Suppression Challenge 345. Text classification and sentiment analysis 346. A Multi-Resolution Approach to GAN-Based Speech Enhancement 347. Arabic grapheme-to-phoneme conversion based on joint multi-gram model 348. Automatic assessment of intelligibility in speakers with dysarthria from coded telephone speech using glottal features 349. Analisis part of speech tagging dengan menggunakan Hidden Markov Model pada data Al-Qur’an 350. Perceptual integration of linguistic and non-linguistic properties of speech 351. Show and speak: Directly synthesize spoken description of images 352. Modeling the Conditional Distribution of Co-Speech Upper Body Gesture Jointly Using Conditional-GAN and Unrolled-GAN 353. An AI-Application-Oriented In-Class Teaching Evaluation Model by Using Statistical Modeling and Ensemble Learning 354. VISUALVOICE: Audio-Visual Speech Separation with Cross-Modal Consistency (Supplementary Materials) 355. Noise and acoustic modeling with waveform generator in text-to-speech and neutral speech conversion 356. Exploring Multimodal Interactions in Human-Autonomy Teaming Using a Natural User Interface 357. Unconstrained online handwritten Uyghur word recognition based on recurrent neural networks and connectionist temporal classification 358. Role of brainwaves in neural speech decoding 359. Public reasoning about voluntary assisted dying: An analysis of submissions to the Queensland Parliament, Australia 360. Cepstral Speech/Pause Detectors 361. Generating EEG features from acoustic features 362. Contextually Aware Multimodal Emotion Recognition 363. Build an app 364. The automatic detection of heart failure using speech signals 365. On the quantization of recurrent neural networks 366. Finding Answers in a Text Document 367. Hypergraph network model for nested entity mention recognition 368. IoT-Based Voice-Controlled Energy-Efficient Intelligent Traffic and Street Light Monitoring System 369. 
Indian Regional Spoken Language Identification Using Deep Learning Approach 370. A systematic review of hidden markov models and their applications 371. Improved acoustic word embeddings for zero-resource languages using multilingual transfer 372. Advances in Parkinson’s Disease detection and assessment using voice and speech: A review of the articulatory and phonatory aspects 373. A Comparative Analysis of AlexNet and GoogLeNet with a Simple DCNN for Face Recognition 374. Spoken Language Dialogue Systems 375. SILENCE DETECTION AND VOWEL/CONSONANT DISCRIMINATION IN VIDEO SEQUENCES 376. Design Space for Voice-Based Professional Reporting 377. Cross-talker generalization during foreign-accented speech perception 378. Backdoor attack against speaker verification 379. Role Aware Multi-Party Dialogue Question Answering 380. A Survey on Deep Learning for Time-Series Forecasting 381. Reservoir computing based on a silicon microring and time multiplexing for binary and analog operations 382. Brain activations while processing degraded speech in adults with autism spectrum disorder 383. Going deeper with image transformers 384. Towards the Objective Speech Assessment of Smoking Status based on Voice Features: A Review of the Literature 385. Question answering 386. A review on basic deep learning technologies and applications 387. Self-supervised pretraining of visual features in the wild 388. SeeHear: Signer diarisation and a new dataset 389. Robust Computing for Machine Learning-Based Systems 390. Visualizing the Evolution of the AI Ecosystem 391. AN ANALYSIS OF HUMOR SPEECH ACT OF THE BIG BANG THEORY AT CBS TELEVISION SERIES 392. Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks 393. What Do We See in Them? Identifying Dimensions of Partner Models for Speech Interfaces Using a Psycholexical Approach 394.
Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different languages 395. Deep-emotion: Facial expression recognition using attentional convolutional network 396. Functional impacts of aminoglycoside treatment on speech perception and extended high-frequency hearing loss in a pediatric cystic fibrosis cohort 397. Audio segmentation and speaker localization in meeting videos 398. Comparative analysis and application of LBP face image recognition algorithms 399. Lateralized Cerebral Processing of Abstract Linguistic Structure in Clear and Degraded Speech 400. DBnet: Doa-Driven Beamforming Network for end-to-end Reverberant Sound Source Separation 401. The detection of Parkinson’s disease from speech using voice source information 402. Leaky Integrator Dynamical Systems and Reachable Sets 403. Lexical and acoustic characteristics of young and older healthy adults 404. ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech 405. Villain or guardian? ‘The smart toy is watching you now…’ 406. A Study on Image Analysis and Recognition Using Learning Methods: CNN as the Best Image Learner 407. Adversarially learning disentangled speech representations for robust multi-factor voice conversion 408. Short-term prediction of passenger volume for urban rail systems: A deep learning approach based on smart-card data 409. In-scalp incision technique for cochlear implantation 410. Simultaneous bilateral stapes surgery after follow-up of 13 years 411. Natural Language Processing 412. Highly sensitive ultrathin flexible thermoplastic polyurethane/carbon black fibrous film strain sensor with adjustable scaffold networks 413. Advanced Safe Home Systems using Face-Recognition with Unique Passcode Systems 414. Training with an auditory perceptual learning game transfers to speech in competition 415.
Computer-based remedial training in phoneme awareness and phonological decoding: Effects on the posttraining development of word recognition 416. Sign language segmentation with temporal convolutional networks 417. Dense CNN with self-attention for time-domain speech enhancement 418. Poly Scale Space Technique for Feature Extraction in Lip Reading: A New Strategy 419. Iqra reading verification with mel frequency cepstrum coefficient and dynamic time warping 420. A memory-efficient tool for bengali parts of speech tagging 421. Cochlear implantation in children with single-sided deafness 422. Knowledge distillation: A survey 423. NLP in Customer Service 424. A scoping review on the use, processing and fusion of geographic data in virtual assistants 425. Evolving Criteria for Adult and Pediatric Cochlear Implantation 426. Telefitting of Nucleus cochlear implants: a feasibility study 427. Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis 428. Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data 429. Efferent unmasking of speech-in-noise encoding? 430. Infant-directed Speech by Dutch Fathers: Increased Pitch Variability within and across Utterances 431. SAMPLE ASR2000 PAPER 432. PolyDL: Polyhedral Optimizations for Creation of High-performance DL Primitives 433. The 2020 Personalized Voice Trigger Challenge: Open Database, Evaluation Metrics and the Baseline Systems 434. To ban or not to ban: Bayesian attention networks for reliable hate speech detection 435. A Survey on Automatic Multimodal Emotion Recognition in the Wild 436. Joint Intent Detection and Slot Filling Based on Continual Learning Model 437. Apraxia of speech 438. Fatigue in Children With Hearing Loss 439. Gated Convolutional Neural Networks for Text Classification 440. Inception recurrent convolutional neural network for object recognition 441. Speech Signal Processing Toolkit User and Programmer Manual SPro 3.3. 442. 
Assessing the effect of visual servoing on the performance of linear microphone arrays in moving human-robot interaction scenarios 443. Santa Claus, the Tooth Fairy, and Auditory-Visual Integration: Three Phenomena in Search of Empirical Support 444. Cognitive Hearing Science: Three Memory Systems, Two Approaches, and the Ease of Language Understanding Model 445. Electrooculogram signal identification for elderly disabled using Elman network 446. Multilingual and unsupervised subword modeling for zero-resource languages 447. Studying Alignment in Spontaneous Speech via Automatic Methods: How Do Children Use Task-specific Referents to Succeed in a Collaborative Learning Activity? 448. Contrastive learning of general-purpose audio representations 449. Machine Learning Basics 450. Scene text detection and recognition: The deep learning era 451. Abnormally high water temperature prediction using LSTM deep learning model 452. Effect of exceeding compliance voltage on speech perception in cochlear implants 453. Discriminative neural clustering for speaker diarisation 454. RGAN: Rényi Generative Adversarial Network 455. Global Stock Selection with Hidden Markov Model 456. Closed-set speaker identification system based on MFCC and PNCC features combination with different fusion strategies 457. Effective computer-assisted pronunciation training based on phone-sensitive word recommendation 458. Emotion Recognition of EEG Signals Based on the Ensemble Learning Method: AdaBoost 459. On Enhancing the Accuracy of Nearest Neighbour Time Series Classifier Using Improved Shape Exchange Algorithm 460. Development of a Low Cost Device for Speech Conversion for Mute Community 461. Acoustic Classification of Bird Species 462. Efficient attention: Attention with linear complexities 463. Automated detection of mouse scratching behaviour using convolutional recurrent neural network 464. An Integrated CNN-LSTM Model for Bangla Lexical Sign Language Recognition 465. 
Neural Networks for Keyword Spotting on IoT Devices 466. Deep Learning Architectures for Medical Diagnosis 467. Data-driven detection and classification of regimes in chaotic systems via hidden markov modeling 468. A Korean named entity recognition method using bi-LSTM-CRF and masked self-attention 469. Acoustic and prosodic information for home monitoring of bipolar disorder 470. Deep Ensemble Siamese Network For Incremental Signal Classification 471. Editorial commentary: Artificial intelligence in sports medicine diagnosis needs to improve 472. Failure Prediction by Confidence Estimation of Uncertainty-Aware Dirichlet Networks 473. Spectral images based environmental sound classification using CNN with meaningful data augmentation 474. Synchrotron radiation X-ray microtomography for the visualization of intra-cochlear anatomy in human temporal bones implanted with a perimodiolar cochlear implant … 475. Author profiling and related applications 476. A Review of Plant Phenotypic Image Recognition Technology Based on Deep Learning 477. AI in Healthcare and Medical Imaging 478. The benefits of preserving residual hearing following cochlear implantation: a systematic review 479. Advance Security and Challenges with Intelligent IoT Devices 480. A Survey on Deep Reinforcement Learning for Audio-Based Applications 481. Smart Non-intrusive Device Recognition Based on Physical Methods 482. New activation functions for single layer feedforward neural network 483. Measuring the subjective cost of listening effort using a discounting task 484. Shedding Light on the Black Box: Explaining Deep Neural Network Prediction of Clinical Outcomes 485. The Role of Machine Learning Algorithms for Diagnosing Diseases 486. 1D convolutional neural networks and applications: A survey 487. gpuRIR: A python library for room impulse response simulation with GPU acceleration 488. Machine translation 489. 
An Optimized Parallel Implementation of Non-Iteratively Trained Recurrent Neural Networks 490. CIoTVID: Towards an Open IoT-Platform for Infective Pandemic Diseases such as COVID-19 491. A neural network approach for speech activity detection for Apollo corpus 492. Detection of hate speech in Arabic tweets using deep learning 493. Reconocimiento automático del habla en tareas de dominio restringido: la tarea mla 494. Covid-19 shifted patent applications toward technologies that support working from home 495. Usability Evaluation of Artificial Intelligence-Based Voice Assistants: The Case of Amazon Alexa 496. COVID-19 and Tinnitus 497. Non-autoregressive sequence-to-sequence voice conversion 498. TEAM HUB@ LT-EDI-EACL2021: Hope Speech Detection Based On Pre-trained Language Model 499. Skeleton-Based Emotion Recognition Based on Two-Stream Self-Attention Enhanced Spatial-Temporal Graph Convolutional Network 500. FMRI-based identity classification accuracy in left temporal and frontal regions predicts speaker recognition performance 501. Ising spin configurations with the deep learning method 502. Excitable speech: A politics of the performative 503. Multifunctional sensing platform based on green-synthesized silver nanostructure and microcrack architecture 504. Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning 505. A Speech-Driven 3-D Tongue Model with Realistic Movement in Mandarin Chinese 506. Smart Non-intrusive Device Recognition Based on Intelligent Multi-label Classification Methods 507. Amrita@ LT-EDI-EACL2021: Hope Speech Detection on Multilingual Text 508. End-to-End Speaker Height and age estimation using Attention Mechanism with LSTM-RNN 509. Improving adversarial robustness via channel-wise activation suppressing 510. Effortful listening under the microscope: Examining relations between pupillometric and subjective markers of effort and tiredness from listening 511. Vehicle Recognition Using CNN 512.
The folded space of machine listening 513. Digital Medical School: New Paradigms for Tomorrow’s Surgical Education 514. Collaborative Learning to Generate Audio-Video Jointly 515. Sign language recognition through Leap Motion controller and input prediction algorithm 516. Randomly wired network based on RoBERTa and dialog history attention for response selection 517. Practice and experience predict coarticulation in child speech 518. Improved Deep Learning Based Method for Molecular Similarity Searching Using Stack of Deep Belief Networks 519. Digital transformation: A multidisciplinary reflection and research agenda 520. Is the User Enjoying the Conversation? A Case Study on the Impact on the Reward Function 521. Remote Microphone System Use in Preschool Children With Autism Spectrum Disorder and Language Disorder in the Classroom: A Pilot Efficacy Study 522. Classification of thought evoked potentials for navigation and communication using multilayer neural network 523. What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure 524. Topological properties of the set of functions generated by neural networks of fixed size 525. Disaster City Digital Twin: A vision for integrating artificial and human intelligence for disaster management 526. Character-based handwritten text transcription with attention networks 527. Data, measurement, and causal inferences in machine learning: opportunities and challenges for marketing 528. Natural Language Understanding 529. RECOGNITION OF THE EMOTIONAL STATE OF RUSSIAN AND INDIAN CHILDREN WHILE LISTENING TO THEIR SPEECH BY RUSSIAN AND INDIAN … 530. Micronets: Neural network architectures for deploying tinyml applications on commodity microcontrollers 531. Aberrant COL11A1 splicing causes prelingual autosomal dominant nonsyndromic hearing loss in the DFNA37 locus 532. Engineering ai systems: A research agenda 533. 
Improvement of the prediction quality of electrical load profiles with artificial neural networks 534. A Review of Plant Phenotypic Image Recognition Technology Based on Deep Learning 535. Bag of Tricks 536. Recurrent Neural Networks 537. Simultaneous speaker identification and watermarking 538. Feature Selection Is Important: State-of-the-Art Methods and Application Domains of Feature Selection on High-Dimensional Data 539. Online support information for students with disabilities in colleges and universities during the COVID-19 pandemic 540. Effect of Carnatic Music Listening Training on Speech in Noise Performance in Adults 541. Deep Learning with Swift for TensorFlow 542. Cylinder Pressure Prediction of An HCCI Engine Using Deep Learning 543. Towards AI ingredients 544. Voice-Based Gender Identification Using qPSO Neural Network 545. Machine learning: Algorithms, real-world applications and research directions 546. The basics of machine learning 547. Stress in Parents of School-Age Children and Adolescents With Cochlear Implants 548. Employment of an electronic tongue combined with deep learning and transfer learning for discriminating the storage time of Pu-erh tea 549. Spontaneous Language Models: Techniques and Experimental Results 550. Vocal drum sounds in human beatboxing: An acoustic and articulatory exploration using electromagnetic articulography 551. Decoding imagined speech and computer control using brain waves 552. Deep Sparse Autoencoder Network for Facial Emotion Recognition 553. Benchmark and survey of automated machine learning frameworks 554. Artificial Neural Network (ANN) for Forecasting of Flood at Kasol in Satluj River, India 555. An efficient modified Hyperband and trust-region-based mode-pursuing sampling hybrid method for hyperparameter optimization 556. Bat Algorithm with Applications to Signal, Speech, and Image Processing—A Review 557.
Bearing fault diagnosis based on vibro-acoustic data fusion and 1D-CNN network 558. Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples 559. Bone-conduction hearing aid is effective in congenital oval window atresia 560. Maoqin@ DravidianLangTech-EACL2021: The Application of Transformer-Based Model 561. Adversarial Black-Box Attacks with Timing Side-Channel Leakage 562. Preventing fake information generation against media clone attacks 563. A Study on Deep Learning in Neurodegenerative Diseases and Other Brain Disorders 564. Purchase Predictive Design Using Skeleton Model and Purchase Record 565. Effects of assistive technology for students with reading and writing disabilities 566. Voice and Gesture Based App for Blind People 567. Multimodal recognition of emotions in music and language 568. A deep active learning system for species identification and counting in camera trap images 569. Hidden Markov chains and fields with observations in Riemannian manifolds 570. Assessment of Reliability and Validity of the Cochlear Implant Skills Review: A New Measure to Evaluate Cochlear Implant Users’ Device Skills and Knowledge 571. Sarcasm Detection of Media Text Using Deep Neural Networks 572. Siamese neural networks: An overview 573. Self-assessed hearing handicap in the elderly: a pilot study on Iranian population 574. Lexicon-Based Sentiment Analysis 575. Statistical guarantees for regularized neural networks 576. Graph and Convolution Recurrent Neural Networks for Protein-Compound Interaction Prediction 577. Late fusion framework for Acoustic Scene Classification using LPCC, SCMC, and log-Mel band energies with Deep Neural Networks 578. Facial Imitation Improves Emotion Recognition in Adults with Different Levels of Sub-Clinical Autistic Traits 579. Blog text quality assessment using a 3D CNN-based statistical framework 580. Information retrieval: a view from the Chinese IR community 581. 
Digital Technologies for Governance 582. Intelligibility of face-masked speech depends on speaking style: Comparing casual, clear, and emotional speech 583. Transfer learning for nonparametric classification: Minimax rate and adaptive classifier 584. Detection of False Synchronization of Stereo Image Transmission Using a Convolutional Neural Network 585. Comparison of speech outcomes using type 2b intravelar veloplasty or furlow double-opposing Z plasty for soft palate repair of patients with unilateral cleft lip and … 586. SciANN: A Keras/TensorFlow wrapper for scientific computations and physics-informed deep learning using artificial neural networks 587. Multistain segmentation of renal histology: first steps toward artificial intelligence–augmented digital nephropathology 588. Is this Enough?-Evaluation of Malayalam Wordnet 589. Speech treatment effects on narrative intelligibility in French-speaking children with dysarthria 590. Deep multi-task learning with relational attention for business success prediction 591. Analysis of Methods used to Investigate Engineering Measured Experimental Data 592. Emotional Human-Robot Interaction Systems 593. A character representation enhanced on-device Intent Classification 594. Dynamic Simulated Annealing with Adaptive Neighborhood Using Hidden Markov Model 595. Cloud-Based Federated Learning Implementation Across Medical Centers 596. Environment Transfer for Distributed Systems 597. Language Specificity of Infant-directed Speech: Speaking Rate and Word Position in Word-learning Contexts 598. LPPCNN: A Laplacian Pyramid-based Pulse Coupled Neural Network Method for Medical Image Fusion 599. A comprehensive survey of multi-view video summarization 600. Classification of Indian Languages Through Audio  



Copyright 2025 T4Tutorials. All rights reserved.