By: Prof. Fazal Rehman Shamil

List of Research Topics and Ideas on Speech Recognition for MS and Ph.D. Theses.

1. Self-training and Pre-training are Complementary for Speech Recognition
2. Development of the CUHK elderly speech recognition system for neurocognitive disorder detection using the DementiaBank corpus
3. Internal language model estimation for domain-adaptive end-to-end speech recognition
4. Using Radio Archives for Low-Resource Speech Recognition: Towards an Intelligent Virtual Assistant for Illiterate Users
5. Improving speech recognition models with small samples for air traffic control systems
6. The Accented English Speech Recognition Challenge 2020: open datasets, tracks, baselines, results and methods
7. A GDPR-compliant Ecosystem for Speech Recognition with Transfer, Federated, and Evolutionary Learning
8. Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition
9. Representation transfer learning from deep end-to-end speech recognition networks for the classification of health states from speech
10. Simplified self-attention for transformer-based end-to-end speech recognition
11. Transformer-based online speech recognition with decoder-end adaptive computation steps
12. An evaluation of word-level confidence estimation for end-to-end automatic speech recognition
13. AISpeech-SJTU accent identification system for the Accented English Speech Recognition Challenge
14. Bayesian transformer language models for speech recognition
15. Streaming models for joint speech recognition and translation
16. Data augmentation for end-to-end code-switching speech recognition
17. Efficient neural architecture search for end-to-end speech recognition via straight-through gradients
18. Learning to count words in fluent speech enables online speech recognition
19. A Further Study of Unsupervised Pretraining for Transformer Based Speech Recognition
20. Learned transferable architectures can surpass hand-designed architectures for large scale speech recognition
21. Multimodal integration for large-vocabulary audio-visual speech recognition
22. Federated Acoustic Modeling for Automatic Speech Recognition
23. Directional ASR: A new paradigm for E2E multi-speaker speech recognition with source localization
24. Semi-supervised speech recognition via graph-based temporal classification
25. Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET
26. Interacting effects of frontal lobe neuroanatomy and working memory capacity to older listeners’ speech recognition in noise
27. A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition
28. Mondegreen: A Post-Processing Solution to Speech Recognition Error Correction for Voice Search Queries
29. Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data
30. Deformable TDNN with adaptive receptive fields for speech recognition
31. Lookup-Table Recurrent Language Models for Long Tail Speech Recognition
32. Evaluation of the effectiveness and efficiency of state-of-the-art features and models for automatic speech recognition error detection
33. Comparison of speech recognition and localization ability in single-sided deaf patients implanted with different cochlear implant electrode array designs
34. Memory-Efficient Speech Recognition on Smart Devices
35. Multi-QuartzNet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion
36. Domain-aware Neural Language Models for Speech Recognition
37. Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion
38. Exploring the use of Common Label Set to Improve Speech Recognition of Low Resource Indian Languages
39. Training augmentation with TANDEM acoustic modelling in Punjabi adult speech recognition system
40. Non-autoregressive Mandarin-English Code-switching Speech Recognition with Pinyin Mask-CTC and Word Embedding Regularization
41. A bandit approach to curriculum generation for automatic speech recognition
42. Fast offline Transformer-based end-to-end automatic speech recognition for real-world applications
43. Human-robot-interaction using cloud-based speech recognition systems
44. CIF-based collaborative decoding for end-to-end contextual speech recognition
45. Mini-batch sample selection strategies for deep learning based speech recognition
46. Hierarchical Phoneme Classification for Improved Speech Recognition
47. Design and implementation of speech recognition system integrated with internet of things
48. Improving ultrasound-based multimodal speech recognition with predictive features from representation learning
49. Context-aware RNNLM Rescoring for Conversational Speech Recognition
50. Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End
51. Does neural activity in the auditory cortex predict speech recognition with CI?
52. The influence of stimulation levels on auditory thresholds and speech recognition in adult cochlear implant users
53. Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon
54. Concatenative Speech Recognition using Morphemes
55. Interactions among talker sex, masker number, and masker intelligibility in speech-on-speech recognition
56. The Performance Evaluation of Continuous Speech Recognition Based on Korean Phonological Rules of Cloud-Based Speech Recognition Open API
57. A Critical Analysis of Speech Recognition of Tamil and Malay Language Through Artificial Neural Network
58. Is Speech Recognition Software a Viable Future for Dysarthric Speakers? A Critical Review
59. Automatic Speech Recognition of Continuous Speech Signal of Gujarati Language Using Machine Learning
60. Visual Speech Recognition using VGG16 Convolutional Neural Network
62. Speech Recognition Using Spectrogram-Based Visual Features
63. A Study on Correlation between Automatic Speech Recognition Accuracy and Speech QoE
64. Implementation of The Speech Recognition System Using a real time web Server Based
65. Speech Recognition Using Neural Network for Mobile Robot Navigation
66. The presence of background noise reduces interlingual phonological competition during non-native speech recognition
67. Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks
68. Dynamic out-of-vocabulary word registration to language model for speech recognition
69. Speech recognition based on convolutional neural networks and MFCC algorithm
70. Turkish Speech Recognition Techniques and Applications of Recurrent Units (LSTM and GRU)
71. Noise Robust Speech Recognition by Integration of MLLR Adaptation and Feature Extraction for Noise Reduced Speech
72. Automatic Communication Error Detection Using Speech Recognition and Linguistic Analysis for Proactive Control of Loss of Separation
73. On Portability of Automatic Speech Recognition: A Study Case
74. Final Report on Research Grant GR/L59566 ROPA: Phonetically-featured syllables for speech recognition
75. Efficient Semantic Constraint for Speech Recognition
76. Anaphora Resolution in a speech recognition environment
77. Acceptability of collecting speech samples from the elderly via the telephone
78. Effects of Hearing Loss on School-Aged Children’s Ability to Benefit From F0 Differences Between Target and Masker Speech
79. Mismatch between objective measure and subjective perception of speech recognition in a patient with single-sided deafness and unilateral cochlear implant
80. A Speech Command Control-Based Recognition System for Dysarthric Patients Based on Deep Learning Technology
81. Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings
82. Language dialect based speech emotion recognition through deep learning techniques
83. Deep Neural Network Driven Speech Classification for Relevance Detection in Automatic Medical Documentation
84. Communication Aiding System for People with Speech Impairment
85. EEG-based Speech Activity Detection
86. Low-activity supervised convolutional spiking neural networks applied to speech commands recognition
87. Protecting gender and identity with disentangled speech representations
88. Large-Scale Self- and Semi-Supervised Learning for Speech Translation
89. Contrastive Unsupervised Learning for Speech Emotion Recognition
90. Distortion-controlled training for end-to-end reverberant speech separation with auxiliary autoencoding loss
91. NeMo Toolbox for Speech Dataset Construction
92. Prototype Of Speech Translation System For Audio Effective Communication
93. Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation
94. ADL-MVDR: All deep learning MVDR beamformer for target speech separation
95. AeGAN: Time-frequency speech denoising via generative adversarial networks
96. High Fidelity Speech Regeneration with Application to Speech Enhancement
97. Talk, Don’t Write: A Study of Direct Speech-Based Image Retrieval
98. Automatic detection of prosodic boundaries in spontaneous speech
99. Prior audio-visual learning facilitates auditory-only speech and voice-identity recognition in noisy listening conditions
100. Fused acoustic and text encoding for multimodal bilingual pretraining and speech translation
101. Fusion of mel and gammatone frequency cepstral coefficients for speech emotion recognition using deep C-RNN
102. A Noise Robust Speech Processing and Recognition Development System
103. Convolution neural network based automatic speech emotion recognition using Mel-frequency Cepstrum coefficients
104. VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
105. Analyzing Vocal Tract Parameters of Speech
106. Highland Puebla Nahuatl Speech Translation Corpus for Endangered Language Documentation
107. Using Synthetic Audio to Improve the Recognition of Out-of-Vocabulary Words in End-to-End ASR Systems
108. Speaker normalization in speech perception
109. Speech Emotion Recognition: A Review
110. Improving Convolutional Recurrent Neural Networks for Speech Emotion Recognition
111. A Noise-Robust Speech Recogniser supported by a TMS320C31 Platform
112. Supervised Machine Learning Model for Accent Recognition in English Speech Using Sequential MFCC Features
113. Streaming simultaneous speech translation with augmented memory transformer
114. Cross corpus multi-lingual speech emotion recognition using ensemble learning
115. Performance of Forced-Alignment Algorithms on Children’s Speech
116. A Speech Recognized Dynamic Word Cloud Visualization for Text Summarization
117. A Simple Method for Speaker Recognition and Speaker Verification
118. LIS-Net: An end-to-end light interior search network for speech command recognition
119. Towards unsupervised learning of speech features in the wild
120. Articulatory-to-Acoustic Conversion of Mandarin Emotional Speech Based on PSO-LSSVM
121. Chunk-Level Speech Emotion Recognition: A General Framework of Sequence-to-One Dynamic Temporal Modeling
122. Automatic Speech Emotion Recognition using Mel Frequency Cepstrum Co-efficient and Machine Learning Technique
123. Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks
124. A Deep Learning Generative Approach for Speech-to-Scene Generation
125. The roles of cognitive abilities and hearing acuity in older adults’ recognition of words taken from fast and spectrally reduced speech
126. Anti-transfer learning for task invariance in convolutional neural networks for speech processing
127. Efficient Speech to Emotion Recognition Using Convolutional Neural Network
128. Real-time pre-processing for improved feature extraction of noisy speech
129. Computational Linguistics-Based Tamil Character Recognition System for Text to Speech Conversion
130. MLT-DNet: Speech emotion recognition using 1D dilated CNN based on multi-learning trick approach
131. Reader: Speech Synthesizer and Speech Recognizer
132. Older Listeners’ Perception of Speech With Strengthened and Weakened Dynamic Pitch Cues in Background Noise
133. Audio-visual speech inpainting with deep learning
134. A Novel Approach to EEG Speech Activity Detection with Visual Stimuli and Mobile BCI
135. Teager Energy Cepstral Coefficients for Classification of Normal vs. Whisper Speech
136. Semi-supervised spoken language understanding via self-supervised speech and language model pretraining
137. Pre-trained Deep Convolution Neural Network Model With Attention for Speech Emotion Recognition
138. Unsupervised low-rank representations for speech emotion recognition
139. Class-Conditional Defense GAN Against End-To-End Speech Attacks
140. Pre-training for low resource speech-to-intent applications
141. Dilated U-net based approach for multichannel speech enhancement from First-Order Ambisonics recordings
142. Data augmenting contrastive learning of speech representations in the time domain
143. End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection
144. Progressive Co-Teaching for Ambiguous Speech Emotion Recognition
145. TransMask: A Compact and Fast Speech Separation Model Based on Transformer
146. Bionic optimization of MFCC features based on speaker fast recognition
147. Speech Perception Across The Lifespan by Means of Artificial Intelligence
148. Encoding and decoding of meaning through structured variability in intonational speech prosody
149. Introducing the Talk Markup Language (TalkML): Adding a little social intelligence to industrial speech interfaces
150. No interaction between fundamental-frequency differences and spectral region when perceiving speech in a speech background
151. 1D CNN based approach for speech emotion recognition using MFCC
152. Analysis of Emotion Recognition from Cross-lingual Speech: Arabic, English, and Urdu
153. Self-Supervised Learning for Personalized Speech Enhancement
154. Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition
155. Don’t shoot butterfly with rifles: Multi-channel continuous speech separation with early exit transformer
156. A Conditional Cycle Emotion GAN for Cross Corpus Speech Emotion Recognition
157. Speech Enhancement for Wake-Up-Word detection in Voice Assistants
158. DeepLPC: A deep learning approach to augmented Kalman filter-based single-channel speech enhancement
159. Modification of misarticulated fricative /s/ in cleft lip and palate speech
160. Implementation of audio recognition using mel frequency cepstrum coefficient and dynamic time warping in wirama praharsini
161. Bilateral and bimodal cochlear implant listeners can segregate competing speech using talker sex cues, but not spatial cues
162. A Method of Speech Signal Analysis Using Multi-level Wavelet Transform
163. Arabic Part of Speech Tagging by Using the Stanford System: Prepositions as a Case Study
164. Implementation of low-latency electrolaryngeal speech enhancement based on multi-task CLDNN
165. Coarse-to-fine speech emotion recognition based on multi-task learning
166. Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training
167. Research on Speech Changes Due to Environmental Noise
168. Unsupervised feature selection and NMF de-noising for robust Speech Emotion Recognition
169. Utterance Verification-based Dysarthric Speech Intelligibility Assessment using Phonetic Posterior
170. Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors
171. Recent developments on ESPnet toolkit boosted by Conformer
172. A Study of F0 Modification for X-Vector Based Speech Pseudonymization Across Gender
173. Study on Automatic Speech Therapy System for Patients
174. Medicare Adds Audiology, Speech-Language Pathology Codes to Temporary Telehealth Coverage
175. Subtitle Automatic Generation System using Speech to Text
176. Sequence-Level Self-Teaching Regularization
177. Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset
178. Extending a Japanese Speech-to-Gesture Dataset Towards Building a Pedagogical Agent for Second Language Learning
179. Adversarial attack and defense strategies for deep speaker recognition systems
180. Generalized RNN beamformer for target speech separation
181. Practical Speech Re-use Prevention in Voice-driven Services
182. The effect of speech and noise levels on the quality perceived by cochlear implant and normal hearing listeners
183. Individuals With Mild Cognitive Impairment and Alzheimer’s Disease Benefit From Audiovisual Speech Cues and Supportive Sentence Context
184. Clear Speech Perception: Linguistic and Cognitive Benefits
185. Phonemic restoration of interrupted locally time-reversed speech
186. VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
187. Privacy and utility of x-vector based speaker anonymization
188. Statistical corpus-based speech segmentation
189. Local discriminant preservation projection embedded ensemble learning based dimensionality reduction of speech data of Parkinson’s disease
190. Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain
191. Automated accurate speech emotion recognition system using twine shuffle pattern and iterative neighborhood component analysis techniques
192. Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit
193. Research on Implementation of User Authentication Based on Gesture Recognition of Human
194. Construction of a Large-Scale Japanese ASR Corpus on TV Recordings
195. A modified feature selection method based on metaheuristic algorithms for speech emotion recognition
196. MFFCN: Multi-layer Feature Fusion Convolution Network for Audio-visual Speech Enhancement
197. Accent and gender recognition from English language speech and audio using signal processing and deep learning
198. Speech Decomposition Based on a Hybrid Speech Model and Optimal Segmentation
199. Attention-Based Multi-Encoder Automatic Pronunciation Assessment
200. Multiresolution Cochleagram Speech Enhancement Algorithm Using Improved Deep Neural Networks with Skip Connections
201. Speaker Recognition Based on Fusion of a Deep and Shallow Recombination Gaussian Supervector
202. Transforming imagined thoughts into speech using a covariance-based subset selection method
203. Cascaded encoders for unifying streaming and non-streaming ASR
204. Cross lingual speech emotion recognition via triple attentive asymmetric convolutional neural network
205. Evaluating synthetic speech workload with oculo-motor indices: preliminary observations for Japanese speech
206. Towards more efficient DNN-based speech enhancement using quantized correlation mask
207. A Survey on Dynamic Sign Language Recognition
208. Speech stress recognition using semi-eager learning
209. Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings
210. Hybrid phonetic-neural model for correction in speech recognition systems
211. Speech spectrum analyses for estimating operator functionality
212. Lithuanian speech-to-text Transcriber
213. AI TTS Smartphone App for Communication of Speech Impaired People
214. Distributed speech separation in spatially unconstrained microphone arrays
215. Speech Processing: MFCC Based Feature Extraction Techniques-An Investigation
217. Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario
218. SapAugment: Learning a sample adaptive policy for data augmentation
219. PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing
220. Radial Basis Function Neural Network Based Speech Enhancement System Using SLANTLET Transform Through Hybrid Vector Wiener Filter
221. A data layout method suitable for workflow in a cloud computing environment with speech applications
222. Complex Spectral Mapping With Attention Based Convolution Recurrent Neural Network for Speech Enhancement
223. RNN-T models fail to generalize to out-of-domain audio: Causes and solutions
224. Wavelet feature selection of audio and imagined/vocalized EEG signals for ANN based multimodal ASR system
225. Fixed-MAML for Few Shot Classification in Multilingual Speech Emotion Recognition
226. Transfer of Learning from Vision to Touch: A Hybrid Deep Convolutional Neural Network for Visuo-Tactile 3D Object Recognition
227. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data used for AI in drug discovery
228. Symmetric Sub-graph Spatio-Temporal Graph Convolution and its application in Complex Activity Recognition
230. Voice-Based Railway Station Identification Using LSTM Approach
231. Two-Layer Fuzzy Multiple Random Forest for Speech Emotion Recognition
232. Neural Text Normalization in Speech-to-Text Systems with Rich Features
233. Feasibility of remote assessment of the binaural intelligibility level difference in school-age children
234. Deficient Basis Estimation of Noise Spatial Covariance Matrix for Rank-Constrained Spatial Covariance Matrix Estimation Method in Blind Speech Extraction
235. Speech based Depression Severity Level Classification Using a Multi-Stage Dilated CNN-LSTM Model
236. IEEE SLT 2021 Alpha-mini Speech Challenge: Open datasets, tracks, rules and baselines
237. Language Identification—A Supportive Tool for Multilingual ASR in Indian Perspective
238. Reliability and critical differences for an implementation of the coordinate response measure in speech-shaped noise
239. Impact of Visual Representation of Audio Signals for Indian Language Identification
240. Can deep learning beat numerical weather prediction?
241. Speech discrimination impairment of the worse-hearing ear in asymmetric hearing loss
242. Speech enhancement based on perceptually motivated guided spectrogram filtering
243. Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation
244. Evaluation of working memory in relation to cochlear implant consonant speech discrimination
245. Voice-controlled quantum chemistry
246. Evaluation of error- and correlation-based loss functions for multitask learning dimensional speech emotion recognition
247. Convolutive Transfer Function Invariant SDR Training Criteria for Multi-Channel Reverberant Speech Separation
248. Phone Calls Speech-to-Text: A Comparison Between APIs for the Portuguese Language
249. Federated Marginal Personalization for ASR Rescoring
250. FastTalker: A neural text-to-speech architecture with shallow and group autoregression
251. The Generalized Bayes Method for High-Dimensional Data Recognition with Applications to Audio Signal Recognition
252. Lexical stress representation in spoken word recognition
253. Probing Acoustic Representations for Phonetic Properties
254. Developing a Framework for Acquisition and Analysis of Speeches
255. Phonotactics in Spoken-Word Recognition
256. Patient Emotion Recognition in Human Computer Interaction System Based on Machine Learning Method and Interactive Design Theory
257. Identification of Food Quality Descriptors in Customer Chat Conversations using Named Entity Recognition
258. A comparative analysis of active learning for biomedical text mining
259. Towards Realizing Sign Language to Emotional Speech Conversion by Deep Learning
260. Gender Identification Over Voice Sample Using Machine Learning
261. Any-to-One Sequence-to-Sequence Voice Conversion Using Self-Supervised Discrete Speech Representations
262. Pair consensus decoding improves accuracy of neural network basecallers for nanopore sequencing
263. Reducing Spelling Inconsistencies in Code-Switching ASR Using Contextualized CTC Loss
264. Personalized speech enhancement through self-supervised data augmentation and purification
265. LSTM-convolutional-BLSTM encoder-decoder network for minimum mean-square error approach to speech enhancement
266. Mean field analysis of deep neural networks
267. BRDS: An FPGA-based LSTM Accelerator with Row-Balanced Dual-Ratio Sparsification
268. Autonomy Voice Assistant for NPAS (NASA Platform for Autonomous Systems)
269. CDPAM: Contrastive learning for perceptual audio similarity
270. Read my lips! Perception of speech in noise by preschool children with autism and the impact of watching the speaker’s face
271. Improving deep speech denoising by Noisy2Noisy signal mapping
272. MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection
273. Audio ALBERT: A lite BERT for self-supervised learning of audio representation
274. An Experimental Analysis of Deep Learning Architectures for Supervised Speech Enhancement
275. Compressive Sensing and Contourlet Transform Applications in Speech Signal
276. Depressed Patients Intelligent Recognition in Smart Home Environment
277. An integrated multi-channel approach for joint noise reduction and dereverberation
278. Detection of replay spoof speech using teager energy feature cues
279. 11 TOPS photonic convolutional accelerator for optical neural networks
280. Multi-channel adaptive loudness compensation algorithm based on noise tracking in digital hearing aids
281. How Does the Brain Represent Speech?
282. Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies
284. Adjuvant migraine medications in the treatment of sudden sensorineural hearing loss
285. Multi-channel target speech extraction with channel decorrelation and target speaker adaptation
286. Smart Non-intrusive Device Recognition Based on Deep Learning Methods
287. Cortical tracking of speech in delta band relates to individual differences in speech in noise comprehension in older adults
288. An Automatic Sound Classification Framework with Non-volatile Memory
289. Brain electrical dynamics in speech segmentation depends upon prior experience with the language
290. Discriminant Analysis of Voice Commands in the Presence of an Unmanned Aerial Vehicle
291. Transcripts and Accessibility: Student Views from Using Webinars in Built Environment Education
292. Understanding Speech Amid the Jingle and Jangle: Recommendations for Improving Measurement Practices in Listening Effort Research
293. Cross-Silo Federated Training in the Cloud with Diversity Scaling and Semi-Supervised Learning
294. ICASSP 2021 Deep Noise Suppression Challenge
295. Is talker variability a critical component of effective phonetic training for nonnative speech?
296. Progressive loss functions for speech enhancement with deep neural networks
297. AutoKWS: Keyword spotting with differentiable architecture search
298. A Review of Intelligent Smartphone-Based Object Detection Techniques for Visually Impaired People
299. Listening Effort in School-Age Children With Normal Hearing Compared to Children With Limited Useable Hearing Unilaterally
300. The Impact of Musical Training on Understanding Dysarthric Speech: A Preliminary Study of Transcription Errors
301. From perception to action using observed actions to learn gestures
302. Validation of an intelligibility assessment tool in an Indian language for perceptual speech analysis in oral cancer patients
303. Neural Network-based Virtual Microphone Estimator
304. Jointly trained transformers models for spoken language translation
305. Crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder
306. Conversational transfer learning for emotion recognition
307. Adversarial defense for automatic speaker verification by cascaded self-supervised learning models
308. Exposing Speech Transsplicing Forgery with Noise Level Inconsistency
309. Dualformer: a unified bidirectional sequence-to-sequence learning
310. The phonology of parent-child speech
311. BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers
312. SVM and GMM based Speech/music Classification using SBC
313. Spoken Language Identification in Unseen Target Domain Using Within-Sample Similarity Loss
314. Introduction to Apple ML Tools
315. Kurtosis-based, data-selective affine projection adaptive filtering algorithm for speech processing application
316. Design of Artificial Intelligence Converged Media Experimental System
317. An Evaluation into Deep Learning Capabilities, Functions and Its Analysis
318. Neural MOS prediction for synthesized speech using multi-task learning with spoofing detection and spoofing type classification
319. Deep audio-visual learning: A survey
320. Association between the auditory profile and speech-language-hearing diagnosis in children and adolescents
321. Correlation of Visual Perceptions and Extraction of Visual Articulators for Kannada Lip Reading
322. SWav 0.1 User Manual
323. Neural tracking of the speech envelope is differentially modulated by attention and language experience
324. Aging Effects on Categorical Perception of Mandarin Lexical Tones in Noise
325. Detection of heterogeneous parallel steganography for low bit-rate VoIP speech streams
326. A study on Arabic sign language recognition for differently abled using advanced machine learning classifiers
327. Active listening
328. MicAugment: One-Shot Microphone Style Transfer
329. The demand for AI skills in the labor market
330. CvT: Introducing convolutions to vision transformers
331. Dichotic listening performance with cochlear-implant simulations of ear asymmetry is consistent with difficulty ignoring clearer speech
332. Convolutional neural network
333. Audio fingerprint for automatic Balinese rindik music identification using gaussian mixture model
334. A comparative study of acoustic and linguistic features classification for Alzheimer’s disease detection
335. Semi-supervised Multichannel Speech Separation Based on a Phone- and Speaker-Aware Deep Generative Model of Speech Spectrograms
336. Facial Expression Recognition Using Kernel Entropy Component Analysis Network and DAGSVM
337. Two-Stage Fuzzy Fusion Based-Convolution Neural Network for Dynamic Emotion Recognition
338. Robustness of on-device Models: Adversarial Attack to Deep Learning Models on Android Apps
339. Adversarial attacks on audio source separation
340. Facial expression recognition based on facial part attention mechanism
341. Computer Vision-based Intelligent Bookshelf System
342. Cross-cultural emotion recognition and in-group advantage in vocal expression: A meta-analysis
343. FastPitch: Parallel text-to-speech with pitch prediction
344. Interspeech 2021 Deep Noise Suppression Challenge
345. Text classification and sentiment analysis
346. A Multi-Resolution Approach to GAN-Based Speech Enhancement
347. Arabic grapheme-to-phoneme conversion based on joint multi-gram model
348. Automatic assessment of intelligibility in speakers with dysarthria from coded telephone speech using glottal features
349. Analysis of part-of-speech tagging using a Hidden Markov Model on Al-Qur’an data
350. Perceptual integration of linguistic and non-linguistic properties of speech
351. Show and speak: Directly synthesize spoken description of images
352. Modeling the Conditional Distribution of Co-Speech Upper Body Gesture Jointly Using Conditional-GAN and Unrolled-GAN
353. An AI-Application-Oriented In-Class Teaching Evaluation Model by Using Statistical Modeling and Ensemble Learning
354. VISUALVOICE: Audio-Visual Speech Separation with Cross-Modal Consistency (Supplementary Materials)
355. Noise and acoustic modeling with waveform generator in text-to-speech and neutral speech conversion
356. Exploring Multimodal Interactions in Human-Autonomy Teaming Using a Natural User Interface
357. Unconstrained online handwritten Uyghur word recognition based on recurrent neural networks and connectionist temporal classification
358. Role of brainwaves in neural speech decoding
359. Public reasoning about voluntary assisted dying: An analysis of submissions to the Queensland Parliament, Australia
360. Cepstral Speech/Pause Detectors
361. Generating EEG features from acoustic features
362. Contextually Aware Multimodal Emotion Recognition
363. Build an app
364. The automatic detection of heart failure using speech signals
365. On the quantization of recurrent neural networks
366. Finding Answers in a Text Document
367. Hypergraph network model for nested entity mention recognition
368. IoT-Based Voice-Controlled Energy-Efficient Intelligent Traffic and Street Light Monitoring System
369. Indian Regional Spoken Language Identification Using Deep Learning Approach
370. A systematic review of hidden Markov models and their applications
371. Improved acoustic word embeddings for zero-resource languages using multilingual transfer
372. Advances in Parkinson’s Disease detection and assessment using voice and speech: A review of the articulatory and phonatory aspects
373. A Comparative Analysis of AlexNet and GoogLeNet with a Simple DCNN for Face Recognition
374. Spoken Language Dialogue Systems
376. Design Space for Voice-Based Professional Reporting
377. Cross-talker generalization during foreign-accented speech perception
378. Backdoor attack against speaker verification
379. Role Aware Multi-Party Dialogue Question Answering
380. A Survey on Deep Learning for Time-Series Forecasting
381. Reservoir computing based on a silicon microring and time multiplexing for binary and analog operations
382. Brain activations while processing degraded speech in adults with autism spectrum disorder
383. Going deeper with image transformers
384. Towards the Objective Speech Assessment of Smoking Status based on Voice Features: A Review of the Literature
385. Question answering
386. A review on basic deep learning technologies and applications
387. Self-supervised pretraining of visual features in the wild
388. SeeHear: Signer diarisation and a new dataset
389. Robust Computing for Machine Learning-Based Systems
390. Visualizing the Evolution of the AI Ecosystem
392. Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks
393. What Do We See in Them? Identifying Dimensions of Partner Models for Speech Interfaces Using a Psycholexical Approach
394. Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different languages
395. Deep-emotion: Facial expression recognition using attentional convolutional network
396. Functional impacts of aminoglycoside treatment on speech perception and extended high-frequency hearing loss in a pediatric cystic fibrosis cohort
397. Audio segmentation and speaker localization in meeting videos
398. Comparative analysis and application of LBP face image recognition algorithms
399. Lateralized Cerebral Processing of Abstract Linguistic Structure in Clear and Degraded Speech
400. DBNet: DOA-Driven Beamforming Network for end-to-end Reverberant Sound Source Separation
401. The detection of Parkinson’s disease from speech using voice source information
402. Leaky Integrator Dynamical Systems and Reachable Sets
403. Lexical and acoustic characteristics of young and older healthy adults
404. ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech
405. Villain or guardian? ‘The smart toy is watching you now…’
406. A Study on Image Analysis and Recognition Using Learning Methods: CNN as the Best Image Learner
407. Adversarially learning disentangled speech representations for robust multi-factor voice conversion
408. Short-term prediction of passenger volume for urban rail systems: A deep learning approach based on smart-card data
409. In-scalp incision technique for cochlear implantation
410. Simultaneous bilateral stapes surgery after follow-up of 13 years
411. Natural Language Processing
412. Highly sensitive ultrathin flexible thermoplastic polyurethane/carbon black fibrous film strain sensor with adjustable scaffold networks
413. Advanced Safe Home Systems using Face-Recognition with Unique Passcode Systems
414. Training with an auditory perceptual learning game transfers to speech in competition
415. Computer-based remedial training in phoneme awareness and phonological decoding: Effects on the posttraining development of word recognition
416. Sign language segmentation with temporal convolutional networks
417. Dense CNN with self-attention for time-domain speech enhancement
418. Poly Scale Space Technique for Feature Extraction in Lip Reading: A New Strategy
419. Iqra reading verification with mel frequency cepstrum coefficient and dynamic time warping
420. A memory-efficient tool for Bengali parts of speech tagging
421. Cochlear implantation in children with single-sided deafness
422. Knowledge distillation: A survey
423. NLP in Customer Service
424. A scoping review on the use, processing and fusion of geographic data in virtual assistants
425. Evolving Criteria for Adult and Pediatric Cochlear Implantation
426. Telefitting of Nucleus cochlear implants: a feasibility study
427. Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
428. Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data
429. Efferent unmasking of speech-in-noise encoding?
430. Infant-directed Speech by Dutch Fathers: Increased Pitch Variability within and across Utterances
432. PolyDL: Polyhedral Optimizations for Creation of High-performance DL Primitives
433. The 2020 Personalized Voice Trigger Challenge: Open Database, Evaluation Metrics and the Baseline Systems
434. To ban or not to ban: Bayesian attention networks for reliable hate speech detection
435. A Survey on Automatic Multimodal Emotion Recognition in the Wild
436. Joint Intent Detection and Slot Filling Based on Continual Learning Model
437. Apraxia of speech
438. Fatigue in Children With Hearing Loss
439. Gated Convolutional Neural Networks for Text Classification
440. Inception recurrent convolutional neural network for object recognition
441. Speech Signal Processing Toolkit User and Programmer Manual SPro 3.3.
442. Assessing the effect of visual servoing on the performance of linear microphone arrays in moving human-robot interaction scenarios
443. Santa Claus, the Tooth Fairy, and Auditory-Visual Integration: Three Phenomena in Search of Empirical Support
444. Cognitive Hearing Science: Three Memory Systems, Two Approaches, and the Ease of Language Understanding Model
445. Electrooculogram signal identification for elderly disabled using Elman network
446. Multilingual and unsupervised subword modeling for zero-resource languages
447. Studying Alignment in Spontaneous Speech via Automatic Methods: How Do Children Use Task-specific Referents to Succeed in a Collaborative Learning Activity?
448. Contrastive learning of general-purpose audio representations
449. Machine Learning Basics
450. Scene text detection and recognition: The deep learning era
451. Abnormally high water temperature prediction using LSTM deep learning model
452. Effect of exceeding compliance voltage on speech perception in cochlear implants
453. Discriminative neural clustering for speaker diarisation
454. RGAN: Rényi Generative Adversarial Network
455. Global Stock Selection with Hidden Markov Model
456. Closed-set speaker identification system based on MFCC and PNCC features combination with different fusion strategies
457. Effective computer-assisted pronunciation training based on phone-sensitive word recommendation
458. Emotion Recognition of EEG Signals Based on the Ensemble Learning Method: AdaBoost
459. On Enhancing the Accuracy of Nearest Neighbour Time Series Classifier Using Improved Shape Exchange Algorithm
460. Development of a Low Cost Device for Speech Conversion for Mute Community
461. Acoustic Classification of Bird Species
462. Efficient attention: Attention with linear complexities
463. Automated detection of mouse scratching behaviour using convolutional recurrent neural network
464. An Integrated CNN-LSTM Model for Bangla Lexical Sign Language Recognition
465. Neural Networks for Keyword Spotting on IoT Devices
466. Deep Learning Architectures for Medical Diagnosis
467. Data-driven detection and classification of regimes in chaotic systems via hidden Markov modeling
468. A Korean named entity recognition method using bi-LSTM-CRF and masked self-attention
469. Acoustic and prosodic information for home monitoring of bipolar disorder
470. Deep Ensemble Siamese Network For Incremental Signal Classification
471. Editorial commentary: Artificial intelligence in sports medicine diagnosis needs to improve
472. Failure Prediction by Confidence Estimation of Uncertainty-Aware Dirichlet Networks
473. Spectral images based environmental sound classification using CNN with meaningful data augmentation
474. Synchrotron radiation X-ray microtomography for the visualization of intra-cochlear anatomy in human temporal bones implanted with a perimodiolar cochlear implant …
475. Author profiling and related applications
476. A Review of Plant Phenotypic Image Recognition Technology Based on Deep Learning
477. AI in Healthcare and Medical Imaging
478. The benefits of preserving residual hearing following cochlear implantation: a systematic review
479. Advance Security and Challenges with Intelligent IoT Devices
480. A Survey on Deep Reinforcement Learning for Audio-Based Applications
481. Smart Non-intrusive Device Recognition Based on Physical Methods
482. New activation functions for single layer feedforward neural network
483. Measuring the subjective cost of listening effort using a discounting task
484. Shedding Light on the Black Box: Explaining Deep Neural Network Prediction of Clinical Outcomes
485. The Role of Machine Learning Algorithms for Diagnosing Diseases
486. 1D convolutional neural networks and applications: A survey
487. gpuRIR: A python library for room impulse response simulation with GPU acceleration
488. Machine translation
489. An Optimized Parallel Implementation of Non-Iteratively Trained Recurrent Neural Networks
490. CIoTVID: Towards an Open IoT-Platform for Infective Pandemic Diseases such as COVID-19
491. A neural network approach for speech activity detection for Apollo corpus
492. Detection of hate speech in Arabic tweets using deep learning
493. Reconocimiento automático del habla en tareas de dominio restringido: la tarea mla
494. Covid-19 shifted patent applications toward technologies that support working from home
495. Usability Evaluation of Artificial Intelligence-Based Voice Assistants: The Case of Amazon Alexa
496. COVID-19 and Tinnitus
497. Non-autoregressive sequence-to-sequence voice conversion
498. TEAM HUB@LT-EDI-EACL2021: Hope Speech Detection Based On Pre-trained Language Model
499. Skeleton-Based Emotion Recognition Based on Two-Stream Self-Attention Enhanced Spatial-Temporal Graph Convolutional Network
500. fMRI-based identity classification accuracy in left temporal and frontal regions predicts speaker recognition performance
501. Ising spin configurations with the deep learning method
502. Excitable speech: A politics of the performative
503. Multifunctional sensing platform based on green-synthesized silver nanostructure and microcrack architecture
504. Self-Supervised Text-Independent Speaker Verification Using Prototypical Momentum Contrastive Learning
505. A Speech-Driven 3-D Tongue Model with Realistic Movement in Mandarin Chinese
506. Smart Non-intrusive Device Recognition Based on Intelligent Multi-label Classification Methods
507. Amrita@LT-EDI-EACL2021: Hope Speech Detection on Multilingual Text
508. End-to-End Speaker Height and age estimation using Attention Mechanism with LSTM-RNN
509. Improving adversarial robustness via channel-wise activation suppressing
510. Effortful listening under the microscope: Examining relations between pupillometric and subjective markers of effort and tiredness from listening
511. Vehicle Recognition Using CNN
512. The folded space of machine listening
513. Digital Medical School: New Paradigms for Tomorrow’s Surgical Education
514. Collaborative Learning to Generate Audio-Video Jointly
515. Sign language recognition through Leap Motion controller and input prediction algorithm
516. Randomly wired network based on RoBERTa and dialog history attention for response selection
517. Practice and experience predict coarticulation in child speech
518. Improved Deep Learning Based Method for Molecular Similarity Searching Using Stack of Deep Belief Networks
519. Digital transformation: A multidisciplinary reflection and research agenda
520. Is the User Enjoying the Conversation? A Case Study on the Impact on the Reward Function
521. Remote Microphone System Use in Preschool Children With Autism Spectrum Disorder and Language Disorder in the Classroom: A Pilot Efficacy Study
522. Classification of thought evoked potentials for navigation and communication using multilayer neural network
523. What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure
524. Topological properties of the set of functions generated by neural networks of fixed size
525. Disaster City Digital Twin: A vision for integrating artificial and human intelligence for disaster management
526. Character-based handwritten text transcription with attention networks
527. Data, measurement, and causal inferences in machine learning: opportunities and challenges for marketing
528. Natural Language Understanding
530. Micronets: Neural network architectures for deploying tinyml applications on commodity microcontrollers
531. Aberrant COL11A1 splicing causes prelingual autosomal dominant nonsyndromic hearing loss in the DFNA37 locus
532. Engineering AI systems: A research agenda
533. Improvement of the prediction quality of electrical load profiles with artificial neural networks
535. Bag of Tricks
536. Recurrent Neural Networks
537. Simultaneous speaker identification and watermarking
538. Feature Selection Is Important: State-of-the-Art Methods and Application Domains of Feature Selection on High-Dimensional Data
539. Online support information for students with disabilities in colleges and universities during the COVID-19 pandemic
540. Effect of Carnatic Music Listening Training on Speech in Noise Performance in Adults
541. Deep Learning with Swift for TensorFlow
542. Cylinder Pressure Prediction of An HCCI Engine Using Deep Learning
543. Towards AI ingredients
544. Voice-Based Gender Identification Using qPSO Neural Network
545. Machine learning: Algorithms, real-world applications and research directions
546. The basics of machine learning
547. Stress in Parents of School-Age Children and Adolescents With Cochlear Implants
548. Employment of an electronic tongue combined with deep learning and transfer learning for discriminating the storage time of Pu-erh tea
549. Spontaneous Language Models: Techniques and Experimental Results
550. Vocal drum sounds in human beatboxing: An acoustic and articulatory exploration using electromagnetic articulography
551. Decoding imagined speech and computer control using brain waves
552. Deep Sparse Autoencoder Network for Facial Emotion Recognition
553. Benchmark and survey of automated machine learning frameworks
554. Artificial Neural Network (ANN) for Forecasting of Flood at Kasol in Satluj River, India
555. An efficient modified Hyperband and trust-region-based mode-pursuing sampling hybrid method for hyperparameter optimization
556. Bat Algorithm with Applications to Signal, Speech, and Image Processing—A Review
557. Bearing fault diagnosis based on vibro-acoustic data fusion and 1D-CNN network
558. Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples
559. Bone-conduction hearing aid is effective in congenital oval window atresia
560. Maoqin@DravidianLangTech-EACL2021: The Application of Transformer-Based Model
561. Adversarial Black-Box Attacks with Timing Side-Channel Leakage
562. Preventing fake information generation against media clone attacks
563. A Study on Deep Learning in Neurodegenerative Diseases and Other Brain Disorders
564. Purchase Predictive Design Using Skeleton Model and Purchase Record
565. Effects of assistive technology for students with reading and writing disabilities
566. Voice and Gesture Based App for Blind People
567. Multimodal recognition of emotions in music and language
568. A deep active learning system for species identification and counting in camera trap images
569. Hidden Markov chains and fields with observations in Riemannian manifolds
570. Assessment of Reliability and Validity of the Cochlear Implant Skills Review: A New Measure to Evaluate Cochlear Implant Users’ Device Skills and Knowledge
571. Sarcasm Detection of Media Text Using Deep Neural Networks
572. Siamese neural networks: An overview
573. Self-assessed hearing handicap in the elderly: a pilot study on Iranian population
574. Lexicon-Based Sentiment Analysis
575. Statistical guarantees for regularized neural networks
576. Graph and Convolution Recurrent Neural Networks for Protein-Compound Interaction Prediction
577. Late fusion framework for Acoustic Scene Classification using LPCC, SCMC, and log-Mel band energies with Deep Neural Networks
578. Facial Imitation Improves Emotion Recognition in Adults with Different Levels of Sub-Clinical Autistic Traits
579. Blog text quality assessment using a 3D CNN-based statistical framework
580. Information retrieval: a view from the Chinese IR community
581. Digital Technologies for Governance
582. Intelligibility of face-masked speech depends on speaking style: Comparing casual, clear, and emotional speech
583. Transfer learning for nonparametric classification: Minimax rate and adaptive classifier
584. Detection of False Synchronization of Stereo Image Transmission Using a Convolutional Neural Network
585. Comparison of speech outcomes using type 2b intravelar veloplasty or Furlow double-opposing Z-plasty for soft palate repair of patients with unilateral cleft lip and …
586. SciANN: A Keras/TensorFlow wrapper for scientific computations and physics-informed deep learning using artificial neural networks
587. Multistain segmentation of renal histology: first steps toward artificial intelligence–augmented digital nephropathology
588. Is this Enough?-Evaluation of Malayalam Wordnet
589. Speech treatment effects on narrative intelligibility in French-speaking children with dysarthria
590. Deep multi-task learning with relational attention for business success prediction
591. Analysis of Methods used to Investigate Engineering Measured Experimental Data
592. Emotional Human-Robot Interaction Systems
593. A character representation enhanced on-device Intent Classification
594. Dynamic Simulated Annealing with Adaptive Neighborhood Using Hidden Markov Model
595. Cloud-Based Federated Learning Implementation Across Medical Centers
596. Environment Transfer for Distributed Systems
597. Language Specificity of Infant-directed Speech: Speaking Rate and Word Position in Word-learning Contexts
598. LPPCNN: A Laplacian Pyramid-based Pulse Coupled Neural Network Method for Medical Image Fusion
599. A comprehensive survey of multi-view video summarization
600. Classification of Indian Languages Through Audio

