Last update:
Sat Jun 8 14:56:32 MDT 2024
Anonymous Table of Contents . . . . . . . . . . . 1--2 Anonymous Table of Contents . . . . . . . . . . . 3--4 L. Deng and S. Renals and M. Federico and M. Ostendorf Editorial: Expanding the Technical Reach of our Transactions . . . . . . . . . . 5--5 J. Taghia and R. Martin Objective Intelligibility Measures Based on Mutual Information for Speech Subjected to Speech Enhancement Processing . . . . . . . . . . . . . . . 6--16 Liang Lu and A. Ghoshal and S. Renals Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition . . . . . . . . . . . . . . 17--27 M. Gasic and S. Young Gaussian Processes for POMDP-Based Dialogue Manager Optimization . . . . . 28--40 I. Mezghani-Marrakchi and G. Mahe and S. Djaziri-Larbi and M. Jaidane and M. Turki-Hadj Alouane Nonlinear Audio Systems Identification Through Audio Input Gaussianization . . 41--53 J. B. Crespo and R. C. Hendriks Multizone Speech Reinforcement . . . . . 54--66 Chao Pan and Jingdong Chen and J. Benesty Performance Study of the MVDR Beamformer as a Function of the Source Incidence Angle . . . . . . . . . . . . . . . . . 67--79 Hung-yi Lee and Lin-shan Lee Improved Semantic Retrieval of Spoken Content by Document/Query Expansion with Random Walk Over Acoustic Similarity Graphs . . . . . . . . . . . . . . . . . 80--94 V. Leutnant and A. Krueger and R. Haeb-Umbach A New Observation Model in the Logarithmic Mel Power Spectral Domain for the Automatic Recognition of Noisy Reverberant Speech . . . . . . . . . . . 95--109 N. F. Chen and S. W. Tam and Wade Shen and J. P. Campbell Characterizing Phonetic Transformations and Acoustic Differences Across English Dialects . . . . . . . . . . . . . . . . 110--124 D. Markovic and K. Kowalczyk and F. Antonacci and C. Hofmann and A. Sarti and W. Kellermann Estimation of Acoustic Reflection Coefficients Through Pseudospectrum Matching . . . . . . . . . . . . . . . . 125--137 Zhiyao Duan and Jinyu Han and B. 
Pardo Multi-pitch Streaming of Harmonic Sound Mixtures . . . . . . . . . . . . . . . . 138--150 Shilin Liu and Khe Chai Sim Temporally Varying Weight Regression: A Semi-Parametric Trajectory Model for Automatic Speech Recognition . . . . . . 151--160 V. S. Tomar and R. C. Rose A Family of Discriminative Manifold Learning Algorithms and Their Application to Speech Recognition . . . 161--171 H. Doi and T. Toda and K. Nakamura and H. Saruwatari and K. Shikano Alaryngeal Speech Enhancement Based on One-to-Many Eigenvoice Conversion . . . 172--183 E. Arisoy and S. F. Chen and B. Ramabhadran and A. Sethy Converting Neural Network Language Models into Back-off Language Models for Efficient Decoding in Automatic Speech Recognition . . . . . . . . . . . . . . 184--192 C. T. Jin and N. Epain and A. Parthy Design, Optimization and Evaluation of a Dual-Radius Spherical Microphone Array 193--204 R. Mignot and G. Chardon and L. Daudet Low Frequency Interpolation of Room Impulse Responses Using Compressed Sensing . . . . . . . . . . . . . . . . 205--216 M. Senoussaoui and P. Kenny and T. Stafylakis and P. Dumouchel A Study of the Cosine Distance-Based Mean Shift for Telephone Speech Diarization . . . . . . . . . . . . . . 217--227 H. Tachibana and N. Ono and S. Sagayama Singing Voice Enhancement in Monaural Music Signals Based on Two-stage Harmonic/Percussive Sound Separation on Multiple Resolution Spectrograms . . . . 228--237 N. R. Shabtai and B. Rafaely Generalized Spherical Array Beamforming for Binaural Speech Reproduction . . . . 238--247 S. Cumani and P. Laface Factorized Sub-Space Estimation for Fast and Memory Effective $I$-vector Extraction . . . . . . . . . . . . . . . 248--259 Yuan Zeng and R. C. Hendriks Distributed Delay and Sum Beamformer for Speech Enhancement via Randomized Gossip 260--273 Zhenghua Li and Min Zhang and Wanxiang Che and Ting Liu and Wenliang Chen Joint Optimization for Chinese POS Tagging and Dependency Parsing . . . . . 
274--286 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing --- EDICS . . . 289--290 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 291--292 Anonymous Open Access . . . . . . . . . . . . . . 293--293 Anonymous [Blank page] . . . . . . . . . . . . . . B287--B288 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 289--290 Anonymous Table of contents . . . . . . . . . . . 291--292 Dehong Gao and Wenjie Li and Xiaoyan Cai and Renxian Zhang and You Ouyang Sequential Summarization: a Full View of Twitter Trending Topics . . . . . . . . 293--302 P. W. J. van Hengel and J. D. Krijnders A Comparison of Spectro-Temporal Representations of Audio Signals . . . . 303--313 I. Zitouni and Y. Benajiba Aligned-Parallel-Corpora Based Semi-Supervised Learning for Arabic Mention Detection . . . . . . . . . . . 314--324 E. Molina and A. M. Barbancho and L. J. Tardon and I. Barbancho Dissonance Reduction In Polyphonic Audio Using Harmonic Reorganization . . . . . 325--334 D. P. K. Lun and Tak-Wai Shen and K. C. Ho A Novel Expectation-Maximization Framework for Speech Enhancement in Non-Stationary Noise Environments . . . 335--346 S. Cosentino and T. H. Falk and D. McAlpine and T. Marquardt Cochlear Implant Filterbank Design and Optimization: A Simulation Study . . . . 347--353 M. Souden and K. Kinoshita and M. Delcroix and T. Nakatani Location Feature Integration for Clustering-Based Speech Separation in Distributed Microphone Arrays . . . . . 354--367 H. Kallasjoki and J. F. Gemmeke and K. J. Palomaki Estimating Uncertainty to Improve Exemplar-Based Feature Enhancement for Noise Robust Speech Recognition . . . . 368--380 T. Hasan and J. H. L. Hansen Maximum Likelihood Acoustic Factor Analysis Models for Robust Speaker Verification in Noise . . . . . . . . . 381--391 O. Schwartz and S. Gannot Speaker Tracking Using Recursive EM Algorithms . . . . . . . . . . . . . . . 392--402 Yu Tsao and S. Matsuda and C. Hori and H. Kashioka and Chin-Hui Lee A MAP-based Online Estimation Approach to Ensemble Speaker and Speaking Environment Modeling . . . . . . . . . . 403--416 Pui-Yu Hui and H. Meng Latent Semantic Analysis for Multimodal User Input With Speech and Gestures . . 417--429 J. Jensen and C. H. 
Taal Speech Intelligibility Prediction Based on Mutual Information . . . . . . . . . 430--440 A. Primavera and S. Cecchi and Junfeng Li and F. Piazza Objective and Subjective Investigation on a Novel Method for Digital Reverberator Parameters Estimation . . . 441--452 M. Speed and D. Murphy and D. Howard Modeling the Vocal Tract Transfer Function Using a $3$D Digital Waveguide Mesh . . . . . . . . . . . . . . . . . . 453--464 Hüseyin Hacıhabiboğlu Theoretical Analysis of Open Spherical Microphone Arrays for Acoustic Intensity Measurements . . . . . . . . . . . . . . 465--476 Taemin Cho and J. P. Bello On the Relative Importance of Individual Components of Chord Recognition Systems 477--492 T. Otsuka and K. Ishiguro and H. Sawada and H. G. Okuno Bayesian Nonparametrics for Microphone Array Processing . . . . . . . . . . . . 493--504 Jianjun He and Ee-Leng Tan and Woon-Seng Gan Linear Estimation Based Primary-Ambient Extraction for Stereo Audio Signals . . 505--517 S. Gonzalez and M. Brookes PEFAC --- A Pitch Estimation Algorithm Robust to High Levels of Noise . . . . . 518--530 Min Zhang and Xiangyu Duan and Wenliang Chen Bayesian Constituent Context Model for Grammar Induction . . . . . . . . . . . 531--541 Dah-Chung Chang and Fei-Tao Chu Feedforward Active Noise Control With a New Variable Tap-Length and Step-Size Filtered-X LMS Algorithm . . . . . . . . 542--555 M. McVicar and R. Santos-Rodriguez and Yizhao Ni and Tijl De Bie Automatic Chord Estimation from Audio: a Review of the State of the Art . . . . . 556--575 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing --- EDICS . . . 576--577 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 578--579 Anonymous Open Access . . . . . . . . . . . . . . 580--580 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . 
. . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 581--582 Anonymous Table of Contents . . . . . . . . . . . 583--584 Chung-Hsien Wu and Yi-Chin Huang and Chung-Han Lee and Jun-Cheng Guo Synthesis of Spontaneous Speech With Syllable Contraction Using State-Based Context-Dependent Voice Transformation 585--595 M. Airaksinen and T. Raitio and B. Story and P. Alku Quasi Closed Phase Glottal Inverse Filtering Analysis With Weighted Linear Prediction . . . . . . . . . . . . . . . 596--607 Jae-Mo Yang and Hong-Goo Kang Online Speech Dereverberation Algorithm Based on Adaptive Multichannel Linear Prediction . . . . . . . . . . . . . . . 608--619 A. Asaei and M. Golbabaee and H. Bourlard and V. Cevher Structured Sparsity Models for Reverberant Speech Separation . . . . . 620--633 R. S. Rashobh and A. W. H. Khong and Di Liu Multichannel Equalization in the KLT and Frequency Domains With Application to Speech Dereverberation . . . . . . . . . 634--646 P. Samarasinghe and T. Abhayapala and M. Poletti Wavefield Analysis Over Large Areas Using Distributed Higher Order Microphones . . . . . . . . . . . . . . 647--658 Wen-Li Wei and Chung-Hsien Wu and Jen-Chun Lin and Han Li Exploiting Psychological Factors for Interaction Style Recognition in Spoken Conversation . . . . . . . . . . . . . . 659--671 S. A. Raczyński and E. Vincent Genre-Based Music Language Modeling with Latent Hierarchical Pitman-Yor Process Allocation . . . . . . . . . . . . . . . 672--681 Dalei Wu and Wei-Ping Zhu and M. N. S. Swamy The Theory of Compressive Sensing Matching Pursuit Considering Time-domain Noise with Application to Speech Enhancement . . . . . . . . . . . . . . 682--696 T. Nanjundaswamy and K. Rose Cascaded Long Term Prediction for Enhanced Compression of Polyphonic Audio Signals . . . . . . . . . . . . . . . . 697--710 K. Audhkhasi and A. M. Zavou and P. G. Georgiou and S. S. Narayanan Theoretical Analysis of Diversity in an Ensemble of Automatic Speech Recognition Systems . . . . . . . . . 
. . . . . . . 711--726 J. Nikunen and T. Virtanen Direction of Arrival Based Spatial Covariance Model for Blind Sound Source Separation . . . . . . . . . . . . . . . 727--739 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 741--742 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 743--744 Anonymous Open Access . . . . . . . . . . . . . . 745--745 Anonymous Publish your article in IEEE Access . . 746--746 Anonymous [Blank page] . . . . . . . . . . . . . . B740 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 741--742 Anonymous Table of contents . . . . . . . . . . . 743--744 Jinyu Li and Li Deng and Yifan Gong and R. Haeb-Umbach An Overview of Noise-Robust Automatic Speech Recognition . . . . . . . . . . . 745--777 R. Sarikaya and G. E. Hinton and A. Deoras Application of Deep Belief Networks for Natural Language Understanding . . . . . 778--784 R. Serizel and M. Moonen and B. Van Dijk and J. Wouters Low-rank Approximation Based Multichannel Wiener Filter Algorithms for Noise Reduction with Application in Cochlear Implants . . . . . . . . . . . 785--799 M. Crocco and A. Trucco Design of Superdirective Planar Arrays With Sparse Aperiodic Layouts for Processing Broadband Signals via $3$-D Beamforming . . . . . . . . . . . . . . 800--815 J. R. Zapata and M. E. P. Davies and E. Gomez Multi-Feature Beat Tracking . . . . . . 816--825 A. Narayanan and Deliang Wang Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition . . . . . . . . . . . . . . 826--835 Xiaojia Zhao and Yuxuan Wang and Deliang Wang Robust Speaker Identification in Noisy and Reverberant Conditions . . . . . . . 836--845 S. Cumani and O. Plchot and P. Laface On the use of $i$-vector posterior distributions in Probabilistic Linear Discriminant Analysis . . . . . . . . . 846--857 Chung-Hsien Wu and Han-Ping Shen and Yan-Ting Yang Chinese--English Phone Set Construction for Code-Switching ASR Using Acoustic and DNN-Extracted Articulatory Features 858--862 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 863--864 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 865--866 Anonymous Open Access . . . . . . . . . . . . . . 867--867 Anonymous Publish your article in IEEE Access . . 868--868 Anonymous [Front cover] . . . . . . . . . . . . . 
C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 869--870 Anonymous Table of Contents . . . . . . . . . . . 871--872 Weibin Zhang and P. Fung Discriminatively Trained Sparse Inverse Covariance Matrices for Speech Recognition . . . . . . . . . . . . . . 873--882 Hung-yi Lee and Sz-Rung Shiang and Ching-Feng Yeh and Yun-Nung Chen and Yu Huang and Sheng-Yi Kong and Lin-shan Lee Spoken Knowledge Organization by Semantic Structuring and a Prototype Course Lecture System for Personalized Learning . . . . . . . . . . . . . . . . 883--898 L. Zão and R. Coelho and P. Flandrin Speech Enhancement with EMD and Hurst-Based Mode Selection . . . . . . . 899--911 D. Giacobello and M. G. Christensen and T. L. Jensen and M. N. Murthi and S. H. Jensen and M. Moonen Stable $1$-Norm Error Minimization Based Linear Predictors for Speech Modeling 912--922 Y. Lacouture-Parodi and E. A. P. Habets and Jingdong Chen and J. Benesty Multichannel Noise Reduction in the Karhunen--Loève Expansion Domain . . . 923--936 S. O. Sadjadi and J. H. L. Hansen Blind Spectral Weighting for Robust Speaker Identification under Reverberation Mismatch . . . . . . . . . 937--945 G. Mantena and S. Achanta and K. Prahallad Query-by-Example Spoken Term Detection using Frequency Domain Linear Prediction and Non-Segmental Dynamic Time Warping 946--955 C. Osterwise and S. L. Grant On Over-Determined Frequency Domain BSS 956--966 D. P. Jarrett and M. Taseska and E. A. P. Habets and P. A. Naylor Noise Reduction in the Spherical Harmonic Domain Using a Tradeoff Beamformer and Narrowband DOA Estimates 967--978 V. Rieser and O. Lemon and S. Keizer Natural Language Generation as Incremental Planning Under Uncertainty: Adaptive Information Presentation for Statistical Dialogue Systems . . . . . . 979--994 J. Cheer and S. J. Elliott Comments on ``Complete Parallel Narrowband Active Noise Control Systems'' . . . . . . . . . . . . . . . 
995--996 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 999--1000 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 1001--1002 Anonymous Blank page . . . . . . . . . . . . . . . B997--B998 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 999--1000 Anonymous Table of contents . . . . . . . . . . . 1001--1002 V. Arora and L. Behera Musical Source Clustering and Identification in Polyphonic Audio . . . 1003--1012 R. C. Nongpiur Design of Minimax Broadband Beamformers that are Robust to Microphone Gain, Phase, and Position Errors . . . . . . . 1013--1022 A. Venkitaraman and C. S. Seelamantula Binaural Signal Processing Motivated Generalized Analytic Signal Construction and AM--FM Demodulation . . . . . . . . 1023--1036 J. T. Geiger and F. Weninger and J. F. Gemmeke and M. Wollmer and B. Schuller and G. Rigoll Memory-Enhanced Neural Networks and NMF for Robust ASR . . . . . . . . . . . . . 1037--1046 Haiquan Zhao and Yi Yu and Shibin Gao and Xiangping Zeng and Zhengyou He Memory Proportionate APA with Individual Activation Factors for Acoustic Echo Cancellation . . . . . . . . . . . . . . 1047--1055 M. J. Gangeh and P. Fewzee and A. Ghodsi and M. S. Kamel and F. Karray Multiview Supervised Dictionary Learning in Speech Emotion Recognition . . . . . 1056--1068 Jae-Hun Choi and Joon-Hyuk Chang Dual-Microphone Voice Activity Detection Technique Based on Two-Step Power Level Difference Ratio . . . . . . . . . . . . 1069--1081 X. Alameda-Pineda and R. Horaud A Geometric Approach to Sound Source Localization from Time-Delay Estimates 1082--1095 K. Reindl and S. Meier and H. Barfuss and W. Kellermann Minimum Mutual Information-Based Linearly Constrained Broadband Signal Extraction . . . . . . . . . . . . . . . 1096--1108 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1109--1110 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 1111--1112 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . 
C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1113--1114 Anonymous Table of Contents . . . . . . . . . . . 1115--1116 M. H. Bahari and N. Dehak and H. Van hamme and L. Burget and A. M. Ali and J. Glass Non-Negative Factor Analysis of Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition . . . . 1117--1129 Guangzhao Bao and Yangfei Xu and Zhongfu Ye Learning a Discriminative Dictionary for Single-Channel Speech Separation . . . . 1130--1138 I. J. Kelly and F. M. Boland Detecting Arrivals in Room Impulse Responses With Dynamic Time Warping . . 1139--1147 M. Guldenschuh and R. de Callafon Detection of Secondary-Path Irregularities in Active Noise Control Headphones . . . . . . . . . . . . . . . 1148--1157 Sin-Horng Chen and Chiao-Hua Hsieh and Chen-Yu Chiang and Hsi-Chun Hsiao and Yih-Ru Wang and Yuan-Fu Liao and Hsiu-Min Yu Modeling of Speaking Rate Influences on Mandarin Speech Prosody and Its Application to Speaking Rate-controlled TTS . . . . . . . . . . . . . . . . . . 1158--1171 D. Comminiello and M. Scarpiniti and L. A. Azpicueta-Ruiz and J. Arenas-Garcia and A. Uncini Nonlinear Acoustic Echo Cancellation Based on Sparse Functional Link Representations . . . . . . . . . . . . 1172--1183 Wen Zhang and T. D. Abhayapala Three Dimensional Sound Field Reproduction using Multiple Circular Loudspeaker Arrays: Functional Analysis Guided Approach . . . . . . . . . . . . 1184--1194 M. Taseska and E. A. P. Habets Informed Spatial Filtering for Sound Extraction Using Distributed Microphone Arrays . . . . . . . . . . . . . . . . . 1195--1207 Mo Shen and D. Kawahara and S. Kurohashi Dependency Parse Reranking with Rich Subtree Features . . . . . . . . . . . . 1208--1218 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1221--1222 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 1223--1224 Anonymous Open Access . . . . . . . . . 
. . . . . 1225--1225 Anonymous [Blank page] . . . . . . . . . . . . . . B1219--B1220 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1221--1222 Anonymous Table of contents . . . . . . . . . . . 1223--1224 Zhibao Li and K. F. C. Yiu and S. Nordholm On the Indoor Beamformer Design With Reverberation . . . . . . . . . . . . . 1225--1235 M. B. Hawes and Wei Liu Sparse Array Design for Wideband Beamforming With Reduced Complexity in Tapped Delay-Lines . . . . . . . . . . . 1236--1247 Yi FanChiang and Cheng-Wen Wei and Yi-Le Meng and Yu-Wen Lin and Shyh-Jye Jou and Tian-Sheuan Chang Low Complexity Formant Estimation Adaptive Feedback Cancellation for Hearing Aids Using Pitch Based Processing . . . . . . . . . . . . . . . 1248--1259 S. Conan and O. Derrien and M. Aramaki and S. Ystad and R. Kronland-Martinet A Synthesis Model With Intuitive Control Capabilities for Rolling Sounds . . . . 1260--1273 C. Schuldt and P. Handel Decay Rate Estimators and Their Performance for Blind Reverberation Time Estimation . . . . . . . . . . . . . . . 1274--1284 S. Ganapathy and S. H. Mallidi and H. Hermansky Robust Feature Extraction Using Modulation Filtering of Autoregressive Models . . . . . . . . . . . . . . . . . 1285--1295 Bo Li and Khe Chai Sim A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks . . . . . . . . . . 1296--1305 E. Yilmaz and J. F. Gemmeke and H. Van hamme Noise Robust Exemplar Matching Using Sparse Representations of Speech . . . . 1306--1319 D. Schmid and G. Enzner and S. Malik and D. Kolossa and R. Martin Variational Bayesian Inference for Multichannel Dereverberation and Noise Reduction . . . . . . . . . . . . . . . 1320--1335 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1336--1337 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 1338--1339 Anonymous Open Access . . . . . . . . . . . . . . 1340--1340 Anonymous [Front cover] . . . . . . . . . . . . . 
C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1341--1342 Anonymous Table of Contents . . . . . . . . . . . 1343--1344 B. Masiero and M. Vorlander A Framework for the Calculation of Dynamic Crosstalk Cancellation Filters 1345--1354 A. Schasse and R. Martin Estimation of Subband Speech Correlations for Noise Reduction via MVDR Processing . . . . . . . . . . . . 1355--1365 Michal Novotný and Jan Rusz and Roman Čmejla and Evžen Růžička Automatic Evaluation of Articulatory Disorders in Parkinson's Disease . . . . 1366--1378 F. Lim and Wancheng Zhang and E. A. P. Habets and P. A. Naylor Robust Multichannel Dereverberation using Relaxed Multichannel Least Squares 1379--1390 S. H. Ghalehjegh and R. C. Rose Linear Regression Based Acoustic Adaptation for the Subspace Gaussian Mixture Model . . . . . . . . . . . . . 1391--1402 J. Botts and L. Savioja Spectral and Pseudospectral Properties of Finite Difference Models Used in Audio and Room Acoustics . . . . . . . . 1403--1412 Yong Xiang and I. Natgunanathan and Song Guo and Wanlei Zhou and S. Nahavandi Patchwork-Based Audio Watermarking Method Robust to De-synchronization Attacks . . . . . . . . . . . . . . . . 1413--1423 I. V. McLoughlin Super-Audible Voice Activity Detection 1424--1433 A. Alinaghi and P. J. Jackson and Qingju Liu and Wenwu Wang Joint Mixing Vector and Binaural Model Based Stereo Source Separation . . . . . 1434--1448 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1451--1452 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 1453--1454 Anonymous Open Access . . . . . . . . . . . . . . 1455--1455 Anonymous Together, we are advancing technology 1456--1456 Anonymous [Blank page] . . . . . . . . . . . . . . B1449--B1450 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . 
. . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1451--1452 Anonymous Table of contents . . . . . . . . . . . 1453--1454 Liheng Zhao and J. Benesty and Jingdong Chen Design of Robust Differential Microphone Arrays . . . . . . . . . . . . . . . . . 1455--1466 P. Jain and R. B. Pachori Event-Based Method for Instantaneous Fundamental Frequency Estimation from Voiced Speech Based on Eigenvalue Decomposition of the Hankel Matrix . . . 1467--1482 Y. Vaizman and B. McFee and G. Lanckriet Codebook-Based Audio Feature Representation for Music Information Retrieval . . . . . . . . . . . . . . . 1483--1493 O. Nadiri and B. Rafaely Localization of Multiple Speakers under High Reverberation using a Spherical Microphone Array and the Direct-Path Dominance Test . . . . . . . . . . . . . 1494--1505 Zhizheng Wu and T. Virtanen and Eng Siong Chng and Haizhou Li Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion . . . . . . . . . . . . . . . 1506--1521 D. S. Talagala and Wen Zhang and T. D. Abhayapala Efficient Multi-Channel Adaptive Room Compensation for Spatial Soundfield Reproduction Using a Modal Decomposition 1522--1532 O. Abdel-Hamid and A.-R. Mohamed and Hui Jiang and Li Deng and G. Penn and Dong Yu Convolutional Neural Networks for Speech Recognition . . . . . . . . . . . . . . 1533--1545 S. Koyama and K. Furuya and Y. Hiwasaki and Y. Haneda and Y. Suzuki Wave Field Reconstruction Filtering in Cylindrical Harmonic Domain for With-Height Recording and Reproduction 1546--1557 Chia-Ping Chen and Yi-Chin Huang and Chung-Hsien Wu and Kuan-De Lee Polyglot Speech Synthesis Based on Cross-Lingual Frame Selection Using Auditory and Articulatory Features . . . 1558--1570 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1571--1572 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 1573--1574 Anonymous Open Access . . . . . . . . . . 
. . . . 1575--1575 Anonymous Together, we are advancing technology 1576--1576 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1577--1578 Anonymous Table of Contents . . . . . . . . . . . 1579--1580 Jian Xu and Zhi-Jie Yan and Qiang Huo An Unsupervised Adaptation Approach to Leveraging Feedback Loop Data by Using $i$-Vector for Data Clustering and Selection . . . . . . . . . . . . . . . 1581--1589 S. Cumani and P. Laface Large-Scale Training of Pairwise Support Vector Machines for Speaker Recognition 1590--1600 Jun Du and Qiang Huo An Improved VTS Feature Compensation using Mixture Models of Distortion and IVN Training for Noisy Speech Recognition . . . . . . . . . . . . . . 1601--1611 M. Togami and Y. Kawaguchi Simultaneous Optimization of Acoustic Echo Reduction, Speech Dereverberation, and Noise Reduction against Mutual Interference . . . . . . . . . . . . . . 1612--1623 J. Lorente and M. Ferrer and M. de Diego and A. Gonzalez GPU Implementation of Multichannel Adaptive Algorithms for Local Active Noise Control . . . . . . . . . . . . . 1624--1635 T. Helie Simulation of Fractional-Order Low-Pass Filters . . . . . . . . . . . . . . . . 1636--1647 B. Defraene and T. van Waterschoot and M. Diehl and M. Moonen Embedded-Optimization-Based Loudspeaker Precompensation Using a Hammerstein Loudspeaker Model . . . . . . . . . . . 1648--1659 Guangsen Wang and Khe Chai Sim Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition . . . . . . . . . . . 1660--1669 R. Badeau and M. D. Plumbley Multichannel High-Resolution NMF for Modeling Convolutive Mixtures of Non-Stationary Signals in the Time-Frequency Domain . . . . . . . . . 1670--1680 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1683--1684 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 1685--1686 Anonymous [Blank page] . . . . . . . . . . . . . . B1681--B1682 Anonymous Front Cover . . . . . . . . . . . . . . 
C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1683--1685 Deng Farewell editorial: Keeping up the momentum of innovations . . . . . . . . 1687--1687 S. H. Yella and H. Bourlard Overlapping Speech Detection Using Long-Term Conversational Features for Speaker Diarization in Meeting Room Conversations . . . . . . . . . . . . . 1688--1700 R. K. Chivukula and Y. A. Reznik and Yanyan Hu and V. Devarajan and M. Jayendra-Lakshman Fast Algorithms for Low-Delay TDAC Filterbanks in MPEG-4 AAC--ELD . . . . . 1701--1712 Shaofei Xue and O. Abdel-Hamid and Hui Jiang and Lirong Dai and Qingfeng Liu Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition . . . . . . . . . . . . . . 1713--1725 M. E. P. Davies and P. Hamel and K. Yoshii and M. Goto AutoMashUpper: Automatic Creation of Multi-Song Music Mashups . . . . . . . . 1726--1737 Chao Weng and D. L. Thomson and P. Haffner and B.-H. F. Juang Latent Semantic Rational Kernels for Topic Spotting on Conversational Speech 1738--1749 N. Wachowski and M. R. Azimi-Sadjadi Detection and Classification of Nonstationary Transient Signals Using Sparse Approximations and Bayesian Networks . . . . . . . . . . . . . . . . 1750--1764 G. Percival and G. Tzanetakis Streamlined Tempo Estimation Based on Autocorrelation and Cross-correlation With Pulses . . . . . . . . . . . . . . 1765--1776 A. Barkefors and M. Sternad and L.-J. Brannmark Design and Analysis of Linear Quadratic Gaussian Feedforward Controllers for Active Noise Control . . . . . . . . . . 1777--1791 M. Cobos and J. J. Perez-Solano and S. Felici-Castell and J. Segura and J. M. Navarro Cumulative-Sum-Based Localization of Sound Events in Low-Cost Wireless Acoustic Sensor Networks . . . . . . . . 1792--1802 V. Tourbabin and B. Rafaely Theoretical Framework for the Optimization of Microphone Array Configuration for Humanoid Robot Audition . . . . . . . . . . . . . . . . 1803--1814 Y. Zakharov and V. H. 
Nascimento Sliding-Window RLS Low-Cost Implementation of Proportionate Affine Projection Algorithms . . . . . . . . . 1815--1824 S. D'Angelo and V. Valimaki Generalized Moog Ladder Filter: Part I --- Linear Analysis and Parameterization 1825--1832 Na Yang and He Ba and Weiyang Cai and I. Demirkol and W. Heinzelman BaNa: a Noise Resilient Fundamental Frequency Detection Algorithm for Speech and Music . . . . . . . . . . . . . . . 1833--1848 Yuxuan Wang and A. Narayanan and Deliang Wang On Training Targets for Supervised Speech Separation . . . . . . . . . . . 1849--1858 Ling-Hui Chen and Zhen-Hua Ling and Li-Juan Liu and Li-Rong Dai Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training . . . . . . . . . . . . . . . . 1859--1872 S. D'Angelo and V. Valimaki Generalized Moog Ladder Filter: Part II --- Explicit Nonlinear Model through a Novel Delay-Free Loop Implementation Method . . . . . . . . . . . . . . . . . 1873--1883 Z. Rafii and Zhiyao Duan and B. Pardo Combining Rhythm-Based and Pitch-Based Methods for Background and Melody Separation . . . . . . . . . . . . . . . 1884--1893 J. Ramo and V. Valimaki and B. Bank High-Precision Parallel Graphic Equalizer . . . . . . . . . . . . . . . 1894--1904 Y. Panagakis and C. L. Kotropoulos and G. R. Arce Music Genre Classification via Joint Sparse Low-Rank Representation of Audio Features . . . . . . . . . . . . . . . . 1905--1917 A. Maezawa and K. Itoyama and K. Yoshii and H. G. Okuno Nonparametric Bayesian Dereverberation of Power Spectrograms Based on Infinite-Order Autoregressive Processes 1918--1930 M. Krawczyk and T. Gerkmann STFT Phase Reconstruction in Voiced Speech for an Improved Single-Channel Speech Enhancement . . . . . . . . . . . 1931--1940 V. Khanagha and K. Daoudi and H. M. Yahia Detection of Glottal Closure Instants Based on the Microcanonical Multiscale Formalism . . . . . . . . . . . . . . . 1941--1950 A. Venturini and L. Zao and R. 
Coelho On speech features fusion, $ \alpha $-integration Gaussian modeling and multi-style training for noise robust speaker classification . . . . . . . . . 1951--1964 P. Foster and M. Mauch and S. Dixon Sequential Complexity as a Descriptor for Musical Similarity . . . . . . . . . 1965--1977 Gang Liu and J. H. L. Hansen An Investigation into Back-end Advancements for Speaker Recognition in Multi-Session and Noisy Enrollment Scenarios . . . . . . . . . . . . . . . 1978--1992 Jitong Chen and Yuxuan Wang and Deliang Wang A Feature Study for Classification-Based Speech Separation at Low Signal-to-Noise Ratios . . . . . . . . . . . . . . . . . 1993--2002 J. van Mourik and D. Murphy Explicit Higher-Order FDTD Schemes for $3$D Room Acoustic Simulation . . . . . 2003--2011 Pei Chee Yong and S. Nordholm and Hai Huyen Dam Effective Binaural Multi-Channel Processing Algorithm for Improved Environmental Presence . . . . . . . . . 2012--2024 A. Chen and M. A. Hasegawa-Johnson Mixed Stereo Audio Classification Using a Stereo-Input Mixed-to-Panned Level Feature . . . . . . . . . . . . . . . . 2025--2033 Gongping Huang and J. Benesty and Tao Long and Jingdong Chen A Family of Maximum SNR Filters for Noise Reduction . . . . . . . . . . . . 2034--2047 Su Yan and Xiaojun Wan SRRank: Leveraging Semantic Roles for Extractive Multi-Document Summarization 2048--2058 H. Tachibana and N. Ono and H. Kameoka and S. Sagayama Harmonic/Percussive Sound Separation Based on Anisotropic Smoothness of Spectrograms . . . . . . . . . . . . . . 2059--2073 J. M. Gil-Cacho and T. van Waterschoot and M. Moonen and S. H. Jensen A Frequency-Domain Adaptive Filter (FDAF) Prediction Error Method (PEM) Framework for Double-Talk-Robust Acoustic Echo Cancellation . . . . . . . 2074--2086 Qi Wang and W. L. Woo and S. S. Dlay Informed Single-Channel Speech Separation Using HMM--GMM User-Generated Exemplar Source . . . . . . . . . . . . 2087--2100 D. Erro and T.-C. Zorila and Y. 
Stylianou Enhancing the Intelligibility of Statistically Generated Synthetic Speech by Means of Noise-Independent Modifications . . . . . . . . . . . . . 2101--2111 Yi Jiang and Deliang Wang and Runsheng Liu and ZhenMing Feng Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks . . . . . . . . . . . . . . . . 2112--2121 Li Su and Hsin-Ming Lin and Yi-Hsuan Yang Sparse Modeling of Magnitude and Phase-Derived Spectra for Playing Technique Classification . . . . . . . . 2122--2132 V. V. Reddy and A. W. H. Khong and Boon Poh Ng Unambiguous Speech DOA Estimation Under Spatial Aliasing Conditions . . . . . . 2133--2145 A. Mohammadi and S. S. Sarfjoo and C. Demiroglu Eigenvoice Speaker Adaptation with Minimal Data for Statistical Speech Synthesis Systems Using a MAP Approach and Nearest-Neighbors . . . . . . . . . 2146--2157 Kun Han and Deliang Wang Neural Network Based Pitch Tracking in Very Noisy Speech . . . . . . . . . . . 2158--2168 Yongsheng Mu and Peifeng Ji and Wei Ji and Ming Wu and Jun Yang Modeling and Compensation for the Distortion of Parametric Loudspeakers Using a One-Dimension Volterra Filter 2169--2181 O. Thiergart and M. Taseska and E. A. P. Habets An Informed Parametric Spatial Filter Based on Instantaneous Direction-of-Arrival Estimates . . . . . 2182--2196 J. F. Santos and T. H. Falk Updating the SRMR--CI Metric for Improved Intelligibility Prediction for Cochlear Implant Users . . . . . . . . . 2197--2206 Seon Man Kim and Hong Kook Kim Direction-of-Arrival Based SNR Estimation for Dual-Microphone Speech Enhancement . . . . . . . . . . . . . . 2207--2217 T. Otsuka and K. Ishiguro and T. Yoshioka and H. Sawada and H. G. Okuno Multichannel Sound Source Dereverberation and Separation for Arbitrary Number of Sources Based on Bayesian Nonparametrics . . . . . . . . 2218--2232 J. Traa and P. Smaragdis Multichannel Source Separation and Tracking With RANSAC and Directional Statistics . . . . . . . . . . . . . . . 
2233--2243 Weifeng Li and Longbiao Wang and Yicong Zhou and J. Dines and M. Magimai-Doss and H. Bourlard and Qingmin Liao Feature Mapping of Multiple Beamformed Sources for Robust Overlapping Speech Recognition Using a Microphone Array . . 2244--2255 Y. FanChiang and C.-W. Wei and Y.-L. Meng and Y.-W. Lin and S.-J. Jou and T.-S. Chang Correction to "Low Complexity Formant Estimation Adaptive Feedback Cancellation for Hearing Aids Using Pitch Based Processing" [Aug 14 1248--1259] . . . . . . . . . . . . . . 2256--2256 Anonymous List of Reviewers . . . . . . . . . . . 2257--2259 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 2260--2261 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Information for Authors . . . . . . . . . . . . . . . . 2262--2263 Anonymous 2014 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 22 . . . . . . . . . . . . . . . . 2264--2288 Anonymous [Blank page] . . . . . . . . . . . . . . B1686 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1--2 Anonymous Table of contents . . . . . . . . . . . 3--4 H Li Inaugural Editorial: Embracing New Opportunities for Growth . . . . . . . . 5--6 Yong Xu and Jun Du and Li-Rong Dai and Chin-Hui Lee A Regression Approach to Speech Enhancement Based on Deep Neural Networks . . . . . . . . . . . . . . . . 7--19 H. Phan and M. Maas and R. Mazur and A. Mertins Random Regression Forests for Acoustic Event Detection and Classification . . . 20--31 Yuntao Wu and L. Amir and J. R. Jensen and Guisheng Liao Joint Pitch and DOA Estimation Using the ESPRIT Method . . . . . . . . . . . . . 32--45 R. Decorsiere and P. L. Søndergaard and E. N. MacDonald and T. Dau Inversion of Auditory Spectrograms, Traditional Spectrograms, and Other Envelope Representations . . . . . . . . 46--56 J. Poignant and L. Besacier and G. Quénot Unsupervised Speaker Identification in TV Broadcast Based on Written Names . . 57--68 Renjie Tong and Yingyue Zhou and Long Zhang and Guangzhao Bao and Zhongfu Ye A Robust Time-Frequency Decomposition Model for Suppression of Mixed Gaussian-Impulse Noise in Audio Signals 69--79 S. Ahani and S. Ghaemmaghami and Z. J. Wang A Sparse Representation-Based Wavelet Domain Speech Steganography Method . . . 80--91 A. Narayanan and Deliang Wang Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training 92--101 Rongfeng Su and Xunying Liu and Lan Wang Automatic Complexity Control of Generalized Variable Parameter HMMs for Noise Robust Speech Recognition . . . . 102--114 Zixing Zhang and E. Coutinho and Jun Deng and B. Schuller Cooperative Learning and its Application to Emotion Recognition from Speech . . . 115--126 Pei-hao Su and Chuan-hsun Wu and Lin-shan Lee A Recursive Dialogue Game for Personalized Computer-Aided Pronunciation Training . . . . . . . . . 127--141 A. Rakotomamonjy and G. 
Gasso Histogram of Gradients of Time--Frequency Representations for Audio Scene Classification . . . . . . . 142--153 S. A. Khoubrouy and I. M. S. Panahi and J. H. L. Hansen Howling Detection in Hearing Aids Based on Generalized Teager--Kaiser Operator 154--161 J. B. B. Nielsen and J. Nielsen and J. Larsen Perception-Based Personalization of Hearing Aids Using Gaussian Processes and Active Learning . . . . . . . . . . 162--173 J. R. Jensen and M. G. Christensen and J. Benesty and S. H. Jensen Joint Spatio-Temporal Filtering Methods for DOA and Fundamental Frequency Estimation . . . . . . . . . . . . . . . 174--185 J. Jensen and Zheng-Hua Tan Minimum Mean-Square Error Estimation of Mel-Frequency Cepstral Features --- A Theoretically Consistent Approach . . . 186--197 C.-D. Martinez-Hinarejos and J.-M. Benedi and V. Tamarit Unsegmented Dialogue Act Annotation and Decoding With $N$-Gram Transducers . . . 198--211 Lin Wang and Zhe Chen and Fuliang Yin A Novel Hierarchical Decomposition Vector Quantization Method for High-Order LPC Parameters . . . . . . . 212--221 Anonymous [Blank page] . . . . . . . . . . . . . . 222--222 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 223--224 Anonymous Information for Authors . . . . . . . . 225--226 Anonymous Open Access . . . . . . . . . . . . . . 227--227 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 223--224 Anonymous Table of contents . . . . . . . . . . . 225--226 Guang Hua and J. Goh and V. L. L. Thing Time-Spread Echo-Based Audio Watermarking With Optimized Imperceptibility and Robustness . . . . 227--239 O. Schwartz and S. Gannot and E. A. P. Habets Multi-Microphone Speech Dereverberation and Noise Reduction Using Relative Early Transfer Functions . . . . . . . . . . . 240--251 E. Molina and L. J. Tardon and A. M. Barbancho and I. Barbancho SiPTH: Singing Transcription Based on Hysteresis Defined on the Pitch-Time Curve . . . . . . . . . . . . . . . . . 252--263 Haipeng Wang and Tan Lee and Cheung-Chi Leung and Bin Ma and Haizhou Li Acoustic Segment Modeling with Spectral Clustering Methods . . . . . . . . . . . 264--277 V. Arora and L. Behera Multiple F0 Estimation and Source Clustering of Polyphonic Music Audio Using PLCA and HMRFs . . . . . . . . . . 278--287 R. Sugiura and Y. Kamamoto and N. Harada and H. Kameoka and T. Moriya Resolution Warped Spectral Representation for Low-Delay and Low-Bit-Rate Audio Coder . . . . . . . . 288--299 Chao Weng and B.-H. F. Juang Discriminative Training Using Non-Uniform Criteria for Keyword Spotting on Spontaneous Speech . . . . . 300--312 Y. Matsuyama and A. Saito and S. Fujie and T. Kobayashi Automatic Expressive Opinion Sentence Generation for Enjoyable Conversational Systems . . . . . . . . . . . . . . . . 313--326 P. N. Petkov and W. B. Kleijn Spectral Dynamics Recovery for Enhanced Speech Intelligibility in Noise . . . . 327--338 E. Bicici and D. Yuret Optimizing Instance Selection for Statistical Machine Translation with Feature Decay Algorithms . . . . . . . . 339--350 Mengqiu Zhang and R. A. Kennedy and T. D. 
Abhayapala Empirical Determination of Frequency Representation in Spherical Harmonics-Based HRTF Functional Modeling 351--360 Zu-Ren Feng and Qing Zhou and Jun Zhang and Ping Jiang and Xue-Wen Yang A Target Guided Subband Filter for Acoustic Event Detection in Noisy Environments Using Wavelet Packets . . . 361--372 N. Hirayama and K. Yoshino and K. Itoyama and S. Mori and H. G. Okuno Automatic Speech Recognition for Mixed Dialect Utterances by Mixing Dialect Language Models . . . . . . . . . . . . 373--382 A. Schasse and T. Gerkmann and R. Martin and W. Sorgel and T. Pilgrim and H. Puder Two-Stage Filter-Bank System for Improved Single-Channel Noise Reduction in Hearing Aids . . . . . . . . . . . . 383--393 B. Schwartz and S. Gannot and E. A. P. Habets Online Speech Dereverberation Using Kalman Filter and EM Algorithm . . . . . 394--406 B. Gerazov and Z. Ivanovski Kernel Power Flow Orientation Coefficients for Noise-Robust Speech Recognition . . . . . . . . . . . . . . 407--419 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 420--421 Anonymous Information for Authors . . . . . . . . 422--423 Anonymous Open Access . . . . . . . . . . . . . . 424--424 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page --- back cover] . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 425--426 H. Li and M. Federico and X. He and H. Meng and I. Trancoso Introduction to the Special Section on Continuous Space and Related Methods in Natural Language Processing . . . . . . 427--430 H. Adel and Ngoc Thang Vu and K. Kirchhoff and D. Telaar and T. Schultz Syntactic and Semantic Features For Code-Switching Factored Language Models 431--440 Xiaodong Zeng and D. F. Wong and L. S. Chao and I. Trancoso Graph-Based Lexicon Regularization for PCFG With Latent Annotations . . . . . . 441--450 Wenliang Chen and Min Zhang and Yue Zhang Distributed Feature Representations for Dependency Parsing . . . . . . . . . . . 451--460 Ruiji Fu and Jiang Guo and Bing Qin and Wanxiang Che and Haifeng Wang and Ting Liu Learning Semantic Hierarchies: a Continuous Vector Space Approach . . . . 461--471 R. E. Banchs and L. F. D'Haro and Haizhou Li Adequacy--Fluency Metrics: Evaluating MT in the Continuous Space Model Framework 472--482 Deyi Xiong and Min Zhang and Xing Wang Topic-Based Coherence Modeling for Statistical Machine Translation . . . . 483--493 B. Hutchinson and M. Ostendorf and M. Fazel A Sparse Plus Low-Rank Exponential Language Model for Limited Resource Scenarios . . . . . . . . . . . . . . . 494--504 M. A. A. Rashwan and A. A. Al Sallab and H. M. Raafat and A. Rafea Deep Learning Framework with Confused Sub-Set Resolution Architecture for Automatic Arabic Diacritization . . . . 505--516 M. Sundermeyer and H. Ney and R. Schluter From Feedforward to Recurrent LSTM Neural Networks for Language Modeling 517--529 G. Mesnil and Y. Dauphin and Kaisheng Yao and Y. Bengio and Li Deng and D. Hakkani-Tur and Xiaodong He and L. Heck and G. Tur and Dong Yu and G. Zweig Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding 530--539 I. McLoughlin and Haomin Zhang and Zhipeng Xie and Yan Song and Wei Xiao Robust Sound Event Classification Using Deep Neural Networks . . . . . . . . . . 540--552 D. 
Zahoransky and I. Polasek Text Search of Surnames in Some Slavic and Other Morphologically Rich Languages Using Rule Based Phonetic Algorithms . . 553--563 Yow-Bang Wang and Lin-shan Lee Supervised Detection and Unsupervised Discovery of Pronunciation Error Patterns for Computer-Assisted Language Learning . . . . . . . . . . . . . . . . 564--579 T. Nakashika and T. Takiguchi and Y. Ariki Voice Conversion Using RNN Pre-Trained by Recurrent Temporal Restricted Boltzmann Machines . . . . . . . . . . . 580--587 N. Obin and P. Lanchantin Symbolic Modeling of Prosody: From Linguistics to Statistics . . . . . . . 588--599 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 601--602 Anonymous Information for Authors . . . . . . . . 603--604 Anonymous IEEE Member Digital Library . . . . . . 606--606 Anonymous Blank page . . . . . . . . . . . . . . . B600 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 601--602 Anonymous Table of Contents . . . . . . . . . . . 603--604 Langzhou Chen and N. Braunschweiler and M. J. F. Gales Speaker and Expression Factorization for Audiobook Data: Expressiveness and Transplantation . . . . . . . . . . . . 605--618 Xinjie Zhou and Xiaojun Wan and Jianguo Xiao CLOpinionMiner: Opinion Target Extraction in a Cross-Language Scenario 619--630 Pan Zhou and Hui Jiang and Li-Rong Dai and Yu Hu and Qing-Feng Liu State-Clustering Based Multiple Deep Neural Networks Modeling Approach for Speech Recognition . . . . . . . . . . . 631--642 Ying Hu and Guizhong Liu Separation of Singing Voice Using Nonnegative Matrix Partial Co-Factorization for Singer Identification . . . . . . . . . . . . . 643--653 D. Kitamura and H. Saruwatari and H. Kameoka and Yu. Takahashi and K. Kondo and S. Nakamura Multichannel Signal Separation Combining Directional Clustering and Nonnegative Matrix Factorization with Spectrogram Restoration . . . . . . . . . . . . . . 654--669 Van-Khanh Mai and D. Pastor and A. Aissa-El-Bey and R. Le-Bidan Robust Estimation of Non-Stationary Noise Power Spectrum for Speech Enhancement . . . . . . . . . . . . . . 670--682 E. Blanco and D. Moldovan A Semantic Logic-Based Approach to Determine Textual Similarity . . . . . . 683--693 Myung Jong Kim and Younggwan Kim and Hoirin Kim Automatic Intelligibility Assessment of Dysarthric Speech Using Phonologically-Structured Sparse Linear Model . . . . . . . . . . . . . . . . . 694--704 G. Aneeja and B. Yegnanarayana Single Frequency Filtering Approach for Discriminating Speech and Nonspeech . . 705--717 A. Deleforge and R. Horaud and Y. Y. Schechner and L. Girin Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression . . . . . . . 718--731 D. Dov and R. Talmon and I. Cohen Audio-Visual Voice Activity Detection Using Diffusion Maps . . . . . . . . . . 732--745 M. Habibi and A. 
Popescu-Belis Keyword Extraction and Clustering for Document Recommendation in Conversations 746--759 N. Mamun and W. A. Jassim and M. S. A. Zilany Prediction of Speech Intelligibility Using a Neurogram Orthogonal Polynomial Measure (NOPM) . . . . . . . . . . . . . 760--773 E. De Sena and N. Antonello and M. Moonen and T. van Waterschoot On the Modeling of Rectangular Geometries in Room Acoustic Simulations 774--786 Hao Huang and Haihua Xu and Xianhui Wang and W. Silamu Maximum F1-Score Discriminative Training Criterion for Automatic Mispronunciation Detection . . . . . . . . . . . . . . . 787--797 Chung-Che Wang and J.-S. R. Jang Improving Query-by-Singing/Humming by Combining Melody and Lyric Information 798--806 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 807--808 Anonymous Information for Authors . . . . . . . . 809--810 Anonymous IEEE Member Digital Library . . . . . . 812--812 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 813--814 Anonymous Table of Contents . . . . . . . . . . . 815--816 F. Krebs and A. Holzapfel and A. T. Cemgil and G. Widmer Inferring Metrical Structure in Music Using Particle Filters . . . . . . . . . 817--827 Janghoon Cho and C. D. Yoo Underdetermined Convolutive BSS: Bayes Risk Minimization Based on a Mixture of Super-Gaussian Posterior Approximation 828--839 Hao Mu and Woon-Seng Gan and Ee-Leng Tan An Objective Analysis Method for Perceptual Quality of a Virtual Bass System . . . . . . . . . . . . . . . . . 840--850 R. C. Hendriks and J. B. Crespo and J. Jensen and C. H. Taal Optimal Near-End Speech Intelligibility Improvement Incorporating Additive Noise and Late Reverberation Under an Approximation of the Short-Time SII . . 851--862 A. H. Abdelaziz and S. Zeiler and D. Kolossa Learning Dynamic Stream Weights For Coupled-HMM-Based Audio-Visual Speech Recognition . . . . . . . . . . . . . . 863--876 R. Berkun and I. Cohen and J. Benesty Combined Beamformers for Robust Broadband Regularized Superdirective Beamforming . . . . . . . . . . . . . . 877--886 J. Breebaart Evaluation of Statistical Inference Tests Applied to Subjective Audio Quality Data With Small Sample Size . . 887--897 M. Zivanović Harmonic Bandwidth Companding for Separation of Overlapping Harmonics in Pitched Signals . . . . . . . . . . . . 898--908 Jen-Tzung Chien Laplace Group Sensing for Acoustic Models . . . . . . . . . . . . . . . . . 909--922 Ying Wei and Yinfeng Wang Design of Low Complexity Adjustable Filter Bank for Personalized Hearing Aid Solutions . . . . . . . . . . . . . . . 923--931 A. Perez-Carrillo and M. M. Wanderley Indirect Acquisition of Violin Instrumental Controls from Audio Signal with Hidden Markov Models . . . . . . . 932--940 A. Mansikkaniemi and M. Kurimo Adaptation of Morph-Based Speech Recognition for Foreign Names and Acronyms . . . . . . . . . . . . . . . . 
941--950 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 953--954 Anonymous Information for Authors . . . . . . . . 955--956 Anonymous Open Access . . . . . . . . . . . . . . 957--957 Anonymous Blank page . . . . . . . . . . . . . . . B951--B952 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 953--954 Anonymous Table of Contents . . . . . . . . . . . 955--956 Shih-Hung Liu and Kuan-Yu Chen and B. Chen and Hsin-Min Wang and Hsu-Chun Yen and Wen-Lian Hsu Combining Relevance Language Modeling and Clarity Measure for Extractive Speech Summarization . . . . . . . . . . 957--969 M. Niedzwiecki and M. Ciolek and K. Cisowski Elimination of Impulsive Disturbances From Stereo Audio Recordings Using Vector Autoregressive Modeling and Variable-order Kalman Filtering . . . . 970--981 Kun Han and Yuxuan Wang and Deliang Wang and W. S. Woods and I. Merks and Tao Zhang Learning Spectral Mapping for Speech Dereverberation and Denoising . . . . . 982--992 P. Foster and S. Dixon and A. Klapuri Identifying Cover Songs Using Information-Theoretic Measures of Similarity . . . . . . . . . . . . . . . 993--1005 A. Schwarz and W. Kellermann Coherent-to-Diffuse Power Ratio Estimation for Dereverberation . . . . . 1006--1018 M. Cernak and P. N. Garner and A. Lazaridis and P. Motlicek and Xingyu Na Incremental Syllable-Context Phonetic Vocoding . . . . . . . . . . . . . . . . 1019--1030 M. Rouvier and S. Oger and G. Linares and D. Matrouf and B. Merialdo and Y. Li Audio-Based Video Genre Identification 1031--1041 H. Kameoka and K. Yoshizato and T. Ishihara and K. Kadowaki and Y. Ohishi and K. Kashino Generative Modeling of Voice Fundamental Frequency Contours . . . . . . . . . . . 1042--1053 Dejan Marković and Fabio Antonacci and Augusto Sarti and Stefano Tubaro Multiview Soundfield Imaging in the Projective Ray Space . . . . . . . . . . 1054--1067 A. P. Bates and Z. Khalid and R. A. Kennedy Novel Sampling Scheme on the Sphere for Head-Related Transfer Function Measurements . . . . . . . . . . . . . . 1068--1081 Maoshen Jia and Ziyu Yang and Changchun Bao and Xiguang Zheng and C. Ritz Encoding Multiple Audio Objects Using Intra-Object Sparsity . . . . . . . . . 
1082--1095 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1096--1097 Anonymous Information for Authors . . . . . . . . 1098--1099 Anonymous Open Access . . . . . . . . . . . . . . 1100--1100 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1101--1102 Anonymous Table of Contents . . . . . . . . . . . 1103--1104 M. McVicar and S. Fukayama and M. Goto AutoGuitarTab: Computer-Aided Composition of Rhythm and Lead Guitar Parts in the Tablature Space . . . . . . 1105--1117 M. Van Segbroeck and R. Travadi and S. S. Narayanan Rapid Language Identification . . . . . 1118--1129 D. Marelli and R. Baumgartner and P. Majdak Efficient Approximation of Head-Related Transfer Functions in Subbands for Accurate Sound Localization . . . . . . 1130--1143 Ching-Feng Yeh and Lin-shan Lee An Improved Framework for Recognizing Highly Imbalanced Bilingual Code-Switched Lectures with Cross-Language Acoustic Modeling and Frame-Level Language Identification . . 1144--1159 D. Basaran and A. T. Cemgil and E. Anarim A Probabilistic Model-Based Approach for Aligning Multiple Audio Sequences . . . 1160--1171 Dongpeng Chen and B. K.-W. Mak Multitask Learning of Deep Neural Networks for Low-Resource Speech Recognition . . . . . . . . . . . . . . 1172--1183 T. Meyer and N. Hajlaoui and A. Popescu-Belis Disambiguating Discourse Connectives for Statistical Machine Translation . . . . 1184--1197 U. Remes and A. Ramirez Lopez and K. Palomaki and M. Kurimo Bounded Conditional Mean Imputation with Observation Uncertainties and Acoustic Model Adaptation . . . . . . . . . . . . 1198--1208 Rui Wang and Hai Zhao and Bao-Liang Lu and M. Utiyama and E. Sumita Bilingual Continuous-Space Language Model Growing for Statistical Machine Translation . . . . . . . . . . . . . . 1209--1220 Tze Yuang Chong and R. E. Banchs and Eng Siong Chng and Haizhou Li Decoupling Word-Pair Distance and Co-occurrence Information for Effective Long History Context Language Modeling 1221--1232 Meng Sun and Yinan Li and J. F. Gemmeke and Xiongwei Zhang Speech Enhancement Under Low SNR Conditions Via Noise Estimation Using Sparse and Low-Rank NMF with Kullback--Leibler Divergence . . . . . . 
1233--1242 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1245--1246 Anonymous Information for Authors . . . . . . . . 1247--1248 Anonymous Blank page . . . . . . . . . . . . . . . B1243--B1244 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1245--1246 Anonymous Table of Contents . . . . . . . . . . . 1247--1248 H. Momeni and H. R. Abutalebi and A. Tadaion Joint Detection and Estimation of Speech Spectral Amplitude Using Noncontinuous Gain Functions . . . . . . . . . . . . . 1249--1258 Jen-Tzung Chien Hierarchical Pitman--Yor--Dirichlet Language Model . . . . . . . . . . . . . 1259--1272 M. Fallahpour and D. Megias Audio Watermarking Based on Fibonacci Numbers . . . . . . . . . . . . . . . . 1273--1282 P. Mowlaee and J. Kulmer Phase Estimation in Single-Channel Speech Enhancement: Limits-Potential . . 1283--1294 M. Morchid and M. Bouallegue and R. Dufour and G. Linares and D. Matrouf and R. De Mori Compact Multiview Representation of Documents Based on the Total Variability Space . . . . . . . . . . . . . . . . . 1295--1308 R. Sugiura and Y. Kamamoto and N. Harada and H. Kameoka and T. Moriya Optimal Coding of Generalized-Gaussian-Distributed Frequency Spectra for Low-Delay Audio Coder With Powered All-Pole Spectrum Estimation . . . . . . . . . . . . . . . 1309--1321 Kuan-Yu Chen and Shih-Hung Liu and B. Chen and Hsin-Min Wang and Ea-Ee Jan and Wen-Lian Hsu and Hsin-Hsi Chen Extractive Broadcast News Summarization Leveraging Recurrent Neural Network Language Modeling Techniques . . . . . . 1322--1334 Z. Koldovsky and J. Malek and S. Gannot Spatial Source Subtraction Based on Incomplete Measurements of Relative Transfer Function . . . . . . . . . . . 1335--1347 D. Dimitriadis and E. Bocchieri Use of Micro-Modulation Features in Large Vocabulary Continuous Speech Recognition Tasks . . . . . . . . . . . 1348--1357 Xun Wang and Y. Yoshida and T. Hirao and M. Nagata and K. Sudoh Summarization Based on Task-Oriented Discourse Parsing . . . . . . . . . . . 1358--1367 C. Spa and A. Rey and E. Hernandez A GPU Implementation of an Explicit Compact FDTD Algorithm with a Digital Impedance Filter for Room Acoustics Applications . . . . . . . . . . . . . . 
1368--1380 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1381--1382 Anonymous Information for Authors . . . . . . . . 1383--1384 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1385--1386 Anonymous Table of Contents . . . . . . . . . . . 1387--1388 Lin-shan Lee and J. Glass and Hung-yi Lee and Chun-an Chan Spoken Content Retrieval --- Beyond Cascading Speech Recognition with Text Retrieval . . . . . . . . . . . . . . . 1389--1420 Yishan Jiao and V. Berisha and Ming Tu and J. Liss Convex Weighting Criteria for Speaking Rate Estimation . . . . . . . . . . . . 1421--1430 Jianjun He and Woon-Seng Gan and Ee-Leng Tan Primary-Ambient Extraction Using Ambient Spectrum Estimation for Immersive Spatial Audio Reproduction . . . . . . . 1431--1444 Qing Shen and Wei Liu and Wei Cui and Siliang Wu and Y. D. Zhang and M. G. Amin Low-Complexity Direction-of-Arrival Estimation Based on Wideband Co-Prime Arrays . . . . . . . . . . . . . . . . . 1445--1456 Yu-Ren Chien and Hsin-Min Wang and Shyh-Kang Jeng An Acoustic-Phonetic Model of F0 Likelihood for Vocal Melody Extraction 1457--1468 Xiaodong Cui and V. Goel and B. Kingsbury Data Augmentation for Deep Neural Network Acoustic Modeling . . . . . . . 1469--1477 E. De Sena and H. Hacıhabiboğlu and Z. Cvetković and J. O. Smith Efficient Synthesis of Room Acoustics via Scattering Delay Networks . . . . . 1478--1492 Lin Wang and T. Gerkmann and S. Doclo Noise Power Spectral Density Estimation Using MaxNSR Blocking Matrix . . . . . . 1493--1508 A. Jukic and T. van Waterschoot and T. Gerkmann and S. Doclo Multi-Channel Linear Prediction-Based Speech Dereverberation With Sparse Priors . . . . . . . . . . . . . . . . . 1509--1520 P. Mowlaee and J. Kulmer Harmonic Phase Estimation in Single-Channel Speech Enhancement Using Phase Decomposition and SNR Information 1521--1532 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1535--1536 Anonymous Information for Authors . . . . . . . . 1537--1538 Anonymous How can you get your idea to market first? . . . . . . . . . . . . . . . . . 1539--1539 Anonymous Blank page . . . . . . . 
. . . . . . . . B1533--B1534 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1535--1536 Anonymous Table of Contents . . . . . . . . . . . 1537--1538 S. Tervo and A. Politis Direction of Arrival Estimation of Reflections from Room Impulse Responses Using a Spherical Microphone Array . . . 1539--1551 Jia-Ching Wang and Yu-Hao Chin and Bo-Wei Chen and Chang-Hong Lin and Chung-Hsien Wu Speech Emotion Verification Using Emotion Variance Modeling and Discriminant Scale-Frequency Maps . . . 1552--1562 A. Canclini and P. Bestagini and F. Antonacci and M. Compagnoni and A. Sarti and S. Tubaro A Robust and Low-Complexity Source Localization Algorithm for Asynchronous Distributed Microphone Networks . . . . 1563--1575 Jianjun He and Woon-Seng Gan and Ee-Leng Tan Time-Shifting Based Primary-Ambient Extraction for Spatial Audio Reproduction . . . . . . . . . . . . . . 1576--1588 P. Shah and I. Lewis and S. Grant and S. Angrignon Nonlinear Acoustic Echo Cancellation Using Voltage and Current Feedback . . . 1589--1599 Li Su and Yi-Hsuan Yang Combining Spectral and Temporal Representations for Multipitch Estimation of Polyphonic Music . . . . . 1600--1612 T. Fujioka and Y. Nagata and M. Abe High-Precision Harmonic Distortion Level Measurement of a Loudspeaker Using Adaptive Filters in a Noisy Environment 1613--1622 Tsz-Kin Hon and Lin Wang and J. D. Reiss and A. Cavallaro Audio Fingerprinting for Multi-Device Self-Localization . . . . . . . . . . . 1623--1636 Ye Tian and Zhe Chen and Fuliang Yin Distributed IMM-Unscented Kalman Filter for Speaker Tracking in Microphone Array Networks . . . . . . . . . . . . . . . . 1637--1647 Na Li and Man-Wai Mak SNR-Invariant PLDA Modeling in Nonparametric Subspace for Robust Speaker Verification . . . . . . . . . . 1648--1659 J. Vilkamo and S. Delikaris-Manias Perceptual Reproduction of Spatial Sound Using Loudspeaker-Signal-Domain Parametrization . . . . . . . . . . . . 1660--1669 Chao Weng and Dong Yu and M. L. Seltzer and J. 
Droppo Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition . . . . 1670--1679 M. Ruhland and J. Bitzer and M. Brandt and S. Goetze Reduction of Gaussian, Supergaussian, and Impulsive Noise by Interpolation of the Binary Mask Residual . . . . . . . . 1680--1691 Y. Dorfan and S. Gannot Tree-Based Recursive Expectation-Maximization Algorithm for Localization of Acoustic Sources . . . . 1692--1703 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing EDICS . . . . . 1704--1705 Anonymous Information for Authors . . . . . . . . 1706--1707 Anonymous How can you get your idea to market first? . . . . . . . . . . . . . . . . . 1708--1708 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
A. Sarmiento and I. Duran-Diaz and A. Cichocki and S. Cruces A Contrast Function Based on Generalized Divergences for Solving the Permutation Problem in Convolved Speech Mixtures . . 1713--1726 Xiaojia Zhao and Yuxuan Wang and Deliang Wang Cochannel Speaker Identification in Anechoic and Reverberant Conditions . . 1727--1736 Liang-Yu Chen and J.-S. R. Jang Automatic Pronunciation Scoring with Score Combination by Learning to Rank and Class-Normalized DP-Based Quantization . . . . . . . . . . . . . . 1737--1749 Duyu Tang and Bing Qin and Furu Wei and Li Dong and Ting Liu and Ming Zhou A Joint Segmentation and Classification Framework for Sentence Level Sentiment Classification . . . . . . . . . . . . . 1750--1761 F.-M. Hoffmann and F. M. Fazi Theoretical Study of Acoustic Circular Arrays With Tangential Pressure Gradient Sensors . . . . . . . . . . . . . . . . 1762--1774 N. Souviraa-Labastie and A. Olivero and E. Vincent and F. Bimbot Multi-Channel Audio Source Separation Using Multiple Deformed References . . . 1775--1787 D. Baby and T. Virtanen and J. F. Gemmeke and H. Van hamme Coupled Dictionaries for Exemplar-Based Speech Enhancement and Automatic Speech Recognition . . . . . . . . . . . . . . 1788--1799 M. T. Islam and C. Shahnaz and Wei-Ping Zhu and M. O. Ahmad Speech Enhancement Based on Student Modeling of Teager Energy Operated Perceptual Wavelet Packet Coefficients and a Custom Thresholding Function . . . 1800--1811 Quynh Thi Ngoc Do and S. Bethard and M.-F. Moens Domain Adaptation in Semantic Role Labeling Using a Neural Language Model and Linguistic Resources . . . . . . . . 1812--1823 H. Aragonda and C. S. Seelamantula Demodulation of Narrowband Speech Spectrograms Using the Riesz Transform 1824--1834 D. T. Tran and E. Vincent and D. Jouvet Nonparametric Uncertainty Estimation and Propagation for Noise Robust ASR . . . . 1835--1846 Mei Tu and Yu Zhou and Chengqing Zong Exploring Diverse Features for Statistical Machine Translation Model Pruning . . . . 
. . . . . . . . . . . . 1847--1857 G. Okopal and S. Wisdom and L. Atlas Speech Analysis With the Strong Uncorrelating Transform . . . . . . . . 1858--1868 M. F. Simon Galvez and S. J. Elliott and J. Cheer Time Domain Optimization of Filters Used in a Loudspeaker Array for Personal Audio . . . . . . . . . . . . . . . . . 1869--1878 M. H. Bokaei and H. Sameti and Yang Liu Linear Discourse Segmentation of Multi-Party Meetings Based on Local and Global Information . . . . . . . . . . . 1879--1891 Chung-Hsien Wu and Han-Ping Shen and Chun-Shan Hsu Code-Switching Event Detection by Using a Latent Language Space Model and the Delta-Bayesian Information Criterion . . 1892--1903 Zhangli Chen and V. Hohmann Online Monaural Speech Enhancement Based on Periodicity Analysis and A Priori SNR Estimation . . . . . . . . . . . . . . . 1904--1916 S. Sarreshtedari and M. A. Akhaee and A. Abbasfar A Watermarking Method for Digital Speech Self-Recovery . . . . . . . . . . . . . 1917--1925 N. Moritz and J. Anemuller and B. Kollmeier An Auditory Inspired Amplitude Modulation Filter Bank for Robust Feature Extraction in Automatic Speech Recognition . . . . . . . . . . . . . . 1926--1937 Yajie Miao and Hao Zhang and F. Metze Speaker Adaptive Training of Deep Neural Network Acoustic Models Using $I$-Vectors . . . . . . . . . . . . . . 1938--1949 V. Morfi and G. Degottex and A. Mouchtaris Speech Analysis and Synthesis with a Computationally Efficient Adaptive Harmonic Model . . . . . . . . . . . . . 1950--1962 J. Dennis and H. D. Tran and Haizhou Li Generalized Hough Transform for Speech Pattern Classification . . . . . . . . . 1963--1972 Feng Deng and Changchun Bao and W. B. Kleijn Sparse Hidden Markov Models for Speech Enhancement in Non-Stationary Noise Environments . . . . . . . . . . . . . . 1973--1987 R. Ranjan and Woon-Seng Gan Natural Listening over Headphones in Augmented Reality Using Adaptive Filtering Techniques . . . . . . . . . . 1988--2002 L.-H. Chen and T. Raitio and C. 
Valentini-Botinhao and Z.-H. Ling and J. Yamagishi A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis . . . . . . . . . . . . 2003--2014 Ho Seon Shin and T. Fingscheidt and Hong-Goo Kang A Priori SNR Estimation Using Air- and Bone-Conduction Microphones . . . . . . 2015--2025 Ji Wu and Miao Li and Chin-Hui Lee A Probabilistic Framework for Representing Dialog Systems and Entropy-Based Dialog Management Through Dynamic Stochastic State Evolution . . . 2026--2035 S. Cumani Fast Scoring of Full Posterior PLDA Models . . . . . . . . . . . . . . . . . 2036--2045 V. Tourbabin and B. Rafaely Direction of Arrival Estimation Using Microphone Array Processing for Moving Humanoid Robots . . . . . . . . . . . . 2046--2058 Y. J. Chu and S. C. Chan A New Local Polynomial Modeling-Based Variable Forgetting Factor RLS Algorithm and Its Acoustic Applications . . . . . 2059--2069 F. de-la-Calle-Silos and F. J. Valverde-Albacete and A. Gallardo-Antolin and C. Pelaez-Moreno Morphologically Filtered Power-Normalized Cochleograms as Robust, Biologically Inspired Features for ASR 2070--2080 T. Hirao and M. Nishino and Y. Yoshida and J. Suzuki and N. Yasuda and M. Nagata Summarizing a Document by Trimming the Discourse Tree . . . . . . . . . . . . . 2081--2092 Chao Pan and Jingdong Chen and J. Benesty Theoretical Analysis of Differential Microphone Array Beamforming and an Improved Solution . . . . . . . . . . . 2093--2105
Wanxiang Che and Yanyan Zhao and Honglei Guo and Zhong Su and Ting Liu Sentence Compression for Aspect-Based Sentiment Analysis . . . . . . . . . . . 2111--2124 J. Sheaffer and M. van Walstijn and B. Rafaely and K. Kowalczyk Binaural Reproduction of Finite Difference Simulations Using Spherical Array Processing . . . . . . . . . . . . 2125--2135 Po-Sen Huang and Minje Kim and M. Hasegawa-Johnson and P. Smaragdis Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation . . . . . . . . . . . 2136--2147 A. Heidel and Hsiang-Hung Lu and Lin-Shan Lee Finding Complex Features for Guest Language Fragment Recovery in Resource-Limited Code-Mixed Speech Recognition . . . . . . . . . . . . . . 2148--2161 D. Marquardt and V. Hohmann and S. Doclo Interaural Coherence Preservation in Multi-Channel Wiener Filtering-Based Noise Reduction for Binaural Hearing Aids . . . . . . . . . . . . . . . . . . 2162--2176 Kai Yu and Kai Sun and Lu Chen and Su Zhu Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking 2177--2188 C. A. Anderson and P. D. Teal and M. A. Poletti Spatially Robust Far-field Beamforming Using the von Mises(--Fisher) Distribution . . . . . . . . . . . . . . 2189--2197 J. Schroder and S. Goetze and J. Anemuller Spectro-Temporal Gabor Filterbank Features for Acoustic Event Detection 2198--2208 Inseok Heo and W. A. Sethares Classification Based on Speech Rhythm via a Temporal Alignment of Spoken Sentences . . . . . . . . . . . . . . . 2209--2216 P. Samarasinghe and T. Abhayapala and M. Poletti and T. Betlehem An Efficient Parameterization of the Room Transfer Function . . . . . . . . . 2217--2227 Yong Xiang and I. Natgunanathan and Yue Rong and Song Guo Spread Spectrum-Based High Embedding Capacity Watermarking Method for Audio Signals . . . . . . . . . . . . . . . . 2228--2237 In-Chul Yoo and Hyeontaek Lim and Dongsuk Yook Formant-Based Robust Voice Activity Detection . . . . . . . . . . . . . . . 2238--2245 T. 
Hueber and L. Girin and X. Alameda-Pineda and G. Bailly Speaker-Adaptive Acoustic-Articulatory Inversion Using Cascaded Gaussian Mixture Regression . . . . . . . . . . . 2246--2259 Hequn Bai and G. Richard and L. Daudet Late Reverberation Synthesis: From Radiance Transfer to Feedback Delay Networks . . . . . . . . . . . . . . . . 2260--2271 I. Bayram A Multichannel Audio Denoising Formulation Based on Spectral Sparsity 2272--2285 H. Delgado and X. Anguera and C. Fredouille and J. Serrano Fast Single- and Cross-Show Speaker Diarization Using Binary Key Speaker Modeling . . . . . . . . . . . . . . . . 2286--2297 W. S. Percybrooks and E. Moore A New HMM-Based Voice Conversion Methodology Evaluated on Monolingual and Cross-Lingual Conversion Tasks . . . . . 2298--2310 M. Graja and M. Jaoua and L. H. Belguith Statistical Framework with Knowledge Base Integration for Robust Speech Understanding of the Tunisian Dialect 2311--2321 F. Strasser and H. Puder Adaptive Feedback Cancellation for Realistic Hearing Aid Applications . . . 2322--2333 Yu Ting Yeung and Tan Lee and Cheung-Chi Leung Supervised Single-Microphone Multi-Talker Speech Separation with Conditional Random Fields . . . . . . . 2334--2342 Wenyu Jin and W. B. Kleijn Theory and Design of Multizone Soundfield Reproduction Using Sparse Methods . . . . . . . . . . . . . . . . 2343--2355 Xionghu Zhong and J. R. Hopgood A Time--Frequency Masking Based Random Finite Set Particle Filtering Method for Multiple Acoustic Source Detection and Tracking . . . . . . . . . . . . . . . . 2356--2370 K. Vijayan and K. S. R. Murty Analysis of Phase Spectrum of Speech Signals Using Allpass Modeling . . . . . 2371--2383 D. Marquardt and E. Hadad and S. Gannot and S. Doclo Theoretical Analysis of Linearly Constrained Multi-Channel Wiener Filtering Algorithms for Combined Noise Reduction and Binaural Cue Preservation in Binaural Hearing Aids . . . . . . . . 2384--2397 M. Zohrer and R. Peharz and F. 
Pernkopf Representation Learning for Single-Channel Source Separation and Bandwidth Extension . . . . . . . . . . 2398--2409 Hao Fang and M. Ostendorf and P. Baumann and J. Pierrehumbert Exponential Language Modeling Using Morphological Features and Multi-Task Learning . . . . . . . . . . . . . . . . 2410--2421 M. A. Carlin and M. Elhilali A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields . . . . . . . . . . . . 2422--2433 S. Saito and K. Oishi and T. Furukawa Convolutive Blind Source Separation Using an Iterative Least-Squares Algorithm for Non-Orthogonal Approximate Joint Diagonalization . . . . . . . . . 2434--2448 E. Hadad and D. Marquardt and S. Doclo and S. Gannot Theoretical Analysis of Binaural Transfer Function MVDR Beamformers with Interference Cue Preservation Constraints . . . . . . . . . . . . . . 2449--2464 Guang Yang and R. F. Lyon and E. M. Drakakis Psychophysical Evaluation of An Ultra-Low Power, Analog Biomimetic Cochlear Implant Processor Filterbank Architecture With Across Channels AGC 2465--2473 Anonymous List of Reviewers . . . . . . . . . . . 2474--2476
Anonymous Table of Contents . . . . . . . . . . . 1--2 Anonymous Table of Contents . . . . . . . . . . . 3--4 S. Brognaux and T. Drugman HMM-Based Speech Segmentation: Improvements of Fully Automatic Approaches . . . . . . . . . . . . . . . 5--15 M. Tahon and L. Devillers Towards a Small Set of Robust Acoustic Features for Emotion Recognition: Challenges . . . . . . . . . . . . . . . 16--28 H. Behravan and V. Hautamaki and S. M. Siniscalchi and T. Kinnunen and Chin-Hui Lee $i$-Vector Modeling of Speech Attributes for Automatic Foreign Accent Recognition 29--41 R. Saeidi and P. Alku and T. Backstrom Feature Extraction Using Power-Law Adjusted Linear Prediction With Application to Speaker Recognition Under Severe Vocal Effort Mismatch . . . . . . 42--53 I. T. Ardekani and J. P. Kaipio and A. Nasiri and H. Sharifzadeh and W. H. Abdulla A Statistical Inverse Problem Approach to Online Secondary Path Modeling in Active Noise Control . . . . . . . . . . 54--64 T. Stafylakis and P. Kenny and M. J. Alam and M. Kockmann Speaker and Channel Factors in Text-Dependent Speaker Recognition . . . 65--78 Yanzhang He and P. Baumann and Hao Fang and B. Hutchinson and A. Jaech and M. Ostendorf and E. Fosler-Lussier and J. Pierrehumbert Using Pronunciation-Based Morphological Subword Units to Improve OOV Handling in Keyword Search . . . . . . . . . . . . . 79--92 Meng Sun and Xiongwei Zhang and H. Van Hamme and T. F. Zheng Unseen Noise Estimation Using Separable Deep Auto Encoder for Speech Enhancement 93--104 L. Ferrer and Yun Lei and M. McLaren and N. Scheffer Study of Senone-Based Deep Neural Network Approaches for Spoken Language Recognition . . . . . . . . . . . . . . 105--116 S. I. Adalbjornsson and T. Kronvall and S. Burgess and K. Astrom and A. Jakobsson Sparse Localization of Harmonic Audio Sources . . . . . . . . . . . . . . . . 117--129 Man-Wai Mak and Xiaomin Pang and Jen-Tzung Chien Mixture of PLDA for Noise Robust $I$-Vector Speaker Verification . . . . 130--142 C. A. 
Anderson and P. D. Teal and M. A. Poletti Spatial Correlation of Radial Gaussian and Uniform Spherical Volume Near-Field Source Distributions . . . . . . . . . . 143--150 H. Torres and J. Gurlekian Novel Estimation Method for the Superpositional Intonation Model . . . . 151--160 S. Bilbao and B. Hamilton and J. Botts and L. Savioja Finite Volume Time Domain Room Acoustics Simulation under General Impedance Boundary Conditions . . . . . . . . . . 161--173 A. H. Harati Nejad Torbati and J. Picone A Doubly Hierarchical Dirichlet Process Hidden Markov Model with a Non-Ergodic Structure . . . . . . . . . . . . . . . 174--184 Jen-Tzung Chien and Po-Kai Yang Bayesian Factorization and Learning for Monaural Source Separation . . . . . . . 185--195 D. L. Alon and B. Rafaely Beamforming with Optimal Aliasing Cancellation in Spherical Microphone Arrays . . . . . . . . . . . . . . . . . 196--210 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . 211--212 Anonymous Information for authors . . . . . . . . 213--214 Anonymous Special issue on sound scene and event analysis . . . . . . . . . . . . . . . . 215 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page] . . . . . . . . . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 211--212 Anonymous Table of contents . . . . . . . . . . . 213--214 E. Rasumow and M. Hansen and S. van de Par and D. Puschel and V. Mellert and S. Doclo and M. Blau Regularization Approaches for Synthesizing HRTF Directivity Patterns 215--225 Chao Pan and J. Benesty and Jingdong Chen Design of Directivity Patterns with a Unique Null of Maximum Multiplicity . . 226--235 Jeih-Weih Hung and Hsin-Ju Hsieh and Berlin Chen Robust Speech Recognition via Enhancing the Complex-Valued Acoustic Spectrum in Modulation Domain . . . . . . . . . . . 236--251 Xiao-Lei Zhang and DeLiang Wang Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection . . . . . . . . . . . . . . . 252--264 M. A. Tugtekin Turan and E. Erzin Source and Filter Estimation for Throat-Microphone Speech Enhancement . . 265--275 N. Mohammadiha and S. Doclo Speech Dereverberation Using Non-Negative Convolutive Transfer Function and Spectro-Temporal Modeling 276--289 A. Sharma and S. Kaul Two-Stage Supervised Learning-Based Method to Detect Screams and Cries in Urban Environments . . . . . . . . . . . 290--299 Xiaoguang Wu and Huawei Chen Directivity Factors of the First-Order Steerable Differential Array With Microphone Mismatches: Deterministic and Worst-Case Analysis . . . . . . . . . . 300--315 A. I. Koutrouvelis and G. P. Kafentzis and N. D. Gaubitch and R. Heusdens A Fast Method for High-Resolution Voiced/Unvoiced Detection and Glottal Closure/Opening Instant Estimation of Speech . . . . . . . . . . . . . . . . . 316--328 T. Nakamura and E. Nakamura and S. Sagayama Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips . . . . . . 329--339 A. Bahne and A. Ahlen Optimizing the Similarity of Loudspeaker-Room Responses in Multiple Listening Positions . . . . . . . . . . 340--353 J. M. Kates and K. H. Arehart The Hearing-Aid Audio Quality Index (HAAQI) . . . . . . . . . . . . . . 
. . 354--365 H. Schepker and S. Doclo A Semidefinite Programming Approach to Min-max Estimation of the Common Part of Acoustic Feedback Paths in Hearing Aids 366--377 Bong-Ki Lee and Joon-Hyuk Chang Packet Loss Concealment Based on Deep Neural Networks for Digital Speech Transmission . . . . . . . . . . . . . . 378--387 L. Bentivogli and N. Bertoldi and M. Cettolo and M. Federico and M. Negri and M. Turchi On the Evaluation of Adaptive Machine Translation for Human Post-Editing . . . 388--399 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . 400--401 Anonymous Information for authors . . . . . . . . 402--403 Anonymous Special issue on sound scene and event analysis . . . . . . . . . . . . . . . . 404 Anonymous [Front cover] . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Signal Processing Society Information . . . . . . . . . . . . . . C3 Anonymous [Blank page] . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 405--406 Anonymous Table of Contents . . . . . . . . . . . 407--408 Reinhard Sonnleitner and Gerhard Widmer Robust Quad-Based Audio Fingerprinting 409--421 Li Dong and Furu Wei and Ke Xu and Shixia Liu and Ming Zhou Adaptive Multi-Compositionality for Recursive Neural Network Models . . . . 422--431 Zheng Lin and Xiaolong Jin and Xueke Xu and Yuanzhuo Wang and Xueqi Cheng and Weiping Wang and Dan Meng An Unsupervised Cross-Lingual Topic Model Framework for Sentiment Classification . . . . . . . . . . . . . 432--444 Anil Nagathil and Claus Weihs and Rainer Martin Spectral Complexity Reduction of Music Signals for Mitigating Effects of Cochlear Hearing Loss . . . . . . . . . 445--458 Tian Tan and Yanmin Qian and Kai Yu Cluster Adaptive Training for Deep Neural Network Based Acoustic Model . . 459--468 Arne Leijon and Gustav Eje Henter and Martin Dahlquist Bayesian Analysis of Phoneme Confusion Matrices . . . . . . . . . . . . . . . . 469--482 Donald S. Williamson and Yuxuan Wang and DeLiang Wang Complex Ratio Masking for Monaural Speech Separation . . . . . . . . . . . 483--492 Johannes Traa and David Wingate and Noah D. Stein and Paris Smaragdis Robust Source Localization and Enhancement With a Probabilistic Steered Response Power Model . . . . . . . . . . 493--503 Sven Ewan Shepstone and Kong Aik Lee and Haizhou Li and Zheng-Hua Tan and Søren Holdt Jensen Total Variability Modeling Using Source-Specific Priors . . . . . . . . . 504--517 Martin Schneider and Walter Kellermann Multichannel Acoustic Echo Cancellation in the Wave Domain With Increased Robustness to Nonuniqueness . . . . . . 518--529 Ken O'Hanlon and Hidehisa Nagano and Nicolas Keriven and Mark D. Plumbley Non-Negative Group Sparsity with Subspace Note Modelling for Polyphonic Transcription . . . . . . . . . . . . . 530--542 Elior Hadad and Simon Doclo and Sharon Gannot The Binaural LCMV Beamformer and its Performance Analysis . . . . . . . . . .
543--558 Felipe Grijalva and Luiz Martini and Dinei Florencio and Siome Goldenstein A Manifold Learning Approach for Personalizing HRTFs from Anthropometric Features . . . . . . . . . . . . . . . . 559--570 Lin Wang and Simon Doclo Correlation Maximization-Based Sampling Rate Offset Estimation for Distributed Microphone Arrays . . . . . . . . . . . 571--582 Nasim Radmanesh and Ian S. Burnett and Bhaskar D. Rao A Lasso-LS Optimization with a Frequency Variable Dictionary in a Multizone Sound System . . . . . . . . . . . . . . . . . 583--593 Xin Liu and Changchun Bao Audio Bandwidth Extension Based on Ensemble Echo State Networks with Temporal Evolution . . . . . . . . . . . 594--607 Anonymous EDICS Categories for IEEE/ACM Transactions on Audio, Speech, and Language Processing . . . . . . . . . . 608--609 Anonymous Information for Authors . . . . . . . . 610--611 Anonymous Special issue on sound scene and event analysis . . . . . . . . . . . . . . . . 612 Anonymous Introducing IEEE Collabratec . . . . . . 613 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing . . . . . . . . C2 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing . . . . . . . . C3
Anonymous Table of Contents . . . . . . . . . . . 608--609 Anonymous Table of Contents . . . . . . . . . . . 610--611 Peifeng Li and Guodong Zhou Joint Argument Inference in Chinese Event Extraction with Argument Consistency and Event Relevance . . . . 612--622 Jianming Liu and Steven L. Grant Proportionate Adaptive Filtering for Block-Sparse System Identification . . . 623--630 Jesper Rindom Jensen and Jacob Benesty and Mads Græsbøll Christensen Noise Reduction with Optimal Variable Span Linear Filters . . . . . . . . . . 631--644 Sidsel Marie Nørholm and Jesper Rindom Jensen and Mads Græsbøll Christensen Enhancement and Noise Statistics Estimation for Non-Stationary Voiced Speech . . . . . . . . . . . . . . . . . 645--658 Daryush D. Mehta and Jarrad H. Van Stan and Robert E. Hillman Relationships Between Vocal Function Measures Derived from an Acoustic Microphone and a Subglottal Neck-Surface Accelerometer . . . . . . . . . . . . . 659--668 Herman Kamper and Aren Jansen and Sharon Goldwater Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings . . . . . . . . . . . . . . . 669--679 Ina Kodrasi and Simon Doclo Joint Dereverberation and Noise Reduction Based on Acoustic Multi-Channel Equalization . . . . . . . 680--693 Hamid Palangi and Li Deng and Yelong Shen and Jianfeng Gao and Xiaodong He and Jianshu Chen and Xinying Song and Rabab Ward Deep Sentence Embedding Using Long Short-Term Memory Networks: Analysis and Application to Information Retrieval . . 694--707 Michael Jeffet and Noam R. Shabtai and Boaz Rafaely Theory and Perceptual Evaluation of the Binaural Reproduction and Beamforming Tradeoff in the Generalized Spherical Array Beamformer . . . . . . . . . . . . 708--718 Pablo Peso Parada and Dushyant Sharma and Jose Lainez and Daniel Barreda and Toon van Waterschoot and Patrick A. Naylor A Single-Channel Non-Intrusive C50 Estimator Correlated With Speech Recognition Performance . . . . . . . .
719--732 Ming-Hsiang Su and Chung-Hsien Wu and Yu-Ting Zheng Exploiting Turn-Taking Temporal Evolution for Personality Trait Perception in Dyadic Conversations . . . 733--744 Sadaf Abdul-Rauf and Holger Schwenk and Patrik Lambert and Mohammad Nawaz Empirical Use of Information Retrieval to Build Synthetic Data for SMT Domain Adaptation . . . . . . . . . . . . . . . 745--754 Shinnosuke Takamichi and Tomoki Toda and Alan W. Black and Graham Neubig and Sakriani Sakti and Satoshi Nakamura Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis . . . . . . . . . . . . 755--767 Zhizheng Wu and Phillip L. De Leon and Cenk Demiroglu and Ali Khodabakhsh and Simon King and Zhen-Hua Ling and Daisuke Saito and Bryan Stewart and Tomoki Toda and Mirjam Wester and Junichi Yamagishi Anti-Spoofing for Text-Independent Speaker Verification: an Initial Database, Comparison of Countermeasures, and Human Performance . . . . . . . . . 768--783 Kristian Timm Andersen and Marc Moonen Adaptive Time-Frequency Analysis for Noise Reduction in an Audio Filter Bank With Low Delay . . . . . . . . . . . . . 784--795 Zhong-Qiu Wang and DeLiang Wang A Joint Training Framework for Robust Automatic Speech Recognition . . . . . . 796--806 Huy Phan and Lars Hertel and Marco Maass and Radoslaw Mazur and Alfred Mertins Learning Representations for Nonspeech Audio Events Through Their Similarities to Speech Patterns . . . . . . . . . . . 807--822 Anonymous EDICS Categories for IEEE/ACM Transactions on Audio, Speech, and Language Processing . . . . . . . . . . 823--824 Anonymous Information for Authors . . . . . . . . 825--826 Anonymous Special issue on sound scene and event analysis . . . . . . . . . . . . . . . . 827 Anonymous Introducing IEEE Collabratec . . . . . . 828 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing . . . . . . . . C2
Anonymous Table of Contents . . . . . . . . . . . 829--830 Anonymous Table of Contents . . . . . . . . . . . 831--832 T. J. Tsai and Andreas Stolcke Robust and Efficient Multiple Alignment of Unsynchronized Meeting Recordings . . 833--845 Simon Receveur and Robin Weiß and Tim Fingscheidt Turbo Automatic Speech Recognition . . . 846--862 Ricard Marxer and Hendrik Purwins Unsupervised Incremental Online Learning and Prediction of Musical Audio Signals 863--874 Mohammad Adeli and Jean Rouat and Sean Wood and Stéphane Molotchnikoff and Eric Plourde A Flexible Bio-Inspired Hierarchical Model for Analyzing Musical Timbre . . . 875--889 Geliang Zhang and Simon Godsill Fundamental Frequency Estimation in Speech Signals With Variable Rate Particle Filters . . . . . . . . . . . . 890--900 Nadine Kroher and Emilia Gómez Automatic Transcription of Flamenco Singing From Polyphonic Music Recordings 901--913 Fiete Winter and Jens Ahrens and Sascha Spors On Analytic Methods for $2.5$-D Local Sound Field Synthesis Using Circular Distributions of Secondary Sources . . . 914--926 Siddharth Sigtia and Emmanouil Benetos and Simon Dixon An End-to-End Neural Network for Polyphonic Piano Music Transcription . . 927--939 Martin Krawczyk-Becker and Timo Gerkmann Fundamental Frequency Informed Speech Enhancement in a Flexible Statistical Framework . . . . . . . . . . . . . . . 940--951 Joseph Szurley and Alexander Bertrand and Bas Van Dijk and Marc Moonen Binaural Noise Cue Preservation in a Binaural Noise Reduction System With a Remote Microphone Signal . . . . . . . . 952--966 Xiao-Lei Zhang and DeLiang Wang A Deep Ensemble Learning Method for Monaural Speech Separation . . . . . . . 967--977 Haotian Xu and Zhijian Ou Scalable Discovery of Audio Fingerprint Motifs in Broadcast Streams With Determinantal Point Process Based Motif Clustering . . . . . . . . . . . . . . . 978--989 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . .
990--991 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing information for authors . . . . . . . . . . . . . . . . 992--993 Anonymous Special issue on sound scene and event analysis . . . . . . . . . . . . . . . . 994 Anonymous Special Issue on Biosignal-based Spoken Communication . . . . . . . . . . . . . 995 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Power Electronics Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 990--991 Anonymous Table of Contents . . . . . . . . . . . 992--993 Asli Celikyilmaz and Ruhi Sarikaya and Minwoo Jeong and Anoop Deoras An Empirical Investigation of Word Class-Based Features for Natural Language Understanding . . . . . . . . . 994--1005 Duc Hoang Ha Nguyen and Xiong Xiao and Eng Siong Chng and Haizhou Li Feature Adaptation Using Linear Spectro-Temporal Transform for Robust Speech Recognition . . . . . . . . . . . 1006--1019 Xiaojun Qian and Helen Meng and Frank Soong A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training 1020--1028 Lijiang Chen and Xia Mao and Hong Yan Text-Independent Phoneme Segmentation Combining EGG and Speech Data . . . . . 1029--1037 Vincent Mohammad Tavakoli and Jesper Rindom Jensen and Mads Græsbøll Christensen and Jacob Benesty A Framework for Speech Enhancement With Ad Hoc Microphone Arrays . . . . . . . . 1038--1051 Yan-You Chen and Chung-Hsien Wu and Yi-Chin Huang and Shih-Lun Lin and Jhing-Fa Wang Candidate Expansion and Prosody Adjustment for Natural Speech Synthesis Using a Small Corpus . . . . . . . . . . 1052--1065 Xueliang Zhang and Hui Zhang and Shuai Nie and Guanglai Gao and Wenju Liu A Pairwise Algorithm Using the Deep Stacking Network for Speech Separation and Pitch Estimation . . . . . . . . . . 1066--1078 Lin Wang and Tsz-Kin Hon and Joshua D. Reiss and Andrea Cavallaro An Iterative Approach to Source Counting and Localization Using Two Distant Microphones . . . . . . . . . . . . . . 1079--1093 Seán O'Leary and Axel Röbel A Montage Approach to Sound Texture Synthesis . . . . . . . . . . . . . . . 1094--1105 Chahid Ouali and Pierre Dumouchel and Vishwa Gupta Fast Audio Fingerprinting System Using GPU and a Clustering-Based Technique . . 1106--1118 Francisco Raposo and Ricardo Ribeiro and David Martins de Matos Using Generic Summarization to Improve Music Information Retrieval Tasks . . .
1119--1128 Lantian Li and Dong Wang and Chenhao Zhang and Thomas Fang Zheng Improving Short Utterance Speaker Recognition by Modeling Speech Unit Classes . . . . . . . . . . . . . . . . 1129--1139 Jalal Taghia and Rainer Martin A Frequency-Domain Adaptive Line Enhancer With Step-Size Control Based on Mutual Information for Harmonic Noise Reduction . . . . . . . . . . . . . . . 1140--1154 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . 1155--1156 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing information for authors . . . . . . . . . . . . . . . . 1157--1158 Anonymous Special issue on sound scene and event analysis . . . . . . . . . . . . . . . . 1159 Anonymous Special Issue on Biosignal-based Spoken Communication . . . . . . . . . . . . . 1160 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing publication information . . . . . . . . . . . . . . C2 Anonymous IEEE Power Electronics Society Information . . . . . . . . . . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Min Gao and Jing Lu and Xiaojun Qiu A Simplified Subband ANC Algorithm Without Secondary Path Modeling . . . . 1164--1174 Ryo Aihara and Tetsuya Takiguchi and Yasuo Ariki Multiple Non-Negative Matrix Factorization for Many-to-Many Voice Conversion . . . . . . . . . . . . . . . 1175--1184 Kai Chen and Qiang Huo Training Deep Bidirectional LSTM Acoustic Model for LVCSR by a Context-Sensitive-Chunk BPTT Approach 1185--1193 Themos Stafylakis and Md. Jahangir Alam and Patrick Kenny Text-Dependent Speaker Recognition With Random Digit Strings . . . . . . . . . . 1194--1203 K. T. Deepak and S. R. Mahadeva Prasanna Foreground Speech Segmentation and Enhancement Using Glottal Closure Instants and Mel Cepstral Coefficients 1204--1218 Habib Hajimolahoseini and Rassoul Amirfattahi and Saeed Gazor and Hamid Soltanian-Zadeh Robust Estimation and Tracking of Pitch Period Using an Efficient Bayesian Filter . . . . . . . . . . . . . . . . . 1219--1229 Subhasmita Sahoo and Aurobinda Routray A Novel Method of Glottal Inverse Filtering . . . . . . . . . . . . . . . 1230--1241 Gilles Degottex and Luc Ardaillon and Axel Roebel Multi-Frame Amplitude Envelope Estimation for Modification of Singing Voice . . . . . . . . . . . . . . . . . 1242--1254 Zhizheng Wu and Simon King Improving Trajectory Modelling for DNN-Based Speech Synthesis by Using Stacked Bottleneck Features and Minimum Generation Error Training . . . . . . . 1255--1265 Xabier Jaureguiberry and Emmanuel Vincent and Gaël Richard Fusion Methods for Speech Enhancement and Audio Source Separation . . . . . . 1266--1279 Rajib Lochan Das and Mrityunjoy Chakraborty Improving the Performance of the PNLMS Algorithm Using Norm Regularization . . 1280--1290 Maja Taseska and Emanuël A. P. Habets Spotforming: Spatial Filtering With Distributed Arrays for Position-Selective Sound Acquisition . . 
1291--1304 Guangyou Zhou and Zhiwen Xie and Tingting He and Jun Zhao and Xiaohua Tony Hu Learning the Multilingual Translation Representations for Question Retrieval in Community Question Answering via Non-Negative Matrix Factorization . . . 1305--1314 Chanwoo Kim and Richard M. Stern Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition . . 1315--1329
Henning Schepker and Simon Doclo Least-Squares Estimation of the Common Pole-Zero Filter of Acoustic Feedback Paths in Hearing Aids . . . . . . . . . 1334--1347 Hannes Pessentheiner and Martin Hagmüller and Gernot Kubin Localization and Characterization of Multiple Harmonic Sources . . . . . . . 1348--1363 Hanieh Khalilian and Ivan V. Bajić and Rodney G. Vaughan Comparison of Loudspeaker Placement Methods for Sound Field Reproduction . . 1364--1379 Cheng-Yen Yang and Chih-Wei Liu and Shyh-Jye Jou A Systematic ANSI S1.11 Filter Bank Specification Relaxation and Its Efficient Multirate Architecture for Hearing-Aid Systems . . . . . . . . . . 1380--1392 Bracha Laufer-Goldshtein and Ronen Talmon and Sharon Gannot Semi-Supervised Sound Source Localization Based on Manifold Regularization . . . . . . . . . . . . . 1393--1407 Dionyssos Kounades-Bastian and Laurent Girin and Xavier Alameda-Pineda and Sharon Gannot and Radu Horaud A Variational EM Algorithm for the Separation of Time-Varying Convolutive Audio Mixtures . . . . . . . . . . . . . 1408--1423 Jun Du and Yanhui Tu and Li-Rong Dai and Chin-Hui Lee A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks . . . . . . . . . . 1424--1437 Xunying Liu and Xie Chen and Yongqiang Wang and Mark J. F. Gales and Philip C. Woodland Two Efficient Lattice Rescoring Methods Using Recurrent Neural Network Language Models . . . . . . . . . . . . . . . . . 1438--1449 Pawel Swietojanski and Jinyu Li and Steve Renals Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation 1450--1463 Meng Zhang and Yang Liu and Huanbo Luan and Maosong Sun Listwise Ranking Functions for Statistical Machine Translation . . . . 1464--1472
Anonymous Table of Contents . . . . . . . . . . . 1477--1478 Anonymous Table of Contents . . . . . . . . . . . 1479--1480 Daniel C. Cavalieri and Sira E. Palazuelos-Cagigas and Teodiano F. Bastos-Filho and Mário Sarcinelli-Filho Combination of Language Models for Word Prediction: An Exponential Approach . . 1481--1494 Ofer Schwartz and Sharon Gannot and Emanuël A. P. Habets An Expectation-Maximization Algorithm for Multimicrophone Speech Dereverberation and Noise Reduction With Coherence Matrix Estimation . . . . . . 1495--1510 Symeon Delikaris-Manias and Juha Vilkamo and Ville Pulkki Signal-Dependent Spatial Filtering Based on Weighted-Orthogonal Beamformers in the Spherical Harmonic Domain . . . . . 1511--1523 Sheng Li and Yuya Akita and Tatsuya Kawahara Semi-Supervised Acoustic Model Training by Discriminative Data Selection From Multiple ASR Systems' Hypotheses . . . . 1524--1534 Christian Dittmar and Meinard Müller Reverse Engineering the Amen Break --- Score-Informed Separation and Restoration Applied to Drum Recordings 1535--1547 Chao Pan and Jingdong Chen and Jacob Benesty Reduced-Order Robust Superdirective Beamforming With Uniform Linear Microphone Arrays . . . . . . . . . . . 1548--1559 Derry FitzGerald and Antoine Liutkus and Roland Badeau Projection-Based Demixing of Spatial Audio . . . . . . . . . . . . . . . . . 1560--1572 Lin Wang and Joshua D. Reiss and Andrea Cavallaro Over-Determined Source Separation and Localization Using Distributed Microphones . . . . . . . . . . . . . . 1573--1588 Yang Liu and Sujian Li and Furu Wei and Heng Ji Relation Classification Via Modeling Augmented Dependency Paths . . . . . . . 1589--1598 Adam Kuklasiński and Simon Doclo and Søren Holdt Jensen and Jesper Jensen Maximum Likelihood PSD Estimation for Speech Enhancement in Reverberation and Noise . . . . . . . . . . . . . . . . . 
1599--1612 Sam Karimian-Azari and Jesper Rindom Jensen and Mads Græsbøll Christensen Computationally Efficient and Noise Robust DOA and Pitch Estimation . . . . 1613--1625 Daichi Kitamura and Nobutaka Ono and Hiroshi Sawada and Hirokazu Kameoka and Hiroshi Saruwatari Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization . . . . 1626--1641 Nicolas Obin and Axel Roebel Similarity Search of Acted Voices for Automatic Voice Casting . . . . . . . . 1642--1651 Aditya Arie Nugraha and Antoine Liutkus and Emmanuel Vincent Multichannel Audio Source Separation With Deep Neural Networks . . . . . . . 1652--1664 Stephen H. Shum and David F. Harwath and Najim Dehak and James R. Glass On the Use of Acoustic Unit Discovery for Language Recognition . . . . . . . . 1665--1676 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1677--1678 Anonymous IEEE Transactions on Multimedia information for authors . . . 1679--1680 Anonymous Introducing the IEEE PES Resource Center 1681 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1677--1678 Anonymous Table of Contents . . . . . . . . . . . 1679--1680 James Eaton and Nikolay D. Gaubitch and Alastair H. Moore and Patrick A. Naylor Estimation of Room Acoustic Parameters: The ACE Challenge . . . . . . . . . . . 1681--1693 Takashi Nose Efficient Implementation of Global Variance Compensation for Parametric Speech Synthesis . . . . . . . . . . . . 1694--1704 Shabnam Ghaffarzadegan and Hynek Bořil and John H. L. Hansen Generative Modeling of Pseudo-Whisper for Robust Whispered Speech Recognition 1705--1720 Seyedmahdad Mirsamadi and John H. L. Hansen A Generalized Nonnegative Tensor Factorization Approach for Distant Speech Recognition With Distributed Microphones . . . . . . . . . . . . . . 1721--1731 Laura Fuster and Maria de Diego and Luis A. Azpicueta-Ruiz and Miguel Ferrer Adaptive Filtered-x Algorithms for Room Equalization Based on Block-Based Combination Schemes . . . . . . . . . . 1732--1745 Kamil Adiloğlu and Emmanuel Vincent Variational Bayesian Inference for Source Separation and Robust Feature Extraction . . . . . . . . . . . . . . . 1746--1758 Steffen Kortlang and Giso Grimm and Volker Hohmann and Birger Kollmeier and Stephan D. Ewert Auditory Model-Based Dynamic Compression Controlled by Subband Instantaneous Frequency and Speech Presence Probability Estimates . . . . . . . . . 1759--1772 Pawel Swietojanski and Steve Renals Differentiable Pooling for Unsupervised Acoustic Model Adaptation . . . . . . . 1773--1784 Kenta Niwa and Yusuke Hioka and Kazunori Kobayashi Optimal Microphone Array Observation for Clear Recording of Distant Sound Sources 1785--1795 Nicolas Epain and Craig T. Jin Spherical Harmonic Signal Covariance and Sound Field Diffuseness . . . . . . . . 1796--1807 Tudor-Cătălin Zorilă and Yannis Stylianou and Tatsuma Ishihara and Masami Akamine Near and Far Field Speech-in-Noise Intelligibility Improvements Based on a Time--Frequency Energy Reallocation Approach . 
. . . . . . . . . . . . . . . 1808--1818 Xi Ma and Dong Wang and Javier Tejedor Similar Word Model for Unfrequent Word Enhancement in Speech Recognition . . . 1819--1830 Mohammad Hadi Bokaei and Hossein Sameti and Yang Liu Summarizing Meeting Transcripts Based on Functional Segmentation . . . . . . . . 1831--1841 Jiajun Zhang and Yu Zhou and Chengqing Zong Abstractive Cross-Language Summarization via Translation Model Enhanced Predicate Argument Structure Fusing . . . . . . . 1842--1853 Grégoire Lafay and Mathieu Lagrange and Mathias Rossignol and Emmanouil Benetos and Axel Roebel A Morphological Model for Simulating Acoustic Scenes and Its Application to Sound Event Detection . . . . . . . . . 1854--1864 An Ji and Michael T. Johnson and Jeffrey J. Berry Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion . . . 1865--1875 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1876--1877 Anonymous IEEE Transactions on Multimedia information for authors . . . 1878--1879 Anonymous Introducing the IEEE PES Resource Center 1880 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1881--1882 Anonymous Table of Contents . . . . . . . . . . . 1883--1884 Aggelos Gkiokas and Vassilis Katsouros and George Carayannis Towards Multi-Purpose Spectral Rhythm Features: An Application to Dance Style, Meter and Tempo Estimation . . . . . . . 1885--1896 Yi-Chin Huang and Chung-Hsien Wu and Si-Ting Weng Improving Mandarin Prosody Generation Using Alternative Smoothing Techniques 1897--1907 Asger Heidemann Andersen and Jan Mark de Haan and Zheng-Hua Tan and Jesper Jensen Predicting the Intelligibility of Noisy and Nonlinearly Processed Binaural Speech . . . . . . . . . . . . . . . . . 1908--1920 Qiaoling Zhang and Zhe Chen and Fuliang Yin Distributed Marginalized Auxiliary Particle Filter for Speaker Tracking in Distributed Microphone Networks . . . . 1921--1934 Marc Ferràs and Srikanth Madikeri and Hervé Bourlard Speaker Diarization and Linking of Meeting Data . . . . . . . . . . . . . . 1935--1945 Yuzong Liu and Katrin Kirchhoff Graph-Based Semisupervised Learning for Acoustic Modeling in Automatic Speech Recognition . . . . . . . . . . . . . . 1946--1956 Jin Wang and Liang-Chih Yu and K. Robert Lai and Xuejie Zhang Community-Based Weighted Graph Model for Valence-Arousal Prediction of Affective Words . . . . . . . . . . . . . . . . . 1957--1968 Alberto Carini and Stefania Cecchi and Laura Romoli Robust Room Impulse Response Measurement Using Perfect Sequences for Legendre Nonlinear Filters . . . . . . . . . . . 1969--1982 Sebastian Ewert and Mark Sandler Piano Transcription in the Studio Using an Extensible Alternating Directions Framework . . . . . . . . . . . . . . . 1983--1997 Yu-Ren Chien and Hsin-Min Wang and Shyh-Kang Jeng Alignment of Lyrics With Accompanied Singing Audio Based on Acoustic-Phonetic Vowel Likelihood Modeling . . . . . . . 1998--2008 Jesper Jensen and Cees H. Taal An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers . . . . . . . . 
2009--2022 Xiaodong Cui and Vaibhava Goel Maximum Likelihood Nonlinear Transformations Based on Deep Neural Networks . . . . . . . . . . . . . . . . 2023--2031 Toru Nakashika and Tetsuya Takiguchi and Yasuhiro Minami Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine . . . . . . . . . . . 2032--2045 I-Bin Liao and Chen-Yu Chiang and Yih-Ru Wang and Sin-Horng Chen Speaker Adaptation of SR-HPM for Speaking Rate-Controlled Mandarin TTS 2046--2058 Hiroki Ouchi and Kevin Duh and Hiroyuki Shindo and Yuji Matsumoto Transition-Based Dependency Parsing Exploiting Supertags . . . . . . . . . . 2059--2068 Tong Xiao and Derek F. Wong and Jingbo Zhu A Loss-Augmented Approach to Training Syntactic Machine Translation Systems 2069--2083 Yukara Ikemiya and Katsutoshi Itoyama and Kazuyoshi Yoshii Singing Voice Separation and Vocal F0 Estimation Based on Mutual Combination of Robust Principal Component Analysis and Subharmonic Summation . . . . . . . 2084--2095 Siddharth Sigtia and Adam M. Stark and Sacha Krstulović and Mark D. Plumbley Automatic Environmental Sound Recognition: Performance Versus Computational Cost . . . . . . . . . . . 2096--2107 Srinivas Parthasarathy and Roddy Cowie and Carlos Busso Using Agreement on Direction of Change to Build Rank-Based Emotion Classifiers 2108--2121 Jia-Ching Wang and Yuan-Shan Lee and Chang-Hong Lin and Shu-Fan Wang and Chih-Hao Shih and Chung-Hsien Wu Compressive Sensing-Based Speech Enhancement . . . . . . . . . . . . . . 2122--2131 Siying Wang and Sebastian Ewert and Simon Dixon Robust and Efficient Joint Alignment of Multiple Musical Performances . . . . . 2132--2145 Xie Chen and Xunying Liu and Yongqiang Wang and Mark J. F. Gales and Philip C. Woodland Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition . . . . 
2146--2157 Ping-Keng Jao and Li Su and Yi-Hsuan Yang and Brendt Wohlberg Monaural Music Source Separation Using Convolutional Sparse Coding . . . . . . 2158--2170
Andrea Cogliati and Zhiyao Duan and Brendt Wohlberg Context-Dependent Piano Music Transcription With Convolutional Sparse Coding . . . . . . . . . . . . . . . . . 2218--2230 Yanmin Qian and Tian Tan and Dong Yu Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition . . . . . . . . . . . . . . 2231--2240 Lahiru Samarakoon and Khe Chai Sim Factorized Hidden Layer Adaptation for Deep Neural Network Based Acoustic Modeling . . . . . . . . . . . . . . . . 2241--2250 Martin Krawczyk-Becker and Timo Gerkmann On MMSE-Based Estimation of Amplitude and Complex Speech Spectral Coefficients Under Phase-Uncertainty . . . . . . . . 2251--2262 Yanmin Qian and Mengxiao Bi and Tian Tan and Kai Yu Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition . . 2263--2276 Yi-Chan Wu and Homer H. Chen Generation of Affective Accompaniment in Accordance With Emotion Flow . . . . . . 2277--2287 Mahmood Movassagh and Peter Kabal Scalable Audio Coding Using Trellis-Based Optimized Joint Entropy Coding and Quantization . . . . . . . . 2288--2300 Milos Cernak and Alexandros Lazaridis and Afsaneh Asaei and Philip N. Garner Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding . . . . . . . . . . . . . . . . . 2301--2312 David Dov and Ronen Talmon and Israel Cohen Kernel Method for Voice Activity Detection in the Presence of Transients 2313--2326 Jesús Villalba and Antonio Miguel and Alfonso Ortega and Eduardo Lleida Bayesian Networks to Model the Variability of Speaker Verification Scores in Adverse Environments . . . . . 2327--2340 Hardik B. Sailor and Hemant A. Patil Novel Unsupervised Auditory Filterbank Learning Using Convolutional RBM for Speech Recognition . . . . . . . . . . . 2341--2353 Sidsel Marie Nørholm and Jesper Rindom Jensen and Mads Græsbøll Christensen Instantaneous Fundamental Frequency Estimation With Optimal Segmentation for Nonstationary Voiced Speech . . . . . . 
2354--2367 Sheng Zhang and Jiashu Zhang and Hongyu Han Robust Variable Step-Size Decorrelation Normalized Least-Mean-Square Algorithm and its Application to Acoustic Echo Cancellation . . . . . . . . . . . . . . 2368--2376 Tom Barker and Tuomas Virtanen Blind Separation of Audio Mixtures Through Nonnegative Tensor Factorization of Modulation Spectrograms . . . . . . . 2377--2389 Jinxin Liu and Xuefeng Chen Adaptive Compensation of Misequalization in Narrowband Active Noise Equalizer Systems . . . . . . . . . . . . . . . . 2390--2399 Atsunori Ogawa and Takaaki Hori and Atsushi Nakamura Estimating Speech Recognition Accuracy Based on Error Type Classification . . . 2400--2413 Finnian Kelly and John H. L. Hansen Score-Aging Calibration for Speaker Verification . . . . . . . . . . . . . . 2414--2424 Bochen Li and Zhiyao Duan An Approach to Score Following for Piano Performances With the Sustained Effect 2425--2438 Niko Moritz and Birger Kollmeier and Jörn Anemüller Integration of Optimized Modulation Filter Sets Into Deep Neural Networks for Automatic Speech Recognition . . . . 2439--2452 Simon Leglaive and Roland Badeau and Gaël Richard Multichannel Audio Source Separation With Probabilistic Reverberation Priors 2453--2465 Sakari Tervo Single Snapshot Detection and Estimation of Reflections From Room Impulse Responses in the Spherical Harmonic Domain . . . . . . . . . . . . . . . . . 2466--2480 Dejan Marković and Fabio Antonacci and Lucio Bianchi and Stefano Tubaro and Augusto Sarti Extraction of Acoustic Sources Through the Processing of Sound Field Maps in the Ray Space . . . . . . . . . . . . . 2481--2494
Anonymous Table of Contents . . . . . . . . . . . 222--223 Anonymous Table of Contents . . . . . . . . . . . 224--225 Hanchi Chen and Thushara Dheemantha Abhayapala and Prasanga N. Samarasinghe and Wen Zhang Direct-to-Reverberant Energy Ratio Estimation Using a First-Order Microphone . . . . . . . . . . . . . . . 226--237 Peter Bell and Pawel Swietojanski and Steve Renals Multitask Learning of Context-Dependent Targets in Deep Neural Network Acoustic Models . . . . . . . . . . . . . . . . . 238--247 Rui Zhao and Kezhi Mao Topic-Aware Deep Compositional Models for Sentence Classification . . . . . . 248--260 Dalia El Badawy and Ngoc Q. K. Duong and Alexey Ozerov On-the-Fly Audio Source Separation --- A Novel User-Friendly Framework . . . . . 261--272 Filip Elvander and Johan Swärd and Andreas Jakobsson Online Estimation of Multiple Harmonic Signals . . . . . . . . . . . . . . . . 273--284 Vincent Renkens and Hugo Van hamme Weakly Supervised Learning of Hidden Markov Models for Spoken Language Acquisition . . . . . . . . . . . . . . 285--295 Luca Remaggi and Philip J. B. Jackson and Philip Coleman and Wenwu Wang Acoustic Reflector Localization: Novel Image Source Reversion and Direct Localization Methods . . . . . . . . . . 296--309 Prasanga N. Samarasinghe and Thushara D. Abhayapala and Hanchi Chen Estimating the Direct-to-Reverberant Energy Ratio Using a Spherical Harmonics-Based Spatial Correlation Model . . . . . . . . . . . . . . . . . 310--319 Shmulik Markovich-Golan and Sharon Gannot and Walter Kellermann Combined LCMV-TRINICON Beamforming for Separating Multiple Speech Sources in Noisy and Reverberant Environments . . . 320--332 Shakeel Ahmed and Muhammad Tahir Akhtar Gain Scheduling of Auxiliary Noise and Variable Step-Size for Online Acoustic Feedback Cancellation in Narrow-Band Active Noise Control Systems . . . . . . 
333--343 Gabriel Sargent and Frédéric Bimbot and Emmanuel Vincent Estimating the Structural Segmentation of Popular Music Pieces Under Regularity Constraints . . . . . . . . . . . . . . 344--358 Jordan Cheer and Stephen Daley An Investigation of Delayless Subband Adaptive Filtering for Multi-Input Multi-Output Active Noise Control Applications . . . . . . . . . . . . . . 359--373 Sebastian J. Schlecht and Emanuël A. P. Habets Feedback Delay Networks: Echo Density and Mixing Time . . . . . . . . . . . . 374--383 Johannes Abel and Magdalena Kaniewska and Cyril Guillaumé and Wouter Tirry and Tim Fingscheidt An Instrumental Quality Measure for Artificially Bandwidth-Extended Speech Signals . . . . . . . . . . . . . . . . 384--396 Robert Rehr and Timo Gerkmann An Analysis of Adaptive Recursive Smoothing with Applications to Noise PSD Estimation . . . . . . . . . . . . . . . 397--408 Emilio Granell and Carlos-D. Martínez-Hinarejos Multimodal Crowdsourcing for Transcribing Handwritten Documents . . . 409--419 Yaping Ma and Yegui Xiao A New Strategy for Online Secondary-Path Modeling of Narrowband Active Noise Control . . . . . . . . . . . . . . . . 420--434 Jose A. Belloch and Alberto Gonzalez and Enrique S. Quintana-Ortí and Miguel Ferrer and Vesa Välimäki GPU-Based Dynamic Wave Field Synthesis Using Fractional Delay Filters and Room Compensation . . . . . . . . . . . . . . 435--447 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . 448--449 Anonymous IEEE Transactions on Multimedia information for authors . . . . . . . . 450--451 Anonymous Introducing IEEE Collabratec . . . . . . 452 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2
Anonymous Table of Contents . . . . . . . . . . . 3--4 Anonymous Table of Contents . . . . . . . . . . . 3--4 Anonymous Table of Contents . . . . . . . . . . . 3--4 Anonymous Table of Contents . . . . . . . . . . . 3--4 Qi He and Feng Bao and Changchun Bao Multiplicative Update of Auto-Regressive Gains for Codebook-Based Speech Enhancement . . . . . . . . . . . . . . 457--468 Zhongqing Wang and Sophia Yat Mei Lee and Shoushan Li and Guodong Zhou Emotion Analysis in Code-Switching Text With Joint Factor Graph Model . . . . . 469--480 Ashwin Bellur and Mounya Elhilali Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection . . . . . . . . . . . . . . . 481--492 Zhiyuan Tang and Lantian Li and Dong Wang and Ravichander Vipperla Collaborative Joint Training With Multitask Recurrent Model for Speech and Speaker Recognition . . . . . . . . . . 493--504 Bidisha Sharma and S. R. Mahadeva Prasanna Sonority Measurement Using System, Source, and Suprasegmental Information 505--518 Hung-Yi Lee and Bo-Hsiang Tseng and Tsung-Hsien Wen and Yu Tsao Personalizing Recurrent-Neural-Network-Based Language Model by Social Network . . . . . . . . 519--530 Ji Ming and Danny Crookes Speech Enhancement Based on Full-Sentence Correlation and Clean Speech Recognition . . . . . . . . . . . 531--543 Quoc Truong Do and Tomoki Toda and Graham Neubig and Sakriani Sakti and Satoshi Nakamura Preserving Word-Level Emphasis in Speech-to-Speech Translation . . . . . . 544--556 Zhenghua Li and Jiayuan Chao and Min Zhang and Wenliang Chen and Meishan Zhang and Guohong Fu Coupled POS Tagging on Heterogeneous Annotations . . . . . . . . . . . . . . 557--571 Clement S. J. Doire and Mike Brookes and Patrick A. Naylor and Christopher M. Hicks and Dave Betts and Mohammad A. Dmour and Søren Holdt Jensen Single-Channel Online Enhancement of Speech Corrupted by Reverberation and Noise . . . . . . . . . . . . . . . . . 
572--587 Aleksandr Sizov and Kong Aik Lee and Tomi Kinnunen Direct Optimization of the Detection Cost for $I$-Vector-Based Spoken Language Recognition . . . . . . . . . . 588--597 Imran Sheikh and Dominique Fohr and Irina Illina and Georges Linarès Modelling Semantic Context of OOV Words in Large Vocabulary Continuous Speech Recognition . . . . . . . . . . . . . . 598--610 Mojtaba Farmani and Michael Syskind Pedersen and Zheng-Hua Tan and Jesper Jensen Informed Sound Source Localization Using Relative Transfer Functions for Hearing Aid Applications . . . . . . . . . . . . 611--623 C. M. Vikram and S. R. Mahadeva Prasanna Epoch Extraction From Telephone Quality Speech Using Single Pole Filter . . . . 624--636 Motoi Omachi and Tetsuji Ogawa and Tetsunori Kobayashi Associative Memory Model-Based Linear Filtering and Its Application to Tandem Connectionist Blind Source Separation 637--650 Dani Cherkassky and Sharon Gannot Blind Synchronization in Wireless Acoustic Sensor Networks . . . . . . . . 651--661 Laurent Girin and Thomas Hueber and Xavier Alameda-Pineda Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping . . . . . 662--673 Mohamad Hasan Bahari and Alexander Bertrand and Marc Moonen Blind Sampling Rate Offset Estimation for Wireless Acoustic Sensor Networks Through Weighted Least-Squares Coherence Drift Estimation . . . . . . . . . . . . 674--686 Adam Kuklasiński and Simon Doclo and Søren Holdt Jensen and Jesper Jensen Correction to ``Maximum Likelihood PSD Estimation for Speech Enhancement in Reverberation and Noise'' . . . . . . . 687--687
Anonymous Table of Contents . . . . . . . . . . . 688--689 Anonymous Table of Contents . . . . . . . . . . . 690--691 Sharon Gannot and Emmanuel Vincent and Shmulik Markovich-Golan and Alexey Ozerov A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation . . . . . . . . . . . 692--730 Dongwen Ying and Ruohua Zhou and Junfeng Li and Yonghong Yan Window-Dominant Signal Subspace Methods for Multiple Short-Term Speech Source Localization . . . . . . . . . . . . . . 731--744 Sean U. N. Wood and Jean Rouat and Stéphane Dupont and Gueorgui Pironkov Blind Speech Separation and Enhancement With GCC-NMF . . . . . . . . . . . . . . 745--755 Constantin Spille and Birger Kollmeier and Bernd T. Meyer Combining Binaural and Cortical Features for Robust Speech Recognition . . . . . 756--767 Yuma Koizumi and Kenta Niwa and Yusuke Hioka and Kazunori Kobayashi and Hitoshi Ohmuro Informative Acoustic Feature Selection to Maximize Mutual Information for Collecting Target Sources . . . . . . . 768--779 Takuya Higuchi and Nobutaka Ito and Shoko Araki and Takuya Yoshioka and Marc Delcroix and Tomohiro Nakatani Online MVDR Beamformer Based on Complex Gaussian Mixture Model With Spatial Prior for Noise Robust ASR . . . . . . . 780--793 Eita Nakamura and Kazuyoshi Yoshii and Shigeki Sagayama Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices . . . . . . . . . . . . 794--806 Omid Ghahabi and Javier Hernando Deep Learning Backend for Single and Multisession $i$-Vector Speaker Recognition . . . . . . . . . . . . . . 807--817 Penny Karanasou and Chunyang Wu and Mark Gales and Philip C. Woodland $I$-Vectors and Structured Neural Networks for Rapid Adaptation of Acoustic Models . . . . . . . . . . . . 818--828 G. Aneeja and B. Yegnanarayana Extraction of Fundamental Frequency From Degraded Speech Using Temporal Envelopes at High SNR Frequencies . . . . . . . . 
829--838 Seyyed Saeed Sarfjoo and Cenk Demiroğlu and Simon King Using Eigenvoices and Nearest-Neighbors in HMM-Based Cross-Lingual Speaker Adaptation With Limited Data . . . . . . 839--851 Yung-Yue Chen and Jia-Hao Zhang Background Noise Reduction Design for Dual Microphone Cellular Phones: Robust Approach . . . . . . . . . . . . . . . . 852--862 Liner Yang and Xinxiong Chen and Zhiyuan Liu and Maosong Sun Improving Word Representations with Document Labels . . . . . . . . . . . . 863--870 Shiliang Zhang and Cong Liu and Hui Jiang and Si Wei and Lirong Dai and Yu Hu Nonrecurrent Neural Structure for Long-Term Dependence . . . . . . . . . . 871--884 Xuefeng Yang and Kezhi Mao Task Independent Fine Tuning for Word Embeddings . . . . . . . . . . . . . . . 885--894 Yu Bao and Huawei Chen Design of Robust Broadband Beamformers Using Worst-Case Performance Optimization: a Semidefinite Programming Approach . . . . . . . . . . . . . . . . 895--907 Sandro Cumani and Pietro Laface Nonlinear I-Vector Transformations for PLDA-Based Speaker Recognition . . . . . 908--919 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics 920--921 Anonymous IEEE Transactions on Audio, Speech, and Language Processing information for authors . . . . . . . . . . . . . . . . 922--923 Anonymous Introducing IEEE Collabratec . . . . . . 924 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 925--926 Anonymous Table of Contents . . . . . . . . . . . 927--928 Manu Airaksinen and Tom Bäckström and Paavo Alku Quadratic Programming Approach to Glottal Inverse Filtering by Joint Norm-1 and Norm-2 Optimization . . . . . 929--939 Ofer Schwartz and Sharon Gannot and Emanuël A. P. Habets Multispeaker LCMV Beamformer and Postfilter for Source Separation and Noise Reduction . . . . . . . . . . . . 940--951 Dongmei Wang and Chengzhu Yu and John H. L. Hansen Robust Harmonic Features for Classification-Based Pitch Estimation 952--964 Tara N. Sainath and Ron J. Weiss and Kevin W. Wilson and Bo Li and Arun Narayanan and Ehsan Variani and Michiel Bacchiani and Izhak Shafran and Andrew Senior and Kean Chin and Ananya Misra and Chanwoo Kim Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition . . . . . . . . . . . . . . 965--979 Hanieh Khalilian and Ivan V. Bajić and Rodney G. Vaughan A Simulation Study of a Three-Dimensional Sound Field Reproduction System for Immersive Communication . . . . . . . . . . . . . 980--995 Andreas Franck and Wenwu Wang and Filippo Maria Fazi Sparse $ \ell_1$-Optimal Multiloudspeaker Panning and Its Relation to Vector Base Amplitude Panning . . . . . . . . . . . . . . . . 996--1010 Songbin Li and Yizhen Jia and C.-C. Jay Kuo Steganalysis of QIM Steganography in Low-Bit-Rate Speech Signals . . . . . . 1011--1022 Naoyuki Kanda and Xugang Lu and Hisashi Kawai Maximum-a-Posteriori-Based Decoding for End-to-End Acoustic Models . . . . . . . 1023--1034 Navid Shokouhi and John H. L. Hansen Teager--Kaiser Energy Operators for Overlapped Speech Detection . . . . . . 1035--1047 Yi-Chin Huang and Chung-Hsien Wu and Yan-You Chen and Ming-Ge Shie and Jhing-Fa Wang Personalized Spontaneous Speech Synthesis Using a Small-Sized Unsegmented Semispontaneous Speech . . . 
1048--1060 Jeongsoo Park and Jaeyoung Shin and Kyogu Lee Exploiting Continuity/Discontinuity of Basis Vectors in Spectrogram Decomposition for Harmonic-Percussive Sound Separation . . . . . . . . . . . . 1061--1074 Xueliang Zhang and DeLiang Wang Deep Learning Based Binaural Speech Separation in Reverberant Environments 1075--1084 Masood Delfarah and DeLiang Wang Features for Masking-Based Monaural Speech Separation in Reverberant Conditions . . . . . . . . . . . . . . . 1085--1094 Feiran Yang and Gerald Enzner and Jun Yang Statistical Convergence Analysis for Optimal Control of DFT-Domain Adaptive Echo Canceler . . . . . . . . . . . . . 1095--1106 Takashi Nose and Yusuke Arao and Takao Kobayashi and Komei Sugiura and Yoshinori Shiga Sentence Selection Based on Extended Entropy Using Phonetic and Prosodic Contexts for Statistical Parametric Speech Synthesis . . . . . . . . . . . . 1107--1116 Gergely Firtha and Péter Fiala and Frank Schultz and Sascha Spors Improved Referencing Schemes for 2.5D Wave Field Synthesis Driving Functions 1117--1127 Esteban Maestre and Gary P. Scavone and Julius O. Smith Joint Modeling of Bridge Admittance and Body Radiativity for Efficient Synthesis of String Instrument Sound by Digital Waveguides . . . . . . . . . . . . . . . 1128--1139 Gongping Huang and Jacob Benesty and Jingdong Chen On the Design of Frequency-Invariant Beampatterns With Uniform Circular Microphone Arrays . . . . . . . . . . . 1140--1153 Zdeněk Průša and Peter Balazs and Peter Lempel Søndergaard A Noniterative Method for Reconstruction of Phase From STFT Magnitude . . . . . . 1154--1164 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1167--1168 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing for authors . . . . . . . . . . . . . . 1169--1170 Anonymous Open Access . . . . . . . . . . . . . . 1171 Anonymous Introducing IEEE Collabratec . . . . . . 
1172 Anonymous Member Get-A-Member (MGM) Program . . . 1173 Anonymous Blank Page . . . . . . . . . . . . . . . B1165--B1166 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1167--1168 G. Richard and T. Virtanen and J. P. Bello and N. Ono and H. Glotin Introduction to the Special Section on Sound Scene and Event Analysis . . . . . 1169--1171 Héctor A. Sánchez-Hevia and David Ayllón and Roberto Gil-Pita and Manuel Rosa-Zurera Maximum Likelihood Decision Fusion for Weapon Classification in Wireless Acoustic Sensor Networks . . . . . . . . 1172--1182 Nithin Rao Koluguri and G. Nisha Meenakshi and Prasanta Kumar Ghosh Spectrogram Enhancement Using Multiple Window Savitzky--Golay (MWSG) Filter for Robust Bird Sound Detection . . . . . . 1183--1192 Dan Stowell and Emmanouil Benetos and Lisa F. Gill On-Bird Sound Recordings: Automatic Acoustic Recognition of Activities and Contexts . . . . . . . . . . . . . . . . 1193--1206 Brandon T. Carroll and Bradley M. Whitaker and Wayne Dayley and David V. Anderson Outlier Learning via Augmented Frozen Dictionaries . . . . . . . . . . . . . . 1207--1215 Victor Bisot and Romain Serizel and Slim Essid and Gaël Richard Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification . . . . . . . . . . . . . 1216--1229 Yong Xu and Qiang Huang and Wenwu Wang and Peter Foster and Siddharth Sigtia and Philip J. B. Jackson and Mark D. Plumbley Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging . . . . . . . . . . . . . . . . 1230--1241 René Grzeszick and Axel Plinge and Gernot A. Fink Bag-of-Features Methods for Acoustic Event Detection and Classification . . . 1242--1252 Alain Rakotomamonjy Supervised Representation Learning for Audio Scene Classification . . . . . . . 1253--1265 Emmanouil Benetos and Grégoire Lafay and Mathieu Lagrange and Mark D. Plumbley Polyphonic Sound Event Tracking Using Linear Dynamical Systems . . . . . . . . 
1266--1277 Huy Phan and Lars Hertel and Marco Maass and Philipp Koch and Radoslaw Mazur and Alfred Mertins Improved Audio Scene Classification Based on Label-Tree Embeddings and Convolutional Neural Networks . . . . . 1278--1290 Emre Çakır and Giambattista Parascandolo and Toni Heittola and Heikki Huttunen and Tuomas Virtanen Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection . . 1291--1303 Jens Schröder and Niko Moritz and Jörn Anemüller and Stefan Goetze and Birger Kollmeier Classifier Architectures for Acoustic Scenes and Events: Implications for DNNs, TDNNs, and Perceptual Features from DCASE 2016 . . . . . . . . . . . . 1304--1314 Wenjun Yang and Sridhar Krishnan Combining Temporal Features by Local Binary Pattern for Acoustic Scene Classification . . . . . . . . . . . . . 1315--1321 David Dov and Ronen Talmon and Israel Cohen Multimodal Kernel Method for Activity Detection of Sound Sources . . . . . . . 1322--1334 Keisuke Imoto and Nobutaka Ono Spatial Cepstrum as a Spatial Feature Using a Distributed Microphone Array for Acoustic Scene Analysis . . . . . . . . 1335--1343 Ivo Trowitzsch and Johannes Mohr and Youssef Kashef and Klaus Obermayer Robust Detection of Environmental Sounds in Binaural Auditory Scenes . . . . . . 1344--1356 Abu Shafin Mohammad Mahdee Jameel and Shaikh Anowarul Fattah and Rajib Goswami and Wei-Ping Zhu and M. Omair Ahmad Noise Robust Formant Frequency Estimation Method Based on Spectral Model of Repeated Autocorrelation of Speech . . . . . . . . . . . . . . . . . 1357--1370 Na Li and Man-Wai Mak and Jen-Tzung Chien DNN-Driven Mixture of PLDA for Robust Speaker Verification . . . . . . . . . . 1371--1383 Kai Wu and Vaninirappuputhenpurayil Gopalan Reju and Andy W. H. Khong and Shu Ting Goh Swarm Intelligence Based Particle Filter for Alternating Talker Localization and Tracking Using Microphone Arrays . . . . 
1384--1397 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1398--1399 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing for authors . . . . . . . . . . . . . . 1400--1401 Anonymous Open Access . . . . . . . . . . . . . . 1402 Anonymous Introducing IEEE Collabratec . . . . . . 1403 Anonymous Member Get-A-Member (MGM) Program . . . 1404 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1405--1406 Anonymous Table of Contents Edics . . . . . . . . 1407--1408 Yu-An Chen and Ju-Chiang Wang and Yi-Hsuan Yang and Homer H. Chen Component Tying for Mixture Model Adaptation in Personalization of Music Emotion Recognition . . . . . . . . . . 1409--1420 Hossein Zeinali and Hossein Sameti and Lukáš Burget HMM-Based Phrase-Independent $i$-Vector Extractor for Text-Dependent Speaker Verification . . . . . . . . . . . . . . 1421--1435 Xinzhou Xu and Jun Deng and Nicholas Cummins and Zixing Zhang and Chen Wu and Li Zhao and Björn Schuller A Two-Dimensional Framework of Multiple Kernel Subspace Learning for Recognizing Emotion in Speech . . . . . . . . . . . 1436--1449 Mandy Korpusik and James Glass Spoken Language Understanding for a Nutrition Dialogue System . . . . . . . 1450--1461 Mahmoud Fakhry and Piergiorgio Svaizer and Maurizio Omologo Audio Source Separation in Reverberant Environments Using $ \beta $-Divergence-Based Nonnegative Factorization . . . . . . . . . . . . . 1462--1476 Bracha Laufer-Goldshtein and Ronen Talmon and Sharon Gannot Semi-Supervised Source Localization on Multiple Manifolds With Distributed Microphones . . . . . . . . . . . . . . 1477--1491 Donald S. Williamson and DeLiang Wang Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising . . . . . . . . . . . . . . . 1492--1501 Liang Lu and Steve Renals Small-Footprint Highway Deep Neural Networks for Speech Recognition . . . . 1502--1511 Ina Kodrasi and Simon Doclo Signal-Dependent Penalty Functions for Robust Acoustic Multi-Channel Equalization . . . . . . . . . . . . . . 1512--1525 Jung-Hee Kim and Jin Kim and Jae Hyeon Jeon and Sang Won Nam Delayless Individual-Weighting-Factors Sign Subband Adaptive Filter With Band-Dependent Variable Step-Sizes . . . 
1526--1534 Yannan Wang and Jun Du and Li-Rong Dai and Chin-Hui Lee A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks 1535--1546 Giacomo Vairetti and Enzo De Sena and Michael Catrysse and Søren Holdt Jensen and Marc Moonen and Toon van Waterschoot A Scalable Algorithm for Physically Motivated and Sparse Approximation of Room Impulse Responses With Orthonormal Basis Functions . . . . . . . . . . . . 1547--1561 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1562--1563 Anonymous IEEE Transactions on Multimedia information for authors . . . 1564--1565 Anonymous Open Access . . . . . . . . . . . . . . 1566 Anonymous Introducing IEEE Collabratec . . . . . . 1567 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1562--1563 Anonymous Table of Contents . . . . . . . . . . . 1564--1565 Francis Stevens and Damian T. Murphy and Lauri Savioja and Vesa Välimäki Modeling Sparsely Reflecting Outdoor Acoustic Scenes Using the Waveguide Web 1566--1578 Ferdinando Olivieri and Filippo Maria Fazi and Simone Fontana and Dylan Menzies and Philip Arthur Nelson Generation of Private Sound With a Circular Loudspeaker Array and the Weighted Pressure Matching Method . . . 1579--1591 Samy Elshamy and Nilesh Madhu and Wouter Tirry and Tim Fingscheidt Instantaneous A Priori SNR Estimation by Cepstral Excitation Manipulation . . . . 1592--1605 Paavo Alku and Rahim Saeidi The Linear Predictive Modeling of Speech From Higher-Lag Autocorrelation Coefficients Applied to Noise-Robust Speaker Recognition . . . . . . . . . . 1606--1617 Cheng Pang and Hong Liu and Jie Zhang and Xiaofei Li Binaural Sound Localization Based on Reverberation Weighting and Generalized Parametric Mapping . . . . . . . . . . . 1618--1632 Somanath Pradhan and Vinal Patel and Dipen Somani and Nithin V. George An Improved Proportionate Delayless Multiband-Structured Subband Adaptive Feedback Canceller for Digital Hearing Aids . . . . . . . . . . . . . . . . . . 1633--1643 Szymon Drgas and Tuomas Virtanen and Jörg Lücke and Antti Hurmalainen Binary Non-Negative Matrix Deconvolution for Audio Dictionary Learning . . . . . 1644--1656 Fatemeh Saki and Nasser Kehtarnavaz Real-Time Unsupervised Classification of Environmental Noise Signals . . . . . . 1657--1667 Lakshmish Kaushik and Abhijeet Sangwan and John H. L. Hansen Automatic Sentiment Detection in Naturalistic Audio . . . . . . . . . . . 1668--1679 Ofer Schwartz and Sharon Gannot and Emanuël A. P. Habets Cramér--Rao Bound Analysis of Reverberation Level Estimators for Dereverberation and Noise Reduction . . 1680--1693 Seyran Khademi and Richard C. Hendriks and W. 
Bastiaan Kleijn Intelligibility Enhancement Based on Mutual Information . . . . . . . . . . . 1694--1708 Yuta Hatano and Chuang Shi and Yoshinobu Kajikawa Compensation for Nonlinear Distortion of the Frequency Modulation-Based Parametric Array Loudspeaker . . . . . . 1709--1717 Yu-Ren Chien and Daryush D. Mehta and Jón Guðnason and Matías Zañartu and Thomas F. Quatieri Evaluation of Glottal Inverse Filtering Algorithms Using a Physiologically Based Articulatory Speech Synthesizer . . . . 1718--1730 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1731--1732 Anonymous IEEE Transactions on Multimedia information for authors . . . 1733--1734 Anonymous Open Access . . . . . . . . . . . . . . 1735 Anonymous Introducing IEEE Collabratec . . . . . . 1736 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1737--1738 Anonymous Table of Contents . . . . . . . . . . . 1739--1740 Jakob Abeßer and Gerald Schuller Instrument-Centered Music Transcription of Solo Bass Guitar Recordings . . . . . 1741--1750 Thomas Le Cornu and Ben Milner Generating Intelligible Audio Speech From Visual Speech . . . . . . . . . . . 1751--1761 Lemao Liu and Atsushi Fujita and Masao Utiyama and Andrew Finch and Eiichiro Sumita Translation Quality Estimation Using Only Bilingual Corpora . . . . . . . . . 1762--1772 Emad M. Grais and Gerard Roma and Andrew J. R. Simpson and Mark D. Plumbley Two-Stage Single-Channel Audio Source Separation Using Deep Neural Networks 1773--1783 Giuliano Bernardi and Toon van Waterschoot and Jan Wouters and Marc Moonen Adaptive Feedback Cancellation Using a Partitioned-Block Frequency-Domain Kalman Filter Approach With PEM-Based Signal Prewhitening . . . . . . . . . . 1784--1798 Vinal Patel and Jordan Cheer and Nithin V. George Modified Phase-Scheduled-Command FxLMS Algorithm for Active Sound Profiling . . 1799--1808 Killian Janod and Mohamed Morchid and Richard Dufour and Georges Linarès and Renato De Mori Denoised Bottleneck Features From Deep Autoencoders for Telephone Conversation Analysis . . . . . . . . . . . . . . . . 1809--1820 Nikolaos Stefanakis and Despoina Pavlidi and Athanasios Mouchtaris Perpendicular Cross-Spectra Fusion for Sound Source Localization With a Planar Microphone Array . . . . . . . . . . . . 1821--1835 Takenori Yoshimura and Kei Hashimoto and Keiichiro Oura and Yoshihiko Nankaku and Keiichi Tokuda Simultaneous Optimization of Multiple Tree-Based Factor Analyzed HMM for Speech Synthesis . . . . . . . . . . . . 1836--1845 Eita Nakamura and Kazuyoshi Yoshii and Simon Dixon Note Value Recognition for Piano Transcription Using Markov Random Fields 1846--1858 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 
1859--1860 Anonymous IEEE Transactions on Multimedia information for authors . . . 1861--1862 Anonymous Open Access . . . . . . . . . . . . . . 1863 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 1859--1860 Anonymous Table of Contents . . . . . . . . . . . 1861--1862 Xiaohai Tian and Siu Wa Lee and Zhizheng Wu and Eng Siong Chng and Haizhou Li An Exemplar-Based Approach to Frequency Warping for Voice Conversion . . . . . . 1863--1876 Siying Wang and Sebastian Ewert and Simon Dixon Identifying Missing and Extra Notes in Piano Recordings Using Score-Informed Dictionary Learning . . . . . . . . . . 1877--1889 Sandro Cumani and Pietro Laface Joint Estimation of PLDA and Nonlinear Transformations of Speaker Vectors . . . 1890--1900 Morten Kolbæk and Dong Yu and Zheng-Hua Tan and Jesper Jensen Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks . . . . . . . . . . . . . . . . 1901--1913 Cheng-Tao Chung and Cheng-Yu Tsai and Chia-Hsiang Liu and Lin-Shan Lee Unsupervised Iterative Deep Learning of Speech Features and Acoustic Tokens with Applications to Spoken Term Detection 1914--1928 Niccolò Antonello and Enzo De Sena and Marc Moonen and Patrick A. Naylor and Toon van Waterschoot Room Impulse Response Interpolation Using a Sparse Spatio-Temporal Representation of the Sound Field . . . 1929--1941 Yanmin Qian and Nanxin Chen and Heinrich Dinkel and Zhizheng Wu Deep Feature Engineering for Noise Robust Spoofing Detection . . . . . . . 1942--1955 Sina Hafezi and Alastair H. Moore and Patrick A. Naylor Augmented Intensity Vectors for Direction of Arrival Estimation in the Spherical Harmonic Domain . . . . . . . 1956--1968 Byeongho Jo and Jung-Woo Choi Spherical Harmonic Smoothing for Localizing Coherent Sound Sources . . . 1969--1984 Emma Jokinen and Ulpu Remes and Paavo Alku Intelligibility Enhancement of Telephone Speech Using Gaussian Process Regression for Normal-to-Lombard Spectral Tilt Conversion . . . . . . . . . . . . . . . 
1985--1996 Xiaofei Li and Laurent Girin and Radu Horaud and Sharon Gannot Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization With Spatial Sparsity Regularization . . . . . . . . . . . . . 1997--2012 Marc Arnela and Oriol Guasch Finite Element Synthesis of Diphthongs Using Tuned Two-Dimensional Vocal Tracts 2013--2023 Deepak Baby and Hugo Van hamme Joint Denoising and Dereverberation Using Exemplar-Based Sparse Representations and Decaying Norm Constraint . . . . . . . . . . . . . . . 2024--2035 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 2036--2037 Anonymous IEEE Transactions on Multimedia information for authors . . . 2038--2039 Anonymous Open Access . . . . . . . . . . . . . . 2040 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 2041--2042 Anonymous Table of Contents . . . . . . . . . . . 2043--2044 Qinghua Huang and Lin Zhang and Yong Fang Two-Stage Decoupled DOA Estimation Based on Real Spherical Harmonics for Spherical Arrays . . . . . . . . . . . . 2045--2058 Tomoki Hayashi and Shinji Watanabe and Tomoki Toda and Takaaki Hori and Jonathan Le Roux and Kazuya Takeda Duration-Controlled LSTM for Polyphonic Sound Event Detection . . . . . . . . . 2059--2070 Monisankha Pal and Goutam Saha Spectral Mapping Using Prior Re-Estimation of $i$-Vectors and System Fusion for Voice Conversion . . . . . . 2071--2084 Seppo Enarvi and Peter Smit and Sami Virpioja and Mikko Kurimo Automatic Speech Recognition With Very Large Conversational Finnish and Estonian Vocabularies . . . . . . . . . 2085--2097 Hannah Muckenhirn and Pavel Korshunov and Mathew Magimai-Doss and Sébastien Marcel Long-Term Spectral Statistics for Voice Presentation Attack Detection . . . . . 2098--2111 Brian Hamilton and Stefan Bilbao FDTD Methods for $3$-D Room Acoustics Simulation With High-Order Accuracy in Space and Time . . . . . . . . . . . . . 2112--2124 Pejman Mowlaee and Martin Blass and W. Bastiaan Kleijn New Results in Modulation-Domain Single-Channel Speech Enhancement . . . 2125--2137 Dylan Menzies and Filippo Maria Fazi Decoding and Compression of Channel and Scene Objects for Spatial Audio . . . . 2138--2151 Eunwoo Song and Frank K. Soong and Hong-Goo Kang Effective Spectral and Excitation Modeling Techniques for LSTM--RNN-Based Speech Synthesis Systems . . . . . . . . 2152--2161 Pulkit Sharma and Vinayak Abrol and Anil Kumar Sao Deep-Sparse-Representation-Based Features for Speech Recognition . . . . 2162--2175 Iynkaran Natgunanathan and Yong Xiang and Guang Hua and Gleb Beliakov and John Yearwood Patchwork-Based Multilayer Audio Watermarking . . . . . . . . . . . . . . 2176--2187 Chengzhu Yu and John H. L. 
Hansen Active Learning Based Constrained Clustering For Speaker Diarization . . . 2188--2198 Emil Solsbæk Ottosen and Monika Dörfler A Phase Vocoder Based on Nonstationary Gabor Frames . . . . . . . . . . . . . . 2199--2208 Boaz Schwartz and Sharon Gannot and Emanuël A. P. Habets Two Model-Based EM Algorithms for Blind Source Separation in Noisy Environments 2209--2222 Maja Taseska and Emanuël A. P. Habets Nonstationary Noise PSD Matrix Estimation for Multichannel Blind Speech Extraction . . . . . . . . . . . . . . . 2223--2236 Bruno Di Giorgi and Simon Dixon and Massimiliano Zanoni and Augusto Sarti A Data-Driven Model of Tonal Chord Sequence Complexity . . . . . . . . . . 2237--2250 N. Stefanakis and D. Pavlidi and A. Mouchtaris Corrections to ``Perpendicular Cross-Spectra Fusion for Sound Source Localization With a Planar Microphone Array'' [Sep 17 1821--1835] . . . . . . 2251 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 2252--2253 Anonymous IEEE Transactions on Multimedia information for authors . . . 2254--2255 Anonymous Open Access . . . . . . . . . . . . . . 2256 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . . . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of Contents . . . . . . . . . . . 2252--2253 T. Schultz and T. Hueber and D. J. Krusienski and J. S. Brumberg Introduction to the Special Issue on Biosignal-Based Spoken Communication . . 2254--2256 Tanja Schultz and Michael Wand and Thomas Hueber and Dean J. Krusienski and Christian Herff and Jonathan S. Brumberg Biosignal-Based Spoken Communication: a Survey . . . . . . . . . . . . . . . . . 2257--2271 Christopher Dromey and Katherine M. Black Effects of Laryngeal Activity on Articulation . . . . . . . . . . . . . . 2272--2280 Michal Borsky and Daryush D. Mehta and Jarrad H. Van Stan and Jon Gudnason Modal and Nonmodal Voice Quality Classification Using Acoustic and Electroglottographic Features . . . . . 2281--2291 Alborz Rezazadeh Sereshkeh and Robert Trott and Aurélien Bricout and Tom Chau EEG Classification of Covert Speech Using Regularized Neural Networks . . . 2292--2300 Reza Sahraeian and Dirk Van Compernolle Crosslingual and Multilingual Speech Recognition Based on the Speech Manifold 2301--2312 Đorđe T. Grozdić and Slobodan T. Jovičić Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering . . . . . . . . . . . . . . . 2313--2322 Myungjong Kim and Beiming Cao and Ted Mau and Jun Wang Speaker-Independent Silent Speech Recognition From Flesh-Point Articulatory Movements Using an LSTM Neural Network . . . . . . . . . . . . . 2323--2336 Patrick Lumban Tobing and Kazuhiro Kobayashi and Tomoki Toda Articulatory Controllable Speech Modification Based on Statistical Inversion and Production Mappings . . . 2337--2350 Ingmar Steiner and Sébastien Le Maguer and Alexander Hewer Synthesis of Tongue Motion and Acoustics From Text Using a Multimodal Articulatory Database . . . . . . . . . 2351--2361 Jose A. Gonzalez and Lam A. Cheah and Angel M. Gomez and Phil D. Green and James M. Gilbert and Stephen R. Ell and Roger K. 
Moore and Ed Holdsworth Direct Speech Reconstruction From Articulatory Sensor Data by Machine Learning . . . . . . . . . . . . . . . . 2362--2374 Matthias Janke and Lorenz Diener EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals . . . . . . . . . . . . . . . . 2375--2385 Geoffrey S. Meltzner and James T. Heaton and Yunbin Deng and Gianluca De Luca and Serge H. Roy and Joshua C. Kline Silent Speech Recognition as an Alternative Communication Device for Persons With Laryngectomy . . . . . . . 2386--2398 Fei Chen and Lan Wang and Hui Chen and Gang Peng Investigations on Mandarin Aspiratory Animations Using an Airflow Model . . . 2399--2409 Wayne Xiong and Jasha Droppo and Xuedong Huang and Frank Seide and Michael L. Seltzer and Andreas Stolcke and Dong Yu and Geoffrey Zweig Toward Human Parity in Conversational Speech Recognition . . . . . . . . . . . 2410--2423 Biao Zhang and Deyi Xiong and Jinsong Su and Hong Duan A Context-Aware Recurrent Encoder for Neural Machine Translation . . . . . . . 2424--2432 Afsaneh Asaei and Milos Cernak and Hervé Bourlard Perceptual Information Loss due to Impaired Speech Production . . . . . . . 2433--2443 Ning Ma and Tobias May and Guy J. Brown Exploiting Deep Neural Networks and Head Movements for Robust Binaural Localization of Multiple Sources in Reverberant Environments . . . . . . . . 2444--2453 Anonymous List of Reviewers . . . . . . . . . . . 2454--2457 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 2458--2459 Anonymous IEEE Transactions on Multimedia information for authors . . . 2460--2461 Anonymous Open Access . . . . . . . . . . . . . . 2462 Anonymous 2017 Subject Index IEEE Transactions on Applied Superconductivity Vol. 27 . . . . . . . 2463--2488 Anonymous Front Cover . . . . . . . . . . . . . . C1 Anonymous IEEE Signal Processing Society . . . . . C2 Anonymous IEEE Signal Processing Society . . 
. . . C3 Anonymous Blank page . . . . . . . . . . . . . . . C4
Anonymous Table of contents . . . . . . . . . . . 1--2 Anonymous Table of Contents [Edics] . . . . . . . 3--4 Dianna Yee and Homayoun Kamkar-Parsi and Rainer Martin and Henning Puder A Noise Reduction Postfilter for Binaurally Linked Single-Microphone Hearing Aids Utilizing a Nearby External Microphone . . . . . . . . . . . . . . . 5--18 Tom Bäckström and Johannes Fischer Fast Randomization for Distributed Low-Bitrate Coding of Speech and Audio 19--30 Jun Deng and Xinzhou Xu and Zixing Zhang and Sascha Frühholz and Björn Schuller Semisupervised Autoencoders for Speech Emotion Recognition . . . . . . . . . . 31--43 Md. Sahidullah and Dennis Alexander Lehmann Thomsen and Rosa Gonzalez Hautamäki and Tomi Kinnunen and Zheng-Hua Tan and Robert Parts and Martti Pitkänen Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones . . . . . . . . . . . . . . 44--56 Gilles Degottex and Pierre Lanchantin and Mark Gales A Log Domain Pulse Model for Parametric Speech Synthesis . . . . . . . . . . . . 57--70 Johannes Abel and Tim Fingscheidt Artificial Speech Bandwidth Extension Using Deep Neural Networks for Wideband Spectral Envelope Estimation . . . . . . 71--83 Yuki Saito and Shinnosuke Takamichi and Hiroshi Saruwatari Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks . . . . . . . . . . . . . . . . 84--96 Kristian Timm Andersen and Marc Moonen Robust Speech-Distortion Weighted Interframe Wiener Filters for Single-Channel Noise Reduction . . . . . 97--107 Chen-Yu Chiang Cross-Dialect Adaptation Framework for Constructing Prosodic Models for Chinese Dialect Text-to-Speech Systems . . . . . 108--121 Bingquan Liu and Zhen Xu and Chengjie Sun and Baoxun Wang and Xiaolong Wang and Derek F. Wong and Min Zhang Content-Oriented User Modeling for Personalized Response Ranking in Chatbots . . . . . . . . . . . . . . . . 
122--133 Zhiyuan Tang and Dong Wang and Yixiang Chen and Lantian Li and Andrew Abel Phonetic Temporal Neural Model for Language Identification . . . . . . . . 134--144 Soumitro Chakrabarty and Emanuël A. P. Habets A Bayesian Approach to Informed Spatial Filtering With Robustness Against DOA Estimation Errors . . . . . . . . . . . 145--160 Kuan-Yu Chen and Shih-Hung Liu and Berlin Chen and Hsin-Min Wang An Information Distillation Framework for Extractive Summarization . . . . . . 161--170 Ma Jin and Yan Song and Ian McLoughlin and Li-Rong Dai LID-Senones and Their Statistics for Language Identification . . . . . . . . 171--183 Zhehuai Chen and Jasha Droppo and Jinyu Li and Wayne Xiong Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition . . . . . . . . . . . 184--196 Shivesh Ranjan and John H. L. Hansen Curriculum Learning Based Approaches for Noise Robust Speaker Recognition . . . . 197--210
Yoshiaki Bando and Katsutoshi Itoyama and Masashi Konyo and Satoshi Tadokoro and Kazuhiro Nakadai and Kazuyoshi Yoshii and Tatsuya Kawahara and Hiroshi G. Okuno Speech Enhancement Based on Bayesian Low-Rank and Sparse Decomposition of Multichannel Magnitude Spectrograms . . 215--230 Yu-Ping Ruan and Qian Chen and Zhen-Hua Ling A Sequential Neural Encoder With Latent Structured Description for Modeling Sentences . . . . . . . . . . . . . . . 231--242 Amelia J. Gully and Helena Daffern and Damian T. Murphy Diphthong Synthesis Using the Dynamic $3$D Digital Waveguide Mesh . . . . . . 243--255 Chunyang Wu and Mark J. F. Gales and Anton Ragni and Penny Karanasou and Khe Chai Sim Improving Interpretability and Regularization in Deep Learning . . . . 256--265 Kehai Chen and Tiejun Zhao and Muyun Yang and Lemao Liu and Akihiro Tamura and Rui Wang and Masao Utiyama and Eiichiro Sumita A Neural Approach to Source Dependence Based Context Model for Statistical Machine Translation . . . . . . . . . . 266--280 Joonas Nikunen and Aleksandr Diment and Tuomas Virtanen Separation of Moving Sound Sources Using Multichannel NMF and Acoustic Tracking 281--295 Johan Swärd and Hongbin Li and Andreas Jakobsson Off-Grid Fundamental Frequency Estimation . . . . . . . . . . . . . . . 296--303 Dylan Menzies and Marcos F. Simón Gálvez and Filippo Maria Fazi A Low-Frequency Panning Method With Compensation for Head Rotation . . . . . 304--317 Branimir Dropuljić and Igor Mijić and Davor Petrinović and Tanja Jovanovic and Krešimir Ćosić Vocal Analysis of Acoustic Startle Responses . . . . . . . . . . . . . . . 318--329 Philipp Aichinger and Martin Hagmüller and Berit Schneider-Stickler and Jean Schoentgen and Franz Pernkopf Tracking of Multiple Fundamental Frequencies in Diplophonic Voices . . . 330--341 Anastasios Alexandridis and Athanasios Mouchtaris Multiple Sound Source Location Estimation in Wireless Acoustic Sensor Networks Using DOA Estimates: The Data-Association Problem . . 
. . . . . . 342--356 Robert Rehr and Timo Gerkmann On the Importance of Super-Gaussian Speech Priors for Machine-Learning Based Speech Enhancement . . . . . . . . . . . 357--366 Sonia Djaziri-Larbi and Gaël Mahé and Imen Mezghani and Monia Turki and Mériem Jaïdane Watermark-Driven Acoustic Echo Cancellation . . . . . . . . . . . . . . 367--378 Annamaria Mesaros and Toni Heittola and Emmanouil Benetos and Peter Foster and Mathieu Lagrange and Tuomas Virtanen and Mark D. Plumbley Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge . . . . . . . . . . . . . 379--393 Cheng-Tao Chung and Lin-Shan Lee Unsupervised Discovery of Structured Acoustic Tokens With Applications to Spoken Term Detection . . . . . . . . . 394--405 Tobias May Robust Speech Dereverberation With a Neural Network-Based Post-Filter That Exploits Multi-Conditional Training of Binaural Cues . . . . . . . . . . . . . 406--414 Majid Mirbagheri and Les Atlas and Adrian K. C. Lee Regression Factor Analysis With an Application to Continuous HRIR Measurement . . . . . . . . . . . . . . 415--421 Jen-Tzung Chien Bayesian Nonparametric Learning for Hierarchical and Sparse Topics . . . . . 422--435 Johannes Stahl and Pejman Mowlaee A Pitch-Synchronous Simultaneous Detection-Estimation Framework for Speech Enhancement . . . . . . . . . . . 436--450
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 457--458 Anonymous Table of Contents [Edics] . . . . . . . 459--460 C. D. Salvador and S. Sakamoto and J. Treviño and Y. Suzuki Boundary Matching Filters for Spherical Microphone and Loudspeaker Arrays . . . 461--474 A. H. Abdelaziz Comparing Fusion Models for DNN-Based Audiovisual Continuous Speech Recognition . . . . . . . . . . . . . . 475--484 S. Emura Residual Echo Reduction for Multichannel Acoustic Echo Cancelers With a Complex-Valued Residual Echo Estimate 485--500 V. H. Do and N. F. Chen and B. P. Lim and M. A. Hasegawa-Johnson Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription . . . . . . . . 501--514 M. Zohourian and G. Enzner and R. Martin Binaural Speaker Localization Integrated Into an Adaptive Beamformer for Hearing Aids . . . . . . . . . . . . . . . . . . 515--528 Y. Xiang and I. Natgunanathan and D. Peng and G. Hua and B. Liu Spread Spectrum Audio Watermarking Using Multiple Orthogonal PN Sequences and Variable Embedding Strengths and Polarities . . . . . . . . . . . . . . . 529--539 C. Tan and F. Wei and Q. Zhou and N. Yang and B. Du and W. Lv and M. Zhou Context-Aware Answer Sentence Selection With Hierarchical Gated Recurrent Neural Networks . . . . . . . . . . . . . . . . 540--549 J. Zhang and S. P. Chepuri and R. C. Hendriks and R. Heusdens Microphone Subset Selection for MVDR Beamformer Based Noise Reduction . . . . 550--563 S. Wang and P. Lin and Y. Tsao and J. Hung and B. Su Suppression by Selecting Wavelets for Feature Compression in Distributed Speech Recognition . . . . . . . . . . . 564--579 Y. Wang and M. Brookes Model-Based Speech Enhancement in the Modulation Domain . . . . . . . . . . . 580--594 C. Huemmer and C. Hofmann and R. Maas and W. 
Kellermann Estimating Parameters of Nonlinear Systems Using the Elitist Particle Filter Based on Evolutionary Strategies 595--608 D. Salvati and C. Drioli and G. L. Foresti A Low-Complexity Robust Beamforming Using Diagonal Unloading for Acoustic Source Localization . . . . . . . . . . 609--622 J. Su and J. Zeng and D. Xiong and Y. Liu and M. Wang and J. Xie A Hierarchy-to-Sequence Attentional Neural Machine Translation Model . . . . 623--632 W. B. Kheder and D. Matrouf and M. Ajili and J. Bonastre A Unified Joint Model to Deal With Nuisance Variabilities in the $i$-Vector Space . . . . . . . . . . . . . . . . . 633--645 G. Gelly and J. Gauvain Optimization of RNN-Based Speech Activity Detection . . . . . . . . . . . 646--656 M. Taseska and E. A. P. Habets Blind Source Separation of Moving Sources Using Sparsity-Based Source Detection and Tracking . . . . . . . . . 657--670 L. Yu and J. Wang and K. R. Lai and X. Zhang Refining Word Embeddings Using Intensity Scores for Sentiment Analysis . . . . . 671--681 Y. Dorfan and A. Plinge and G. Hazan and S. Gannot Distributed Expectation-Maximization Algorithm for Speaker Localization in Reverberant Environments . . . . . . . . 682--695 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 696--697 Anonymous IEEE Transactions on Multimedia information for authors . . . 698--699 Anonymous Open Access . . . . . . . . . . . . . . 700--700 Anonymous Introducing IEEE Collabratec . . . . . . 701--701 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 696--697 Anonymous Table of Contents [Edics] . . . . . . . 698--699 Z. Tan and M. Mak and B. K. Mak DNN-Based Score Calibration With Multitask Learning for Noise Robust Speaker Verification . . . . . . . . . . 700--712 Y. Hu and Z. Ling Extracting Spectral Features Using Deep Autoencoders With Binary Distributed Hidden Units for Statistical Parametric Speech Synthesis . . . . . . . . . . . . 713--724 B. Laufer-Goldshtein and R. Talmon and S. Gannot A Hybrid Approach for Speaker Tracking Based on TDOA and Data-Driven Models . . 725--735 S. Cumani and P. Laface Speaker Recognition Using e Vectors . . 736--748 L. Xu and K. A. Lee and H. Li and Z. Yang Generalizing I-Vector Estimation for Rapid Speaker Recognition . . . . . . . 749--759 Y. Buchris and I. Cohen and J. Benesty Frequency-Domain Design of Asymmetric Circular Differential Microphone Arrays 760--773 J. Zhang and T. D. Abhayapala and W. Zhang and P. N. Samarasinghe and S. Jiang Active Noise Control Over Space: a Wave Domain Approach . . . . . . . . . . . . 774--786 Y. Luo and Z. Chen and N. Mesgarani Speaker-Independent Speech Separation With Deep Attractor Network . . . . . . 787--796 N. M. Joy and S. R. Kothinti and S. Umesh FMLLR Speaker Normalization With i-Vector: In Pseudo-FMLLR and Distillation Framework . . . . . . . . . 797--805 S. Chandna and W. Wang Bootstrap Averaging for Model-Based Source Separation in Reverberant Conditions . . . . . . . . . . . . . . . 806--819 Z. Tan and M. Mak and B. K. Mak and Y. Zhu Denoised Senone I-Vectors for Robust Speaker Verification . . . . . . . . . . 820--830 K. Itakura and Y. Bando and E. Nakamura and K. Itoyama and K. Yoshii and T. Kawahara Bayesian Multichannel Audio Source Separation Based on Integrated Source and Spatial Models . . . . . . . . . . . 
831--846 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 847--848 Anonymous IEEE Transactions on Multimedia information for authors . . . 849--850 Anonymous Open Access . . . . . . . . . . . . . . 851--851 Anonymous Introducing IEEE Collabratec . . . . . . 852--852 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Table of Contents . . . . . . . . . . . 853--854 Anonymous Table of Contents [Edics] . . . . . . . 855--856 Y. E. Baba and A. Walther and E. A. P. Habets $3$D Room Geometry Inference Based on Room Impulse Response Stacks . . . . . . 857--872 Q. Zhang and J. H. L. Hansen Language/Dialect Recognition Based on Unsupervised Deep Learning . . . . . . . 873--882 Z. Ling and Y. Ai and Y. Gu and L. Dai Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension . . . . . 883--894 M. Delcroix and K. Kinoshita and A. Ogawa and C. Huemmer and T. Nakatani Context Adaptive Neural Network Based Acoustic Models for Rapid Adaptation . . 895--908 L. T. T. Tran and S. E. Nordholm and H. Schepker and H. H. Dam and S. Doclo Two-Microphone Hearing Aids Using Prediction Error Method for Adaptive Feedback Control . . . . . . . . . . . . 909--923 J. Chang and M. Marschall Periphony-Lattice Mixed-Order Ambisonic Scheme for Spherical Microphone Arrays 924--936 N. Dionelis and M. Brookes Phase-Aware Single-Channel Speech Enhancement With Modulation-Domain Kalman Filtering . . . . . . . . . . . . 937--950 C. Zheng and A. Deleforge and X. Li and W. Kellermann Statistical Analysis of the Multichannel Wiener Filter Using a Bivariate Normal Distribution for Sample Covariance Matrices . . . . . . . . . . . . . . . . 951--966 C. Vaz and V. Ramanarayanan and S. Narayanan Acoustic Denoising Using Dictionary Learning With Spectral and Temporal Regularization . . . . . . . . . . . . . 967--980 L. Wang and A. Cavallaro Pseudo-Determined Blind Source Separation for Ad-hoc Microphone Networks . . . . . . . . . . . . . . . . 981--994 S. Cumani and P. Laface Scoring Heterogeneous Speaker Vectors Using Nonlinear Transformations and Tied PLDA Models . . . . . . . . . . . . . . 995--1009 G. Bernardi and T. van Waterschoot and J. Wouters and M. Moonen Subjective and Objective Sound-Quality Evaluation of Adaptive Feedback Cancellation Algorithms . . . . 
. . . . 1010--1024 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1025--1026 Anonymous IEEE Transactions on Multimedia information for authors . . . 1027--1028 Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 1025--1026 Anonymous Table of Contents [Edics] . . . . . . . 1027--1028 H. Kameoka and T. Higuchi and M. Tanaka and L. Li Nonnegative Matrix Factorization With Basis Clustering Using Cepstral Distance Regularization . . . . . . . . . . . . . 1029--1040 J. Donley and C. Ritz and W. B. Kleijn Multizone Soundfield Reproduction With Privacy- and Quality-Based Speech Masking Filters . . . . . . . . . . . . 1041--1055 S. Braun and A. Kuklasiński and O. Schwartz and O. Thiergart and E. A. P. Habets and S. Gannot and S. Doclo and J. Jensen Evaluation and Comparison of Late Reverberation Power Spectral Density Estimators . . . . . . . . . . . . . . . 1056--1071 E. L. Benaroya and N. Obin and M. Liuni and A. Roebel and W. Raumel and S. Argentieri Binaural Localization of Multiple Sound Sources by Non-Negative Tensor Factorization . . . . . . . . . . . . . 1072--1082 N. Perraudin and N. Holighaus and P. Majdak and P. Balazs Inpainting of Long Audio Segments With Similarity Graphs . . . . . . . . . . . 1083--1094 P. Magron and R. Badeau and B. David Model-Based STFT Phase Recovery for Audio Source Separation . . . . . . . . 1095--1105 I. Kodrasi and S. Doclo Analysis of Eigenvalue Decomposition-Based Late Reverberation Power Spectral Density Estimation . . . 1106--1118 S. Braun and E. A. P. Habets Linear Prediction-Based Online Dereverberation and Noise Reduction Using Alternating Kalman Filters . . . . 1119--1129 D. Ram and A. Asaei and H. Bourlard Sparse Subspace Modeling for Query by Example Spoken Term Detection . . . . . 1130--1143 M. Krawczyk-Becker and T. Gerkmann On Speech Enhancement Under PSD Uncertainty . . . . . . . . . . . . . . 1144--1153 S. Leglaive and R. Badeau and G. 
Richard Student's $t$-Source and Mixing Models for Multichannel Audio Source Separation 1154--1168 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1169--1170 Anonymous IEEE Transactions on Multimedia information for authors . . . 1171--1172 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 1173--1174 Anonymous Table of Contents [Edics] . . . . . . . 1175--1176 T. Yoshimura and K. Hashimoto and K. Oura and Y. Nankaku and K. Tokuda Mel-Cepstrum-Based Quantization Noise Shaping Applied to Neural-Network-Based Speech Waveform Synthesis . . . . . . . 1177--1184 Q. Wang and J. Du and L. Dai and C. Lee A Multiobjective Learning and Ensembling Approach to High-Performance Speech Enhancement With Compact Neural Network Architectures . . . . . . . . . . . . . 1185--1197 M. Á. Del-Agua and A. Giménez and A. Sanchis and J. Civera and A. Juan Speaker-Adapted Confidence Measures for ASR Using Deep Bidirectional Recurrent Neural Networks . . . . . . . . . . . . 1198--1206 J. Proença and C. Lopes and M. Tjalve and A. Stolcke and S. Candeias and F. Perdigão Mispronunciation Detection in Children's Reading of Sentences . . . . . . . . . . 1207--1219 Ljubiša Stanković and Miloš Brajović Analysis of the Reconstruction of Sparse Signals in the DCT Domain Applied to Audio Signals . . . . . . . . . . . . . 1220--1235 J. F. Santos and T. H. Falk Speech Dereverberation With Context-Aware Recurrent Neural Networks 1236--1246 M. Geronazzo and S. Spagnol and F. Avanzini Do We Need Individual Head-Related Transfer Functions for Vertical Localization? The Case Study of a Spectral Notch Distance Metric . . . . . 1247--1260 D. Marquardt and S. Doclo Interaural Coherence Preservation for Binaural Noise Reduction Using Partial Noise Estimation and Spectral Postfiltering . . . . . . . . . . . . . 1261--1274 M. Farmani and M. S. Pedersen and Z. Tan and J. Jensen Bias-Compensated Informed Sound Source Localization Using Relative Transfer Functions . . . . . . . . . . . . . . . 1275--1289 F. Tao and C. Busso Gating Neural Network for Large Vocabulary Audiovisual Speech Recognition . . . . . . . . . . . . . . 
1290--1302 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1303--1304 Anonymous IEEE Transactions on Multimedia information for authors . . . 1305--1306 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 1303--1304 Anonymous Table of Contents [Edics] . . . . . . . 1305--1306 Z. Rafii and A. Liutkus and F. Stöter and S. I. Mimilakis and D. FitzGerald and B. Pardo An Overview of Lead and Accompaniment Separation in Music . . . . . . . . . . 1307--1335 C. Wang and J. Wang and A. Santoso and C. Chiang and C. Wu Sound Event Recognition Using Auditory-Receptive-Field Binary Pattern and Hierarchical-Diving Deep Belief Network . . . . . . . . . . . . . . . . 1336--1351 L. Yang and M. Zhang and Y. Liu and M. Sun and N. Yu and G. Fu Joint POS Tagging and Dependence Parsing With Transition-Based Neural Networks 1352--1358 K. Yu and Z. Zhao and X. Wu and H. Lin and X. Liu Rich Short Text Conversation Using Semantic-Key-Controlled Sequence Generation . . . . . . . . . . . . . . . 1359--1368 B. Lehner and J. Schlüter and G. Widmer Online, Loudness-Invariant Vocal Detection in Mixed Music Signals . . . . 1369--1380 S. Stone and M. Marxen and P. Birkholz Construction and Evaluation of a Parametric One-Dimensional Vocal Tract Model . . . . . . . . . . . . . . . . . 1381--1392 T. Tan and Y. Qian and H. Hu and Y. Zhou and W. Ding and K. Yu Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition . . . . . . . . . . . . . . 1393--1405 X. Wang and S. Takaki and J. Yamagishi Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis 1406--1419 C. Valentini-Botinhao and J. Yamagishi Speech Enhancement of Noisy and Reverberant Speech for Text-to-Speech 1420--1433 A. I. Koutrouvelis and T. W. Sherson and R. Heusdens and R. C. Hendriks A Low-Cost Robust Distributed Linearly Constrained Beamformer for Wireless Acoustic Sensor Networks With Arbitrary Topology . . . . . . . . . . . . . . . . 
1434--1448 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1449--1450 Anonymous IEEE Transactions on Multimedia information for authors . . . 1451--1452 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 1453--1454 Anonymous Table of Contents [Edics] . . . . . . . 1455--1456 C. Wu and C. Dittmar and C. Southall and R. Vogl and G. Widmer and J. Hockman and M. Müller and A. Lerch A Review of Automatic Drum Transcription 1457--1483 C. Evers and P. A. Naylor Acoustic SLAM . . . . . . . . . . . . . 1484--1498 C. Laroche and M. Kowalski and H. Papadopoulos and G. Richard Hybrid Projective Nonnegative Matrix Factorization With Drum Dictionaries for Harmonic/Percussive Source Separation 1499--1511 J. J. Carabias-Orti and J. Nikunen and T. Virtanen and P. Vera-Candeas Multichannel Blind Sound Source Separation Using Spatial Covariance Model With Level and Time Differences and Nonnegative Matrix Factorization . . 1512--1527 M. Zhang and N. Yu and G. Fu A Simple and Effective Neural Model for Joint Word Segmentation and POS Tagging 1528--1538 D. Menzies and F. M. Fazi A Complex Panning Method for Near-Field Imaging . . . . . . . . . . . . . . . . 1539--1548 A. Misra and J. H. L. Hansen Maximum-Likelihood Linear Transformation for Unsupervised Domain Adaptation in Speaker Verification . . . . . . . . . . 1549--1558 Y. Wakabayashi and T. Fukumori and M. Nakayama and T. Nishiura and Y. Yamashita Single-Channel Speech Enhancement With Phase Reconstruction Based on Phase Distortion Averaging . . . . . . . . . . 1559--1569 S. Fu and T. Wang and Y. Tsao and X. Lu and H. Kawai End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks . . . . . 1570--1584 K. Xiao and S. Wang and M. Wan and L. Wu Radiated Noise Suppression for Electrolarynx Speech Based on Multiband Time-Domain Amplitude Modulation . . . . 1585--1593 A. Fahim and P. N. Samarasinghe and T. D. 
Abhayapala PSD Estimation and Source Separation in a Noisy Reverberant Environment Using a Spherical Microphone Array . . . . . . . 1594--1607 H. He and J. Chen and J. Benesty and T. Yang Noise Robust Frequency-Domain Adaptive Blind Multichannel Identification With $\ell_p$-Norm Constraint . . . . . . . . 1608--1619 W. Zhang and Z. Chen and F. Yin and Q. Zhang Melody Extraction From Polyphonic Music Using Particle Filter and Dynamic Programming . . . . . . . . . . . . . . 1620--1632 C. Zhang and K. Koishida and J. H. L. Hansen Text-Independent Speaker Verification Based on Triplet Convolutional Neural Network Embeddings . . . . . . . . . . . 1633--1644 A. R. MV and P. K. Ghosh PSFM A Probabilistic Source Filter Model for Noise Robust Glottal Closure Instant Detection . . . . . . . . . . . . . . . 1645--1657 M. Airaksinen and L. Juvela and B. Bollepalli and J. Yamagishi and P. Alku A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis . . . . . . 1658--1670 G. Mahé and M. Jaïdane Perceptually Controlled Reshaping of Sound Histograms . . . . . . . . . . . . 1671--1683 Q. Huang and L. Zhang and Y. Fang Two-Step Spherical Harmonics ESPRIT-Type Algorithms and Performance Analysis . . 1684--1697 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1698--1699 Anonymous IEEE Transactions on Multimedia information for authors . . . 1700--1702 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 1698--1699 Anonymous Table of Contents [Edics] . . . . . . . 1700--1701 D. Wang and J. Chen Supervised Speech Separation Based on Deep Learning: an Overview . . . . . . . 1702--1726 R. Wang and M. Utiyama and A. Finch and L. Liu and K. Chen and E. Sumita Sentence Selection and Weighting for Neural Machine Translation Domain Adaptation . . . . . . . . . . . . . . . 1727--1741 F. U. Khan and B. P. Milner and T. Le Cornu Using Visual Speech Information in Masking Methods for Audio Speaker Separation . . . . . . . . . . . . . . . 1742--1754 X. Li and S. Gannot and L. Girin and R. Horaud Multichannel Identification and Nonnegative Equalization for Dereverberation and Noise Reduction Based on Convolutive Transfer Function 1755--1768 Lütfi Kerem Şenel and İhsan Utlu and Veysel Yücesoy and Aykut Koç and Tolga Çukur Semantic Structure and Interpretability of Word Embeddings . . . . . . . . . . . 1769--1779 Y. Koizumi and K. Niwa and Y. Hioka and K. Kobayashi and Y. Haneda DNN-Based Source Enhancement to Increase Objective Sound Quality Assessment Score 1780--1792 C. Paleologu and J. Benesty and S. Ciochină Linear System Identification Based on a Kronecker Product Decomposition . . . . 1793--1808 F. Xiong and S. Goetze and B. Kollmeier and B. T. Meyer Exploring Auditory-Inspired Acoustic Features for Room Acoustic Parameter Estimation From Monaural Speech . . . . 1809--1820 G. Le Lan and D. Charlet and A. Larcher and S. Meignier An Adaptive Method for Cross-Recording Speaker Diarization . . . . . . . . . . 1821--1832 W. Xue and A. H. Moore and M. Brookes and P. A. Naylor Modulation-Domain Multichannel Kalman Filtering for Speech Enhancement . . . . 1833--1847 K. Wu and V. G. Reju and A. W. H. Khong Multisource DOA Estimation in a Reverberant Environment Using a Single Acoustic Vector Sensor . . . . . . . . 
. 1848--1859 J. Huang and Y. Sun and W. Zhang and H. Wang and T. Liu Entity Highlight Generation as Statistical and Neural Machine Translation . . . . . . . . . . . . . . 1860--1872 Q. T. Do and S. Sakti and S. Nakamura Sequence-to-Sequence Models for Emphasis Speech Translation . . . . . . . . . . . 1873--1883 F. Fontana and E. Bozzo Explicit Fixed-Point Computation of Nonlinear Delay-Free Loop Filter Networks . . . . . . . . . . . . . . . . 1884--1896 S. Widmark Causal IIR Audio Precompensator Filters Subject to Quadratic Constraints . . . . 1897--1912 F. Winter and H. Wierstorf and C. Hold and F. Krüger and A. Raake and S. Spors Colouration in Local Wave Field Synthesis . . . . . . . . . . . . . . . 1913--1924 A. H. Andersen and J. M. de Haan and Z. Tan and J. Jensen Nonintrusive Speech Intelligibility Prediction Using Convolutional Neural Networks . . . . . . . . . . . . . . . . 1925--1939 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 1940--1941 Anonymous IEEE Transactions on Multimedia information for authors . . . 1942--1944 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 1945--1946 Anonymous Table of Contents [Edics] . . . . . . . 1947--1948 H. Hadian and H. Sameti and D. Povey and S. Khudanpur Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR . . . . 1949--1961 F. Katzberg and R. Mazur and M. Maass and P. Koch and A. Mertins A Compressed Sensing Framework for Dynamic Sound-Field Measurements . . . . 1962--1975 H. Sundar and T. V. Sreenivas and C. S. Seelamantula TDOA-Based Multiple Acoustic Source Localization Without Association Ambiguity . . . . . . . . . . . . . . . 1976--1990 R. Sahraeian and D. Van Compernolle Cross-Entropy Training of DNN Ensemble Acoustic Models for Low-Resource ASR . . 1991--2001 H. Dinkel and Y. Qian and K. Yu Investigating Raw Wave Deep Neural Networks for End-to-End Speaker Spoofing Detection . . . . . . . . . . . . . . . 2002--2014 J. Zhang and R. Heusdens and R. C. Hendriks Rate-Distributed Spatial Filtering Based Noise Reduction in Wireless Acoustic Sensor Networks . . . . . . . . . . . . 2015--2026 M. Heck and S. Sakti and S. Nakamura Dirichlet Process Mixture of Mixtures Model for Unsupervised Subword Modeling 2027--2042 S. Nie and S. Liang and W. Liu and X. Zhang and J. Tao Deep Learning Based Speech Separation via NMF-Style Reconstructions . . . . . 2043--2055 H. Dubey and A. Sangwan and J. H. L. Hansen Leveraging Frequency-Dependent Kernel and DIP-Based Clustering for Robust Speech Activity Detection in Naturalistic Audio Streams . . . . . . . 2056--2071 Y. Jang and J. Ham and B. Lee and K. Kim Cross-Language Neural Dialog State Tracker for Large Ontologies Using Hierarchical Attention . . . . . . . . . 2072--2082 G. Weisz and P. Budzianowski and P. Su and M. Gašić Sample Efficient Deep Reinforcement Learning for Dialogue Systems With Large Action Spaces . . . . . . . . . . . . . 2083--2097 S. 
Lin Reverberation-Robust Localization of Speakers Using Distinct Speech Onsets and Multichannel Cross Correlations . . 2098--2111 S. Abidin and R. Togneri and F. Sohel Spectrotemporal Analysis Using Local Binary Pattern Variants for Acoustic Scene Classification . . . . . . . . . . 2112--2121 N. Ma and J. A. Gonzalez and G. J. Brown Robust Binaural Localization of a Target Sound Source by Combining Spectral Source Models and Deep Neural Networks 2122--2131 S. Wu and D. Zhang and Z. Zhang and N. Yang and M. Li and M. Zhou Dependency-to-Dependency Neural Machine Translation . . . . . . . . . . . . . . 2132--2141 J. Xu and H. He and X. Sun and X. Ren and S. Li Cross-Domain and Semisupervised Named Entity Recognition in Chinese Social Media: a Unified Model . . . . . . . . . 2142--2152 S. Van Kuyk and W. B. Kleijn and R. C. Hendriks An Evaluation of Intrusive Instrumental Intelligibility Metrics . . . . . . . . 2153--2166 X. Ouyang and K. Gu and P. Zhou Spatial Pyramid Pooling Mechanism in 3D Convolutional Network for Sentence-Level Classification . . . . . . . . . . . . . 2167--2179 B. McFee and J. Salamon and J. P. Bello Adaptive Pooling Operators for Weakly Labeled Sound Event Detection . . . . . 2180--2193 I. Barbancho and G. Tzanetakis and A. M. Barbancho and L. J. Tardón Discrimination Between Ascending/Descending Pitch Arpeggios . . 2194--2203 Y. Kim and M. Kim and J. Goo and H. Kim Learning Self-Informed Feature Contribution for Deep Learning-Based Acoustic Modeling . . . . . . . . . . . 2204--2214 M. B. Çöteli and O. Olgun and H. Hacıhabiboğlu Multiple Sound Source Localization With Steered Response Power Density and Hierarchical Grid Refinement . . . . . . 2215--2229 J. Bao and Y. Gong and N. Duan and M. Zhou and T. Zhao Question Generation With Doubly Adversarial Nets . . . . . . . . . . . . 2230--2239 B. Bu and C. Bao and M. Jia Design of a Planar First-Order Loudspeaker Array for Global Active Noise Control . . . . . . . . . . . . . 
2240--2250 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 2251--2252 Anonymous IEEE Transactions on Multimedia information for authors . . . 2253--2255 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Front Cover . . . . . . . . . . . . . . C1--C1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 2251--2252 Anonymous Table of Contents [Edics] . . . . . . . 2253--2254 X. Wang and Z. Tu and M. Zhang Incorporating Statistical Machine Translation Word Knowledge Into Neural Machine Translation . . . . . . . . . . 2255--2266 Y. Zhao and M. Kuruvilla-Dugdale and M. Song Structured Sparse Spectral Transforms and Structural Measures for Voice Conversion . . . . . . . . . . . . . . . 2267--2276 H. Salehi and D. Suelzle and P. Folkeard and V. Parsa Learning-Based Reference-Free Speech Quality Measures for Hearing Aid Applications . . . . . . . . . . . . . . 2277--2288 G. Enzner and P. Thüne Bayesian MMSE Filtering of Noisy Speech by SNR Marginalization With Global PSD Priors . . . . . . . . . . . . . . . . . 2289--2304 G. Huang and J. Chen and J. Benesty Insights Into Frequency-Invariant Beamforming With Concentric Circular Microphone Arrays . . . . . . . . . . . 2305--2318 Ayana and S. Shen and Y. Chen and C. Yang and Z. Liu and M. Sun Zero-Shot Cross-Lingual Neural Headline Generation . . . . . . . . . . . . . . . 2319--2327 S. Surendran and T. K. Kumar Oblique Projection and Cepstral Subtraction in Signal Subspace Speech Enhancement for Colored Noise Reduction 2328--2340 Q. Li and D. F. Wong and L. S. Chao and M. Zhu and T. Xiao and J. Zhu and M. Zhang Linguistic Knowledge-Aware Neural Machine Translation . . . . . . . . . . 2341--2354 W. Zhang and C. Hofmann and M. Buerger and T. D. Abhayapala and W. Kellermann Spatial Noise-Field Control With Online Secondary Path Modeling: a Wave-Domain Approach . . . . . . . . . . . . . . . . 2355--2370 A. Meynard and B. Torrésani Spectral Analysis for Nonstationary Audio . . . . . . . . . . . . . . . . . 2371--2380 I. Martín-Morató and M. Cobos and F. J. Ferri Adaptive Mid-Term Representations for Robust Audio Event Classification . . . 2381--2392 G. Firtha and P. 
Fiala and F. Schultz and S. Spors On the General Relation of Wave Field Synthesis and Spectral Division Method for Linear Arrays . . . . . . . . . . . 2393--2403 P. Birkholz and S. Stone and K. Wolf and D. Plettemeier Non-Invasive Silent Phoneme Recognition Using Microwave Signals . . . . . . . . 2404--2411 W. Lin and M. Mak and J. Chien Multisource I-Vectors Domain Adaptation Using Maximum Mean Discrepancy Based Autoencoders . . . . . . . . . . . . . . 2412--2422 M. Abdelwahab and C. Busso Domain Adversarial for Acoustic Emotion Recognition . . . . . . . . . . . . . . 2423--2435 D. El Badawy and I. Dokmanić Direction of Arrival With One Microphone, a Few LEGOs, and Non-Negative Matrix Factorization . . . 2436--2446 H. Lee and P. Chung and Y. Wu and T. Lin and T. Wen Interactive Spoken Content Retrieval by Deep Reinforcement Learning . . . . . . 2447--2459 S. Elshamy and N. Madhu and W. Tirry and T. Fingscheidt DNN-Supported Speech Enhancement With Cepstral Estimation of Both Excitation and Envelope . . . . . . . . . . . . . . 2460--2474 Y. Bao and H. Chen A Chance-Constrained Programming Approach to the Design of Robust Broadband Beamformers With Microphone Mismatches . . . . . . . . . . . . . . . 2475--2488 Anonymous Farewell Editorial . . . . . . . . . . . 2489--2489 Anonymous List of Reviewers . . . . . . . . . . . 2490--2496 Anonymous IEEE/ACM Transactions on Audio, Speech, and Language Processing Edics . . . . . . . . . . . . . . . . . 2497--2498 Anonymous IEEE Transactions on Multimedia information for authors . . . 2499--2501 Anonymous IEEE Open Access Publishing . . . . . . 2502--2502 Anonymous 2018 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 26 . . . . . . 2503--2528 Anonymous IEEE Signal Processing Society . . . . . C3--C3 Anonymous Blank page . . . . . . . . . . . . . . . C4--C4
Anonymous Table of contents . . . . . . . . . . . C1--1 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents [Edics] . . . . . . . 2--3 Anonymous [Blank page] . . . . . . . . . . . . . . B4--B4 Anonymous Inaugural Editorial Innovations in an Era of Ubiquitous Audio, Speech, and Language Processing . . . . . . . . . . 5--6 F. Bao and W. H. Abdulla A New Ratio Mask Representation for CASA-Based Speech Enhancement . . . . . 7--19 P. Magron and T. Virtanen Complex ISNMF: a Phase-Aware Model for Monaural Audio Source Separation . . . . 20--31 T. T. H. Duong and N. Q. K. Duong and P. C. Nguyen and C. Q. Nguyen Gaussian Modeling-Based Multichannel Audio Source Separation Exploiting Generic Source Spectral Model . . . . . 32--43 G. Zhang and J. Tao and X. Qiu and I. Burnett Decentralized Two-Channel Active Noise Control for Single Frequency by Shaping Matrix Eigenvalues . . . . . . . . . . . 44--52 Y. Zhao and Z. Wang and D. Wang Two-Stage Deep Learning for Noisy-Reverberant Speech Enhancement . . 53--62 N. Zheng and X. Zhang Phase-Aware Speech Enhancement Based on Deep Neural Networks . . . . . . . . . . 63--76 T. Moriya and T. Tanaka and T. Shinozaki and S. Watanabe and K. Duh Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition . . . . . . . . . . . 77--88 H. Kamper and G. Shakhnarovich and K. Livescu Semantic Speech Retrieval With a Visually Grounded Model of Untranscribed Speech . . . . . . . . . . . . . . . . . 89--98 M. S. Kavalekalam and J. K. Nielsen and J. B. Boldt and M. G. Christensen Model-Based Speech Enhancement for Intelligibility Improvement in Binaural Hearing Aids . . . . . . . . . . . . . . 99--113 A. R. MV and P. K. Ghosh Glottal Inverse Filtering Using Probabilistic Weighted Linear Prediction 114--124 Y. Sun and W. Wang and J. Chambers and S. M. Naqvi Two-Stage Monaural Source Separation in Reverberant Room Environments Using Deep Neural Networks . . . . . . . . . . . . 
125--139 L. Ferrer and M. K. Nandwana and M. McLaren and D. Castan and A. Lawson Toward Fail-Safe Speaker Recognition: Trial-Based Calibration With a Reject Option . . . . . . . . . . . . . . . . . 140--153 J. Amini and R. C. Hendriks and R. Heusdens and M. Guo and J. Jensen Asymmetric Coding for Rate-Constrained Noise Reduction in Binaural Hearing Aids 154--167 J. Yu and J. Jiang and R. Xia Global Inference for Aspect and Opinion Terms Co-Extraction Based on Multi-Task Neural Networks . . . . . . . . . . . . 168--177 Z. Wang and X. Zhang and D. Wang Robust Speaker Localization Guided by Deep Learning-Based Time-Frequency Masking . . . . . . . . . . . . . . . . 178--188 K. Tan and J. Chen and D. Wang Gated Residual Networks With Dilated Convolutions for Monaural Speech Enhancement . . . . . . . . . . . . . . 189--198 G. H. Ngo and M. Nguyen and N. F. Chen Phonology-Augmented Statistical Framework for Machine Transliteration Using Limited Linguistic Resources . . . 199--211 Y. Koizumi and S. Saito and H. Uematsu and Y. Kawachi and N. Harada Unsupervised Detection of Anomalous Sound Based on Deep Learning and the Neyman--Pearson Lemma . . . . . . . . . 212--224 Y. Laufer and S. Gannot A Bayesian Hierarchical Model for Speech Enhancement With Time-Varying Audio Channel . . . . . . . . . . . . . . . . 225--239 Anonymous Erratum for Nonlinear Audio Systems Identification Through Audio Input Gaussianization . . . . . . . . . . . . 240--240 Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--241 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents[Edics] . . . . . . . . 242--243 T. Nakashika and S. Takaki and J. Yamagishi Complex-Valued Restricted Boltzmann Machine for Speaker-Dependent Speech Parameterization From Complex Spectra 244--254 F. Xiong and S. Goetze and B. Kollmeier and B. T. Meyer Joint Estimation of Reverberation Time and Early-To-Late Reverberation Ratio From Single-Channel Speech Signals . . . 255--267 F. Stöter and S. Chakrabarty and B. Edler and E. A. P. Habets CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning . . . . . . . . . . . . . . . . 268--282 M. Kolbæk and Z. Tan and J. Jensen On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement . . . . . . . . . . . 283--295 M. W. Hansen and J. R. Jensen and M. G. Christensen Estimation of Fundamental Frequencies in Stereophonic Music Mixtures . . . . . . 296--310 J. Bao and D. Tang and N. Duan and Z. Yan and M. Zhou and T. Zhao Text Generation From Tables . . . . . . 311--320 A. I. Koutrouvelis and R. C. Hendriks and R. Heusdens and J. Jensen A Convex Approximation of the Relaxed Binaural Beamforming Optimization Problem . . . . . . . . . . . . . . . . 321--331 T. Hashimoto and D. Saito and N. Minematsu Many-to-Many and Completely Parallel-Data-Free Voice Conversion Based on Eigenspace DNN . . . . . . . . 332--341 F. Pishdadian and B. Pardo Multi-Resolution Common Fate Transform 342--354 Y. Wu and W. Li Automatic Audio Chord Recognition With MIDI-Trained Deep Feature and BLSTM-CRF Sequence Decoding Model . . . . . . . . 355--366 K. Imoto and N. Ono Acoustic Topic Model for Scene Analysis With Intermittently Missing Observations 367--382 K. Xiao and S. Wang and M. Wan and L. Wu Reconstruction of Mandarin Electrolaryngeal Fricatives With Hybrid Noise Source . . . . . . . . . . . . . . 
383--391 L. Krishnan and T. Betlehem and P. D. Teal Fast Algorithms for Acoustic Impulse Response Shaping . . . . . . . . . . . . 392--403 V. Zakeri and A. J. Hodgson Automatic Identification of Hard and Soft Bone Tissues by Analyzing Drilling Sounds . . . . . . . . . . . . . . . . . 404--414 S. Bilbao and B. Hamilton Directional Sources in Wave-Based Acoustic Simulation . . . . . . . . . . 415--428 Y. Zhang and B. Pardo and Z. Duan Siamese Style Convolutional Neural Networks for Sound Search by Vocal Imitation . . . . . . . . . . . . . . . 429--441 F. Feng and M. Kowalski Underdetermined Reverberant Blind Source Separation: Sparse Approaches for Multiplicative and Convolutive Narrowband Approximation . . . . . . . . 442--456 Z. Wang and D. Wang Combining Spectral and Spatial Features for Deep Learning Based Blind Speaker Separation . . . . . . . . . . . . . . . 457--468 Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--469 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents[Edics] . . . . . . . . 470--471 M. Z. Jahromi and A. Zahedi and J. Jensen and J. Østergaard Information Loss in the Human Auditory System . . . . . . . . . . . . . . . . . 472--481 Y. Buchris and A. Amar and J. Benesty and I. Cohen Incoherent Synthesis of Sparse Arrays for Frequency-Invariant Beamforming . . 482--495 Y. Rahulamathavan and K. R. Sutharsini and I. G. Ray and R. Lu and M. Rajarajan Privacy-Preserving $i$Vector-Based Speaker Verification . . . . . . . . . . 496--506 J. Zhang and Y. Zhao and H. Li and C. Zong Attention With Sparsity Regularization for Neural Machine Translation and Summarization . . . . . . . . . . . . . 507--518 A. H. Moore and W. Xue and P. A. Naylor and M. Brookes Noise Covariance Matrix Estimation for Rotating Microphone Arrays . . . . . . . 519--530 G. Yang and H. He and Q. Chen Emotion-Semantic-Enhanced Neural Network 531--543 T. Dietzen and A. Spriet and W. Tirry and S. Doclo and M. Moonen and T. van Waterschoot Comparative Analysis of Generalized Sidelobe Cancellation and Multi-Channel Linear Prediction for Speech Dereverberation and Noise Reduction . . 544--558 J. Gao and J. Du and E. Chen Mixed-Bandwidth Cross-Channel Speech Recognition via Joint Optimization of DNN-Based Bandwidth Expansion and Acoustic Modeling . . . . . . . . . . . 559--571 S. Deena and M. Hasan and M. Doulaty and O. Saz and T. Hain Recurrent Neural Network Language Model Adaptation for Multi-Genre Broadcast Speech Recognition and Alignment . . . . 572--582 F. B. Gelderblom and T. V. Tronstad and E. M. Viggen Subjective Evaluation of a Noise-Reduced Training Target for Deep Neural Network-Based Speech Enhancement . . . . 583--594 M. Luis Valero and E. A. P. Habets Low-Complexity Multi-Microphone Acoustic Echo Control in the Short-Time Fourier Transform Domain . . . . . . . . . . . . 595--609 Q. Zhu and P. 
Coleman and X. Qiu and M. Wu and J. Yang and I. Burnett Robust Personal Audio Geometry Optimization in the SVD-Based Modal Domain . . . . . . . . . . . . . . . . . 610--620 J. Yi and J. Tao and Z. Wen and Y. Bai Language-Adversarial Transfer Learning for Low-Resource Speech Recognition . . 621--630 J. Zhang and Z. Ling and L. Liu and Y. Jiang and L. Dai Sequence-to-Sequence Acoustic Modeling for Voice Conversion . . . . . . . . . . 631--644 X. Li and L. Girin and S. Gannot and R. Horaud Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function . . . . . . . . . . . 645--659 Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--660 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents[Edics] . . . . . . . . 661--662 Z. Zhao and H. Liu and T. Fingscheidt Convolutional Neural Networks to Enhance Coded Speech . . . . . . . . . . . . . . 663--678 H. Schepker and S. E. Nordholm and L. T. T. Tran and S. Doclo Null-Steering Beamformer-Based Feedback Cancellation for Multi-Microphone Hearing Aids With Incoming Signal Preservation . . . . . . . . . . . . . . 679--691 Z. Li and Y. Song and L. Dai and I. McLoughlin Listening and Grouping: an Online Autoregressive Approach for Monaural Speech Separation . . . . . . . . . . . 692--703 D. Deng and L. Jing and J. Yu and S. Sun and M. K. Ng Sentiment Lexicon Construction With Hierarchical Supervision Topic Model . . 704--718 M. Zhou and M. Huang and X. Zhu Story Ending Selection by Finding Hints From Pairwise Candidate Endings . . . . 719--729 J. Richter and J. Fels On the Influence of Continuous Subject Rotation During High-Resolution Head-Related Transfer Function Measurements . . . . . . . . . . . . . . 730--741 J. Yu and K. Markov and T. Matsui Articulatory and Spectrum Information Fusion Based on Deep Recurrent Neural Networks . . . . . . . . . . . . . . . . 742--752 F. P. Itturriet and M. H. Costa Perceptually Relevant Preservation of Interaural Time Differences in Binaural Hearing Aids . . . . . . . . . . . . . . 753--764 J. Abel and T. Fingscheidt Sinusoidal-Based Lowband Synthesis for Artificial Speech Bandwidth Extension 765--776 Q. Kong and Y. Xu and I. Sobieraj and W. Wang and M. D. Plumbley Sound Event Detection and Time Frequency Segmentation from Weakly Labelled Data 777--787 Y. Tuan and H. Lee Improving Conditional Sequence Generative Adversarial Networks by Stepwise Evaluation . . . . . . . . . . 788--798 N. Dionelis and M. Brookes Modulation-Domain Kalman Filtering for Monaural Blind Speech Denoising and Dereverberation . . . . . . . . . . . . 
799--814 R. Lotfian and C. Busso Curriculum Learning for Speech Emotion Recognition From Crowdsourced Labels . . 815--826 S. Lin Robust Pitch Estimation and Tracking For Speakers Based on Subband Encoding and The Generalized Labeled Multi-Bernoulli Filter . . . . . . . . . . . . . . . . . 827--841 X. Wang and I. Cohen and J. Chen and J. Benesty On Robust and High Directive Beamforming With Small-Spacing Microphone Arrays for Scattered Sources . . . . . . . . . . . 842--852 Z. Quan and Z. Wang and Y. Le and B. Yao and K. Li and J. Yin An Efficient Framework for Sentence Similarity Modeling . . . . . . . . . . 853--865 N. Lubis and S. Sakti and K. Yoshino and S. Nakamura Positive Emotion Elicitation in Chat-Based Dialogue Systems . . . . . . 866--877 Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--878 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 879--880 F. J. Ibarrola and R. D. Spies and L. E. D. Persia Switching Divergences for Spectral Learning in Blind Speech Dereverberation 881--891 I. Cohen and J. Benesty and J. Chen Differential Kronecker Product Beamforming . . . . . . . . . . . . . . 892--902 C. Elisei-Iliescu and C. Paleologu and J. Benesty and C. Stanciu and C. Anghel and S. Ciochină Recursive Least-Squares Algorithms for the Identification of Low-Rank Systems 903--918 A. Kumar and T. Guha and P. K. Ghosh Dirichlet Latent Variable Model: a Dynamic Model Based on Dirichlet Prior for Audio Processing . . . . . . . . . . 919--931 P. Jancovic and M. Köküer Bird Species Recognition Using Unsupervised Modeling of Individual Vocalization Elements . . . . . . . . . 932--947 T. Koriyama and T. Kobayashi Statistical Parametric Speech Synthesis Using Deep Gaussian Processes . . . . . 948--959 K. Shimada and Y. Bando and M. Mimura and K. Itoyama and K. Yoshii and T. Kawahara Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition . . . . . . . . . . . . . . 960--971 S. Widmark Causal MSE-Optimal Filters for Personal Audio Subject to Constrained Contrast 972--987 Anonymous Article Awards for the IEEE/ACM Transactions on Audio, Speech, and Language Processing 988--988 Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of contents . . . . . . . . . . . C1--989 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of contents (EDICS) . . . . . . . 990--991 A. Mesaros and A. Diment and B. Elizalde and T. Heittola and E. Vincent and B. Raj and T. Virtanen Sound Event Detection in the DCASE 2017 Challenge . . . . . . . . . . . . . . . 992--1006 S. R. Chetupalli and T. V. Sreenivas Late Reverberation Cancellation Using Bayesian Estimation of Multi-Channel Linear Predictors and Student's $t$-Source Prior . . . . . . . . . . . . 1007--1018 L. Juvela and B. Bollepalli and V. Tsiaras and P. Alku GlotNet --- a Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis . . . . . . 1019--1030 F. Winter and F. Schultz and G. Firtha and S. Spors A Geometric Model for Prediction of Spatial Aliasing in $ 2.5 $D Sound Field Synthesis . . . . . . . . . . . . . . . 1031--1046 Y. Liu and T. Lee and T. Law and K. Y. Lee Acoustical Assessment of Voice Disorder With Continuous Speech Using ASR Posterior Features . . . . . . . . . . . 1047--1059 C. Pörschmann and J. M. Arend and F. Brinkmann Directional Equalization of Sparse Head-Related Transfer Function Sets for Spatial Upsampling . . . . . . . . . . . 1060--1071 S. S. Payal and V. J. Mathews and D. J. Button and A. Iyer and R. H. Lambert and J. Hutchings and L. A. Azpicueta-Ruiz Equalization of Nonlinear Propagation Distortion in Cylindrical Waveguides . . 1072--1084 B. Sisman and M. Zhang and H. Li Group Sparse Representation With WaveNet Vocoder Adaptation for Spectrum and Prosody Conversion . . . . . . . . . . . 1085--1097 J. Lee and H. Kang A Joint Learning Algorithm for Complex-Valued T--F Masks in Deep Learning-Based Single-Channel Speech Enhancement Systems . . . . . . . . . . 1098--1108 Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--1109 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents[Edics] . . . . . . . . 1110--1111 J. Fleßner and T. Biberger and S. D. Ewert Subjective and Objective Assessment of Monaural and Binaural Aspects of Audio Quality . . . . . . . . . . . . . . . . 1112--1125 B. Yusuf and B. Gundogdu and M. Saraclar Low Resource Keyword Search With Synthesized Crosslingual Exemplars . . . 1126--1135 A. I. Koutrouvelis and R. C. Hendriks and R. Heusdens and J. Jensen Robust Joint Estimation of Multimicrophone Signal Model Parameters 1136--1150 B. Cauchi and K. Siedenburg and J. F. Santos and T. H. Falk and S. Doclo and S. Goetze Non-Intrusive Speech Quality Prediction Using Modulation Energies and LSTM-Network . . . . . . . . . . . . . . 1151--1163 Y. Zhang and P. Zhang and Y. Yan Tailoring an Interpretable Neural Language Model . . . . . . . . . . . . . 1164--1178 A. Pandey and D. Wang A New Framework for CNN-Based Speech Enhancement in the Time Domain . . . . . 1179--1188 C. M. Vikram and N. Adiga and S. R. M. Prasanna Detection of Nasalized Voiced Stops in Cleft Palate Speech Using Epoch-Synchronous Features . . . . . . . 1189--1200 H. Luo and T. Li and B. Liu and B. Wang and H. Unger Improving Aspect Term Extraction With Bidirectional Dependency Tree Representation . . . . . . . . . . . . . 1201--1212 Anonymous IEEE Signal Processing Society . . . . . C3--C3
Anonymous Table of Contents . . . . . . . . . . . C1--1213 Anonymous IEEE Signal Processing Society . . . . . C2--C2 Anonymous Table of Contents . . . . . . . . . . . 1214--1215 T. Zhang and J. Wu Constrained Learned Feature Extraction for Acoustic Scene Classification . . . 1216--1228 L. Gabrielli and S. Tomassetti and S. Squartini and C. Zinato and S. Guaiana A Multi-Stage Algorithm for Acoustic Physical Model Parameters Estimation . . 1229--1240 B. Yang and H. Liu and C. Pang and X. Li Multiple Sound Source Counting and Localization Based on TF-Wise Spatial Spectrum Clustering . . . . . . . . . . 1241--1255 Y. Luo and N. Mesgarani Conv-TasNet: Surpassing Ideal Time Frequency Magnitude Masking for Speech Separation . . . . . . . . . . . . . . . 1256--1266 A. K. Sarkar and Z. Tan and H. Tang and S. Shon and J. Glass Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification . . . . . . . . . . 1267--1279 J. Chua and W. B. Kleijn A Low Latency Approach for Blind Source Separation . . . . . . . . . . . . . . . 1280--1294 C. Pan and J. Chen and J. Benesty and G. Shi On the Design of Target Beampatterns for Differential Microphone Arrays . . . . . 1295--1307 A. M. Azmi and M. N. Almutery and H. A. Aboalsamh Real-Word Errors in Arabic Texts: a Better Algorithm for Detection and Correction . . . . . . . . . . . . . . . 1308--1320 M. Korpusik and J. Glass Deep Learning for Database Mapping and Asking Clarification Questions in Dialogue Systems . . . . . . . . . . . . 1321--1334 J. Pak and J. W. Shin Sound Localization Based on Phase Difference Enhancement Using Deep Neural Networks . . . . . . . . . . . . . . . . 1335--1345 Anonymous IEEE Signal Processing Society . . . . . C3--C3
R. Ali and G. Bernardi and T. van Waterschoot and M. Moonen Methods of Extending a Generalized Sidelobe Canceller With External Microphones . . . . . . . . . . . . . . 1349--1364 X. Li and L. Girin and S. Gannot and R. Horaud Multichannel Online Dereverberation Based on Spectral Magnitude Inverse Filtering . . . . . . . . . . . . . . . 1365--1377 L. Chen and Z. Chen and B. Tan and S. Long and M. Gašić and K. Yu AgentGraph: Toward Universal Dialogue Management With Structured Deep Reinforcement Learning . . . . . . . . . 1378--1391 L. Li and J. Wang and J. Li and Q. Ma and J. Wei Relation Classification via Keyword-Attentive Sentence Mechanism and Synthetic Stimulation Loss . . . . . . . 1392--1404 M. B. Møller and J. K. Nielsen and E. Fernandez-Grande and S. K. Olesen On the Influence of Transfer Function Noise on Sound Zone Control in a Room 1405--1418 Z. Xu and C. Sun and Y. Long and B. Liu and B. Wang and M. Wang and M. Zhang and X. Wang Dynamic Working Memory for Context-Aware Response Generation . . . . . . . . . . 1419--1431 H. Kameoka and T. Kaneko and K. Tanaka and N. Hojo ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder . . . . . . . . . . . . . . 1432--1443 X. Chen and X. Liu and Y. Wang and A. Ragni and J. H. M. Wong and M. J. F. Gales Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition . . . . . . . . . . . 1444--1454 R. Wang and Z. Chen and F. Yin DOA-Based Three-Dimensional Node Geometry Calibration in Acoustic Sensor Networks and Its Cramér--Rao Bound and Sensitivity Analysis . . . . . . . . . . 1455--1468 C. Lee and H. Lee and S. Wu and C. Liu and W. Fang and J. Hsu and B. Tseng Machine Comprehension of Spoken Content: TOEFL Listening Test and Spoken SQuAD 1469--1480 Y. Chen and S. Huang and H. Lee and Y. Wang and C. Shen Audio Word2vec: Sequence-to-Sequence Autoencoding for Unsupervised Learning of Audio Segmentation and Representation 1481--1493
P. Li and C. Chen and W. Zheng and Y. Deng and F. Ye and Z. Zheng STD: an Automatic Evaluation Metric for Machine Translation Based on Word Embeddings . . . . . . . . . . . . . . . 1497--1506 J. Zhang and R. Heusdens and R. C. Hendriks Relative Acoustic Transfer Function Estimation in Wireless Acoustic Sensor Networks . . . . . . . . . . . . . . . . 1507--1519 J. Park and J. Chang State-Space Microphone Array Nonlinear Acoustic Echo Cancellation Using Multi-Microphone Near-End Speech Covariance . . . . . . . . . . . . . . . 1520--1534 Z. Luo and J. Chen and T. Takiguchi and Y. Ariki Emotional Voice Conversion Using Dual Supervised Adversarial Networks With Continuous Wavelet Transform F0 Features 1535--1548 H. As'ad and M. Bouchard and H. Kamkar-Parsi A Robust Target Linearly Constrained Minimum Variance Beamformer With Spatial Cues Preservation for Binaural Hearing Aids . . . . . . . . . . . . . . . . . . 1549--1563 Y. Wang and Y. Xia and L. Zhao and J. Bian and T. Qin and E. Chen and T. Liu Semi-Supervised Neural Machine Translation via Marginal Distribution Estimation . . . . . . . . . . . . . . . 1564--1576 A. Jati and P. Georgiou Neural Predictive Coding Using Convolutional Neural Networks Toward Unsupervised Learning of Speaker Characteristics . . . . . . . . . . . . 1577--1589 F. Fontana and E. Bozzo Newton--Raphson Solution of Nonlinear Delay-Free Loop Filter Networks . . . . 1590--1600 N. Makishima and S. Mogami and N. Takamune and D. Kitamura and H. Sumino and S. Takamichi and H. Saruwatari and N. Ono Independent Deeply Learned Matrix Analysis for Determined Audio Source Separation . . . . . . . . . . . . . . . 1601--1615 J. J. Prakash and H. A. Murthy Analysis of Inter-Pausal Units in Indian Languages and Its Application to Text-to-Speech Synthesis . . . . . . . . 1616--1628 Y. Lan and S. Wang and J. Jiang Knowledge Base Question Answering With a Matching-Aggregation Model and Question-Specific Contextual Relations 1629--1638 X. Bai and H. Cao and K. 
Chen and T. Zhao A Bilingual Adversarial Autoencoder for Unsupervised Bilingual Lexicon Induction 1639--1648 G. Zhao and R. Gutierrez-Osuna Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion 1649--1660
Z. Zhang and H. Zhao and K. Ling and J. Li and Z. Li and S. He and G. Fu Effective Subword Segmentation for Text Comprehension . . . . . . . . . . . . . 1664--1674 Y. Xie and R. Liang and Z. Liang and C. Huang and C. Zou and B. Schuller Speech Emotion Classification Using Attention-Based LSTM . . . . . . . . . . 1675--1685 S. Wang and Z. Huang and Y. Qian and K. Yu Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification . . . . . . . . . . 1686--1696 R. Lu and Z. Duan and C. Zhang Audio Visual Deep Clustering for Speech Separation . . . . . . . . . . . . . . . 1697--1712
N. Ueno and S. Koyama and H. Saruwatari Three-Dimensional Sound Field Reproduction Based on Weighted Mode-Matching Method . . . . . . . . . . 1852--1867 L. Wu and X. Tan and T. Qin and J. Lai and T. Liu Beyond Error Propagation: Language Branching Also Affects the Accuracy of Sequence Generation . . . . . . . . . . 1868--1879 A. Das and J. Li and G. Ye and R. Zhao and Y. Gong Advancing Acoustic-to-Word CTC Model With Attention and Mixed-Units . . . . . 1880--1892 N. Antonello and E. De Sena and M. Moonen and P. A. Naylor and T. van Waterschoot Joint Acoustic Localization and Dereverberation Through Plane Wave Decomposition and Sparse Regularization 1893--1905 F. Borra and A. Bernardini and F. Antonacci and A. Sarti Uniform Linear Arrays of First-Order Steerable Differential Microphones . . . 1906--1918 L. Chai and J. Du and Q. Liu and C. Lee Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep Learning-Based Speech Enhancement 1919--1931 J. Qi and J. Du and S. M. Siniscalchi and C. Lee A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement . . . . . . . . . . . 1932--1943 X. Dang and Q. Cheng and H. Zhu Indoor Multiple Sound Source Localization via Multi-Dimensional Assignment Data Association . . . . . . 1944--1956 M. Schneider and E. A. P. Habets Iterative DFT-Domain Inverse Filter Optimization Using a Weighted Least-Squares Criterion . . . . . . . . 1957--1969 K. Chen and R. Wang and M. Utiyama and E. Sumita and T. Zhao Neural Machine Translation With Sentence-Level Topic Context . . . . . . 1970--1984 A. Gomez-Alanis and A. M. Peinado and J. A. Gonzalez and A. M. Gomez A Gated Recurrent Convolutional Neural Network for Robust Spoofing Detection 1985--1999 S. Feng and T. Lee Exploiting Cross-Lingual Speaker and Phonetic Diversity for Unsupervised Subword Modeling . . . . . . . . . . . . 2000--2011 W. Li and N. F. Chen and S. M. Siniscalchi and C. 
Lee Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models . . . . . . 2012--2024 Q. Tu and H. Chen On Mainlobe Orientation of the First- and Second-Order Differential Microphone Arrays . . . . . . . . . . . . . . . . . 2025--2040 J. Chorowski and R. J. Weiss and S. Bengio and A. van den Oord Unsupervised Speech Representation Learning Using WaveNet Autoencoders . . 2041--2053 V. Varanasi and A. Agarwal and R. M. Hegde Near-Field Acoustic Source Localization Using Spherical Harmonic Features . . . 2054--2066 Y. Zheng and J. Tao and Z. Wen and J. Yi Forward Backward Decoding Sequence for Regularizing End-to-End TTS . . . . . . 2067--2079 Y. Tu and J. Du and C. Lee Speech Enhancement Based on Teacher Student Deep Learning Using Improved Speech Presence Probability for Noise-Robust Speech Recognition . . . . 2080--2091 Y. Liu and D. Wang Divide and Conquer: A Deep CASA Approach to Talker-Independent Monaural Speaker Separation . . . . . . . . . . . . . . . 2092--2102 X. Liu and D. F. Wong and L. S. Chao and Y. Liu Latent Attribute Based Hierarchical Decoder for Neural Machine Translation 2103--2112 J. Hu and N. Chen Enhanced Feature Summarizing for Effective Cover Song Identification . . 2113--2126 Q. Ma and L. Yu and S. Tian and E. Chen and W. W. Y. Ng Global-Local Mutual Attention Model for Text Classification . . . . . . . . . . 2127--2139 V. Välimäki and J. Rämö Neurally Controlled Graphic Equalizer 2140--2149 S. U. N. Wood and J. K. W. Stahl and P. Mowlaee Binaural Codebook-Based Speech Enhancement With Atomic Speech Presence Probability . . . . . . . . . . . . . . 2150--2161 L. Pfeifenberger and M. Zöhrer and F. Pernkopf Eigenvector-Based Speech Mask Estimation for Multi-Channel Speech Enhancement . . 2162--2172 M. Arnela and S. Dabbaghchian and O. Guasch and O. Engwall MRI-Based Vocal Tract Representations for the Three-Dimensional Finite Element Synthesis of Diphthongs . . . 
. . . . . 2173--2182 K. Sekiguchi and Y. Bando and A. A. Nugraha and K. Yoshii and T. Kawahara Semi-Supervised Multichannel Speech Enhancement With a Deep Speech Prior . . 2197--2212 Q. Guo and X. Qiu and X. Xue and Z. Zhang Low-Rank and Locality Constrained Self-Attention for Sequence Modeling . . 2213--2222 J. Yu and Q. Ling and C. Luo and C. W. Chen Synthesizing $3$D Trump: Predicting and Visualizing the Relationship Between Text, Speech, and Articulatory Movements 2223--2233 R. Sugiura and Y. Kamamoto and T. Moriya Shape Control of Discrete Generalized Gaussian Distributions for Frequency-Domain Audio Coding . . . . . 2234--2248 Z. Ben-Hur and D. L. Alon and R. Mehra and B. Rafaely Efficient Representation and Sparse Sampling of Head-Related Transfer Functions Using Phase-Correction Based on Ear Alignment . . . . . . . . . . . . 2249--2262 L. Remaggi and P. J. B. Jackson and W. Wang Modeling the Comb Filter Effect and Interaural Coherence for Binaural Source Separation . . . . . . . . . . . . . . . 2263--2277 B. Zhang and D. Xiong and J. Su and J. Luo Future-Aware Knowledge Distillation for Neural Machine Translation . . . . . . . 2278--2287 R. Ali and T. Van Waterschoot and M. Moonen Integration of a Priori and Estimated Constraints Into an MVDR Beamformer for Speech Enhancement . . . . . . . . . . . 2288--2300 N. Tiwari and P. C. Pandey Speech Enhancement Using Noise Estimation With Dynamic Quantile Tracking . . . . . . . . . . . . . . . . 2301--2312 J. Duan and X. Ding and Y. Zhang and T. Liu TEND: A Target-Dependent Representation Learning Framework for News Document . . 2313--2325 L. Zhao and X. Qiu and Q. Zhang and X. Huang Sequence Labeling With Deep Gated Dual Path CNN . . . . . . . . . . . . . . . . 2326--2335 A. Kato and T. H. Kinnunen Statistical Regression Models for Noise Robust F0 Estimation Using Recurrent Deep Neural Networks . . . . . . . . . . 2336--2349 D. Liu and J. Fu and Q. Qu and J. 
Lv BFGAN: Backward and Forward Generative Adversarial Networks for Lexically Constrained Sentence Generation . . . . 2350--2361 A. Marafioti and N. Perraudin and N. Holighaus and P. Majdak A Context Encoder For Audio Inpainting 2362--2372 J. Yang and R. K. Das and N. Zhou Extraction of Octave Spectra Information for Spoofing Attack Detection . . . . . 2373--2384
Jamal Amini and Richard Christian Hendriks and Richard Heusdens and Meng Guo and Jesper Jensen Rate-Constrained Noise Reduction in Wireless Acoustic Sensor Networks . . . 1--12 Chitralekha Gupta and Haizhou Li and Ye Wang Automatic Leaderboard: Evaluation of Singing Quality Without a Standard Reference . . . . . . . . . . . . . . . 13--26 Sefik Emre Eskimez and Ross K. Maddox and Chenliang Xu and Zhiyao Duan Noise-Resilient Training Method for Face Landmark Generation From Speech . . . . 27--38 Peidong Wang and Ke Tan and De Liang Wang Bridging the Gap Between Monaural Speech Enhancement and Recognition With Distortion-Independent Acoustic Modeling 39--48 Yuki Mitsufuji and Stefan Uhlich and Norihiro Takamune and Daichi Kitamura and Shoichi Koyama and Hiroshi Saruwatari Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain 49--60 Yaron Laufer and Sharon Gannot Scoring-Based ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in a Spatially Homogeneous Noise Field . . 61--76 Naveen Kumar Desiraju and Simon Doclo and Markus Buck and Tobias Wolff Online Estimation of Reverberation Parameters For Late Residual Echo Suppression . . . . . . . . . . . . . . 77--91 Mehdi Zohourian and Rainer Martin Binaural Direct-to-Reverberant Energy Ratio and Speaker Distance Estimation 92--104 Youhyun Shin and Sang-goo Lee Learning Context Using Segment-Level LSTM for Neural Sequence Labeling . . . 105--115 Gongping Huang and Jingdong Chen and Jacob Benesty Design of Planar Differential Microphone Arrays With Fractional Orders . . . . . 116--130 Ming-Hsiang Su and Chung-Hsien Wu and Liang-Yu Chen Attention-Based Response Generation Using Parallel Double Q-Learning for Dialog Policy Decision in a Conversational System . . . . . . . . . 131--143 Satoru Emura Wave-Domain Residual Echo Reduction Using Subspace Tracking . . . . . . . . 
144--156 Xin Wang and Shinji Takaki and Junichi Yamagishi and Simon King and Keiichi Tokuda A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural $ F_0 $ Model for Statistical Parametric Speech Synthesis . . . . . . 157--170 Falk-Martin Hoffmann and Philip Arthur Nelson and Filippo Maria Fazi DOA Estimation Performance With Circular Arrays in Sound Fields With Finite Rate of Innovation . . . . . . . . . . . . . 171--184 Rongfeng Su and Xunying Liu and Lan Wang and Jingzhou Yang Cross-Domain Deep Visual Feature Generation for Mandarin Audio--Visual Speech Recognition . . . . . . . . . . . 185--197 Titouan Parcollet and Mohamed Morchid and Xavier Bost and Georges Linarès and Renato De Mori Real to H-Space Autoencoders for Theme Identification in Telephone Conversations . . . . . . . . . . . . . 198--210 Antonio Canclini and Fabio Antonacci and Stefano Tubaro and Augusto Sarti A Methodology for the Robust Estimation of the Radiation Pattern of Acoustic Sources . . . . . . . . . . . . . . . . 211--224 Yi Yu and Hongsen He and Badong Chen and Jianghui Li and Youwen Zhang and Lu Lu $M$-Estimate Based Normalized Subband Adaptive Filter Algorithm: Performance Analysis and Improvements . . . . . . . 225--239 Hao-Xiang Wen and Sen-Quan Yang and Yuan-Quan Hong and Huan Luo A Partial Update Adaptive Algorithm for Sparse System Identification . . . . . . 240--255 Martin Bo Møller and Jan Østergaard A Moving Horizon Framework for Sound Zones . . . . . . . . . . . . . . . . . 256--265 Stylianos Ioannis Mimilakis and Konstantinos Drossos and Estefanía Cano and Gerald Schuller Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation . . . . . . . . . . . . . . . 266--278 Lachlan I. Birnie and Thushara D. Abhayapala and Prasanga N. Samarasinghe Reflection Assisted Sound Source Localization Through a Harmonic Domain MUSIC Framework . . . . . . . . . . . . 
279--293 Wenhao Ding and Liang He Adaptive Multi-Scale Detection of Acoustic Events . . . . . . . . . . . . 294--306 Weijian Zhang and Peng Song Transfer Sparse Discriminant Subspace Learning for Cross-Corpus Speech Emotion Recognition . . . . . . . . . . . . . . 307--318 Bidisha Sharma and Ye Wang Automatic Evaluation of Song Intelligibility Using Singing Adapted STOI and Vocal-Specific Features . . . . 319--331 Hai Morgenstern and Boaz Rafaely Perceptually-Transparent Online Estimation of Two-Channel Room Transfer Function for Sound Calibration . . . . . 332--342 Shaojin Ding and Guanlong Zhao and Christopher Liberatore and Ricardo Gutierrez-Osuna Learning Structured Sparse Representations for Voice Conversion . . 343--354 Mireia Diez and Lukáš Burget and Federico Landini and Jan Černocký Analysis of Speaker Diarization Based on Bayesian HMM With Eigenvoice Priors . . 355--368 Jia-Chen Gu and Zhen-Hua Ling and Quan Liu Utterance-to-Utterance Interactive Matching Network for Multi-Turn Response Selection in Retrieval-Based Chatbots 369--379 Ke Tan and DeLiang Wang Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement . . . . 380--390 Richeng Duan and Tatsuya Kawahara and Masatake Dantsuji and Hiroaki Nanjo Cross-Lingual Transfer Learning of Non-Native Acoustic Modeling for Pronunciation Error Detection and Diagnosis . . . . . . . . . . . . . . . 391--401 Xin Wang and Shinji Takaki and Junichi Yamagishi Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis 402--415 Sanjeel Parekh and Slim Essid and Alexey Ozerov and Ngoc Q. K. Duong and Patrick Pérez and Gaël Richard Weakly Supervised Representation Learning for Audio-Visual Scene Analysis 416--428 Jianfei Yu and Jing Jiang and Rui Xia Entity-Sensitive Attention and Fusion Network for Entity-Level Multimodal Sentiment Classification . . . . . . . . 429--439 John G. Beerends and Niels M. P. Neumann and Egon L. 
van den Broek and Anna Llagostera Casanovas and Jovana Torres Menendez and Christian Schmidmer and Jens Berger Subjective and Objective Assessment of Full Bandwidth Speech Quality . . . . . 440--449 Vikram C. Mathad and S. R. Mahadeva Prasanna Vowel Onset Point Based Screening of Misarticulated Stops in Cleft Lip and Palate Speech . . . . . . . . . . . . . 450--460 Minh Nguyen and Gia H. Ngo and Nancy F. Chen Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin Using Recursive Neural Networks . . . . . . . . . . . . 461--473 Dani Cherkassky and Sharon Gannot Successive Relative Transfer Function Identification Using Blind Oblique Projection . . . . . . . . . . . . . . . 474--486 Ivo Trowitzsch and Christopher Schymura and Dorothea Kolossa and Klaus Obermayer Joining Sound Event Detection and Localization Through Spatial Segregation 487--502 Shinichi Mogami and Norihiro Takamune and Daichi Kitamura and Hiroshi Saruwatari and Yu Takahashi and Kazunobu Kondo and Nobutaka Ono Independent Low-Rank Matrix Analysis Based on Time-Variant Sub-Gaussian Source Model for Determined Blind Source Separation . . . . . . . . . . . . . . . 503--518 Hamzeh Ghasemzadeh and Meisam K. Arjmandi Toward Optimum Quantification of Pathology-Induced Noises: an Investigation of Information Missed by Human Auditory System . . . . . . . . . 519--528 Fei Ma and Wen Zhang and Thushara Dheemantha Abhayapala Active Control of Outgoing Broadband Noise Fields in Rooms . . . . . . . . . 529--539 Jing-Xuan Zhang and Zhen-Hua Ling and Li-Rong Dai Non-Parallel Sequence-to-Sequence Voice Conversion With Disentangled Linguistic and Speaker Representations . . . . . . 540--552 Tao Dai and Li Zhu and Yaxiong Wang and Kathleen M. 
Carley Attentive Stacked Denoising Autoencoder With Bi-LSTM for Personalized Context-Aware Citation Recommendation 553--568 Yuta Nishimura and Katsuhito Sudoh and Graham Neubig and Satoshi Nakamura Multi-Source Neural Machine Translation With Missing Data . . . . . . . . . . . 569--580 Jin Wang and Liang-Chih Yu and K. Robert Lai and Xuejie Zhang Tree-Structured Regional CNN-LSTM Model for Dimensional Sentiment Analysis . . . 581--591 Abul Azad and Lamine Mili Robust Speech Filter and Voice Encoder Parameter Estimation Using the Phase--Phase Correlator . . . . . . . . 592--604 Abdullah Fahim and Prasanga N. Samarasinghe and Thushara D. Abhayapala Multi-Source DOA Estimation Through Pattern Recognition of the Modal Coherence of a Reverberant Soundfield 605--618 Yaron Laufer and Bracha Laufer-Goldshtein and Sharon Gannot ML Estimation and CRBs for Reverberation, Speech, and Noise PSDs in Rank-Deficient Noise Field . . . . . . . 619--634 Zhongqing Wang and Qingying Sun and Shoushan Li and Qiaoming Zhu and Guodong Zhou Neural Stance Detection With Hierarchical Linguistic Representations 635--645 Ruizhi Li and Xiaofei Wang and Sri Harish Mallidi and Shinji Watanabe and Takaaki Hori and Hynek Hermansky Multi-Stream End-to-End Speech Recognition . . . . . . . . . . . . . . 646--655 Yu Maeno and Yuki Mitsufuji and Prasanga N. Samarasinghe and Naoki Murata and Thushara D. Abhayapala Spherical-Harmonic-Domain Feedforward Active Noise Control Using Sparse Decomposition of Reference Signals from Distributed Sensor Arrays . . . . . . . 656--670 Qingyu Zhou and Nan Yang and Furu Wei and Shaohan Huang and Ming Zhou and Tiejun Zhao A Joint Sentence Scoring and Selection Framework for Neural Extractive Document Summarization . . . . . . . . . . . . . 
671--681 Ivan Kukanov and Trung Ngo Trong and Ville Hautamäki and Sabato Marco Siniscalchi and Valerio Mario Salerno and Kong Aik Lee Maximal Figure-of-Merit Framework to Detect Multi-Label Phonetic Features for Spoken Language Recognition . . . . . . 682--695 Shoichi Koyama and Gilles Chardon and Laurent Daudet Optimizing Source and Sensor Placement for Sound Field Control: an Overview . . 696--714 Atsushi Ando and Ryo Masumura and Hosana Kamiyama and Satoshi Kobashikawa and Yushi Aono and Tomoki Toda Customer Satisfaction Estimation in Contact Center Calls Based on a Hierarchical Multi-Task Model . . . . . 715--728 Thomas Dietzen and Simon Doclo and Marc Moonen and Toon van Waterschoot Integrated Sidelobe Cancellation and Linear Prediction Kalman Filter for Joint Multi-Microphone Speech Dereverberation, Interfering Speech Cancellation, and Noise Reduction . . . 740--754 Thomas Dietzen and Simon Doclo and Marc Moonen and Toon van Waterschoot Square Root-Based Multi-Source Early PSD Estimation and Recursive RETF Update in Reverberant Environments by Means of the Orthogonal Procrustes Problem . . . . . 755--769 Liwen Zhang and Ziqiang Shi and Jiqing Han Pyramidal Temporal Pooling With Discriminative Mapping for Audio Classification . . . . . . . . . . . . . 770--784 Mengfan Zhang and Zhongshu Ge and Tiejun Liu and Xihong Wu and Tianshu Qu Modeling of Individual HRTFs Based on Spatial Principal Component Analysis . . 785--797
Bijue Jia and Jiancheng Lv and Xi Peng and Yao Chen and Shenglan Yang Hierarchical Regulated Iterative Network for Joint Task of Music Detection and Music Relative Loudness Estimation . . . 1--13 Nauman Dawalatabad and Srikanth Madikeri and C. Chandra Sekhar and Hema A. Murthy Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings . . . . . . . . 14--27 Midia Yousefi and John H. L. Hansen Block-Based High Performance CNN Architectures for Frame-Level Overlapping Speech Detection . . . . . . 28--40 Jiaming Cheng and Ruiyu Liang and Zhenlin Liang and Li Zhao and Chengwei Huang and Björn Schuller A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy . . . . . . . . . . . . 41--53 Franz Anders and Mario Hlawitschka and Mirco Fuchs Comparison of Artificial Neural Network Types for Infant Vocalization Classification . . . . . . . . . . . . . 54--67 Tomohiko Nakamura and Hirokazu Kameoka Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Separation of Harmonic Sounds . . . . . . . . . . . . 68--82 Jens Ahrens and Stefan Bilbao Computation of Spherical Harmonic Representations of Source Directivity Based on the Finite-Distance Signature 83--92 Shun-Po Chuang and Alexander H. Liu and Tzu-Wei Sung and Hung-yi Lee Improving Automatic Speech Recognition and Speech Translation via Word Embedding Prediction . . . . . . . . . . 93--105 Li Chai and Jun Du and Qing-Feng Liu and Chin-Hui Lee A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement . . . . . . . . . . . 106--117 De Hu and Zhe Chen and Fuliang Yin Passive Geometry Calibration for Microphone Arrays Based on Distributed Damped Newton Optimization . . . . . . . 
118--131 Berrak Sisman and Junichi Yamagishi and Simon King and Haizhou Li An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning . . . . . . . . . . . . . 132--157 Jilu Jin and Gongping Huang and Xuehan Wang and Jingdong Chen and Jacob Benesty and Israel Cohen Steering Study of Linear Differential Microphone Arrays . . . . . . . . . . . 158--170 Ching-Hua Lee and Bhaskar D. Rao and Harinath Garudadri Proportionate Adaptive Filtering Algorithms Derived Using an Iterative Reweighting Framework . . . . . . . . . 171--186 Shakeel Ahmed and Muhammad Tufail and Muhammad Rehan and Tanveer Abbas and Amna Majid A Novel Approach for Improved Noise Reduction Performance in Feed-Forward Active Noise Control Systems With (Loudspeaker) Saturation Non-Linearity in the Secondary Path . . . . . . . . . 187--197 Cunhang Fan and Jiangyan Yi and Jianhua Tao and Zhengkun Tian and Bin Liu and Zhengqi Wen Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition . . . . . . . . . . . 198--209 Amin Edraki and Wai-Yip Chan and Jesper Jensen and Daniel Fogerty Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis . . 210--225 Phan Le Son On the Design of Sparse Arrays With Frequency-Invariant Beam Pattern . . . . 226--238 Dylan Menzies and Philip Coleman and Filippo Maria Fazi A Room Compensation Method by Modification of Reverberant Audio Objects . . . . . . . . . . . . . . . . 239--252 Yonggang Hu and Thushara D. Abhayapala and Prasanga N. Samarasinghe Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC . . . . . . . . . . 253--264 Alan Kan and Qinglin Meng The Temporal Limits Encoder as a Sound Coding Strategy for Bilateral Cochlear Implants . . . . . . . . . . . . . . . . 
265--273 Rui Liu and Berrak Sisman and Feilong Bao and Jichen Yang and Guanglai Gao and Haizhou Li Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis . . . . . . . . . . . . . . . 274--285 Fei Ma and Thushara D. Abhayapala and Wen Zhang Multiple Circular Arrays of Vector Sensors for Real-Time Sound Field Analysis . . . . . . . . . . . . . . . . 286--299 David Diaz-Guerra and Antonio Miguel and Jose R. Beltran Robust Sound Source Tracking Using SRP-PHAT and $3$D Convolutional Neural Networks . . . . . . . . . . . . . . . . 300--311 Viet Anh Trinh and Michael Mandel Directly Comparing the Listening Strategies of Humans and Machines . . . 312--323 Leda Sari and Mark Hasegawa-Johnson and Samuel Thomas Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection 324--333 Jielong Yang and Xionghu Zhong and Weiguang Chen and Wenwu Wang Multiple Acoustic Source Localization in Microphone Array Networks . . . . . . . 334--347 Bin Wu and Sakriani Sakti and Jinsong Zhang and Satoshi Nakamura Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load . . . . . . . 348--362 Taewoong Lee and Liming Shi and Jesper Kjær Nielsen and Mads Græsbøll Christensen Fast Generation of Sound Zones Using Variable Span Trade-Off Filters in the DFT-Domain . . . . . . . . . . . . . . . 363--378 Maoshen Jia and Yuxuan Wu and Changchun Bao and Christian Ritz Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points . . . . . . . . . . . . . . . . . 379--392 Wei Xue and Alastair H. Moore and Mike Brookes and Patrick A. Naylor Speech Enhancement Based on Modulation-Domain Parametric Multichannel Kalman Filtering . . . . . 393--405 Wei Song and Jingjin Guo and Ruiji Fu and Ting Liu and Lizhen Liu A Knowledge Graph Embedding Approach for Metaphor Processing . . . . . . . . . . 
406--420 Longbiao Cheng and Xingwei Sun and Dingding Yao and Junfeng Li and Yonghong Yan Estimation Reliability Function Assisted Sound Source Localization With Enhanced Steering Vector Phase Difference . . . . 421--435 Wangyang Yu and W. Bastiaan Kleijn Room Acoustical Parameter Estimation From Room Impulse Responses Using Deep Neural Networks . . . . . . . . . . . . 436--447 Miguel Ferrer and Maria de Diego and Gema Piñero and Alberto Gonzalez Affine Projection Algorithm Over Acoustic Sensor Networks for Active Noise Control . . . . . . . . . . . . . 448--461 Nico Gößling and Daniel Marquardt and Simon Doclo Performance Analysis of the Extended Binaural MVDR Beamformer With Partial Noise Estimation . . . . . . . . . . . . 462--476 Gábor Gosztolya and Róbert Busa-Fekete Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy . . . . . . . . 477--488 Alfred Mertins and Marco Maass and Fabrice Katzberg Room Impulse Response Reshaping and Crosstalk Cancellation Using Convex Optimization . . . . . . . . . . . . . . 489--502 Xuefeng Bai and Pengbo Liu and Yue Zhang Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network . . . . . . . . . . . . . 503--514 Bengt J. Borgström and Michael S. Brandstein Speech Enhancement via Attention Masking Network (SEAMNET): an End-to-End System for Joint Suppression of Noise and Reverberation . . . . . . . . . . . . . 515--526 Juan M. Miramont and Marcelo A. Colominas and Gastón Schlotthauer Voice Jitter Estimation Using High-Order Synchrosqueezing Operators . . . . . . . 527--536 Peidong Wang and Zhuo Chen and DeLiang Wang and Jinyu Li and Yifan Gong Speaker Separation Using Speaker Inventories and Estimated Speech . . . . 537--546 Sandro Cumani On the Distribution of Speaker Verification Scores: Generative Models for Unsupervised Calibration . . . . . . 
547--562 Yu-Ren Chien and Jón Guðnason Acoustic Measure of Vocal Strain Based on Glottal Airflow Periodicity . . . . . 563--574 Xingfa Shen and Xingkun Shao and Quanbo Ge and Lili Liu RARS: Recognition of Audio Recording Source Based on Residual Neural Network 575--584 Gang Chen and Yang Liu and Huanbo Luan and Meng Zhang and Qun Liu and Maosong Sun Learning to Generate Explainable Plots for Neural Story Generation . . . . . . 585--593 Wenxing Yang and Jacob Benesty and Gongping Huang and Jingdong Chen A New Class of Differential Beamformers 594--606 Yuki Mitsufuji and Norihiro Takamune and Shoichi Koyama and Hiroshi Saruwatari Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain . . . . . . . 607--617 Dörte Fischer and Simon Doclo Robust Constrained MFMVDR Filters for Single-Channel Speech Enhancement Based on Spherical Uncertainty Set . . . . . . 618--631 Xudong Zhao and Jacob Benesty and Jingdong Chen and Gongping Huang Differential Beamforming From the Beampattern Factorization Perspective 632--643 Yuki Kawara and Chenhui Chu and Yuki Arase Preordering Encoding on Transformer for Translation . . . . . . . . . . . . . . 644--655 Anonymous Table of Contents . . . . . . . . . . . c1--ix Anonymous IEEE Signal Processing Society . . . . . c2--c2 Anonymous Table of Contents . . . . . . . . . . . x--xx Hirokazu Kameoka and Wen-Chin Huang and Kou Tanaka and Takuhiro Kaneko and Nobukatsu Hojo and Tomoki Toda Many-to-Many Voice Transformer Network 656--670 Jie Zhang and Huawei Chen and Li-Rong Dai and Richard Christian Hendriks A Study on Reference Microphone Selection for Multi-Microphone Speech Enhancement . . . . . . . . . . . . . . 
671--683 Archontis Politis and Annamaria Mesaros and Sharath Adavanne and Toni Heittola and Tuomas Virtanen Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019 684--698 Markus Niermann and Peter Vary Listening Enhancement in Noisy Environments: Solutions in Time and Frequency Domain . . . . . . . . . . . . 699--709 Hyeonseung Lee and Woo Hyun Kang and Sung Jun Cheon and Hyeongju Kim and Nam Soo Kim Gated Recurrent Context: Softmax-Free Attention for Online Encoder-Decoder Speech Recognition . . . . . . . . . . . 710--719 Elizabeth Vargas and James R. Hopgood and Keith Brown and Kartic Subr On Improved Training of CNN for Acoustic Source Localisation . . . . . . . . . . 720--732 Yunqi Cai and Lantian Li and Andrew Abel and Xiaoyan Zhu and Dong Wang Deep Normalization for Speaker Vectors 733--744 Wen-Chin Huang and Tomoki Hayashi and Yi-Chiao Wu and Hirokazu Kameoka and Tomoki Toda Pretraining Techniques for Sequence-to-Sequence Voice Conversion 745--755 Arindam Jati and Amrutha Nadarajan and Raghuveer Peri and Karel Mundnich and Tiantian Feng and Benjamin Girault and Shrikanth Narayanan Temporal Dynamics of Workplace Acoustic Scenes: Egocentric Analysis and Prediction . . . . . . . . . . . . . . . 756--769 Chaoqun Duan and Kehai Chen and Rui Wang and Masao Utiyama and Eiichiro Sumita and Conghui Zhu and Tiejun Zhao Modeling Future Cost for Neural Machine Translation . . . . . . . . . . . . . . 770--781 Kashif Munir and Hai Zhao and Zuchao Li Adaptive Convolution for Semantic Role Labeling . . . . . . . . . . . . . . . . 782--791 Yi-Chiao Wu and Tomoki Hayashi and Takuma Okamoto and Hisashi Kawai and Tomoki Toda Quasi-Periodic Parallel WaveGAN: a Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network . . . 
792--806 Weitao Yuan and Bofei Dong and Shengbei Wang and Masashi Unoki and Wenwu Wang Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation 807--822 Liming Shi and Taewoong Lee and Lijun Zhang and Jesper Kjær Nielsen and Mads Græsbøll Christensen Generation of Personal Sound Zones With Physical Meaningful Constraints and Conjugate Gradient Method . . . . . . . 823--837 Xi Chen and Jacob Benesty and Gongping Huang and Jingdong Chen On the Robustness of the Superdirective Beamformer . . . . . . . . . . . . . . . 838--849 Xinsheng Wang and Tingting Qiao and Jihua Zhu and Alan Hanjalic and Odette Scharenborg Generating Images From Spoken Descriptions . . . . . . . . . . . . . . 850--865 Vevake Balaraman and Bernardo Magnini Domain-Aware Dialogue State Tracker for Multi-Domain Dialogue Systems . . . . . 866--873 Xixin Wu and Yuewen Cao and Hui Lu and Songxiang Liu and Shiyin Kang and Zhiyong Wu and Xunying Liu and Helen Meng Exemplar-Based Emotive Speech Synthesis 874--886 Heinrich Dinkel and Mengyue Wu and Kai Yu Towards Duration Robust Weakly Supervised Sound Event Detection . . . . 887--900 Zamir Ben-Hur and David Lou Alon and Ravish Mehra and Boaz Rafaely Binaural Reproduction Based on Bilateral Ambisonics and Ear-Aligned HRTFs . . . . 901--913 Philipp Aichinger and Franz Pernkopf Synthesis and Analysis-By-Synthesis of Modulated Diplophonic Glottal Area Waveforms . . . . . . . . . . . . . . . 914--926 Finnian Kelly and John H. L. Hansen Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition . . . . . . . . . . . . . . 927--942 Matthias Müller and Thilo Schulz and Tatiana Ermakova and Philipp P. Caffier Lyric or Dramatic --- Vibrato Analysis for Voice Type Classification in Professional Opera Singers . . . . . . . 943--955 Demóstenes Z. Rodríguez and Dick Carrillo and Miguel A. Ramírez and Pedro H. J. 
Nardelli and Sebastian Möller Incorporating Wireless Communication Parameters Into the E-Model Algorithm 956--968 Tianrui Zong and Yong Xiang and Iynkaran Natgunanathan and Longxiang Gao and Guang Hua and Wanlei Zhou Non-Linear-Echo Based Anti-Collusion Mechanism for Audio Signals . . . . . . 969--984 Zheng Lian and Bin Liu and Jianhua Tao CTNet: Conversational Transformer Network for Emotion Recognition . . . . 985--1000 Jiacheng Zhang and Huanbo Luan and Maosong Sun and Feifei Zhai and Jingfang Xu and Yang Liu Neural Machine Translation With Explicit Phrase Alignment . . . . . . . . . . . . 1001--1010 Maria Vukovic and Melissa Stolar and Margaret Lech Cognitive Load Estimation From Speech Commands to Simulated Aircraft . . . . . 1011--1022 De Hu and Zhe Chen and Fuliang Yin Geometry Calibration for Acoustic Transceiver Networks Based on Network Newton Distributed Optimization . . . . 1023--1032 Yuki Saito and Shinnosuke Takamichi and Hiroshi Saruwatari Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling . . . 1033--1048 Tadashi Sakata and Naomitsu Ikeda and Yuichi Ueda and Akira Watanabe Vocal Tract Length Estimation Using Accumulated Means of Formants and Its Effects on Speaker-Normalization . . . . 1049--1064 Jichen Yang and Hongji Wang and Rohan Kumar Das and Yanmin Qian Modified Magnitude-Phase Spectrum Information for Spoofing Detection . . . 1065--1078 Yanmin Qian and Zhengyang Chen and Shuai Wang Audio-Visual Deep Neural Network for Robust Person Verification . . . . . . . 1079--1092 Peiqin Lin and Meng Yang and Jianhuang Lai Deep Selective Memory Network With Selective Attention and Inter-Aspect Modeling for Aspect Level Sentiment Classification . . . . . . . . . . . . . 1093--1106 Herman Kamper and Yevgen Matusevych and Sharon Goldwater Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer . . . . . . . . . 
1107--1118 Weiqing Wang and Jin Pan and Hua Yi and Zhanmei Song and Ming Li Audio-Based Piano Performance Evaluation for Beginners With Convolutional Neural Network and Attention Mechanism . . . . 1119--1133 Yi-Chiao Wu and Tomoki Hayashi and Patrick Lumban Tobing and Kazuhiro Kobayashi and Tomoki Toda Quasi-Periodic WaveNet: an Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network . . . . . . . 1134--1148 Vesa Välimäki and Karolina Prawda Late-Reverberation Synthesis Using Interleaved Velvet-Noise Sequences . . . 1149--1160 Zhuosheng Zhang and Junlong Li and Hai Zhao Multi-Turn Dialogue Reading Comprehension With Pivot Turns and Knowledge . . . . . . . . . . . . . . . 1161--1173 Clément Gaultier and Srđan Kitić and Rémi Gribonval and Nancy Bertin Sparsity-Based Audio Declipping Methods: Selected Overview, New Algorithms, and Large-Scale Evaluation . . . . . . . . . 1174--1187 Lachlan Birnie and Thushara Abhayapala and Vladimir Tourbabin and Prasanga Samarasinghe Mixed Source Sound Field Translation for Virtual Binaural Application With Perceptual Validation . . . . . . . . . 1188--1203 Monisankha Pal and Manoj Kumar and Raghuveer Peri and Tae Jin Park and So Hyun Kim and Catherine Lord and Somer Bishop and Shrikanth Narayanan Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization . . . . 1204--1219 Jie Zhang and Jun Du and Li-Rong Dai Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers . . . . 1220--1232 Huang Xie and Tuomas Virtanen Zero-Shot Audio Classification Via Semantic Embeddings . . . . . . . . . . 1233--1242 Xianhong Chen and Changchun Bao Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification . . . . 1243--1255 Dongyuan Shi and Woon-Seng Gan and Bhan Lam and Shulin Wen and Xiaoyi Shen Optimal Output-Constrained Active Noise Control Based on Inverse Adaptive Modeling Leak Factor Estimate . . . . . 
1256--1269 Ashutosh Pandey and DeLiang Wang Dense CNN With Self-Attention for Time-Domain Speech Enhancement . . . . . 1270--1279 Libo Qin and Wanxiang Che and Minheng Ni and Yangming Li and Ting Liu Knowing Where to Leverage: Context-Aware Graph Convolutional Network With an Adaptive Fusion Layer for Contextual Spoken Language Understanding . . . . . 1280--1289 Mingyang Zhang and Yi Zhou and Li Zhao and Haizhou Li Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data . . . . . . . . . . . . . 1290--1302 Weipeng He and Petr Motlicek and Jean-Marc Odobez Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation . . . . 1303--1317 Yile Wang and Leyang Cui and Yue Zhang Improving Skip-Gram Embeddings Using BERT . . . . . . . . . . . . . . . . . . 1318--1328 Linzhi Wu and Meishan Zhang Deep Graph-Based Character-Level Chinese Dependency Parsing . . . . . . . . . . . 1329--1339 Ye Bai and Jiangyan Yi and Jianhua Tao and Zhengqi Wen and Zhengkun Tian and Shuai Zhang Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data . . . . . . . . . . . . . 1340--1351 Byung Joon Cho and Hyung-Min Park Convolutional Maximum-Likelihood Distortionless Response Beamforming With Steering Vector Estimation for Robust Speech Recognition . . . . . . . . . . . 1352--1367 Daniel Michelsanti and Zheng-Hua Tan and Shi-Xiong Zhang and Yong Xu and Meng Yu and Dong Yu and Jesper Jensen An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation . . . . . . . . . . . . . . . 1368--1396 Gal Itzhak and Jacob Benesty and Israel Cohen On the Design of Differential Kronecker Product Beamformers . . . . . . . . . . 1397--1410 Zhongshu Ge and Liang Li and Tianshu Qu Partially Matching Projection Decoding Method Evaluation Under Different Playback Conditions . . . . . . . . . . 
1411--1423 Sijie Mai and Songlong Xing and Haifeng Hu Analyzing Multimodal Sentiment Via Acoustic- and Visual-LSTM With Channel-Aware Temporal Convolution Network . . . . . . . . . . . . . . . . 1424--1437 Tao Qian and Meishan Zhang and Yinxia Lou and Daiwen Hua A Joint Model for Named Entity Recognition With Sentence-Level Entity Type Attentions . . . . . . . . . . . . 1438--1448 Ryotaro Sato and Kenta Niwa and Kazunori Kobayashi Ambisonic Signal Processing DNNs Guaranteeing Rotation, Scale and Time Translation Equivariance . . . . . . . . 1449--1462 Sooyeon Park and Jung-Woo Choi Iterative Echo Labeling Algorithm With Convex Hull Expansion for Room Geometry Estimation . . . . . . . . . . . . . . . 1463--1478 Aidan O. T. Hogg and Christine Evers and Alastair H. Moore and Patrick A. Naylor Overlapping Speaker Segmentation Using Multiple Hypothesis Tracking of Fundamental Frequency . . . . . . . . . 1479--1490 Rajib Sharma and Israel Cohen and Baruch Berdugo Controlling Elevation and Azimuth Beamwidths With Concentric Circular Microphone Arrays . . . . . . . . . . . 1491--1502 Run-Ze Wang and Zhen-Hua Ling and Jing-Bo Zhou and Yu Hu A Multiple-Integration Encoder for Multi-Turn Text-to-SQL Semantic Parsing 1503--1513 Shoukang Hu and Xurong Xie and Shansong Liu and Jianwei Yu and Zi Ye and Mengzhe Geng and Xunying Liu and Helen Meng Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition . . . . . . . . . . . . . . 1514--1529 Matteo Torcoli and Thorsten Kastner and Jürgen Herre Objective Measures of Perceptual Audio Quality Reviewed: an Evaluation of Their Application Domain Dependence . . . . . 1530--1541 Heinrich Dinkel and Shuai Wang and Xuenan Xu and Mengyue Wu and Kai Yu Voice Activity Detection in the Wild: a Data-Driven Approach Using Teacher-Student Training . . . . . . . . 
1542--1555 Songbin Li and Jingang Wang and Peng Liu and Miao Wei and Qiandong Yan Detection of Multiple Steganography Methods in Compressed Speech Based on Code Element Embedding, Bi-LSTM and CNN With Attention Mechanisms . . . . . . . 1556--1569 Qianli Ma and Jiangyue Yan and Zhenxi Lin and Liuhong Yu and Zipeng Chen Deformable Self-Attention for Text Classification . . . . . . . . . . . . . 1570--1581 Ya-Jie Zhang and Zhen-Hua Ling Extracting and Predicting Word-Level Style Variations for Speech Synthesis 1582--1593 Alexander Bohlender and Ann Spriet and Wouter Tirry and Nilesh Madhu Exploiting Temporal Context in CNN Based Multisource DOA Estimation . . . . . . . 1594--1608 Kohei Yatabe and Daichi Kitamura Determined BSS Based on Time-Frequency Masking and Its Application to Harmonic Vector Analysis . . . . . . . . . . . . 1609--1625 Ji Won Yoon and Hyeonseung Lee and Hyung Yong Kim and Won Ik Cho and Nam Soo Kim TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition . . . . . . . . . . . . . . 1626--1638 Prachi Singh and Sriram Ganapathy Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization . . . . . . . . . . 1639--1649 Penghui Wei and Jiahao Zhao and Wenji Mao A Graph-to-Sequence Learning Framework for Summarizing Opinionated Texts . . . 1650--1660 Dovid Y. Levin and Shmulik Markovich-Golan and Sharon Gannot Near-Field Superdirectivity: an Analytical Perspective . . . . . . . . . 1661--1674 Jia-Hao Hsu and Ming-Hsiang Su and Chung-Hsien Wu and Yi-Hsuan Chen Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations . . . . . . . . . . . . . 1675--1686 Tomohiko Nakamura and Shihori Kozuka and Hiroshi Saruwatari Time-Domain Audio Source Separation With Neural Networks Based on Multiresolution Analysis . . . . . . . . . . . . . . . . 
1687--1701 Yun Zhang and Yongguo Liu and Jiajing Zhu and Xindong Wu FSPRM: a Feature Subsequence Based Probability Representation Model for Chinese Word Embedding . . . . . . . . . 1702--1716 Songxiang Liu and Yuewen Cao and Disong Wang and Xixin Wu and Xunying Liu and Helen Meng Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling . . . . . . . . . . . . . . . . 1717--1728 Rafael A. Chiea and Márcio H. Costa and Júlio A. Cordioli An Optimal Envelope-Based Noise Reduction Method for Cochlear Implants: an Upper Bound Performance Investigation 1729--1739 Junliang Guo and Zhirui Zhang and Linli Xu and Boxing Chen and Enhong Chen Adaptive Adapters: an Efficient Way to Incorporate BERT Into Neural Machine Translation . . . . . . . . . . . . . . 1740--1751 Yi Luo and Cong Han and Nima Mesgarani Group Communication With Context Codec for Lightweight Source Separation . . . 1752--1761 Zhiwen Xie and Runjie Zhu and Jin Liu and Guangyou Zhou and Jimmy Xiangji Huang Hierarchical Neighbor Propagation With Bidirectional Graph Attention Network for Relation Prediction . . . . . . . . 1762--1773 Xuehan Wang and Jacob Benesty and Jingdong Chen and Gongping Huang and Israel Cohen Beamforming with Cube Microphone Arrays Via Kronecker Product Decompositions . . 1774--1784 Ke Tan and DeLiang Wang Towards Model Compression for Deep Learning Based Speech Enhancement . . . 1785--1794 Kristina Tesch and Timo Gerkmann Nonlinear Spatial Filtering in Multichannel Speech Enhancement . . . . 1795--1805 Rui Liu and Berrak Sisman and Guanglai Gao and Haizhou Li Expressive TTS Training With Frame and Style Reconstruction Loss . . . . . . . 1806--1818 Jipeng Qiang and Xinyu Lu and Yun Li and Yunhao Yuan and Xindong Wu Chinese Lexical Simplification . . . . . 1819--1828 Andong Li and Wenzhe Liu and Chengshi Zheng and Cunhang Fan and Xiaodong Li Two Heads are Better Than One: a Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement 1829--1843 Eric C. 
Hamdan and Filippo Maria Fazi Weighted Orthogonal Vector Rejection Method for Loudspeaker-Based Binaural Audio Reproduction . . . . . . . . . . . 1844--1852 Ke Tan and Xueliang Zhang and DeLiang Wang Deep Learning Based Real-Time Speech Enhancement for Dual-Microphone Mobile Phones . . . . . . . . . . . . . . . . . 1853--1863 Kunkun SongGong and Huawei Chen and Wenwu Wang Indoor Multi-Speaker Localization Based on Bayesian Nonparametrics in the Circular Harmonic Domain . . . . . . . . 1864--1880 Aleksej Chinaev and Philipp Thüne and Gerald Enzner Double-Cross-Correlation Processing for Blind Sampling-Rate and Time-Offset Estimation . . . . . . . . . . . . . . . 1881--1896 Ye Bai and Jiangyan Yi and Jianhua Tao and Zhengkun Tian and Zhengqi Wen and Shuai Zhang Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT . . . . . . . . . . . . . . . . . . 1897--1911 Öykü Deniz Köse and Murat Saraçlar Multimodal Representations for Synchronized Speech and Real-Time MRI Video Processing . . . . . . . . . . . . 1912--1924 N. P. Narendra and Björn Schuller and Paavo Alku The Detection of Parkinson's Disease From Speech Using Voice Source Information . . . . . . . . . . . . . . 1925--1936 Robert Rehr and Timo Gerkmann SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement . . . . . . . . . . . . . . 1937--1949 Nobutaka Ito and Rintaro Ikeshita and Hiroshi Sawada and Tomohiro Nakatani A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter . . . . . . . . . . . . . 1950--1965 Hao Fei and Shengqiong Wu and Yafeng Ren and Donghong Ji Second-Order Semantic Role Labeling With Global Structural Refinement . . . . . . 1966--1976 Humberto M. Torres and Mercedes Güemes and Jorge A. Gurlekian and Diego A. Evin F0 Perturbation Due to Articulatory Movements: Filtering, Characterization and Applications . . . . . . . . . . . 
. 1977--1986 Khaled Koutini and Hamid Eghbal-zadeh and Gerhard Widmer Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks . . . . . . . . . . . . . . . . 1987--2000 Zhong-Qiu Wang and Peidong Wang and DeLiang Wang Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation . . . . . . 2001--2014 Mengjia Zhou and Donghong Ji and Fei Li Relation Extraction in Dialogues: a Deep Learning Model Based on the Generality and Specialty of Dialogue Text . . . . . 2015--2026 Minh Nguyen and Gia H. Ngo and Nancy F. Chen Domain-Shift Conditioning Using Adaptable Filtering Via Hierarchical Embeddings for Robust Chinese Spell Check . . . . . . . . . . . . . . . . . 2027--2036 Lior Madmoni and Shir Tibor and Israel Nelken and Boaz Rafaely The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech . . . . 2037--2047 Haibin Chen and Qianli Ma and Liuhong Yu and Zhenxi Lin and Jiangyue Yan Corpus-Aware Graph Aggregation Network for Sequence Labeling . . . . . . . . . 2048--2057 Heming Wang and DeLiang Wang Towards Robust Speech Super-Resolution 2058--2066 Jianwei Yu and Shi-Xiong Zhang and Bo Wu and Shansong Liu and Shoukang Hu and Mengzhe Geng and Xunying Liu and Helen Meng and Dong Yu Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech . . 2067--2082 Olga Slizovskaia and Gloria Haro and Emilia Gómez Conditioned Source Separation for Musical Instrument Performances . . . . 2083--2095 Xurong Xie and Xunying Liu and Tan Lee and Lan Wang Bayesian Learning for Deep Neural Network Adaptation . . . . . . . . . . . 2096--2110 Sankha Subhra Bhattacharjee and Nithin V. George Nearest Kronecker Product Decomposition Based Linear-in-The-Parameters Nonlinear Filters . . . . . . . . . . . . . . . . 
2111--2122 Canguang Li and Guohua Wang and Jin Cao and Yi Cai A Multi-Agent Communication Based Model for Nested Named Entity Recognition . . 2123--2136 Jonah Ong and Ba Tuong Vo and Sven Nordholm Blind Separation for Multiple Moving Sources With Labeled Random Finite Sets 2137--2151 Yixuan Su and Yan Wang and Deng Cai and Simon Baker and Anna Korhonen and Nigel Collier PROTOTYPE-TO-STYLE: Dialogue Generation With Style-Aware Editing on Retrieval Memory . . . . . . . . . . . . . . . . . 2152--2161 Alberto Bernardini and Enrico Bozzo and Federico Fontana and Augusto Sarti A Wave Digital Newton--Raphson Method for Virtual Analog Modeling of Audio Circuits with Multiple One-Port Nonlinearities . . . . . . . . . . . . . 2162--2173 Gang Guo and Yi Yu and Rodrigo C. de Lamare and Zongsheng Zheng and Lu Lu and Qiangming Cai Proximal Normalized Subband Adaptive Filtering for Acoustic Echo Cancellation 2174--2188 Juho Liski and Aki Mäkivirta and Vesa Välimäki Audibility of Group-Delay Equalization 2189--2201 Farjana Sultana Mim and Naoya Inoue and Paul Reisert and Hiroki Ouchi and Kentaro Inui Corruption Is Not All Bad: Incorporating Discourse Structure Into Pre-Training via Corruption for Essay Scoring . . . . 2202--2215 Dror Kipnis and Roee Diamant Graph-Based Clustering of Dolphin Whistles . . . . . . . . . . . . . . . . 2216--2227 Yuanyuan Liu and Nelly Penttilä and Tiina Ihalainen and Juulia Lintula and Rachel Convey and Okko Räsänen Language-Independent Approach for Automatic Computation of Vowel Articulation Features in Dysarthric Speech Assessment . . . . . . . . . . . 2228--2243 C. Medina and R. Coelho and L. Zão Impulsive Noise Detection for Speech Enhancement in HHT Domain . . . . . . . 2244--2253 Iván López-Espejo and Zheng-Hua Tan and Jesper Jensen A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting . . . . . . . . . . . . . . . . 
2254--2266 Shansong Liu and Mengzhe Geng and Shoukang Hu and Xurong Xie and Mingyu Cui and Jianwei Yu and Xunying Liu and Helen Meng Recent Progress in the CUHK Dysarthric Speech Recognition System . . . . . . . 2267--2281 Juan Zhao and Tianrui Zong and Yong Xiang and Longxiang Gao and Wanlei Zhou and Gleb Beliakov Desynchronization Attacks Resilient Watermarking Method Based on Frequency Singular Value Coefficient Modification 2282--2295 Mert Burkay Çöteli and Hüseyin Hacıhabiboğlu Sparse Representations With Legendre Kernels for DOA Estimation and Acoustic Source Separation . . . . . . . . . . . 2296--2309 Nicolas Furnon and Romain Serizel and Slim Essid and Irina Illina DNN-Based Mask Estimation for Distributed Speech Enhancement in Spatially Unconstrained Microphone Arrays . . . . . . . . . . . . . . . . . 2310--2323 Or Haim Anidjar and Itshak Lapidot and Chen Hajaj and Amit Dvir and Issachar Gilad Hybrid Speech and Text Analysis Methods for Speaker Change Detection . . . . . . 2324--2338 Chuang Fan and Chaofa Yuan and Lin Gui and Yue Zhang and Ruifeng Xu Multi-Task Sequence Tagging for Emotion-Cause Pair Extraction Via Tag Distribution Refinement . . . . . . . . 2339--2350 Andy T. Liu and Shang-Wen Li and Hung-yi Lee TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech . . . . . . . . . . . . . . . . . 2351--2366 Guanlong Zhao and Shaojin Ding and Ricardo Gutierrez-Osuna Converting Foreign Accent Speech Without a Reference . . . . . . . . . . . . . . 2367--2381 Kilian Schulze-Forster and Clement S. J. Doire and Gaël Richard and Roland Badeau Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation 2382--2395 Shengqiong Wu and Hao Fei and Yafeng Ren and Bobo Li and Fei Li and Donghong Ji High-Order Pair-Wise Aspect and Opinion Terms Extraction With Edge-Enhanced Syntactic Graph Convolution . . . . . . 2396--2406 Jingyi Wu and Lin Shang and Xiaoying Gao Sentiment Time Series Calibration for Event Detection . . . . . 
. . . . . . . 2407--2420 Kashif Munir and Hai Zhao and Zuchao Li Learning Context-Aware Convolutional Filters for Implicit Discourse Relation Classification . . . . . . . . . . . . . 2421--2433 Seokhwan Kim and Hannes Schulz and Chulaka Gunasekara and Chiori Hori and Abhinav Rastogi and Luis Fernando D. Haro Editorial: Special Issue on the Eighth Dialog System Technology Challenge . . . 2434--2436 Byoungjae Kim and Jungyun Seo and Myoung-Wan Koo Randomly Wired Network Based on RoBERTa and Dialog History Attention for Response Selection . . . . . . . . . . . 2437--2442 Jia-Chen Gu and Tianda Li and Zhen-Hua Ling and Quan Liu and Zhiming Su and Yu-Ping Ruan and Xiaodan Zhu Deep Contextualized Utterance Representations for Response Selection and Dialogue Analysis . . . . . . . . . 2443--2455 Yun-Wei Chu and Kuan-Yen Lin and Chao-Chun Hsu and Lun-Wei Ku End-to-End Recurrent Cross-Modality Attention for Video Dialogue . . . . . . 2456--2464 Kun Xu and Han Wu and Linfeng Song and Haisong Zhang and Linqi Song and Dong Yu Conversational Semantic Role Labeling 2465--2475 Zekang Li and Zongjia Li and Jinchao Zhang and Yang Feng and Jie Zhou Bridging Text and Video: a Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog . . . . . . . . . . . 2476--2483 Igor Shalyminov and Alessandro Sordoni and Adam Atkinson and Hannes Schulz GRTr: Generative-Retrieval Transformers for Data-Efficient Dialogue Domain Adaptation . . . . . . . . . . . . . . . 2484--2492 Jiali Zeng and Yongjing Yin and Yang Liu and Yubin Ge and Jinsong Su Domain Adaptive Meta-Learning for Dialogue State Tracking . . . . . . . . 2493--2501 Chen Zhang and Grandee Lee and Luis Fernando D. Haro and Haizhou Li D-Score: Holistic Dialogue Evaluation Without Reference . . . . . . . . . . . 2502--2516 Shrikant Malviya and Rohit Mishra and Santosh Kumar Barnwal and Uma Shanker Tiwary HDRS: Hindi Dialogue Restaurant Search Corpus for Dialogue State Tracking in Task-Oriented Environment . . . . . . . 
2517--2528 Seokhwan Kim and Michel Galley and Chulaka Gunasekara and Sungjin Lee and Adam Atkinson and Baolin Peng and Hannes Schulz and Jianfeng Gao and Jinchao Li and Mahmoud Adada and Minlie Huang and Luis Lastras and Jonathan K. Kummerfeld and Walter S. Lasecki and Chiori Hori and Anoop Cherian and Tim K. Marks and Abhinav Rastogi and Xiaoxue Zang and Srinivas Sunkara and Raghav Gupta Overview of the Eighth Dialog System Technology Challenge: DSTC8 . . . . . . 2529--2540 Myeongho Jeong and Seungtaek Choi and Jinyoung Yeo and Seung-won Hwang Label and Context Augmentation for Response Selection at DSTC8 . . . . . . 2541--2550 Qing Liu and Lei Chen and Yuan Yuan and Huarui Wu History Reuse and Bag-of-Words Loss for Long Summary Generation . . . . . . . . 2551--2560 Lu Zhang and Mingjiang Wang and Qiquan Zhang and Xinsheng Wang and Ming Liu PhaseDCN: a Phase-Enhanced Dual-Path Dilated Convolutional Network for Single-Channel Speech Enhancement . . . 2561--2574 Kazi Nazmul Haque and Rajib Rana and Jiajun Liu and John H. L. Hansen and Nicholas Cummins and Carlos Busso and Björn W. Schuller Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data . . . . . . . . . . . . . . . 2575--2590 Toru Nakashika and Kohei Yatabe Gamma Boltzmann Machine for Audio Modeling . . . . . . . . . . . . . . . . 2591--2605 Xintong Li and Lemao Liu and Zhaopeng Tu and Guanlin Li and Shuming Shi and Max Q.-H. Meng Attending From Foresight: a Novel Attention Mechanism for Neural Machine Translation . . . . . . . . . . . . . . 2606--2616 Hengshun Zhou and Jun Du and Yuanyuan Zhang and Qing Wang and Qing-Feng Liu and Chin-Hui Lee Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition . . . . 
2617--2629 Yuling Li and Kui Yu and Yuhong Zhang Learning Cross-Lingual Mappings in Imperfectly Isomorphic Embedding Spaces 2630--2642 Xiao Zhou and Zhen-Hua Ling and Li-Rong Dai UnitNet: a Sequence-to-Sequence Acoustic Model for Concatenative Speech Synthesis 2643--2655 Zihan Pan and Malu Zhang and Jibin Wu and Jiadong Wang and Haizhou Li Multi-Tone Phase Coding of Interaural Time Difference for Sound Source Localization With Spiking Neural Networks . . . . . . . . . . . . . . . . 2656--2670 Ken O'Hanlon and Mark B. Sandler FifthNet: Structured Compact Neural Networks for Automatic Chord Recognition 2671--2682 Simone Spagnol and Riccardo Miccini and Marius George Onofrei and Runar Unnthorsson and Stefania Serafin Estimation of Spectral Notches From Pinna Meshes: Insights From a Simple Computational Model . . . . . . . . . . 2683--2695 Chenglin Xu and Wei Rao and Jibin Wu and Haizhou Li Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech . . . . . . . . 2696--2709 Adel Zahedi and Michael Syskind Pedersen and Jan Østergaard and Thomas Ulrich Christiansen and Lars Bramsløw and Jesper Jensen Minimum Processing Beamforming . . . . . 2710--2724 Xianghui Wang and Jie Chen and Xiaoyi Chen and Jing Guo and Qian Xiang Multichannel Iterative Noise Reduction Filters in the Short-Time-Fourier-Transform Domain Based on Kronecker Product Decomposition 2725--2740 Kai-Li Yin and Yi-Fei Pu and Lu Lu Robust Q-Gradient Subband Adaptive Filter for Nonlinear Active Noise Control . . . . . . . . . . . . . . . . 2741--2752 Jaeuk Byun and Jong Won Shin Monaural Speech Separation Using Speaker Embedding From Preliminary Separation 2753--2763 Xudong Zhao and Gongping Huang and Jingdong Chen and Jacob Benesty On the Design of 3D Steerable Beamformers With Uniform Concentric Circular Microphone Arrays . . . . . . . 
2764--2778 Zifeng Cheng and Zhiwei Jiang and Yafeng Yin and Na Li and Qing Gu A Unified Target-Oriented Sequence-to-Sequence Model for Emotion-Cause Pair Extraction . . . . . 2779--2791 Hamid Azadi and Mohammad-R. Akbarzadeh-T and Hamid-R. Kobravi and Ali Shoeibi Robust Voice Feature Selection Using Interval Type-2 Fuzzy AHP for Automated Diagnosis of Parkinson's Disease . . . . 2792--2802 Yukiya Hono and Kei Hashimoto and Keiichiro Oura and Yoshihiko Nankaku and Keiichi Tokuda Sinsy: a Deep Neural Network-Based Singing Voice Synthesis System . . . . . 2803--2815 Jian Tang and Jie Zhang and Yan Song and Ian McLoughlin and Li-Rong Dai Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR . . . . . . . . . . . . . 2816--2828 Chongman Leong and Xuebo Liu and Derek F. Wong and Lidia S. Chao Exploiting Translation Model for Parallel Corpus Mining . . . . . . . . . 2829--2839 Neil Zeghidour and David Grangier Wavesplit: End-to-End Speech Separation by Speaker Clustering . . . . . . . . . 2840--2849 Dino Oglic and Zoran Cvetkovic and Peter Sollich Learning Waveform-Based Acoustic Models Using Deep Variational Convolutional Neural Networks . . . . . . . . . . . . 2850--2863 Alexandru Nelus and Rainer Martin Privacy-Preserving Audio Classification Using Variational Information Feature Extraction . . . . . . . . . . . . . . . 2864--2877 Hao Li and DeLiang Wang and Xueliang Zhang and Guanglai Gao Recurrent Neural Networks and Acoustic Features for Frame-Level Signal-to-Noise Ratio Estimation . . . . . . . . . . . . 2878--2887 Yi Zhou and Xiaoqing Zheng and Xuanjing Huang Generating Responses With a Given Syntactic Pattern in Chinese Dialogues 2888--2898 Viktor Gunnarsson and Mikael Sternad Binaural Auralization of Microphone Array Room Impulse Responses Using Causal Wiener Filtering . . . . . . . . 2899--2914 Zuolong Chen and Huawei Chen and Quansheng Tu Sensor Imperfection Tolerance Analysis of Robust Linear Differential Microphone Arrays . . 
. . . . . . . . . . . . . . . 2915--2929 Yusheng Su and Xu Han and Yankai Lin and Zhengyan Zhang and Zhiyuan Liu and Peng Li and Jie Zhou and Maosong Sun CSS-LM: a Contrastive Framework for Semi-Supervised Fine-Tuning of Pre-Trained Language Models . . . . . . 2930--2941 Tobias Kabzinski and Peter Jax A Causality-Constrained Frequency-Domain Least-Squares Filter Design Method for Crosstalk Cancellation . . . . . . . . . 2942--2956 Frank Zalkow and Meinard Müller CTC-Based Learning of Chroma Features for Score Audio Music Retrieval . . . . 2957--2971 Teck Kai Chan and Cheng Siong Chin Multi-Branch Convolutional Macaron net for Sound Event Detection . . . . . . . 2972--2985 Tedd Kourkounakis and Amirhossein Hajavi and Ali Etemad FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning . . . . . . . . . . . . . . . . 2986--2999 Haoyu Li and Junichi Yamagishi Multi-Metric Optimization Using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement . . . . . . . . . . . . . . 3000--3011 Zehao Lin and Shaobo Cui and Guodun Li and Xiaoming Kang and Feng Ji and Fenglin Li and Zhongzhou Zhao and Haiqing Chen and Yin Zhang Predict-Then-Decide: a Predictive Approach for Wait or Answer Task in Dialogue Systems . . . . . . . . . . . . 3012--3024 Metin Calis and Steven van de Par and Richard Heusdens and Richard Christian Hendriks Localization Based on Enhanced Low Frequency Interaural Level Difference 3025--3039 Christopher Liberatore Native-Nonnative Voice Conversion by Residual Warping in a Sparse, Anchor-Based Representation . . . . . . 3040--3051 Shoichi Koyama and Jesper Brunnström and Hayato Ito and Natsuki Ueno and Hiroshi Saruwatari Spatial Active Noise Control Based on Kernel Interpolation of Sound Field . . 3052--3063 Jipeng Qiang and Yun Li and Yi Zhu and Yunhao Yuan and Yang Shi and Xindong Wu LSBert: Lexical Simplification Based on BERT . . . . . . . . . . . . . . . . . . 
3064--3076 Ningyu Zhang and Hongbin Ye and Shumin Deng and Chuanqi Tan and Mosha Chen and Songfang Huang and Fei Huang and Huajun Chen Contrastive Information Extraction With Generative Transformer . . . . . . . . . 3077--3088 Jianyu Wang and Shanzheng Guan and Shupei Liu and Xiao-Lei Zhang Minimum-Volume Multichannel Nonnegative Matrix Factorization for Blind Audio Source Separation . . . . . . . . . . . 3089--3103 Alberto Carini and Stefania Cecchi and Alessandro Terenzi and Simone Orcioni A Room Impulse Response Measurement Method Robust Towards Nonlinearities Based on Orthogonal Periodic Sequences 3104--3117 Jie Zhang and Changheng Li Quantization-Aware Binaural MWF Based Noise Reduction Incorporating External Wireless Devices . . . . . . . . . . . . 3118--3131 Biru Zhu and Xingyao Zhang and Ming Gu and Yangdong Deng Knowledge Enhanced Fact Checking and Verification . . . . . . . . . . . . . . 3132--3143 Mark A. Poletti and Paul D. Teal A Superfast Toeplitz Matrix Inversion Method for Single- and Multi-Channel Inverse Filters and Its Application to Room Equalization . . . . . . . . . . . 3144--3157 Guanlin Li and Lemao Liu and Conghui Zhu and Rui Wang and Tiejun Zhao and Shuming Shi Detecting Source Contextual Barriers for Understanding Neural Machine Translation 3158--3169 Chia-Chih Kuo and Kuan-Yu Chen and Shang-Bao Luo Audio-Aware Spoken Multiple-Choice Question Answering With Pre-Trained Language Models . . . . . . . . . . . . 3170--3179 Rui Liu and Zheng Lin and Weiping Wang Addressing Extraction and Generation Separately: Keyphrase Prediction With Pre-Trained Language Models . . . . . . 3180--3191 Jiangnan Li and Hongliang Pan and Zheng Lin and Peng Fu and Weiping Wang Sarcasm Detection with Commonsense Knowledge . . . . . . . . . . . . . . . 3192--3201 Runyan Yang and Gaofeng Cheng and Haoran Miao and Ta Li and Pengyuan Zhang and Yonghong Yan Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments . . . . . . . . . . 
. 3202--3215 Tareq Alkhaldi and Chenhui Chu and Sadao Kurohashi Flexibly Focusing on Supporting Facts, Using Bridge Links, and Jointly Training Specialized Modules for Multi-Hop Question Answering . . . . . . . . . . . 3216--3225 Wenyi Wu and Yegui Xiao and Jianhui Lin and Liying Ma and Khashayar Khorasani An Efficient Filter Bank Structure for Adaptive Notch Filtering and Applications . . . . . . . . . . . . . . 3226--3241 Xinsheng Wang and Justin van der Hout and Jihua Zhu and Mark Hasegawa-Johnson and Odette Scharenborg Synthesizing Spoken Descriptions of Images . . . . . . . . . . . . . . . . . 3242--3254 Vincent W. Neo and Christine Evers and Patrick A. Naylor Enhancement of Noisy Reverberant Speech Using Polynomial Matrix Eigenvalue Decomposition . . . . . . . . . . . . . 3255--3266 Riccardo Giampiccolo and Mauro Giuseppe de Bari and Alberto Bernardini and Augusto Sarti Wave Digital Modeling and Implementation of Nonlinear Audio Circuits With Nullors 3267--3279 Xixin Wu and Yuewen Cao and Hui Lu and Songxiang Liu and Disong Wang and Zhiyong Wu and Xunying Liu and Helen Meng Speech Emotion Recognition Using Sequential Capsule Networks . . . . . . 3280--3291 Yuan Gong and Yu-An Chung and James Glass PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation . . . . . . . . . . . . . . 3292--3306 Licheng Zhang and Zhendong Mao and Benfeng Xu and Quan Wang and Yongdong Zhang Review and Arrange: Curriculum Learning for Natural Language Understanding . . . 3307--3320 Fei He and Ling He and Jing Zhang and Yuanyuan Li and Xi Xiong Automatic Detection of Affective Flattening in Schizophrenia: Acoustic Correlates to Sound Waves and Auditory Perception . . . . . . . . . . . . . . . 3321--3334 Saoussen Mathlouthi Bouzid and Chiraz Ben Othmane Zribi Efficient Learning Approach for Pronominal Anaphora and Ellipsis Identification and Resolution in Arabic Texts . . . . . . . . . . . . . . . . . 
3335--3348 Arda Yüksel and Berke Uğurlu and Aykut Koç Semantic Change Detection With Gaussian Word Embeddings . . . . . . . . . . . . 3349--3361 Mei Li and Lu Xiang and Xiaomian Kang and Yang Zhao and Yu Zhou and Chengqing Zong Medical Term and Status Generation From Chinese Clinical Dialogue With Multi-Granularity Transformer . . . . . 3362--3374 Yongwei Li and Jianhua Tao and Donna Erickson and Bin Liu and Masato Akagi $ F_0 $-Noise-Robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model . . . . . . . . . . . . . . . . . 3375--3383 Xianwen Liao and Yongzhong Huang and Yongzhuang Wei and Chenhao Zhang and Fu Wang and Yong Wang Efficient Estimate of Sentence's Representation Based on the Difference Semantics Model . . . . . . . . . . . . 3384--3399 Kwang Myung Jeon and Geon Woo Lee and Nam Kyun Kim and Hong Kook Kim TAU-Net: Temporal Activation $U$-Net Shared With Nonnegative Matrix Factorization for Speech Enhancement in Unseen Noise Environments . . . . . . . 3400--3414 Yi-Yang Ding and Hao-Jian Lin and Li-Juan Liu and Zhen-Hua Ling and Yu Hu Robustness of Speech Spoofing Detectors Against Adversarial Post-Processing of Voice Conversion . . . . . . . . . . . . 3415--3426 Yi Zhou and Xiaohai Tian and Haizhou Li Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation . . . . . . . . . . . . . . . 3427--3439 Ju Lin and Adriaan J. de Lind van Wijngaarden and Kuang-Ching Wang and Melissa C. Smith Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks . . . . . . . . . . . . . . . . 3440--3450 Wei-Ning Hsu and Benjamin Bolte and Yao-Hung Hubert Tsai and Kushal Lakhotia and Ruslan Salakhutdinov and Abdelrahman Mohamed HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units . . . . . . . 3451--3460 Kouei Yamaoka and Nobutaka Ono and Shoji Makino Time-Frequency-Bin-Wise Linear Combination of Beamformers for Distortionless Signal Enhancement . . . 
3461--3475 Zhong-Qiu Wang and Gordon Wichern and Jonathan Le Roux Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation . . 3476--3490 Bing Yang and Hong Liu and Xiaofei Li Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization . . . . . . . . . . 3491--3503 Yiming Cui and Wanxiang Che and Ting Liu and Bing Qin and Ziqing Yang Pre-Training With Whole Word Masking for Chinese BERT . . . . . . . . . . . . . . 3504--3514 Leda Sar and Mark Hasegawa-Johnson and Chang D. Yoo Counterfactually Fair Automatic Speech Recognition . . . . . . . . . . . . . . 3515--3525 Zhuohuang Zhang and Yong Xu and Meng Yu and Shi-Xiong Zhang and Lianwu Chen and Donald S. Williamson and Dong Yu Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation . . . . . . . . 3526--3540 Nils L. Westhausen and Rainer Huber and Hannah Baumgartner and Ragini Sinha and Jan Rennies and Bernd T. Meyer Reduction of Subjective Listening Effort for TV Broadcast Signals With Recurrent Neural Networks . . . . . . . . . . . . 3541--3550 Shota Sasaki and Jun Suzuki and Kentaro Inui Subword-Based Compact Reconstruction for Open-Vocabulary Neural Word Embeddings 3551--3564 Xiaodong Cui and Wei Zhang and Abdullah Kayi and Mingrui Liu and Ulrich Finkler and Brian Kingsbury and George Saon and David Kung Asynchronous Decentralized Distributed Training of Acoustic Models . . . . . . 3565--3576 Junqing Zhang and Wen Zhang and Jihui Aimee Zhang and Thushara Dheemantha Abhayapala and Lijun Zhang Spatial Active Noise Control in Rooms Using Higher Order Sources . . . . . . . 3577--3591 Bingzhi Chen and Qi Cao and Mixiao Hou and Zheng Zhang and Guangming Lu and David Zhang Multimodal Emotion Recognition With Temporal and Semantic Consistency . . . 3592--3603 S. Supraja and Andy W. H. Khong and S. Tatinati Regularized Phrase-Based Topic Model for Automatic Question Classification With Domain-Agnostic Class Labels . . . . . . 
3604--3616 Natsuko Maeda and Filippo Maria Fazi and Falk-Martin Hoffmann Sound Field Reproduction With a Cylindrical Loudspeaker Array Using First Order Wall Reflections . . . . . . 3617--3630 Xugang Lu and Peng Shen and Yu Tsao and Hisashi Kawai Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification . . . . . . . . . . 3631--3641 Hannes Helmholz and David Lou Alon and Sebastià V. Amengual Garí and Jens Ahrens Effects of Additive Noise in Binaural Rendering of Spherical Microphone Array Signals . . . . . . . . . . . . . . . . 3642--3653 Joanna Hong and Minsu Kim and Se Jin Park and Yong Man Ro Speech Reconstruction With Reminiscent Sound Via Visual Voice Memory . . . . . 3654--3667 Ran Weisman and Tom Shlomo and Vladimir Tourbabin and Paul Calamia and Boaz Rafaely Robustness of Acoustic Rake Filters in Minimum Variance Beamforming . . . . . . 3668--3678 Junhao Xu and Jianwei Yu and Shoukang Hu and Xunying Liu and Helen Meng Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition . . . . . . . . . . . 3679--3693 Jidong Ge and Yunyun Huang and Xiaoyu Shen and Chuanyi Li and Wei Hu Learning Fine-Grained Fact-Article Correspondence in Legal Cases . . . . . 3694--3706 Qiuqiang Kong and Bochen Li and Xuchen Song and Yuan Wan and Yuxuan Wang High-Resolution Piano Transcription With Pedals by Regressing Onset and Offset Times . . . . . . . . . . . . . . . . . 3707--3717 Anonymous 2021 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 29 . . . . . . 3718--3760
Anonymous IEEE Signal Processing Society . . . . . C2--C2 Qianying Liu and Wenyu Guan and Sujian Li and Fei Cheng and Daisuke Kawahara and Sadao Kurohashi RODA: Reverse Operation Based Data Augmentation for Solving Math Word Problems . . . . . . . . . . . . . . . . 1--11 Kai Zhen and Jongmo Sung and Mi Suk Lee and Seungkwon Beack and Minje Kim Scalable and Efficient Neural Speech Coding: a Hybrid Design . . . . . . . . 12--25 Sen Yang and Yang Liu and Dawei Feng and Dongsheng Li Text Generation From Data With Dynamic Planning . . . . . . . . . . . . . . . . 26--34 Stefan Liebich and Peter Vary Occlusion Effect Cancellation in Headphones and Hearing Devices The Sister of Active Noise Cancellation . . 35--48 Zhuosheng Zhang and Haojie Yu and Hai Zhao and Masao Utiyama Which Apple Keeps Which Doctor Away? Colorful Word Representations With Visual Oracles . . . . . . . . . . . . . 49--59 Zhenyu Wang and John H. L. Hansen Multi-Source Domain Adaptation for Text-Independent Forensic Speaker Recognition . . . . . . . . . . . . . . 60--75 Kengtao Zheng and Nankai Lin and Shengyi Jiang Unsupervised Character Embedding Correction and Candidate Word Denoising 76--86 Bing Ma and Haifeng Sun and Jingyu Wang and Qi Qi and Jianxin Liao Extractive Dialogue Summarization Without Annotation Based on Distantly Supervised Machine Reading Comprehension in Customer Service . . . . . . . . . . 87--97 Shengcai Liu and Ning Lu and Cheng Chen and Ke Tang Efficient Combinatorial Optimization for Word-Level Adversarial Textual Attack 98--111 Alessandro Terenzi and Nicola Ortolani and Inês Nolasco and Emmanouil Benetos and Stefania Cecchi Comparison of Feature Extraction Methods for Sound-Based Classification of Honey Bee Activity . . . . . . . . . . . . . . 112--122 Shuiyang Mao and P. C. 
Ching and Tan Lee Enhancing Segment-Based Speech Emotion Recognition by Iterative Self-Learning 123--134 Abdolreza Sabzi Shahrebabaki and Giampiero Salvi and Torbjørn Svendsen and Sabato Marco Siniscalchi Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models . . . . . . . . . . . . . . . . . 135--147 Javier Jorge and Adrià Giménez and Joan Albert Silvestre-Cerdà and Jorge Civera and Albert Sanchis and Alfons Juan Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models . . . . 148--161 Muhammed P. V. Shifas and Cătălin Zorilă and Yannis Stylianou End-to-End Neural Based Modification of Noisy Speech for Speech-in-Noise Intelligibility Improvement . . . . . . 162--173 Joon-Young Yang and Joon-Hyuk Chang VACE-WPE: Virtual Acoustic Channel Expansion Based on Neural Networks for Weighted Prediction Error-Based Speech Dereverberation . . . . . . . . . . . . 174--189 Chenpeng Du and Kai Yu Phone-Level Prosody Modelling With GMM-Based MDN for Diverse and Controllable Speech Synthesis . . . . . 190--201 Haibin Wu and Xu Li and Andy T. Liu and Zhiyong Wu and Helen Meng and Hung-Yi Lee Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning . . . . . . . . . . . . . . . . 202--217 Mixiao Hou and Zheng Zhang and Qi Cao and David Zhang and Guangming Lu Multi-View Speech Emotion Recognition Via Collective Relation Construction . . 218--229 Da-rong Liu and Po-chun Hsu and Yi-chen Chen and Sung-feng Huang and Shun-po Chuang and Da-yi Wu and Hung-yi Lee Learning Phone Recognition From Unpaired Audio and Phone Sequences Based on Generative Adversarial Network . . . . . 230--243 Yuting Zhao and Mamoru Komachi and Tomoyuki Kajiwara and Chenhui Chu Word-Region Alignment-Guided Multimodal Neural Machine Translation . . . . . . . 
244--259 Zhuosheng Zhang and Yiqing Zhang and Hai Zhao Syntax-Aware Multi-Spans Generation for Reading Comprehension . . . . . . . . . 260--268 Pengfei Zhu and Zhuosheng Zhang and Hai Zhao and Xiaoguang Li DUMA: Reading Comprehension With Transposition Thinking . . . . . . . . . 269--279 Jiayuan Xie and Ningxin Peng and Yi Cai and Tao Wang and Qingbao Huang Diverse Distractor Generation for Constructing High-Quality Multiple Choice Questions . . . . . . . . . . . . 280--291 Jie Zhang and Guanghui Zhang A Parametric Unconstrained Beamformer Based Binaural Noise Reduction for Assistive Hearing . . . . . . . . . . . 292--304 Luca Turchet and Johan Pauwels Music Emotion Recognition: Intention of Composers-Performers Versus Perception of Musicians, Non-Musicians, and Listening Machines . . . . . . . . . . . 305--316 Wenxin Hou and Han Zhu and Yidong Wang and Jindong Wang and Tao Qin and Renjun Xu and Takahiro Shinozaki Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition . . . . 317--329 Kehai Chen and Rui Wang and Masao Utiyama and Eiichiro Sumita Integrating Prior Translation Knowledge Into Neural Machine Translation . . . . 330--339 Keqi Deng and Gaofeng Cheng and Runyan Yang and Yonghong Yan Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification . . . 340--354 Zuchao Li and Junru Zhou and Hai Zhao and Kevin Parnow HPSG-Inspired Joint Neural Constituent and Dependency Parsing in $ O(n^3) $ Time Complexity . . . . . . . . . . . . 355--366 Xuan Shi and Erica Cooper and Junichi Yamagishi Use of Speaker Recognition Approaches for Learning and Evaluating Embedding Representations of Musical Instrument Sounds . . . . . . . . . . . . . . . . . 367--377 Zengwei Yao and Wenjie Pei and Fanglin Chen and Guangming Lu and David Zhang Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-Order Latent Domain . . . . . . . . 
378--393 Yanmin Qian and Zhikai Zhou Optimizing Data Usage for Low-Resource Speech Recognition . . . . . . . . . . . 394--403 Narla John Metilda Sagaya Mary and Srinivasan Umesh and Sandesh Varadaraju Katta S-Vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder . . . . . . . . . . 404--413 Bengt J. Borgström Bayesian Estimation of PLDA in the Presence of Noisy Training Labels, With Applications to Speaker Verification . . 414--428 Menglong Lu and Zhen Huang and Binyang Li and Yunxiang Zhao and Zheng Qin and DongSheng Li SIFTER: a Framework for Robust Rumor Detection . . . . . . . . . . . . . . . 429--442 Lantian Li and Dong Wang and Jiawen Kang and Renyu Wang and Jing Wu and Zhendong Gao and Xiao Chen A Principle Solution for Enroll-Test Mismatch in Speaker Recognition . . . . 443--455 Feiran Yang Analysis of Deficient-Length Partitioned-Block Frequency-Domain Adaptive Filters . . . . . . . . . . . . 456--467 Hui Jiang and Linfeng Song and Yubin Ge and Fandong Meng and Junfeng Yao and Jinsong Su An AST Structure Enhanced Decoder for Code Generation . . . . . . . . . . . . 468--476 Anssi Kanervisto and Ville Hautamäki and Tomi Kinnunen and Junichi Yamagishi Optimizing Tandem Speaker Verification and Anti-Spoofing Systems . . . . . . . 477--488 Xin Ni and Jia Ren FC-U2-Net: a Novel Deep Neural Network for Singing Voice Separation . . . . . . 489--494 Neil Zeghidour and Alejandro Luebs and Ahmed Omran and Jan Skoglund and Marco Tagliasacchi SoundStream: an End-to-End Neural Audio Codec . . . . . . . . . . . . . . . . . 495--507 Wageesha Manamperi and Thushara D. Abhayapala and Jihui Zhang and Prasanga N. Samarasinghe Drone Audition: Sound Source Localization Using On-Board Microphones 508--519 Qian Li and Hao Peng and Jianxin Li and Jia Wu and Yuanxing Ning and Lihong Wang and Philip S. Yu and Zheng Wang Reinforcement Learning-Based Dialogue Guided Event Extraction to Exploit Argument Relations . . . . . . . . . . . 
520--533 Santiago Ruiz and Toon van Waterschoot and Marc Moonen Distributed Combined Acoustic Echo Cancellation and Noise Reduction in Wireless Acoustic Sensor and Actuator Networks . . . . . . . . . . . . . . . . 534--547 Lukas Grinewitschus and Peter Jung The Harmonic Shift Algorithm for Efficient Multi-Pitch Detection . . . . 548--561 Ziyao Lu and Xiang Li and Yang Liu and Chulun Zhou and Jianwei Cui and Bin Wang and Min Zhang and Jinsong Su Exploring Multi-Stage Information Interactions for Multi-Source Neural Machine Translation . . . . . . . . . . 562--570 Jingxuan Yang and Si Li and Sheng Gao and Jun Guo CorefDPR: a Joint Model for Coreference Resolution and Dropped Pronoun Recovery in Chinese Conversations . . . . . . . . 571--581 Timuçin Berk Atalay and Zühre Sü Gül and Enzo De Sena and Zoran Cvetković and Hüseyin Hacıhabiboğlu Scattering Delay Network Simulator of Coupled Volume Acoustics . . . . . . . . 582--593 Yi Zhang and Lei Li and Yunfang Wu and Qi Su and Xu Sun Alleviating the Knowledge-Language Inconsistency: a Study for Deep Commonsense Knowledge . . . . . . . . . 594--604 Ke Tan and Zhong-Qiu Wang and DeLiang Wang Neural Spectrospatial Filtering . . . . 605--621 Qianren Mao and Jianxin Li and Chenghua Lin and Congwen Chen and Hao Peng and Lihong Wang and Philip S. Yu Adaptive Pre-Training and Collaborative Fine-Tuning: a Win-Win Strategy to Improve Review Analysis Tasks . . . . . 622--634 Zifeng Cheng and Zhiwei Jiang and Yafeng Yin and Cong Wang and Qing Gu Learning to Classify Open Intent via Soft Labeling and Manifold Mixup . . . . 635--645 Xiaochun An and Frank K. Soong and Lei Xie Disentangling Style and Speaker Attributes for TTS Style Transfer . . . 
646--658 Zhuang Chen and Tieyun Qian Retrieve-and-Edit Domain Adaptation for End2End Aspect Based Sentiment Analysis 659--672 Jian Liu and Mengshi Yu and Yufeng Chen and Jinan Xu Cross-Domain Slot Filling as Machine Reading Comprehension: a New Perspective 673--685 Yongkang Liu and Qingbao Huang and Jing Li and Linzhang Mo and Yi Cai and Qing Li SSAP: Storylines and Sentiment Aware Pre-Trained Model for Story Ending Generation . . . . . . . . . . . . . . . 686--694 Ying Zhou and Xuefeng Liang and Yu Gu and Yifei Yin and Longshan Yao Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition 695--705 Poul Hoang and Jan Mark de Haan and Zheng-Hua Tan and Jesper Jensen Multichannel Speech Enhancement With Own Voice-Based Interfering Speech Suppression for Hearing Assistive Devices . . . . . . . . . . . . . . . . 706--720 Weijie Yu and Chen Xu and Jun Xu and Liang Pang and Ji-Rong Wen Distribution Distance Regularized Sequence Representation for Text Matching in Asymmetrical Domains . . . . 721--733 Heming Wang and DeLiang Wang Neural Cascade Architecture With Triple-Domain Loss for Speech Enhancement . . . . . . . . . . . . . . 734--743 Riccardo R. De Lucia and Antonio Canclini and Fabio Antonacci and Augusto Sarti Group Dictionary Equivalent Source Method for Sparse Nearfield Acoustic Holography . . . . . . . . . . . . . . . 744--757 Tong Ma and Ying Wei and Xin Lou Reconfigurable Nonuniform Filter Bank for Hearing Aid Systems . . . . . . . . 758--771 Victoria Mingote and Antonio Miguel and Dayana Ribas and Alfonso Ortega and Eduardo Lleida aDCF Loss Function for Deep Metric Learning in End-to-End Text-Dependent Speaker Verification Systems . . . . . . 772--784 Quansheng Tu and Huawei Chen Theoretical Lower Bounds on the Performance of the First-Order Differential Microphone Arrays With Sensor Imperfections . . . . . . . . . . 
785--801 Taihui Wang and Feiran Yang and Jun Yang Convolutive Transfer Function-Based Multichannel Nonnegative Matrix Factorization for Overdetermined Blind Source Separation . . . . . . . . . . . 802--815 Yi Zhang and Guangyou Zhou and Zhiwen Xie and Jimmy Xiangji Huang HGEN: Learning Hierarchical Heterogeneous Graph Encoding for Math Word Problem Solving . . . . . . . . . . 816--828 Eduardo Fonseca and Xavier Favory and Jordi Pons and Frederic Font and Xavier Serra FSD50K: an Open Dataset of Human-Labeled Sound Events . . . . . . . . . . . . . . 829--852 Yi Lei and Shan Yang and Xinsheng Wang and Lei Xie MsEmoTTS: Multi-Scale Emotion Transfer, Prediction, and Control for Emotional Speech Synthesis . . . . . . . . . . . . 853--864 Tao Wang and Ruibo Fu and Jiangyan Yi and Jianhua Tao and Zhengqi Wen NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation . . . . . . . . . . 865--878 Simon Stone and Yingming Gao and Peter Birkholz Articulatory Synthesis of Vocalized /r/ Allophones in German . . . . . . . . . . 879--889 Prashant Serai and Vishal Sunder and Eric Fosler-Lussier Hallucination of Speech Recognition Errors With Sequence to Sequence Learning . . . . . . . . . . . . . . . . 890--900 Bin Wu and Sakriani Sakti and Jinsong Zhang and Satoshi Nakamura Modeling Unsupervised Empirical Adaptation by DPGMM and DPGMM-RNN Hybrid Model to Extract Perceptual Features for Low-Resource ASR . . . . . . . . . . . . 901--916 Mi Zhang and Tieyun Qian and Bing Liu Exploit Feature and Relation Hierarchy for Relation Extraction . . . . . . . . 917--930 Wenxiang Jiao and Xing Wang and Shilin He and Zhaopeng Tu and Irwin King and Michael R. Lyu Exploiting Inactive Examples for Natural Language Generation With Data Rejuvenation . . . . . . . . . . . . . . 931--943 Youzhi Tu and Man-Wai Mak Aggregating Frame-Level Information in the Spectral Domain With Self-Attention for Speaker Embedding . . . . . . . . . 
944--957 Zhixing Tan and Zeyuan Yang and Meng Zhang and Qun Liu and Maosong Sun and Yang Liu Dynamic Multi-Branch Layers for On-Device Neural Machine Translation . . 958--967 Weiwei Lin and Man-Wai Mak Mixture Representation Learning for Deep Speaker Embedding . . . . . . . . . . . 968--978 Peng Zhu and Dawei Cheng and Fangzhou Yang and Yifeng Luo and Dingjiang Huang and Weining Qian and Aoying Zhou Improving Chinese Named Entity Recognition by Large-Scale Syntactic Dependency Graph . . . . . . . . . . . . 979--991 Xiaobo Liang and Lijun Wu and Juntao Li and Tao Qin and Min Zhang and Tie-Yan Liu Multi-Teacher Distillation With Single Model for Neural Machine Translation . . 992--1002 Xiaofeng Chen and Guohua Wang and Haopeng Ren and Yi Cai and Ho-fung Leung and Tao Wang Task-Adaptive Feature Fusion for Generalized Few-Shot Relation Classification in an Open World Environment . . . . . . . . . . . . . . 1003--1015 Yu-Chen Lin and Cheng Yu and Yi-Te Hsu and Szu-Wei Fu and Yu Tsao and Tei-Wei Kuo SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points . . . . . . . . . . . . 1016--1031 Tomohiro Nakatani and Rintaro Ikeshita and Keisuke Kinoshita and Hiroshi Sawada and Naoyuki Kamo and Shoko Araki Switching Independent Vector Analysis and its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms . . . . . . . . . . . . . . . 1032--1047 Jianhua Geng and Sifan Wang and Qinglai Liu and Xin Lou Multi-Level Time-Frequency Bins Selection for Direction of Arrival Estimation Using a Single Acoustic Vector Sensor . . . . . . . . . . . . . 1048--1060 Qinzhuo Wu and Qi Zhang and Xuanjing Huang Automatic Math Word Problem Generation With Topic-Expression Co-Attention Mechanism and Reinforcement Learning . . 1061--1072 Michael Nigro and Sridhar Krishnan Multimodal System for Audio Scene Source Counting and Analysis . . . . . . . . . 
1073--1082 Yishu Peng and Sheng Zhang and Jiashu Zhang and Wei Xing Zheng Combined-Sample Multiband-Structured Subband Filtering Algorithms . . . . . . 1083--1092 Shoukang Hu and Xurong Xie and Mingyu Cui and Jiajun Deng and Shansong Liu and Jianwei Yu and Mengzhe Geng and Xunying Liu and Helen Meng Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks . . . 1093--1107 Xudong Dang and Wen Ma and Emanuël A. P. Habets and Hongyan Zhu TDOA-Based Robust Sound Source Localization With Sparse Regularization in Wireless Acoustic Sensor Networks . . 1108--1123 Shan Gao and Jing Lin and Xihong Wu and Tianshu Qu Sparse DNN Model for Frequency Expanding of Higher Order Ambisonics Encoding Process . . . . . . . . . . . . . . . . 1124--1135 Giovanni Pepe and Leonardo Gabrielli and Stefano Squartini and Carlo Tripodi and Nicolò Strozzi Deep Optimization of Parametric IIR Filters for Audio Equalization . . . . . 1136--1149 Moa Lee and Junmo Lee and Joon-Hyuk Chang Non-Autoregressive Fully Parallel Deep Convolutional Neural Speech Synthesis 1150--1159 Liam Barrett and Junchao Hu and Peter Howell Systematic Review of Machine Learning Approaches for Detecting Developmental Stuttering . . . . . . . . . . . . . . . 1160--1172 Sang-Hoon Lee and Hyeong-Rae Noh and Woo-Jeoung Nam and Seong-Whan Lee Duration Controllable Voice Conversion via Phoneme-Based Information Bottleneck 1173--1183 Zhihong Shao and Zhongqin Wu and Minlie Huang AdvExpander: Generating Natural Language Adversarial Examples by Expanding Text 1184--1196 Dhanunjaya Varma Devalraju and Padmanabhan Rajan Multiview Embeddings for Soundscape Classification . . . . . . . . . . . . . 1197--1206 Chengyu Wang and Suyang Dai and Yipeng Wang and Fei Yang and Minghui Qiu and Kehan Chen and Wei Zhou and Jun Huang ARoBERT: an ASR Robust Pre-Trained Language Model for Spoken Language Understanding . . . . . . . . . . . . . 
1207--1218 Jonah Ong and Ba Tuong Vo and Sven Nordholm and Ba-Ngu Vo and Diluka Moratuwage and Changbeom Shim Audio-Visual Based Online Multi-Source Separation . . . . . . . . . . . . . . . 1219--1234 Leyang Cui and Yafu Li and Yue Zhang Label Attention Network for Structured Prediction . . . . . . . . . . . . . . . 1235--1248 Sarinah Sutojo and Tobias May and Steven van de Par Segmentation of Multitalker Mixtures Based on Local Feature Contrasts and Auditory Glimpses . . . . . . . . . . . 1249--1262 Hao Gao and Xuelei Feng and Yong Shen Weighted Loudspeaker Placement Method for Sound Field Reproduction . . . . . . 1263--1276 Gongping Huang and Jacob Benesty and Israel Cohen and Jingdong Chen Kronecker Product Multichannel Linear Filtering for Adaptive Weighted Prediction Error-Based Speech Dereverberation . . . . . . . . . . . . 1277--1289 Takehiro Sugimoto Loudness-Level-Chasing Algorithm for Multiformat Live Audio Production . . . 1290--1304 Junshuang Wu and Richong Zhang and Yongyi Mao and Jinpeng Huai Dealing With Hierarchical Types and Label Noise in Fine-Grained Entity Typing . . . . . . . . . . . . . . . . . 1305--1318 Anton Ragni and Mark J. F. Gales and Oliver Rose and Katherine M. Knill and Alexandros Kastanos and Qiujia Li and Preben M. Ness Increasing Context for Estimating Confidence Scores in Automatic Speech Recognition . . . . . . . . . . . . . . 1319--1329 Zhongxin Bai and Jianyu Wang and Xiao-Lei Zhang and Jingdong Chen End-to-End Speaker Verification via Curriculum Bipartite Ranking Weighted Binary Cross-Entropy . . . . . . . . . . 1330--1344 Shang-Yi Chuang and Hsin-Min Wang and Yu Tsao Improved Lite Audio-Visual Speech Enhancement . . . . . . . . . . . . . . 1345--1359 Gaofeng Cheng and Haoran Miao and Runyan Yang and Keqi Deng and Yonghong Yan ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture . . . . . . . . 
1360--1373 Ashutosh Pandey and DeLiang Wang Self-Attending RNN for Speech Enhancement to Improve Cross-Corpus Generalization . . . . . . . . . . . . . 1374--1385 Di Jin and Shuyang Gao and Seokhwan Kim and Yang Liu and Dilek Hakkani-Tür Towards Textual Out-of-Domain Detection Without In-Domain Labels . . . . . . . . 1386--1395 K. Mrinalini and P. Vijayalakshmi and T. Nagarajan SBSim: a Sentence-BERT Similarity-Based Evaluation Metric for Indian Language Neural Machine Translation Systems . . . 1396--1406 Changhong Wang and Emmanouil Benetos and Vincent Lostanlen and Elaine Chew Adaptive Scattering Transforms for Playing Technique Recognition . . . . . 1407--1421 Danwei Cai and Weiqing Wang and Ming Li Incorporating Visual Information in Audio Based Self-Supervised Speaker Recognition . . . . . . . . . . . . . . 1422--1435 Yu Luo and Lina Pu EC-ANC: Edge Case-Enhanced Active Noise Cancellation for True Wireless Stereo Earbuds . . . . . . . . . . . . . . . . 1436--1447 Tao Li and Xinsheng Wang and Qicong Xie and Zhichao Wang and Lei Xie Cross-Speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis 1448--1460 Yilin Zhao and Zhuosheng Zhang and Hai Zhao Reference Knowledgeable Network for Machine Reading Comprehension . . . . . 1461--1473 Fu-Hao Yu and Kuan-Yu Chen and Ke-Han Lu Non-Autoregressive ASR Modeling Using Pre-Trained Language Models for Chinese Speech Recognition . . . . . . . . . . . 1474--1482 Yiming Cui and Ting Liu and Wanxiang Che and Zhigang Chen and Shijin Wang Teaching Machines to Read, Answer and Explain . . . . . . . . . . . . . . . . 1483--1492 Shota Horiguchi and Yusuke Fujita and Shinji Watanabe and Yawen Xue and Paola García Encoder-Decoder Based Attractors for End-to-End Neural Diarization . . . . . 
1493--1507 Chenda Li and Zhuo Chen and Yanmin Qian Dual-Path Modeling With Memory Embedding Model for Continuous Speech Separation 1508--1520 Yu Tong and Jingzhi Guo and Jizhe Zhou Separation Inference: a Unified Framework for Word Segmentation in East Asian Languages . . . . . . . . . . . . 1521--1530
Mrinmoy Bhattacharjee and S. R. M. Prasanna and Prithwijit Guha Clean vs. Overlapped Speech-Music Detection Using Harmonic-Percussive Features and Multi-Task Learning . . . . 1--10 Zhaojie Luo and Shoufeng Lin and Rui Liu and Jun Baba and Yuichiro Yoshikawa and Hiroshi Ishiguro Decoupling Speaker-Independent Emotions for Voice Conversion via Source-Filter Networks . . . . . . . . . . . . . . . . 11--24 Jinchuan Tian and Jianwei Yu and Chao Weng and Yuexian Zou and Dong Yu Integrating Lattice-Free MMI Into End-to-End Speech Recognition . . . . . 25--38 Ravi Shankar and Hsi-Wei Hsieh and Nicolas Charon and Archana Venkataraman A Diffeomorphic Flow-Based Variational Framework for Multi-Speaker Emotion Conversion . . . . . . . . . . . . . . . 39--53 Ryandhimas E. Zezario and Szu-Wei Fu and Fei Chen and Chiou-Shann Fuh and Hsin-Min Wang and Yu Tsao Deep Learning-Based Non-Intrusive Multi-Objective Speech Assessment Model With Cross-Domain Features . . . . . . . 54--70 Xiaoyi Qin and Danwei Cai and Ming Li Robust Multi-Channel Far-Field Speaker Verification Under Different In-Domain Data Availability Scenarios . . . . . . 71--85 Vikram C. Mathad and Julie M. Liss and Kathy Chapman and Nancy Scherer and Visar Berisha Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation . . . . . . . 86--95 Li Li and Hirokazu Kameoka and Shoji Makino FastMVAE2: On Improving and Accelerating the Fast Variational Autoencoder-Based Source Separation Algorithm for Determined Mixtures . . . . . . . . . . 96--110 Jie Wang and Yan Yang and Keyu Liu and Zhiping Zhu and Xiaorong Liu M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER . . . . . . . . . . 111--120 Marc Delcroix and Jorge Bennasar Vazquez and Tsubasa Ochiai and Keisuke Kinoshita and Yasunori Ohishi and Shoko Araki SoundBeam: Target Sound Extraction Conditioned on Sound-Class Labels and Enrollment Clues for Increased Performance and Continuous Learning . . 
121--136 Daisuke Niizumi and Daiki Takeuchi and Yasunori Ohishi and Noboru Harada and Kunio Kashino BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations 137--151 Yingrui Xu and Hao Liu and Jingguo Ge and Xiaodan Zhang and Jingyuan Hu and Yulei Wu and Honglei Lv and Hongbin Shi and Wei Zhou Mining Weak Relations Between Reviews for Opinion Spam Detection . . . . . . . 152--162 Yoshiki Masuyama and Kohei Yatabe and Kento Nagatomo and Yasuhiro Oikawa Online Phase Reconstruction via DNN-Based Phase Differences Estimation 163--176 Jiang Liu and Donghong Ji and Jingye Li and Dongdong Xie and Chong Teng and Liang Zhao and Fei Li TOE: a Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags . . . . . . . . . . . . . . . . . . 177--187 Zhe Hu and Zhiwei Cao and Hou Pong Chan and Jiachen Liu and Xinyan Xiao and Jinsong Su and Hua Wu Controllable Dialogue Generation With Disentangled Multi-Grained Style Specification and Attribute Consistency Reward . . . . . . . . . . . . . . . . . 188--199 Sondes Abderrazek and Corinne Fredouille and Alain Ghio and Muriel Lalain and Christine Meunier and Virginie Woisard Interpreting Deep Representations of Phonetic Features via Neuro-Based Concept Detector: Application to Speech Disorders Due to Head and Neck Cancer 200--214 Jie Zhang and Rui Tao and Jun Du and Li-Rong Dai Energy-Efficient Sparsity-Driven Speech Enhancement in Wireless Acoustic Sensor Networks . . . . . . . . . . . . . . . . 215--228 Xianke Wang and Bowen Tian and Weiming Yang and Wei Xu and Wenqing Cheng MusicYOLO: a Vision-Based Framework for Automatic Singing Transcription . . . . 229--241 Yuanyuan Liu and Mittapalle Kiran Reddy and Nelly Penttila and Tiina Ihalainen and Paavo Alku and Okko Rasanen Automatic Assessment of Parkinson's Disease Using Speech Representations of Phonation and Articulation . . . . . . . . . . . . . . 
242--255 David Sudholt and Alec Wright and Cumhur Erkut and Vesa Valimaki Pruning Deep Neural Network Models of Guitar Distortion Effects . . . . . . . 256--264 Fangkai Jiao and Yangyang Guo and Minlie Huang and Liqiang Nie Enhanced Multi-Domain Dialogue State Tracker With Second-Order Slot Interactions . . . . . . . . . . . . . . 265--276 Hui Tian and Yiqin Qiu and Wojciech Mazurczyk and Haizhou Li and Zhenxing Qian STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams . . . . . . . . . . . . . 277--289 Gopendra Vikram Singh and Mauajama Firdaus and Asif Ekbal and Pushpak Bhattacharyya EmoInt-Trans: a Multimodal Transformer for Identifying Emotions and Intents in Social Conversations . . . . . . . . . . 290--300 De De Hu and Huaiwen Zhang and Feilong Bao and Rui Wang Distributed Sampling Rate Offset Estimation Over Acoustic Sensor Networks Based on Asynchronous Network Newton Optimization . . . . . . . . . . . . . . 301--312 David Diaz-Guerra and Antonio Miguel and Jose R. Beltran Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs . . . . . 313--321 Peiming Guo and Shen Huang and Peijie Jiang and Yueheng Sun and Meishan Zhang and Min Zhang Curriculum-Style Fine-Grained Adaption for Unsupervised Cross-Lingual Dependency Transfer . . . . . . . . . . 322--332 Naveen Kumar Desiraju and Simon Doclo and Markus Buck and Tobias Wolff Joint Online Estimation of Early and Late Residual Echo PSD for Residual Echo Suppression . . . . . . . . . . . . . . 333--344 Guangzhi Sun and Chao Zhang and Philip C. Woodland Minimising Biasing Word Errors for Contextual ASR With the Tree-Constrained Pointer Generator . . . . . . . . . . . 345--354 Jonah Casebeer and Nicholas J. Bryan and Paris Smaragdis Meta-AF: Meta-Learning for Adaptive Filters . . . . . . . . . . . . . . . . 355--370 Yingwen Fu and Nankai Lin and Boyu Chen and Ziyu Yang and Shengyi Jiang Cross-Lingual Named Entity Recognition for Heterogeneous Languages . . . . 
. . 371--382 Jun-You Wang and Jyh-Shing Roger Jang Training a Singing Transcription Model Using Connectionist Temporal Classification Loss and Cross-Entropy Loss . . . . . . . . . . . . . . . . . . 383--396 Zhong-Qiu Wang and Gordon Wichern and Shinji Watanabe and Jonathan Le Roux STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency . . . 397--410 Yu Li and Bojie Hu and Jian Liu and Yufeng Chen and Jinan Xu A Neighborhood Re-Ranking Model With Relation Constraint for Knowledge Graph Completion . . . . . . . . . . . . . . . 411--425 Alessio Miaschi and Dominique Brunato and Felice Dell'Orletta and Giulia Venturi On Robustness and Sensitivity of a Neural Language Model: a Case Study on Italian L1 Learner Errors . . . . . . . 426--438 Rong Xiao and Yu Wan and Baosong Yang and Haibo Zhang and Huajin Tang and Derek F. Wong and Boxing Chen Towards Energy-Preserving Natural Language Understanding With Spiking Neural Networks . . . . . . . . . . . . 439--447 Juan Zhao and Tianrui Zong and Yong Xiang and Longxiang Gao and Guang Hua and Keshav Sood and Yushu Zhang SSVS-SSVD Based Desynchronization Attacks Resilient Watermarking Method for Stereo Signals . . . . . . . . . . . 448--461 Qiquan Zhang and Xinyuan Qian and Zhaoheng Ni and Aaron Nicolson and Eliathamby Ambikairajah and Haizhou Li A Time-Frequency Attention Module for Neural Speech Enhancement . . . . . . . 462--475 Binhong Xie and Yu Li and Hongyan Zhao and Lihu Pan and Enhui Wang A Cross-Attention Fusion Based Graph Convolution Auto-Encoder for Open Relation Extraction . . . . . . . . . . 476--485 Qian-Bei Hong and Chung-Hsien Wu and Hsin-Min Wang Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification . . . . . . . . . . . . . . 486--499 Xinglin Lyu and Junhui Li and Min Zhang and Chenchen Ding and Hideki Tanaka and Masao Utiyama Refining History for Future-Aware Neural Machine Translation . . . . . . . . . . 
500--512 Mou Wang and Junqi Chen and Xiao-Lei Zhang and Susanto Rahardja End-to-End Multi-Modal Speech Recognition on an Air and Bone Conducted Speech Corpus . . . . . . . . . . . . . 513--524 Asier López Zorrilla and María Inés Torres and Heriberto Cuayáhuitl Audio Embedding-Aware Dialogue Policy Learning . . . . . . . . . . . . . . . . 525--538 Xichen Shang and Chuxin Chen and Zipeng Chen and Qianli Ma Modularized Mutuality Network for Emotion-Cause Pair Extraction . . . . . 539--549 Xinyuan Qian and Zhengdong Wang and Jiadong Wang and Guohui Guan and Haizhou Li Audio-Visual Cross-Attention Network for Robotic Speaker Tracking . . . . . . . . 550--562 Kristina Tesch and Timo Gerkmann Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement . . . . . . . . . . . . . . 563--575 Thilo von Neumann and Keisuke Kinoshita and Christoph Boeddeker and Marc Delcroix and Reinhold Haeb-Umbach Segment-Less Continuous Speech Separation of Meetings: Training and Evaluation Criteria . . . . . . . . . . 576--589 Davide Albertini and Alberto Bernardini and Federico Borra and Fabio Antonacci and Augusto Sarti Two-Stage Beamforming With Arbitrary Planar Arrays of Differential Microphone Array Units . . . . . . . . . . . . . . 590--602 Yi-Syuan Chen and Yun-Zhu Song and Hong-Han Shuai SPEC: Summary Preference Decomposition for Low-Resource Abstractive Summarization . . . . . . . . . . . . . 603--618 Yingying Xiao and Shanmou Chen and Qiangqiang Zhang and Dongyuan Lin and Minglin Shen and Junhui Qian and Shiyuan Wang Generalized Hyperbolic Tangent Based Random Fourier Conjugate Gradient Filter for Nonlinear Active Noise Control . . . 619--632 Jun Qi and Chao-Han Huck Yang and Pin-Yu Chen and Javier Tejedor Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing . . . . . . . . . . . 
633--642 Bin Gu and Wu Guo and Jie Zhang Memory Storable Network Based Feature Aggregation for Speaker Representation Learning . . . . . . . . . . . . . . . . 643--655 Takumi Abe and Shoichi Koyama and Natsuki Ueno and Hiroshi Saruwatari Amplitude Matching for Multizone Sound Field Control . . . . . . . . . . . . . 656--669 Mahdi Barhoush and Ahmed Hallawa and Arne Peine and Lukas Martin and Anke Schmeink Localization-Driven Speech Enhancement in Noisy Multi-Speaker Hospital Environments Using Deep Learning and Meta Learning . . . . . . . . . . . . . 670--683 Herman Kamper Word Segmentation on Discovered Phone Units With Dynamic Programming and Self-Supervised Scoring . . . . . . . . 684--694 Changheng Li and Jorge Martinez and Richard Christian Hendriks Joint Maximum Likelihood Estimation of Microphone Array Parameters for a Reverberant Single Source Scenario . . . 695--705 Shota Horiguchi and Shinji Watanabe and Paola García and Yuki Takashima and Yohei Kawaguchi Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors . . . . . . . . . . . . 706--720 Ling He and Jia Fu and Yuanyuan Li and Xi Xiong and Jing Zhang WNSA-Net: an Axial-Attention-Based Network for Schizophrenia Detection Using Wideband and Narrowband Spectrograms . . . . . . . . . . . . . . 721--733 Anusha Prakash and Hema A. Murthy Exploring the Role of Language Families for Building Indic Speech Synthesisers 734--747 Mahdin Rohmatillah and Jen-Tzung Chien Hierarchical Reinforcement Learning With Guidance for Multi-Domain Dialogue Policy . . . . . . . . . . . . . . . . . 748--761 Shahram Ghorbani and John H. L. Hansen Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech . . . . . . 762--774 Weidong Chen and Xiaofen Xing and Xiangmin Xu and Jianxin Pang and Lan Du SpeechFormer++: a Hierarchical Efficient Framework for Paralinguistic Speech Processing . . . . . . . . . . . . . . . 
775--788 Nicki Holighaus and Günther Koliander and Clara Hollomey and Friedrich Pillichshammer Grid-Based Decimation for Wavelet Transforms With Stably Invertible Implementation . . . . . . . . . . . . . 789--801 Weiwei Lin and Man-Wai Mak Robust Speaker Verification Using Deep Weight Space Ensemble . . . . . . . . . 802--812 Lin Zhang and Xin Wang and Erica Cooper and Nicholas Evans and Junichi Yamagishi The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance . . . . . . . . . . . . . . 813--825 Jie Mei and Yufan Wang and Xinhui Tu and Ming Dong and Tingting He Incorporating BERT With Probability-Aware Gate for Spoken Language Understanding . . . . . . . . . 826--834 Tsubasa Ochiai and Marc Delcroix and Tomohiro Nakatani and Shoko Araki Mask-Based Neural Beamforming for Moving Speakers With Self-Attention-Based Tracking . . . . . . . . . . . . . . . . 835--848 Rongzhi Gu and Shi-Xiong Zhang and Yuexian Zou and Dong Yu Towards Unified All-Neural Beamforming for Time and Frequency Domain Speech Separation . . . . . . . . . . . . . . . 849--862 Naotake Masuda and Daisuke Saito Improving Semi-Supervised Differentiable Synthesizer Sound Matching for Practical Applications . . . . . . . . . . . . . . 863--875 Erfan Loweimi and Zhengjun Yue and Peter Bell and Steve Renals and Zoran Cvetkovic Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform . . . . . . . . . . . 876--890 Bengt J. Borgström A Generative Approach to Condition-Aware Score Calibration for Speaker Verification . . . . . . . . . . . . . . 891--901 Irene Martín-Morató and Annamaria Mesaros Strong Labeling of Sound Events Using Crowdsourced Weak Labels and Annotator Competence Estimation . . . . . . . . . 902--914 Wenzhao Zhu and Lei Luo and Jinwei Sun and Mads Græsbøll Christensen A New Virtual Tracking Sub-Algorithm Based Hybrid Active Control System for Narrowband Noise With Impulsive Interference . . . . 
. . . . . . . . . . 915--926 Thomas Deppisch and Sebastià V. Amengual Garí and Paul Calamia and Jens Ahrens Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses . . . . . . . . . . . . . . . 927--942 Eloi Moliner and Vesa Välimäki BEHM-GAN: Bandwidth Extension of Historical Music Using Generative Adversarial Networks . . . . . . . . . . 943--956 Martin Jälmby and Filip Elvander and Toon van Waterschoot Low-Rank Room Impulse Response Estimation . . . . . . . . . . . . . . . 957--969 Hong Liu and Yucheng Cai and Zhenru Lin and Zhijian Ou and Yi Huang and Junlan Feng Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems . . . . . . . . . . . . . . . . 970--984 De Hu and Qintuya Si and Rui Liu and Feilong Bao Distributed Sensor Selection for Speech Enhancement With Acoustic Sensor Networks . . . . . . . . . . . . . . . . 985--999 Yingke Zhu and Brian Mak Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification . . . . . . . . . . . . . . 1000--1012 Yuying Li and Yuchen Liu and Donald S. Williamson A Composite T60 Regression and Classification Approach for Speech Dereverberation . . . . . . . . . . . . 1013--1023 Hanyi Zhang and Longbiao Wang and Kong Aik Lee and Meng Liu and Jianwu Dang and Helen Meng Meta-Generalization for Domain-Invariant Speaker Verification . . . . . . . . . . 1024--1036 Shu-Tong Niu and Jun Du and Lei Sun and Yu Hu and Chin-Hui Lee QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization 1037--1049 Boyang Lyu and Chunxiao Fan and Yue Ming and Panzi Zhao and Nannan Hu En-HACN: Enhancing Hybrid Architecture With Fast Attention and Capsule Network for End-to-end Speech Recognition . . . 1050--1062 Yang Liu and Haoqin Sun and Wenbo Guan and Yuqi Xia and Yongwei Li and Masashi Unoki and Zhen Zhao A Discriminative Feature Representation Method Based on Cascaded Attention Network With Adversarial Strategy for Speech Emotion Recognition . . . . . . . 
1063--1074 Hao Zhang and Nianwen Si and Yaqi Chen and Wenlin Zhang and Xukui Yang and Dan Qu and Wei-Qiang Zhang Improving Speech Translation by Cross-Modal Multi-Grained Contrastive Learning . . . . . . . . . . . . . . . . 1075--1086 Wei-Cheng Lin and Carlos Busso Sequential Modeling by Leveraging Non-Uniform Distribution of Speech Emotion . . . . . . . . . . . . . . . . 1087--1099 Achyut Mani Tripathi and Om Jee Pandey Divide and Distill: New Outlooks on Knowledge Distillation for Environmental Sound Classification . . . . . . . . . . 1100--1113 Hao Zhang and Ashutosh Pandey and DeLiang Wang Low-Latency Active Noise Control Using Attentive Recurrent Network . . . . . . 1114--1123 Avital Bross and Sharon Gannot Training-Based Multiple Source Tracking Using Manifold-Learning and Recursive Expectation-Maximization . . . . . . . . 1124--1140 Guimin Hu and Yi Zhao and Guangming Lu Emotion Prediction Oriented Method With Multiple Supervisions for Emotion-Cause Pair Extraction . . . . . . . . . . . . 1141--1152 Reza Mohsenipour and Daniel Massicotte and Wei-Ping Zhu PI Control of Loudspeakers Based on Linear Fractional Order Model . . . . . 1153--1162 Tim Lübeck and Johannes M. Arend and Christoph Pörschmann Spatial Upsampling of Sparse Spherical Microphone Array Signals . . . . . . . . 1163--1174 Jiajun Deng and Xurong Xie and Tianzi Wang and Mingyu Cui and Boyang Xue and Zengrui Jin and Guinan Li and Shujie Hu and Xunying Liu Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems . . . . . . . . . . 1175--1190 Hongsheng Zhang and Jizhang Gan and Ting Liu and Kui Huang and Hong Yang Coefficients-Switched Normalized Least-Mean-Squares Adaption in Echo Canceler of Sparse-Echo-Path . . . . . . 1191--1199 Eric Guizzo and Tillman Weyde and Simone Scardapane and Danilo Comminiello Learning Speech Emotion Representations in the Quaternion Domain . . . . . . . . 
1200--1212 Jiaqi Bai and Ze Yang and Jian Yang and Hongcheng Guo and Zhoujun Li KINet: Incorporating Relevant Facts Into Knowledge-Grounded Dialog Generation . . 1213--1222 Haiquan Zhao and Yuan Gao and Yingying Zhu Robust Subband Adaptive Filter Algorithms-Based Mixture Correntropy and Application to Acoustic Echo Cancellation . . . . . . . . . . . . . . 1223--1233 Chen Zhang and Luis Fernando D'Haro and Qiquan Zhang and Thomas Friedrichs and Haizhou Li PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment . . . . . 1234--1250 Qing Wang and Jun Du and Hua-Xin Wu and Jia Pan and Feng Ma and Chin-Hui Lee A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection . . . . . . . . . . . . . 1251--1264 Yingwen Fu and Nankai Lin and Xiaohui Yu and Shengyi Jiang Self-Training With Double Selectors for Low-Resource Named Entity Recognition 1265--1275 Kilian Schulze-Forster and Gaël Richard and Liam Kelley and Clement S. J. Doire and Roland Badeau Unsupervised Music Source Separation Using Differentiable Parametric Source Models . . . . . . . . . . . . . . . . . 1276--1289 Yinggang Liu and Hong Fu and Ying Wei and Hanbing Zhang Sound Event Classification Based on Frequency-Energy Feature Representation and Two-Stage Data Dimension Reduction 1290--1304 Ege Erdem and Zoran Cvetković and Hüseyin Hacıhabiboğlu 3D Perceptual Soundfield Reconstruction via Virtual Microphone Synthesis . . . . . . . . . . . . . . . 1305--1317 Dongyuan Shi and Woon-Seng Gan and Bhan Lam and Xiaoyi Shen A Frequency-Domain Output-Constrained Active Noise Control Algorithm Based on an Intuitive Circulant Convolutional Penalty Factor . . . . . . . . . . . . . 1318--1332 Muhammed Zahid Ozturk and Chenshu Wu and Beibei Wang and Min Wu and K. J. Ray Liu RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System . . . 
1333--1347 Jianwei Zhang and Julie Liss and Suren Jayasuriya and Visar Berisha Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection . . . . . 1348--1359 Ashutosh Pandey and DeLiang Wang Attentive Training: a New Training Framework for Speech Enhancement . . . . 1360--1370 Hirofumi Inaguma and Tatsuya Kawahara Alignment Knowledge Distillation for Online Streaming Attention-Based Speech Recognition . . . . . . . . . . . . . . 1371--1385 Mittapalle Kiran Reddy and Paavo Alku Exemplar-Based Sparse Representations for Detection of Parkinson's Disease From Speech . . . . . . . . . . . . . . 1386--1396 Shunsuke Kita and Yoshinobu Kajikawa Sound Source Localization Inside a Structure Under Semi-Supervised Conditions . . . . . . . . . . . . . . . 1397--1408 Guowei Wu and Shipei Liu and Xiaoya Fan The Power of Fragmentation: a Hierarchical Transformer Model for Structural Segmentation in Symbolic Music Generation . . . . . . . . . . . . 1409--1420 Xueqin Luo and Gongping Huang and Jilu Jin and Jingdong Chen and Jacob Benesty and Wen Zhang and Mengyao Zhu and Chunjian Li Design of Maximum Directivity Beamformers With Linear Acoustic Vector Sensor Arrays . . . . . . . . . . . . . 1421--1435 Ruchao Fan and Wei Chu and Peng Chang and Abeer Alwan A CTC Alignment-Based Non-Autoregressive Transformer for End-to-End Automatic Speech Recognition . . . . . . . . . . . 1436--1448 Tianyou Li and Siyuan Lian and Sipei Zhao and Jing Lu and Ian S. Burnett Distributed Active Noise Control Based on an Augmented Diffusion FxLMS Algorithm . . . . . . . . . . . . . . . 1449--1463 Jiayuan Xie and Wenhao Fang and Qingbao Huang and Yi Cai and Tao Wang Enhancing Paraphrase Question Generation With Prior Knowledge . . . . . . . . . . 1464--1475 Chen Chen and Hansheng Hong and Jie Guo and Bin Song Inter-Intra Modal Representation Augmentation With Trimodal Collaborative Disentanglement Network for Multimodal Sentiment Analysis . . . . . . . . . . . 
1476--1488 Jian Yang and Yuwei Yin and Liqun Yang and Shuming Ma and Haoyang Huang and Dongdong Zhang and Furu Wei and Zhoujun Li GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation 1489--1498 Xin Wu and Yi Cai and Zetao Lian and Ho-fung Leung and Tao Wang Generating Natural Language From Logic Expressions With Structural Representation . . . . . . . . . . . . . 1499--1510 Yi Li and Yang Sun and Wenwu Wang and Syed Mohsen Naqvi U-Shaped Transformer With Frequency-Band Aware Attention for Speech Enhancement 1511--1521 Christian Antoñanzas and Miguel Ferrer and Maria de Diego and Alberto Gonzalez Remote Microphone Technique for Active Noise Control Over Distributed Networks 1522--1535 Yi Zhu and Abhishek Tiwari and João Monteiro and Shruti Kshirsagar and Tiago Henrique Falk COVID-19 Detection via Fusion of Modulation Spectrum and Linear Prediction Speech Features . . . . . . . 1536--1549 Jijie Li and Kai Shuang and Jinyu Guo and Zengyi Shi and Hongman Wang Enhancing Semantic Relation Classification With Shortest Dependency Path Reasoning . . . . . . . . . . . . . 1550--1560 Mao-Kui He and Jun Du and Qing-Feng Liu and Chin-Hui Lee ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding . . . . . . . . 1561--1573 Longting Xu and Jichen Yang and Chang Huai You and Xinyuan Qian and Daiyu Huang Device Features Based on Linear Transformation With Parallel Training Data for Replay Speech Detection . . . . 1574--1586 Huajian Fang and Dennis Becker and Stefan Wermter and Timo Gerkmann Integrating Uncertainty Into Neural Network-Based Speech Enhancement . . . . 1587--1600 Libo Qin and Xiao Xu and Lehan Wang and Yue Zhang and Wanxiang Che Modularized Pre-Training for End-to-End Task-Oriented Dialogue . . . . . . . . . 1601--1610 Hanlei Zhang and Hua Xu and Shaojie Zhao and Qianrui Zhou Learning Discriminative Representations and Decision Boundaries for Open Intent Detection . . . . . . . . . . . . . . . 
1611--1623 Guangsheng Bao and Yue Zhang A General Contextualized Rewriting Framework for Text Summarization . . . . 1624--1635 Christoph Kirsch and Stephan D. Ewert A Universal Filter Approximation of Edge Diffraction for Geometrical Acoustics 1636--1651 Peyman Goli and Steven van de Par Deep Learning-Based Speech Specific Source Localization by Using Binaural and Monaural Microphone Arrays in Hearing Aids . . . . . . . . . . . . . . 1652--1666 Nguyen Binh Thien and Yukoh Wakabayashi and Kenta Iwai and Takanobu Nishiura Inter-Frequency Phase Difference for Phase Reconstruction Using Deep Neural Networks and Maximum Likelihood . . . . 1667--1680 Srikanth Raj Chetupalli and Emanuël A. P. Habets Speaker Counting and Separation From Single-Channel Noisy Mixtures . . . . . 1681--1692 Guangyan Zhang and Ying Qin and Wenjie Zhang and Jialun Wu and Mei Li and Yutao Gai and Feijun Jiang and Tan Lee iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis Based on Disentanglement Between Prosody and Timbre . . . . . . . 1693--1705 Ruijie Tao and Kong Aik Lee and Rohan Kumar Das and Ville Hautamäki and Haizhou Li Self-Supervised Training of Speaker Encoder With Multi-Modal Diverse Positive Pairs . . . . . . . . . . . . . 1706--1719 Dongchao Yang and Jianwei Yu and Helin Wang and Wen Wang and Chao Weng and Yuexian Zou and Dong Yu Diffsound: Discrete Diffusion Model for Text-to-Sound Generation . . . . . . . . 1720--1733 Paul Konstantin Krug and Peter Birkholz and Branislav Gerazov and Daniel Rudolph van Niekerk and Anqi Xu and Yi Xu Artificial Vocal Learning Guided by Phoneme Recognition and Visual Information . . . . . . . . . . . . . . 1734--1744 Qian-Bei Hong and Chung-Hsien Wu and Hsin-Min Wang Decomposition and Reorganization of Phonetic Information for Speaker Embedding Learning . . . . . . . . . . . 1745--1757 Wenbin Jiang and Kai Yu Speech Enhancement With Integration of Neural Homomorphic Synthesis and Spectral Masking . . . . . . . . . . 
. . 1758--1770 Shu'ang Li and Xuming Hu and Li Lin and Aiwei Liu and Lijie Wen and Philip S. Yu A Multi-Level Supervised Contrastive Learning Framework for Low-Resource Natural Language Inference . . . . . . . 1771--1783 Xiaoqing Zheng Building Conventional ``Experts'' With a Dialogue Logic Programming Language . . 1784--1796 Haitao Lin and Junnan Zhu and Lu Xiang and Feifei Zhai and Yu Zhou and Jiajun Zhang and Chengqing Zong Topic-Oriented Dialogue Summarization 1797--1810 Haohan Guo and Fenglong Xie and Xixin Wu and Frank K. Soong and Helen Meng MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS . . . . . . . . 1811--1824 Bei Liu and Zhengyang Chen and Yanmin Qian Depth-First Neural Architecture With Attentive Feature Fusion for Efficient Speaker Verification . . . . . . . . . . 1825--1838 Ria Ghosh and John H. L. Hansen Bilateral Cochlear Implant Processing of Coding Strategies With CCi-MOBILE, an Open-Source Research Platform . . . . . 1839--1850 Aolong Zhou and Wen Zhang and Guojun Xu and Xiaoyong Li and Kefeng Deng and Junqiang Song DBSA-Net: Dual Branch Self-Attention Network for Underwater Acoustic Signal Denoising . . . . . . . . . . . . . . . 1851--1865 Weiwei Lin and Man-Wai Mak Model-Agnostic Meta-Learning for Fast Text-Dependent Speaker Embedding Adaptation . . . . . . . . . . . . . . . 1866--1876 Andrea Galassi and Marco Lippi and Paolo Torroni Multi-Task Attentive Residual Networks for Argument Mining . . . . . . . . . . 1877--1892 Yi Luo and Jianwei Yu Music Source Separation With Band-Split RNN . . . . . . . . . . . . . . . . . . 1893--1901 Keisuke Matsubara and Takuma Okamoto and Ryoichi Takashima and Tetsuya Takiguchi and Tomoki Toda and Hisashi Kawai Harmonic-Net: Fundamental Frequency and Speech Rate Controllable Fast Neural Vocoder . . . . . . . . . . . . . . . . 1902--1915 Yi Zhou and Zhizheng Wu and Xiaohai Tian and Haizhou Li Optimization of Cross-Lingual Voice Conversion With Linguistics Losses to Reduce Foreign Accents . . . . 
. . . . . 1916--1926 Qiu-Shi Zhu and Jie Zhang and Zi-Qiang Zhang and Li-Rong Dai A Joint Speech Enhancement and Self-Supervised Representation Learning Framework for Noise-Robust Speech Recognition . . . . . . . . . . . . . . 1927--1939 Siqi Sun and Korin Richmond and Hao Tang Improving Seq2Seq TTS Frontends With Transcribed Speech Audio . . . . . . . . 1940--1952 Shih-Lun Wu and Yi-Hsuan Yang MuseMorphose: Full-Song and Fine-Grained Piano Music Style Transfer With One Transformer VAE . . . . . . . . . . . . 1953--1967 Xiaoxue Gao and Chitralekha Gupta and Haizhou Li PoLyScriber: Integrated Fine-Tuning of Extractor and Lyrics Transcriber for Polyphonic Music . . . . . . . . . . . . 1968--1981 Zhicheng Lian and Haonan Cheng and Jiawan Zhang PQG-A2SA: Performance Quantification Guided Audio-to-Score Alignment for Orchestral Music . . . . . . . . . . . . 1982--1992 Jingen Ni and Ningning Zhang and Haofen Li Sparsity-Promoting Affine Projection Algorithm With Periodically-Updated Gain Matrix and Its Performance Analysis . . 1993--2003 Orchisama Das and Sebastian J. Schlecht and Enzo De Sena Grouped Feedback Delay Networks With Frequency-Dependent Coupling . . . . . . 2004--2015 Xudong Zhao and Gongping Huang and Jingdong Chen and Jacob Benesty Design of $2$D and $3$D Differential Microphone Arrays With a Multistage Framework . . . . . . . . . . . . . . . 2016--2031 Jia-Hao Hsu and Jeremy Chang and Min-Hsueh Kuo and Chung-Hsien Wu Empathetic Response Generation Based on Plug-and-Play Mechanism With Empathy Perturbation . . . . . . . . . . . . . . 2032--2042 Aditya Dutt and Paul Gader Wavelet Multiresolution Analysis Based Speech Emotion Recognition System Using $1$D CNN LSTM Networks . . . . . . . . . 2043--2054 Arturo Morales and Juan I. Yuz and Juan P. Cortés and Javier G. Fontanet and Matías Zañartu Glottal Airflow Estimation Using Neck Surface Acceleration and Low-Order Kalman Smoothing . . . . . . . . . . . . 
2055--2066 Yuya Hosoda and Arata Kawamura and Youji Iiguni Complex-Domain Pitch Estimation Algorithm for Narrowband Speech Signals 2067--2078 Zhidong Liu and Junhui Li and Muhua Zhu Alleviating Exposure Bias for Neural Machine Translation via Contextual Augmentation and Self Distillation . . . 2079--2089 Hanan Beit-On and Tom Shlomo and Boaz Rafaely Weighted Frequency Smoothing for Enhanced Speaker Localization . . . . . 2090--2099 Shan Gao and Xihong Wu and Tianshu Qu A Physical Model-Based Self-Supervised Learning Method for Signal Enhancement Under Reverberant Environment . . . . . 2100--2110 Xue Jiang and Xiulian Peng and Huaying Xue and Yuan Zhang and Yan Lu Latent-Domain Predictive Neural Speech Coding . . . . . . . . . . . . . . . . . 2111--2123 Shumin Deng and Jiacheng Yang and Hongbin Ye and Chuanqi Tan and Mosha Chen and Songfang Huang and Fei Huang and Huajun Chen and Ningyu Zhang LOGEN: Few-Shot Logical Knowledge-Conditioned Text Generation With Self-Training . . . . . . . . . . . 2124--2133 Yuanzhi Liu and Min He and Qingqing Yang and Gwanggil Jeon An Unsupervised Framework With Attention Mechanism and Embedding Perturbed Encoder for Non-Parallel Text Sentiment Style Transfer . . . . . . . . . . . . . 2134--2144 Yang Ai and Zhen-Hua Ling APNet: an All-Frame-Level Neural Vocoder Incorporating Direct Prediction of Amplitude and Phase Spectra . . . . . . 2145--2157 Fei Zhao and Zhen Wu and Liang He and Xin-Yu Dai Label-Correction Capsule Network for Hierarchical Text Classification . . . . 2158--2168 Cem Subakan and Mirco Ravanelli and Samuele Cornell and François Grondin and Mirko Bronzi Exploring Self-Attention Mechanisms for Speech Separation . . . . . . . . . . . 2169--2180 Chenggang Zhang and Jinjiang Liu and Hao Li and Xueliang Zhang Neural Multi-Channel and Multi-Microphone Acoustic Echo Cancellation . . . . . . . . . . . . . . 
2181--2192 Zheng Liu and Xin Kang and Fuji Ren Dual-TBNet: Improving the Robustness of Speech Features via Dual-Transformer-BiLSTM for Speech Emotion Recognition . . . . . . . . . . 2193--2203 Sandro Cumani and Salvatore Sarni The Distributions of Uncalibrated Speaker Verification Scores: a Generative Model for Domain Mismatch and Trial-Dependent Calibration . . . . . . 2204--2219 Xi Ai and Bin Fang Cross-Modal Language Modeling in Multi-Motion-Informed Context for Lip Reading . . . . . . . . . . . . . . . . 2220--2232 Andreas Jonas Fuglsig and Jesper Jensen and Zheng-Hua Tan and Lars Søndergaard Bertelsen and Jens Christian Lindof and Jan Østergaard Minimum Processing Near-End Listening Enhancement . . . . . . . . . . . . . . 2233--2245 Zhiwen Xie and Runjie Zhu and Jin Liu and Guangyou Zhou and Jimmy Xiangji Huang TARGAT: a Time-Aware Relational Graph Attention Model for Temporal Knowledge Graph Embedding . . . . . . . . . . . . 2246--2258 Cuilian Zhang and Derek F. Wong and Eddy S. K. Lei and Runzhe Zhan and Lidia S. Chao Obscurity-Quantified Curriculum Learning for Machine Translation Evaluation . . . 2259--2271 Yaxin Liu and Yan Zhou and Ziming Li and Junlin Wang and Wei Zhou and Songlin Hu HIM: an End-to-End Hierarchical Interaction Model for Aspect Sentiment Triplet Extraction . . . . . . . . . . . 2272--2285 Yukoh Wakabayashi and Kouei Yamaoka and Nobutaka Ono Sound Field Interpolation for Rotation-Invariant Multichannel Array Signal Processing . . . . . . . . . . . 2286--2298 Jesper Kjær Nielsen and Mads Græsbøll Christensen and Jesper Bünsow Boldt An Analysis of Traditional Noise Power Spectral Density Estimators Based on the Gaussian Stochastic Volatility Model . . 
2299--2313 Karen Gissell Rosero Jacome and Felipe Leonel Grijalva and Bruno Sanches Masiero Sound Events Localization and Detection Using Bio-Inspired Gammatone Filters and Temporal Convolutional Neural Networks 2314--2324 Lin Yuan and Guoheng Huang and Fenghuan Li and Xiaochen Yuan and Chi-Man Pun and Guo Zhong RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition . . . . . . . . . . . . . . 2325--2337 Samuel Poirot and Stefan Bilbao and Mitsuko Aramaki and Sølvi Ystad and Richard Kronland-Martinet A Perceptually Evaluated Signal Model: Collisions Between a Vibrating Object and an Obstacle . . . . . . . . . . . . 2338--2350 Julius Richter and Simon Welker and Jean-Marie Lemercier and Bunlong Lay and Timo Gerkmann Speech Enhancement and Dereverberation With Diffusion-Based Generative Models 2351--2364 Siarhei Y. Barysenka and Vasili I. Vorobiov SNR-Based Inter-Component Phase Estimation Using Bi-Phase Prior Statistics for Single-Channel Speech Enhancement . . . . . . . . . . . . . . 2365--2381 Jiandian Zeng and Jiantao Zhou and Caishi Huang Exploring Semantic Relations for Social Media Sentiment Analysis . . . . . . . . 2382--2394 Fotios Drakopoulos and Sarah Verhulst A Neural-Network Framework for the Design of Individualised Hearing-Loss Compensation . . . . . . . . . . . . . . 2395--2409 Xinbei Ma and Zhuosheng Zhang and Hai Zhao Enhanced Speaker-Aware Multi-Party Multi-Turn Dialogue Comprehension . . . 2410--2423 Tianrui Wang and Weibin Zhu and Yingying Gao and Shilei Zhang and Junlan Feng Harmonic Attention for Monaural Speech Enhancement . . . . . . . . . . . . . . 2424--2436 Lei Lei and Guoshun Yuan and Hongjiang Yu and Dewei Kong and Yuefeng He Multilingual Customized Keyword Spotting Using Similar-Pair Contrastive Learning 2437--2447 Shaokai Li and Peng Song and Wenming Zheng Multi-Source Discriminant Subspace Alignment for Cross-Domain Speech Emotion Recognition . . . . . . . . . . 
2448--2460 Yeqing Ren and Haipeng Peng and Lixiang Li and Xiaopeng Xue and Yang Lan and Yixian Yang Generalized Voice Spoofing Detection via Integral Knowledge Amalgamation . . . . 2461--2475 Xing Chen and Jie Wang and Xiao-Lei Zhang and Wei-Qiang Zhang and Kunde Yang LMD: a Learnable Mask Network to Detect Adversarial Examples for Speaker Verification . . . . . . . . . . . . . . 2476--2490 Benjamin Yen and Yameizhen Li and Yusuke Hioka Rotor Noise-Aware Noise Covariance Matrix Estimation for Unmanned Aerial Vehicle Audition . . . . . . . . . . . . 2491--2506 Xuechen Liu and Xin Wang and Md Sahidullah and Jose Patino and Héctor Delgado and Tomi Kinnunen and Massimiliano Todisco and Junichi Yamagishi and Nicholas Evans and Andreas Nautsch and Kong Aik Lee ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild 2507--2522 Zalán Borsos and Raphaël Marinier and Damien Vincent and Eugene Kharitonov and Olivier Pietquin and Matt Sharifi and Dominik Roblek and Olivier Teboul and David Grangier and Marco Tagliasacchi and Neil Zeghidour AudioLM: a Language Modeling Approach to Audio Generation . . . . . . . . . . . . 2523--2533 Xingfeng Li and Xiaohan Shi and Desheng Hu and Yongwei Li and Qingchen Zhang and Zhengxia Wang and Masashi Unoki and Masato Akagi Music Theory-Inspired Acoustic Representation for Speech Emotion Recognition . . . . . . . . . . . . . . 2534--2547 Jiachen Lian and Chunlei Zhang and Gopala K. Anumanchipalli and Dong Yu Unsupervised TTS Acoustic Modeling for TTS With Conditional Disentangled Sequential VAE . . . . . . . . . . . . . 2548--2557 Arsalan Malik and Nipun Agarwal and Harshavardhan Settibhaktini and Ananthakrishna Chintanpalli Predicting Level-Dependent Changes in Concurrent Vowel Scores Using the $2$D-CNN Models . . . . . . . . . . . . 2558--2566 Michael Krause and Meinard Müller Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings . . . . . . 
2567--2578 Julie Meyer and Sebastian Prepeliță and Ali Khajeh-Saeed and Michael Smirnov and Pablo Hoffmann Verification on Head-Related Transfer Functions of a Snowman Model Simulated Using the Finite-Difference Time-Domain Method . . . . . . . . . . . . . . . . . 2579--2591 Darius Petermann and Gordon Wichern and Aswin Shanmugam Subramanian and Zhong-Qiu Wang and Jonathan Le Roux Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks . . . . . . . . . 2592--2605 Hailong Cao and Liguo Li and Conghui Zhu and Muyun Yang and Tiejun Zhao Dual Word Embedding for Robust Unsupervised Bilingual Lexicon Induction 2606--2615 Lin Xiao and Pengyu Xu and Mingyang Song and Huafeng Liu and Liping Jing and Xiangliang Zhang Triple Alliance Prototype Orthotist Network for Long-Tailed Multi-Label Text Classification . . . . . . . . . . . . . 2616--2628 Juhua Liu and Qihuang Zhong and Liang Ding and Hua Jin and Bo Du and Dacheng Tao Unified Instance and Knowledge Alignment Pretraining for Aspect-Based Sentiment Analysis . . . . . . . . . . . . . . . . 2629--2642 Yiming Zhang and Hong Yu and Ruoyi Du and Zheng-Hua Tan and Wenwu Wang and Zhanyu Ma and Yuan Dong ACTUAL: Audio Captioning With Caption Feature Space Regularization . . . . . . 2643--2657 Jakob Abeßer and Sascha Grollmisch and Meinard Müller How Robust are Audio Embeddings for Polyphonic Sound Event Tagging? . . . . 2658--2667 Wei Xia and John H. L. Hansen Attention and DCT Based Global Context Modeling for Text-Independent Speaker Recognition . . . . . . . . . . . . . . 2668--2679 Takuya Hasumi and Tomohiko Nakamura and Norihiro Takamune and Hiroshi Saruwatari and Daichi Kitamura and Yu Takahashi and Kazunobu Kondo PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation . . 
2680--2694 Ben Liu and Jun Wang and Guanyuan Yu and Shaolei Chen CUPVC: a Constraint-Based Unsupervised Prosody Transfer for Improving Telephone Banking Services . . . . . . . . . . . . 2695--2706 Guinan Li and Jiajun Deng and Mengzhe Geng and Zengrui Jin and Tianzi Wang and Shujie Hu and Mingyu Cui and Helen Meng and Xunying Liu Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition . . . . . . . . . . . . . . 2707--2723 Jean-Marie Lemercier and Julius Richter and Simon Welker and Timo Gerkmann StoRM: a Diffusion-Based Stochastic Regeneration Model for Speech Enhancement and Dereverberation . . . . 2724--2737 Yen-Ju Lu and Chia-Yu Chang and Cheng Yu and Ching-Feng Liu and Jeih-weih Hung and Shinji Watanabe and Yu Tsao Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information . . . . . . . . . . . 2738--2750 Sungjae Kim and Yewon Kim and Jewoo Jun and Injung Kim MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer That Controls Emotional Intensity . . . . . . . . . . 2751--2764 Xinxin Su and Zhen Huang and Yunxiang Zhao and Yifan Chen and Yong Dou and Hengyue Pan Recent Trends in Deep Learning Based Textual Emotion Cause Extraction . . . . 2765--2786 Junyu Lu and Hongfei Lin and Xiaokun Zhang and Zhaoqing Li and Tongyue Zhang and Linlin Zong and Fenglong Ma and Bo Xu Hate Speech Detection via Dual Contrastive Learning . . . . . . . . . . 2787--2795 Diego Marques do Carmo and Ricardo A. Borsoi and Márcio Holsbach Costa Closed-Form Solution to the Multichannel Wiener Filter With Interaural Level Difference Preservation . . . . . . . . 2796--2811 Ya-Jie Zhang and Chao Zhang and Wei Song and Zhengchen Zhang and Youzheng Wu and Xiaodong He Prosody Modelling With Pre-Trained Cross-Utterance Representations for Improved Speech Synthesis . . . . . . . 2812--2823 Ching-Yu Chiu and Meinard Müller and Matthew E. P. 
Davies and Alvin Wen-Yu Su and Yi-Hsuan Yang Local Periodicity-Based Beat Tracking for Expressive Classical Piano Music . . 2824--2835 Feng Chen and Ke Ma and Yapeng Mao and Desen Yang and Yi Zhang and Jie Shi and Shiqi Mo and Gui Chenyang and Song Li A Novel Method to Design Steerable Differential Beamformer Using Linear Acoustics Vector Sensor Array . . . . . 2836--2849 Tianyu Huang and Weisheng Dong and Fangfang Wu and Xin Li and Guangming Shi Uncertainty-Driven Knowledge Distillation for Language Model Compression . . . . . . . . . . . . . . 2850--2858 Andrés Carofilis and Enrique Alegre and Eduardo Fidalgo and Laura Fernández-Robles Improvement of Accent Classification Models Through Grad-Transfer From Spectrograms and Gradient-Weighted Class Activation Mapping . . . . . . . . . . . 2859--2871 Jacob Hollebon and Filippo Maria Fazi Higher-Order Stereophony . . . . . . . . 2872--2885 Jeremy H. M. Wong and Huayun Zhang and Nancy F. Chen Modelling Inter-Rater Uncertainty in Spoken Language Assessment . . . . . . . 2886--2898 Qinghua Zheng and Yuefei Wu and Guangtao Wang and Yanping Chen and Wei Wu and Zai Zhang and Bin Shi and Bo Dong Exploring Interactive and Contrastive Relations for Nested Named Entity Recognition . . . . . . . . . . . . . . 2899--2909 Dongyuan Shi and Woon-Seng Gan and Bhan Lam and Zhengding Luo and Xiaoyi Shen Transferable Latent of CNN-Based Selective Fixed-Filter Active Noise Control . . . . . . . . . . . . . . . . 2910--2921 Dorian Desblancs and Vincent Lostanlen and Romain Hennequin Zero-Note Samba: Self-Supervised Beat Tracking . . . . . . . . . . . . . . . . 2922--2934 Nankai Lin and Yingwen Fu and Xiaotian Lin and Dong Zhou and Aimin Yang and Shengyi Jiang CL-XABSA: Contrastive Learning for Cross-Lingual Aspect-Based Sentiment Analysis . . . . . . . . . . . . . . . . 
2935--2946 Hanmeng Liu and Jian Liu and Leyang Cui and Zhiyang Teng and Nan Duan and Ming Zhou and Yue Zhang LogiQA 2.0 --- an Improved Dataset for Logical Reasoning in Natural Language Understanding . . . . . . . . . . . . . 2947--2962 Jiangyan Yi and Jianhua Tao and Ruibo Fu and Tao Wang and Chu Yuan Zhang and Chenglong Wang Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings . . . . . . 2963--2973 Ji Won Yoon and Hyung Yong Kim and Hyeonseung Lee and Sunghwan Ahn and Nam Soo Kim Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models . . . . . . . 2974--2987 Sufeng Duan and Hai Zhao and Dongdong Zhang Syntax-Aware Data Augmentation for Neural Machine Translation . . . . . . . 2988--2999 Tongzheng Liu and Zhihua Lu and João Paulo J. da Costa and Tai Fei A Hybrid Reverberation Model and Its Application to Joint Speech Dereverberation and Separation . . . . . 3000--3014 Junjun Guo and Junjie Ye and Yan Xiang and Zhengtao Yu Layer-Level Progressive Transformer With Modality Difference Awareness for Multi-Modal Neural Machine Translation 3015--3026 Qian Tao and Zhihao Xiong and Bocheng Han and Xiaoyang Fan and Lusi Li A Novel Unsupervised Approach for Cross-Lingual Word Alignment in Low Isomorphic Embedding Spaces . . . . . . 3027--3041 Jilu Jin and Jacob Benesty and Jingdong Chen and Gongping Huang Differential Beamforming From a Geometric Perspective . . . . . . . . . 3042--3054 Alberto Palomo-Alonso and David Casillas-Pérez and Silvia Jiménez-Fernández and Jose A. Portilla-Figueras and Sancho Salcedo-Sanz A Flexible Architecture Using Temporal, Spatial and Semantic Correlation-Based Algorithms for Story Segmentation of Broadcast News . . . . . . . . . . . . . 3055--3069 Bolaji Yusuf and Jan Černocký and Murat Saraçlar End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations . . . . . . . . . . . . 
3070--3080 Adrian Herzog and Srikanth Raj Chetupalli and Emanuël A. P. Habets AmbiSep: Joint Ambisonic-to-Ambisonic Speech Separation and Noise Reduction 3081--3094 Po-chun Hsu and Da-rong Liu and Andy T. Liu and Hung-yi Lee Parallel Synthesis for Autoregressive Speech Generation . . . . . . . . . . . 3095--3111 Siddharth Dalmia and Dmytro Okhonko and Mike Lewis and Sergey Edunov and Shinji Watanabe and Florian Metze and Luke Zettlemoyer and Abdelrahman Mohamed LegoNN: Building Modular Encoder-Decoder Models . . . . . . . . . . . . . . . . . 3112--3126 Tom Gajecki and Waldo Nogueira Deep Latent Fusion Layers for Binaural Speech Enhancement . . . . . . . . . . . 3127--3138 Huawen Feng and Zhenxi Lin and Qianli Ma Perturbation-Based Self-Supervised Attention for Attention Bias in Text Classification . . . . . . . . . . . . . 3139--3151 Jiaxin Zhong and Tao Zhuang and Mengtong Li and Ray Kirby and Mahmoud Karimi and Jing Lu and Dong Zhang Sidelobe Suppression for a Steerable Parametric Source Using the Sparse Random Array Technique . . . . . . . . . 3152--3161 Yan Fang and Wei Lu and Xiaodong Liu and Witold Pedrycz and Qi Lang and Jianhua Yang CircularE: a Complex Space Circular Correlation Relational Model for Link Prediction in Knowledge Graph Embedding 3162--3175 Jie Zhang and Rui Tao and Jun Du and Li-Rong Dai SDW-SWF: Speech Distortion Weighted Single-Channel Wiener Filter for Noise Reduction . . . . . . . . . . . . . . . 3176--3189 Haozhou Li and Qinke Peng and Xu Mou and Ying Wang and Zeyuan Zeng and Muhammad Fiaz Bashir Abstractive Financial News Summarization via Transformer-BiLSTM Encoder and Graph Attention-Based Decoder . . . . . . . . 3190--3205 Weitao Yuan and Shengbei Wang and Jianming Wang and Masashi Unoki and Wenwu Wang Unsupervised Deep Unfolded Representation Learning for Singing Voice Separation . . . . . . . . . . . . 
3206--3220 Zhong-Qiu Wang and Samuele Cornell and Shukjae Choi and Younglo Lee and Byeong-Yeol Kim and Shinji Watanabe TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation 3221--3236 Marvin Tammen and Simon Doclo Parameter Estimation Procedures for Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement . . 3237--3248 Yi Lin and Qingyang Wang and Xincheng Yu and Zichen Zhang and Dongyue Guo and Jizhe Zhou Towards Recognition for Radio-Echo Speech in Air Traffic Control: Dataset and a Contrastive Learning Approach . . 3249--3262 Diego Caviedes-Nozal and Efren Fernandez-Grande Spatio-Temporal Bayesian Regression for Room Impulse Response Reconstruction With Spherical Waves . . . . . . . . . . 3263--3277 Xinyu Hu and Xiaojun Wan RST Discourse Parsing as Text-to-Text Generation . . . . . . . . . . . . . . . 3278--3289 Shun Lei and Yixuan Zhou and Liyang Chen and Zhiyong Wu and Xixin Wu and Shiyin Kang and Helen Meng MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis . . . . 3290--3303 Pedro Izquierdo Lehmann and Rodrigo F. Cádiz and Carlos A. Sing Long Towards Maximizing a Perceptual Sweet Spot for Spatial Sound With Loudspeakers . . . . . . . . . . . . . . 3304--3319 Han Zhu and Dongji Gao and Gaofeng Cheng and Daniel Povey and Pengyuan Zhang and Yonghong Yan Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition . . . . . . . . . . . . . . 3320--3330 Junqing Zhang and Liming Shi and Mads Græsbøll Christensen and Wen Zhang and Lijun Zhang and Jingdong Chen CGMM-Based Sound Zone Generation Using Robust Pressure Matching With ATF Perturbation Constraints . . . . . . . . 3331--3345 Erfan Loweimi and Andrea Carmantini and Peter Bell and Steve Renals and Zoran Cvetkovic Phonetic Error Analysis Beyond Phone Error Rate . . . . . . . . . . . . . . . 
3346--3361 Runxuan Yang and Yuyang Peng and Xiaolin Hu A Fast High-Fidelity Source-Filter Vocoder With Lightweight Neural Modules 3362--3373 Yuxiang Zhang and Zhuo Li and Jingze Lu and Hua Hua and Wenchao Wang and Pengyuan Zhang The Impact of Silence on Speech Anti-Spoofing . . . . . . . . . . . . . 3374--3389 Philippe Gonzalez and Tommy Sonne Alstrøm and Tobias May Assessing the Generalization Gap of Learning-Based Speech Enhancement Systems in Noisy and Reverberant Environments . . . . . . . . . . . . . . 3390--3403 Ziyi Xu and Ziyue Zhao and Tim Fingscheidt Coded Speech Quality Measurement by a Non-Intrusive PESQ-DNN . . . . . . . . . 3404--3417 Tao Li and Chenxu Hu and Jian Cong and Xinfa Zhu and Jingbei Li and Qiao Tian and Yuping Wang and Lei Xie DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech --- a Study Between English and Mandarin . . . . . . . . . . 3418--3430 Xuexin Xu and Liang Shi and Xunquan Chen and Pingyuan Lin and Jie Lian and Jinhui Chen and Zhihong Zhang and Edwin R. Hancock Any-to-Any Voice Conversion With Multi-Layer Speaker Adaptation and Content Supervision . . . . . . . . . . 3431--3445 Chenpeng Du and Yiwei Guo and Xie Chen and Kai Yu Speaker Adaptive Text-to-Speech With Timbre-Normalized Vector-Quantized Feature . . . . . . . . . . . . . . . . 3446--3456 Yash Kumar Atri and Vikram Goyal and Tanmoy Chakraborty Multi-Document Summarization Using Selective Attention Span and Reinforcement Learning . . . . . . . . . 3457--3467 Maochun Huang and Chunmei Qing and Junpeng Tan and Xiangmin Xu Context-Based Adaptive Multimodal Fusion Network for Continuous Frame-Level Sentiment Prediction . . . . . . . . . . 3468--3477 Sebastian J. Schlecht and Jon Fagerström and Vesa Välimäki Decorrelation in Feedback Delay Networks 3478--3487 Jinliang Lu and Jiajun Zhang Towards Unified Multi-Domain Machine Translation With Mixture of Domain Experts . . . . . . . . . . . . . . . . 
3488--3498 Julien Hauret and Thomas Joubaud and Véronique Zimpfer and Éric Bavu Configurable EBEN: Extreme Bandwidth Extension Network to Enhance Body-Conducted Speech Capture . . . . . 3499--3512 Wanli Peng and Sheng Li and Zhenxing Qian and Xinpeng Zhang Text Steganalysis Based on Hierarchical Supervised Learning and Dual Attention Mechanism . . . . . . . . . . . . . . . 3513--3526 Lin Xu and Qixian Zhou and Jinlan Fu and See-Kiong Ng CET2: Modelling Topic Transitions for Coherent and Engaging Knowledge-Grounded Conversations . . . . . . . . . . . . . 3527--3536 Vincent W. Neo and Christine Evers and Stephan Weiss and Patrick A. Naylor Signal Compaction Using Polynomial EVD for Spherical Array Processing With Applications . . . . . . . . . . . . . . 3537--3549 Gerald Enzner and Svantje Voit Hybrid-Frequency-Resolution Adaptive Kalman Filter for Online Identification of Long Acoustic Responses With Low Input-Output Latency . . . . . . . . . . 3550--3563 Shang Gao and Maoshen Jia and Dingding Yao and Jing Wang Multi-Source Localization Using Optimized Time-Frequency Representation and Sparsity Component Analysis . . . . 3564--3578 Qi He and Mingjie Gao and Ka Fai Cedric Yiu and Sven Nordholm Distributed Microphone Array Localization Problem via SDP-SOCP Method 3579--3588 Hiroshi Sawada and Rintaro Ikeshita and Keisuke Kinoshita and Tomohiro Nakatani Multi-Frame Full-Rank Spatial Covariance Analysis for Underdetermined Blind Source Separation and Dereverberation 3589--3602 Hongyang Chang and Hongfei Xu and Josef van Genabith and Deyi Xiong and Hongying Zan JoinER-BART: Joint Entity and Relation Extraction With Constrained Decoding, Representation Reuse and Fusion . . . . 3603--3616 Xinqi Huang and Yingsong Li and Yuriy Zakharov and Yongchun Miao and Zhixiang Huang Squared Sine Adaptive Algorithm and Its Performance Analysis . . . . . . . . . . 
3617--3628 Andong Li and Guochen Yu and Chengshi Zheng and Wenzhe Liu and Xiaodong Li A General Unfolding Speech Enhancement Method Motivated by Taylor's Theorem . . 3629--3646 Bin Gu and Jie Zhang and Wu Guo A Dynamic Convolution Framework for Session-Independent Speaker Embedding Learning . . . . . . . . . . . . . . . . 3647--3658 Daojian Zeng and Chao Zhao and Chao Jiang and Jianling Zhu and Jianhua Dai Document-Level Relation Extraction With Context Guided Mention Integration and Inter-Pair Reasoning . . . . . . . . . . 3659--3666 Lu Li and Maoshen Jia and Jing Wang and Ruiyuan Cao Multiple-Speech-Source DOA Estimation Based on Single-Source Cluster Detection 3667--3680 Xiaoxiao Miao and Xin Wang and Erica Cooper and Junichi Yamagishi and Natalia Tomashenko Speaker Anonymization Using Orthogonal Householder Neural Network . . . . . . . 3681--3695 Zhengshan Xue and Xiaolei Zhang and Tingxun Shi and Deyi Xiong DetTrans: a Lightweight Framework to Detect and Translate Noisy Inputs Simultaneously . . . . . . . . . . . . . 3696--3705 Chang Liu and Zhen-Hua Ling and Ling-Hui Chen Pronunciation Dictionary-Free Multilingual Speech Synthesis Using Learned Phonetic Representations . . . . 3706--3716 Reo Yoneyama and Yi-Chiao Wu and Tomoki Toda High-Fidelity and Pitch-Controllable Neural Vocoder Based on Unified Source-Filter Networks . . . . . . . . . 3717--3729 Stefan Thaleiser and Gerald Enzner Binaural-Projection Multichannel Wiener Filter for Cue-Preserving Binaural Speech Enhancement . . . . . . . . . . . 3730--3745 Yixin Wang and Wei Wei and Xiangming Gu and Xiaohong Guan and Ye Wang Disentangled Adversarial Domain Adaptation for Phonation Mode Detection in Singing and Speech . . . . . . . . . 3746--3759 Yixuan Zhang and Heming Wang and DeLiang Wang $F_0$ Estimation and Voicing Detection With Cascade Architecture in Noisy Speech . . . . . . . . . . . . . . . . . 
3760--3770 Zhengdao Zhao and Yuhua Wang and Guang Shen and Yuezhu Xu and Jiayuan Zhang TDFNet: Transformer-Based Deep-Scale Fusion Network for Multimodal Emotion Recognition . . . . . . . . . . . . . . 3771--3782 Johannes M. Arend and Christoph Pörschmann and Stefan Weinzierl and Fabian Brinkmann Magnitude-Corrected and Time-Aligned Interpolation of Head-Related Transfer Functions . . . . . . . . . . . . . . . 3783--3799 Desh Raj and Daniel Povey and Sanjeev Khudanpur SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition . . . . 3800--3813 Jiaming An and Zixiang Ding and Ke Li and Rui Xia Global-View and Speaker-Aware Emotion Cause Extraction in Conversations . . . 3814--3823 Yuqin Lin and Longbiao Wang and Yanbing Yang and Jianwu Dang CFDRN: a Cognition-Inspired Feature Decomposition and Recombination Network for Dysarthric Speech Recognition . . . 3824--3836 Rémi Blandin and Simon Stone and Angélique Remacle and Vincent Didone and Peter Birkholz A Comparative Study of $3$D and $1$D Acoustic Simulations of the Higher Frequencies of Speech . . . . . . . . . 3837--3847 Qing Wang and Jixun Yao and Li Zhang and Pengcheng Guo and Lei Xie Timbre-Reserved Adversarial Attack in Speaker Identification . . . . . . . . . 3848--3858 Yachao Li and Junhui Li and Jing Jiang and Shimin Tao and Hao Yang and Min Zhang P-Transformer: Towards Better Document-to-Document Neural Machine Translation . . . . . . . . . . . . . . 3859--3870 Chao Xie and Tomoki Toda Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition . . . . . 3871--3882 Zhichao Wang and Xinsheng Wang and Qicong Xie and Tao Li and Lei Xie and Qiao Tian and Yuping Wang MSM-VC: High-Fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-Scale Style Modeling 3883--3895 Yilin Zhao and Hai Zhao and Sufeng Duan Multi-Grained Evidence Inference for Multi-Choice Reading Comprehension . . . 
3896--3907 Ye-Qian Du and Jie Zhang and Xin Fang and Ming-Hui Wu and Zhou-Wang Yang A Semi-Supervised Complementary Joint Training Approach for Low-Resource Speech Recognition . . . . . . . . . . . 3908--3921 Changheng Li and Richard C. Hendriks Alternating Least-Squares-Based Microphone Array Parameter Estimation for a Single-Source Reverberant and Noisy Acoustic Scenario . . . . . . . . 3922--3934 Kun Zhou and Yuanhang Zhou and Wayne Xin Zhao and Ji-Rong Wen Learning to Perturb for Contrastive Learning of Unsupervised Sentence Representations . . . . . . . . . . . . 3935--3944 Georg Götz and Sebastian J. Schlecht and Ville Pulkki Common-Slope Modeling of Late Reverberation . . . . . . . . . . . . . 3945--3957 Guanhua Chen and Runzhe Zhan and Derek F. Wong and Lidia S. Chao Multi-Level Curriculum Learning for Multi-Turn Dialogue Generation . . . . . 3958--3967 Yun-Yen Chuang and Hung-Min Hsu and Kevin Lin and Ray-I. Chang and Hung-Yi Lee MetaEx-GAN: Meta Exploration to Improve Natural Language Generation via Generative Adversarial Networks . . . . 3968--3980 Chuxuan Tong and Xi Zheng and Jianhua Li and Xingjun Ma and Longxiang Gao and Yong Xiang Query-Efficient Black-Box Adversarial Attacks on Automatic Speech Recognition 3981--3992 Xixin Wu and Hui Lu and Kun Li and Zhiyong Wu and Xunying Liu and Helen Meng Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms 3993--4003 Ante Wang and Linfeng Song and Lifeng Jin and Junfeng Yao and Haitao Mi and Chen Lin and Jinsong Su and Dong Yu D$^2$PSG: Multi-Party Dialogue Discourse Parsing as Sequence Generation . . . . . 4004--4013 Nan Gao and Yongjian Wang and Peng Chen and Jijun Tang Boosting Short Text Classification by Solving the OOV Problem . . . . . . . . 4014--4024
Jin Chu Wu and Raghu N. Kacker Statistical Analysis for Speaker Recognition Evaluation With Data Dependence and Three Score Distributions 1--14 Yongwei Zhou and Junwei Bao and Youzheng Wu and Xiaodong He and Tiejun Zhao Operation-Augmented Numerical Reasoning for Question Answering . . . . . . . . . 15--28 Anurenjan Purushothaman and Debottam Dutta and Rohit Kumar and Sriram Ganapathy Speech Dereverberation With Frequency Domain Autoregressive Modeling . . . . . 29--38 Leyuan Qu and Taihao Li and Cornelius Weber and Theresa Pekarek-Rosin and Fuji Ren and Stefan Wermter Disentangling Prosody Representations With Unsupervised Speech Reconstruction 39--54 Mathias Bach Pedersen and Søren Holdt Jensen and Zheng-Hua Tan and Jesper Jensen Data-Driven Non-Intrusive Speech Intelligibility Prediction Using Speech Presence Probability . . . . . . . . . . 55--67 Yuanbo Hou and Bo Kang and Andrew Mitchell and Wenwu Wang and Jian Kang and Dick Botteldooren Cooperative Scene-Event Modelling for Acoustic Scene Classification . . . . . 68--82 Xiaotong Jiang and Peiwen You and Chen Chen and Zhongqing Wang and Guodong Zhou Exploring Scope Detection for Aspect-Based Sentiment Analysis . . . . 83--94 Xuenan Xu and Zeyu Xie and Mengyue Wu and Kai Yu Beyond the Status Quo: a Contemporary Survey of Advances and Challenges in Audio Captioning . . . . . . . . . . . . 95--112 Federico Miotello and Mirco Pezzoli and Luca Comanducci and Fabio Antonacci and Augusto Sarti Deep Prior-Based Audio Inpainting Using Multi-Resolution Harmonic Convolutional Neural Networks . . . . . . . . . . . . 113--123 Cristian-Lucian Stanciu and Jacob Benesty and Constantin Paleologu and Ruxandra-Liana Costea and Laura-Maria Dogariu and Silviu Ciochină Decomposition-Based Wiener Filter Using the Kronecker Product and Conjugate Gradient Method . . . . . . . . . . . . 124--138 Huiyao Chen and Yueheng Sun and Meishan Zhang and Min Zhang Automatic Noise Generation and Reduction for Text Classification . . . . . . . .
139--150 Jiaming Xu and Jian Cui and Yunzhe Hao and Bo Xu Multi-Cue Guided Semi-Supervised Learning Toward Target Speaker Separation in Real Environments . . . . 151--163 Yang Xiang and Jesper Lisby Højvang and Morten Højfeldt Rasmussen and Mads Græsbøll Christensen A Two-Stage Deep Representation Learning-Based Speech Enhancement Method Using Variational Autoencoder and Adversarial Training . . . . . . . . . . 164--177 Xiao Li and Ruirui Liu and Huichou Huang and Qingyao Wu Contrastive Learning for Target Speaker Extraction With Attention-Based Fusion 178--188 Xiaobo Liang and Runze Mao and Lijun Wu and Juntao Li and Min Zhang and Qing Li Enhancing Low-Resource NLP by Consistency Training With Data and Model Perturbations . . . . . . . . . . . . . 189--199 Haisheng Lu and Jiangnan Liang and Chuang Shi Comments on ``Primary-Ambient Extraction Using Ambient Spectrum Estimation for Immersive Spatial Audio Reproduction'' 200--202 Szymon Drgas and Lars Bramsløw and Archontis Politis and Gaurav Naithani and Tuomas Virtanen Dynamic Processing Neural Network Architecture for Hearing Loss Compensation . . . . . . . . . . . . . . 203--214 Femke B. Gelderblom and Tron Vedul Tronstad and Torbjørn Svendsen and Tor Andre Myrvoll On the Predictive Power of Objective Intelligibility Metrics for the Subjective Performance of Deep Complex Convolutional Recurrent Speech Enhancement Networks . . . . . . . . . . 215--226 Thomas Haubner and Andreas Brendel and Walter Kellermann End-to-End Deep Learning-Based Adaptation Control for Linear Acoustic Echo Cancellation . . . . . . . . . . . 227--238 Congcong Jiang and Tieyun Qian and Bing Liu One General Teacher for Multi-Data Multi-Task: a New Knowledge Distillation Framework for Discourse Relation Analysis . . . . . . . . . . . . . . . . 239--249 Khandokar Md. Nayem and Donald S. Williamson Attention-Based Speech Enhancement Using Human Quality Perception Modeling . . .
250--260 Ying Zhang and Fandong Meng and Yufeng Chen and Jinan Xu and Jie Zhou Complex Question Enhanced Transfer Learning for Zero-Shot Joint Information Extraction . . . . . . . . . . . . . . . 261--275 Jingsong Yan and Piji Li and Haibin Chen and Junhao Zheng and Qianli Ma Does the Order Matter? A Random Generative Way to Learn Label Hierarchy for Hierarchical Text Classification . . 276--285 Georgios Paraskevopoulos and Theodoros Kouzelis and Georgios Rouvalis and Athanasios Katsamanis and Vassilis Katsouros and Alexandros Potamianos Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems: a Case Study for Modern Greek 286--299 Ernesto Accolti and Javier Gimenez and Michael Vorländer Uncertainties of Room Acoustics Simulation Due to Directivity Data of Musical Instruments . . . . . . . . . . 300--309 Yoshiki Masuyama and Kouei Yamaoka and Yuma Kinoshita and Taishi Nakashima and Nobutaka Ono Causal and Relaxed-Distortionless Response Beamforming for Online Target Source Extraction . . . . . . . . . . . 310--324 Rohit Prabhavalkar and Takaaki Hori and Tara N. Sainath and Ralf Schlüter and Shinji Watanabe End-to-End Speech Recognition: a Survey 325--351 Yun Zhao and Dexi Liu and Changxuan Wan and Xiping Liu and Jian-yun Nie and Jiaming Liu JMS-QA: a Joint Hierarchical Architecture for Mental Health Question Answering . . . . . . . . . . . . . . . 352--363 Shiwen Ni and Jiawen Li and Min Yang and Hung-Yu Kao DropAttack: a Random Dropped Weight Attack Adversarial Training for Natural Language Understanding . . . . . . . . . 364--373 Tiantian Zhu and Yang Qin and Ming Feng and Qingcai Chen and Baotian Hu and Yang Xiang BioPRO: Context-Infused Prompt Learning for Biomedical Entity Linking . . . . . 374--385 Jiapu Wang and Boyue Wang and Junbin Gao and Simin Hu and Yongli Hu and Baocai Yin Multi-Level Interaction Based Knowledge Graph Completion . . . . . . . . . . . . 
386--396 Qiangqiang Zhang and Dongyuan Lin and Yingying Xiao and Yunfei Zheng and Shiyuan Wang Error Reused Filtered-$X$ Least Mean Square Algorithm for Active Noise Control . . . . . . . . . . . . . . . . 397--412 Zengrui Jin and Mengzhe Geng and Jiajun Deng and Tianzi Wang and Shujie Hu and Guinan Li and Xunying Liu Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition . . . . . . . . . . . 413--429 Jun Kong and Jin Wang and Xuejie Zhang Adaptive Ensemble Self-Distillation With Consistent Gradients for Fast Inference of Pretrained Language Models . . . . . 430--442 Srđan Kitić and Jérôme Daniel Blind Identification of Ambisonic Reduced Room Impulse Response . . . . . 443--458 Qijie Shao and Pengcheng Guo and Jinghao Yan and Pengfei Hu and Lei Xie Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition . . . . . . . . . . . 459--470 Han Zhu and Gaofeng Cheng and Jindong Wang and Wenxin Hou and Pengyuan Zhang and Yonghong Yan Boosting Cross-Domain Speech Recognition With Self-Supervision . . . . . . . . . 471--485 Yile Wang and Yue Zhang and Peng Li and Yang Liu Gradual Syntactic Label Replacement for Language Model Pre-Training . . . . . . 486--496 Penghui Ma and Jianfeng Li and Jingjing Pan and Xiaofei Zhang and Roberto Gil-Pita Coherent Signal DOA Estimation With Coprime Array: Exploiting Signal Subspace Reconstructing Strategy . . . . 497--508 Emma Hamel and Nickvash Kani Factors That Influence Automatic Recognition of African-American Vernacular English in Machine-Learning Models . . . . . . . . . . . . . . . . . 509--516 Jingbei Li and Sipan Li and Ping Chen and Luwen Zhang and Yi Meng and Zhiyong Wu and Helen Meng and Qiao Tian and Yuping Wang and Yuxuan Wang Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing . . . . . . . . . . . . . . . .
517--528 Bing Han and Zhengyang Chen and Yanmin Qian Self-Supervised Learning With Cluster-Aware-DINO for High-Performance Robust Speaker Verification . . . . . . 529--541 Kristina Tesch and Timo Gerkmann Multi-Channel Speech Separation Using Spatially Selective Deep Non-Linear Filters . . . . . . . . . . . . . . . . 542--553 Hao-Chen Pei and Hao Fang and Xin Luo and Xin-Shun Xu Gradformer: a Framework for Multi-Aspect Multi-Granularity Pronunciation Assessment . . . . . . . . . . . . . . . 554--563 Garima Sharma and Karthikeyan Umapathy and Sridhar Krishnan Time-Frequency Scattergrams for Biomedical Audio Signal Representation and Classification . . . . . . . . . . . 564--576 Zhibo Man and Zengcheng Huang and Yujie Zhang and Yu Li and Yuanmeng Chen and Yufeng Chen and Jinan Xu WDSRL: Multi-Domain Neural Machine Translation With Word-Level Domain-Sensitive Representation Learning 577--590 Chin-Po Chen and Ho-Hsien Pan and Susan Shur-Fen Gau and Chi-Chun Lee Using Measures of Vowel Space for Autistic Traits Characterization . . . . 591--607 Kevin Wilkinghoff and Frank Kurth Why Do Angular Margin Losses Work Well for Semi-Supervised Anomalous Sound Detection? . . . . . . . . . . . . . . . 608--622 Aku Rouhe and Tamás Grósz and Mikko Kurimo Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the $ 1000$-Hour Scale . . . . . . . 623--638 Yile Wang and Yue Zhang Lost in Context? On the Sense-Wise Variance of Contextualized Word Embeddings . . . . . . . . . . . . . . . 639--650 Christoph Hold and Ville Pulkki and Archontis Politis and Leo McCormack Compression of Higher-Order Ambisonic Signals Using Directional Audio Coding 651--665 Shouhui Wang and Biao Qin A Novel Joint Training Model for Knowledge Base Question Answering . . . 666--679 Songbin Li and Jingang Wang and Peng Liu and Ke Shi SANet: a Compressed Speech Encoder and Steganography Algorithm Independent Steganalysis Deep Neural Network . . . . 
680--690 Tarek Kanan and Amani AbedAlghafer and Shadi AlZu'bi and Bilal Hawashin and Ala Mughaid and Ghassan Kanaan and M. M. Kamruzzaman An Intelligent Health Care System for Detecting Drug Abuse in Social Media Platforms Based on Low Resource Language 691--703 Alejandro Santorum Varela and Svetlana Stoyanchev and Simon Keizer and Rama Doddipatla and Kate Knill Entity Resolution in Situated Dialog With Unimodal and Multimodal Transformers . . . . . . . . . . . . . . 704--713 Huang He and Hua Lu and Siqi Bao and Fan Wang and Hua Wu and Zheng-Yu Niu and Haifeng Wang Learning to Select External Knowledge With Multi-Scale Negative Sampling . . . 714--720 Hua Lu and Zhen Guo and Chanjuan Li and Yunyi Yang and Huang He and Siqi Bao Towards Building an Open-Domain Dialogue System Incorporated With Internet Memes 721--726 Jungwoo Lim and Taesun Whang and Dongyub Lee and Heuiseok Lim Adaptive Multi-Domain Dialogue State Tracking on Spoken Conversations . . . . 727--732 David Thulke and Nico Daheim and Christian Dugast and Hermann Ney Task-Oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10 . . . . . . . . . . . . . . . . . 733--741 Han Wu and Kun Xu and Linqi Song Structure-Aware Dialogue Modeling Methods for Conversational Semantic Role Labeling . . . . . . . . . . . . . . . . 742--752 Zhe Chen and Hongcheng Liu and Yu Wang DialogMCF: Multimodal Context Flow for Audio Visual Scene-Aware Dialog . . . . 753--764 Koichiro Yoshino and Yun-Nung Chen and Paul Crook and Satwik Kottur and Jinchao Li and Behnam Hedayatnia and Seungwhan Moon and Zhengcong Fei and Zekang Li and Jinchao Zhang and Yang Feng and Jie Zhou and Seokhwan Kim and Yang Liu and Di Jin and Alexandros Papangelis and Karthik Gopalakrishnan and Dilek Hakkani-Tur and Babak Damavandi and Alborz Geramifard and Chiori Hori and Ankit Shah and Chen Zhang and Haizhou Li and João Sedoc and Luis F. 
D'Haro and Rafael Banchs and Alexander Rudnicky Overview of the Tenth Dialog System Technology Challenge: DSTC10 . . . . . . 765--778 Shekhar Kumar Yadav and Nithin V. George Joint Dereverberation and Beamforming With Blind Estimation of the Shape Parameter of the Desired Source Prior 779--793 Yanxiong Li and Zhongjie Jiang and Qisheng Huang and Wenchang Cao and Jialong Li Lightweight Speaker Verification Using Transformation Module With Feature Partition and Fusion . . . . . . . . . . 794--806 Yuhan Dai and Zhirui Zhang and Yichao Du and Shengcai Liu and Lemao Liu and Tong Xu Datastore Distillation for Nearest Neighbor Machine Translation . . . . . . 807--817 Changtao Li and Feiran Yang and Jun Yang A Two-Stage Approach to Quality Restoration of Bone-Conducted Speech . . 818--829 Jie Zhou and Yuanbiao Lin and Qin Chen and Qi Zhang and Xuanjing Huang and Liang He CausalABSC: Causal Inference for Aspect Debiasing in Aspect-Based Sentiment Classification . . . . . . . . . . . . . 830--840 Ruiying Lu and Bo Chen and Dandan Guo and Dongsheng Wang and Mingyuan Zhou Hierarchical Topic-Aware Contextualized Transformers . . . . . . . . . . . . . . 841--852 Yaru Zhao and Bo Cheng and Yakun Huang and Zhiguo Wan FluGCF: a Fluent Dialogue Generation Model With Coherent Concept Entity Flow 853--867 Changhao Ding and Zhangjie Fu and Zhongliang Yang and Qi Yu and Daqiu Li and Yongfeng Huang Context-Aware Linguistic Steganography Model Based on Neural Machine Translation . . . . . . . . . . . . . . 868--878 Zainab Alhakeem and Se-In Jang and Hong-Goo Kang Disentangled Representations in Local-Global Contexts for Arabic Dialect Identification . . . . . . . . . . . . . 
879--890 Jae-Hong Lee and Joon-Hyuk Chang Partitioning Attention Weight: Mitigating Adverse Effect of Incorrect Pseudo-Labels for Self-Supervised ASR 891--905 Ryo Fukuda and Katsuhito Sudoh and Satoshi Nakamura Improving Speech Translation Accuracy and Time Efficiency With Fine-Tuned wav2vec 2.0-Based Speech Segmentation 906--916 Seong-Gyun Leem and Daniel Fulford and Jukka-Pekka Onnela and David Gard and Carlos Busso Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech . . . . . . . . . . . . . . 917--929 Alexander Bohlender and Ann Spriet and Wouter Tirry and Nilesh Madhu Spatially Selective Speaker Separation Using a DNN With a Location Dependent Feature Extraction . . . . . . . . . . . 930--945 Matan Karo and Arie Yeredor and Itshak Lapidot Compact Time-Domain Representation for Logical Access Spoofed Audio . . . . . . 946--958 Or Berebi and Zamir Ben-Hur and David Lou Alon and Boaz Rafaely Analysis and Design of Head-Tracked Compensation for Bilateral Ambisonics 959--972 Wei Wang and Yanmin Qian Universal Cross-Lingual Data Generation for Low Resource ASR . . . . . . . . . . 973--983 Davide Berghi and Philip J. B. Jackson Leveraging Visual Supervision for Array-Based Active Speaker Detection and Localization . . . . . . . . . . . . . . 984--995 Daniel Aleksander Krause and Guillermo García-Barrios and Archontis Politis and Annamaria Mesaros Binaural Sound Source Distance Estimation and Localization for a Moving Listener . . . . . . . . . . . . . . . . 996--1011 Seung-Bin Kim and Sang-Hoon Lee and Ha-Yeong Choi and Seong-Whan Lee Audio Super-Resolution With Robust Speech Representation Learning of Masked Autoencoder . . . . . . . . . . . . . . 1012--1022 Omer Musa Battal and Aykut Koç Automatic Construction of Sememe Knowledge Bases From Machine Readable Dictionaries . . . . . . . . . . . . . . 
1023--1035 Varun Krishna and Tarun Sai and Sriram Ganapathy Representation Learning With Hidden Unit Clustering for Low Resource Speech Applications . . . . . . . . . . . . . . 1036--1047 Zhengding Luo and Dongyuan Shi and Woon-Seng Gan and Qirui Huang Delayless Generative Fixed-Filter Active Noise Control Based on Deep Learning and Bayesian Filter . . . . . . . . . . . . 1048--1060 Zewen Chi and Heyan Huang and Luyang Liu and Yu Bai and Xiaoyan Gao and Xian-Ling Mao Can Pretrained English Language Models Benefit Non-English NLP Systems in Low-Resource Scenarios? . . . . . . . . 1061--1074 Rui Liu and Yifan Hu and Haolin Zuo and Zhaojie Luo and Longbiao Wang and Guanglai Gao Text-to-Speech for Low-Resource Agglutinative Language With Morphology-Aware Language Model Pre-Training . . . . . . . . . . . . . . 1075--1087 Shu Jiang and Zuchao Li and Hai Zhao and Weiping Ding Entity-Relation Extraction as Full Shallow Semantic Dependency Parsing . . 1088--1099 Yoav Vered and Stephen Elliott A Parallel Analog and Digital Adaptive Feedforward Controller for Active Noise Control . . . . . . . . . . . . . . . . 1100--1108 Puning Zhang and Rongjian Zhao and Boran Yang and Yuexian Li and Zhigang Yang Integrated Syntactic and Semantic Tree for Targeted Sentiment Classification Using Dual-Channel Graph Convolutional Network . . . . . . . . . . . . . . . . 1109--1124 Xu Wang and Hainan Zhang and Shuai Zhao and Hongshen Chen and Zhuoye Ding and Zhiguo Wan and Bo Cheng and Yanyan Lan Debiasing Counterfactual Context With Causal Inference for Multi-Turn Dialogue Reasoning . . . . . . . . . . . . . . . 1125--1132 Hoang Ngoc Chau and Tien Dat Bui and Huu Binh Nguyen and Thanh Thi Hien Duong and Quoc Cuong Nguyen A Novel Approach to Multi-Channel Speech Enhancement Based on Graph Neural Networks . . . . . . . . . . . . . . . . 1133--1144 Yuchen Hu and Chen Chen and Qiushi Zhu and Eng Siong Chng Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR . . . 
. . . . . . . . . 1145--1156 Tetsuya Ueda and Tomohiro Nakatani and Rintaro Ikeshita and Keisuke Kinoshita and Shoko Araki and Shoji Makino Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction . . 1157--1172 Vibhav Agarwal and Sourav Ghosh and Harichandana BSS and Himanshu Arora and Barath Raj Kandur Raja TrICy: Trigger-Guided Data-to-Text Generation With Intent Aware Attention-Copy . . . . . . . . . . . . . 1173--1184 Christoph Boeddeker and Aswin Shanmugam Subramanian and Gordon Wichern and Reinhold Haeb-Umbach and Jonathan Le Roux TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings . . . . . . . . . . . . . . . 1185--1197 Reza Varzandeh and Simon Doclo and Volker Hohmann Speech-Aware Binaural DOA Estimation Utilizing Periodicity and Spatial Features in Convolutional Neural Networks . . . . . . . . . . . . . . . . 1198--1213 Yigitcan Özer and Meinard Müller Source Separation of Piano Concertos Using Musically Motivated Augmentation Techniques . . . . . . . . . . . . . . . 1214--1225 Lior Frenkel and Shlomo E. Chazan and Jacob Goldberger Domain Adaptation Using Suitable Pseudo Labels for Speech Enhancement and Dereverberation . . . . . . . . . . . . 1226--1236 Jiahao Zhao and Wenji Mao and Daniel Dajun Zeng Disentangled Text Representation Learning With Information-Theoretic Perspective for Adversarial Robustness 1237--1247 Dong Zhou and Fang Lei and Lin Li and Yongmei Zhou and Aimin Yang Cross-Modal Interaction via Reinforcement Feedback for Audio-Lyrics Retrieval . . . . . . . . . . . . . . . 1248--1260 Xuechen Liu and Md Sahidullah and Kong Aik Lee and Tomi Kinnunen Generalizing Speaker Verification for Spoof Awareness in the Embedding Space 1261--1273 Shiyao Cui and Jiangxia Cao and Xin Cong and Jiawei Sheng and Quangang Li and Tingwen Liu and Jinqiao Shi Enhancing Multimodal Entity and Relation Extraction With Variational Information Bottleneck . . . . . . . . . . 
. . . . . 1274--1285 Yizhou Tan and Haojun Ai and Shengchen Li and Mark D. Plumbley Acoustic Scene Classification Across Cities and Devices via Feature Disentanglement . . . . . . . . . . . . 1286--1297 Orel Ben Zaken and Anurag Kumar and Vladimir Tourbabin and Boaz Rafaely Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech --- The Importance of Energetic, Temporal, and Spatial Information . . . . . . . . . . . . . . 1298--1309 Changsheng Quan and Xiaofei Li SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation . . . . . . . . . . . . 1310--1323 Matthew Baas and Herman Kamper Disentanglement in a GAN for Unconditional Speech Synthesis . . . . . 1324--1335 Xian Li and Nian Shao and Xiaofei Li Self-Supervised Audio Teacher-Student Transformer for Both Clip-Level and Frame-Level Tasks . . . . . . . . . . . 1336--1351 Yifan Chen and Gaofeng Cheng and Runyan Yang and Pengyuan Zhang and Yonghong Yan Interrelate Training and Clustering for Online Speaker Diarization . . . . . . . 1352--1364 Sheng Feng and Xiaoqian Zhu and Shuqing Ma Masking Hierarchical Tokens for Underwater Acoustic Target Recognition With Self-Supervised Learning . . . . . 1365--1379 Yangyang Zhao and Kai Yin and Zhenyu Wang and Mehdi Dastani and Shihan Wang Decomposed Deep $Q$-Network for Coherent Task-Oriented Dialogue Policy Learning 1380--1391 Jayneel Parekh and Sanjeel Parekh and Pavlo Mozharovskyi and Gaël Richard and Florence d'Alché-Buc Tackling Interpretability in Audio Classification Networks With Non-negative Matrix Factorization . . . 1392--1405 Xiuying Chen and Shen Gao and Mingzhe Li and Qingqing Zhu and Xin Gao and Xiangliang Zhang Write Summary Step-by-Step: a Pilot Study of Stepwise Summarization . . . . 1406--1415 Changkai Lin and Hongju Cheng and Qiang Rao and Yang Yang M$^3$SA: Multimodal Sentiment Analysis Based on Multi-Scale Feature Extraction and Multi-Task Learning . . . . . . . .
1416--1429 Rui-Chen Zheng and Yang Ai and Zhen-Hua Ling Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement . . 1430--1444 Ritujoy Biswas and Karan Nathwani and Vinayak Abrol Statistically Guided Near-End Speech Intelligibility Improvement Through Voice Transformation and Transfer Learning . . . . . . . . . . . . . . . . 1445--1456 Linhui Sun and Shuo Yuan and Aifei Gong and Lei Ye and Eng Siong Chng Dual-Branch Modeling Based on State-Space Model for Speech Enhancement 1457--1467 Alkis Koudounas and Eliana Pastor and Giuseppe Attanasio and Vittorio Mazzia and Manuel Giollo and Thomas Gueudre and Elisa Reale and Luca Cagliero and Sandro Cumani and Luca de Alfaro and Elena Baralis and Daniele Amberti Towards Comprehensive Subgroup Performance Analysis in Speech Models 1468--1480 Wenmeng Xiong and Changchun Bao and Jing Zhou and Maoshen Jia and José Picheral Joint DOA Estimation and Dereverberation Based on Multi-Channel Linear Prediction Filtering and Azimuth Sparsity . . . . . 1481--1493 Yehav Alkaher and Israel Cohen Howling Detection and Gain Control for Speech Reinforcement in a Noisy Car Cabin Environment . . . . . . . . . . . 1494--1505 Xinfa Zhu and Yi Lei and Tao Li and Yongmao Zhang and Hongbin Zhou and Heng Lu and Lei Xie METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer . . . . . 1506--1518 Myeonghun Jeong and Minchan Kim and Byoung Jin Choi and Jaesam Yoon and Won Jang and Nam Soo Kim Transfer Learning for Low-Resource, Multi-Lingual, and Zero-Shot Multi-Speaker Text-to-Speech . . . . . . 1519--1530 Jiadi Yao and Hong Luo and Jun Qi and Xiao-Lei Zhang Interpretable Spectrum Transformation Attacks to Speaker Recognition Systems 1531--1545 Xiang Chen and Lei Li and Yuqi Zhu and Shumin Deng and Chuanqi Tan and Fei Huang and Luo Si and Ningyu Zhang and Huajun Chen Sequence Labeling as Non-Autoregressive Dual-Query Set Generation . . . . . . . 
1546--1558 Lei Liu and Li Liu and Haizhou Li Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition . . . . . . . . . . . 1559--1572 Adrián Barahona-Ríos and Tom Collins NoiseBandNet: Controllable Time-Varying Neural Synthesis of Sound Effects Using Filterbanks . . . . . . . . . . . . . . 1573--1585 Siyuan Wang and Zhongyu Wei and Jiarong Xu and Taishan Li and Zhihao Fan Unifying Structure Reasoning and Language Pre-Training for Complex Reasoning Tasks . . . . . . . . . . . . 1586--1595 Yijing Chu and Sipei Zhao and Feng Niu and Yongzheng Dong and Yuezhe Zhao A New Diffusion Filtered-$X$ Affine Projection Algorithm: Performance Analysis and Application in Windy Environment . . . . . . . . . . . . . . 1596--1608 Yuquan Le and Zhe Quan and Jiawei Wang and Da Cao and Kenli Li $ R^2 $: a Novel Recall & Ranking Framework for Legal Judgment Prediction 1609--1622 Xiaotong Jiang and Ruirui Bai and Zhongqing Wang and Guodong Zhou Cross-Domain Aspect-Based Sentiment Classification With Tripartite Graph Modeling . . . . . . . . . . . . . . . . 1623--1635 Zhengyang Chen and Bing Han and Shuai Wang and Yanmin Qian Attention-Based Encoder-Decoder End-to-End Neural Diarization With Embedding Enhancer . . . . . . . . . . . 1636--1649 Chenfeng Miao and Qingying Zhu and Minchuan Chen and Jun Ma and Shaojun Wang and Jing Xiao EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion . . . . . . . . . . . . . . . 1650--1661 Orel Peretz and Israel Cohen Constant Elevation-Beamwidth Beamforming With Concentric Ring Arrays . . . . . . 1662--1672 Zhibin Quan and Chi-Man Vong and Weili Zeng and Wankou Yang The MorPhEMe Machine: an Addressable Neural Memory for Learning Knowledge-Regularized Deep Contextualized Chinese Embedding . . . . 
1673--1686 Lijian Gao and Qirong Mao and Ming Dong On Local Temporal Embedding for Semi-Supervised Sound Event Detection 1687--1698 Xuehao Zhou and Mingyang Zhang and Yi Zhou and Zhizheng Wu and Haizhou Li Accented Text-to-Speech Synthesis With Limited Data . . . . . . . . . . . . . . 1699--1711 Vinay Kothapally and John H. L. Hansen Monaural Speech Dereverberation Using Deformable Convolutional Networks . . . 1712--1723 Taihui Wang and Feiran Yang and Jun Yang Multichannel Linear Prediction-Based Speech Dereverberation Considering Sparse and Low-Rank Priors . . . . . . . 1724--1735 Saurabh Kataria and Jesús Villalba and Laureano Moro-Velázquez and Piotr Żelasko and Najim Dehak Time-Domain Speech Super-Resolution With GAN Based Modeling for Telephony Speaker Verification . . . . . . . . . . . . . . 1736--1749 Marco Olivieri and Amy Bastine and Mirco Pezzoli and Fabio Antonacci and Thushara Abhayapala and Augusto Sarti Acoustic Imaging With Circular Microphone Array: a New Approach for Sound Field Analysis . . . . . . . . . . 1750--1761 Tengfei Liu and Yongli Hu and Junbin Gao and Yanfeng Sun and Baocai Yin Hierarchical Multi-Granularity Interaction Graph Convolutional Network for Long Document Classification . . . . 1762--1775 Douglas O'Shaughnessy Review of Methods for Automatic Speaker Verification . . . . . . . . . . . . . . 1776--1789 Etienne Thuillier and Craig T. Jin and Vesa Välimäki HRTF Interpolation Using a Spherical Neural Process Meta-Learner . . . . . . 1790--1802 Xun Gong and Yu Wu and Jinyu Li and Shujie Liu and Rui Zhao and Xie Chen and Yanmin Qian Advanced Long-Content Speech Recognition With Factorized Neural Transducer . . . 1803--1815 Yoshiki Masuyama and Kouei Yamaoka and Takao Kawamura and Nobutaka Ono Efficient Joint Optimization of Sampling Rate Offsets Using Entire Multichannel Signal . . . . . . . . . . . . . . . . .
1816--1828 Takaaki Saeki and Soumi Maiti and Xinjian Li and Shinji Watanabe and Shinnosuke Takamichi and Hiroshi Saruwatari Text-Inductive Graphone-Based Language Adaptation for Low-Resource Speech Synthesis . . . . . . . . . . . . . . . 1829--1844 Yingming Gao and Peter Birkholz and Ya Li Articulatory Copy Synthesis Based on the Speech Synthesizer VocalTractLab and Convolutional Recurrent Neural Networks 1845--1858 Théo Mariotte and Anthony Larcher and Silvio Montrésor and Jean-Hugh Thomas Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection . . . . . . 1859--1872 Luciana M. X. de Souza and Márcio H. Costa and Renata Coelho Borges Envelope-Based Multichannel Noise Reduction for Cochlear Implant Applications . . . . . . . . . . . . . . 1873--1884 Linjian Li and Yi Cai and Xin Wu Unsupervised Disentanglement Learning Model for Exemplar-Guided Paraphrase Generation . . . . . . . . . . . . . . . 1885--1900 Amir Ivry and Israel Cohen and Baruch Berdugo A User-Centric Approach for Deep Residual-Echo Suppression in Double-Talk 1901--1914 Geng Zhang and Jin Liu and Guangyou Zhou and Kunsong Zhao and Zhiwen Xie and Bo Huang Question-Directed Reasoning With Relation-Aware Graph Attention Network for Complex Question Answering Over Knowledge Graph . . . . . . . . . . . . 1915--1927 Yu Yao and Peng Yang and Guangzhen Zhao and Guoshun Yin KGAgent: Learning a Deep Reinforced Agent for Keyphrase Generation . . . . . 1928--1940 Jiahong Li and Chenda Li and Yifei Wu and Yanmin Qian Unified Cross-Modal Attention: Robust Audio-Visual Speech Recognition and Beyond . . . . . . . . . . . . . . . . . 1941--1953 Mieszko Fraś and Konrad Kowalczyk Reverberant Source Separation Using NTF With Delayed Subsources and Spatial Priors . . . . . . . . . . . . . . . . . 1954--1967 Rui Wang and Li Li and Tomoki Toda Dual-Channel Target Speaker Extraction Based on Conditional Variational Autoencoder and Directional Information 1968--1979