Table of contents for issues of IEEE/ACM Transactions on Audio, Speech, and Language Processing

Last update: Sat Jun 8 14:56:32 MDT 2024

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 1, January, 2014

                      Anonymous   Table of Contents  . . . . . . . . . . . 1--2
                      Anonymous   Table of Contents  . . . . . . . . . . . 3--4
                    L. Deng and   
                  S. Renals and   
                M. Federico and   
                   M. Ostendorf   Editorial: Expanding the Technical Reach
                                  of our Transactions  . . . . . . . . . . 5--5
                  J. Taghia and   
                      R. Martin   Objective Intelligibility Measures Based
                                  on Mutual Information for Speech
                                  Subjected to Speech Enhancement
                                  Processing . . . . . . . . . . . . . . . 6--16
                   Liang Lu and   
                 A. Ghoshal and   
                      S. Renals   Cross-Lingual Subspace Gaussian Mixture
                                  Models for Low-Resource Speech
                                  Recognition  . . . . . . . . . . . . . . 17--27
                   M. Gasic and   
                       S. Young   Gaussian Processes for POMDP-Based
                                  Dialogue Manager Optimization  . . . . . 28--40
      I. Mezghani-Marrakchi and   
                    G. Mahe and   
           S. Djaziri-Larbi and   
                 M. Jaidane and   
          M. Turki-Hadj Alouane   Nonlinear Audio Systems Identification
                                  Through Audio Input Gaussianization  . . 41--53
               J. B. Crespo and   
                 R. C. Hendriks   Multizone Speech Reinforcement . . . . . 54--66
                   Chao Pan and   
              Jingdong Chen and   
                     J. Benesty   Performance Study of the MVDR Beamformer
                                  as a Function of the Source Incidence
                                  Angle  . . . . . . . . . . . . . . . . . 67--79
                Hung-yi Lee and   
                   Lin-shan Lee   Improved Semantic Retrieval of Spoken
                                  Content by Document/Query Expansion with
                                  Random Walk Over Acoustic Similarity
                                  Graphs . . . . . . . . . . . . . . . . . 80--94
                V. Leutnant and   
                 A. Krueger and   
                 R. Haeb-Umbach   A New Observation Model in the
                                  Logarithmic Mel Power Spectral Domain
                                  for the Automatic Recognition of Noisy
                                  Reverberant Speech . . . . . . . . . . . 95--109
                 N. F. Chen and   
                  S. W. Tam and   
                  Wade Shen and   
                 J. P. Campbell   Characterizing Phonetic Transformations
                                  and Acoustic Differences Across English
                                  Dialects . . . . . . . . . . . . . . . . 110--124
                D. Markovic and   
               K. Kowalczyk and   
               F. Antonacci and   
                 C. Hofmann and   
                   A. Sarti and   
                  W. Kellermann   Estimation of Acoustic Reflection
                                  Coefficients Through Pseudospectrum
                                  Matching . . . . . . . . . . . . . . . . 125--137
                Zhiyao Duan and   
                  Jinyu Han and   
                       B. Pardo   Multi-pitch Streaming of Harmonic Sound
                                  Mixtures . . . . . . . . . . . . . . . . 138--150
                 Shilin Liu and   
                   Khe Chai Sim   Temporally Varying Weight Regression: A
                                  Semi-Parametric Trajectory Model for
                                  Automatic Speech Recognition . . . . . . 151--160
                V. S. Tomar and   
                     R. C. Rose   A Family of Discriminative Manifold
                                  Learning Algorithms and Their
                                  Application to Speech Recognition  . . . 161--171
                     H. Doi and   
                    T. Toda and   
                K. Nakamura and   
              H. Saruwatari and   
                     K. Shikano   Alaryngeal Speech Enhancement Based on
                                  One-to-Many Eigenvoice Conversion  . . . 172--183
                  E. Arisoy and   
                 S. F. Chen and   
             B. Ramabhadran and   
                       A. Sethy   Converting Neural Network Language
                                  Models into Back-off Language Models for
                                  Efficient Decoding in Automatic Speech
                                  Recognition  . . . . . . . . . . . . . . 184--192
                  C. T. Jin and   
                   N. Epain and   
                      A. Parthy   Design, Optimization and Evaluation of a
                                  Dual-Radius Spherical Microphone Array   193--204
                  R. Mignot and   
                 G. Chardon and   
                      L. Daudet   Low Frequency Interpolation of Room
                                  Impulse Responses Using Compressed
                                  Sensing  . . . . . . . . . . . . . . . . 205--216
             M. Senoussaoui and   
                   P. Kenny and   
              T. Stafylakis and   
                   P. Dumouchel   A Study of the Cosine Distance-Based
                                  Mean Shift for Telephone Speech
                                  Diarization  . . . . . . . . . . . . . . 217--227
               H. Tachibana and   
                     N. Ono and   
                    S. Sagayama   Singing Voice Enhancement in Monaural
                                  Music Signals Based on Two-stage
                                  Harmonic/Percussive Sound Separation on
                                  Multiple Resolution Spectrograms . . . . 228--237
              N. R. Shabtai and   
                     B. Rafaely   Generalized Spherical Array Beamforming
                                  for Binaural Speech Reproduction . . . . 238--247
                  S. Cumani and   
                      P. Laface   Factorized Sub-Space Estimation for Fast
                                  and Memory Effective $I$-vector
                                  Extraction . . . . . . . . . . . . . . . 248--259
                  Yuan Zeng and   
                 R. C. Hendriks   Distributed Delay and Sum Beamformer for
                                  Speech Enhancement via Randomized Gossip 260--273
                Zhenghua Li and   
                  Min Zhang and   
               Wanxiang Che and   
                   Ting Liu and   
                  Wenliang Chen   Joint Optimization for Chinese POS
                                  Tagging and Dependency Parsing . . . . . 274--286
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing --- EDICS  . . . 289--290
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 291--292
                      Anonymous   Open Access  . . . . . . . . . . . . . . 293--293
                      Anonymous   [Blank page] . . . . . . . . . . . . . . B287--B288
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 2, February, 2014

                      Anonymous   Table of contents  . . . . . . . . . . . 289--290
                      Anonymous   Table of contents  . . . . . . . . . . . 291--292
                 Dehong Gao and   
                  Wenjie Li and   
                Xiaoyan Cai and   
              Renxian Zhang and   
                     You Ouyang   Sequential Summarization: a Full View of
                                  Twitter Trending Topics  . . . . . . . . 293--302
        P. W. J. van Hengel and   
                J. D. Krijnders   A Comparison of Spectro-Temporal
                                  Representations of Audio Signals . . . . 303--313
                 I. Zitouni and   
                    Y. Benajiba   Aligned-Parallel-Corpora Based
                                  Semi-Supervised Learning for Arabic
                                  Mention Detection  . . . . . . . . . . . 314--324
                  E. Molina and   
            A. M. Barbancho and   
               L. J. Tardon and   
                   I. Barbancho   Dissonance Reduction In Polyphonic Audio
                                  Using Harmonic Reorganization  . . . . . 325--334
               D. P. K. Lun and   
               Tak-Wai Shen and   
                       K. C. Ho   A Novel Expectation-Maximization
                                  Framework for Speech Enhancement in
                                  Non-Stationary Noise Environments  . . . 335--346
               S. Cosentino and   
                 T. H. Falk and   
                D. McAlpine and   
                   T. Marquardt   Cochlear Implant Filterbank Design and
                                  Optimization: A Simulation Study . . . . 347--353
                  M. Souden and   
               K. Kinoshita and   
                M. Delcroix and   
                    T. Nakatani   Location Feature Integration for
                                  Clustering-Based Speech Separation in
                                  Distributed Microphone Arrays  . . . . . 354--367
              H. Kallasjoki and   
              J. F. Gemmeke and   
                 K. J. Palomaki   Estimating Uncertainty to Improve
                                  Exemplar-Based Feature Enhancement for
                                  Noise Robust Speech Recognition  . . . . 368--380
                   T. Hasan and   
                J. H. L. Hansen   Maximum Likelihood Acoustic Factor
                                  Analysis Models for Robust Speaker
                                  Verification in Noise  . . . . . . . . . 381--391
                O. Schwartz and   
                      S. Gannot   Speaker Tracking Using Recursive EM
                                  Algorithms . . . . . . . . . . . . . . . 392--402
                    Yu Tsao and   
                 S. Matsuda and   
                    C. Hori and   
                H. Kashioka and   
                   Chin-Hui Lee   A MAP-based Online Estimation Approach
                                  to Ensemble Speaker and Speaking
                                  Environment Modeling . . . . . . . . . . 403--416
                 Pui-Yu Hui and   
                        H. Meng   Latent Semantic Analysis for Multimodal
                                  User Input With Speech and Gestures  . . 417--429
                  J. Jensen and   
                     C. H. Taal   Speech Intelligibility Prediction Based
                                  on Mutual Information  . . . . . . . . . 430--440
               A. Primavera and   
                  S. Cecchi and   
                 Junfeng Li and   
                      F. Piazza   Objective and Subjective Investigation
                                  on a Novel Method for Digital
                                  Reverberator Parameters Estimation . . . 441--452
                   M. Speed and   
                  D. Murphy and   
                      D. Howard   Modeling the Vocal Tract Transfer
                                  Function Using a $3$D Digital Waveguide
                                  Mesh . . . . . . . . . . . . . . . . . . 453--464
          Hüseyin Hacıhabiboğlu   Theoretical Analysis of Open Spherical
                                  Microphone Arrays for Acoustic Intensity
                                  Measurements . . . . . . . . . . . . . . 465--476
                 Taemin Cho and   
                    J. P. Bello   On the Relative Importance of Individual
                                  Components of Chord Recognition Systems  477--492
                  T. Otsuka and   
                K. Ishiguro and   
                  H. Sawada and   
                    H. G. Okuno   Bayesian Nonparametrics for Microphone
                                  Array Processing . . . . . . . . . . . . 493--504
                 Jianjun He and   
                Ee-Leng Tan and   
                  Woon-Seng Gan   Linear Estimation Based Primary-Ambient
                                  Extraction for Stereo Audio Signals  . . 505--517
                S. Gonzalez and   
                     M. Brookes   PEFAC --- A Pitch Estimation Algorithm
                                  Robust to High Levels of Noise . . . . . 518--530
                  Min Zhang and   
               Xiangyu Duan and   
                  Wenliang Chen   Bayesian Constituent Context Model for
                                  Grammar Induction  . . . . . . . . . . . 531--541
            Dah-Chung Chang and   
                    Fei-Tao Chu   Feedforward Active Noise Control With a
                                  New Variable Tap-Length and Step-Size
                                  Filtered-X LMS Algorithm . . . . . . . . 542--555
                 M. McVicar and   
        R. Santos-Rodriguez and   
                  Yizhao Ni and   
                    Tijl De Bie   Automatic Chord Estimation from Audio: a
                                  Review of the State of the Art . . . . . 556--575
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing --- EDICS  . . . 576--577
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 578--579
                      Anonymous   Open Access  . . . . . . . . . . . . . . 580--580
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 3, March, 2014

                      Anonymous   Table of Contents  . . . . . . . . . . . 581--582
                      Anonymous   Table of Contents  . . . . . . . . . . . 583--584
             Chung-Hsien Wu and   
              Yi-Chin Huang and   
              Chung-Han Lee and   
                  Jun-Cheng Guo   Synthesis of Spontaneous Speech With
                                  Syllable Contraction Using State-Based
                                  Context-Dependent Voice Transformation   585--595
              M. Airaksinen and   
                  T. Raitio and   
                   B. Story and   
                        P. Alku   Quasi Closed Phase Glottal Inverse
                                  Filtering Analysis With Weighted Linear
                                  Prediction . . . . . . . . . . . . . . . 596--607
                Jae-Mo Yang and   
                  Hong-Goo Kang   Online Speech Dereverberation Algorithm
                                  Based on Adaptive Multichannel Linear
                                  Prediction . . . . . . . . . . . . . . . 608--619
                   A. Asaei and   
               M. Golbabaee and   
                H. Bourlard and   
                      V. Cevher   Structured Sparsity Models for
                                  Reverberant Speech Separation  . . . . . 620--633
              R. S. Rashobh and   
             A. W. H. Khong and   
                         Di Liu   Multichannel Equalization in the KLT and
                                  Frequency Domains With Application to
                                  Speech Dereverberation . . . . . . . . . 634--646
            P. Samarasinghe and   
              T. Abhayapala and   
                     M. Poletti   Wavefield Analysis Over Large Areas
                                  Using Distributed Higher Order
                                  Microphones  . . . . . . . . . . . . . . 647--658
                 Wen-Li Wei and   
             Chung-Hsien Wu and   
               Jen-Chun Lin and   
                         Han Li   Exploiting Psychological Factors for
                                  Interaction Style Recognition in Spoken
                                  Conversation . . . . . . . . . . . . . . 659--671
            S. A. Raczyński and   
                     E. Vincent   Genre-Based Music Language Modeling with
                                  Latent Hierarchical Pitman-Yor Process
                                  Allocation . . . . . . . . . . . . . . . 672--681
                   Dalei Wu and   
               Wei-Ping Zhu and   
                 M. N. S. Swamy   The Theory of Compressive Sensing
                                  Matching Pursuit Considering Time-domain
                                  Noise with Application to Speech
                                  Enhancement  . . . . . . . . . . . . . . 682--696
           T. Nanjundaswamy and   
                        K. Rose   Cascaded Long Term Prediction for
                                  Enhanced Compression of Polyphonic Audio
                                  Signals  . . . . . . . . . . . . . . . . 697--710
               K. Audhkhasi and   
                A. M. Zavou and   
             P. G. Georgiou and   
                S. S. Narayanan   Theoretical Analysis of Diversity in an
                                  Ensemble of Automatic Speech Recognition
                                  Systems  . . . . . . . . . . . . . . . . 711--726
                 J. Nikunen and   
                    T. Virtanen   Direction of Arrival Based Spatial
                                  Covariance Model for Blind Sound Source
                                  Separation . . . . . . . . . . . . . . . 727--739
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 741--742
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 743--744
                      Anonymous   Open Access  . . . . . . . . . . . . . . 745--745
                      Anonymous   Publish your article in IEEE Access  . . 746--746
                      Anonymous   [Blank page] . . . . . . . . . . . . . . B740
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 4, April, 2014

                      Anonymous   Table of contents  . . . . . . . . . . . 741--742
                      Anonymous   Table of contents  . . . . . . . . . . . 743--744
                   Jinyu Li and   
                    Li Deng and   
                 Yifan Gong and   
                 R. Haeb-Umbach   An Overview of Noise-Robust Automatic
                                  Speech Recognition . . . . . . . . . . . 745--777
                R. Sarikaya and   
               G. E. Hinton and   
                      A. Deoras   Application of Deep Belief Networks for
                                  Natural Language Understanding . . . . . 778--784
                 R. Serizel and   
                  M. Moonen and   
                B. Van Dijk and   
                     J. Wouters   Low-rank Approximation Based
                                  Multichannel Wiener Filter Algorithms
                                  for Noise Reduction with Application in
                                  Cochlear Implants  . . . . . . . . . . . 785--799
                  M. Crocco and   
                      A. Trucco   Design of Superdirective Planar Arrays
                                  With Sparse Aperiodic Layouts for
                                  Processing Broadband Signals via $3$-D
                                  Beamforming  . . . . . . . . . . . . . . 800--815
               J. R. Zapata and   
            M. E. P. Davies and   
                       E. Gomez   Multi-Feature Beat Tracking  . . . . . . 816--825
               A. Narayanan and   
                   Deliang Wang   Investigation of Speech Separation as a
                                  Front-End for Noise Robust Speech
                                  Recognition  . . . . . . . . . . . . . . 826--835
               Xiaojia Zhao and   
                Yuxuan Wang and   
                   Deliang Wang   Robust Speaker Identification in Noisy
                                  and Reverberant Conditions . . . . . . . 836--845
                  S. Cumani and   
                  O. Plchot and   
                      P. Laface   On the use of $i$-vector posterior
                                  distributions in Probabilistic Linear
                                  Discriminant Analysis  . . . . . . . . . 846--857
             Chung-Hsien Wu and   
              Han-Ping Shen and   
                  Yan-Ting Yang   Chinese--English Phone Set Construction
                                  for Code-Switching ASR Using Acoustic
                                  and DNN-Extracted Articulatory Features  858--862
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 863--864
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 865--866
                      Anonymous   Open Access  . . . . . . . . . . . . . . 867--867
                      Anonymous   Publish your article in IEEE Access  . . 868--868
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 5, May, 2014

                      Anonymous   Table of Contents  . . . . . . . . . . . 869--870
                      Anonymous   Table of Contents  . . . . . . . . . . . 871--872
               Weibin Zhang and   
                        P. Fung   Discriminatively Trained Sparse Inverse
                                  Covariance Matrices for Speech
                                  Recognition  . . . . . . . . . . . . . . 873--882
                Hung-yi Lee and   
             Sz-Rung Shiang and   
             Ching-Feng Yeh and   
              Yun-Nung Chen and   
                   Yu Huang and   
              Sheng-Yi Kong and   
                   Lin-shan Lee   Spoken Knowledge Organization by
                                  Semantic Structuring and a Prototype
                                  Course Lecture System for Personalized
                                  Learning . . . . . . . . . . . . . . . . 883--898
                     L. Zão and   
                  R. Coelho and   
                    P. Flandrin   Speech Enhancement with EMD and
                                  Hurst-Based Mode Selection . . . . . . . 899--911
              D. Giacobello and   
          M. G. Christensen and   
               T. L. Jensen and   
               M. N. Murthi and   
               S. H. Jensen and   
                      M. Moonen   Stable $1$-Norm Error Minimization Based
                                  Linear Predictors for Speech Modeling    912--922
        Y. Lacouture-Parodi and   
            E. A. P. Habets and   
              Jingdong Chen and   
                     J. Benesty   Multichannel Noise Reduction in the
                                  Karhunen--Loève Expansion Domain . . . . 923--936
              S. O. Sadjadi and   
                J. H. L. Hansen   Blind Spectral Weighting for Robust
                                  Speaker Identification under
                                  Reverberation Mismatch . . . . . . . . . 937--945
                 G. Mantena and   
                 S. Achanta and   
                   K. Prahallad   Query-by-Example Spoken Term Detection
                                  using Frequency Domain Linear Prediction
                                  and Non-Segmental Dynamic Time Warping   946--955
               C. Osterwise and   
                    S. L. Grant   On Over-Determined Frequency Domain BSS  956--966
              D. P. Jarrett and   
                 M. Taseska and   
            E. A. P. Habets and   
                   P. A. Naylor   Noise Reduction in the Spherical
                                  Harmonic Domain Using a Tradeoff
                                  Beamformer and Narrowband DOA Estimates  967--978
                  V. Rieser and   
                   O. Lemon and   
                      S. Keizer   Natural Language Generation as
                                  Incremental Planning Under Uncertainty:
                                  Adaptive Information Presentation for
                                  Statistical Dialogue Systems . . . . . . 979--994
                   J. Cheer and   
                  S. J. Elliott   Comments on ``Complete Parallel
                                  Narrowband Active Noise Control
                                  Systems''  . . . . . . . . . . . . . . . 995--996
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 999--1000
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 1001--1002
                      Anonymous   [Blank page] . . . . . . . . . . . . . . B997--B998
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 6, June, 2014

                      Anonymous   Table of contents  . . . . . . . . . . . 999--1000
                      Anonymous   Table of contents  . . . . . . . . . . . 1001--1002
                   V. Arora and   
                      L. Behera   Musical Source Clustering and
                                  Identification in Polyphonic Audio . . . 1003--1012
                 R. C. Nongpiur   Design of Minimax Broadband Beamformers
                                  that are Robust to Microphone Gain,
                                  Phase, and Position Errors . . . . . . . 1013--1022
            A. Venkitaraman and   
             C. S. Seelamantula   Binaural Signal Processing Motivated
                                  Generalized Analytic Signal Construction
                                  and AM--FM Demodulation  . . . . . . . . 1023--1036
               J. T. Geiger and   
                F. Weninger and   
              J. F. Gemmeke and   
                 M. Wollmer and   
                B. Schuller and   
                      G. Rigoll   Memory-Enhanced Neural Networks and NMF
                                  for Robust ASR . . . . . . . . . . . . . 1037--1046
               Haiquan Zhao and   
                      Yi Yu and   
                 Shibin Gao and   
             Xiangping Zeng and   
                    Zhengyou He   Memory Proportionate APA with Individual
                                  Activation Factors for Acoustic Echo
                                  Cancellation . . . . . . . . . . . . . . 1047--1055
               M. J. Gangeh and   
                  P. Fewzee and   
                  A. Ghodsi and   
                M. S. Kamel and   
                      F. Karray   Multiview Supervised Dictionary Learning
                                  in Speech Emotion Recognition  . . . . . 1056--1068
               Jae-Hun Choi and   
                Joon-Hyuk Chang   Dual-Microphone Voice Activity Detection
                                  Technique Based on Two-Step Power Level
                                  Difference Ratio . . . . . . . . . . . . 1069--1081
          X. Alameda-Pineda and   
                      R. Horaud   A Geometric Approach to Sound Source
                                  Localization from Time-Delay Estimates   1082--1095
                  K. Reindl and   
                   S. Meier and   
                 H. Barfuss and   
                  W. Kellermann   Minimum Mutual Information-Based
                                  Linearly Constrained Broadband Signal
                                  Extraction . . . . . . . . . . . . . . . 1096--1108
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1109--1110
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 1111--1112
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 7, July, 2014

                      Anonymous   Table of Contents  . . . . . . . . . . . 1113--1114
                      Anonymous   Table of Contents  . . . . . . . . . . . 1115--1116
               M. H. Bahari and   
                   N. Dehak and   
               H. Van hamme and   
                  L. Burget and   
                  A. M. Ali and   
                       J. Glass   Non-Negative Factor Analysis of Gaussian
                                  Mixture Model Weight Adaptation for
                                  Language and Dialect Recognition . . . . 1117--1129
              Guangzhao Bao and   
                 Yangfei Xu and   
                     Zhongfu Ye   Learning a Discriminative Dictionary for
                                  Single-Channel Speech Separation . . . . 1130--1138
                I. J. Kelly and   
                   F. M. Boland   Detecting Arrivals in Room Impulse
                                  Responses With Dynamic Time Warping  . . 1139--1147
             M. Guldenschuh and   
                 R. de Callafon   Detection of Secondary-Path
                                  Irregularities in Active Noise Control
                                  Headphones . . . . . . . . . . . . . . . 1148--1157
             Sin-Horng Chen and   
            Chiao-Hua Hsieh and   
             Chen-Yu Chiang and   
             Hsi-Chun Hsiao and   
                Yih-Ru Wang and   
               Yuan-Fu Liao and   
                    Hsiu-Min Yu   Modeling of Speaking Rate Influences on
                                  Mandarin Speech Prosody and Its
                                  Application to Speaking Rate-controlled
                                  TTS  . . . . . . . . . . . . . . . . . . 1158--1171
             D. Comminiello and   
              M. Scarpiniti and   
       L. A. Azpicueta-Ruiz and   
           J. Arenas-Garcia and   
                      A. Uncini   Nonlinear Acoustic Echo Cancellation
                                  Based on Sparse Functional Link
                                  Representations  . . . . . . . . . . . . 1172--1183
                  Wen Zhang and   
               T. D. Abhayapala   Three Dimensional Sound Field
                                  Reproduction using Multiple Circular
                                  Loudspeaker Arrays: Functional Analysis
                                  Guided Approach  . . . . . . . . . . . . 1184--1194
                 M. Taseska and   
                E. A. P. Habets   Informed Spatial Filtering for Sound
                                  Extraction Using Distributed Microphone
                                  Arrays . . . . . . . . . . . . . . . . . 1195--1207
                    Mo Shen and   
                D. Kawahara and   
                   S. Kurohashi   Dependency Parse Reranking with Rich
                                  Subtree Features . . . . . . . . . . . . 1208--1218
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1221--1222
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 1223--1224
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1225--1225
                      Anonymous   [Blank page] . . . . . . . . . . . . . . B1219--B1220
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 8, August, 2014

                      Anonymous   Table of contents  . . . . . . . . . . . 1221--1222
                      Anonymous   Table of contents  . . . . . . . . . . . 1223--1224
                  Zhibao Li and   
               K. F. C. Yiu and   
                    S. Nordholm   On the Indoor Beamformer Design With
                                  Reverberation  . . . . . . . . . . . . . 1225--1235
                M. B. Hawes and   
                        Wei Liu   Sparse Array Design for Wideband
                                  Beamforming With Reduced Complexity in
                                  Tapped Delay-Lines . . . . . . . . . . . 1236--1247
               Yi FanChiang and   
              Cheng-Wen Wei and   
                 Yi-Le Meng and   
                 Yu-Wen Lin and   
               Shyh-Jye Jou and   
              Tian-Sheuan Chang   Low Complexity Formant Estimation
                                  Adaptive Feedback Cancellation for
                                  Hearing Aids Using Pitch Based
                                  Processing . . . . . . . . . . . . . . . 1248--1259
                   S. Conan and   
                 O. Derrien and   
                 M. Aramaki and   
                   S. Ystad and   
           R. Kronland-Martinet   A Synthesis Model With Intuitive Control
                                  Capabilities for Rolling Sounds  . . . . 1260--1273
                 C. Schuldt and   
                      P. Handel   Decay Rate Estimators and Their
                                  Performance for Blind Reverberation Time
                                  Estimation . . . . . . . . . . . . . . . 1274--1284
               S. Ganapathy and   
              S. H. Mallidi and   
                   H. Hermansky   Robust Feature Extraction Using
                                  Modulation Filtering of Autoregressive
                                  Models . . . . . . . . . . . . . . . . . 1285--1295
                      Bo Li and   
                   Khe Chai Sim   A Spectral Masking Approach to
                                  Noise-Robust Speech Recognition Using
                                  Deep Neural Networks . . . . . . . . . . 1296--1305
                  E. Yilmaz and   
              J. F. Gemmeke and   
                   H. Van hamme   Noise Robust Exemplar Matching Using
                                  Sparse Representations of Speech . . . . 1306--1319
                  D. Schmid and   
                  G. Enzner and   
                   S. Malik and   
                 D. Kolossa and   
                      R. Martin   Variational Bayesian Inference for
                                  Multichannel Dereverberation and Noise
                                  Reduction  . . . . . . . . . . . . . . . 1320--1335
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1336--1337
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 1338--1339
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1340--1340
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 9, September, 2014

                      Anonymous   Table of Contents  . . . . . . . . . . . 1341--1342
                      Anonymous   Table of Contents  . . . . . . . . . . . 1343--1344
                 B. Masiero and   
                   M. Vorlander   A Framework for the Calculation of
                                  Dynamic Crosstalk Cancellation Filters   1345--1354
                 A. Schasse and   
                      R. Martin   Estimation of Subband Speech
                                  Correlations for Noise Reduction via
                                  MVDR Processing  . . . . . . . . . . . . 1355--1365
             Michal Novotný and   
                   Jan Rusz and   
               Roman Čmejla and   
                  Evžen Růžička   Automatic Evaluation of Articulatory
                                  Disorders in Parkinson's Disease . . . . 1366--1378
                     F. Lim and   
             Wancheng Zhang and   
            E. A. P. Habets and   
                   P. A. Naylor   Robust Multichannel Dereverberation
                                  using Relaxed Multichannel Least Squares 1379--1390
           S. H. Ghalehjegh and   
                     R. C. Rose   Linear Regression Based Acoustic
                                  Adaptation for the Subspace Gaussian
                                  Mixture Model  . . . . . . . . . . . . . 1391--1402
                   J. Botts and   
                     L. Savioja   Spectral and Pseudospectral Properties
                                  of Finite Difference Models Used in
                                  Audio and Room Acoustics . . . . . . . . 1403--1412
                 Yong Xiang and   
           I. Natgunanathan and   
                   Song Guo and   
                Wanlei Zhou and   
                   S. Nahavandi   Patchwork-Based Audio Watermarking
                                  Method Robust to De-synchronization
                                  Attacks  . . . . . . . . . . . . . . . . 1413--1423
               I. V. McLoughlin   Super-Audible Voice Activity Detection   1424--1433
                A. Alinaghi and   
              P. J. Jackson and   
                 Qingju Liu and   
                     Wenwu Wang   Joint Mixing Vector and Binaural Model
                                  Based Stereo Source Separation . . . . . 1434--1448
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1451--1452
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 1453--1454
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1455--1455
                      Anonymous   Together, we are advancing technology    1456--1456
                      Anonymous   [Blank page] . . . . . . . . . . . . . . B1449--B1450
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 10, October, 2014

                      Anonymous   Table of contents  . . . . . . . . . . . 1451--1452
                      Anonymous   Table of contents  . . . . . . . . . . . 1453--1454
                Liheng Zhao and   
                 J. Benesty and   
                  Jingdong Chen   Design of Robust Differential Microphone
                                  Arrays . . . . . . . . . . . . . . . . . 1455--1466
                    P. Jain and   
                  R. B. Pachori   Event-Based Method for Instantaneous
                                  Fundamental Frequency Estimation from
                                  Voiced Speech Based on Eigenvalue
                                  Decomposition of the Hankel Matrix . . . 1467--1482
                 Y. Vaizman and   
                   B. McFee and   
                   G. Lanckriet   Codebook-Based Audio Feature
                                  Representation for Music Information
                                  Retrieval  . . . . . . . . . . . . . . . 1483--1493
                  O. Nadiri and   
                     B. Rafaely   Localization of Multiple Speakers under
                                  High Reverberation using a Spherical
                                  Microphone Array and the Direct-Path
                                  Dominance Test . . . . . . . . . . . . . 1494--1505
                Zhizheng Wu and   
                T. Virtanen and   
             Eng Siong Chng and   
                     Haizhou Li   Exemplar-Based Sparse Representation
                                  With Residual Compensation for Voice
                                  Conversion . . . . . . . . . . . . . . . 1506--1521
             D. S. Talagala and   
                  Wen Zhang and   
               T. D. Abhayapala   Efficient Multi-Channel Adaptive Room
                                  Compensation for Spatial Soundfield
                                  Reproduction Using a Modal Decomposition 1522--1532
             O. Abdel-Hamid and   
              A.-R. Mohamed and   
                  Hui Jiang and   
                    Li Deng and   
                    G. Penn and   
                        Dong Yu   Convolutional Neural Networks for Speech
                                  Recognition  . . . . . . . . . . . . . . 1533--1545
                  S. Koyama and   
                  K. Furuya and   
                Y. Hiwasaki and   
                  Y. Haneda and   
                      Y. Suzuki   Wave Field Reconstruction Filtering in
                                  Cylindrical Harmonic Domain for
                                  With-Height Recording and Reproduction   1546--1557
             Chia-Ping Chen and   
              Yi-Chin Huang and   
             Chung-Hsien Wu and   
                    Kuan-De Lee   Polyglot Speech Synthesis Based on
                                  Cross-Lingual Frame Selection Using
                                  Auditory and Articulatory Features . . . 1558--1570
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1571--1572
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 1573--1574
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1575--1575
                      Anonymous   Together, we are advancing technology    1576--1576
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 11, November, 2014

                      Anonymous   Table of Contents  . . . . . . . . . . . 1577--1578
                      Anonymous   Table of Contents  . . . . . . . . . . . 1579--1580
                    Jian Xu and   
                Zhi-Jie Yan and   
                      Qiang Huo   An Unsupervised Adaptation Approach to
                                  Leveraging Feedback Loop Data by Using
                                  $i$-Vector for Data Clustering and
                                  Selection  . . . . . . . . . . . . . . . 1581--1589
                  S. Cumani and   
                      P. Laface   Large-Scale Training of Pairwise Support
                                  Vector Machines for Speaker Recognition  1590--1600
                     Jun Du and   
                      Qiang Huo   An Improved VTS Feature Compensation
                                  using Mixture Models of Distortion and
                                  IVN Training for Noisy Speech
                                  Recognition  . . . . . . . . . . . . . . 1601--1611
                  M. Togami and   
                   Y. Kawaguchi   Simultaneous Optimization of Acoustic
                                  Echo Reduction, Speech Dereverberation,
                                  and Noise Reduction against Mutual
                                  Interference . . . . . . . . . . . . . . 1612--1623
                 J. Lorente and   
                  M. Ferrer and   
                M. de Diego and   
                    A. Gonzalez   GPU Implementation of Multichannel
                                  Adaptive Algorithms for Local Active
                                  Noise Control  . . . . . . . . . . . . . 1624--1635
                       T. Helie   Simulation of Fractional-Order Low-Pass
                                  Filters  . . . . . . . . . . . . . . . . 1636--1647
                B. Defraene and   
         T. van Waterschoot and   
                   M. Diehl and   
                      M. Moonen   Embedded-Optimization-Based Loudspeaker
                                  Precompensation Using a Hammerstein
                                  Loudspeaker Model  . . . . . . . . . . . 1648--1659
              Guangsen Wang and   
                   Khe Chai Sim   Regression-Based Context-Dependent
                                  Modeling of Deep Neural Networks for
                                  Speech Recognition . . . . . . . . . . . 1660--1669
                  R. Badeau and   
                 M. D. Plumbley   Multichannel High-Resolution NMF for
                                  Modeling Convolutive Mixtures of
                                  Non-Stationary Signals in the
                                  Time-Frequency Domain  . . . . . . . . . 1670--1680
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1683--1684
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 1685--1686
                      Anonymous   [Blank page] . . . . . . . . . . . . . . B1681--B1682
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 22, Number 12, December, 2014

                      Anonymous   Table of contents  . . . . . . . . . . . 1683--1685
                        L. Deng   Farewell editorial: Keeping up the
                                  momentum of innovations  . . . . . . . . 1687--1687
                S. H. Yella and   
                    H. Bourlard   Overlapping Speech Detection Using
                                  Long-Term Conversational Features for
                                  Speaker Diarization in Meeting Room
                                  Conversations  . . . . . . . . . . . . . 1688--1700
            R. K. Chivukula and   
               Y. A. Reznik and   
                  Yanyan Hu and   
               V. Devarajan and   
           M. Jayendra-Lakshman   Fast Algorithms for Low-Delay TDAC
                                  Filterbanks in MPEG-4 AAC--ELD . . . . . 1701--1712
                Shaofei Xue and   
             O. Abdel-Hamid and   
                  Hui Jiang and   
                 Lirong Dai and   
                   Qingfeng Liu   Fast Adaptation of Deep Neural Network
                                  Based on Discriminant Codes for Speech
                                  Recognition  . . . . . . . . . . . . . . 1713--1725
            M. E. P. Davies and   
                   P. Hamel and   
                  K. Yoshii and   
                        M. Goto   AutoMashUpper: Automatic Creation of
                                  Multi-Song Music Mashups . . . . . . . . 1726--1737
                  Chao Weng and   
              D. L. Thomson and   
                 P. Haffner and   
                 B.-H. F. Juang   Latent Semantic Rational Kernels for
                                  Topic Spotting on Conversational Speech  1738--1749
               N. Wachowski and   
            M. R. Azimi-Sadjadi   Detection and Classification of
                                  Nonstationary Transient Signals Using
                                  Sparse Approximations and Bayesian
                                  Networks . . . . . . . . . . . . . . . . 1750--1764
                G. Percival and   
                  G. Tzanetakis   Streamlined Tempo Estimation Based on
                                  Autocorrelation and Cross-correlation
                                  With Pulses  . . . . . . . . . . . . . . 1765--1776
               A. Barkefors and   
                 M. Sternad and   
                L.-J. Brannmark   Design and Analysis of Linear Quadratic
                                  Gaussian Feedforward Controllers for
                                  Active Noise Control . . . . . . . . . . 1777--1791
                   M. Cobos and   
         J. J. Perez-Solano and   
          S. Felici-Castell and   
                  J. Segura and   
                  J. M. Navarro   Cumulative-Sum-Based Localization of
                                  Sound Events in Low-Cost Wireless
                                  Acoustic Sensor Networks . . . . . . . . 1792--1802
               V. Tourbabin and   
                     B. Rafaely   Theoretical Framework for the
                                  Optimization of Microphone Array
                                  Configuration for Humanoid Robot
                                  Audition . . . . . . . . . . . . . . . . 1803--1814
                Y. Zakharov and   
               V. H. Nascimento   Sliding-Window RLS Low-Cost
                                  Implementation of Proportionate Affine
                                  Projection Algorithms  . . . . . . . . . 1815--1824
                S. D'Angelo and   
                    V. Valimaki   Generalized Moog Ladder Filter: Part I
                                  --- Linear Analysis and Parameterization 1825--1832
                    Na Yang and   
                      He Ba and   
                Weiyang Cai and   
                I. Demirkol and   
                  W. Heinzelman   BaNa: a Noise Resilient Fundamental
                                  Frequency Detection Algorithm for Speech
                                  and Music  . . . . . . . . . . . . . . . 1833--1848
                Yuxuan Wang and   
               A. Narayanan and   
                   Deliang Wang   On Training Targets for Supervised
                                  Speech Separation  . . . . . . . . . . . 1849--1858
              Ling-Hui Chen and   
              Zhen-Hua Ling and   
                Li-Juan Liu and   
                    Li-Rong Dai   Voice Conversion Using Deep Neural
                                  Networks With Layer-Wise Generative
                                  Training . . . . . . . . . . . . . . . . 1859--1872
                S. D'Angelo and   
                    V. Valimaki   Generalized Moog Ladder Filter: Part II
                                  --- Explicit Nonlinear Model through a
                                  Novel Delay-Free Loop Implementation
                                  Method . . . . . . . . . . . . . . . . . 1873--1883
                   Z. Rafii and   
                Zhiyao Duan and   
                       B. Pardo   Combining Rhythm-Based and Pitch-Based
                                  Methods for Background and Melody
                                  Separation . . . . . . . . . . . . . . . 1884--1893
                    J. Ramo and   
                V. Valimaki and   
                        B. Bank   High-Precision Parallel Graphic
                                  Equalizer  . . . . . . . . . . . . . . . 1894--1904
               Y. Panagakis and   
          C. L. Kotropoulos and   
                     G. R. Arce   Music Genre Classification via Joint
                                  Sparse Low-Rank Representation of Audio
                                  Features . . . . . . . . . . . . . . . . 1905--1917
                 A. Maezawa and   
                 K. Itoyama and   
                  K. Yoshii and   
                    H. G. Okuno   Nonparametric Bayesian Dereverberation
                                  of Power Spectrograms Based on
                                  Infinite-Order Autoregressive Processes  1918--1930
                M. Krawczyk and   
                    T. Gerkmann   STFT Phase Reconstruction in Voiced
                                  Speech for an Improved Single-Channel
                                  Speech Enhancement . . . . . . . . . . . 1931--1940
                V. Khanagha and   
                  K. Daoudi and   
                    H. M. Yahia   Detection of Glottal Closure Instants
                                  Based on the Microcanonical Multiscale
                                  Formalism  . . . . . . . . . . . . . . . 1941--1950
               A. Venturini and   
                     L. Zao and   
                      R. Coelho   On speech features fusion,
                                  $\alpha$-integration Gaussian modeling
                                  and multi-style training for noise
                                  robust speaker classification  . . . . . 1951--1964
                  P. Foster and   
                   M. Mauch and   
                       S. Dixon   Sequential Complexity as a Descriptor
                                  for Musical Similarity . . . . . . . . . 1965--1977
                   Gang Liu and   
                J. H. L. Hansen   An Investigation into Back-end
                                  Advancements for Speaker Recognition in
                                  Multi-Session and Noisy Enrollment
                                  Scenarios  . . . . . . . . . . . . . . . 1978--1992
                Jitong Chen and   
                Yuxuan Wang and   
                   Deliang Wang   A Feature Study for Classification-Based
                                  Speech Separation at Low Signal-to-Noise
                                  Ratios . . . . . . . . . . . . . . . . . 1993--2002
              J. van Mourik and   
                      D. Murphy   Explicit Higher-Order FDTD Schemes for
                                  $3$D Room Acoustic Simulation  . . . . . 2003--2011
              Pei Chee Yong and   
                S. Nordholm and   
                  Hai Huyen Dam   Effective Binaural Multi-Channel
                                  Processing Algorithm for Improved
                                  Environmental Presence . . . . . . . . . 2012--2024
                    A. Chen and   
         M. A. Hasegawa-Johnson   Mixed Stereo Audio Classification Using
                                  a Stereo-Input Mixed-to-Panned Level
                                  Feature  . . . . . . . . . . . . . . . . 2025--2033
             Gongping Huang and   
                 J. Benesty and   
                   Tao Long and   
                  Jingdong Chen   A Family of Maximum SNR Filters for
                                  Noise Reduction  . . . . . . . . . . . . 2034--2047
                     Su Yan and   
                    Xiaojun Wan   SRRank: Leveraging Semantic Roles for
                                  Extractive Multi-Document Summarization  2048--2058
               H. Tachibana and   
                     N. Ono and   
                 H. Kameoka and   
                    S. Sagayama   Harmonic/Percussive Sound Separation
                                  Based on Anisotropic Smoothness of
                                  Spectrograms . . . . . . . . . . . . . . 2059--2073
            J. M. Gil-Cacho and   
         T. van Waterschoot and   
                  M. Moonen and   
                   S. H. Jensen   A Frequency-Domain Adaptive Filter
                                  (FDAF) Prediction Error Method (PEM)
                                  Framework for Double-Talk-Robust
                                  Acoustic Echo Cancellation . . . . . . . 2074--2086
                    Qi Wang and   
                  W. L. Woo and   
                     S. S. Dlay   Informed Single-Channel Speech
                                  Separation Using HMM--GMM User-Generated
                                  Exemplar Source  . . . . . . . . . . . . 2087--2100
                    D. Erro and   
               T.-C. Zorila and   
                   Y. Stylianou   Enhancing the Intelligibility of
                                  Statistically Generated Synthetic Speech
                                  by Means of Noise-Independent
                                  Modifications  . . . . . . . . . . . . . 2101--2111
                   Yi Jiang and   
               Deliang Wang and   
               Runsheng Liu and   
                  ZhenMing Feng   Binaural Classification for Reverberant
                                  Speech Segregation Using Deep Neural
                                  Networks . . . . . . . . . . . . . . . . 2112--2121
                      Li Su and   
              Hsin-Ming Lin and   
                  Yi-Hsuan Yang   Sparse Modeling of Magnitude and
                                  Phase-Derived Spectra for Playing
                                  Technique Classification . . . . . . . . 2122--2132
                V. V. Reddy and   
             A. W. H. Khong and   
                    Boon Poh Ng   Unambiguous Speech DOA Estimation Under
                                  Spatial Aliasing Conditions  . . . . . . 2133--2145
               A. Mohammadi and   
              S. S. Sarfjoo and   
                   C. Demiroglu   Eigenvoice Speaker Adaptation with
                                  Minimal Data for Statistical Speech
                                  Synthesis Systems Using a MAP Approach
                                  and Nearest-Neighbors  . . . . . . . . . 2146--2157
                    Kun Han and   
                   Deliang Wang   Neural Network Based Pitch Tracking in
                                  Very Noisy Speech  . . . . . . . . . . . 2158--2168
               Yongsheng Mu and   
                 Peifeng Ji and   
                     Wei Ji and   
                    Ming Wu and   
                       Jun Yang   Modeling and Compensation for the
                                  Distortion of Parametric Loudspeakers
                                  Using a One-Dimension Volterra Filter    2169--2181
               O. Thiergart and   
                 M. Taseska and   
                E. A. P. Habets   An Informed Parametric Spatial Filter
                                  Based on Instantaneous
                                  Direction-of-Arrival Estimates . . . . . 2182--2196
               J. F. Santos and   
                     T. H. Falk   Updating the SRMR--CI Metric for
                                  Improved Intelligibility Prediction for
                                  Cochlear Implant Users . . . . . . . . . 2197--2206
               Seon Man Kim and   
                  Hong Kook Kim   Direction-of-Arrival Based SNR
                                  Estimation for Dual-Microphone Speech
                                  Enhancement  . . . . . . . . . . . . . . 2207--2217
                  T. Otsuka and   
                K. Ishiguro and   
                T. Yoshioka and   
                  H. Sawada and   
                    H. G. Okuno   Multichannel Sound Source
                                  Dereverberation and Separation for
                                  Arbitrary Number of Sources Based on
                                  Bayesian Nonparametrics  . . . . . . . . 2218--2232
                    J. Traa and   
                   P. Smaragdis   Multichannel Source Separation and
                                  Tracking With RANSAC and Directional
                                  Statistics . . . . . . . . . . . . . . . 2233--2243
                 Weifeng Li and   
              Longbiao Wang and   
                Yicong Zhou and   
                   J. Dines and   
            M. Magimai-Doss and   
                H. Bourlard and   
                   Qingmin Liao   Feature Mapping of Multiple Beamformed
                                  Sources for Robust Overlapping Speech
                                  Recognition Using a Microphone Array . . 2244--2255
               Y. FanChiang and   
                  C.-W. Wei and   
                 Y.-L. Meng and   
                  Y.-W. Lin and   
                  S.-J. Jou and   
                    T.-S. Chang   Correction to ``Low Complexity Formant
                                  Estimation Adaptive Feedback
                                  Cancellation for Hearing Aids Using
                                  Pitch Based Processing'' [Aug \bf 14
                                  1248--1259]  . . . . . . . . . . . . . . 2256--2256
                      Anonymous   List of Reviewers  . . . . . . . . . . . 2257--2259
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 2260--2261
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Information for
                                  Authors  . . . . . . . . . . . . . . . . 2262--2263
                      Anonymous   2014 Index IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Vol. 22  . . . . . . . . . . . . . . . . 2264--2288
                      Anonymous   [Blank page] . . . . . . . . . . . . . . B1686
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 1, January, 2015

                      Anonymous   Table of contents  . . . . . . . . . . . 1--2
                      Anonymous   Table of contents  . . . . . . . . . . . 3--4
                          H. Li   Inaugural Editorial: Embracing New
                                  Opportunities for Growth . . . . . . . . 5--6
                    Yong Xu and   
                     Jun Du and   
                Li-Rong Dai and   
                   Chin-Hui Lee   A Regression Approach to Speech
                                  Enhancement Based on Deep Neural
                                  Networks . . . . . . . . . . . . . . . . 7--19
                    H. Phan and   
                    M. Maas and   
                   R. Mazur and   
                     A. Mertins   Random Regression Forests for Acoustic
                                  Event Detection and Classification . . . 20--31
                  Yuntao Wu and   
                    L. Amir and   
               J. R. Jensen and   
                  Guisheng Liao   Joint Pitch and DOA Estimation Using the
                                  ESPRIT Method  . . . . . . . . . . . . . 32--45
              R. Decorsiere and   
          P. L. Søndergaard and   
            E. N. MacDonald and   
                         T. Dau   Inversion of Auditory Spectrograms,
                                  Traditional Spectrograms, and Other
                                  Envelope Representations . . . . . . . . 46--56
                J. Poignant and   
                L. Besacier and   
                      G. Quénot   Unsupervised Speaker Identification in
                                  TV Broadcast Based on Written Names  . . 57--68
                Renjie Tong and   
               Yingyue Zhou and   
                 Long Zhang and   
              Guangzhao Bao and   
                     Zhongfu Ye   A Robust Time-Frequency Decomposition
                                  Model for Suppression of Mixed
                                  Gaussian-Impulse Noise in Audio Signals  69--79
                   S. Ahani and   
            S. Ghaemmaghami and   
                     Z. J. Wang   A Sparse Representation-Based Wavelet
                                  Domain Speech Steganography Method . . . 80--91
               A. Narayanan and   
                   Deliang Wang   Improving Robustness of Deep Neural
                                  Network Acoustic Models via Speech
                                  Separation and Joint Adaptive Training   92--101
                Rongfeng Su and   
                Xunying Liu and   
                       Lan Wang   Automatic Complexity Control of
                                  Generalized Variable Parameter HMMs for
                                  Noise Robust Speech Recognition  . . . . 102--114
               Zixing Zhang and   
                E. Coutinho and   
                   Jun Deng and   
                    B. Schuller   Cooperative Learning and its Application
                                  to Emotion Recognition from Speech . . . 115--126
                 Pei-hao Su and   
              Chuan-hsun Wu and   
                   Lin-shan Lee   A Recursive Dialogue Game for
                                  Personalized Computer-Aided
                                  Pronunciation Training . . . . . . . . . 127--141
           A. Rakotomamonjy and   
                       G. Gasso   Histogram of Gradients of
                                  Time--Frequency Representations for
                                  Audio Scene Classification . . . . . . . 142--153
            S. A. Khoubrouy and   
            I. M. S. Panahi and   
                J. H. L. Hansen   Howling Detection in Hearing Aids Based
                                  on Generalized Teager--Kaiser Operator   154--161
           J. B. B. Nielsen and   
                 J. Nielsen and   
                      J. Larsen   Perception-Based Personalization of
                                  Hearing Aids Using Gaussian Processes
                                  and Active Learning  . . . . . . . . . . 162--173
               J. R. Jensen and   
          M. G. Christensen and   
                 J. Benesty and   
                   S. H. Jensen   Joint Spatio-Temporal Filtering Methods
                                  for DOA and Fundamental Frequency
                                  Estimation . . . . . . . . . . . . . . . 174--185
                  J. Jensen and   
                  Zheng-Hua Tan   Minimum Mean-Square Error Estimation of
                                  Mel-Frequency Cepstral Features --- A
                                  Theoretically Consistent Approach  . . . 186--197
   C.-D. Martinez-Hinarejos and   
               J.-M. Benedi and   
                     V. Tamarit   Unsegmented Dialogue Act Annotation and
                                  Decoding With $N$-Gram Transducers . . . 198--211
                   Lin Wang and   
                   Zhe Chen and   
                    Fuliang Yin   A Novel Hierarchical Decomposition
                                  Vector Quantization Method for
                                  High-Order LPC Parameters  . . . . . . . 212--221
                      Anonymous   [Blank page] . . . . . . . . . . . . . . 222--222
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 223--224
                      Anonymous   Information for Authors  . . . . . . . . 225--226
                      Anonymous   Open Access  . . . . . . . . . . . . . . 227--227
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 2, February, 2015

                      Anonymous   Table of contents  . . . . . . . . . . . 223--224
                      Anonymous   Table of contents  . . . . . . . . . . . 225--226
                  Guang Hua and   
                     J. Goh and   
                 V. L. L. Thing   Time-Spread Echo-Based Audio
                                  Watermarking With Optimized
                                  Imperceptibility and Robustness  . . . . 227--239
                O. Schwartz and   
                  S. Gannot and   
                E. A. P. Habets   Multi-Microphone Speech Dereverberation
                                  and Noise Reduction Using Relative Early
                                  Transfer Functions . . . . . . . . . . . 240--251
                  E. Molina and   
               L. J. Tardon and   
            A. M. Barbancho and   
                   I. Barbancho   SiPTH: Singing Transcription Based on
                                  Hysteresis Defined on the Pitch-Time
                                  Curve  . . . . . . . . . . . . . . . . . 252--263
               Haipeng Wang and   
                    Tan Lee and   
           Cheung-Chi Leung and   
                     Bin Ma and   
                     Haizhou Li   Acoustic Segment Modeling with Spectral
                                  Clustering Methods . . . . . . . . . . . 264--277
                   V. Arora and   
                      L. Behera   Multiple F0 Estimation and Source
                                  Clustering of Polyphonic Music Audio
                                  Using PLCA and HMRFs . . . . . . . . . . 278--287
                 R. Sugiura and   
                Y. Kamamoto and   
                  N. Harada and   
                 H. Kameoka and   
                      T. Moriya   Resolution Warped Spectral
                                  Representation for Low-Delay and
                                  Low-Bit-Rate Audio Coder . . . . . . . . 288--299
                  Chao Weng and   
                 B.-H. F. Juang   Discriminative Training Using
                                  Non-Uniform Criteria for Keyword
                                  Spotting on Spontaneous Speech . . . . . 300--312
               Y. Matsuyama and   
                   A. Saito and   
                   S. Fujie and   
                   T. Kobayashi   Automatic Expressive Opinion Sentence
                                  Generation for Enjoyable Conversational
                                  Systems  . . . . . . . . . . . . . . . . 313--326
               P. N. Petkov and   
                   W. B. Kleijn   Spectral Dynamics Recovery for Enhanced
                                  Speech Intelligibility in Noise  . . . . 327--338
                  E. Bicici and   
                       D. Yuret   Optimizing Instance Selection for
                                  Statistical Machine Translation with
                                  Feature Decay Algorithms . . . . . . . . 339--350
              Mengqiu Zhang and   
              R. A. Kennedy and   
               T. D. Abhayapala   Empirical Determination of Frequency
                                  Representation in Spherical
                                  Harmonics-Based HRTF Functional Modeling 351--360
                Zu-Ren Feng and   
                  Qing Zhou and   
                  Jun Zhang and   
                 Ping Jiang and   
                   Xue-Wen Yang   A Target Guided Subband Filter for
                                  Acoustic Event Detection in Noisy
                                  Environments Using Wavelet Packets . . . 361--372
                N. Hirayama and   
                 K. Yoshino and   
                 K. Itoyama and   
                    S. Mori and   
                    H. G. Okuno   Automatic Speech Recognition for Mixed
                                  Dialect Utterances by Mixing Dialect
                                  Language Models  . . . . . . . . . . . . 373--382
                 A. Schasse and   
                T. Gerkmann and   
                  R. Martin and   
                  W. Sorgel and   
                 T. Pilgrim and   
                       H. Puder   Two-Stage Filter-Bank System for
                                  Improved Single-Channel Noise Reduction
                                  in Hearing Aids  . . . . . . . . . . . . 383--393
                B. Schwartz and   
                  S. Gannot and   
                E. A. P. Habets   Online Speech Dereverberation Using
                                  Kalman Filter and EM Algorithm . . . . . 394--406
                 B. Gerazov and   
                   Z. Ivanovski   Kernel Power Flow Orientation
                                  Coefficients for Noise-Robust Speech
                                  Recognition  . . . . . . . . . . . . . . 407--419
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 420--421
                      Anonymous   Information for Authors  . . . . . . . . 422--423
                      Anonymous   Open Access  . . . . . . . . . . . . . . 424--424
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page --- back cover]  . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 3, March, 2015

                      Anonymous   Table of Contents  . . . . . . . . . . . 425--426
                      H. Li and   
                M. Federico and   
                      X. He and   
                    H. Meng and   
                    I. Trancoso   Introduction to the Special Section on
                                  Continuous Space and Related Methods in
                                  Natural Language Processing  . . . . . . 427--430
                    H. Adel and   
              Ngoc Thang Vu and   
               K. Kirchhoff and   
                  D. Telaar and   
                     T. Schultz   Syntactic and Semantic Features For
                                  Code-Switching Factored Language Models  431--440
              Xiaodong Zeng and   
                 D. F. Wong and   
                 L. S. Chao and   
                    I. Trancoso   Graph-Based Lexicon Regularization for
                                  PCFG With Latent Annotations . . . . . . 441--450
              Wenliang Chen and   
                  Min Zhang and   
                      Yue Zhang   Distributed Feature Representations for
                                  Dependency Parsing . . . . . . . . . . . 451--460
                   Ruiji Fu and   
                  Jiang Guo and   
                   Bing Qin and   
               Wanxiang Che and   
               Haifeng Wang and   
                       Ting Liu   Learning Semantic Hierarchies: A
                                  Continuous Vector Space Approach . . . . 461--471
               R. E. Banchs and   
               L. F. D'Haro and   
                     Haizhou Li   Adequacy--Fluency Metrics: Evaluating MT
                                  in the Continuous Space Model Framework  472--482
                 Deyi Xiong and   
                  Min Zhang and   
                      Xing Wang   Topic-Based Coherence Modeling for
                                  Statistical Machine Translation  . . . . 483--493
              B. Hutchinson and   
               M. Ostendorf and   
                       M. Fazel   A Sparse Plus Low-Rank Exponential
                                  Language Model for Limited Resource
                                  Scenarios  . . . . . . . . . . . . . . . 494--504
           M. A. A. Rashwan and   
            A. A. Al Sallab and   
               H. M. Raafat and   
                       A. Rafea   Deep Learning Framework with Confused
                                  Sub-Set Resolution Architecture for
                                  Automatic Arabic Diacritization  . . . . 505--516
             M. Sundermeyer and   
                     H. Ney and   
                    R. Schluter   From Feedforward to Recurrent LSTM
                                  Neural Networks for Language Modeling    517--529
                  G. Mesnil and   
                 Y. Dauphin and   
               Kaisheng Yao and   
                  Y. Bengio and   
                    Li Deng and   
             D. Hakkani-Tur and   
                Xiaodong He and   
                    L. Heck and   
                     G. Tur and   
                    Dong Yu and   
                       G. Zweig   Using Recurrent Neural Networks for Slot
                                  Filling in Spoken Language Understanding 530--539
              I. McLoughlin and   
               Haomin Zhang and   
                Zhipeng Xie and   
                   Yan Song and   
                       Wei Xiao   Robust Sound Event Classification Using
                                  Deep Neural Networks . . . . . . . . . . 540--552
              D. Zahoransky and   
                     I. Polasek   Text Search of Surnames in Some Slavic
                                  and Other Morphologically Rich Languages
                                  Using Rule Based Phonetic Algorithms . . 553--563
              Yow-Bang Wang and   
                   Lin-shan Lee   Supervised Detection and Unsupervised
                                  Discovery of Pronunciation Error
                                  Patterns for Computer-Assisted Language
                                  Learning . . . . . . . . . . . . . . . . 564--579
               T. Nakashika and   
               T. Takiguchi and   
                       Y. Ariki   Voice Conversion Using RNN Pre-Trained
                                  by Recurrent Temporal Restricted
                                  Boltzmann Machines . . . . . . . . . . . 580--587
                    N. Obin and   
                  P. Lanchantin   Symbolic Modeling of Prosody: From
                                  Linguistics to Statistics  . . . . . . . 588--599
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 601--602
                      Anonymous   Information for Authors  . . . . . . . . 603--604
                      Anonymous   IEEE Member Digital Library  . . . . . . 606--606
                      Anonymous   Blank page . . . . . . . . . . . . . . . B600
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 4, April, 2015

                      Anonymous   Table of Contents  . . . . . . . . . . . 601--602
                      Anonymous   Table of Contents  . . . . . . . . . . . 603--604
              Langzhou Chen and   
          N. Braunschweiler and   
                 M. J. F. Gales   Speaker and Expression Factorization for
                                  Audiobook Data: Expressiveness and
                                  Transplantation  . . . . . . . . . . . . 605--618
                Xinjie Zhou and   
                Xiaojun Wan and   
                   Jianguo Xiao   CLOpinionMiner: Opinion Target
                                  Extraction in a Cross-Language Scenario  619--630
                   Pan Zhou and   
                  Hui Jiang and   
                Li-Rong Dai and   
                      Yu Hu and   
                  Qing-Feng Liu   State-Clustering Based Multiple Deep
                                  Neural Networks Modeling Approach for
                                  Speech Recognition . . . . . . . . . . . 631--642
                    Ying Hu and   
                   Guizhong Liu   Separation of Singing Voice Using
                                  Nonnegative Matrix Partial
                                  Co-Factorization for Singer
                                  Identification . . . . . . . . . . . . . 643--653
                D. Kitamura and   
              H. Saruwatari and   
                 H. Kameoka and   
              Yu. Takahashi and   
                   K. Kondo and   
                    S. Nakamura   Multichannel Signal Separation Combining
                                  Directional Clustering and Nonnegative
                                  Matrix Factorization with Spectrogram
                                  Restoration  . . . . . . . . . . . . . . 654--669
              Van-Khanh Mai and   
                  D. Pastor and   
            A. Aissa-El-Bey and   
                    R. Le-Bidan   Robust Estimation of Non-Stationary
                                  Noise Power Spectrum for Speech
                                  Enhancement  . . . . . . . . . . . . . . 670--682
                  E. Blanco and   
                    D. Moldovan   A Semantic Logic-Based Approach to
                                  Determine Textual Similarity . . . . . . 683--693
             Myung Jong Kim and   
              Younggwan Kim and   
                     Hoirin Kim   Automatic Intelligibility Assessment of
                                  Dysarthric Speech Using
                                  Phonologically-Structured Sparse Linear
                                  Model  . . . . . . . . . . . . . . . . . 694--704
                  G. Aneeja and   
               B. Yegnanarayana   Single Frequency Filtering Approach for
                                  Discriminating Speech and Nonspeech  . . 705--717
               A. Deleforge and   
                  R. Horaud and   
            Y. Y. Schechner and   
                       L. Girin   Co-Localization of Audio Sources in
                                  Images Using Binaural Features and
                                  Locally-Linear Regression  . . . . . . . 718--731
                     D. Dov and   
                  R. Talmon and   
                       I. Cohen   Audio-Visual Voice Activity Detection
                                  Using Diffusion Maps . . . . . . . . . . 732--745
                  M. Habibi and   
               A. Popescu-Belis   Keyword Extraction and Clustering for
                                  Document Recommendation in Conversations 746--759
                   N. Mamun and   
               W. A. Jassim and   
                M. S. A. Zilany   Prediction of Speech Intelligibility
                                  Using a Neurogram Orthogonal Polynomial
                                  Measure (NOPM) . . . . . . . . . . . . . 760--773
                 E. De Sena and   
               N. Antonello and   
                  M. Moonen and   
             T. van Waterschoot   On the Modeling of Rectangular
                                  Geometries in Room Acoustic Simulations  774--786
                  Hao Huang and   
                  Haihua Xu and   
               Xianhui Wang and   
                      W. Silamu   Maximum F1-Score Discriminative Training
                                  Criterion for Automatic Mispronunciation
                                  Detection  . . . . . . . . . . . . . . . 787--797
             Chung-Che Wang and   
                  J.-S. R. Jang   Improving Query-by-Singing/Humming by
                                  Combining Melody and Lyric Information   798--806
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 807--808
                      Anonymous   Information for Authors  . . . . . . . . 809--810
                      Anonymous   IEEE Member Digital Library  . . . . . . 812--812
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 5, May, 2015

                      Anonymous   Table of Contents  . . . . . . . . . . . 813--814
                      Anonymous   Table of Contents  . . . . . . . . . . . 815--816
                   F. Krebs and   
               A. Holzapfel and   
               A. T. Cemgil and   
                      G. Widmer   Inferring Metrical Structure in Music
                                  Using Particle Filters . . . . . . . . . 817--827
               Janghoon Cho and   
                      C. D. Yoo   Underdetermined Convolutive BSS: Bayes
                                  Risk Minimization Based on a Mixture of
                                  Super-Gaussian Posterior Approximation   828--839
                     Hao Mu and   
              Woon-Seng Gan and   
                    Ee-Leng Tan   An Objective Analysis Method for
                                  Perceptual Quality of a Virtual Bass
                                  System . . . . . . . . . . . . . . . . . 840--850
             R. C. Hendriks and   
               J. B. Crespo and   
                  J. Jensen and   
                     C. H. Taal   Optimal Near-End Speech Intelligibility
                                  Improvement Incorporating Additive Noise
                                  and Late Reverberation Under an
                                  Approximation of the Short-Time SII  . . 851--862
            A. H. Abdelaziz and   
                  S. Zeiler and   
                     D. Kolossa   Learning Dynamic Stream Weights For
                                  Coupled-HMM-Based Audio-Visual Speech
                                  Recognition  . . . . . . . . . . . . . . 863--876
                  R. Berkun and   
                   I. Cohen and   
                     J. Benesty   Combined Beamformers for Robust
                                  Broadband Regularized Superdirective
                                  Beamforming  . . . . . . . . . . . . . . 877--886
                   J. Breebaart   Evaluation of Statistical Inference
                                  Tests Applied to Subjective Audio
                                  Quality Data With Small Sample Size  . . 887--897
                   M. Zivanović   Harmonic Bandwidth Companding for
                                  Separation of Overlapping Harmonics in
                                  Pitched Signals  . . . . . . . . . . . . 898--908
                Jen-Tzung Chien   Laplace Group Sensing for Acoustic
                                  Models . . . . . . . . . . . . . . . . . 909--922
                   Ying Wei and   
                   Yinfeng Wang   Design of Low Complexity Adjustable
                                  Filter Bank for Personalized Hearing Aid
                                  Solutions  . . . . . . . . . . . . . . . 923--931
          A. Perez-Carrillo and   
                M. M. Wanderley   Indirect Acquisition of Violin
                                  Instrumental Controls from Audio Signal
                                  with Hidden Markov Models  . . . . . . . 932--940
           A. Mansikkaniemi and   
                      M. Kurimo   Adaptation of Morph-Based Speech
                                  Recognition for Foreign Names and
                                  Acronyms . . . . . . . . . . . . . . . . 941--950
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 953--954
                      Anonymous   Information for Authors  . . . . . . . . 955--956
                      Anonymous   Open Access  . . . . . . . . . . . . . . 957--957
                      Anonymous   Blank page . . . . . . . . . . . . . . . B951--B952
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 6, June, 2015

                      Anonymous   Table of Contents  . . . . . . . . . . . 953--954
                      Anonymous   Table of Contents  . . . . . . . . . . . 955--956
              Shih-Hung Liu and   
               Kuan-Yu Chen and   
                    B. Chen and   
              Hsin-Min Wang and   
               Hsu-Chun Yen and   
                   Wen-Lian Hsu   Combining Relevance Language Modeling
                                  and Clarity Measure for Extractive
                                  Speech Summarization . . . . . . . . . . 957--969
             M. Niedzwiecki and   
                  M. Ciolek and   
                    K. Cisowski   Elimination of Impulsive Disturbances
                                  From Stereo Audio Recordings Using
                                  Vector Autoregressive Modeling and
                                  Variable-order Kalman Filtering  . . . . 970--981
                    Kun Han and   
                Yuxuan Wang and   
               Deliang Wang and   
                W. S. Woods and   
                   I. Merks and   
                      Tao Zhang   Learning Spectral Mapping for Speech
                                  Dereverberation and Denoising  . . . . . 982--992
                  P. Foster and   
                   S. Dixon and   
                     A. Klapuri   Identifying Cover Songs Using
                                  Information-Theoretic Measures of
                                  Similarity . . . . . . . . . . . . . . . 993--1005
                 A. Schwarz and   
                  W. Kellermann   Coherent-to-Diffuse Power Ratio
                                  Estimation for Dereverberation . . . . . 1006--1018
                  M. Cernak and   
               P. N. Garner and   
               A. Lazaridis and   
                P. Motlicek and   
                      Xingyu Na   Incremental Syllable-Context Phonetic
                                  Vocoding . . . . . . . . . . . . . . . . 1019--1030
                 M. Rouvier and   
                    S. Oger and   
                 G. Linares and   
                 D. Matrouf and   
                B. Merialdo and   
                          Y. Li   Audio-Based Video Genre Identification   1031--1041
                 H. Kameoka and   
               K. Yoshizato and   
                T. Ishihara and   
                K. Kadowaki and   
                  Y. Ohishi and   
                     K. Kashino   Generative Modeling of Voice Fundamental
                                  Frequency Contours . . . . . . . . . . . 1042--1053
           Dejan Markovi\'c and   
            Fabio Antonacci and   
              Augusto Sarti and   
                 Stefano Tubaro   Multiview Soundfield Imaging in the
                                  Projective Ray Space . . . . . . . . . . 1054--1067
                A. P. Bates and   
                  Z. Khalid and   
                  R. A. Kennedy   Novel Sampling Scheme on the Sphere for
                                  Head-Related Transfer Function
                                  Measurements . . . . . . . . . . . . . . 1068--1081
                Maoshen Jia and   
                  Ziyu Yang and   
              Changchun Bao and   
              Xiguang Zheng and   
                        C. Ritz   Encoding Multiple Audio Objects Using
                                  Intra-Object Sparsity  . . . . . . . . . 1082--1095
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1096--1097
                      Anonymous   Information for Authors  . . . . . . . . 1098--1099
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1100--1100
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 7, July, 2015

                      Anonymous   Table of Contents  . . . . . . . . . . . 1101--1102
                      Anonymous   Table of Contents  . . . . . . . . . . . 1103--1104
                 M. McVicar and   
                S. Fukayama and   
                        M. Goto   AutoGuitarTab: Computer-Aided
                                  Composition of Rhythm and Lead Guitar
                                  Parts in the Tablature Space . . . . . . 1105--1117
           M. Van Segbroeck and   
                 R. Travadi and   
                S. S. Narayanan   Rapid Language Identification  . . . . . 1118--1129
                 D. Marelli and   
             R. Baumgartner and   
                      P. Majdak   Efficient Approximation of Head-Related
                                  Transfer Functions in Subbands for
                                  Accurate Sound Localization  . . . . . . 1130--1143
             Ching-Feng Yeh and   
                   Lin-shan Lee   An Improved Framework for Recognizing
                                  Highly Imbalanced Bilingual
                                  Code-Switched Lectures with
                                  Cross-Language Acoustic Modeling and
                                  Frame-Level Language Identification  . . 1144--1159
                 D. Basaran and   
               A. T. Cemgil and   
                      E. Anarim   A Probabilistic Model-Based Approach for
                                  Aligning Multiple Audio Sequences  . . . 1160--1171
              Dongpeng Chen and   
                   B. K.-W. Mak   Multitask Learning of Deep Neural
                                  Networks for Low-Resource Speech
                                  Recognition  . . . . . . . . . . . . . . 1172--1183
                   T. Meyer and   
                N. Hajlaoui and   
               A. Popescu-Belis   Disambiguating Discourse Connectives for
                                  Statistical Machine Translation  . . . . 1184--1197
                   U. Remes and   
           A. Ramirez Lopez and   
                K. Palomaki and   
                      M. Kurimo   Bounded Conditional Mean Imputation with
                                  Observation Uncertainties and Acoustic
                                  Model Adaptation . . . . . . . . . . . . 1198--1208
                   Rui Wang and   
                   Hai Zhao and   
               Bao-Liang Lu and   
                 M. Utiyama and   
                      E. Sumita   Bilingual Continuous-Space Language
                                  Model Growing for Statistical Machine
                                  Translation  . . . . . . . . . . . . . . 1209--1220
            Tze Yuang Chong and   
               R. E. Banchs and   
             Eng Siong Chng and   
                     Haizhou Li   Decoupling Word-Pair Distance and
                                  Co-occurrence Information for Effective
                                  Long History Context Language Modeling   1221--1232
                   Meng Sun and   
                   Yinan Li and   
              J. F. Gemmeke and   
                 Xiongwei Zhang   Speech Enhancement Under Low SNR
                                  Conditions Via Noise Estimation Using
                                  Sparse and Low-Rank NMF with
                                  Kullback--Leibler Divergence . . . . . . 1233--1242
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1245--1246
                      Anonymous   Information for Authors  . . . . . . . . 1247--1248
                      Anonymous   Blank page . . . . . . . . . . . . . . . B1243--B1244
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 8, August, 2015

                      Anonymous   Table of Contents  . . . . . . . . . . . 1245--1246
                      Anonymous   Table of Contents  . . . . . . . . . . . 1247--1248
                  H. Momeni and   
            H. R. Abutalebi and   
                     A. Tadaion   Joint Detection and Estimation of Speech
                                  Spectral Amplitude Using Noncontinuous
                                  Gain Functions . . . . . . . . . . . . . 1249--1258
                Jen-Tzung Chien   Hierarchical Pitman--Yor--Dirichlet
                                  Language Model . . . . . . . . . . . . . 1259--1272
              M. Fallahpour and   
                      D. Megias   Audio Watermarking Based on Fibonacci
                                  Numbers  . . . . . . . . . . . . . . . . 1273--1282
                 P. Mowlaee and   
                      J. Kulmer   Phase Estimation in Single-Channel
                                  Speech Enhancement: Limits-Potential . . 1283--1294
                 M. Morchid and   
              M. Bouallegue and   
                  R. Dufour and   
                 G. Linares and   
                 D. Matrouf and   
                     R. De Mori   Compact Multiview Representation of
                                  Documents Based on the Total Variability
                                  Space  . . . . . . . . . . . . . . . . . 1295--1308
                 R. Sugiura and   
                Y. Kamamoto and   
                  N. Harada and   
                 H. Kameoka and   
                      T. Moriya   Optimal Coding of
                                  Generalized-Gaussian-Distributed
                                  Frequency Spectra for Low-Delay Audio
                                  Coder With Powered All-Pole Spectrum
                                  Estimation . . . . . . . . . . . . . . . 1309--1321
               Kuan-Yu Chen and   
              Shih-Hung Liu and   
                    B. Chen and   
              Hsin-Min Wang and   
                  Ea-Ee Jan and   
               Wen-Lian Hsu and   
                  Hsin-Hsi Chen   Extractive Broadcast News Summarization
                                  Leveraging Recurrent Neural Network
                                  Language Modeling Techniques . . . . . . 1322--1334
               Z. Koldovsky and   
                   J. Malek and   
                      S. Gannot   Spatial Source Subtraction Based on
                                  Incomplete Measurements of Relative
                                  Transfer Function  . . . . . . . . . . . 1335--1347
             D. Dimitriadis and   
                   E. Bocchieri   Use of Micro-Modulation Features in
                                  Large Vocabulary Continuous Speech
                                  Recognition Tasks  . . . . . . . . . . . 1348--1357
                   Xun Wang and   
                 Y. Yoshida and   
                   T. Hirao and   
                  M. Nagata and   
                       K. Sudoh   Summarization Based on Task-Oriented
                                  Discourse Parsing  . . . . . . . . . . . 1358--1367
                     C. Spa and   
                     A. Rey and   
                   E. Hernandez   A GPU Implementation of an Explicit
                                  Compact FDTD Algorithm with a Digital
                                  Impedance Filter for Room Acoustics
                                  Applications . . . . . . . . . . . . . . 1368--1380
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1381--1382
                      Anonymous   Information for Authors  . . . . . . . . 1383--1384
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 9, September, 2015

                      Anonymous   Table of Contents  . . . . . . . . . . . 1385--1386
                      Anonymous   Table of Contents  . . . . . . . . . . . 1387--1388
               Lin-shan Lee and   
                   J. Glass and   
                Hung-yi Lee and   
                   Chun-an Chan   Spoken Content Retrieval --- Beyond
                                  Cascading Speech Recognition with Text
                                  Retrieval  . . . . . . . . . . . . . . . 1389--1420
                Yishan Jiao and   
                 V. Berisha and   
                    Ming Tu and   
                        J. Liss   Convex Weighting Criteria for Speaking
                                  Rate Estimation  . . . . . . . . . . . . 1421--1430
                 Jianjun He and   
              Woon-Seng Gan and   
                    Ee-Leng Tan   Primary-Ambient Extraction Using Ambient
                                  Spectrum Estimation for Immersive
                                  Spatial Audio Reproduction . . . . . . . 1431--1444
                  Qing Shen and   
                    Wei Liu and   
                    Wei Cui and   
                 Siliang Wu and   
                Y. D. Zhang and   
                     M. G. Amin   Low-Complexity Direction-of-Arrival
                                  Estimation Based on Wideband Co-Prime
                                  Arrays . . . . . . . . . . . . . . . . . 1445--1456
               Yu-Ren Chien and   
              Hsin-Min Wang and   
                 Shyh-Kang Jeng   An Acoustic-Phonetic Model of F0
                                  Likelihood for Vocal Melody Extraction   1457--1468
               Xiaodong Cui and   
                    V. Goel and   
                   B. Kingsbury   Data Augmentation for Deep Neural
                                  Network Acoustic Modeling  . . . . . . . 1469--1477
                 E. De Sena and   
    H. Hac{\i}habibo\u{g}lu and   
             Z. Cvetkovi\'c and   
                    J. O. Smith   Efficient Synthesis of Room Acoustics
                                  via Scattering Delay Networks  . . . . . 1478--1492
                   Lin Wang and   
                T. Gerkmann and   
                       S. Doclo   Noise Power Spectral Density Estimation
                                  Using MaxNSR Blocking Matrix . . . . . . 1493--1508
                   A. Jukic and   
         T. van Waterschoot and   
                T. Gerkmann and   
                       S. Doclo   Multi-Channel Linear Prediction-Based
                                  Speech Dereverberation With Sparse
                                  Priors . . . . . . . . . . . . . . . . . 1509--1520
                 P. Mowlaee and   
                      J. Kulmer   Harmonic Phase Estimation in
                                  Single-Channel Speech Enhancement Using
                                  Phase Decomposition and SNR Information  1521--1532
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1535--1536
                      Anonymous   Information for Authors  . . . . . . . . 1537--1538
                      Anonymous   How can you get your idea to market
                                  first? . . . . . . . . . . . . . . . . . 1539--1539
                      Anonymous   Blank page . . . . . . . . . . . . . . . B1533--B1534
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 10, October, 2015

                      Anonymous   Table of Contents  . . . . . . . . . . . 1535--1536
                      Anonymous   Table of Contents  . . . . . . . . . . . 1537--1538
                   S. Tervo and   
                     A. Politis   Direction of Arrival Estimation of
                                  Reflections from Room Impulse Responses
                                  Using a Spherical Microphone Array . . . 1539--1551
             Jia-Ching Wang and   
                Yu-Hao Chin and   
                Bo-Wei Chen and   
             Chang-Hong Lin and   
                 Chung-Hsien Wu   Speech Emotion Verification Using
                                  Emotion Variance Modeling and
                                  Discriminant Scale-Frequency Maps  . . . 1552--1562
                A. Canclini and   
               P. Bestagini and   
               F. Antonacci and   
              M. Compagnoni and   
                   A. Sarti and   
                      S. Tubaro   A Robust and Low-Complexity Source
                                  Localization Algorithm for Asynchronous
                                  Distributed Microphone Networks  . . . . 1563--1575
                 Jianjun He and   
              Woon-Seng Gan and   
                    Ee-Leng Tan   Time-Shifting Based Primary-Ambient
                                  Extraction for Spatial Audio
                                  Reproduction . . . . . . . . . . . . . . 1576--1588
                    P. Shah and   
                   I. Lewis and   
                   S. Grant and   
                   S. Angrignon   Nonlinear Acoustic Echo Cancellation
                                  Using Voltage and Current Feedback . . . 1589--1599
                      Li Su and   
                  Yi-Hsuan Yang   Combining Spectral and Temporal
                                  Representations for Multipitch
                                  Estimation of Polyphonic Music . . . . . 1600--1612
                 T. Fujioka and   
                  Y. Nagata and   
                         M. Abe   High-Precision Harmonic Distortion Level
                                  Measurement of a Loudspeaker Using
                                  Adaptive Filters in a Noisy Environment  1613--1622
                Tsz-Kin Hon and   
                   Lin Wang and   
                J. D. Reiss and   
                   A. Cavallaro   Audio Fingerprinting for Multi-Device
                                  Self-Localization  . . . . . . . . . . . 1623--1636
                    Ye Tian and   
                   Zhe Chen and   
                    Fuliang Yin   Distributed IMM-Unscented Kalman Filter
                                  for Speaker Tracking in Microphone Array
                                  Networks . . . . . . . . . . . . . . . . 1637--1647
                      Na Li and   
                    Man-Wai Mak   SNR-Invariant PLDA Modeling in
                                  Nonparametric Subspace for Robust
                                  Speaker Verification . . . . . . . . . . 1648--1659
                 J. Vilkamo and   
            S. Delikaris-Manias   Perceptual Reproduction of Spatial Sound
                                  Using Loudspeaker-Signal-Domain
                                  Parametrization  . . . . . . . . . . . . 1660--1669
                  Chao Weng and   
                    Dong Yu and   
              M. L. Seltzer and   
                      J. Droppo   Deep Neural Networks for Single-Channel
                                  Multi-Talker Speech Recognition  . . . . 1670--1679
                 M. Ruhland and   
                  J. Bitzer and   
                  M. Brandt and   
                      S. Goetze   Reduction of Gaussian, Supergaussian,
                                  and Impulsive Noise by Interpolation of
                                  the Binary Mask Residual . . . . . . . . 1680--1691
                  Y. Dorfan and   
                      S. Gannot   Tree-Based Recursive
                                  Expectation-Maximization Algorithm for
                                  Localization of Acoustic Sources . . . . 1692--1703
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing EDICS  . . . . . 1704--1705
                      Anonymous   Information for Authors  . . . . . . . . 1706--1707
                      Anonymous   How can you get your idea to market
                                  first? . . . . . . . . . . . . . . . . . 1708--1708
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 11, November, 2015

               A. Sarmiento and   
              I. Duran-Diaz and   
                A. Cichocki and   
                      S. Cruces   A Contrast Function Based on Generalized
                                  Divergences for Solving the Permutation
                                  Problem in Convolved Speech Mixtures . . 1713--1726
               Xiaojia Zhao and   
                Yuxuan Wang and   
                   Deliang Wang   Cochannel Speaker Identification in
                                  Anechoic and Reverberant Conditions  . . 1727--1736
              Liang-Yu Chen and   
                  J.-S. R. Jang   Automatic Pronunciation Scoring with
                                  Score Combination by Learning to Rank
                                  and Class-Normalized DP-Based
                                  Quantization . . . . . . . . . . . . . . 1737--1749
                  Duyu Tang and   
                   Bing Qin and   
                   Furu Wei and   
                    Li Dong and   
                   Ting Liu and   
                      Ming Zhou   A Joint Segmentation and Classification
                                  Framework for Sentence Level Sentiment
                                  Classification . . . . . . . . . . . . . 1750--1761
             F.-M. Hoffmann and   
                     F. M. Fazi   Theoretical Study of Acoustic Circular
                                  Arrays With Tangential Pressure Gradient
                                  Sensors  . . . . . . . . . . . . . . . . 1762--1774
       N. Souviraa-Labastie and   
                 A. Olivero and   
                 E. Vincent and   
                      F. Bimbot   Multi-Channel Audio Source Separation
                                  Using Multiple Deformed References . . . 1775--1787
                    D. Baby and   
                T. Virtanen and   
              J. F. Gemmeke and   
                   H. Van hamme   Coupled Dictionaries for Exemplar-Based
                                  Speech Enhancement and Automatic Speech
                                  Recognition  . . . . . . . . . . . . . . 1788--1799
                M. T. Islam and   
                 C. Shahnaz and   
               Wei-Ping Zhu and   
                    M. O. Ahmad   Speech Enhancement Based on Student $t$
                                  Modeling of Teager Energy Operated
                                  Perceptual Wavelet Packet Coefficients
                                  and a Custom Thresholding Function . . . 1800--1811
          Quynh Thi Ngoc Do and   
                 S. Bethard and   
                    M.-F. Moens   Domain Adaptation in Semantic Role
                                  Labeling Using a Neural Language Model
                                  and Linguistic Resources . . . . . . . . 1812--1823
                H. Aragonda and   
             C. S. Seelamantula   Demodulation of Narrowband Speech
                                  Spectrograms Using the Riesz Transform   1824--1834
                 D. T. Tran and   
                 E. Vincent and   
                      D. Jouvet   Nonparametric Uncertainty Estimation and
                                  Propagation for Noise Robust ASR . . . . 1835--1846
                     Mei Tu and   
                    Yu Zhou and   
                 Chengqing Zong   Exploring Diverse Features for
                                  Statistical Machine Translation Model
                                  Pruning  . . . . . . . . . . . . . . . . 1847--1857
                  G. Okopal and   
                  S. Wisdom and   
                       L. Atlas   Speech Analysis With the Strong
                                  Uncorrelating Transform  . . . . . . . . 1858--1868
         M. F. Simon Galvez and   
              S. J. Elliott and   
                       J. Cheer   Time Domain Optimization of Filters Used
                                  in a Loudspeaker Array for Personal
                                  Audio  . . . . . . . . . . . . . . . . . 1869--1878
               M. H. Bokaei and   
                  H. Sameti and   
                       Yang Liu   Linear Discourse Segmentation of
                                  Multi-Party Meetings Based on Local and
                                  Global Information . . . . . . . . . . . 1879--1891
             Chung-Hsien Wu and   
              Han-Ping Shen and   
                  Chun-Shan Hsu   Code-Switching Event Detection by Using
                                  a Latent Language Space Model and the
                                  Delta-Bayesian Information Criterion . . 1892--1903
               Zhangli Chen and   
                     V. Hohmann   Online Monaural Speech Enhancement Based
                                  on Periodicity Analysis and A Priori SNR
                                  Estimation . . . . . . . . . . . . . . . 1904--1916
           S. Sarreshtedari and   
               M. A. Akhaee and   
                    A. Abbasfar   A Watermarking Method for Digital Speech
                                  Self-Recovery  . . . . . . . . . . . . . 1917--1925
                  N. Moritz and   
               J. Anemuller and   
                   B. Kollmeier   An Auditory Inspired Amplitude
                                  Modulation Filter Bank for Robust
                                  Feature Extraction in Automatic Speech
                                  Recognition  . . . . . . . . . . . . . . 1926--1937
                 Yajie Miao and   
                  Hao Zhang and   
                       F. Metze   Speaker Adaptive Training of Deep Neural
                                  Network Acoustic Models Using
                                  $I$-Vectors  . . . . . . . . . . . . . . 1938--1949
                   V. Morfi and   
                G. Degottex and   
                  A. Mouchtaris   Speech Analysis and Synthesis with a
                                  Computationally Efficient Adaptive
                                  Harmonic Model . . . . . . . . . . . . . 1950--1962
                  J. Dennis and   
                 H. D. Tran and   
                     Haizhou Li   Generalized Hough Transform for Speech
                                  Pattern Classification . . . . . . . . . 1963--1972
                  Feng Deng and   
              Changchun Bao and   
                   W. B. Kleijn   Sparse Hidden Markov Models for Speech
                                  Enhancement in Non-Stationary Noise
                                  Environments . . . . . . . . . . . . . . 1973--1987
                  R. Ranjan and   
                  Woon-Seng Gan   Natural Listening over Headphones in
                                  Augmented Reality Using Adaptive
                                  Filtering Techniques . . . . . . . . . . 1988--2002
                 L.-H. Chen and   
                  T. Raitio and   
      C. Valentini-Botinhao and   
                 Z.-H. Ling and   
                   J. Yamagishi   A Deep Generative Architecture for
                                  Postfiltering in Statistical Parametric
                                  Speech Synthesis . . . . . . . . . . . . 2003--2014
               Ho Seon Shin and   
             T. Fingscheidt and   
                  Hong-Goo Kang   A Priori SNR Estimation Using Air- and
                                  Bone-Conduction Microphones  . . . . . . 2015--2025
                      Ji Wu and   
                    Miao Li and   
                   Chin-Hui Lee   A Probabilistic Framework for
                                  Representing Dialog Systems and
                                  Entropy-Based Dialog Management Through
                                  Dynamic Stochastic State Evolution . . . 2026--2035
                      S. Cumani   Fast Scoring of Full Posterior PLDA
                                  Models . . . . . . . . . . . . . . . . . 2036--2045
               V. Tourbabin and   
                     B. Rafaely   Direction of Arrival Estimation Using
                                  Microphone Array Processing for Moving
                                  Humanoid Robots  . . . . . . . . . . . . 2046--2058
                  Y. J. Chu and   
                     S. C. Chan   A New Local Polynomial Modeling-Based
                                  Variable Forgetting Factor RLS Algorithm
                                  and Its Acoustic Applications  . . . . . 2059--2069
       F. de-la-Calle-Silos and   
    F. J. Valverde-Albacete and   
        A. Gallardo-Antolin and   
               C. Pelaez-Moreno   Morphologically Filtered
                                  Power-Normalized Cochleograms as Robust,
                                  Biologically Inspired Features for ASR   2070--2080
                   T. Hirao and   
                 M. Nishino and   
                 Y. Yoshida and   
                  J. Suzuki and   
                  N. Yasuda and   
                      M. Nagata   Summarizing a Document by Trimming the
                                  Discourse Tree . . . . . . . . . . . . . 2081--2092
                   Chao Pan and   
              Jingdong Chen and   
                     J. Benesty   Theoretical Analysis of Differential
                                  Microphone Array Beamforming and an
                                  Improved Solution  . . . . . . . . . . . 2093--2105

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 23, Number 12, December, 2015

               Wanxiang Che and   
                Yanyan Zhao and   
                Honglei Guo and   
                   Zhong Su and   
                       Ting Liu   Sentence Compression for Aspect-Based
                                  Sentiment Analysis . . . . . . . . . . . 2111--2124
                J. Sheaffer and   
            M. van Walstijn and   
                 B. Rafaely and   
                   K. Kowalczyk   Binaural Reproduction of Finite
                                  Difference Simulations Using Spherical
                                  Array Processing . . . . . . . . . . . . 2125--2135
               Po-Sen Huang and   
                  Minje Kim and   
        M. Hasegawa-Johnson and   
                   P. Smaragdis   Joint Optimization of Masks and Deep
                                  Recurrent Neural Networks for Monaural
                                  Source Separation  . . . . . . . . . . . 2136--2147
                  A. Heidel and   
             Hsiang-Hung Lu and   
                   Lin-Shan Lee   Finding Complex Features for Guest
                                  Language Fragment Recovery in
                                  Resource-Limited Code-Mixed Speech
                                  Recognition  . . . . . . . . . . . . . . 2148--2161
               D. Marquardt and   
                 V. Hohmann and   
                       S. Doclo   Interaural Coherence Preservation in
                                  Multi-Channel Wiener Filtering-Based
                                  Noise Reduction for Binaural Hearing
                                  Aids . . . . . . . . . . . . . . . . . . 2162--2176
                     Kai Yu and   
                    Kai Sun and   
                    Lu Chen and   
                         Su Zhu   Constrained Markov Bayesian Polynomial
                                  for Efficient Dialogue State Tracking    2177--2188
             C. A. Anderson and   
                 P. D. Teal and   
                  M. A. Poletti   Spatially Robust Far-field Beamforming
                                  Using the von Mises(--Fisher)
                                  Distribution . . . . . . . . . . . . . . 2189--2197
                J. Schroder and   
                  S. Goetze and   
                   J. Anemuller   Spectro-Temporal Gabor Filterbank
                                  Features for Acoustic Event Detection    2198--2208
                 Inseok Heo and   
                 W. A. Sethares   Classification Based on Speech Rhythm
                                  via a Temporal Alignment of Spoken
                                  Sentences  . . . . . . . . . . . . . . . 2209--2216
            P. Samarasinghe and   
              T. Abhayapala and   
                 M. Poletti and   
                    T. Betlehem   An Efficient Parameterization of the
                                  Room Transfer Function . . . . . . . . . 2217--2227
                 Yong Xiang and   
           I. Natgunanathan and   
                   Yue Rong and   
                       Song Guo   Spread Spectrum-Based High Embedding
                                  Capacity Watermarking Method for Audio
                                  Signals  . . . . . . . . . . . . . . . . 2228--2237
                In-Chul Yoo and   
              Hyeontaek Lim and   
                   Dongsuk Yook   Formant-Based Robust Voice Activity
                                  Detection  . . . . . . . . . . . . . . . 2238--2245
                  T. Hueber and   
                   L. Girin and   
          X. Alameda-Pineda and   
                      G. Bailly   Speaker-Adaptive Acoustic-Articulatory
                                  Inversion Using Cascaded Gaussian
                                  Mixture Regression . . . . . . . . . . . 2246--2259
                  Hequn Bai and   
                 G. Richard and   
                      L. Daudet   Late Reverberation Synthesis: From
                                  Radiance Transfer to Feedback Delay
                                  Networks . . . . . . . . . . . . . . . . 2260--2271
                      I. Bayram   A Multichannel Audio Denoising
                                  Formulation Based on Spectral Sparsity   2272--2285
                 H. Delgado and   
                 X. Anguera and   
              C. Fredouille and   
                     J. Serrano   Fast Single- and Cross-Show Speaker
                                  Diarization Using Binary Key Speaker
                                  Modeling . . . . . . . . . . . . . . . . 2286--2297
          W. S. Percybrooks and   
                       E. Moore   A New HMM-Based Voice Conversion
                                  Methodology Evaluated on Monolingual and
                                  Cross-Lingual Conversion Tasks . . . . . 2298--2310
                   M. Graja and   
                   M. Jaoua and   
                 L. H. Belguith   Statistical Framework with Knowledge
                                  Base Integration for Robust Speech
                                  Understanding of the Tunisian Dialect    2311--2321
                F. Strasser and   
                       H. Puder   Adaptive Feedback Cancellation for
                                  Realistic Hearing Aid Applications . . . 2322--2333
              Yu Ting Yeung and   
                    Tan Lee and   
               Cheung-Chi Leung   Supervised Single-Microphone
                                  Multi-Talker Speech Separation with
                                  Conditional Random Fields  . . . . . . . 2334--2342
                  Wenyu Jin and   
                   W. B. Kleijn   Theory and Design of Multizone
                                  Soundfield Reproduction Using Sparse
                                  Methods  . . . . . . . . . . . . . . . . 2343--2355
              Xionghu Zhong and   
                  J. R. Hopgood   A Time--Frequency Masking Based Random
                                  Finite Set Particle Filtering Method for
                                  Multiple Acoustic Source Detection and
                                  Tracking . . . . . . . . . . . . . . . . 2356--2370
                 K. Vijayan and   
                 K. S. R. Murty   Analysis of Phase Spectrum of Speech
                                  Signals Using Allpass Modeling . . . . . 2371--2383
               D. Marquardt and   
                   E. Hadad and   
                  S. Gannot and   
                       S. Doclo   Theoretical Analysis of Linearly
                                  Constrained Multi-Channel Wiener
                                  Filtering Algorithms for Combined Noise
                                  Reduction and Binaural Cue Preservation
                                  in Binaural Hearing Aids . . . . . . . . 2384--2397
                  M. Zohrer and   
                  R. Peharz and   
                    F. Pernkopf   Representation Learning for
                                  Single-Channel Source Separation and
                                  Bandwidth Extension  . . . . . . . . . . 2398--2409
                   Hao Fang and   
               M. Ostendorf and   
                 P. Baumann and   
               J. Pierrehumbert   Exponential Language Modeling Using
                                  Morphological Features and Multi-Task
                                  Learning . . . . . . . . . . . . . . . . 2410--2421
               M. A. Carlin and   
                    M. Elhilali   A Framework for Speech Activity
                                  Detection Using Adaptive Auditory
                                  Receptive Fields . . . . . . . . . . . . 2422--2433
                   S. Saito and   
                   K. Oishi and   
                    T. Furukawa   Convolutive Blind Source Separation
                                  Using an Iterative Least-Squares
                                  Algorithm for Non-Orthogonal Approximate
                                  Joint Diagonalization  . . . . . . . . . 2434--2448
                   E. Hadad and   
               D. Marquardt and   
                   S. Doclo and   
                      S. Gannot   Theoretical Analysis of Binaural
                                  Transfer Function MVDR Beamformers with
                                  Interference Cue Preservation
                                  Constraints  . . . . . . . . . . . . . . 2449--2464
                 Guang Yang and   
                 R. F. Lyon and   
                 E. M. Drakakis   Psychophysical Evaluation of An
                                  Ultra-Low Power, Analog Biomimetic
                                  Cochlear Implant Processor Filterbank
                                  Architecture With Across Channels AGC    2465--2473
                      Anonymous   List of Reviewers  . . . . . . . . . . . 2474--2476

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 1, January, 2016

                      Anonymous   Table of Contents  . . . . . . . . . . . 1--2
                      Anonymous   Table of Contents  . . . . . . . . . . . 3--4
                S. Brognaux and   
                     T. Drugman   HMM-Based Speech Segmentation:
                                  Improvements of Fully Automatic
                                  Approaches . . . . . . . . . . . . . . . 5--15
                   M. Tahon and   
                   L. Devillers   Towards a Small Set of Robust Acoustic
                                  Features for Emotion Recognition:
                                  Challenges . . . . . . . . . . . . . . . 16--28
                H. Behravan and   
               V. Hautamaki and   
          S. M. Siniscalchi and   
                T. Kinnunen and   
                   Chin-Hui Lee   $i$-Vector Modeling of Speech Attributes
                                  for Automatic Foreign Accent Recognition 29--41
                  R. Saeidi and   
                    P. Alku and   
                   T. Backstrom   Feature Extraction Using Power-Law
                                  Adjusted Linear Prediction With
                                  Application to Speaker Recognition Under
                                  Severe Vocal Effort Mismatch . . . . . . 42--53
             I. T. Ardekani and   
               J. P. Kaipio and   
                  A. Nasiri and   
             H. Sharifzadeh and   
                  W. H. Abdulla   A Statistical Inverse Problem Approach
                                  to Online Secondary Path Modeling in
                                  Active Noise Control . . . . . . . . . . 54--64
              T. Stafylakis and   
                   P. Kenny and   
                 M. J. Alam and   
                    M. Kockmann   Speaker and Channel Factors in
                                  Text-Dependent Speaker Recognition . . . 65--78
                Yanzhang He and   
                 P. Baumann and   
                   Hao Fang and   
              B. Hutchinson and   
                   A. Jaech and   
               M. Ostendorf and   
          E. Fosler-Lussier and   
               J. Pierrehumbert   Using Pronunciation-Based Morphological
                                  Subword Units to Improve OOV Handling in
                                  Keyword Search . . . . . . . . . . . . . 79--92
                   Meng Sun and   
             Xiongwei Zhang and   
               H. Van Hamme and   
                    T. F. Zheng   Unseen Noise Estimation Using Separable
                                  Deep Auto Encoder for Speech Enhancement 93--104
                  L. Ferrer and   
                    Yun Lei and   
                 M. McLaren and   
                    N. Scheffer   Study of Senone-Based Deep Neural
                                  Network Approaches for Spoken Language
                                  Recognition  . . . . . . . . . . . . . . 105--116
        S. I. Adalbjornsson and   
                T. Kronvall and   
                 S. Burgess and   
                  K. Astrom and   
                   A. Jakobsson   Sparse Localization of Harmonic Audio
                                  Sources  . . . . . . . . . . . . . . . . 117--129
                Man-Wai Mak and   
               Xiaomin Pang and   
                Jen-Tzung Chien   Mixture of PLDA for Noise Robust
                                  $I$-Vector Speaker Verification  . . . . 130--142
             C. A. Anderson and   
                 P. D. Teal and   
                  M. A. Poletti   Spatial Correlation of Radial Gaussian
                                  and Uniform Spherical Volume Near-Field
                                  Source Distributions . . . . . . . . . . 143--150
                  H. Torres and   
                   J. Gurlekian   Novel Estimation Method for the
                                  Superpositional Intonation Model . . . . 151--160
                  S. Bilbao and   
                B. Hamilton and   
                   J. Botts and   
                     L. Savioja   Finite Volume Time Domain Room Acoustics
                                  Simulation under General Impedance
                                  Boundary Conditions  . . . . . . . . . . 161--173
 A. H. Harati Nejad Torbati and   
                      J. Picone   A Doubly Hierarchical Dirichlet Process
                                  Hidden Markov Model with a Non-Ergodic
                                  Structure  . . . . . . . . . . . . . . . 174--184
            Jen-Tzung Chien and   
                    Po-Kai Yang   Bayesian Factorization and Learning for
                                  Monaural Source Separation . . . . . . . 185--195
                 D. L. Alon and   
                     B. Rafaely   Beamforming with Optimal Aliasing
                                  Cancellation in Spherical Microphone
                                  Arrays . . . . . . . . . . . . . . . . . 196--210
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Edics  . . . . . 211--212
                      Anonymous   Information for authors  . . . . . . . . 213--214
                      Anonymous   Special issue on sound scene and event
                                  analysis . . . . . . . . . . . . . . . . 215
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page] . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 2, February, 2016

                      Anonymous   Table of contents  . . . . . . . . . . . 211--212
                      Anonymous   Table of contents  . . . . . . . . . . . 213--214
                 E. Rasumow and   
                  M. Hansen and   
              S. van de Par and   
                 D. Puschel and   
                 V. Mellert and   
                   S. Doclo and   
                        M. Blau   Regularization Approaches for
                                  Synthesizing HRTF Directivity Patterns   215--225
                   Chao Pan and   
                 J. Benesty and   
                  Jingdong Chen   Design of Directivity Patterns with a
                                  Unique Null of Maximum Multiplicity  . . 226--235
             Jeih-Weih Hung and   
              Hsin-Ju Hsieh and   
                    Berlin Chen   Robust Speech Recognition via Enhancing
                                  the Complex-Valued Acoustic Spectrum in
                                  Modulation Domain  . . . . . . . . . . . 236--251
             Xiao-Lei Zhang and   
                   DeLiang Wang   Boosting Contextual Information for Deep
                                  Neural Network Based Voice Activity
                                  Detection  . . . . . . . . . . . . . . . 252--264
       M. A. Tugtekin Turan and   
                       E. Erzin   Source and Filter Estimation for
                                  Throat-Microphone Speech Enhancement . . 265--275
             N. Mohammadiha and   
                       S. Doclo   Speech Dereverberation Using
                                  Non-Negative Convolutive Transfer
                                  Function and Spectro-Temporal Modeling   276--289
                  A. Sharma and   
                        S. Kaul   Two-Stage Supervised Learning-Based
                                  Method to Detect Screams and Cries in
                                  Urban Environments . . . . . . . . . . . 290--299
               Xiaoguang Wu and   
                    Huawei Chen   Directivity Factors of the First-Order
                                  Steerable Differential Array With
                                  Microphone Mismatches: Deterministic and
                                  Worst-Case Analysis  . . . . . . . . . . 300--315
         A. I. Koutrouvelis and   
            G. P. Kafentzis and   
             N. D. Gaubitch and   
                    R. Heusdens   A Fast Method for High-Resolution
                                  Voiced/Unvoiced Detection and Glottal
                                  Closure/Opening Instant Estimation of
                                  Speech . . . . . . . . . . . . . . . . . 316--328
                T. Nakamura and   
                E. Nakamura and   
                    S. Sagayama   Real-Time Audio-to-Score Alignment of
                                  Music Performances Containing Errors and
                                  Arbitrary Repeats and Skips  . . . . . . 329--339
                   A. Bahne and   
                       A. Ahlen   Optimizing the Similarity of
                                  Loudspeaker-Room Responses in Multiple
                                  Listening Positions  . . . . . . . . . . 340--353
                J. M. Kates and   
                  K. H. Arehart   The Hearing-Aid Audio Quality Index
                                  (HAAQI)  . . . . . . . . . . . . . . . . 354--365
                H. Schepker and   
                       S. Doclo   A Semidefinite Programming Approach to
                                  Min-max Estimation of the Common Part of
                                  Acoustic Feedback Paths in Hearing Aids  366--377
                Bong-Ki Lee and   
                Joon-Hyuk Chang   Packet Loss Concealment Based on Deep
                                  Neural Networks for Digital Speech
                                  Transmission . . . . . . . . . . . . . . 378--387
              L. Bentivogli and   
                N. Bertoldi and   
                 M. Cettolo and   
                M. Federico and   
                   M. Negri and   
                      M. Turchi   On the Evaluation of Adaptive Machine
                                  Translation for Human Post-Editing . . . 388--399
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Edics  . . . . . 400--401
                      Anonymous   Information for authors  . . . . . . . . 402--403
                      Anonymous   Special issue on sound scene and event
                                  analysis . . . . . . . . . . . . . . . . 404
                      Anonymous   [Front cover]  . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Signal Processing Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   [Blank page] . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 3, March, 2016

                      Anonymous   Table of Contents  . . . . . . . . . . . 405--406
                      Anonymous   Table of Contents  . . . . . . . . . . . 407--408
       Reinhard Sonnleitner and   
                 Gerhard Widmer   Robust Quad-Based Audio Fingerprinting   409--421
                    Li Dong and   
                   Furu Wei and   
                      Ke Xu and   
                 Shixia Liu and   
                      Ming Zhou   Adaptive Multi-Compositionality for
                                  Recursive Neural Network Models  . . . . 422--431
                  Zheng Lin and   
               Xiaolong Jin and   
                   Xueke Xu and   
              Yuanzhuo Wang and   
                Xueqi Cheng and   
               Weiping Wang and   
                       Dan Meng   An Unsupervised Cross-Lingual Topic
                                  Model Framework for Sentiment
                                  Classification . . . . . . . . . . . . . 432--444
              Anil Nagathil and   
                Claus Weihs and   
                  Rainer Martin   Spectral Complexity Reduction of Music
                                  Signals for Mitigating Effects of
                                  Cochlear Hearing Loss  . . . . . . . . . 445--458
                   Tian Tan and   
                Yanmin Qian and   
                         Kai Yu   Cluster Adaptive Training for Deep
                                  Neural Network Based Acoustic Model  . . 459--468
                Arne Leijon and   
          Gustav Eje Henter and   
               Martin Dahlquist   Bayesian Analysis of Phoneme Confusion
                                  Matrices . . . . . . . . . . . . . . . . 469--482
       Donald S. Williamson and   
                Yuxuan Wang and   
                   DeLiang Wang   Complex Ratio Masking for Monaural
                                  Speech Separation  . . . . . . . . . . . 483--492
              Johannes Traa and   
              David Wingate and   
              Noah D. Stein and   
                Paris Smaragdis   Robust Source Localization and
                                  Enhancement With a Probabilistic Steered
                                  Response Power Model . . . . . . . . . . 493--503
        Sven Ewan Shepstone and   
               Kong Aik Lee and   
                 Haizhou Li and   
              Zheng-Hua Tan and   
             Søren Holdt Jensen   Total Variability Modeling Using
                                  Source-Specific Priors . . . . . . . . . 504--517
           Martin Schneider and   
              Walter Kellermann   Multichannel Acoustic Echo Cancellation
                                  in the Wave Domain With Increased
                                  Robustness to Nonuniqueness  . . . . . . 518--529
               Ken O'Hanlon and   
            Hidehisa Nagano and   
            Nicolas Keriven and   
               Mark D. Plumbley   Non-Negative Group Sparsity with
                                  Subspace Note Modelling for Polyphonic
                                  Transcription  . . . . . . . . . . . . . 530--542
                Elior Hadad and   
                Simon Doclo and   
                  Sharon Gannot   The Binaural LCMV Beamformer and its
                                  Performance Analysis . . . . . . . . . . 543--558
            Felipe Grijalva and   
               Luiz Martini and   
            Dinei Florencio and   
              Siome Goldenstein   A Manifold Learning Approach for
                                  Personalizing HRTFs from Anthropometric
                                  Features . . . . . . . . . . . . . . . . 559--570
                   Lin Wang and   
                    Simon Doclo   Correlation Maximization-Based Sampling
                                  Rate Offset Estimation for Distributed
                                  Microphone Arrays  . . . . . . . . . . . 571--582
            Nasim Radmanesh and   
             Ian S. Burnett and   
                 Bhaskar D. Rao   A Lasso-LS Optimization with a Frequency
                                  Variable Dictionary in a Multizone Sound
                                  System . . . . . . . . . . . . . . . . . 583--593
                    Xin Liu and   
                  Changchun Bao   Audio Bandwidth Extension Based on
                                  Ensemble Echo State Networks with
                                  Temporal Evolution . . . . . . . . . . . 594--607
                      Anonymous   EDICS Categories for IEEE/ACM
                                  Transactions on Audio, Speech, and
                                  Language Processing  . . . . . . . . . . 608--609
                      Anonymous   Information for Authors  . . . . . . . . 610--611
                      Anonymous   Special issue on sound scene and event
                                  analysis . . . . . . . . . . . . . . . . 612
                      Anonymous   Introducing IEEE Collabratec . . . . . . 613
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing  . . . . . . . . C2
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing  . . . . . . . . C3

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 4, April, 2016

                      Anonymous   Table of Contents  . . . . . . . . . . . 608--609
                      Anonymous   Table of Contents  . . . . . . . . . . . 610--611
                 Peifeng Li and   
                   Guodong Zhou   Joint Argument Inference in Chinese
                                  Event Extraction with Argument
                                  Consistency and Event Relevance  . . . . 612--622
               Jianming Liu and   
                Steven L. Grant   Proportionate Adaptive Filtering for
                                  Block-Sparse System Identification . . . 623--630
       Jesper Rindom Jensen and   
              Jacob Benesty and   
      Mads Græsbøll Christensen   Noise Reduction with Optimal Variable
                                  Span Linear Filters  . . . . . . . . . . 631--644
       Sidsel Marie Nørholm and   
       Jesper Rindom Jensen and   
      Mads Græsbøll Christensen   Enhancement and Noise Statistics
                                  Estimation for Non-Stationary Voiced
                                  Speech . . . . . . . . . . . . . . . . . 645--658
           Daryush D. Mehta and   
         Jarrad H. Van Stan and   
              Robert E. Hillman   Relationships Between Vocal Function
                                  Measures Derived from an Acoustic
                                  Microphone and a Subglottal Neck-Surface
                                  Accelerometer  . . . . . . . . . . . . . 659--668
              Herman Kamper and   
                Aren Jansen and   
               Sharon Goldwater   Unsupervised Word Segmentation and
                                  Lexicon Discovery Using Acoustic Word
                                  Embeddings . . . . . . . . . . . . . . . 669--679
                Ina Kodrasi and   
                    Simon Doclo   Joint Dereverberation and Noise
                                  Reduction Based on Acoustic
                                  Multi-Channel Equalization . . . . . . . 680--693
              Hamid Palangi and   
                    Li Deng and   
                Yelong Shen and   
               Jianfeng Gao and   
                Xiaodong He and   
               Jianshu Chen and   
               Xinying Song and   
                     Rabab Ward   Deep Sentence Embedding Using Long
                                  Short-Term Memory Networks: Analysis and
                                  Application to Information Retrieval . . 694--707
             Michael Jeffet and   
            Noam R. Shabtai and   
                   Boaz Rafaely   Theory and Perceptual Evaluation of the
                                  Binaural Reproduction and Beamforming
                                  Tradeoff in the Generalized Spherical
                                  Array Beamformer . . . . . . . . . . . . 708--718
          Pablo Peso Parada and   
            Dushyant Sharma and   
                Jose Lainez and   
             Daniel Barreda and   
       Toon van Waterschoot and   
              Patrick A. Naylor   A Single-Channel Non-Intrusive C50
                                  Estimator Correlated With Speech
                                  Recognition Performance  . . . . . . . . 719--732
             Ming-Hsiang Su and   
             Chung-Hsien Wu and   
                  Yu-Ting Zheng   Exploiting Turn-Taking Temporal
                                  Evolution for Personality Trait
                                  Perception in Dyadic Conversations . . . 733--744
           Sadaf Abdul-Rauf and   
             Holger Schwenk and   
             Patrik Lambert and   
                 Mohammad Nawaz   Empirical Use of Information Retrieval
                                  to Build Synthetic Data for SMT Domain
                                  Adaptation . . . . . . . . . . . . . . . 745--754
       Shinnosuke Takamichi and   
                Tomoki Toda and   
              Alan W. Black and   
              Graham Neubig and   
             Sakriani Sakti and   
               Satoshi Nakamura   Postfilters to Modify the Modulation
                                  Spectrum for Statistical Parametric
                                  Speech Synthesis . . . . . . . . . . . . 755--767
                Zhizheng Wu and   
         Phillip L. De Leon and   
             Cenk Demiroglu and   
            Ali Khodabakhsh and   
                 Simon King and   
              Zhen-Hua Ling and   
              Daisuke Saito and   
              Bryan Stewart and   
                Tomoki Toda and   
              Mirjam Wester and   
              Junichi Yamagishi   Anti-Spoofing for Text-Independent
                                  Speaker Verification: an Initial
                                  Database, Comparison of Countermeasures,
                                  and Human Performance  . . . . . . . . . 768--783
     Kristian Timm Andersen and   
                    Marc Moonen   Adaptive Time-Frequency Analysis for
                                  Noise Reduction in an Audio Filter Bank
                                  With Low Delay . . . . . . . . . . . . . 784--795
             Zhong-Qiu Wang and   
                   DeLiang Wang   A Joint Training Framework for Robust
                                  Automatic Speech Recognition . . . . . . 796--806
                   Huy Phan and   
                Lars Hertel and   
                Marco Maass and   
             Radoslaw Mazur and   
                 Alfred Mertins   Learning Representations for Nonspeech
                                  Audio Events Through Their Similarities
                                  to Speech Patterns . . . . . . . . . . . 807--822
                      Anonymous   EDICS Categories for IEEE/ACM
                                  Transactions on Audio, Speech, and
                                  Language Processing  . . . . . . . . . . 823--824
                      Anonymous   Information for Authors  . . . . . . . . 825--826
                      Anonymous   Special issue on sound scene and event
                                  analysis . . . . . . . . . . . . . . . . 827
                      Anonymous   Introducing IEEE Collabratec . . . . . . 828
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing  . . . . . . . . C2

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 5, May, 2016

                      Anonymous   Table of Contents  . . . . . . . . . . . 829--830
                      Anonymous   Table of Contents  . . . . . . . . . . . 831--832
                 T. J. Tsai and   
                Andreas Stolcke   Robust and Efficient Multiple Alignment
                                  of Unsynchronized Meeting Recordings . . 833--845
             Simon Receveur and   
                 Robin Weiß and   
                Tim Fingscheidt   Turbo Automatic Speech Recognition . . . 846--862
              Ricard Marxer and   
                Hendrik Purwins   Unsupervised Incremental Online Learning
                                  and Prediction of Musical Audio Signals  863--874
             Mohammad Adeli and   
                 Jean Rouat and   
                  Sean Wood and   
     Stéphane Molotchnikoff and   
                   Eric Plourde   A Flexible Bio-Inspired Hierarchical
                                  Model for Analyzing Musical Timbre . . . 875--889
              Geliang Zhang and   
                  Simon Godsill   Fundamental Frequency Estimation in
                                  Speech Signals With Variable Rate
                                  Particle Filters . . . . . . . . . . . . 890--900
              Nadine Kroher and   
                   Emilia Gómez   Automatic Transcription of Flamenco
                                  Singing From Polyphonic Music Recordings 901--913
               Fiete Winter and   
                Jens Ahrens and   
                   Sascha Spors   On Analytic Methods for $ 2.5$-D Local
                                  Sound Field Synthesis Using Circular
                                  Distributions of Secondary Sources . . . 914--926
           Siddharth Sigtia and   
          Emmanouil Benetos and   
                    Simon Dixon   An End-to-End Neural Network for
                                  Polyphonic Piano Music Transcription . . 927--939
     Martin Krawczyk-Becker and   
                  Timo Gerkmann   Fundamental Frequency Informed Speech
                                  Enhancement in a Flexible Statistical
                                  Framework  . . . . . . . . . . . . . . . 940--951
             Joseph Szurley and   
         Alexander Bertrand and   
               Bas Van Dijk and   
                    Marc Moonen   Binaural Noise Cue Preservation in a
                                  Binaural Noise Reduction System With a
                                  Remote Microphone Signal . . . . . . . . 952--966
             Xiao-Lei Zhang and   
                   DeLiang Wang   A Deep Ensemble Learning Method for
                                  Monaural Speech Separation . . . . . . . 967--977
                 Haotian Xu and   
                     Zhijian Ou   Scalable Discovery of Audio Fingerprint
                                  Motifs in Broadcast Streams With
                                  Determinantal Point Process Based Motif
                                  Clustering . . . . . . . . . . . . . . . 978--989
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Edics  . . . . . 990--991
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing information for
                                  authors  . . . . . . . . . . . . . . . . 992--993
                      Anonymous   Special issue on sound scene and event
                                  analysis . . . . . . . . . . . . . . . . 994
                      Anonymous   Special Issue on Biosignal-based Spoken
                                  Communication  . . . . . . . . . . . . . 995
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Power Electronics Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 6, June, 2016

                      Anonymous   Table of Contents  . . . . . . . . . . . 990--991
                      Anonymous   Table of Contents  . . . . . . . . . . . 992--993
           Asli Celikyilmaz and   
              Ruhi Sarikaya and   
               Minwoo Jeong and   
                   Anoop Deoras   An Empirical Investigation of Word
                                  Class-Based Features for Natural
                                  Language Understanding . . . . . . . . . 994--1005
        Duc Hoang Ha Nguyen and   
                 Xiong Xiao and   
             Eng Siong Chng and   
                     Haizhou Li   Feature Adaptation Using Linear
                                  Spectro-Temporal Transform for Robust
                                  Speech Recognition . . . . . . . . . . . 1006--1019
               Xiaojun Qian and   
                 Helen Meng and   
                    Frank Soong   A Two-Pass Framework of Mispronunciation
                                  Detection and Diagnosis for
                                  Computer-Aided Pronunciation Training    1020--1028
               Lijiang Chen and   
                    Xia Mao and   
                       Hong Yan   Text-Independent Phoneme Segmentation
                                  Combining EGG and Speech Data  . . . . . 1029--1037
  Vincent Mohammad Tavakoli and   
       Jesper Rindom Jensen and   
  Mads Græsbøll Christensen and   
                  Jacob Benesty   A Framework for Speech Enhancement With
                                  Ad Hoc Microphone Arrays . . . . . . . . 1038--1051
               Yan-You Chen and   
             Chung-Hsien Wu and   
              Yi-Chin Huang and   
               Shih-Lun Lin and   
                  Jhing-Fa Wang   Candidate Expansion and Prosody
                                  Adjustment for Natural Speech Synthesis
                                  Using a Small Corpus . . . . . . . . . . 1052--1065
             Xueliang Zhang and   
                  Hui Zhang and   
                  Shuai Nie and   
               Guanglai Gao and   
                      Wenju Liu   A Pairwise Algorithm Using the Deep
                                  Stacking Network for Speech Separation
                                  and Pitch Estimation . . . . . . . . . . 1066--1078
                   Lin Wang and   
                Tsz-Kin Hon and   
            Joshua D. Reiss and   
               Andrea Cavallaro   An Iterative Approach to Source Counting
                                  and Localization Using Two Distant
                                  Microphones  . . . . . . . . . . . . . . 1079--1093
               Seán O'Leary and   
                     Axel Röbel   A Montage Approach to Sound Texture
                                  Synthesis  . . . . . . . . . . . . . . . 1094--1105
               Chahid Ouali and   
           Pierre Dumouchel and   
                   Vishwa Gupta   Fast Audio Fingerprinting System Using
                                  GPU and a Clustering-Based Technique . . 1106--1118
           Francisco Raposo and   
            Ricardo Ribeiro and   
         David Martins de Matos   Using Generic Summarization to Improve
                                  Music Information Retrieval Tasks  . . . 1119--1128
                 Lantian Li and   
                  Dong Wang and   
              Chenhao Zhang and   
              Thomas Fang Zheng   Improving Short Utterance Speaker
                                  Recognition by Modeling Speech Unit
                                  Classes  . . . . . . . . . . . . . . . . 1129--1139
               Jalal Taghia and   
                  Rainer Martin   A Frequency-Domain Adaptive Line
                                  Enhancer With Step-Size Control Based on
                                  Mutual Information for Harmonic Noise
                                  Reduction  . . . . . . . . . . . . . . . 1140--1154
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing Edics  . . . . . 1155--1156
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing information for
                                  authors  . . . . . . . . . . . . . . . . 1157--1158
                      Anonymous   Special issue on sound scene and event
                                  analysis . . . . . . . . . . . . . . . . 1159
                      Anonymous   Special Issue on Biosignal-based Spoken
                                  Communication  . . . . . . . . . . . . . 1160
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE/ACM Transactions on Audio, Speech,
                                  and Language Processing publication
                                  information  . . . . . . . . . . . . . . C2
                      Anonymous   IEEE Power Electronics Society
                                  Information  . . . . . . . . . . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 7, July, 2016

                    Min Gao and   
                    Jing Lu and   
                    Xiaojun Qiu   A Simplified Subband ANC Algorithm
                                  Without Secondary Path Modeling  . . . . 1164--1174
                 Ryo Aihara and   
          Tetsuya Takiguchi and   
                    Yasuo Ariki   Multiple Non-Negative Matrix
                                  Factorization for Many-to-Many Voice
                                  Conversion . . . . . . . . . . . . . . . 1175--1184
                   Kai Chen and   
                      Qiang Huo   Training Deep Bidirectional LSTM
                                  Acoustic Model for LVCSR by a
                                  Context-Sensitive-Chunk BPTT Approach    1185--1193
          Themos Stafylakis and   
          Md. Jahangir Alam and   
                  Patrick Kenny   Text-Dependent Speaker Recognition With
                                  Random Digit Strings . . . . . . . . . . 1194--1203
               K. T. Deepak and   
        S. R. Mahadeva Prasanna   Foreground Speech Segmentation and
                                  Enhancement Using Glottal Closure
                                  Instants and Mel Cepstral Coefficients   1204--1218
      Habib Hajimolahoseini and   
        Rassoul Amirfattahi and   
                Saeed Gazor and   
          Hamid Soltanian-Zadeh   Robust Estimation and Tracking of Pitch
                                  Period Using an Efficient Bayesian
                                  Filter . . . . . . . . . . . . . . . . . 1219--1229
           Subhasmita Sahoo and   
              Aurobinda Routray   A Novel Method of Glottal Inverse
                                  Filtering  . . . . . . . . . . . . . . . 1230--1241
            Gilles Degottex and   
              Luc Ardaillon and   
                    Axel Roebel   Multi-Frame Amplitude Envelope
                                  Estimation for Modification of Singing
                                  Voice  . . . . . . . . . . . . . . . . . 1242--1254
                Zhizheng Wu and   
                     Simon King   Improving Trajectory Modelling for
                                  DNN-Based Speech Synthesis by Using
                                  Stacked Bottleneck Features and Minimum
                                  Generation Error Training  . . . . . . . 1255--1265
       Xabier Jaureguiberry and   
           Emmanuel Vincent and   
                   Gaël Richard   Fusion Methods for Speech Enhancement
                                  and Audio Source Separation  . . . . . . 1266--1279
           Rajib Lochan Das and   
         Mrityunjoy Chakraborty   Improving the Performance of the PNLMS
                                  Algorithm Using Norm Regularization  . . 1280--1290
               Maja Taseska and   
           Emanuël A. P. Habets   Spotforming: Spatial Filtering With
                                  Distributed Arrays for
                                  Position-Selective Sound Acquisition . . 1291--1304
              Guangyou Zhou and   
                 Zhiwen Xie and   
                Tingting He and   
                   Jun Zhao and   
                Xiaohua Tony Hu   Learning the Multilingual Translation
                                  Representations for Question Retrieval
                                  in Community Question Answering via
                                  Non-Negative Matrix Factorization  . . . 1305--1314
                Chanwoo Kim and   
               Richard M. Stern   Power-Normalized Cepstral Coefficients
                                  (PNCC) for Robust Speech Recognition . . 1315--1329

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 8, 2016

           Henning Schepker and   
                    Simon Doclo   Least-Squares Estimation of the Common
                                  Pole-Zero Filter of Acoustic Feedback
                                  Paths in Hearing Aids  . . . . . . . . . 1334--1347
       Hannes Pessentheiner and   
           Martin Hagmüller and   
                   Gernot Kubin   Localization and Characterization of
                                  Multiple Harmonic Sources  . . . . . . . 1348--1363
           Hanieh Khalilian and   
              Ivan V. Bajić and   
              Rodney G. Vaughan   Comparison of Loudspeaker Placement
                                  Methods for Sound Field Reproduction . . 1364--1379
             Cheng-Yen Yang and   
               Chih-Wei Liu and   
                   Shyh-Jye Jou   A Systematic ANSI S1.11 Filter Bank
                                  Specification Relaxation and Its
                                  Efficient Multirate Architecture for
                                  Hearing-Aid Systems  . . . . . . . . . . 1380--1392
   Bracha Laufer-Goldshtein and   
               Ronen Talmon and   
                  Sharon Gannot   Semi-Supervised Sound Source
                                  Localization Based on Manifold
                                  Regularization . . . . . . . . . . . . . 1393--1407
 Dionyssos Kounades-Bastian and   
              Laurent Girin and   
      Xavier Alameda-Pineda and   
              Sharon Gannot and   
                    Radu Horaud   A Variational EM Algorithm for the
                                  Separation of Time-Varying Convolutive
                                  Audio Mixtures . . . . . . . . . . . . . 1408--1423
                     Jun Du and   
                  Yanhui Tu and   
                Li-Rong Dai and   
                   Chin-Hui Lee   A Regression Approach to Single-Channel
                                  Speech Separation Via High-Resolution
                                  Deep Neural Networks . . . . . . . . . . 1424--1437
                Xunying Liu and   
                   Xie Chen and   
             Yongqiang Wang and   
           Mark J. F. Gales and   
             Philip C. Woodland   Two Efficient Lattice Rescoring Methods
                                  Using Recurrent Neural Network Language
                                  Models . . . . . . . . . . . . . . . . . 1438--1449
         Pawel Swietojanski and   
                   Jinyu Li and   
                   Steve Renals   Learning Hidden Unit Contributions for
                                  Unsupervised Acoustic Model Adaptation   1450--1463
                 Meng Zhang and   
                   Yang Liu and   
                Huanbo Luan and   
                    Maosong Sun   Listwise Ranking Functions for
                                  Statistical Machine Translation  . . . . 1464--1472

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 9, 2016

                      Anonymous   Table of Contents  . . . . . . . . . . . 1477--1478
                      Anonymous   Table of Contents  . . . . . . . . . . . 1479--1480
        Daniel C. Cavalieri and   
 Sira E. Palazuelos-Cagigas and   
   Teodiano F. Bastos-Filho and   
         Mário Sarcinelli-Filho   Combination of Language Models for Word
                                  Prediction: An Exponential Approach  . . 1481--1494
              Ofer Schwartz and   
              Sharon Gannot and   
           Emanuël A. P. Habets   An Expectation-Maximization Algorithm
                                  for Multimicrophone Speech
                                  Dereverberation and Noise Reduction With
                                  Coherence Matrix Estimation  . . . . . . 1495--1510
    Symeon Delikaris-Manias and   
               Juha Vilkamo and   
                   Ville Pulkki   Signal-Dependent Spatial Filtering Based
                                  on Weighted-Orthogonal Beamformers in
                                  the Spherical Harmonic Domain  . . . . . 1511--1523
                   Sheng Li and   
                 Yuya Akita and   
               Tatsuya Kawahara   Semi-Supervised Acoustic Model Training
                                  by Discriminative Data Selection From
                                  Multiple ASR Systems' Hypotheses . . . . 1524--1534
          Christian Dittmar and   
                 Meinard Müller   Reverse Engineering the Amen Break ---
                                  Score-Informed Separation and
                                  Restoration Applied to Drum Recordings   1535--1547
                   Chao Pan and   
              Jingdong Chen and   
                  Jacob Benesty   Reduced-Order Robust Superdirective
                                  Beamforming With Uniform Linear
                                  Microphone Arrays  . . . . . . . . . . . 1548--1559
           Derry FitzGerald and   
            Antoine Liutkus and   
                  Roland Badeau   Projection-Based Demixing of Spatial
                                  Audio  . . . . . . . . . . . . . . . . . 1560--1572
                   Lin Wang and   
            Joshua D. Reiss and   
               Andrea Cavallaro   Over-Determined Source Separation and
                                  Localization Using Distributed
                                  Microphones  . . . . . . . . . . . . . . 1573--1588
                   Yang Liu and   
                  Sujian Li and   
                   Furu Wei and   
                        Heng Ji   Relation Classification Via Modeling
                                  Augmented Dependency Paths . . . . . . . 1589--1598
           Adam Kuklasiński and   
                Simon Doclo and   
         Søren Holdt Jensen and   
                  Jesper Jensen   Maximum Likelihood PSD Estimation for
                                  Speech Enhancement in Reverberation and
                                  Noise  . . . . . . . . . . . . . . . . . 1599--1612
         Sam Karimian-Azari and   
       Jesper Rindom Jensen and   
      Mads Græsbøll Christensen   Computationally Efficient and Noise
                                  Robust DOA and Pitch Estimation  . . . . 1613--1625
            Daichi Kitamura and   
               Nobutaka Ono and   
             Hiroshi Sawada and   
           Hirokazu Kameoka and   
             Hiroshi Saruwatari   Determined Blind Source Separation
                                  Unifying Independent Vector Analysis and
                                  Nonnegative Matrix Factorization . . . . 1626--1641
               Nicolas Obin and   
                    Axel Roebel   Similarity Search of Acted Voices for
                                  Automatic Voice Casting  . . . . . . . . 1642--1651
        Aditya Arie Nugraha and   
            Antoine Liutkus and   
               Emmanuel Vincent   Multichannel Audio Source Separation
                                  With Deep Neural Networks  . . . . . . . 1652--1664
            Stephen H. Shum and   
           David F. Harwath and   
                Najim Dehak and   
                 James R. Glass   On the Use of Acoustic Unit Discovery
                                  for Language Recognition . . . . . . . . 1665--1676
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1677--1678
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1679--1680
                      Anonymous   Introducing the IEEE PES Resource Center 1681
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 10, 2016

                      Anonymous   Table of Contents  . . . . . . . . . . . 1677--1678
                      Anonymous   Table of Contents  . . . . . . . . . . . 1679--1680
                James Eaton and   
        Nikolay D. Gaubitch and   
          Alastair H. Moore and   
              Patrick A. Naylor   Estimation of Room Acoustic Parameters:
                                  The ACE Challenge  . . . . . . . . . . . 1681--1693
                   Takashi Nose   Efficient Implementation of Global
                                  Variance Compensation for Parametric
                                  Speech Synthesis . . . . . . . . . . . . 1694--1704
     Shabnam Ghaffarzadegan and   
                Hynek Bořil and   
              John H. L. Hansen   Generative Modeling of Pseudo-Whisper
                                  for Robust Whispered Speech Recognition  1705--1720
      Seyedmahdad Mirsamadi and   
              John H. L. Hansen   A Generalized Nonnegative Tensor
                                  Factorization Approach for Distant
                                  Speech Recognition With Distributed
                                  Microphones  . . . . . . . . . . . . . . 1721--1731
               Laura Fuster and   
             Maria de Diego and   
     Luis A. Azpicueta-Ruiz and   
                  Miguel Ferrer   Adaptive Filtered-x Algorithms for Room
                                  Equalization Based on Block-Based
                                  Combination Schemes  . . . . . . . . . . 1732--1745
             Kamil Adiloğlu and   
               Emmanuel Vincent   Variational Bayesian Inference for
                                  Source Separation and Robust Feature
                                  Extraction . . . . . . . . . . . . . . . 1746--1758
           Steffen Kortlang and   
                 Giso Grimm and   
             Volker Hohmann and   
           Birger Kollmeier and   
               Stephan D. Ewert   Auditory Model-Based Dynamic Compression
                                  Controlled by Subband Instantaneous
                                  Frequency and Speech Presence
                                  Probability Estimates  . . . . . . . . . 1759--1772
         Pawel Swietojanski and   
                   Steve Renals   Differentiable Pooling for Unsupervised
                                  Acoustic Model Adaptation  . . . . . . . 1773--1784
                 Kenta Niwa and   
               Yusuke Hioka and   
             Kazunori Kobayashi   Optimal Microphone Array Observation for
                                  Clear Recording of Distant Sound Sources 1785--1795
              Nicolas Epain and   
                   Craig T. Jin   Spherical Harmonic Signal Covariance and
                                  Sound Field Diffuseness  . . . . . . . . 1796--1807
       Tudor-Cătălin Zorilă and   
           Yannis Stylianou and   
           Tatsuma Ishihara and   
                 Masami Akamine   Near and Far Field Speech-in-Noise
                                  Intelligibility Improvements Based on a
                                  Time--Frequency Energy Reallocation
                                  Approach . . . . . . . . . . . . . . . . 1808--1818
                      Xi Ma and   
                  Dong Wang and   
                 Javier Tejedor   Similar Word Model for Unfrequent Word
                                  Enhancement in Speech Recognition  . . . 1819--1830
       Mohammad Hadi Bokaei and   
             Hossein Sameti and   
                       Yang Liu   Summarizing Meeting Transcripts Based on
                                  Functional Segmentation  . . . . . . . . 1831--1841
               Jiajun Zhang and   
                    Yu Zhou and   
                 Chengqing Zong   Abstractive Cross-Language Summarization
                                  via Translation Model Enhanced Predicate
                                  Argument Structure Fusing  . . . . . . . 1842--1853
             Grégoire Lafay and   
           Mathieu Lagrange and   
          Mathias Rossignol and   
          Emmanouil Benetos and   
                    Axel Roebel   A Morphological Model for Simulating
                                  Acoustic Scenes and Its Application to
                                  Sound Event Detection  . . . . . . . . . 1854--1864
                      An Ji and   
         Michael T. Johnson and   
               Jeffrey J. Berry   Parallel Reference Speaker Weighting for
                                  Kinematic-Independent
                                  Acoustic-to-Articulatory Inversion . . . 1865--1875
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1876--1877
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1878--1879
                      Anonymous   Introducing the IEEE PES Resource Center 1880
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 11, 2016

                      Anonymous   Table of Contents  . . . . . . . . . . . 1881--1882
                      Anonymous   Table of Contents  . . . . . . . . . . . 1883--1884
            Aggelos Gkiokas and   
         Vassilis Katsouros and   
              George Carayannis   Towards Multi-Purpose Spectral Rhythm
                                  Features: An Application to Dance Style,
                                  Meter and Tempo Estimation . . . . . . . 1885--1896
              Yi-Chin Huang and   
             Chung-Hsien Wu and   
                   Si-Ting Weng   Improving Mandarin Prosody Generation
                                  Using Alternative Smoothing Techniques   1897--1907
   Asger Heidemann Andersen and   
           Jan Mark de Haan and   
              Zheng-Hua Tan and   
                  Jesper Jensen   Predicting the Intelligibility of Noisy
                                  and Nonlinearly Processed Binaural
                                  Speech . . . . . . . . . . . . . . . . . 1908--1920
             Qiaoling Zhang and   
                   Zhe Chen and   
                    Fuliang Yin   Distributed Marginalized Auxiliary
                                  Particle Filter for Speaker Tracking in
                                  Distributed Microphone Networks  . . . . 1921--1934
                Marc Ferràs and   
          Srikanth Madikeri and   
                 Hervé Bourlard   Speaker Diarization and Linking of
                                  Meeting Data . . . . . . . . . . . . . . 1935--1945
                 Yuzong Liu and   
               Katrin Kirchhoff   Graph-Based Semisupervised Learning for
                                  Acoustic Modeling in Automatic Speech
                                  Recognition  . . . . . . . . . . . . . . 1946--1956
                   Jin Wang and   
              Liang-Chih Yu and   
              K. Robert Lai and   
                   Xuejie Zhang   Community-Based Weighted Graph Model for
                                  Valence-Arousal Prediction of Affective
                                  Words  . . . . . . . . . . . . . . . . . 1957--1968
             Alberto Carini and   
            Stefania Cecchi and   
                   Laura Romoli   Robust Room Impulse Response Measurement
                                  Using Perfect Sequences for Legendre
                                  Nonlinear Filters  . . . . . . . . . . . 1969--1982
            Sebastian Ewert and   
                   Mark Sandler   Piano Transcription in the Studio Using
                                  an Extensible Alternating Directions
                                  Framework  . . . . . . . . . . . . . . . 1983--1997
               Yu-Ren Chien and   
              Hsin-Min Wang and   
                 Shyh-Kang Jeng   Alignment of Lyrics With Accompanied
                                  Singing Audio Based on Acoustic-Phonetic
                                  Vowel Likelihood Modeling  . . . . . . . 1998--2008
              Jesper Jensen and   
                   Cees H. Taal   An Algorithm for Predicting the
                                  Intelligibility of Speech Masked by
                                  Modulated Noise Maskers  . . . . . . . . 2009--2022
               Xiaodong Cui and   
                  Vaibhava Goel   Maximum Likelihood Nonlinear
                                  Transformations Based on Deep Neural
                                  Networks . . . . . . . . . . . . . . . . 2023--2031
             Toru Nakashika and   
          Tetsuya Takiguchi and   
                Yasuhiro Minami   Non-Parallel Training in Voice
                                  Conversion Using an Adaptive Restricted
                                  Boltzmann Machine  . . . . . . . . . . . 2032--2045
                 I-Bin Liao and   
             Chen-Yu Chiang and   
                Yih-Ru Wang and   
                 Sin-Horng Chen   Speaker Adaptation of SR-HPM for
                                  Speaking Rate-Controlled Mandarin TTS    2046--2058
               Hiroki Ouchi and   
                  Kevin Duh and   
            Hiroyuki Shindo and   
                 Yuji Matsumoto   Transition-Based Dependency Parsing
                                  Exploiting Supertags . . . . . . . . . . 2059--2068
                  Tong Xiao and   
              Derek F. Wong and   
                     Jingbo Zhu   A Loss-Augmented Approach to Training
                                  Syntactic Machine Translation Systems    2069--2083
             Yukara Ikemiya and   
         Katsutoshi Itoyama and   
               Kazuyoshi Yoshii   Singing Voice Separation and Vocal F0
                                  Estimation Based on Mutual Combination
                                  of Robust Principal Component Analysis
                                  and Subharmonic Summation  . . . . . . . 2084--2095
           Siddharth Sigtia and   
              Adam M. Stark and   
         Sacha Krstulovi\'c and   
               Mark D. Plumbley   Automatic Environmental Sound
                                  Recognition: Performance Versus
                                  Computational Cost . . . . . . . . . . . 2096--2107
     Srinivas Parthasarathy and   
                Roddy Cowie and   
                   Carlos Busso   Using Agreement on Direction of Change
                                  to Build Rank-Based Emotion Classifiers  2108--2121
             Jia-Ching Wang and   
              Yuan-Shan Lee and   
             Chang-Hong Lin and   
               Shu-Fan Wang and   
              Chih-Hao Shih and   
                 Chung-Hsien Wu   Compressive Sensing-Based Speech
                                  Enhancement  . . . . . . . . . . . . . . 2122--2131
                Siying Wang and   
            Sebastian Ewert and   
                    Simon Dixon   Robust and Efficient Joint Alignment of
                                  Multiple Musical Performances  . . . . . 2132--2145
                   Xie Chen and   
                Xunying Liu and   
             Yongqiang Wang and   
           Mark J. F. Gales and   
             Philip C. Woodland   Efficient Training and Evaluation of
                                  Recurrent Neural Network Language Models
                                  for Automatic Speech Recognition . . . . 2146--2157
              Ping-Keng Jao and   
                      Li Su and   
              Yi-Hsuan Yang and   
                Brendt Wohlberg   Monaural Music Source Separation Using
                                  Convolutional Sparse Coding  . . . . . . 2158--2170

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 24, Number 12, 2016

            Andrea Cogliati and   
                Zhiyao Duan and   
                Brendt Wohlberg   Context-Dependent Piano Music
                                  Transcription With Convolutional Sparse
                                  Coding . . . . . . . . . . . . . . . . . 2218--2230
                Yanmin Qian and   
                   Tian Tan and   
                        Dong Yu   Neural Network Based Multi-Factor Aware
                                  Joint Training for Robust Speech
                                  Recognition  . . . . . . . . . . . . . . 2231--2240
          Lahiru Samarakoon and   
                   Khe Chai Sim   Factorized Hidden Layer Adaptation for
                                  Deep Neural Network Based Acoustic
                                  Modeling . . . . . . . . . . . . . . . . 2241--2250
     Martin Krawczyk-Becker and   
                  Timo Gerkmann   On MMSE-Based Estimation of Amplitude
                                  and Complex Speech Spectral Coefficients
                                  Under Phase-Uncertainty  . . . . . . . . 2251--2262
                Yanmin Qian and   
                Mengxiao Bi and   
                   Tian Tan and   
                         Kai Yu   Very Deep Convolutional Neural Networks
                                  for Noise Robust Speech Recognition  . . 2263--2276
                 Yi-Chan Wu and   
                  Homer H. Chen   Generation of Affective Accompaniment in
                                  Accordance With Emotion Flow . . . . . . 2277--2287
          Mahmood Movassagh and   
                    Peter Kabal   Scalable Audio Coding Using
                                  Trellis-Based Optimized Joint Entropy
                                  Coding and Quantization  . . . . . . . . 2288--2300
               Milos Cernak and   
       Alexandros Lazaridis and   
              Afsaneh Asaei and   
               Philip N. Garner   Composition of Deep and Spiking Neural
                                  Networks for Very Low Bit Rate Speech
                                  Coding . . . . . . . . . . . . . . . . . 2301--2312
                  David Dov and   
               Ronen Talmon and   
                   Israel Cohen   Kernel Method for Voice Activity
                                  Detection in the Presence of Transients  2313--2326
             Jesús Villalba and   
             Antonio Miguel and   
             Alfonso Ortega and   
                 Eduardo Lleida   Bayesian Networks to Model the
                                  Variability of Speaker Verification
                                  Scores in Adverse Environments . . . . . 2327--2340
           Hardik B. Sailor and   
                Hemant A. Patil   Novel Unsupervised Auditory Filterbank
                                  Learning Using Convolutional RBM for
                                  Speech Recognition . . . . . . . . . . . 2341--2353
       Sidsel Marie Nørholm and   
       Jesper Rindom Jensen and   
      Mads Græsbøll Christensen   Instantaneous Fundamental Frequency
                                  Estimation With Optimal Segmentation for
                                  Nonstationary Voiced Speech  . . . . . . 2354--2367
                Sheng Zhang and   
               Jiashu Zhang and   
                     Hongyu Han   Robust Variable Step-Size Decorrelation
                                  Normalized Least-Mean-Square Algorithm
                                  and its Application to Acoustic Echo
                                  Cancellation . . . . . . . . . . . . . . 2368--2376
                 Tom Barker and   
                Tuomas Virtanen   Blind Separation of Audio Mixtures
                                  Through Nonnegative Tensor Factorization
                                  of Modulation Spectrograms . . . . . . . 2377--2389
                 Jinxin Liu and   
                   Xuefeng Chen   Adaptive Compensation of Misequalization
                                  in Narrowband Active Noise Equalizer
                                  Systems  . . . . . . . . . . . . . . . . 2390--2399
             Atsunori Ogawa and   
               Takaaki Hori and   
               Atsushi Nakamura   Estimating Speech Recognition Accuracy
                                  Based on Error Type Classification . . . 2400--2413
              Finnian Kelly and   
              John H. L. Hansen   Score-Aging Calibration for Speaker
                                  Verification . . . . . . . . . . . . . . 2414--2424
                  Bochen Li and   
                    Zhiyao Duan   An Approach to Score Following for Piano
                                  Performances With the Sustained Effect   2425--2438
                Niko Moritz and   
           Birger Kollmeier and   
                 Jörn Anemüller   Integration of Optimized Modulation
                                  Filter Sets Into Deep Neural Networks
                                  for Automatic Speech Recognition . . . . 2439--2452
             Simon Leglaive and   
              Roland Badeau and   
                   Gaël Richard   Multichannel Audio Source Separation
                                  With Probabilistic Reverberation Priors  2453--2465
                   Sakari Tervo   Single Snapshot Detection and Estimation
                                  of Reflections From Room Impulse
                                  Responses in the Spherical Harmonic
                                  Domain . . . . . . . . . . . . . . . . . 2466--2480
           Dejan Markovi\'c and   
            Fabio Antonacci and   
              Lucio Bianchi and   
             Stefano Tubaro and   
                  Augusto Sarti   Extraction of Acoustic Sources Through
                                  the Processing of Sound Field Maps in
                                  the Ray Space  . . . . . . . . . . . . . 2481--2494

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 2, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 222--223
                      Anonymous   Table of Contents  . . . . . . . . . . . 224--225
                Hanchi Chen and   
Thushara Dheemantha Abhayapala and   
   Prasanga N. Samarasinghe and   
                      Wen Zhang   Direct-to-Reverberant Energy Ratio
                                  Estimation Using a First-Order
                                  Microphone . . . . . . . . . . . . . . . 226--237
                 Peter Bell and   
         Pawel Swietojanski and   
                   Steve Renals   Multitask Learning of Context-Dependent
                                  Targets in Deep Neural Network Acoustic
                                  Models . . . . . . . . . . . . . . . . . 238--247
                   Rui Zhao and   
                      Kezhi Mao   Topic-Aware Deep Compositional Models
                                  for Sentence Classification  . . . . . . 248--260
            Dalia El Badawy and   
           Ngoc Q. K. Duong and   
                  Alexey Ozerov   On-the-Fly Audio Source Separation --- A
                                  Novel User-Friendly Framework  . . . . . 261--272
             Filip Elvander and   
                Johan Swärd and   
              Andreas Jakobsson   Online Estimation of Multiple Harmonic
                                  Signals  . . . . . . . . . . . . . . . . 273--284
            Vincent Renkens and   
                 Hugo Van hamme   Weakly Supervised Learning of Hidden
                                  Markov Models for Spoken Language
                                  Acquisition  . . . . . . . . . . . . . . 285--295
               Luca Remaggi and   
       Philip J. B. Jackson and   
             Philip Coleman and   
                     Wenwu Wang   Acoustic Reflector Localization: Novel
                                  Image Source Reversion and Direct
                                  Localization Methods . . . . . . . . . . 296--309
   Prasanga N. Samarasinghe and   
     Thushara D. Abhayapala and   
                    Hanchi Chen   Estimating the Direct-to-Reverberant
                                  Energy Ratio Using a Spherical
                                  Harmonics-Based Spatial Correlation
                                  Model  . . . . . . . . . . . . . . . . . 310--319
    Shmulik Markovich-Golan and   
              Sharon Gannot and   
              Walter Kellermann   Combined LCMV-TRINICON Beamforming for
                                  Separating Multiple Speech Sources in
                                  Noisy and Reverberant Environments . . . 320--332
              Shakeel Ahmed and   
          Muhammad Tahir Akhtar   Gain Scheduling of Auxiliary Noise and
                                  Variable Step-Size for Online Acoustic
                                  Feedback Cancellation in Narrow-Band
                                  Active Noise Control Systems . . . . . . 333--343
            Gabriel Sargent and   
            Frédéric Bimbot and   
               Emmanuel Vincent   Estimating the Structural Segmentation
                                  of Popular Music Pieces Under Regularity
                                  Constraints  . . . . . . . . . . . . . . 344--358
               Jordan Cheer and   
                  Stephen Daley   An Investigation of Delayless Subband
                                  Adaptive Filtering for Multi-Input
                                  Multi-Output Active Noise Control
                                  Applications . . . . . . . . . . . . . . 359--373
      Sebastian J. Schlecht and   
           Emanuël A. P. Habets   Feedback Delay Networks: Echo Density
                                  and Mixing Time  . . . . . . . . . . . . 374--383
              Johannes Abel and   
        Magdalena Kaniewska and   
            Cyril Guillaumé and   
               Wouter Tirry and   
                Tim Fingscheidt   An Instrumental Quality Measure for
                                  Artificially Bandwidth-Extended Speech
                                  Signals  . . . . . . . . . . . . . . . . 384--396
                Robert Rehr and   
                  Timo Gerkmann   An Analysis of Adaptive Recursive
                                  Smoothing with Applications to Noise PSD
                                  Estimation . . . . . . . . . . . . . . . 397--408
             Emilio Granell and   
   Carlos-D. Martínez-Hinarejos   Multimodal Crowdsourcing for
                                  Transcribing Handwritten Documents . . . 409--419
                  Yaping Ma and   
                     Yegui Xiao   A New Strategy for Online Secondary-Path
                                  Modeling of Narrowband Active Noise
                                  Control  . . . . . . . . . . . . . . . . 420--434
            Jose A. Belloch and   
           Alberto Gonzalez and   
   Enrique S. Quintana-Ortí and   
              Miguel Ferrer and   
                  Vesa Välimäki   GPU-Based Dynamic Wave Field Synthesis
                                  Using Fractional Delay Filters and Room
                                  Compensation . . . . . . . . . . . . . . 435--447
                      Anonymous   IEEE\slash ACM Transactions on Audio,
                                  Speech, and Language Processing Edics    448--449
                      Anonymous   IEEE Transactions on Multimedia
                                  information for authors  . . . . . . . . 450--451
                      Anonymous   Introducing IEEE Collabratec . . . . . . 452
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 3, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 3--4
                      Qi He and   
                   Feng Bao and   
                  Changchun Bao   Multiplicative Update of Auto-Regressive
                                  Gains for Codebook-Based Speech
                                  Enhancement  . . . . . . . . . . . . . . 457--468
             Zhongqing Wang and   
         Sophia Yat Mei Lee and   
                Shoushan Li and   
                   Guodong Zhou   Emotion Analysis in Code-Switching Text
                                  With Joint Factor Graph Model  . . . . . 469--480
              Ashwin Bellur and   
                Mounya Elhilali   Feedback-Driven Sensory Mapping
                                  Adaptation for Robust Speech Activity
                                  Detection  . . . . . . . . . . . . . . . 481--492
               Zhiyuan Tang and   
                 Lantian Li and   
                  Dong Wang and   
           Ravichander Vipperla   Collaborative Joint Training With
                                  Multitask Recurrent Model for Speech and
                                  Speaker Recognition  . . . . . . . . . . 493--504
             Bidisha Sharma and   
        S. R. Mahadeva Prasanna   Sonority Measurement Using System,
                                  Source, and Suprasegmental Information   505--518
                Hung-Yi Lee and   
            Bo-Hsiang Tseng and   
            Tsung-Hsien Wen and   
                        Yu Tsao   Personalizing
                                  Recurrent-Neural-Network-Based Language
                                  Model by Social Network  . . . . . . . . 519--530
                    Ji Ming and   
                  Danny Crookes   Speech Enhancement Based on
                                  Full-Sentence Correlation and Clean
                                  Speech Recognition . . . . . . . . . . . 531--543
             Quoc Truong Do and   
                Tomoki Toda and   
              Graham Neubig and   
             Sakriani Sakti and   
               Satoshi Nakamura   Preserving Word-Level Emphasis in
                                  Speech-to-Speech Translation . . . . . . 544--556
                Zhenghua Li and   
               Jiayuan Chao and   
                  Min Zhang and   
              Wenliang Chen and   
              Meishan Zhang and   
                     Guohong Fu   Coupled POS Tagging on Heterogeneous
                                  Annotations  . . . . . . . . . . . . . . 557--571
        Clement S. J. Doire and   
               Mike Brookes and   
          Patrick A. Naylor and   
       Christopher M. Hicks and   
                 Dave Betts and   
          Mohammad A. Dmour and   
             Søren Holdt Jensen   Single-Channel Online Enhancement of
                                  Speech Corrupted by Reverberation and
                                  Noise  . . . . . . . . . . . . . . . . . 572--587
            Aleksandr Sizov and   
               Kong Aik Lee and   
                  Tomi Kinnunen   Direct Optimization of the Detection
                                  Cost for $I$-Vector-Based Spoken
                                  Language Recognition . . . . . . . . . . 588--597
               Imran Sheikh and   
             Dominique Fohr and   
               Irina Illina and   
              Georges Linar\`es   Modelling Semantic Context of OOV Words
                                  in Large Vocabulary Continuous Speech
                                  Recognition  . . . . . . . . . . . . . . 598--610
            Mojtaba Farmani and   
   Michael Syskind Pedersen and   
              Zheng-Hua Tan and   
                  Jesper Jensen   Informed Sound Source Localization Using
                                  Relative Transfer Functions for Hearing
                                  Aid Applications . . . . . . . . . . . . 611--623
               C. M. Vikram and   
        S. R. Mahadeva Prasanna   Epoch Extraction From Telephone Quality
                                  Speech Using Single Pole Filter  . . . . 624--636
               Motoi Omachi and   
              Tetsuji Ogawa and   
            Tetsunori Kobayashi   Associative Memory Model-Based Linear
                                  Filtering and Its Application to Tandem
                                  Connectionist Blind Source Separation    637--650
            Dani Cherkassky and   
                  Sharon Gannot   Blind Synchronization in Wireless
                                  Acoustic Sensor Networks . . . . . . . . 651--661
              Laurent Girin and   
              Thomas Hueber and   
          Xavier Alameda-Pineda   Extending the Cascaded Gaussian Mixture
                                  Regression Framework for Cross-Speaker
                                  Acoustic-Articulatory Mapping  . . . . . 662--673
       Mohamad Hasan Bahari and   
         Alexander Bertrand and   
                    Marc Moonen   Blind Sampling Rate Offset Estimation
                                  for Wireless Acoustic Sensor Networks
                                  Through Weighted Least-Squares Coherence
                                  Drift Estimation . . . . . . . . . . . . 674--686
         Adam Kuklasi\'nski and   
                Simon Doclo and   
         Søren Holdt Jensen and   
                  Jesper Jensen   Correction to ``Maximum Likelihood PSD
                                  Estimation for Speech Enhancement in
                                  Reverberation and Noise''  . . . . . . . 687--687

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 4, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 688--689
                      Anonymous   Table of Contents  . . . . . . . . . . . 690--691
              Sharon Gannot and   
           Emmanuel Vincent and   
    Shmulik Markovich-Golan and   
                  Alexey Ozerov   A Consolidated Perspective on
                                  Multimicrophone Speech Enhancement and
                                  Source Separation  . . . . . . . . . . . 692--730
               Dongwen Ying and   
                Ruohua Zhou and   
                 Junfeng Li and   
                   Yonghong Yan   Window-Dominant Signal Subspace Methods
                                  for Multiple Short-Term Speech Source
                                  Localization . . . . . . . . . . . . . . 731--744
            Sean U. N. Wood and   
                 Jean Rouat and   
            Stéphane Dupont and   
              Gueorgui Pironkov   Blind Speech Separation and Enhancement
                                  With GCC-NMF . . . . . . . . . . . . . . 745--755
          Constantin Spille and   
           Birger Kollmeier and   
                 Bernd T. Meyer   Combining Binaural and Cortical Features
                                  for Robust Speech Recognition  . . . . . 756--767
               Yuma Koizumi and   
                 Kenta Niwa and   
               Yusuke Hioka and   
         Kazunori Kobayashi and   
                 Hitoshi Ohmuro   Informative Acoustic Feature Selection
                                  to Maximize Mutual Information for
                                  Collecting Target Sources  . . . . . . . 768--779
             Takuya Higuchi and   
               Nobutaka Ito and   
                Shoko Araki and   
            Takuya Yoshioka and   
              Marc Delcroix and   
              Tomohiro Nakatani   Online MVDR Beamformer Based on Complex
                                  Gaussian Mixture Model With Spatial
                                  Prior for Noise Robust ASR . . . . . . . 780--793
              Eita Nakamura and   
           Kazuyoshi Yoshii and   
               Shigeki Sagayama   Rhythm Transcription of Polyphonic Piano
                                  Music Based on Merged-Output HMM for
                                  Multiple Voices  . . . . . . . . . . . . 794--806
               Omid Ghahabi and   
                Javier Hernando   Deep Learning Backend for Single and
                                  Multisession $i$-Vector Speaker
                                  Recognition  . . . . . . . . . . . . . . 807--817
            Penny Karanasou and   
                Chunyang Wu and   
                 Mark Gales and   
             Philip C. Woodland   $I$-Vectors and Structured Neural
                                  Networks for Rapid Adaptation of
                                  Acoustic Models  . . . . . . . . . . . . 818--828
                  G. Aneeja and   
               B. Yegnanarayana   Extraction of Fundamental Frequency From
                                  Degraded Speech Using Temporal Envelopes
                                  at High SNR Frequencies  . . . . . . . . 829--838
       Seyyed Saeed Sarfjoo and   
             Cenk Demiroğlu and   
                     Simon King   Using Eigenvoices and Nearest-Neighbors
                                  in HMM-Based Cross-Lingual Speaker
                                  Adaptation With Limited Data . . . . . . 839--851
              Yung-Yue Chen and   
                  Jia-Hao Zhang   Background Noise Reduction Design for
                                  Dual Microphone Cellular Phones: Robust
                                  Approach . . . . . . . . . . . . . . . . 852--862
                 Liner Yang and   
              Xinxiong Chen and   
                Zhiyuan Liu and   
                    Maosong Sun   Improving Word Representations with
                                  Document Labels  . . . . . . . . . . . . 863--870
             Shiliang Zhang and   
                   Cong Liu and   
                  Hui Jiang and   
                     Si Wei and   
                 Lirong Dai and   
                          Yu Hu   Nonrecurrent Neural Structure for
                                  Long-Term Dependence . . . . . . . . . . 871--884
               Xuefeng Yang and   
                      Kezhi Mao   Task Independent Fine Tuning for Word
                                  Embeddings . . . . . . . . . . . . . . . 885--894
                     Yu Bao and   
                    Huawei Chen   Design of Robust Broadband Beamformers
                                  Using Worst-Case Performance
                                  Optimization: a Semidefinite Programming
                                  Approach . . . . . . . . . . . . . . . . 895--907
              Sandro Cumani and   
                  Pietro Laface   Nonlinear I-Vector Transformations for
                                  PLDA-Based Speaker Recognition . . . . . 908--919
                      Anonymous   IEEE\slash ACM Transactions on Audio,
                                  Speech, and Language Processing Edics    920--921
                      Anonymous   IEEE Transactions on Audio, Speech, and
                                  Language Processing information for
                                  authors  . . . . . . . . . . . . . . . . 922--923
                      Anonymous   Introducing IEEE Collabratec . . . . . . 924
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 5, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 925--926
                      Anonymous   Table of Contents  . . . . . . . . . . . 927--928
            Manu Airaksinen and   
              Tom Bäckström and   
                     Paavo Alku   Quadratic Programming Approach to
                                  Glottal Inverse Filtering by Joint
                                  Norm-1 and Norm-2 Optimization . . . . . 929--939
              Ofer Schwartz and   
              Sharon Gannot and   
           Emanuël A. P. Habets   Multispeaker LCMV Beamformer and
                                  Postfilter for Source Separation and
                                  Noise Reduction  . . . . . . . . . . . . 940--951
               Dongmei Wang and   
                Chengzhu Yu and   
              John H. L. Hansen   Robust Harmonic Features for
                                  Classification-Based Pitch Estimation    952--964
            Tara N. Sainath and   
               Ron J. Weiss and   
            Kevin W. Wilson and   
                      Bo Li and   
             Arun Narayanan and   
              Ehsan Variani and   
          Michiel Bacchiani and   
              Izhak Shafran and   
              Andrew Senior and   
                  Kean Chin and   
               Ananya Misra and   
                    Chanwoo Kim   Multichannel Signal Processing With Deep
                                  Neural Networks for Automatic Speech
                                  Recognition  . . . . . . . . . . . . . . 965--979
           Hanieh Khalilian and   
            Ivan V. Baji\'c and   
              Rodney G. Vaughan   A Simulation Study of a
                                  Three-Dimensional Sound Field
                                  Reproduction System for Immersive
                                  Communication  . . . . . . . . . . . . . 980--995
             Andreas Franck and   
                 Wenwu Wang and   
             Filippo Maria Fazi   Sparse $\ell_1$-Optimal
                                  Multiloudspeaker Panning and Its
                                  Relation to Vector Base Amplitude
                                  Panning  . . . . . . . . . . . . . . . . 996--1010
                 Songbin Li and   
                 Yizhen Jia and   
                  C.-C. Jay Kuo   Steganalysis of QIM Steganography in
                                  Low-Bit-Rate Speech Signals  . . . . . . 1011--1022
              Naoyuki Kanda and   
                  Xugang Lu and   
                  Hisashi Kawai   Maximum-a-Posteriori-Based Decoding for
                                  End-to-End Acoustic Models . . . . . . . 1023--1034
             Navid Shokouhi and   
              John H. L. Hansen   Teager--Kaiser Energy Operators for
                                  Overlapped Speech Detection  . . . . . . 1035--1047
              Yi-Chin Huang and   
             Chung-Hsien Wu and   
               Yan-You Chen and   
               Ming-Ge Shie and   
                  Jhing-Fa Wang   Personalized Spontaneous Speech
                                  Synthesis Using a Small-Sized
                                  Unsegmented Semispontaneous Speech . . . 1048--1060
              Jeongsoo Park and   
              Jaeyoung Shin and   
                      Kyogu Lee   Exploiting Continuity/Discontinuity of
                                  Basis Vectors in Spectrogram
                                  Decomposition for Harmonic-Percussive
                                  Sound Separation . . . . . . . . . . . . 1061--1074
             Xueliang Zhang and   
                   DeLiang Wang   Deep Learning Based Binaural Speech
                                  Separation in Reverberant Environments   1075--1084
            Masood Delfarah and   
                   DeLiang Wang   Features for Masking-Based Monaural
                                  Speech Separation in Reverberant
                                  Conditions . . . . . . . . . . . . . . . 1085--1094
                Feiran Yang and   
              Gerald Enzner and   
                       Jun Yang   Statistical Convergence Analysis for
                                  Optimal Control of DFT-Domain Adaptive
                                  Echo Canceler  . . . . . . . . . . . . . 1095--1106
               Takashi Nose and   
                Yusuke Arao and   
            Takao Kobayashi and   
              Komei Sugiura and   
                Yoshinori Shiga   Sentence Selection Based on Extended
                                  Entropy Using Phonetic and Prosodic
                                  Contexts for Statistical Parametric
                                  Speech Synthesis . . . . . . . . . . . . 1107--1116
             Gergely Firtha and   
                Péter Fiala and   
              Frank Schultz and   
                   Sascha Spors   Improved Referencing Schemes for 2.5D
                                  Wave Field Synthesis Driving Functions   1117--1127
            Esteban Maestre and   
            Gary P. Scavone and   
                Julius O. Smith   Joint Modeling of Bridge Admittance and
                                  Body Radiativity for Efficient Synthesis
                                  of String Instrument Sound by Digital
                                  Waveguides . . . . . . . . . . . . . . . 1128--1139
             Gongping Huang and   
              Jacob Benesty and   
                  Jingdong Chen   On the Design of Frequency-Invariant
                                  Beampatterns With Uniform Circular
                                  Microphone Arrays  . . . . . . . . . . . 1140--1153
               Zdeněk Průša and   
               Peter Balazs and   
       Peter Lempel Søndergaard   A Noniterative Method for Reconstruction
                                  of Phase From STFT Magnitude . . . . . . 1154--1164
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1167--1168
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  for authors  . . . . . . . . . . . . . . 1169--1170
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1171
                      Anonymous   Introducing IEEE Collabratec . . . . . . 1172
                      Anonymous   Member Get-A-Member (MGM) Program  . . . 1173
                      Anonymous   Blank Page . . . . . . . . . . . . . . . B1165--B1166
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 6, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 1167--1168
                 G. Richard and   
                T. Virtanen and   
                J. P. Bello and   
                     N. Ono and   
                      H. Glotin   Introduction to the Special Section on
                                  Sound Scene and Event Analysis . . . . . 1169--1171
    Héctor A. Sánchez-Hevia and   
               David Ayllón and   
           Roberto Gil-Pita and   
             Manuel Rosa-Zurera   Maximum Likelihood Decision Fusion for
                                  Weapon Classification in Wireless
                                  Acoustic Sensor Networks . . . . . . . . 1172--1182
        Nithin Rao Koluguri and   
         G. Nisha Meenakshi and   
           Prasanta Kumar Ghosh   Spectrogram Enhancement Using Multiple
                                  Window Savitzky--Golay (MWSG) Filter for
                                  Robust Bird Sound Detection  . . . . . . 1183--1192
                Dan Stowell and   
          Emmanouil Benetos and   
                   Lisa F. Gill   On-Bird Sound Recordings: Automatic
                                  Acoustic Recognition of Activities and
                                  Contexts . . . . . . . . . . . . . . . . 1193--1206
         Brandon T. Carroll and   
        Bradley M. Whitaker and   
               Wayne Dayley and   
              David V. Anderson   Outlier Learning via Augmented Frozen
                                  Dictionaries . . . . . . . . . . . . . . 1207--1215
               Victor Bisot and   
             Romain Serizel and   
                 Slim Essid and   
                   Gaël Richard   Feature Learning With Matrix
                                  Factorization Applied to Acoustic Scene
                                  Classification . . . . . . . . . . . . . 1216--1229
                    Yong Xu and   
                Qiang Huang and   
                 Wenwu Wang and   
               Peter Foster and   
           Siddharth Sigtia and   
       Philip J. B. Jackson and   
               Mark D. Plumbley   Unsupervised Feature Learning Based on
                                  Deep Models for Environmental Audio
                                  Tagging  . . . . . . . . . . . . . . . . 1230--1241
             René Grzeszick and   
                Axel Plinge and   
                 Gernot A. Fink   Bag-of-Features Methods for Acoustic
                                  Event Detection and Classification . . . 1242--1252
            Alain Rakotomamonjy   Supervised Representation Learning for
                                  Audio Scene Classification . . . . . . . 1253--1265
          Emmanouil Benetos and   
             Grégoire Lafay and   
           Mathieu Lagrange and   
               Mark D. Plumbley   Polyphonic Sound Event Tracking Using
                                  Linear Dynamical Systems . . . . . . . . 1266--1277
                   Huy Phan and   
                Lars Hertel and   
                Marco Maass and   
               Philipp Koch and   
             Radoslaw Mazur and   
                 Alfred Mertins   Improved Audio Scene Classification
                                  Based on Label-Tree Embeddings and
                                  Convolutional Neural Networks  . . . . . 1278--1290
                 Emre Çakır and   
  Giambattista Parascandolo and   
              Toni Heittola and   
            Heikki Huttunen and   
                Tuomas Virtanen   Convolutional Recurrent Neural Networks
                                  for Polyphonic Sound Event Detection . . 1291--1303
              Jens Schröder and   
                Niko Moritz and   
             Jörn Anemüller and   
              Stefan Goetze and   
               Birger Kollmeier   Classifier Architectures for Acoustic
                                  Scenes and Events: Implications for
                                  DNNs, TDNNs, and Perceptual Features
                                  from DCASE 2016  . . . . . . . . . . . . 1304--1314
                Wenjun Yang and   
               Sridhar Krishnan   Combining Temporal Features by Local
                                  Binary Pattern for Acoustic Scene
                                  Classification . . . . . . . . . . . . . 1315--1321
                  David Dov and   
               Ronen Talmon and   
                   Israel Cohen   Multimodal Kernel Method for Activity
                                  Detection of Sound Sources . . . . . . . 1322--1334
              Keisuke Imoto and   
                   Nobutaka Ono   Spatial Cepstrum as a Spatial Feature
                                  Using a Distributed Microphone Array for
                                  Acoustic Scene Analysis  . . . . . . . . 1335--1343
             Ivo Trowitzsch and   
              Johannes Mohr and   
             Youssef Kashef and   
                Klaus Obermayer   Robust Detection of Environmental Sounds
                                  in Binaural Auditory Scenes  . . . . . . 1344--1356
Abu Shafin Mohammad Mahdee Jameel and   
     Shaikh Anowarul Fattah and   
              Rajib Goswami and   
               Wei-Ping Zhu and   
                 M. Omair Ahmad   Noise Robust Formant Frequency
                                  Estimation Method Based on Spectral
                                  Model of Repeated Autocorrelation of
                                  Speech . . . . . . . . . . . . . . . . . 1357--1370
                      Na Li and   
                Man-Wai Mak and   
                Jen-Tzung Chien   DNN-Driven Mixture of PLDA for Robust
                                  Speaker Verification . . . . . . . . . . 1371--1383
                     Kai Wu and   
Vaninirappuputhenpurayil Gopalan Reju and   
           Andy W. H. Khong and   
                   Shu Ting Goh   Swarm Intelligence Based Particle Filter
                                  for Alternating Talker Localization and
                                  Tracking Using Microphone Arrays . . . . 1384--1397
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1398--1399
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  for authors  . . . . . . . . . . . . . . 1400--1401
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1402
                      Anonymous   Introducing IEEE Collabratec . . . . . . 1403
                      Anonymous   Member Get-A-Member (MGM) Program  . . . 1404
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 7, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 1405--1406
                      Anonymous   Table of Contents Edics  . . . . . . . . 1407--1408
                 Yu-An Chen and   
             Ju-Chiang Wang and   
              Yi-Hsuan Yang and   
                  Homer H. Chen   Component Tying for Mixture Model
                                  Adaptation in Personalization of Music
                                  Emotion Recognition  . . . . . . . . . . 1409--1420
            Hossein Zeinali and   
             Hossein Sameti and   
                   Lukáš Burget   HMM-Based Phrase-Independent $i$-Vector
                                  Extractor for Text-Dependent Speaker
                                  Verification . . . . . . . . . . . . . . 1421--1435
                 Xinzhou Xu and   
                   Jun Deng and   
           Nicholas Cummins and   
               Zixing Zhang and   
                    Chen Wu and   
                    Li Zhao and   
                 Björn Schuller   A Two-Dimensional Framework of Multiple
                                  Kernel Subspace Learning for Recognizing
                                  Emotion in Speech  . . . . . . . . . . . 1436--1449
             Mandy Korpusik and   
                    James Glass   Spoken Language Understanding for a
                                  Nutrition Dialogue System  . . . . . . . 1450--1461
             Mahmoud Fakhry and   
        Piergiorgio Svaizer and   
               Maurizio Omologo   Audio Source Separation in Reverberant
                                  Environments Using $ \beta
                                  $-Divergence-Based Nonnegative
                                  Factorization  . . . . . . . . . . . . . 1462--1476
   Bracha Laufer-Goldshtein and   
               Ronen Talmon and   
                  Sharon Gannot   Semi-Supervised Source Localization on
                                  Multiple Manifolds With Distributed
                                  Microphones  . . . . . . . . . . . . . . 1477--1491
       Donald S. Williamson and   
                   DeLiang Wang   Time-Frequency Masking in the Complex
                                  Domain for Speech Dereverberation and
                                  Denoising  . . . . . . . . . . . . . . . 1492--1501
                   Liang Lu and   
                   Steve Renals   Small-Footprint Highway Deep Neural
                                  Networks for Speech Recognition  . . . . 1502--1511
                Ina Kodrasi and   
                    Simon Doclo   Signal-Dependent Penalty Functions for
                                  Robust Acoustic Multi-Channel
                                  Equalization . . . . . . . . . . . . . . 1512--1525
               Jung-Hee Kim and   
                    Jin Kim and   
             Jae Hyeon Jeon and   
                   Sang Won Nam   Delayless Individual-Weighting-Factors
                                  Sign Subband Adaptive Filter With
                                  Band-Dependent Variable Step-Sizes . . . 1526--1534
                Yannan Wang and   
                     Jun Du and   
                Li-Rong Dai and   
                   Chin-Hui Lee   A Gender Mixture Detection Approach to
                                  Unsupervised Single-Channel Speech
                                  Separation Based on Deep Neural Networks 1535--1546
           Giacomo Vairetti and   
               Enzo De Sena and   
           Michael Catrysse and   
         Søren Holdt Jensen and   
                Marc Moonen and   
           Toon van Waterschoot   A Scalable Algorithm for Physically
                                  Motivated and Sparse Approximation of
                                  Room Impulse Responses With Orthonormal
                                  Basis Functions  . . . . . . . . . . . . 1547--1561
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1562--1563
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1564--1565
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1566
                      Anonymous   Introducing IEEE Collabratec . . . . . . 1567
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 8, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 1562--1563
                      Anonymous   Table of Contents  . . . . . . . . . . . 1564--1565
            Francis Stevens and   
           Damian T. Murphy and   
              Lauri Savioja and   
                  Vesa Välimäki   Modeling Sparsely Reflecting Outdoor
                                  Acoustic Scenes Using the Waveguide Web  1566--1578
        Ferdinando Olivieri and   
         Filippo Maria Fazi and   
             Simone Fontana and   
              Dylan Menzies and   
           Philip Arthur Nelson   Generation of Private Sound With a
                                  Circular Loudspeaker Array and the
                                  Weighted Pressure Matching Method  . . . 1579--1591
               Samy Elshamy and   
               Nilesh Madhu and   
               Wouter Tirry and   
                Tim Fingscheidt   Instantaneous A Priori SNR Estimation by
                                  Cepstral Excitation Manipulation . . . . 1592--1605
                 Paavo Alku and   
                   Rahim Saeidi   The Linear Predictive Modeling of Speech
                                  From Higher-Lag Autocorrelation
                                  Coefficients Applied to Noise-Robust
                                  Speaker Recognition  . . . . . . . . . . 1606--1617
                 Cheng Pang and   
                   Hong Liu and   
                  Jie Zhang and   
                     Xiaofei Li   Binaural Sound Localization Based on
                                  Reverberation Weighting and Generalized
                                  Parametric Mapping . . . . . . . . . . . 1618--1632
           Somanath Pradhan and   
                Vinal Patel and   
               Dipen Somani and   
               Nithin V. George   An Improved Proportionate Delayless
                                  Multiband-Structured Subband Adaptive
                                  Feedback Canceller for Digital Hearing
                                  Aids . . . . . . . . . . . . . . . . . . 1633--1643
               Szymon Drgas and   
            Tuomas Virtanen and   
                 Jörg Lücke and   
              Antti Hurmalainen   Binary Non-Negative Matrix Deconvolution
                                  for Audio Dictionary Learning  . . . . . 1644--1656
               Fatemeh Saki and   
             Nasser Kehtarnavaz   Real-Time Unsupervised Classification of
                                  Environmental Noise Signals  . . . . . . 1657--1667
          Lakshmish Kaushik and   
           Abhijeet Sangwan and   
              John H. L. Hansen   Automatic Sentiment Detection in
                                  Naturalistic Audio . . . . . . . . . . . 1668--1679
              Ofer Schwartz and   
              Sharon Gannot and   
           Emanuël A. P. Habets   Cramér--Rao Bound Analysis of
                                  Reverberation Level Estimators for
                                  Dereverberation and Noise Reduction  . . 1680--1693
             Seyran Khademi and   
        Richard C. Hendriks and   
             W. Bastiaan Kleijn   Intelligibility Enhancement Based on
                                  Mutual Information . . . . . . . . . . . 1694--1708
                Yuta Hatano and   
                 Chuang Shi and   
             Yoshinobu Kajikawa   Compensation for Nonlinear Distortion of
                                  the Frequency Modulation-Based
                                  Parametric Array Loudspeaker . . . . . . 1709--1717
               Yu-Ren Chien and   
           Daryush D. Mehta and   
               Jón Guðnason and   
             Matías Zañartu and   
             Thomas F. Quatieri   Evaluation of Glottal Inverse Filtering
                                  Algorithms Using a Physiologically Based
                                  Articulatory Speech Synthesizer  . . . . 1718--1730
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1731--1732
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1733--1734
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1735
                      Anonymous   Introducing IEEE Collabratec . . . . . . 1736
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 9, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 1737--1738
                      Anonymous   Table of Contents  . . . . . . . . . . . 1739--1740
               Jakob Abeßer and   
                Gerald Schuller   Instrument-Centered Music Transcription
                                  of Solo Bass Guitar Recordings . . . . . 1741--1750
            Thomas Le Cornu and   
                     Ben Milner   Generating Intelligible Audio Speech
                                  From Visual Speech . . . . . . . . . . . 1751--1761
                  Lemao Liu and   
             Atsushi Fujita and   
              Masao Utiyama and   
               Andrew Finch and   
                Eiichiro Sumita   Translation Quality Estimation Using
                                  Only Bilingual Corpora . . . . . . . . . 1762--1772
              Emad M. Grais and   
                Gerard Roma and   
       Andrew J. R. Simpson and   
               Mark D. Plumbley   Two-Stage Single-Channel Audio Source
                                  Separation Using Deep Neural Networks    1773--1783
          Giuliano Bernardi and   
       Toon van Waterschoot and   
                Jan Wouters and   
                    Marc Moonen   Adaptive Feedback Cancellation Using a
                                  Partitioned-Block Frequency-Domain
                                  Kalman Filter Approach With PEM-Based
                                  Signal Prewhitening  . . . . . . . . . . 1784--1798
                Vinal Patel and   
               Jordan Cheer and   
               Nithin V. George   Modified Phase-Scheduled-Command FxLMS
                                  Algorithm for Active Sound Profiling . . 1799--1808
              Killian Janod and   
            Mohamed Morchid and   
             Richard Dufour and   
            Georges Linarès and   
                 Renato De Mori   Denoised Bottleneck Features From Deep
                                  Autoencoders for Telephone Conversation
                                  Analysis . . . . . . . . . . . . . . . . 1809--1820
        Nikolaos Stefanakis and   
           Despoina Pavlidi and   
          Athanasios Mouchtaris   Perpendicular Cross-Spectra Fusion for
                                  Sound Source Localization With a Planar
                                  Microphone Array . . . . . . . . . . . . 1821--1835
         Takenori Yoshimura and   
              Kei Hashimoto and   
             Keiichiro Oura and   
          Yoshihiko Nankaku and   
                 Keiichi Tokuda   Simultaneous Optimization of Multiple
                                  Tree-Based Factor Analyzed HMM for
                                  Speech Synthesis . . . . . . . . . . . . 1836--1845
              Eita Nakamura and   
           Kazuyoshi Yoshii and   
                    Simon Dixon   Note Value Recognition for Piano
                                  Transcription Using Markov Random Fields 1846--1858
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1859--1860
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1861--1862
                      Anonymous   Open Access  . . . . . . . . . . . . . . 1863
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 10, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 1859--1860
                      Anonymous   Table of Contents  . . . . . . . . . . . 1861--1862
               Xiaohai Tian and   
                 Siu Wa Lee and   
                Zhizheng Wu and   
             Eng Siong Chng and   
                     Haizhou Li   An Exemplar-Based Approach to Frequency
                                  Warping for Voice Conversion . . . . . . 1863--1876
                Siying Wang and   
            Sebastian Ewert and   
                    Simon Dixon   Identifying Missing and Extra Notes in
                                  Piano Recordings Using Score-Informed
                                  Dictionary Learning  . . . . . . . . . . 1877--1889
              Sandro Cumani and   
                  Pietro Laface   Joint Estimation of PLDA and Nonlinear
                                  Transformations of Speaker Vectors . . . 1890--1900
              Morten Kolbæk and   
                    Dong Yu and   
              Zheng-Hua Tan and   
                  Jesper Jensen   Multitalker Speech Separation With
                                  Utterance-Level Permutation Invariant
                                  Training of Deep Recurrent Neural
                                  Networks . . . . . . . . . . . . . . . . 1901--1913
            Cheng-Tao Chung and   
              Cheng-Yu Tsai and   
            Chia-Hsiang Liu and   
                   Lin-Shan Lee   Unsupervised Iterative Deep Learning of
                                  Speech Features and Acoustic Tokens with
                                  Applications to Spoken Term Detection    1914--1928
          Niccolò Antonello and   
               Enzo De Sena and   
                Marc Moonen and   
          Patrick A. Naylor and   
           Toon van Waterschoot   Room Impulse Response Interpolation
                                  Using a Sparse Spatio-Temporal
                                  Representation of the Sound Field  . . . 1929--1941
                Yanmin Qian and   
                Nanxin Chen and   
            Heinrich Dinkel and   
                    Zhizheng Wu   Deep Feature Engineering for Noise
                                  Robust Spoofing Detection  . . . . . . . 1942--1955
                Sina Hafezi and   
          Alastair H. Moore and   
              Patrick A. Naylor   Augmented Intensity Vectors for
                                  Direction of Arrival Estimation in the
                                  Spherical Harmonic Domain  . . . . . . . 1956--1968
                Byeongho Jo and   
                  Jung-Woo Choi   Spherical Harmonic Smoothing for
                                  Localizing Coherent Sound Sources  . . . 1969--1984
               Emma Jokinen and   
                 Ulpu Remes and   
                     Paavo Alku   Intelligibility Enhancement of Telephone
                                  Speech Using Gaussian Process Regression
                                  for Normal-to-Lombard Spectral Tilt
                                  Conversion . . . . . . . . . . . . . . . 1985--1996
                 Xiaofei Li and   
              Laurent Girin and   
                Radu Horaud and   
                  Sharon Gannot   Multiple-Speaker Localization Based on
                                  Direct-Path Features and Likelihood
                                  Maximization With Spatial Sparsity
                                  Regularization . . . . . . . . . . . . . 1997--2012
                Marc Arnela and   
                   Oriol Guasch   Finite Element Synthesis of Diphthongs
                                  Using Tuned Two-Dimensional Vocal Tracts 2013--2023
                Deepak Baby and   
                 Hugo Van hamme   Joint Denoising and Dereverberation
                                  Using Exemplar-Based Sparse
                                  Representations and Decaying Norm
                                  Constraint . . . . . . . . . . . . . . . 2024--2035
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 2036--2037
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 2038--2039
                      Anonymous   Open Access  . . . . . . . . . . . . . . 2040
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 11, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 2041--2042
                      Anonymous   Table of Contents  . . . . . . . . . . . 2043--2044
              Qinghua Huang and   
                  Lin Zhang and   
                      Yong Fang   Two-Stage Decoupled DOA Estimation Based
                                  on Real Spherical Harmonics for
                                  Spherical Arrays . . . . . . . . . . . . 2045--2058
             Tomoki Hayashi and   
            Shinji Watanabe and   
                Tomoki Toda and   
               Takaaki Hori and   
           Jonathan Le Roux and   
                  Kazuya Takeda   Duration-Controlled LSTM for Polyphonic
                                  Sound Event Detection  . . . . . . . . . 2059--2070
             Monisankha Pal and   
                    Goutam Saha   Spectral Mapping Using Prior
                                  Re-Estimation of $i$-Vectors and System
                                  Fusion for Voice Conversion  . . . . . . 2071--2084
               Seppo Enarvi and   
                 Peter Smit and   
              Sami Virpioja and   
                   Mikko Kurimo   Automatic Speech Recognition With Very
                                  Large Conversational Finnish and
                                  Estonian Vocabularies  . . . . . . . . . 2085--2097
          Hannah Muckenhirn and   
            Pavel Korshunov and   
        Mathew Magimai-Doss and   
               Sébastien Marcel   Long-Term Spectral Statistics for Voice
                                  Presentation Attack Detection  . . . . . 2098--2111
             Brian Hamilton and   
                  Stefan Bilbao   FDTD Methods for $3$-D Room Acoustics
                                  Simulation With High-Order Accuracy in
                                  Space and Time . . . . . . . . . . . . . 2112--2124
             Pejman Mowlaee and   
               Martin Blass and   
             W. Bastiaan Kleijn   New Results in Modulation-Domain
                                  Single-Channel Speech Enhancement  . . . 2125--2137
              Dylan Menzies and   
             Filippo Maria Fazi   Decoding and Compression of Channel and
                                  Scene Objects for Spatial Audio  . . . . 2138--2151
                Eunwoo Song and   
             Frank K. Soong and   
                  Hong-Goo Kang   Effective Spectral and Excitation
                                  Modeling Techniques for LSTM--RNN-Based
                                  Speech Synthesis Systems . . . . . . . . 2152--2161
              Pulkit Sharma and   
              Vinayak Abrol and   
                 Anil Kumar Sao   Deep-Sparse-Representation-Based
                                  Features for Speech Recognition  . . . . 2162--2175
     Iynkaran Natgunanathan and   
                 Yong Xiang and   
                  Guang Hua and   
              Gleb Beliakov and   
                  John Yearwood   Patchwork-Based Multilayer Audio
                                  Watermarking . . . . . . . . . . . . . . 2176--2187
                Chengzhu Yu and   
              John H. L. Hansen   Active Learning Based Constrained
                                  Clustering For Speaker Diarization . . . 2188--2198
       Emil Solsbæk Ottosen and   
                 Monika Dörfler   A Phase Vocoder Based on Nonstationary
                                  Gabor Frames . . . . . . . . . . . . . . 2199--2208
              Boaz Schwartz and   
              Sharon Gannot and   
           Emanuël A. P. Habets   Two Model-Based EM Algorithms for Blind
                                  Source Separation in Noisy Environments  2209--2222
               Maja Taseska and   
           Emanuël A. P. Habets   Nonstationary Noise PSD Matrix
                                  Estimation for Multichannel Blind Speech
                                  Extraction . . . . . . . . . . . . . . . 2223--2236
            Bruno Di Giorgi and   
                Simon Dixon and   
        Massimiliano Zanoni and   
                  Augusto Sarti   A Data-Driven Model of Tonal Chord
                                  Sequence Complexity  . . . . . . . . . . 2237--2250
              N. Stefanakis and   
                 D. Pavlidi and   
                  A. Mouchtaris   Corrections to ``Perpendicular
                                  Cross-Spectra Fusion for Sound Source
                                  Localization With a Planar Microphone
                                  Array'' [Sep 17 1821--1835]  . . . . . . 2251
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 2252--2253
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 2254--2255
                      Anonymous   Open Access  . . . . . . . . . . . . . . 2256
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 25, Number 12, 2017

                      Anonymous   Table of Contents  . . . . . . . . . . . 2252--2253
                 T. Schultz and   
                  T. Hueber and   
           D. J. Krusienski and   
                 J. S. Brumberg   Introduction to the Special Issue on
                                  Biosignal-Based Spoken Communication . . 2254--2256
              Tanja Schultz and   
               Michael Wand and   
              Thomas Hueber and   
         Dean J. Krusienski and   
            Christian Herff and   
           Jonathan S. Brumberg   Biosignal-Based Spoken Communication: a
                                  Survey . . . . . . . . . . . . . . . . . 2257--2271
         Christopher Dromey and   
             Katherine M. Black   Effects of Laryngeal Activity on
                                  Articulation . . . . . . . . . . . . . . 2272--2280
              Michal Borsky and   
           Daryush D. Mehta and   
         Jarrad H. Van Stan and   
                   Jon Gudnason   Modal and Nonmodal Voice Quality
                                  Classification Using Acoustic and
                                  Electroglottographic Features  . . . . . 2281--2291
 Alborz Rezazadeh Sereshkeh and   
               Robert Trott and   
           Aurélien Bricout and   
                       Tom Chau   EEG Classification of Covert Speech
                                  Using Regularized Neural Networks  . . . 2292--2300
             Reza Sahraeian and   
           Dirk Van Compernolle   Crosslingual and Multilingual Speech
                                  Recognition Based on the Speech Manifold 2301--2312
           Đorđe T. Grozdić and   
            Slobodan T. Jovičić   Whispered Speech Recognition Using Deep
                                  Denoising Autoencoder and Inverse
                                  Filtering  . . . . . . . . . . . . . . . 2313--2322
              Myungjong Kim and   
                Beiming Cao and   
                    Ted Mau and   
                       Jun Wang   Speaker-Independent Silent Speech
                                  Recognition From Flesh-Point
                                  Articulatory Movements Using an LSTM
                                  Neural Network . . . . . . . . . . . . . 2323--2336
      Patrick Lumban Tobing and   
         Kazuhiro Kobayashi and   
                    Tomoki Toda   Articulatory Controllable Speech
                                  Modification Based on Statistical
                                  Inversion and Production Mappings  . . . 2337--2350
             Ingmar Steiner and   
        Sébastien Le Maguer and   
                Alexander Hewer   Synthesis of Tongue Motion and Acoustics
                                  From Text Using a Multimodal
                                  Articulatory Database  . . . . . . . . . 2351--2361
           Jose A. Gonzalez and   
               Lam A. Cheah and   
             Angel M. Gomez and   
              Phil D. Green and   
           James M. Gilbert and   
             Stephen R. Ell and   
             Roger K. Moore and   
                  Ed Holdsworth   Direct Speech Reconstruction From
                                  Articulatory Sensor Data by Machine
                                  Learning . . . . . . . . . . . . . . . . 2362--2374
             Matthias Janke and   
                  Lorenz Diener   EMG-to-Speech: Direct Generation of
                                  Speech From Facial Electromyographic
                                  Signals  . . . . . . . . . . . . . . . . 2375--2385
       Geoffrey S. Meltzner and   
            James T. Heaton and   
                Yunbin Deng and   
           Gianluca De Luca and   
               Serge H. Roy and   
                Joshua C. Kline   Silent Speech Recognition as an
                                  Alternative Communication Device for
                                  Persons With Laryngectomy  . . . . . . . 2386--2398
                   Fei Chen and   
                   Lan Wang and   
                   Hui Chen and   
                      Gang Peng   Investigations on Mandarin Aspiratory
                                  Animations Using an Airflow Model  . . . 2399--2409
                Wayne Xiong and   
               Jasha Droppo and   
              Xuedong Huang and   
                Frank Seide and   
         Michael L. Seltzer and   
            Andreas Stolcke and   
                    Dong Yu and   
                 Geoffrey Zweig   Toward Human Parity in Conversational
                                  Speech Recognition . . . . . . . . . . . 2410--2423
                 Biao Zhang and   
                 Deyi Xiong and   
                 Jinsong Su and   
                      Hong Duan   A Context-Aware Recurrent Encoder for
                                  Neural Machine Translation . . . . . . . 2424--2432
              Afsaneh Asaei and   
               Milos Cernak and   
                 Hervé Bourlard   Perceptual Information Loss due to
                                  Impaired Speech Production . . . . . . . 2433--2443
                    Ning Ma and   
                 Tobias May and   
                   Guy J. Brown   Exploiting Deep Neural Networks and Head
                                  Movements for Robust Binaural
                                  Localization of Multiple Sources in
                                  Reverberant Environments . . . . . . . . 2444--2453
                      Anonymous   List of Reviewers  . . . . . . . . . . . 2454--2457
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 2458--2459
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 2460--2461
                      Anonymous   Open Access  . . . . . . . . . . . . . . 2462
                      Anonymous   2017 Subject Index IEEE
                                  Transactions on Applied
                                  Superconductivity Vol. 27  . . . . . . . 2463--2488
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 1, 2018

                      Anonymous   Table of Contents  . . . . . . . . . . . 1--2
                      Anonymous   Table of Contents [Edics]  . . . . . . . 3--4
                 Dianna Yee and   
      Homayoun Kamkar-Parsi and   
              Rainer Martin and   
                  Henning Puder   A Noise Reduction Postfilter for
                                  Binaurally Linked Single-Microphone
                                  Hearing Aids Utilizing a Nearby External
                                  Microphone . . . . . . . . . . . . . . . 5--18
              Tom Bäckström and   
               Johannes Fischer   Fast Randomization for Distributed
                                  Low-Bitrate Coding of Speech and Audio   19--30
                   Jun Deng and   
                 Xinzhou Xu and   
               Zixing Zhang and   
            Sascha Frühholz and   
                 Björn Schuller   Semisupervised Autoencoders for Speech
                                  Emotion Recognition  . . . . . . . . . . 31--43
             Md. Sahidullah and   
Dennis Alexander Lehmann Thomsen and   
    Rosa Gonzalez Hautamäki and   
              Tomi Kinnunen and   
              Zheng-Hua Tan and   
               Robert Parts and   
                Martti Pitkänen   Robust Voice Liveness Detection and
                                  Speaker Verification Using Throat
                                  Microphones  . . . . . . . . . . . . . . 44--56
            Gilles Degottex and   
          Pierre Lanchantin and   
                     Mark Gales   A Log Domain Pulse Model for Parametric
                                  Speech Synthesis . . . . . . . . . . . . 57--70
              Johannes Abel and   
                Tim Fingscheidt   Artificial Speech Bandwidth Extension
                                  Using Deep Neural Networks for Wideband
                                  Spectral Envelope Estimation . . . . . . 71--83
                 Yuki Saito and   
       Shinnosuke Takamichi and   
             Hiroshi Saruwatari   Statistical Parametric Speech Synthesis
                                  Incorporating Generative Adversarial
                                  Networks . . . . . . . . . . . . . . . . 84--96
     Kristian Timm Andersen and   
                    Marc Moonen   Robust Speech-Distortion Weighted
                                  Interframe Wiener Filters for
                                  Single-Channel Noise Reduction . . . . . 97--107
                 Chen-Yu Chiang   Cross-Dialect Adaptation Framework for
                                  Constructing Prosodic Models for Chinese
                                  Dialect Text-to-Speech Systems . . . . . 108--121
               Bingquan Liu and   
                    Zhen Xu and   
               Chengjie Sun and   
                Baoxun Wang and   
              Xiaolong Wang and   
              Derek F. Wong and   
                      Min Zhang   Content-Oriented User Modeling for
                                  Personalized Response Ranking in
                                  Chatbots . . . . . . . . . . . . . . . . 122--133
               Zhiyuan Tang and   
                  Dong Wang and   
               Yixiang Chen and   
                 Lantian Li and   
                    Andrew Abel   Phonetic Temporal Neural Model for
                                  Language Identification  . . . . . . . . 134--144
       Soumitro Chakrabarty and   
           Emanuël A. P. Habets   A Bayesian Approach to Informed Spatial
                                  Filtering With Robustness Against DOA
                                  Estimation Errors  . . . . . . . . . . . 145--160
               Kuan-Yu Chen and   
              Shih-Hung Liu and   
                Berlin Chen and   
                  Hsin-Min Wang   An Information Distillation Framework
                                  for Extractive Summarization . . . . . . 161--170
                     Ma Jin and   
                   Yan Song and   
             Ian McLoughlin and   
                    Li-Rong Dai   LID-Senones and Their Statistics for
                                  Language Identification  . . . . . . . . 171--183
               Zhehuai Chen and   
               Jasha Droppo and   
                   Jinyu Li and   
                    Wayne Xiong   Progressive Joint Modeling in
                                  Unsupervised Single-Channel Overlapped
                                  Speech Recognition . . . . . . . . . . . 184--196
             Shivesh Ranjan and   
              John H. L. Hansen   Curriculum Learning Based Approaches for
                                  Noise Robust Speaker Recognition . . . . 197--210

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 2, 2018

             Yoshiaki Bando and   
         Katsutoshi Itoyama and   
              Masashi Konyo and   
           Satoshi Tadokoro and   
           Kazuhiro Nakadai and   
           Kazuyoshi Yoshii and   
           Tatsuya Kawahara and   
               Hiroshi G. Okuno   Speech Enhancement Based on Bayesian
                                  Low-Rank and Sparse Decomposition of
                                  Multichannel Magnitude Spectrograms  . . 215--230
               Yu-Ping Ruan and   
                  Qian Chen and   
                  Zhen-Hua Ling   A Sequential Neural Encoder With Latent
                                  Structured Description for Modeling
                                  Sentences  . . . . . . . . . . . . . . . 231--242
            Amelia J. Gully and   
             Helena Daffern and   
               Damian T. Murphy   Diphthong Synthesis Using the Dynamic
                                  $3$D Digital Waveguide Mesh  . . . . . . 243--255
                Chunyang Wu and   
           Mark J. F. Gales and   
                Anton Ragni and   
            Penny Karanasou and   
                   Khe Chai Sim   Improving Interpretability and
                                  Regularization in Deep Learning  . . . . 256--265
                 Kehai Chen and   
                Tiejun Zhao and   
                 Muyun Yang and   
                  Lemao Liu and   
             Akihiro Tamura and   
                   Rui Wang and   
              Masao Utiyama and   
                Eiichiro Sumita   A Neural Approach to Source Dependence
                                  Based Context Model for Statistical
                                  Machine Translation  . . . . . . . . . . 266--280
             Joonas Nikunen and   
           Aleksandr Diment and   
                Tuomas Virtanen   Separation of Moving Sound Sources Using
                                  Multichannel NMF and Acoustic Tracking   281--295
                Johan Swärd and   
                 Hongbin Li and   
              Andreas Jakobsson   Off-Grid Fundamental Frequency
                                  Estimation . . . . . . . . . . . . . . . 296--303
              Dylan Menzies and   
     Marcos F. Simón Gálvez and   
             Filippo Maria Fazi   A Low-Frequency Panning Method With
                                  Compensation for Head Rotation . . . . . 304--317
         Branimir Dropuljić and   
                 Igor Mijić and   
           Davor Petrinović and   
            Tanja Jovanovic and   
                 Krešimir Ćosić   Vocal Analysis of Acoustic Startle
                                  Responses  . . . . . . . . . . . . . . . 318--329
          Philipp Aichinger and   
           Martin Hagmüller and   
   Berit Schneider-Stickler and   
            Jean Schoentgen and   
                 Franz Pernkopf   Tracking of Multiple Fundamental
                                  Frequencies in Diplophonic Voices  . . . 330--341
    Anastasios Alexandridis and   
          Athanasios Mouchtaris   Multiple Sound Source Location
                                  Estimation in Wireless Acoustic Sensor
                                  Networks Using DOA Estimates: The
                                  Data-Association Problem . . . . . . . . 342--356
                Robert Rehr and   
                  Timo Gerkmann   On the Importance of Super-Gaussian
                                  Speech Priors for Machine-Learning Based
                                  Speech Enhancement . . . . . . . . . . . 357--366
        Sonia Djaziri-Larbi and   
                  Gaël Mahé and   
              Imen Mezghani and   
                Monia Turki and   
                 Mériem Jaïdane   Watermark-Driven Acoustic Echo
                                  Cancellation . . . . . . . . . . . . . . 367--378
          Annamaria Mesaros and   
              Toni Heittola and   
          Emmanouil Benetos and   
               Peter Foster and   
           Mathieu Lagrange and   
            Tuomas Virtanen and   
               Mark D. Plumbley   Detection and Classification of Acoustic
                                  Scenes and Events: Outcome of the DCASE
                                  2016 Challenge . . . . . . . . . . . . . 379--393
            Cheng-Tao Chung and   
                   Lin-Shan Lee   Unsupervised Discovery of Structured
                                  Acoustic Tokens With Applications to
                                  Spoken Term Detection  . . . . . . . . . 394--405
                     Tobias May   Robust Speech Dereverberation With a
                                  Neural Network-Based Post-Filter That
                                  Exploits Multi-Conditional Training of
                                  Binaural Cues  . . . . . . . . . . . . . 406--414
           Majid Mirbagheri and   
                  Les Atlas and   
               Adrian K. C. Lee   Regression Factor Analysis With an
                                  Application to Continuous HRIR
                                  Measurement  . . . . . . . . . . . . . . 415--421
                Jen-Tzung Chien   Bayesian Nonparametric Learning for
                                  Hierarchical and Sparse Topics . . . . . 422--435
             Johannes Stahl and   
                 Pejman Mowlaee   A Pitch-Synchronous Simultaneous
                                  Detection-Estimation Framework for
                                  Speech Enhancement . . . . . . . . . . . 436--450

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 3, March, 2018

                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 457--458
                      Anonymous   Table of Contents [Edics]  . . . . . . . 459--460
             C. D. Salvador and   
                S. Sakamoto and   
                 J. Treviño and   
                      Y. Suzuki   Boundary Matching Filters for Spherical
                                  Microphone and Loudspeaker Arrays  . . . 461--474
                A. H. Abdelaziz   Comparing Fusion Models for DNN-Based
                                  Audiovisual Continuous Speech
                                  Recognition  . . . . . . . . . . . . . . 475--484
                       S. Emura   Residual Echo Reduction for Multichannel
                                  Acoustic Echo Cancelers With a
                                  Complex-Valued Residual Echo Estimate    485--500
                   V. H. Do and   
                 N. F. Chen and   
                  B. P. Lim and   
         M. A. Hasegawa-Johnson   Multitask Learning for Phone Recognition
                                  of Underresourced Languages Using
                                  Mismatched Transcription . . . . . . . . 501--514
               M. Zohourian and   
                  G. Enzner and   
                      R. Martin   Binaural Speaker Localization Integrated
                                  Into an Adaptive Beamformer for Hearing
                                  Aids . . . . . . . . . . . . . . . . . . 515--528
                   Y. Xiang and   
           I. Natgunanathan and   
                    D. Peng and   
                     G. Hua and   
                         B. Liu   Spread Spectrum Audio Watermarking Using
                                  Multiple Orthogonal PN Sequences and
                                  Variable Embedding Strengths and
                                  Polarities . . . . . . . . . . . . . . . 529--539
                     C. Tan and   
                     F. Wei and   
                    Q. Zhou and   
                    N. Yang and   
                      B. Du and   
                      W. Lv and   
                        M. Zhou   Context-Aware Answer Sentence Selection
                                  With Hierarchical Gated Recurrent Neural
                                  Networks . . . . . . . . . . . . . . . . 540--549
                   J. Zhang and   
              S. P. Chepuri and   
             R. C. Hendriks and   
                    R. Heusdens   Microphone Subset Selection for MVDR
                                  Beamformer Based Noise Reduction . . . . 550--563
                    S. Wang and   
                     P. Lin and   
                    Y. Tsao and   
                    J. Hung and   
                          B. Su   Suppression by Selecting Wavelets for
                                  Feature Compression in Distributed
                                  Speech Recognition . . . . . . . . . . . 564--579
                    Y. Wang and   
                     M. Brookes   Model-Based Speech Enhancement in the
                                  Modulation Domain  . . . . . . . . . . . 580--594
                 C. Huemmer and   
                 C. Hofmann and   
                    R. Maas and   
                  W. Kellermann   Estimating Parameters of Nonlinear
                                  Systems Using the Elitist Particle
                                  Filter Based on Evolutionary Strategies  595--608
                 D. Salvati and   
                  C. Drioli and   
                  G. L. Foresti   A Low-Complexity Robust Beamforming
                                  Using Diagonal Unloading for Acoustic
                                  Source Localization  . . . . . . . . . . 609--622
                      J. Su and   
                    J. Zeng and   
                   D. Xiong and   
                     Y. Liu and   
                    M. Wang and   
                         J. Xie   A Hierarchy-to-Sequence Attentional
                                  Neural Machine Translation Model . . . . 623--632
               W. B. Kheder and   
                 D. Matrouf and   
                   M. Ajili and   
                    J. Bonastre   A Unified Joint Model to Deal With
                                  Nuisance Variabilities in the $i$-Vector
                                  Space  . . . . . . . . . . . . . . . . . 633--645
                   G. Gelly and   
                     J. Gauvain   Optimization of RNN-Based Speech
                                  Activity Detection . . . . . . . . . . . 646--656
                 M. Taseska and   
                E. A. P. Habets   Blind Source Separation of Moving
                                  Sources Using Sparsity-Based Source
                                  Detection and Tracking . . . . . . . . . 657--670
                      L. Yu and   
                    J. Wang and   
                  K. R. Lai and   
                       X. Zhang   Refining Word Embeddings Using Intensity
                                  Scores for Sentiment Analysis  . . . . . 671--681
                  Y. Dorfan and   
                  A. Plinge and   
                   G. Hazan and   
                      S. Gannot   Distributed Expectation-Maximization
                                  Algorithm for Speaker Localization in
                                  Reverberant Environments . . . . . . . . 682--695
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 696--697
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 698--699
                      Anonymous   Open Access  . . . . . . . . . . . . . . 700--700
                      Anonymous   Introducing IEEE Collabratec . . . . . . 701--701
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 4, April, 2018

                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 696--697
                      Anonymous   Table of Contents [Edics]  . . . . . . . 698--699
                     Z. Tan and   
                     M. Mak and   
                      B. K. Mak   DNN-Based Score Calibration With
                                  Multitask Learning for Noise Robust
                                  Speaker Verification . . . . . . . . . . 700--712
                      Y. Hu and   
                        Z. Ling   Extracting Spectral Features Using Deep
                                  Autoencoders With Binary Distributed
                                  Hidden Units for Statistical Parametric
                                  Speech Synthesis . . . . . . . . . . . . 713--724
       B. Laufer-Goldshtein and   
                  R. Talmon and   
                      S. Gannot   A Hybrid Approach for Speaker Tracking
                                  Based on TDOA and Data-Driven Models . . 725--735
                  S. Cumani and   
                      P. Laface   Speaker Recognition Using e-Vectors  . . 736--748
                      L. Xu and   
                  K. A. Lee and   
                      H. Li and   
                        Z. Yang   Generalizing I-Vector Estimation for
                                  Rapid Speaker Recognition  . . . . . . . 749--759
                 Y. Buchris and   
                   I. Cohen and   
                     J. Benesty   Frequency-Domain Design of Asymmetric
                                  Circular Differential Microphone Arrays  760--773
                   J. Zhang and   
           T. D. Abhayapala and   
                   W. Zhang and   
         P. N. Samarasinghe and   
                       S. Jiang   Active Noise Control Over Space: a Wave
                                  Domain Approach  . . . . . . . . . . . . 774--786
                     Y. Luo and   
                    Z. Chen and   
                   N. Mesgarani   Speaker-Independent Speech Separation
                                  With Deep Attractor Network  . . . . . . 787--796
                  N. M. Joy and   
             S. R. Kothinti and   
                       S. Umesh   FMLLR Speaker Normalization With
                                  i-Vector: In Pseudo-FMLLR and
                                  Distillation Framework . . . . . . . . . 797--805
                 S. Chandna and   
                        W. Wang   Bootstrap Averaging for Model-Based
                                  Source Separation in Reverberant
                                  Conditions . . . . . . . . . . . . . . . 806--819
                     Z. Tan and   
                     M. Mak and   
                  B. K. Mak and   
                         Y. Zhu   Denoised Senone I-Vectors for Robust
                                  Speaker Verification . . . . . . . . . . 820--830
                 K. Itakura and   
                   Y. Bando and   
                E. Nakamura and   
                 K. Itoyama and   
                  K. Yoshii and   
                    T. Kawahara   Bayesian Multichannel Audio Source
                                  Separation Based on Integrated Source
                                  and Spatial Models . . . . . . . . . . . 831--846
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 847--848
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 849--850
                      Anonymous   Open Access  . . . . . . . . . . . . . . 851--851
                      Anonymous   Introducing IEEE Collabratec . . . . . . 852--852
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 5, May, 2018

                      Anonymous   Table of Contents  . . . . . . . . . . . 853--854
                      Anonymous   Table of Contents [Edics]  . . . . . . . 855--856
                 Y. E. Baba and   
                 A. Walther and   
                E. A. P. Habets   $3$D Room Geometry Inference Based on
                                  Room Impulse Response Stacks . . . . . . 857--872
                   Q. Zhang and   
                J. H. L. Hansen   Language/Dialect Recognition Based on
                                  Unsupervised Deep Learning . . . . . . . 873--882
                    Z. Ling and   
                      Y. Ai and   
                      Y. Gu and   
                         L. Dai   Waveform Modeling and Generation Using
                                  Hierarchical Recurrent Neural Networks
                                  for Speech Bandwidth Extension . . . . . 883--894
                M. Delcroix and   
               K. Kinoshita and   
                   A. Ogawa and   
                 C. Huemmer and   
                    T. Nakatani   Context Adaptive Neural Network Based
                                  Acoustic Models for Rapid Adaptation . . 895--908
              L. T. T. Tran and   
             S. E. Nordholm and   
                H. Schepker and   
                  H. H. Dam and   
                       S. Doclo   Two-Microphone Hearing Aids Using
                                  Prediction Error Method for Adaptive
                                  Feedback Control . . . . . . . . . . . . 909--923
                   J. Chang and   
                   M. Marschall   Periphony-Lattice Mixed-Order Ambisonic
                                  Scheme for Spherical Microphone Arrays   924--936
                N. Dionelis and   
                     M. Brookes   Phase-Aware Single-Channel Speech
                                  Enhancement With Modulation-Domain
                                  Kalman Filtering . . . . . . . . . . . . 937--950
                   C. Zheng and   
               A. Deleforge and   
                      X. Li and   
                  W. Kellermann   Statistical Analysis of the Multichannel
                                  Wiener Filter Using a Bivariate Normal
                                  Distribution for Sample Covariance
                                  Matrices . . . . . . . . . . . . . . . . 951--966
                     C. Vaz and   
           V. Ramanarayanan and   
                   S. Narayanan   Acoustic Denoising Using Dictionary
                                  Learning With Spectral and Temporal
                                  Regularization . . . . . . . . . . . . . 967--980
                    L. Wang and   
                   A. Cavallaro   Pseudo-Determined Blind Source
                                  Separation for Ad-hoc Microphone
                                  Networks . . . . . . . . . . . . . . . . 981--994
                  S. Cumani and   
                      P. Laface   Scoring Heterogeneous Speaker Vectors
                                  Using Nonlinear Transformations and Tied
                                  PLDA Models  . . . . . . . . . . . . . . 995--1009
                G. Bernardi and   
         T. van Waterschoot and   
                 J. Wouters and   
                      M. Moonen   Subjective and Objective Sound-Quality
                                  Evaluation of Adaptive Feedback
                                  Cancellation Algorithms  . . . . . . . . 1010--1024
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1025--1026
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1027--1028
                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 6, June, 2018

                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 1025--1026
                      Anonymous   Table of Contents [Edics]  . . . . . . . 1027--1028
                 H. Kameoka and   
                 T. Higuchi and   
                  M. Tanaka and   
                          L. Li   Nonnegative Matrix Factorization With
                                  Basis Clustering Using Cepstral Distance
                                  Regularization . . . . . . . . . . . . . 1029--1040
                  J. Donley and   
                    C. Ritz and   
                   W. B. Kleijn   Multizone Soundfield Reproduction With
                                  Privacy- and Quality-Based Speech
                                  Masking Filters  . . . . . . . . . . . . 1041--1055
                   S. Braun and   
             A. Kuklasiński and   
                O. Schwartz and   
               O. Thiergart and   
            E. A. P. Habets and   
                  S. Gannot and   
                   S. Doclo and   
                      J. Jensen   Evaluation and Comparison of Late
                                  Reverberation Power Spectral Density
                                  Estimators . . . . . . . . . . . . . . . 1056--1071
             E. L. Benaroya and   
                    N. Obin and   
                   M. Liuni and   
                  A. Roebel and   
                  W. Raumel and   
                  S. Argentieri   Binaural Localization of Multiple Sound
                                  Sources by Non-Negative Tensor
                                  Factorization  . . . . . . . . . . . . . 1072--1082
               N. Perraudin and   
               N. Holighaus and   
                  P. Majdak and   
                      P. Balazs   Inpainting of Long Audio Segments With
                                  Similarity Graphs  . . . . . . . . . . . 1083--1094
                  P. Magron and   
                  R. Badeau and   
                       B. David   Model-Based STFT Phase Recovery for
                                  Audio Source Separation  . . . . . . . . 1095--1105
                 I. Kodrasi and   
                       S. Doclo   Analysis of Eigenvalue
                                  Decomposition-Based Late Reverberation
                                  Power Spectral Density Estimation  . . . 1106--1118
                   S. Braun and   
                E. A. P. Habets   Linear Prediction-Based Online
                                  Dereverberation and Noise Reduction
                                  Using Alternating Kalman Filters . . . . 1119--1129
                     D. Ram and   
                   A. Asaei and   
                    H. Bourlard   Sparse Subspace Modeling for Query by
                                  Example Spoken Term Detection  . . . . . 1130--1143
         M. Krawczyk-Becker and   
                    T. Gerkmann   On Speech Enhancement Under PSD
                                  Uncertainty  . . . . . . . . . . . . . . 1144--1153
                S. Leglaive and   
                  R. Badeau and   
                     G. Richard   Student's $t$-Source and Mixing Models
                                  for Multichannel Audio Source Separation 1154--1168
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1169--1170
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1171--1172
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 7, July, 2018

                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 1173--1174
                      Anonymous   Table of Contents [Edics]  . . . . . . . 1175--1176
               T. Yoshimura and   
               K. Hashimoto and   
                    K. Oura and   
                 Y. Nankaku and   
                      K. Tokuda   Mel-Cepstrum-Based Quantization Noise
                                  Shaping Applied to Neural-Network-Based
                                  Speech Waveform Synthesis  . . . . . . . 1177--1184
                    Q. Wang and   
                      J. Du and   
                     L. Dai and   
                         C. Lee   A Multiobjective Learning and Ensembling
                                  Approach to High-Performance Speech
                                  Enhancement With Compact Neural Network
                                  Architectures  . . . . . . . . . . . . . 1185--1197
             M. Á. Del-Agua and   
                 A. Giménez and   
                 A. Sanchis and   
                  J. Civera and   
                        A. Juan   Speaker-Adapted Confidence Measures for
                                  ASR Using Deep Bidirectional Recurrent
                                  Neural Networks  . . . . . . . . . . . . 1198--1206
                 J. Proença and   
                   C. Lopes and   
                  M. Tjalve and   
                 A. Stolcke and   
                S. Candeias and   
                    F. Perdigão   Mispronunciation Detection in Children's
                                  Reading of Sentences . . . . . . . . . . 1207--1219
          Ljubiša Stanković and   
                 Miloš Brajović   Analysis of the Reconstruction of Sparse
                                  Signals in the DCT Domain Applied to
                                  Audio Signals  . . . . . . . . . . . . . 1220--1235
               J. F. Santos and   
                     T. H. Falk   Speech Dereverberation With
                                  Context-Aware Recurrent Neural Networks  1236--1246
               M. Geronazzo and   
                 S. Spagnol and   
                    F. Avanzini   Do We Need Individual Head-Related
                                  Transfer Functions for Vertical
                                  Localization? The Case Study of a
                                  Spectral Notch Distance Metric . . . . . 1247--1260
               D. Marquardt and   
                       S. Doclo   Interaural Coherence Preservation for
                                  Binaural Noise Reduction Using Partial
                                  Noise Estimation and Spectral
                                  Postfiltering  . . . . . . . . . . . . . 1261--1274
                 M. Farmani and   
             M. S. Pedersen and   
                     Z. Tan and   
                      J. Jensen   Bias-Compensated Informed Sound Source
                                  Localization Using Relative Transfer
                                  Functions  . . . . . . . . . . . . . . . 1275--1289
                     F. Tao and   
                       C. Busso   Gating Neural Network for Large
                                  Vocabulary Audiovisual Speech
                                  Recognition  . . . . . . . . . . . . . . 1290--1302
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1303--1304
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1305--1306
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 8, August, 2018

                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 1303--1304
                      Anonymous   Table of Contents [Edics]  . . . . . . . 1305--1306
                   Z. Rafii and   
                 A. Liutkus and   
                  F. Stöter and   
            S. I. Mimilakis and   
              D. FitzGerald and   
                       B. Pardo   An Overview of Lead and Accompaniment
                                  Separation in Music  . . . . . . . . . . 1307--1335
                    C. Wang and   
                    J. Wang and   
                 A. Santoso and   
                  C. Chiang and   
                          C. Wu   Sound Event Recognition Using
                                  Auditory-Receptive-Field Binary Pattern
                                  and Hierarchical-Diving Deep Belief
                                  Network  . . . . . . . . . . . . . . . . 1336--1351
                    L. Yang and   
                   M. Zhang and   
                     Y. Liu and   
                     M. Sun and   
                      N. Yu and   
                          G. Fu   Joint POS Tagging and Dependence Parsing
                                  With Transition-Based Neural Networks    1352--1358
                      K. Yu and   
                    Z. Zhao and   
                      X. Wu and   
                     H. Lin and   
                         X. Liu   Rich Short Text Conversation Using
                                  Semantic-Key-Controlled Sequence
                                  Generation . . . . . . . . . . . . . . . 1359--1368
                  B. Lehner and   
                J. Schlüter and   
                      G. Widmer   Online, Loudness-Invariant Vocal
                                  Detection in Mixed Music Signals . . . . 1369--1380
                   S. Stone and   
                  M. Marxen and   
                    P. Birkholz   Construction and Evaluation of a
                                  Parametric One-Dimensional Vocal Tract
                                  Model  . . . . . . . . . . . . . . . . . 1381--1392
                     T. Tan and   
                    Y. Qian and   
                      H. Hu and   
                    Y. Zhou and   
                    W. Ding and   
                          K. Yu   Adaptive Very Deep Convolutional
                                  Residual Network for Noise Robust Speech
                                  Recognition  . . . . . . . . . . . . . . 1393--1405
                    X. Wang and   
                  S. Takaki and   
                   J. Yamagishi   Autoregressive Neural F0 Model for
                                  Statistical Parametric Speech Synthesis  1406--1419
      C. Valentini-Botinhao and   
                   J. Yamagishi   Speech Enhancement of Noisy and
                                  Reverberant Speech for Text-to-Speech    1420--1433
         A. I. Koutrouvelis and   
              T. W. Sherson and   
                R. Heusdens and   
                 R. C. Hendriks   A Low-Cost Robust Distributed Linearly
                                  Constrained Beamformer for Wireless
                                  Acoustic Sensor Networks With Arbitrary
                                  Topology . . . . . . . . . . . . . . . . 1434--1448
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1449--1450
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1451--1452
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 9, September, 2018

                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 1453--1454
                      Anonymous   Table of Contents [Edics]  . . . . . . . 1455--1456
                      C. Wu and   
                 C. Dittmar and   
                C. Southall and   
                    R. Vogl and   
                  G. Widmer and   
                 J. Hockman and   
                  M. Müller and   
                       A. Lerch   A Review of Automatic Drum Transcription 1457--1483
                   C. Evers and   
                   P. A. Naylor   Acoustic SLAM  . . . . . . . . . . . . . 1484--1498
                 C. Laroche and   
                M. Kowalski and   
            H. Papadopoulos and   
                     G. Richard   Hybrid Projective Nonnegative Matrix
                                  Factorization With Drum Dictionaries for
                                  Harmonic/Percussive Source Separation    1499--1511
        J. J. Carabias-Orti and   
                 J. Nikunen and   
                T. Virtanen and   
                P. Vera-Candeas   Multichannel Blind Sound Source
                                  Separation Using Spatial Covariance
                                  Model With Level and Time Differences
                                  and Nonnegative Matrix Factorization . . 1512--1527
                   M. Zhang and   
                      N. Yu and   
                          G. Fu   A Simple and Effective Neural Model for
                                  Joint Word Segmentation and POS Tagging  1528--1538
                 D. Menzies and   
                     F. M. Fazi   A Complex Panning Method for Near-Field
                                  Imaging  . . . . . . . . . . . . . . . . 1539--1548
                   A. Misra and   
                J. H. L. Hansen   Maximum-Likelihood Linear Transformation
                                  for Unsupervised Domain Adaptation in
                                  Speaker Verification . . . . . . . . . . 1549--1558
             Y. Wakabayashi and   
                T. Fukumori and   
                M. Nakayama and   
                T. Nishiura and   
                   Y. Yamashita   Single-Channel Speech Enhancement With
                                  Phase Reconstruction Based on Phase
                                  Distortion Averaging . . . . . . . . . . 1559--1569
                      S. Fu and   
                    T. Wang and   
                    Y. Tsao and   
                      X. Lu and   
                       H. Kawai   End-to-End Waveform Utterance
                                  Enhancement for Direct Evaluation
                                  Metrics Optimization by Fully
                                  Convolutional Neural Networks  . . . . . 1570--1584
                    K. Xiao and   
                    S. Wang and   
                     M. Wan and   
                          L. Wu   Radiated Noise Suppression for
                                  Electrolarynx Speech Based on Multiband
                                  Time-Domain Amplitude Modulation . . . . 1585--1593
                   A. Fahim and   
         P. N. Samarasinghe and   
               T. D. Abhayapala   PSD Estimation and Source Separation in
                                  a Noisy Reverberant Environment Using a
                                  Spherical Microphone Array . . . . . . . 1594--1607
                      H. He and   
                    J. Chen and   
                 J. Benesty and   
                        T. Yang   Noise Robust Frequency-Domain Adaptive
                                  Blind Multichannel Identification With
                                  $\ell_p$-Norm Constraint . . . . . . . . 1608--1619
                   W. Zhang and   
                    Z. Chen and   
                     F. Yin and   
                       Q. Zhang   Melody Extraction From Polyphonic Music
                                  Using Particle Filter and Dynamic
                                  Programming  . . . . . . . . . . . . . . 1620--1632
                   C. Zhang and   
                K. Koishida and   
                J. H. L. Hansen   Text-Independent Speaker Verification
                                  Based on Triplet Convolutional Neural
                                  Network Embeddings . . . . . . . . . . . 1633--1644
                   A. R. MV and   
                     P. K. Ghosh   PSFM: A Probabilistic Source Filter
                                  Model for Noise Robust Glottal Closure
                                  Instant Detection  . . . . . . . . . . . 1645--1657
              M. Airaksinen and   
                  L. Juvela and   
              B. Bollepalli and   
               J. Yamagishi and   
                        P. Alku   A Comparison Between STRAIGHT, Glottal,
                                  and Sinusoidal Vocoding in Statistical
                                  Parametric Speech Synthesis  . . . . . . 1658--1670
                    G. Mahé and   
                     M. Jaïdane   Perceptually Controlled Reshaping of
                                  Sound Histograms . . . . . . . . . . . . 1671--1683
                   Q. Huang and   
                   L. Zhang and   
                        Y. Fang   Two-Step Spherical Harmonics ESPRIT-Type
                                  Algorithms and Performance Analysis  . . 1684--1697
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1698--1699
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1700--1702
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 10, October, 2018

                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 1698--1699
                      Anonymous   Table of Contents [Edics]  . . . . . . . 1700--1701
                    D. Wang and   
                        J. Chen   Supervised Speech Separation Based on
                                  Deep Learning: an Overview . . . . . . . 1702--1726
                    R. Wang and   
                 M. Utiyama and   
                   A. Finch and   
                     L. Liu and   
                    K. Chen and   
                      E. Sumita   Sentence Selection and Weighting for
                                  Neural Machine Translation Domain
                                  Adaptation . . . . . . . . . . . . . . . 1727--1741
                 F. U. Khan and   
               B. P. Milner and   
                    T. Le Cornu   Using Visual Speech Information in
                                  Masking Methods for Audio Speaker
                                  Separation . . . . . . . . . . . . . . . 1742--1754
                      X. Li and   
                  S. Gannot and   
                   L. Girin and   
                      R. Horaud   Multichannel Identification and
                                  Nonnegative Equalization for
                                  Dereverberation and Noise Reduction
                                  Based on Convolutive Transfer Function   1755--1768
          Lütfi Kerem Şenel and   
                 İhsan Utlu and   
             Veysel Yücesoy and   
                  Aykut Koç and   
                    Tolga Çukur   Semantic Structure and Interpretability
                                  of Word Embeddings . . . . . . . . . . . 1769--1779
                 Y. Koizumi and   
                    K. Niwa and   
                   Y. Hioka and   
               K. Kobayashi and   
                      Y. Haneda   DNN-Based Source Enhancement to Increase
                                  Objective Sound Quality Assessment Score 1780--1792
               C. Paleologu and   
                 J. Benesty and   
                    S. Ciochină   Linear System Identification Based on a
                                  Kronecker Product Decomposition  . . . . 1793--1808
                   F. Xiong and   
                  S. Goetze and   
               B. Kollmeier and   
                    B. T. Meyer   Exploring Auditory-Inspired Acoustic
                                  Features for Room Acoustic Parameter
                                  Estimation From Monaural Speech  . . . . 1809--1820
                  G. Le Lan and   
                 D. Charlet and   
                 A. Larcher and   
                    S. Meignier   An Adaptive Method for Cross-Recording
                                  Speaker Diarization  . . . . . . . . . . 1821--1832
                     W. Xue and   
                A. H. Moore and   
                 M. Brookes and   
                   P. A. Naylor   Modulation-Domain Multichannel Kalman
                                  Filtering for Speech Enhancement . . . . 1833--1847
                      K. Wu and   
                 V. G. Reju and   
                 A. W. H. Khong   Multisource DOA Estimation in a
                                  Reverberant Environment Using a Single
                                  Acoustic Vector Sensor . . . . . . . . . 1848--1859
                   J. Huang and   
                     Y. Sun and   
                   W. Zhang and   
                    H. Wang and   
                         T. Liu   Entity Highlight Generation as
                                  Statistical and Neural Machine
                                  Translation  . . . . . . . . . . . . . . 1860--1872
                   Q. T. Do and   
                   S. Sakti and   
                    S. Nakamura   Sequence-to-Sequence Models for Emphasis
                                  Speech Translation . . . . . . . . . . . 1873--1883
                 F. Fontana and   
                       E. Bozzo   Explicit Fixed-Point Computation of
                                  Nonlinear Delay-Free Loop Filter
                                  Networks . . . . . . . . . . . . . . . . 1884--1896
                     S. Widmark   Causal IIR Audio Precompensator Filters
                                  Subject to Quadratic Constraints . . . . 1897--1912
                  F. Winter and   
               H. Wierstorf and   
                    C. Hold and   
                  F. Krüger and   
                   A. Raake and   
                       S. Spors   Colouration in Local Wave Field
                                  Synthesis  . . . . . . . . . . . . . . . 1913--1924
             A. H. Andersen and   
              J. M. de Haan and   
                     Z. Tan and   
                      J. Jensen   Nonintrusive Speech Intelligibility
                                  Prediction Using Convolutional Neural
                                  Networks . . . . . . . . . . . . . . . . 1925--1939
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 1940--1941
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 1942--1944
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 11, November, 2018

                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 1945--1946
                      Anonymous   Table of Contents [Edics]  . . . . . . . 1947--1948
                  H. Hadian and   
                  H. Sameti and   
                   D. Povey and   
                   S. Khudanpur   Flat-Start Single-Stage Discriminatively
                                  Trained HMM-Based Models for ASR . . . . 1949--1961
                F. Katzberg and   
                   R. Mazur and   
                   M. Maass and   
                    P. Koch and   
                     A. Mertins   A Compressed Sensing Framework for
                                  Dynamic Sound-Field Measurements . . . . 1962--1975
                  H. Sundar and   
            T. V. Sreenivas and   
             C. S. Seelamantula   TDOA-Based Multiple Acoustic Source
                                  Localization Without Association
                                  Ambiguity  . . . . . . . . . . . . . . . 1976--1990
               R. Sahraeian and   
             D. Van Compernolle   Cross-Entropy Training of DNN Ensemble
                                  Acoustic Models for Low-Resource ASR . . 1991--2001
                  H. Dinkel and   
                    Y. Qian and   
                          K. Yu   Investigating Raw Wave Deep Neural
                                  Networks for End-to-End Speaker Spoofing
                                  Detection  . . . . . . . . . . . . . . . 2002--2014
                   J. Zhang and   
                R. Heusdens and   
                 R. C. Hendriks   Rate-Distributed Spatial Filtering Based
                                  Noise Reduction in Wireless Acoustic
                                  Sensor Networks  . . . . . . . . . . . . 2015--2026
                    M. Heck and   
                   S. Sakti and   
                    S. Nakamura   Dirichlet Process Mixture of Mixtures
                                  Model for Unsupervised Subword Modeling  2027--2042
                     S. Nie and   
                   S. Liang and   
                     W. Liu and   
                   X. Zhang and   
                         J. Tao   Deep Learning Based Speech Separation
                                  via NMF-Style Reconstructions  . . . . . 2043--2055
                   H. Dubey and   
                 A. Sangwan and   
                J. H. L. Hansen   Leveraging Frequency-Dependent Kernel
                                  and DIP-Based Clustering for Robust
                                  Speech Activity Detection in
                                  Naturalistic Audio Streams . . . . . . . 2056--2071
                    Y. Jang and   
                     J. Ham and   
                     B. Lee and   
                         K. Kim   Cross-Language Neural Dialog State
                                  Tracker for Large Ontologies Using
                                  Hierarchical Attention . . . . . . . . . 2072--2082
                   G. Weisz and   
            P. Budzianowski and   
                      P. Su and   
                       M. Gašić   Sample Efficient Deep Reinforcement
                                  Learning for Dialogue Systems With Large
                                  Action Spaces  . . . . . . . . . . . . . 2083--2097
                         S. Lin   Reverberation-Robust Localization of
                                  Speakers Using Distinct Speech Onsets
                                  and Multichannel Cross Correlations  . . 2098--2111
                  S. Abidin and   
                 R. Togneri and   
                       F. Sohel   Spectrotemporal Analysis Using Local
                                  Binary Pattern Variants for Acoustic
                                  Scene Classification . . . . . . . . . . 2112--2121
                      N. Ma and   
             J. A. Gonzalez and   
                    G. J. Brown   Robust Binaural Localization of a Target
                                  Sound Source by Combining Spectral
                                  Source Models and Deep Neural Networks   2122--2131
                      S. Wu and   
                   D. Zhang and   
                   Z. Zhang and   
                    N. Yang and   
                      M. Li and   
                        M. Zhou   Dependency-to-Dependency Neural Machine
                                  Translation  . . . . . . . . . . . . . . 2132--2141
                      J. Xu and   
                      H. He and   
                     X. Sun and   
                     X. Ren and   
                          S. Li   Cross-Domain and Semisupervised Named
                                  Entity Recognition in Chinese Social
                                  Media: a Unified Model . . . . . . . . . 2142--2152
                S. Van Kuyk and   
               W. B. Kleijn and   
                 R. C. Hendriks   An Evaluation of Intrusive Instrumental
                                  Intelligibility Metrics  . . . . . . . . 2153--2166
                  X. Ouyang and   
                      K. Gu and   
                        P. Zhou   Spatial Pyramid Pooling Mechanism in 3D
                                  Convolutional Network for Sentence-Level
                                  Classification . . . . . . . . . . . . . 2167--2179
                   B. McFee and   
                 J. Salamon and   
                    J. P. Bello   Adaptive Pooling Operators for Weakly
                                  Labeled Sound Event Detection  . . . . . 2180--2193
               I. Barbancho and   
              G. Tzanetakis and   
            A. M. Barbancho and   
                   L. J. Tardón   Discrimination Between
                                  Ascending/Descending Pitch Arpeggios . . 2194--2203
                     Y. Kim and   
                     M. Kim and   
                     J. Goo and   
                         H. Kim   Learning Self-Informed Feature
                                  Contribution for Deep Learning-Based
                                  Acoustic Modeling  . . . . . . . . . . . 2204--2214
               M. B. Çöteli and   
                   O. Olgun and   
               H. Hacıhabiboğlu   Multiple Sound Source Localization With
                                  Steered Response Power Density and
                                  Hierarchical Grid Refinement . . . . . . 2215--2229
                     J. Bao and   
                    Y. Gong and   
                    N. Duan and   
                    M. Zhou and   
                        T. Zhao   Question Generation With Doubly
                                  Adversarial Nets . . . . . . . . . . . . 2230--2239
                      B. Bu and   
                     C. Bao and   
                         M. Jia   Design of a Planar First-Order
                                  Loudspeaker Array for Global Active
                                  Noise Control  . . . . . . . . . . . . . 2240--2250
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 2251--2252
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 2253--2255
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 26, Number 12, December, 2018

                      Anonymous   Front Cover  . . . . . . . . . . . . . . C1--C1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 2251--2252
                      Anonymous   Table of Contents [Edics]  . . . . . . . 2253--2254
                    X. Wang and   
                      Z. Tu and   
                       M. Zhang   Incorporating Statistical Machine
                                  Translation Word Knowledge Into Neural
                                  Machine Translation  . . . . . . . . . . 2255--2266
                    Y. Zhao and   
       M. Kuruvilla-Dugdale and   
                        M. Song   Structured Sparse Spectral Transforms
                                  and Structural Measures for Voice
                                  Conversion . . . . . . . . . . . . . . . 2267--2276
                  H. Salehi and   
                 D. Suelzle and   
                P. Folkeard and   
                       V. Parsa   Learning-Based Reference-Free Speech
                                  Quality Measures for Hearing Aid
                                  Applications . . . . . . . . . . . . . . 2277--2288
                  G. Enzner and   
                       P. Thüne   Bayesian MMSE Filtering of Noisy Speech
                                  by SNR Marginalization With Global PSD
                                  Priors . . . . . . . . . . . . . . . . . 2289--2304
                   G. Huang and   
                    J. Chen and   
                     J. Benesty   Insights Into Frequency-Invariant
                                  Beamforming With Concentric Circular
                                  Microphone Arrays  . . . . . . . . . . . 2305--2318
                      Ayana and   
                    S. Shen and   
                    Y. Chen and   
                    C. Yang and   
                     Z. Liu and   
                         M. Sun   Zero-Shot Cross-Lingual Neural Headline
                                  Generation . . . . . . . . . . . . . . . 2319--2327
               S. Surendran and   
                    T. K. Kumar   Oblique Projection and Cepstral
                                  Subtraction in Signal Subspace Speech
                                  Enhancement for Colored Noise Reduction  2328--2340
                      Q. Li and   
                 D. F. Wong and   
                 L. S. Chao and   
                     M. Zhu and   
                    T. Xiao and   
                     J. Zhu and   
                       M. Zhang   Linguistic Knowledge-Aware Neural
                                  Machine Translation  . . . . . . . . . . 2341--2354
                   W. Zhang and   
                 C. Hofmann and   
                 M. Buerger and   
           T. D. Abhayapala and   
                  W. Kellermann   Spatial Noise-Field Control With Online
                                  Secondary Path Modeling: a Wave-Domain
                                  Approach . . . . . . . . . . . . . . . . 2355--2370
                 A. Meynard and   
                   B. Torrésani   Spectral Analysis for Nonstationary
                                  Audio  . . . . . . . . . . . . . . . . . 2371--2380
           I. Martín-Morató and   
                   M. Cobos and   
                    F. J. Ferri   Adaptive Mid-Term Representations for
                                  Robust Audio Event Classification  . . . 2381--2392
                  G. Firtha and   
                   P. Fiala and   
                 F. Schultz and   
                       S. Spors   On the General Relation of Wave Field
                                  Synthesis and Spectral Division Method
                                  for Linear Arrays  . . . . . . . . . . . 2393--2403
                P. Birkholz and   
                   S. Stone and   
                    K. Wolf and   
                 D. Plettemeier   Non-Invasive Silent Phoneme Recognition
                                  Using Microwave Signals  . . . . . . . . 2404--2411
                     W. Lin and   
                     M. Mak and   
                       J. Chien   Multisource I-Vectors Domain Adaptation
                                  Using Maximum Mean Discrepancy Based
                                  Autoencoders . . . . . . . . . . . . . . 2412--2422
              M. Abdelwahab and   
                       C. Busso   Domain Adversarial for Acoustic Emotion
                                  Recognition  . . . . . . . . . . . . . . 2423--2435
               D. El Badawy and   
                    I. Dokmanić   Direction of Arrival With One
                                  Microphone, a Few LEGOs, and
                                  Non-Negative Matrix Factorization  . . . 2436--2446
                     H. Lee and   
                   P. Chung and   
                      Y. Wu and   
                     T. Lin and   
                         T. Wen   Interactive Spoken Content Retrieval by
                                  Deep Reinforcement Learning  . . . . . . 2447--2459
                 S. Elshamy and   
                   N. Madhu and   
                   W. Tirry and   
                 T. Fingscheidt   DNN-Supported Speech Enhancement With
                                  Cepstral Estimation of Both Excitation
                                  and Envelope . . . . . . . . . . . . . . 2460--2474
                     Y. Bao and   
                        H. Chen   A Chance-Constrained Programming
                                  Approach to the Design of Robust
                                  Broadband Beamformers With Microphone
                                  Mismatches . . . . . . . . . . . . . . . 2475--2488
                      Anonymous   Farewell Editorial . . . . . . . . . . . 2489--2489
                      Anonymous   List of Reviewers  . . . . . . . . . . . 2490--2496
                      Anonymous   IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing
                                  Edics  . . . . . . . . . . . . . . . . . 2497--2498
                      Anonymous   IEEE Transactions on
                                  Multimedia information for authors . . . 2499--2501
                      Anonymous   IEEE Open Access Publishing  . . . . . . 2502--2502
                      Anonymous   2018 Index IEEE/ACM
                                  Transactions on Audio, Speech, and
                                  Language Processing Vol. 26  . . . . . . 2503--2528
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3
                      Anonymous   Blank page . . . . . . . . . . . . . . . C4--C4

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 1, January, 2019

                      Anonymous   Table of Contents  . . . . . . . . . . . C1--1
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents [Edics]  . . . . . . . 2--3
                      Anonymous   [Blank page] . . . . . . . . . . . . . . B4--B4
                      Anonymous   Inaugural Editorial Innovations in an
                                  Era of Ubiquitous Audio, Speech, and
                                  Language Processing  . . . . . . . . . . 5--6
                     F. Bao and   
                  W. H. Abdulla   A New Ratio Mask Representation for
                                  CASA-Based Speech Enhancement  . . . . . 7--19
                  P. Magron and   
                    T. Virtanen   Complex ISNMF: a Phase-Aware Model for
                                  Monaural Audio Source Separation . . . . 20--31
             T. T. H. Duong and   
             N. Q. K. Duong and   
               P. C. Nguyen and   
                   C. Q. Nguyen   Gaussian Modeling-Based Multichannel
                                  Audio Source Separation Exploiting
                                  Generic Source Spectral Model  . . . . . 32--43
                   G. Zhang and   
                     J. Tao and   
                     X. Qiu and   
                     I. Burnett   Decentralized Two-Channel Active Noise
                                  Control for Single Frequency by Shaping
                                  Matrix Eigenvalues . . . . . . . . . . . 44--52
                    Y. Zhao and   
                    Z. Wang and   
                        D. Wang   Two-Stage Deep Learning for
                                  Noisy-Reverberant Speech Enhancement . . 53--62
                   N. Zheng and   
                       X. Zhang   Phase-Aware Speech Enhancement Based on
                                  Deep Neural Networks . . . . . . . . . . 63--76
                  T. Moriya and   
                  T. Tanaka and   
               T. Shinozaki and   
                S. Watanabe and   
                         K. Duh   Evolution-Strategy-Based Automation of
                                  System Development for High-Performance
                                  Speech Recognition . . . . . . . . . . . 77--88
                  H. Kamper and   
           G. Shakhnarovich and   
                     K. Livescu   Semantic Speech Retrieval With a
                                  Visually Grounded Model of Untranscribed
                                  Speech . . . . . . . . . . . . . . . . . 89--98
          M. S. Kavalekalam and   
              J. K. Nielsen and   
                J. B. Boldt and   
              M. G. Christensen   Model-Based Speech Enhancement for
                                  Intelligibility Improvement in Binaural
                                  Hearing Aids . . . . . . . . . . . . . . 99--113
                   A. R. MV and   
                    P. K. Ghosh   Glottal Inverse Filtering Using
                                  Probabilistic Weighted Linear Prediction 114--124
                     Y. Sun and   
                    W. Wang and   
                J. Chambers and   
                    S. M. Naqvi   Two-Stage Monaural Source Separation in
                                  Reverberant Room Environments Using Deep
                                  Neural Networks  . . . . . . . . . . . . 125--139
                  L. Ferrer and   
             M. K. Nandwana and   
                 M. McLaren and   
                  D. Castan and   
                      A. Lawson   Toward Fail-Safe Speaker Recognition:
                                  Trial-Based Calibration With a Reject
                                  Option . . . . . . . . . . . . . . . . . 140--153
                   J. Amini and   
             R. C. Hendriks and   
                R. Heusdens and   
                     M. Guo and   
                      J. Jensen   Asymmetric Coding for Rate-Constrained
                                  Noise Reduction in Binaural Hearing Aids 154--167
                      J. Yu and   
                   J. Jiang and   
                         R. Xia   Global Inference for Aspect and Opinion
                                  Terms Co-Extraction Based on Multi-Task
                                  Neural Networks  . . . . . . . . . . . . 168--177
                    Z. Wang and   
                   X. Zhang and   
                        D. Wang   Robust Speaker Localization Guided by
                                  Deep Learning-Based Time-Frequency
                                  Masking  . . . . . . . . . . . . . . . . 178--188
                     K. Tan and   
                    J. Chen and   
                        D. Wang   Gated Residual Networks With Dilated
                                  Convolutions for Monaural Speech
                                  Enhancement  . . . . . . . . . . . . . . 189--198
                  G. H. Ngo and   
                  M. Nguyen and   
                     N. F. Chen   Phonology-Augmented Statistical
                                  Framework for Machine Transliteration
                                  Using Limited Linguistic Resources . . . 199--211
                 Y. Koizumi and   
                   S. Saito and   
                 H. Uematsu and   
                 Y. Kawachi and   
                      N. Harada   Unsupervised Detection of Anomalous
                                  Sound Based on Deep Learning and the
                                  Neyman--Pearson Lemma  . . . . . . . . . 212--224
                  Y. Laufer and   
                      S. Gannot   A Bayesian Hierarchical Model for Speech
                                  Enhancement With Time-Varying Audio
                                  Channel  . . . . . . . . . . . . . . . . 225--239
                      Anonymous   Erratum for Nonlinear Audio Systems
                                  Identification Through Audio Input
                                  Gaussianization  . . . . . . . . . . . . 240--240
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 2, February, 2019

                      Anonymous   Table of Contents  . . . . . . . . . . . C1--241
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents [Edics]  . . . . . . . 242--243
               T. Nakashika and   
                  S. Takaki and   
                   J. Yamagishi   Complex-Valued Restricted Boltzmann
                                  Machine for Speaker-Dependent Speech
                                  Parameterization From Complex Spectra    244--254
                   F. Xiong and   
                  S. Goetze and   
               B. Kollmeier and   
                    B. T. Meyer   Joint Estimation of Reverberation Time
                                  and Early-To-Late Reverberation Ratio
                                  From Single-Channel Speech Signals . . . 255--267
                  F. Stöter and   
             S. Chakrabarty and   
                   B. Edler and   
                E. A. P. Habets   CountNet: Estimating the Number of
                                  Concurrent Speakers Using Supervised
                                  Learning . . . . . . . . . . . . . . . . 268--282
                  M. Kolbæk and   
                     Z. Tan and   
                      J. Jensen   On the Relationship Between Short-Time
                                  Objective Intelligibility and Short-Time
                                  Spectral-Amplitude Mean-Square Error for
                                  Speech Enhancement . . . . . . . . . . . 283--295
               M. W. Hansen and   
               J. R. Jensen and   
              M. G. Christensen   Estimation of Fundamental Frequencies in
                                  Stereophonic Music Mixtures  . . . . . . 296--310
                     J. Bao and   
                    D. Tang and   
                    N. Duan and   
                     Z. Yan and   
                    M. Zhou and   
                        T. Zhao   Text Generation From Tables  . . . . . . 311--320
         A. I. Koutrouvelis and   
             R. C. Hendriks and   
                R. Heusdens and   
                      J. Jensen   A Convex Approximation of the Relaxed
                                  Binaural Beamforming Optimization
                                  Problem  . . . . . . . . . . . . . . . . 321--331
               T. Hashimoto and   
                   D. Saito and   
                   N. Minematsu   Many-to-Many and Completely
                                  Parallel-Data-Free Voice Conversion
                                  Based on Eigenspace DNN  . . . . . . . . 332--341
              F. Pishdadian and   
                       B. Pardo   Multi-Resolution Common Fate Transform   342--354
                      Y. Wu and   
                          W. Li   Automatic Audio Chord Recognition With
                                  MIDI-Trained Deep Feature and BLSTM-CRF
                                  Sequence Decoding Model  . . . . . . . . 355--366
                   K. Imoto and   
                         N. Ono   Acoustic Topic Model for Scene Analysis
                                  With Intermittently Missing Observations 367--382
                    K. Xiao and   
                    S. Wang and   
                     M. Wan and   
                          L. Wu   Reconstruction of Mandarin
                                  Electrolaryngeal Fricatives With Hybrid
                                  Noise Source . . . . . . . . . . . . . . 383--391
                L. Krishnan and   
                T. Betlehem and   
                     P. D. Teal   Fast Algorithms for Acoustic Impulse
                                  Response Shaping . . . . . . . . . . . . 392--403
                  V. Zakeri and   
                  A. J. Hodgson   Automatic Identification of Hard and
                                  Soft Bone Tissues by Analyzing Drilling
                                  Sounds . . . . . . . . . . . . . . . . . 404--414
                  S. Bilbao and   
                    B. Hamilton   Directional Sources in Wave-Based
                                  Acoustic Simulation  . . . . . . . . . . 415--428
                   Y. Zhang and   
                   B. Pardo and   
                        Z. Duan   Siamese Style Convolutional Neural
                                  Networks for Sound Search by Vocal
                                  Imitation  . . . . . . . . . . . . . . . 429--441
                    F. Feng and   
                    M. Kowalski   Underdetermined Reverberant Blind Source
                                  Separation: Sparse Approaches for
                                  Multiplicative and Convolutive
                                  Narrowband Approximation . . . . . . . . 442--456
                    Z. Wang and   
                        D. Wang   Combining Spectral and Spatial Features
                                  for Deep Learning Based Blind Speaker
                                  Separation . . . . . . . . . . . . . . . 457--468
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 3, March, 2019

                      Anonymous   Table of Contents  . . . . . . . . . . . C1--469
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents [Edics]  . . . . . . . 470--471
              M. Z. Jahromi and   
                  A. Zahedi and   
                  J. Jensen and   
                  J. Østergaard   Information Loss in the Human Auditory
                                  System . . . . . . . . . . . . . . . . . 472--481
                 Y. Buchris and   
                    A. Amar and   
                 J. Benesty and   
                       I. Cohen   Incoherent Synthesis of Sparse Arrays
                                  for Frequency-Invariant Beamforming  . . 482--495
          Y. Rahulamathavan and   
           K. R. Sutharsini and   
                  I. G. Ray and   
                      R. Lu and   
                   M. Rajarajan   Privacy-Preserving $i$Vector-Based
                                  Speaker Verification . . . . . . . . . . 496--506
                   J. Zhang and   
                    Y. Zhao and   
                      H. Li and   
                        C. Zong   Attention With Sparsity Regularization
                                  for Neural Machine Translation and
                                  Summarization  . . . . . . . . . . . . . 507--518
                A. H. Moore and   
                     W. Xue and   
               P. A. Naylor and   
                     M. Brookes   Noise Covariance Matrix Estimation for
                                  Rotating Microphone Arrays . . . . . . . 519--530
                    G. Yang and   
                      H. He and   
                        Q. Chen   Emotion-Semantic-Enhanced Neural Network 531--543
                 T. Dietzen and   
                  A. Spriet and   
                   W. Tirry and   
                   S. Doclo and   
                  M. Moonen and   
             T. van Waterschoot   Comparative Analysis of Generalized
                                  Sidelobe Cancellation and Multi-Channel
                                  Linear Prediction for Speech
                                  Dereverberation and Noise Reduction  . . 544--558
                     J. Gao and   
                      J. Du and   
                        E. Chen   Mixed-Bandwidth Cross-Channel Speech
                                  Recognition via Joint Optimization of
                                  DNN-Based Bandwidth Expansion and
                                  Acoustic Modeling  . . . . . . . . . . . 559--571
                   S. Deena and   
                   M. Hasan and   
                 M. Doulaty and   
                     O. Saz and   
                        T. Hain   Recurrent Neural Network Language Model
                                  Adaptation for Multi-Genre Broadcast
                                  Speech Recognition and Alignment . . . . 572--582
           F. B. Gelderblom and   
             T. V. Tronstad and   
                   E. M. Viggen   Subjective Evaluation of a Noise-Reduced
                                  Training Target for Deep Neural
                                  Network-Based Speech Enhancement . . . . 583--594
             M. Luis Valero and   
                E. A. P. Habets   Low-Complexity Multi-Microphone Acoustic
                                  Echo Control in the Short-Time Fourier
                                  Transform Domain . . . . . . . . . . . . 595--609
                     Q. Zhu and   
                 P. Coleman and   
                     X. Qiu and   
                      M. Wu and   
                    J. Yang and   
                     I. Burnett   Robust Personal Audio Geometry
                                  Optimization in the SVD-Based Modal
                                  Domain . . . . . . . . . . . . . . . . . 610--620
                      J. Yi and   
                     J. Tao and   
                     Z. Wen and   
                         Y. Bai   Language-Adversarial Transfer Learning
                                  for Low-Resource Speech Recognition  . . 621--630
                   J. Zhang and   
                    Z. Ling and   
                     L. Liu and   
                   Y. Jiang and   
                         L. Dai   Sequence-to-Sequence Acoustic Modeling
                                  for Voice Conversion . . . . . . . . . . 631--644
                      X. Li and   
                   L. Girin and   
                  S. Gannot and   
                      R. Horaud   Multichannel Speech Separation and
                                  Enhancement Using the Convolutive
                                  Transfer Function  . . . . . . . . . . . 645--659
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 4, April, 2019

                      Anonymous   Table of Contents  . . . . . . . . . . . C1--660
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents [Edics]  . . . . . . . 661--662
                    Z. Zhao and   
                     H. Liu and   
                 T. Fingscheidt   Convolutional Neural Networks to Enhance
                                  Coded Speech . . . . . . . . . . . . . . 663--678
                H. Schepker and   
             S. E. Nordholm and   
              L. T. T. Tran and   
                       S. Doclo   Null-Steering Beamformer-Based Feedback
                                  Cancellation for Multi-Microphone
                                  Hearing Aids With Incoming Signal
                                  Preservation . . . . . . . . . . . . . . 679--691
                      Z. Li and   
                    Y. Song and   
                     L. Dai and   
                  I. McLoughlin   Listening and Grouping: an Online
                                  Autoregressive Approach for Monaural
                                  Speech Separation  . . . . . . . . . . . 692--703
                    D. Deng and   
                    L. Jing and   
                      J. Yu and   
                     S. Sun and   
                       M. K. Ng   Sentiment Lexicon Construction With
                                  Hierarchical Supervision Topic Model . . 704--718
                    M. Zhou and   
                   M. Huang and   
                         X. Zhu   Story Ending Selection by Finding Hints
                                  From Pairwise Candidate Endings  . . . . 719--729
                 J. Richter and   
                        J. Fels   On the Influence of Continuous Subject
                                  Rotation During High-Resolution
                                  Head-Related Transfer Function
                                  Measurements . . . . . . . . . . . . . . 730--741
                      J. Yu and   
                  K. Markov and   
                      T. Matsui   Articulatory and Spectrum Information
                                  Fusion Based on Deep Recurrent Neural
                                  Networks . . . . . . . . . . . . . . . . 742--752
            F. P. Itturriet and   
                    M. H. Costa   Perceptually Relevant Preservation of
                                  Interaural Time Differences in Binaural
                                  Hearing Aids . . . . . . . . . . . . . . 753--764
                    J. Abel and   
                 T. Fingscheidt   Sinusoidal-Based Lowband Synthesis for
                                  Artificial Speech Bandwidth Extension    765--776
                    Q. Kong and   
                      Y. Xu and   
                I. Sobieraj and   
                    W. Wang and   
                 M. D. Plumbley   Sound Event Detection and Time Frequency
                                  Segmentation from Weakly Labelled Data   777--787
                    Y. Tuan and   
                         H. Lee   Improving Conditional Sequence
                                  Generative Adversarial Networks by
                                  Stepwise Evaluation  . . . . . . . . . . 788--798
                N. Dionelis and   
                     M. Brookes   Modulation-Domain Kalman Filtering for
                                  Monaural Blind Speech Denoising and
                                  Dereverberation  . . . . . . . . . . . . 799--814
                 R. Lotfian and   
                       C. Busso   Curriculum Learning for Speech Emotion
                                  Recognition From Crowdsourced Labels . . 815--826
                         S. Lin   Robust Pitch Estimation and Tracking For
                                  Speakers Based on Subband Encoding and
                                  The Generalized Labeled Multi-Bernoulli
                                  Filter . . . . . . . . . . . . . . . . . 827--841
                    X. Wang and   
                   I. Cohen and   
                    J. Chen and   
                     J. Benesty   On Robust and High Directive Beamforming
                                  With Small-Spacing Microphone Arrays for
                                  Scattered Sources  . . . . . . . . . . . 842--852
                    Z. Quan and   
                    Z. Wang and   
                      Y. Le and   
                     B. Yao and   
                      K. Li and   
                         J. Yin   An Efficient Framework for Sentence
                                  Similarity Modeling  . . . . . . . . . . 853--865
                   N. Lubis and   
                   S. Sakti and   
                 K. Yoshino and   
                    S. Nakamura   Positive Emotion Elicitation in
                                  Chat-Based Dialogue Systems  . . . . . . 866--877
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 5, May, 2019

                      Anonymous   Table of Contents  . . . . . . . . . . . C1--878
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 879--880
             F. J. Ibarrola and   
                R. D. Spies and   
                L. E. D. Persia   Switching Divergences for Spectral
                                  Learning in Blind Speech Dereverberation 881--891
                   I. Cohen and   
                 J. Benesty and   
                        J. Chen   Differential Kronecker Product
                                  Beamforming  . . . . . . . . . . . . . . 892--902
          C. Elisei-Iliescu and   
               C. Paleologu and   
                 J. Benesty and   
                 C. Stanciu and   
                  C. Anghel and   
                    S. Ciochină   Recursive Least-Squares Algorithms for
                                  the Identification of Low-Rank Systems   903--918
                   A. Kumar and   
                    T. Guha and   
                    P. K. Ghosh   Dirichlet Latent Variable Model: a
                                  Dynamic Model Based on Dirichlet Prior
                                  for Audio Processing . . . . . . . . . . 919--931
                P. Jancovic and   
                      M. Köküer   Bird Species Recognition Using
                                  Unsupervised Modeling of Individual
                                  Vocalization Elements  . . . . . . . . . 932--947
                T. Koriyama and   
                   T. Kobayashi   Statistical Parametric Speech Synthesis
                                  Using Deep Gaussian Processes  . . . . . 948--959
                 K. Shimada and   
                   Y. Bando and   
                  M. Mimura and   
                 K. Itoyama and   
                  K. Yoshii and   
                    T. Kawahara   Unsupervised Speech Enhancement Based on
                                  Multichannel NMF-Informed Beamforming
                                  for Noise-Robust Automatic Speech
                                  Recognition  . . . . . . . . . . . . . . 960--971
                     S. Widmark   Causal MSE-Optimal Filters for Personal
                                  Audio Subject to Constrained Contrast    972--987
                      Anonymous   Article Awards for the
                                  IEEE/ACM Transactions on
                                  Audio, Speech, and Language Processing   988--988
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 6, June, 2019

                      Anonymous   Table of contents  . . . . . . . . . . . C1--989
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of contents (EDICS)  . . . . . . . 990--991
                 A. Mesaros and   
                  A. Diment and   
                B. Elizalde and   
                T. Heittola and   
                 E. Vincent and   
                     B. Raj and   
                    T. Virtanen   Sound Event Detection in the DCASE 2017
                                  Challenge  . . . . . . . . . . . . . . . 992--1006
           S. R. Chetupalli and   
                T. V. Sreenivas   Late Reverberation Cancellation Using
                                  Bayesian Estimation of Multi-Channel
                                  Linear Predictors and Student's
                                  $t$-Source Prior . . . . . . . . . . . . 1007--1018
                  L. Juvela and   
              B. Bollepalli and   
                 V. Tsiaras and   
                        P. Alku   GlotNet --- a Raw Waveform Model for the
                                  Glottal Excitation in Statistical
                                  Parametric Speech Synthesis  . . . . . . 1019--1030
                  F. Winter and   
                 F. Schultz and   
                  G. Firtha and   
                       S. Spors   A Geometric Model for Prediction of
                                  Spatial Aliasing in $ 2.5 $D Sound Field
                                  Synthesis  . . . . . . . . . . . . . . . 1031--1046
                     Y. Liu and   
                     T. Lee and   
                     T. Law and   
                      K. Y. Lee   Acoustical Assessment of Voice Disorder
                                  With Continuous Speech Using ASR
                                  Posterior Features . . . . . . . . . . . 1047--1059
              C. Pörschmann and   
                J. M. Arend and   
                   F. Brinkmann   Directional Equalization of Sparse
                                  Head-Related Transfer Function Sets for
                                  Spatial Upsampling . . . . . . . . . . . 1060--1071
                S. S. Payal and   
              V. J. Mathews and   
               D. J. Button and   
                    A. Iyer and   
              R. H. Lambert and   
               J. Hutchings and   
           L. A. Azpicueta-Ruiz   Equalization of Nonlinear Propagation
                                  Distortion in Cylindrical Waveguides . . 1072--1084
                  B. Sisman and   
                   M. Zhang and   
                          H. Li   Group Sparse Representation With WaveNet
                                  Vocoder Adaptation for Spectrum and
                                  Prosody Conversion . . . . . . . . . . . 1085--1097
                     J. Lee and   
                        H. Kang   A Joint Learning Algorithm for
                                  Complex-Valued T--F Masks in Deep
                                  Learning-Based Single-Channel Speech
                                  Enhancement Systems  . . . . . . . . . . 1098--1108
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 7, July, 2019

                      Anonymous   Table of Contents  . . . . . . . . . . . C1--1109
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents[Edics] . . . . . . . . 1110--1111
                 J. Fleßner and   
                T. Biberger and   
                    S. D. Ewert   Subjective and Objective Assessment of
                                  Monaural and Binaural Aspects of Audio
                                  Quality  . . . . . . . . . . . . . . . . 1112--1125
                   B. Yusuf and   
                B. Gundogdu and   
                    M. Saraclar   Low Resource Keyword Search With
                                  Synthesized Crosslingual Exemplars . . . 1126--1135
         A. I. Koutrouvelis and   
             R. C. Hendriks and   
                R. Heusdens and   
                      J. Jensen   Robust Joint Estimation of
                                  Multimicrophone Signal Model Parameters  1136--1150
                  B. Cauchi and   
              K. Siedenburg and   
               J. F. Santos and   
                 T. H. Falk and   
                   S. Doclo and   
                      S. Goetze   Non-Intrusive Speech Quality Prediction
                                  Using Modulation Energies and
                                  LSTM-Network . . . . . . . . . . . . . . 1151--1163
                   Y. Zhang and   
                   P. Zhang and   
                         Y. Yan   Tailoring an Interpretable Neural
                                  Language Model . . . . . . . . . . . . . 1164--1178
                  A. Pandey and   
                        D. Wang   A New Framework for CNN-Based Speech
                                  Enhancement in the Time Domain . . . . . 1179--1188
               C. M. Vikram and   
                   N. Adiga and   
              S. R. M. Prasanna   Detection of Nasalized Voiced Stops in
                                  Cleft Palate Speech Using
                                  Epoch-Synchronous Features . . . . . . . 1189--1200
                     H. Luo and   
                      T. Li and   
                     B. Liu and   
                    B. Wang and   
                       H. Unger   Improving Aspect Term Extraction With
                                  Bidirectional Dependency Tree
                                  Representation . . . . . . . . . . . . . 1201--1212
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 8, August, 2019

                      Anonymous   Table of Contents  . . . . . . . . . . . C1--1213
                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
                      Anonymous   Table of Contents  . . . . . . . . . . . 1214--1215
                   T. Zhang and   
                          J. Wu   Constrained Learned Feature Extraction
                                  for Acoustic Scene Classification  . . . 1216--1228
               L. Gabrielli and   
              S. Tomassetti and   
               S. Squartini and   
                  C. Zinato and   
                     S. Guaiana   A Multi-Stage Algorithm for Acoustic
                                  Physical Model Parameters Estimation . . 1229--1240
                    B. Yang and   
                     H. Liu and   
                    C. Pang and   
                          X. Li   Multiple Sound Source Counting and
                                  Localization Based on TF-Wise Spatial
                                  Spectrum Clustering  . . . . . . . . . . 1241--1255
                     Y. Luo and   
                   N. Mesgarani   Conv-TasNet: Surpassing Ideal Time
                                  Frequency Magnitude Masking for Speech
                                  Separation . . . . . . . . . . . . . . . 1256--1266
               A. K. Sarkar and   
                     Z. Tan and   
                    H. Tang and   
                    S. Shon and   
                       J. Glass   Time-Contrastive Learning Based Deep
                                  Bottleneck Features for Text-Dependent
                                  Speaker Verification . . . . . . . . . . 1267--1279
                    J. Chua and   
                   W. B. Kleijn   A Low Latency Approach for Blind Source
                                  Separation . . . . . . . . . . . . . . . 1280--1294
                     C. Pan and   
                    J. Chen and   
                 J. Benesty and   
                         G. Shi   On the Design of Target Beampatterns for
                                  Differential Microphone Arrays . . . . . 1295--1307
                 A. M. Azmi and   
             M. N. Almutery and   
                H. A. Aboalsamh   Real-Word Errors in Arabic Texts: a
                                  Better Algorithm for Detection and
                                  Correction . . . . . . . . . . . . . . . 1308--1320
                M. Korpusik and   
                       J. Glass   Deep Learning for Database Mapping and
                                  Asking Clarification Questions in
                                  Dialogue Systems . . . . . . . . . . . . 1321--1334
                     J. Pak and   
                     J. W. Shin   Sound Localization Based on Phase
                                  Difference Enhancement Using Deep Neural
                                  Networks . . . . . . . . . . . . . . . . 1335--1345
                      Anonymous   IEEE Signal Processing Society . . . . . C3--C3

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 9, September, 2019

                     R. Ali and   
                G. Bernardi and   
         T. van Waterschoot and   
                      M. Moonen   Methods of Extending a Generalized
                                  Sidelobe Canceller With External
                                  Microphones  . . . . . . . . . . . . . . 1349--1364
                      X. Li and   
                   L. Girin and   
                  S. Gannot and   
                      R. Horaud   Multichannel Online Dereverberation
                                  Based on Spectral Magnitude Inverse
                                  Filtering  . . . . . . . . . . . . . . . 1365--1377
                    L. Chen and   
                    Z. Chen and   
                     B. Tan and   
                    S. Long and   
                   M. Gašić and   
                          K. Yu   AgentGraph: Toward Universal Dialogue
                                  Management With Structured Deep
                                  Reinforcement Learning . . . . . . . . . 1378--1391
                      L. Li and   
                    J. Wang and   
                      J. Li and   
                      Q. Ma and   
                         J. Wei   Relation Classification via
                                  Keyword-Attentive Sentence Mechanism and
                                  Synthetic Stimulation Loss . . . . . . . 1392--1404
               M. B. Møller and   
              J. K. Nielsen and   
        E. Fernandez-Grande and   
                   S. K. Olesen   On the Influence of Transfer Function
                                  Noise on Sound Zone Control in a Room    1405--1418
                      Z. Xu and   
                     C. Sun and   
                    Y. Long and   
                     B. Liu and   
                    B. Wang and   
                    M. Wang and   
                   M. Zhang and   
                        X. Wang   Dynamic Working Memory for Context-Aware
                                  Response Generation  . . . . . . . . . . 1419--1431
                 H. Kameoka and   
                  T. Kaneko and   
                  K. Tanaka and   
                        N. Hojo   ACVAE-VC: Non-Parallel Voice Conversion
                                  With Auxiliary Classifier Variational
                                  Autoencoder  . . . . . . . . . . . . . . 1432--1443
                    X. Chen and   
                     X. Liu and   
                    Y. Wang and   
                   A. Ragni and   
              J. H. M. Wong and   
                 M. J. F. Gales   Exploiting Future Word Contexts in
                                  Neural Network Language Models for
                                  Speech Recognition . . . . . . . . . . . 1444--1454
                    R. Wang and   
                    Z. Chen and   
                         F. Yin   DOA-Based Three-Dimensional Node
                                  Geometry Calibration in Acoustic Sensor
                                  Networks and Its Cramér--Rao Bound and
                                  Sensitivity Analysis . . . . . . . . . . 1455--1468
                     C. Lee and   
                     H. Lee and   
                      S. Wu and   
                     C. Liu and   
                    W. Fang and   
                     J. Hsu and   
                       B. Tseng   Machine Comprehension of Spoken Content:
                                  TOEFL Listening Test and Spoken SQuAD    1469--1480
                    Y. Chen and   
                   S. Huang and   
                     H. Lee and   
                    Y. Wang and   
                        C. Shen   Audio Word2vec: Sequence-to-Sequence
                                  Autoencoding for Unsupervised Learning
                                  of Audio Segmentation and Representation 1481--1493

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 10, October, 2019

                      P. Li and   
                    C. Chen and   
                   W. Zheng and   
                    Y. Deng and   
                      F. Ye and   
                       Z. Zheng   STD: an Automatic Evaluation Metric for
                                  Machine Translation Based on Word
                                  Embeddings . . . . . . . . . . . . . . . 1497--1506
                   J. Zhang and   
                R. Heusdens and   
                 R. C. Hendriks   Relative Acoustic Transfer Function
                                  Estimation in Wireless Acoustic Sensor
                                  Networks . . . . . . . . . . . . . . . . 1507--1519
                    J. Park and   
                       J. Chang   State-Space Microphone Array Nonlinear
                                  Acoustic Echo Cancellation Using
                                  Multi-Microphone Near-End Speech
                                  Covariance . . . . . . . . . . . . . . . 1520--1534
                     Z. Luo and   
                    J. Chen and   
               T. Takiguchi and   
                       Y. Ariki   Emotional Voice Conversion Using Dual
                                  Supervised Adversarial Networks With
                                  Continuous Wavelet Transform F0 Features 1535--1548
                   H. As'ad and   
                M. Bouchard and   
                H. Kamkar-Parsi   A Robust Target Linearly Constrained
                                  Minimum Variance Beamformer With Spatial
                                  Cues Preservation for Binaural Hearing
                                  Aids . . . . . . . . . . . . . . . . . . 1549--1563
                    Y. Wang and   
                     Y. Xia and   
                    L. Zhao and   
                    J. Bian and   
                     T. Qin and   
                    E. Chen and   
                         T. Liu   Semi-Supervised Neural Machine
                                  Translation via Marginal Distribution
                                  Estimation . . . . . . . . . . . . . . . 1564--1576
                    A. Jati and   
                    P. Georgiou   Neural Predictive Coding Using
                                  Convolutional Neural Networks Toward
                                  Unsupervised Learning of Speaker
                                  Characteristics  . . . . . . . . . . . . 1577--1589
                 F. Fontana and   
                       E. Bozzo   Newton--Raphson Solution of Nonlinear
                                  Delay-Free Loop Filter Networks  . . . . 1590--1600
               N. Makishima and   
                  S. Mogami and   
                N. Takamune and   
                D. Kitamura and   
                  H. Sumino and   
               S. Takamichi and   
              H. Saruwatari and   
                         N. Ono   Independent Deeply Learned Matrix
                                  Analysis for Determined Audio Source
                                  Separation . . . . . . . . . . . . . . . 1601--1615
              J. J. Prakash and   
                   H. A. Murthy   Analysis of Inter-Pausal Units in Indian
                                  Languages and Its Application to
                                  Text-to-Speech Synthesis . . . . . . . . 1616--1628
                     Y. Lan and   
                    S. Wang and   
                       J. Jiang   Knowledge Base Question Answering With a
                                  Matching-Aggregation Model and
                                  Question-Specific Contextual Relations   1629--1638
                     X. Bai and   
                     H. Cao and   
                    K. Chen and   
                        T. Zhao   A Bilingual Adversarial Autoencoder for
                                  Unsupervised Bilingual Lexicon Induction 1639--1648
                    G. Zhao and   
             R. Gutierrez-Osuna   Using Phonetic Posteriorgram Based Frame
                                  Pairing for Segmental Accent Conversion  1649--1660

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 11, November, 2019

                   Z. Zhang and   
                    H. Zhao and   
                    K. Ling and   
                      J. Li and   
                      Z. Li and   
                      S. He and   
                          G. Fu   Effective Subword Segmentation for Text
                                  Comprehension  . . . . . . . . . . . . . 1664--1674
                     Y. Xie and   
                   R. Liang and   
                   Z. Liang and   
                   C. Huang and   
                     C. Zou and   
                    B. Schuller   Speech Emotion Classification Using
                                  Attention-Based LSTM . . . . . . . . . . 1675--1685
                    S. Wang and   
                   Z. Huang and   
                    Y. Qian and   
                          K. Yu   Discriminative Neural Embedding Learning
                                  for Short-Duration Text-Independent
                                  Speaker Verification . . . . . . . . . . 1686--1696
                      R. Lu and   
                    Z. Duan and   
                       C. Zhang   Audio Visual Deep Clustering for Speech
                                  Separation . . . . . . . . . . . . . . . 1697--1712

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 27, Number 12, December, 2019

                    N. Ueno and   
                  S. Koyama and   
                  H. Saruwatari   Three-Dimensional Sound Field
                                  Reproduction Based on Weighted
                                  Mode-Matching Method . . . . . . . . . . 1852--1867
                      L. Wu and   
                     X. Tan and   
                     T. Qin and   
                     J. Lai and   
                         T. Liu   Beyond Error Propagation: Language
                                  Branching Also Affects the Accuracy of
                                  Sequence Generation  . . . . . . . . . . 1868--1879
                     A. Das and   
                      J. Li and   
                      G. Ye and   
                    R. Zhao and   
                        Y. Gong   Advancing Acoustic-to-Word CTC Model
                                  With Attention and Mixed-Units . . . . . 1880--1892
               N. Antonello and   
                 E. De Sena and   
                  M. Moonen and   
               P. A. Naylor and   
             T. van Waterschoot   Joint Acoustic Localization and
                                  Dereverberation Through Plane Wave
                                  Decomposition and Sparse Regularization  1893--1905
                   F. Borra and   
              A. Bernardini and   
               F. Antonacci and   
                       A. Sarti   Uniform Linear Arrays of First-Order
                                  Steerable Differential Microphones . . . 1906--1918
                    L. Chai and   
                      J. Du and   
                     Q. Liu and   
                         C. Lee   Using Generalized Gaussian Distributions
                                  to Improve Regression Error Modeling for
                                  Deep Learning-Based Speech Enhancement   1919--1931
                      J. Qi and   
                      J. Du and   
          S. M. Siniscalchi and   
                         C. Lee   A Theory on Deep Neural Network Based
                                  Vector-to-Vector Regression With an
                                  Illustration of Its Expressive Power in
                                  Speech Enhancement . . . . . . . . . . . 1932--1943
                    X. Dang and   
                   Q. Cheng and   
                         H. Zhu   Indoor Multiple Sound Source
                                  Localization via Multi-Dimensional
                                  Assignment Data Association  . . . . . . 1944--1956
               M. Schneider and   
                E. A. P. Habets   Iterative DFT-Domain Inverse Filter
                                  Optimization Using a Weighted
                                  Least-Squares Criterion  . . . . . . . . 1957--1969
                    K. Chen and   
                    R. Wang and   
                 M. Utiyama and   
                  E. Sumita and   
                        T. Zhao   Neural Machine Translation With
                                  Sentence-Level Topic Context . . . . . . 1970--1984
            A. Gomez-Alanis and   
              A. M. Peinado and   
             J. A. Gonzalez and   
                    A. M. Gomez   A Gated Recurrent Convolutional Neural
                                  Network for Robust Spoofing Detection    1985--1999
                    S. Feng and   
                         T. Lee   Exploiting Cross-Lingual Speaker and
                                  Phonetic Diversity for Unsupervised
                                  Subword Modeling . . . . . . . . . . . . 2000--2011
                      W. Li and   
                 N. F. Chen and   
          S. M. Siniscalchi and   
                         C. Lee   Improving Mispronunciation Detection of
                                  Mandarin Tones for Non-Native Learners
                                  With Soft-Target Tone Labels and
                                  BLSTM-Based Deep Tone Models . . . . . . 2012--2024
                      Q. Tu and   
                        H. Chen   On Mainlobe Orientation of the First-
                                  and Second-Order Differential Microphone
                                  Arrays . . . . . . . . . . . . . . . . . 2025--2040
               J. Chorowski and   
                R. J. Weiss and   
                  S. Bengio and   
                A. van den Oord   Unsupervised Speech Representation
                                  Learning Using WaveNet Autoencoders  . . 2041--2053
                V. Varanasi and   
                 A. Agarwal and   
                    R. M. Hegde   Near-Field Acoustic Source Localization
                                  Using Spherical Harmonic Features  . . . 2054--2066
                   Y. Zheng and   
                     J. Tao and   
                     Z. Wen and   
                          J. Yi   Forward Backward Decoding Sequence for
                                  Regularizing End-to-End TTS  . . . . . . 2067--2079
                      Y. Tu and   
                      J. Du and   
                         C. Lee   Speech Enhancement Based on Teacher
                                  Student Deep Learning Using Improved
                                  Speech Presence Probability for
                                  Noise-Robust Speech Recognition  . . . . 2080--2091
                     Y. Liu and   
                        D. Wang   Divide and Conquer: A Deep CASA Approach
                                  to Talker-Independent Monaural Speaker
                                  Separation . . . . . . . . . . . . . . . 2092--2102
                     X. Liu and   
                 D. F. Wong and   
                 L. S. Chao and   
                         Y. Liu   Latent Attribute Based Hierarchical
                                  Decoder for Neural Machine Translation   2103--2112
                      J. Hu and   
                        N. Chen   Enhanced Feature Summarizing for
                                  Effective Cover Song Identification  . . 2113--2126
                      Q. Ma and   
                      L. Yu and   
                    S. Tian and   
                    E. Chen and   
                    W. W. Y. Ng   Global-Local Mutual Attention Model for
                                  Text Classification  . . . . . . . . . . 2127--2139
                V. Välimäki and   
                        J. Rämö   Neurally Controlled Graphic Equalizer    2140--2149
              S. U. N. Wood and   
             J. K. W. Stahl and   
                     P. Mowlaee   Binaural Codebook-Based Speech
                                  Enhancement With Atomic Speech Presence
                                  Probability  . . . . . . . . . . . . . . 2150--2161
           L. Pfeifenberger and   
                  M. Zöhrer and   
                    F. Pernkopf   Eigenvector-Based Speech Mask Estimation
                                  for Multi-Channel Speech Enhancement . . 2162--2172
                  M. Arnela and   
            S. Dabbaghchian and   
                  O. Guasch and   
                     O. Engwall   MRI-Based Vocal Tract Representations
                                  for the Three-Dimensional Finite Element
                                  Synthesis of Diphthongs  . . . . . . . . 2173--2182
               K. Sekiguchi and   
                   Y. Bando and   
              A. A. Nugraha and   
                  K. Yoshii and   
                    T. Kawahara   Semi-Supervised Multichannel Speech
                                  Enhancement With a Deep Speech Prior . . 2197--2212
                     Q. Guo and   
                     X. Qiu and   
                     X. Xue and   
                       Z. Zhang   Low-Rank and Locality Constrained
                                  Self-Attention for Sequence Modeling . . 2213--2222
                      J. Yu and   
                    Q. Ling and   
                     C. Luo and   
                     C. W. Chen   Synthesizing $3$D Trump: Predicting and
                                  Visualizing the Relationship Between
                                  Text, Speech, and Articulatory Movements 2223--2233
                 R. Sugiura and   
                Y. Kamamoto and   
                      T. Moriya   Shape Control of Discrete Generalized
                                  Gaussian Distributions for
                                  Frequency-Domain Audio Coding  . . . . . 2234--2248
                 Z. Ben-Hur and   
                 D. L. Alon and   
                   R. Mehra and   
                     B. Rafaely   Efficient Representation and Sparse
                                  Sampling of Head-Related Transfer
                                  Functions Using Phase-Correction Based
                                  on Ear Alignment . . . . . . . . . . . . 2249--2262
                 L. Remaggi and   
           P. J. B. Jackson and   
                        W. Wang   Modeling the Comb Filter Effect and
                                  Interaural Coherence for Binaural Source
                                  Separation . . . . . . . . . . . . . . . 2263--2277
                   B. Zhang and   
                   D. Xiong and   
                      J. Su and   
                         J. Luo   Future-Aware Knowledge Distillation for
                                  Neural Machine Translation . . . . . . . 2278--2287
                     R. Ali and   
         T. Van Waterschoot and   
                      M. Moonen   Integration of a Priori and Estimated
                                  Constraints Into an MVDR Beamformer for
                                  Speech Enhancement . . . . . . . . . . . 2288--2300
                  N. Tiwari and   
                   P. C. Pandey   Speech Enhancement Using Noise
                                  Estimation With Dynamic Quantile
                                  Tracking . . . . . . . . . . . . . . . . 2301--2312
                    J. Duan and   
                    X. Ding and   
                   Y. Zhang and   
                         T. Liu   TEND: A Target-Dependent Representation
                                  Learning Framework for News Document . . 2313--2325
                    L. Zhao and   
                     X. Qiu and   
                   Q. Zhang and   
                       X. Huang   Sequence Labeling With Deep Gated Dual
                                  Path CNN . . . . . . . . . . . . . . . . 2326--2335
                    A. Kato and   
                 T. H. Kinnunen   Statistical Regression Models for Noise
                                  Robust F0 Estimation Using Recurrent
                                  Deep Neural Networks . . . . . . . . . . 2336--2349
                     D. Liu and   
                      J. Fu and   
                      Q. Qu and   
                          J. Lv   BFGAN: Backward and Forward Generative
                                  Adversarial Networks for Lexically
                                  Constrained Sentence Generation  . . . . 2350--2361
               A. Marafioti and   
               N. Perraudin and   
               N. Holighaus and   
                      P. Majdak   A Context Encoder For Audio Inpainting   2362--2372
                    J. Yang and   
                  R. K. Das and   
                        N. Zhou   Extraction of Octave Spectra Information
                                  for Spoofing Attack Detection  . . . . . 2373--2384

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 28, Number ??, January, 2020

                Jamal Amini and   
 Richard Christian Hendriks and   
           Richard Heusdens and   
                   Meng Guo and   
                  Jesper Jensen   Rate-Constrained Noise Reduction in
                                  Wireless Acoustic Sensor Networks  . . . 1--12
          Chitralekha Gupta and   
                 Haizhou Li and   
                        Ye Wang   Automatic Leaderboard: Evaluation of
                                  Singing Quality Without a Standard
                                  Reference  . . . . . . . . . . . . . . . 13--26
         Sefik Emre Eskimez and   
             Ross K. Maddox and   
               Chenliang Xu and   
                    Zhiyao Duan   Noise-Resilient Training Method for Face
                                  Landmark Generation From Speech  . . . . 27--38
               Peidong Wang and   
                     Ke Tan and   
                   DeLiang Wang   Bridging the Gap Between Monaural Speech
                                  Enhancement and Recognition With
                                  Distortion-Independent Acoustic Modeling 39--48
             Yuki Mitsufuji and   
              Stefan Uhlich and   
          Norihiro Takamune and   
            Daichi Kitamura and   
             Shoichi Koyama and   
             Hiroshi Saruwatari   Multichannel Non-Negative Matrix
                                  Factorization Using Banded Spatial
                                  Covariance Matrices in Wavenumber Domain 49--60
               Yaron Laufer and   
                  Sharon Gannot   Scoring-Based ML Estimation and CRBs for
                                  Reverberation, Speech, and Noise PSDs in
                                  a Spatially Homogeneous Noise Field  . . 61--76
      Naveen Kumar Desiraju and   
                Simon Doclo and   
                Markus Buck and   
                   Tobias Wolff   Online Estimation of Reverberation
                                  Parameters For Late Residual Echo
                                  Suppression  . . . . . . . . . . . . . . 77--91
            Mehdi Zohourian and   
                  Rainer Martin   Binaural Direct-to-Reverberant Energy
                                  Ratio and Speaker Distance Estimation    92--104
               Youhyun Shin and   
                   Sang-goo Lee   Learning Context Using Segment-Level
                                  LSTM for Neural Sequence Labeling  . . . 105--115
             Gongping Huang and   
              Jingdong Chen and   
                  Jacob Benesty   Design of Planar Differential Microphone
                                  Arrays With Fractional Orders  . . . . . 116--130
             Ming-Hsiang Su and   
             Chung-Hsien Wu and   
                  Liang-Yu Chen   Attention-Based Response Generation
                                  Using Parallel Double Q-Learning for
                                  Dialog Policy Decision in a
                                  Conversational System  . . . . . . . . . 131--143
                   Satoru Emura   Wave-Domain Residual Echo Reduction
                                  Using Subspace Tracking  . . . . . . . . 144--156
                   Xin Wang and   
              Shinji Takaki and   
          Junichi Yamagishi and   
                 Simon King and   
                 Keiichi Tokuda   A Vector Quantized Variational
                                  Autoencoder (VQ-VAE) Autoregressive
                                  Neural $ F_0 $ Model for Statistical
                                  Parametric Speech Synthesis  . . . . . . 157--170
       Falk-Martin Hoffmann and   
       Philip Arthur Nelson and   
             Filippo Maria Fazi   DOA Estimation Performance With Circular
                                  Arrays in Sound Fields With Finite Rate
                                  of Innovation  . . . . . . . . . . . . . 171--184
                Rongfeng Su and   
                Xunying Liu and   
                   Lan Wang and   
                  Jingzhou Yang   Cross-Domain Deep Visual Feature
                                  Generation for Mandarin Audio--Visual
                                  Speech Recognition . . . . . . . . . . . 185--197
          Titouan Parcollet and   
            Mohamed Morchid and   
                Xavier Bost and   
            Georges Linarès and   
                 Renato De Mori   Real to H-Space Autoencoders for Theme
                                  Identification in Telephone
                                  Conversations  . . . . . . . . . . . . . 198--210
           Antonio Canclini and   
            Fabio Antonacci and   
             Stefano Tubaro and   
                  Augusto Sarti   A Methodology for the Robust Estimation
                                  of the Radiation Pattern of Acoustic
                                  Sources  . . . . . . . . . . . . . . . . 211--224
                      Yi Yu and   
                 Hongsen He and   
                Badong Chen and   
                Jianghui Li and   
               Youwen Zhang and   
                          Lu Lu   $M$-Estimate Based Normalized Subband
                                  Adaptive Filter Algorithm: Performance
                                  Analysis and Improvements  . . . . . . . 225--239
              Hao-Xiang Wen and   
              Sen-Quan Yang and   
             Yuan-Quan Hong and   
                       Huan Luo   A Partial Update Adaptive Algorithm for
                                  Sparse System Identification . . . . . . 240--255
           Martin Bo Møller and   
                 Jan Østergaard   A Moving Horizon Framework for Sound
                                  Zones  . . . . . . . . . . . . . . . . . 256--265
Stylianos Ioannis Mimilakis and   
       Konstantinos Drossos and   
             Estefanía Cano and   
                Gerald Schuller   Examining the Mapping Functions of
                                  Denoising Autoencoders in Singing Voice
                                  Separation . . . . . . . . . . . . . . . 266--278
          Lachlan I. Birnie and   
     Thushara D. Abhayapala and   
       Prasanga N. Samarasinghe   Reflection Assisted Sound Source
                                  Localization Through a Harmonic Domain
                                  MUSIC Framework  . . . . . . . . . . . . 279--293
                Wenhao Ding and   
                       Liang He   Adaptive Multi-Scale Detection of
                                  Acoustic Events  . . . . . . . . . . . . 294--306
              Weijian Zhang and   
                      Peng Song   Transfer Sparse Discriminant Subspace
                                  Learning for Cross-Corpus Speech Emotion
                                  Recognition  . . . . . . . . . . . . . . 307--318
             Bidisha Sharma and   
                        Ye Wang   Automatic Evaluation of Song
                                  Intelligibility Using Singing Adapted
                                  STOI and Vocal-Specific Features . . . . 319--331
            Hai Morgenstern and   
                   Boaz Rafaely   Perceptually-Transparent Online
                                  Estimation of Two-Channel Room Transfer
                                  Function for Sound Calibration . . . . . 332--342
               Shaojin Ding and   
              Guanlong Zhao and   
     Christopher Liberatore and   
        Ricardo Gutierrez-Osuna   Learning Structured Sparse
                                  Representations for Voice Conversion . . 343--354
                Mireia Diez and   
               Lukáš Burget and   
           Federico Landini and   
                   Jan Černocký   Analysis of Speaker Diarization Based on
                                  Bayesian HMM With Eigenvoice Priors  . . 355--368
                Jia-Chen Gu and   
              Zhen-Hua Ling and   
                       Quan Liu   Utterance-to-Utterance Interactive
                                  Matching Network for Multi-Turn Response
                                  Selection in Retrieval-Based Chatbots    369--379
                     Ke Tan and   
                   DeLiang Wang   Learning Complex Spectral Mapping With
                                  Gated Convolutional Recurrent Networks
                                  for Monaural Speech Enhancement  . . . . 380--390
               Richeng Duan and   
           Tatsuya Kawahara and   
          Masatake Dantsuji and   
                  Hiroaki Nanjo   Cross-Lingual Transfer Learning of
                                  Non-Native Acoustic Modeling for
                                  Pronunciation Error Detection and
                                  Diagnosis  . . . . . . . . . . . . . . . 391--401
                   Xin Wang and   
              Shinji Takaki and   
              Junichi Yamagishi   Neural Source-Filter Waveform Models for
                                  Statistical Parametric Speech Synthesis  402--415
             Sanjeel Parekh and   
                 Slim Essid and   
              Alexey Ozerov and   
           Ngoc Q. K. Duong and   
              Patrick Pérez and   
                   Gaël Richard   Weakly Supervised Representation
                                  Learning for Audio-Visual Scene Analysis 416--428
                 Jianfei Yu and   
                 Jing Jiang and   
                        Rui Xia   Entity-Sensitive Attention and Fusion
                                  Network for Entity-Level Multimodal
                                  Sentiment Classification . . . . . . . . 429--439
           John G. Beerends and   
        Niels M. P. Neumann and   
      Egon L. van den Broek and   
  Anna Llagostera Casanovas and   
     Jovana Torres Menendez and   
        Christian Schmidmer and   
                    Jens Berger   Subjective and Objective Assessment of
                                  Full Bandwidth Speech Quality  . . . . . 440--449
           Vikram C. Mathad and   
        S. R. Mahadeva Prasanna   Vowel Onset Point Based Screening of
                                  Misarticulated Stops in Cleft Lip and
                                  Palate Speech  . . . . . . . . . . . . . 450--460
                Minh Nguyen and   
                 Gia H. Ngo and   
                  Nancy F. Chen   Hierarchical Character Embeddings:
                                  Learning Phonological and Semantic
                                  Representations in Languages of
                                  Logographic Origin Using Recursive
                                  Neural Networks  . . . . . . . . . . . . 461--473
            Dani Cherkassky and   
                  Sharon Gannot   Successive Relative Transfer Function
                                  Identification Using Blind Oblique
                                  Projection . . . . . . . . . . . . . . . 474--486
             Ivo Trowitzsch and   
       Christopher Schymura and   
           Dorothea Kolossa and   
                Klaus Obermayer   Joining Sound Event Detection and
                                  Localization Through Spatial Segregation 487--502
            Shinichi Mogami and   
          Norihiro Takamune and   
            Daichi Kitamura and   
         Hiroshi Saruwatari and   
               Yu Takahashi and   
             Kazunobu Kondo and   
                   Nobutaka Ono   Independent Low-Rank Matrix Analysis
                                  Based on Time-Variant Sub-Gaussian
                                  Source Model for Determined Blind Source
                                  Separation . . . . . . . . . . . . . . . 503--518
         Hamzeh Ghasemzadeh and   
             Meisam K. Arjmandi   Toward Optimum Quantification of
                                  Pathology-Induced Noises: an
                                  Investigation of Information Missed by
                                  Human Auditory System  . . . . . . . . . 519--528
                     Fei Ma and   
                  Wen Zhang and   
 Thushara Dheemantha Abhayapala   Active Control of Outgoing Broadband
                                  Noise Fields in Rooms  . . . . . . . . . 529--539
            Jing-Xuan Zhang and   
              Zhen-Hua Ling and   
                    Li-Rong Dai   Non-Parallel Sequence-to-Sequence Voice
                                  Conversion With Disentangled Linguistic
                                  and Speaker Representations  . . . . . . 540--552
                    Tao Dai and   
                     Li Zhu and   
               Yaxiong Wang and   
             Kathleen M. Carley   Attentive Stacked Denoising Autoencoder
                                  With Bi-LSTM for Personalized
                                  Context-Aware Citation Recommendation    553--568
             Yuta Nishimura and   
            Katsuhito Sudoh and   
              Graham Neubig and   
               Satoshi Nakamura   Multi-Source Neural Machine Translation
                                  With Missing Data  . . . . . . . . . . . 569--580
                   Jin Wang and   
              Liang-Chih Yu and   
              K. Robert Lai and   
                   Xuejie Zhang   Tree-Structured Regional CNN-LSTM Model
                                  for Dimensional Sentiment Analysis . . . 581--591
                  Abul Azad and   
                    Lamine Mili   Robust Speech Filter and Voice Encoder
                                  Parameter Estimation Using the
                                  Phase--Phase Correlator  . . . . . . . . 592--604
             Abdullah Fahim and   
   Prasanga N. Samarasinghe and   
         Thushara D. Abhayapala   Multi-Source DOA Estimation Through
                                  Pattern Recognition of the Modal
                                  Coherence of a Reverberant Soundfield    605--618
               Yaron Laufer and   
   Bracha Laufer-Goldshtein and   
                  Sharon Gannot   ML Estimation and CRBs for
                                  Reverberation, Speech, and Noise PSDs in
                                  Rank-Deficient Noise Field . . . . . . . 619--634
             Zhongqing Wang and   
               Qingying Sun and   
                Shoushan Li and   
               Qiaoming Zhu and   
                   Guodong Zhou   Neural Stance Detection With
                                  Hierarchical Linguistic Representations  635--645
                  Ruizhi Li and   
               Xiaofei Wang and   
         Sri Harish Mallidi and   
            Shinji Watanabe and   
               Takaaki Hori and   
                Hynek Hermansky   Multi-Stream End-to-End Speech
                                  Recognition  . . . . . . . . . . . . . . 646--655
                   Yu Maeno and   
             Yuki Mitsufuji and   
   Prasanga N. Samarasinghe and   
               Naoki Murata and   
         Thushara D. Abhayapala   Spherical-Harmonic-Domain Feedforward
                                  Active Noise Control Using Sparse
                                  Decomposition of Reference Signals from
                                  Distributed Sensor Arrays  . . . . . . . 656--670
                Qingyu Zhou and   
                   Nan Yang and   
                   Furu Wei and   
              Shaohan Huang and   
                  Ming Zhou and   
                    Tiejun Zhao   A Joint Sentence Scoring and Selection
                                  Framework for Neural Extractive Document
                                  Summarization  . . . . . . . . . . . . . 671--681
               Ivan Kukanov and   
            Trung Ngo Trong and   
            Ville Hautamäki and   
   Sabato Marco Siniscalchi and   
      Valerio Mario Salerno and   
                   Kong Aik Lee   Maximal Figure-of-Merit Framework to
                                  Detect Multi-Label Phonetic Features for
                                  Spoken Language Recognition  . . . . . . 682--695
             Shoichi Koyama and   
             Gilles Chardon and   
                 Laurent Daudet   Optimizing Source and Sensor Placement
                                  for Sound Field Control: an Overview . . 696--714
               Atsushi Ando and   
               Ryo Masumura and   
            Hosana Kamiyama and   
        Satoshi Kobashikawa and   
                 Yushi Aono and   
                    Tomoki Toda   Customer Satisfaction Estimation in
                                  Contact Center Calls Based on a
                                  Hierarchical Multi-Task Model  . . . . . 715--728
             Thomas Dietzen and   
                Simon Doclo and   
                Marc Moonen and   
           Toon van Waterschoot   Integrated Sidelobe Cancellation and
                                  Linear Prediction Kalman Filter for
                                  Joint Multi-Microphone Speech
                                  Dereverberation, Interfering Speech
                                  Cancellation, and Noise Reduction  . . . 740--754
             Thomas Dietzen and   
                Simon Doclo and   
                Marc Moonen and   
           Toon van Waterschoot   Square Root-Based Multi-Source Early PSD
                                  Estimation and Recursive RETF Update in
                                  Reverberant Environments by Means of the
                                  Orthogonal Procrustes Problem  . . . . . 755--769
                Liwen Zhang and   
                Ziqiang Shi and   
                     Jiqing Han   Pyramidal Temporal Pooling With
                                  Discriminative Mapping for Audio
                                  Classification . . . . . . . . . . . . . 770--784
              Mengfan Zhang and   
                Zhongshu Ge and   
                 Tiejun Liu and   
                  Xihong Wu and   
                     Tianshu Qu   Modeling of Individual HRTFs Based on
                                  Spatial Principal Component Analysis . . 785--797

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 29, Number ??, January, 2021

                  Bijue Jia and   
               Jiancheng Lv and   
                    Xi Peng and   
                   Yao Chen and   
                  Shenglan Yang   Hierarchical Regulated Iterative Network
                                  for Joint Task of Music Detection and
                                  Music Relative Loudness Estimation . . . 1--13
         Nauman Dawalatabad and   
          Srikanth Madikeri and   
          C. Chandra Sekhar and   
                 Hema A. Murthy   Novel Architectures for Unsupervised
                                  Information Bottleneck Based Speaker
                                  Diarization of Meetings  . . . . . . . . 14--27
              Midia Yousefi and   
              John H. L. Hansen   Block-Based High Performance CNN
                                  Architectures for Frame-Level
                                  Overlapping Speech Detection . . . . . . 28--40
              Jiaming Cheng and   
                Ruiyu Liang and   
              Zhenlin Liang and   
                    Li Zhao and   
             Chengwei Huang and   
                 Björn Schuller   A Deep Adaptation Network for Speech
                                  Enhancement: Combining a Relativistic
                                  Discriminator With Multi-Kernel Maximum
                                  Mean Discrepancy . . . . . . . . . . . . 41--53
               Franz Anders and   
          Mario Hlawitschka and   
                    Mirco Fuchs   Comparison of Artificial Neural Network
                                  Types for Infant Vocalization
                                  Classification . . . . . . . . . . . . . 54--67
          Tomohiko Nakamura and   
               Hirokazu Kameoka   Harmonic-Temporal Factor Decomposition
                                  for Unsupervised Monaural Separation of
                                  Harmonic Sounds  . . . . . . . . . . . . 68--82
                Jens Ahrens and   
                  Stefan Bilbao   Computation of Spherical Harmonic
                                  Representations of Source Directivity
                                  Based on the Finite-Distance Signature   83--92
             Shun-Po Chuang and   
           Alexander H. Liu and   
               Tzu-Wei Sung and   
                    Hung-yi Lee   Improving Automatic Speech Recognition
                                  and Speech Translation via Word
                                  Embedding Prediction . . . . . . . . . . 93--105
                    Li Chai and   
                     Jun Du and   
              Qing-Feng Liu and   
                   Chin-Hui Lee   A Cross-Entropy-Guided Measure (CEGM)
                                  for Assessing Speech Recognition
                                  Performance and Optimizing DNN-Based
                                  Speech Enhancement . . . . . . . . . . . 106--117
                      De Hu and   
                   Zhe Chen and   
                    Fuliang Yin   Passive Geometry Calibration for
                                  Microphone Arrays Based on Distributed
                                  Damped Newton Optimization . . . . . . . 118--131
              Berrak Sisman and   
          Junichi Yamagishi and   
                 Simon King and   
                     Haizhou Li   An Overview of Voice Conversion and Its
                                  Challenges: From Statistical Modeling to
                                  Deep Learning  . . . . . . . . . . . . . 132--157
                   Jilu Jin and   
             Gongping Huang and   
                Xuehan Wang and   
              Jingdong Chen and   
              Jacob Benesty and   
                   Israel Cohen   Steering Study of Linear Differential
                                  Microphone Arrays  . . . . . . . . . . . 158--170
              Ching-Hua Lee and   
             Bhaskar D. Rao and   
             Harinath Garudadri   Proportionate Adaptive Filtering
                                  Algorithms Derived Using an Iterative
                                  Reweighting Framework  . . . . . . . . . 171--186
              Shakeel Ahmed and   
            Muhammad Tufail and   
             Muhammad Rehan and   
              Tanveer Abbas and   
                     Amna Majid   A Novel Approach for Improved Noise
                                  Reduction Performance in Feed-Forward
                                  Active Noise Control Systems With
                                  (Loudspeaker) Saturation Non-Linearity
                                  in the Secondary Path  . . . . . . . . . 187--197
                Cunhang Fan and   
                Jiangyan Yi and   
                Jianhua Tao and   
              Zhengkun Tian and   
                    Bin Liu and   
                    Zhengqi Wen   Gated Recurrent Fusion With Joint
                                  Training Framework for Robust End-to-End
                                  Speech Recognition . . . . . . . . . . . 198--209
                Amin Edraki and   
               Wai-Yip Chan and   
              Jesper Jensen and   
                 Daniel Fogerty   Speech Intelligibility Prediction Using
                                  Spectro-Temporal Modulation Analysis . . 210--225
                    Phan Le Son   On the Design of Sparse Arrays With
                                  Frequency-Invariant Beam Pattern . . . . 226--238
              Dylan Menzies and   
             Philip Coleman and   
             Filippo Maria Fazi   A Room Compensation Method by
                                  Modification of Reverberant Audio
                                  Objects  . . . . . . . . . . . . . . . . 239--252
                Yonggang Hu and   
     Thushara D. Abhayapala and   
       Prasanga N. Samarasinghe   Multiple Source Direction of Arrival
                                  Estimations Using Relative Sound
                                  Pressure Based MUSIC . . . . . . . . . . 253--264
                   Alan Kan and   
                   Qinglin Meng   The Temporal Limits Encoder as a Sound
                                  Coding Strategy for Bilateral Cochlear
                                  Implants . . . . . . . . . . . . . . . . 265--273
                    Rui Liu and   
              Berrak Sisman and   
                Feilong Bao and   
                Jichen Yang and   
               Guanglai Gao and   
                     Haizhou Li   Exploiting Morphological and
                                  Phonological Features to Improve
                                  Prosodic Phrasing for Mongolian Speech
                                  Synthesis  . . . . . . . . . . . . . . . 274--285
                     Fei Ma and   
     Thushara D. Abhayapala and   
                      Wen Zhang   Multiple Circular Arrays of Vector
                                  Sensors for Real-Time Sound Field
                                  Analysis . . . . . . . . . . . . . . . . 286--299
          David Diaz-Guerra and   
             Antonio Miguel and   
                Jose R. Beltran   Robust Sound Source Tracking Using
                                  SRP-PHAT and $3$D Convolutional Neural
                                  Networks . . . . . . . . . . . . . . . . 300--311
             Viet Anh Trinh and   
                 Michael Mandel   Directly Comparing the Listening
                                  Strategies of Humans and Machines  . . . 312--323
                  Leda Sari and   
      Mark Hasegawa-Johnson and   
                  Samuel Thomas   Auxiliary Networks for Joint Speaker
                                  Adaptation and Speaker Change Detection  324--333
               Jielong Yang and   
              Xionghu Zhong and   
              Weiguang Chen and   
                     Wenwu Wang   Multiple Acoustic Source Localization in
                                  Microphone Array Networks  . . . . . . . 334--347
                     Bin Wu and   
             Sakriani Sakti and   
              Jinsong Zhang and   
               Satoshi Nakamura   Tackling Perception Bias in Unsupervised
                                  Phoneme Discovery Using DPGMM-RNN Hybrid
                                  Model and Functional Load  . . . . . . . 348--362
               Taewoong Lee and   
                 Liming Shi and   
        Jesper Kjær Nielsen and   
      Mads Græsbøll Christensen   Fast Generation of Sound Zones Using
                                  Variable Span Trade-Off Filters in the
                                  DFT-Domain . . . . . . . . . . . . . . . 363--378
                Maoshen Jia and   
                  Yuxuan Wu and   
              Changchun Bao and   
                 Christian Ritz   Multi-Source DOA Estimation in
                                  Reverberant Environments by Jointing
                                  Detection and Modeling of Time-Frequency
                                  Points . . . . . . . . . . . . . . . . . 379--392
                    Wei Xue and   
          Alastair H. Moore and   
               Mike Brookes and   
              Patrick A. Naylor   Speech Enhancement Based on
                                  Modulation-Domain Parametric
                                  Multichannel Kalman Filtering  . . . . . 393--405
                   Wei Song and   
                Jingjin Guo and   
                   Ruiji Fu and   
                   Ting Liu and   
                     Lizhen Liu   A Knowledge Graph Embedding Approach for
                                  Metaphor Processing  . . . . . . . . . . 406--420
             Longbiao Cheng and   
                Xingwei Sun and   
               Dingding Yao and   
                 Junfeng Li and   
                   Yonghong Yan   Estimation Reliability Function Assisted
                                  Sound Source Localization With Enhanced
                                  Steering Vector Phase Difference . . . . 421--435
                Wangyang Yu and   
             W. Bastiaan Kleijn   Room Acoustical Parameter Estimation
                                  From Room Impulse Responses Using Deep
                                  Neural Networks  . . . . . . . . . . . . 436--447
              Miguel Ferrer and   
             Maria de Diego and   
                Gema Piñero and   
               Alberto Gonzalez   Affine Projection Algorithm Over
                                  Acoustic Sensor Networks for Active
                                  Noise Control  . . . . . . . . . . . . . 448--461
               Nico Gößling and   
           Daniel Marquardt and   
                    Simon Doclo   Performance Analysis of the Extended
                                  Binaural MVDR Beamformer With Partial
                                  Noise Estimation . . . . . . . . . . . . 462--476
            Gábor Gosztolya and   
             Róbert Busa-Fekete   Ensemble Bag-of-Audio-Words
                                  Representation Improves Paralinguistic
                                  Classification Accuracy  . . . . . . . . 477--488
             Alfred Mertins and   
                Marco Maass and   
               Fabrice Katzberg   Room Impulse Response Reshaping and
                                  Crosstalk Cancellation Using Convex
                                  Optimization . . . . . . . . . . . . . . 489--502
                Xuefeng Bai and   
                 Pengbo Liu and   
                      Yue Zhang   Investigating Typed Syntactic
                                  Dependencies for Targeted Sentiment
                                  Classification Using Graph Attention
                                  Neural Network . . . . . . . . . . . . . 503--514
         Bengt J. Borgström and   
          Michael S. Brandstein   Speech Enhancement via Attention Masking
                                  Network (SEAMNET): an End-to-End System
                                  for Joint Suppression of Noise and
                                  Reverberation  . . . . . . . . . . . . . 515--526
           Juan M. Miramont and   
       Marcelo A. Colominas and   
            Gastón Schlotthauer   Voice Jitter Estimation Using High-Order
                                  Synchrosqueezing Operators . . . . . . . 527--536
               Peidong Wang and   
                  Zhuo Chen and   
               DeLiang Wang and   
                   Jinyu Li and   
                     Yifan Gong   Speaker Separation Using Speaker
                                  Inventories and Estimated Speech . . . . 537--546
                  Sandro Cumani   On the Distribution of Speaker
                                  Verification Scores: Generative Models
                                  for Unsupervised Calibration . . . . . . 547--562
               Yu-Ren Chien and   
                   Jón Guðnason   Acoustic Measure of Vocal Strain Based
                                  on Glottal Airflow Periodicity . . . . . 563--574
                Xingfa Shen and   
               Xingkun Shao and   
                  Quanbo Ge and   
                       Lili Liu   RARS: Recognition of Audio Recording
                                  Source Based on Residual Neural Network  575--584
                  Gang Chen and   
                   Yang Liu and   
                Huanbo Luan and   
                 Meng Zhang and   
                    Qun Liu and   
                    Maosong Sun   Learning to Generate Explainable Plots
                                  for Neural Story Generation  . . . . . . 585--593
               Wenxing Yang and   
              Jacob Benesty and   
             Gongping Huang and   
                  Jingdong Chen   A New Class of Differential Beamformers  594--606
             Yuki Mitsufuji and   
          Norihiro Takamune and   
             Shoichi Koyama and   
             Hiroshi Saruwatari   Multichannel Blind Source Separation
                                  Based on Evanescent-Region-Aware
                                  Non-Negative Tensor Factorization in
                                  Spherical Harmonic Domain  . . . . . . . 607--617
              Dörte Fischer and   
                    Simon Doclo   Robust Constrained MFMVDR Filters for
                                  Single-Channel Speech Enhancement Based
                                  on Spherical Uncertainty Set . . . . . . 618--631
                Xudong Zhao and   
              Jacob Benesty and   
              Jingdong Chen and   
                 Gongping Huang   Differential Beamforming From the
                                  Beampattern Factorization Perspective    632--643
                Yuki Kawara and   
                Chenhui Chu and   
                     Yuki Arase   Preordering Encoding on Transformer for
                                  Translation  . . . . . . . . . . . . . . 644--655
                      Anonymous   Table of Contents  . . . . . . . . . . . c1--ix
                      Anonymous   IEEE Signal Processing Society . . . . . c2--c2
                      Anonymous   Table of Contents  . . . . . . . . . . . x--xx
           Hirokazu Kameoka and   
             Wen-Chin Huang and   
                 Kou Tanaka and   
            Takuhiro Kaneko and   
             Nobukatsu Hojo and   
                    Tomoki Toda   Many-to-Many Voice Transformer Network   656--670
                  Jie Zhang and   
                Huawei Chen and   
                Li-Rong Dai and   
     Richard Christian Hendriks   A Study on Reference Microphone
                                  Selection for Multi-Microphone Speech
                                  Enhancement  . . . . . . . . . . . . . . 671--683
          Archontis Politis and   
          Annamaria Mesaros and   
           Sharath Adavanne and   
              Toni Heittola and   
                Tuomas Virtanen   Overview and Evaluation of Sound Event
                                  Localization and Detection in DCASE 2019 684--698
            Markus Niermann and   
                     Peter Vary   Listening Enhancement in Noisy
                                  Environments: Solutions in Time and
                                  Frequency Domain . . . . . . . . . . . . 699--709
             Hyeonseung Lee and   
              Woo Hyun Kang and   
             Sung Jun Cheon and   
               Hyeongju Kim and   
                    Nam Soo Kim   Gated Recurrent Context: Softmax-Free
                                  Attention for Online Encoder-Decoder
                                  Speech Recognition . . . . . . . . . . . 710--719
           Elizabeth Vargas and   
           James R. Hopgood and   
                Keith Brown and   
                    Kartic Subr   On Improved Training of CNN for Acoustic
                                  Source Localisation  . . . . . . . . . . 720--732
                  Yunqi Cai and   
                 Lantian Li and   
                Andrew Abel and   
                Xiaoyan Zhu and   
                      Dong Wang   Deep Normalization for Speaker Vectors   733--744
             Wen-Chin Huang and   
             Tomoki Hayashi and   
                Yi-Chiao Wu and   
           Hirokazu Kameoka and   
                    Tomoki Toda   Pretraining Techniques for
                                  Sequence-to-Sequence Voice Conversion    745--755
               Arindam Jati and   
          Amrutha Nadarajan and   
             Raghuveer Peri and   
             Karel Mundnich and   
              Tiantian Feng and   
           Benjamin Girault and   
            Shrikanth Narayanan   Temporal Dynamics of Workplace Acoustic
                                  Scenes: Egocentric Analysis and
                                  Prediction . . . . . . . . . . . . . . . 756--769
               Chaoqun Duan and   
                 Kehai Chen and   
                   Rui Wang and   
              Masao Utiyama and   
            Eiichiro Sumita and   
                Conghui Zhu and   
                    Tiejun Zhao   Modeling Future Cost for Neural Machine
                                  Translation  . . . . . . . . . . . . . . 770--781
               Kashif Munir and   
                   Hai Zhao and   
                      Zuchao Li   Adaptive Convolution for Semantic Role
                                  Labeling . . . . . . . . . . . . . . . . 782--791
                Yi-Chiao Wu and   
             Tomoki Hayashi and   
             Takuma Okamoto and   
              Hisashi Kawai and   
                    Tomoki Toda   Quasi-Periodic Parallel WaveGAN: a
                                  Non-Autoregressive Raw Waveform
                                  Generative Model With Pitch-Dependent
                                  Dilated Convolution Neural Network . . . 792--806
                Weitao Yuan and   
                 Bofei Dong and   
              Shengbei Wang and   
              Masashi Unoki and   
                     Wenwu Wang   Evolving Multi-Resolution Pooling CNN
                                  for Monaural Singing Voice Separation    807--822
                 Liming Shi and   
               Taewoong Lee and   
                Lijun Zhang and   
        Jesper Kjær Nielsen and   
      Mads Græsbøll Christensen   Generation of Personal Sound Zones With
                                  Physical Meaningful Constraints and
                                  Conjugate Gradient Method  . . . . . . . 823--837
                    Xi Chen and   
              Jacob Benesty and   
             Gongping Huang and   
                  Jingdong Chen   On the Robustness of the Superdirective
                                  Beamformer . . . . . . . . . . . . . . . 838--849
              Xinsheng Wang and   
              Tingting Qiao and   
                  Jihua Zhu and   
              Alan Hanjalic and   
             Odette Scharenborg   Generating Images From Spoken
                                  Descriptions . . . . . . . . . . . . . . 850--865
           Vevake Balaraman and   
               Bernardo Magnini   Domain-Aware Dialogue State Tracker for
                                  Multi-Domain Dialogue Systems  . . . . . 866--873
                   Xixin Wu and   
                 Yuewen Cao and   
                     Hui Lu and   
              Songxiang Liu and   
                Shiyin Kang and   
                 Zhiyong Wu and   
                Xunying Liu and   
                     Helen Meng   Exemplar-Based Emotive Speech Synthesis  874--886
            Heinrich Dinkel and   
                 Mengyue Wu and   
                         Kai Yu   Towards Duration Robust Weakly
                                  Supervised Sound Event Detection . . . . 887--900
              Zamir Ben-Hur and   
             David Lou Alon and   
               Ravish Mehra and   
                   Boaz Rafaely   Binaural Reproduction Based on Bilateral
                                  Ambisonics and Ear-Aligned HRTFs . . . . 901--913
          Philipp Aichinger and   
                 Franz Pernkopf   Synthesis and Analysis-By-Synthesis of
                                  Modulated Diplophonic Glottal Area
                                  Waveforms  . . . . . . . . . . . . . . . 914--926
              Finnian Kelly and   
              John H. L. Hansen   Analysis and Calibration of Lombard
                                  Effect and Whisper for Speaker
                                  Recognition  . . . . . . . . . . . . . . 927--942
            Matthias Müller and   
               Thilo Schulz and   
           Tatiana Ermakova and   
             Philipp P. Caffier   Lyric or Dramatic --- Vibrato Analysis
                                  for Voice Type Classification in
                                  Professional Opera Singers . . . . . . . 943--955
    Demóstenes Z. Rodríguez and   
              Dick Carrillo and   
          Miguel A. Ramírez and   
       Pedro H. J. Nardelli and   
               Sebastian Möller   Incorporating Wireless Communication
                                  Parameters Into the E-Model Algorithm    956--968
               Tianrui Zong and   
                 Yong Xiang and   
     Iynkaran Natgunanathan and   
              Longxiang Gao and   
                  Guang Hua and   
                    Wanlei Zhou   Non-Linear-Echo Based Anti-Collusion
                                  Mechanism for Audio Signals  . . . . . . 969--984
                 Zheng Lian and   
                    Bin Liu and   
                    Jianhua Tao   CTNet: Conversational Transformer
                                  Network for Emotion Recognition  . . . . 985--1000
             Jiacheng Zhang and   
                Huanbo Luan and   
                Maosong Sun and   
                Feifei Zhai and   
                Jingfang Xu and   
                       Yang Liu   Neural Machine Translation With Explicit
                                  Phrase Alignment . . . . . . . . . . . . 1001--1010
              Maria Vukovic and   
             Melissa Stolar and   
                  Margaret Lech   Cognitive Load Estimation From Speech
                                  Commands to Simulated Aircraft . . . . . 1011--1022
                      De Hu and   
                   Zhe Chen and   
                    Fuliang Yin   Geometry Calibration for Acoustic
                                  Transceiver Networks Based on Network
                                  Newton Distributed Optimization  . . . . 1023--1032
                 Yuki Saito and   
       Shinnosuke Takamichi and   
             Hiroshi Saruwatari   Perceptual-Similarity-Aware Deep Speaker
                                  Representation Learning for
                                  Multi-Speaker Generative Modeling  . . . 1033--1048
             Tadashi Sakata and   
             Naomitsu Ikeda and   
                Yuichi Ueda and   
                 Akira Watanabe   Vocal Tract Length Estimation Using
                                  Accumulated Means of Formants and Its
                                  Effects on Speaker-Normalization . . . . 1049--1064
                Jichen Yang and   
                Hongji Wang and   
            Rohan Kumar Das and   
                    Yanmin Qian   Modified Magnitude-Phase Spectrum
                                  Information for Spoofing Detection . . . 1065--1078
                Yanmin Qian and   
             Zhengyang Chen and   
                     Shuai Wang   Audio-Visual Deep Neural Network for
                                  Robust Person Verification . . . . . . . 1079--1092
                 Peiqin Lin and   
                  Meng Yang and   
                  Jianhuang Lai   Deep Selective Memory Network With
                                  Selective Attention and Inter-Aspect
                                  Modeling for Aspect Level Sentiment
                                  Classification . . . . . . . . . . . . . 1093--1106
              Herman Kamper and   
          Yevgen Matusevych and   
               Sharon Goldwater   Improved Acoustic Word Embeddings for
                                  Zero-Resource Languages Using
                                  Multilingual Transfer  . . . . . . . . . 1107--1118
               Weiqing Wang and   
                    Jin Pan and   
                     Hua Yi and   
               Zhanmei Song and   
                        Ming Li   Audio-Based Piano Performance Evaluation
                                  for Beginners With Convolutional Neural
                                  Network and Attention Mechanism  . . . . 1119--1133
                Yi-Chiao Wu and   
             Tomoki Hayashi and   
      Patrick Lumban Tobing and   
         Kazuhiro Kobayashi and   
                    Tomoki Toda   Quasi-Periodic WaveNet: an
                                  Autoregressive Raw Waveform Generative
                                  Model With Pitch-Dependent Dilated
                                  Convolution Neural Network . . . . . . . 1134--1148
              Vesa Välimäki and   
                Karolina Prawda   Late-Reverberation Synthesis Using
                                  Interleaved Velvet-Noise Sequences . . . 1149--1160
            Zhuosheng Zhang and   
                 Junlong Li and   
                       Hai Zhao   Multi-Turn Dialogue Reading
                                  Comprehension With Pivot Turns and
                                  Knowledge  . . . . . . . . . . . . . . . 1161--1173
           Clément Gaultier and   
                Srđan Kitić and   
             Rémi Gribonval and   
                   Nancy Bertin   Sparsity-Based Audio Declipping Methods:
                                  Selected Overview, New Algorithms, and
                                  Large-Scale Evaluation . . . . . . . . . 1174--1187
             Lachlan Birnie and   
        Thushara Abhayapala and   
         Vladimir Tourbabin and   
          Prasanga Samarasinghe   Mixed Source Sound Field Translation for
                                  Virtual Binaural Application With
                                  Perceptual Validation  . . . . . . . . . 1188--1203
             Monisankha Pal and   
                Manoj Kumar and   
             Raghuveer Peri and   
               Tae Jin Park and   
                So Hyun Kim and   
             Catherine Lord and   
               Somer Bishop and   
            Shrikanth Narayanan   Meta-Learning With Latent Space
                                  Clustering in Generative Adversarial
                                  Network for Speaker Diarization  . . . . 1204--1219
                  Jie Zhang and   
                     Jun Du and   
                    Li-Rong Dai   Sensor Selection for Relative Acoustic
                                  Transfer Function Steered
                                  Linearly-Constrained Beamformers . . . . 1220--1232
                  Huang Xie and   
                Tuomas Virtanen   Zero-Shot Audio Classification Via
                                  Semantic Embeddings  . . . . . . . . . . 1233--1242
              Xianhong Chen and   
                  Changchun Bao   Phoneme-Unit-Specific Time-Delay Neural
                                  Network for Speaker Verification . . . . 1243--1255
               Dongyuan Shi and   
              Woon-Seng Gan and   
                   Bhan Lam and   
                 Shulin Wen and   
                    Xiaoyi Shen   Optimal Output-Constrained Active Noise
                                  Control Based on Inverse Adaptive
                                  Modeling Leak Factor Estimate  . . . . . 1256--1269
            Ashutosh Pandey and   
                   DeLiang Wang   Dense CNN With Self-Attention for
                                  Time-Domain Speech Enhancement . . . . . 1270--1279
                   Libo Qin and   
               Wanxiang Che and   
                 Minheng Ni and   
                Yangming Li and   
                       Ting Liu   Knowing Where to Leverage: Context-Aware
                                  Graph Convolutional Network With an
                                  Adaptive Fusion Layer for Contextual
                                  Spoken Language Understanding  . . . . . 1280--1289
             Mingyang Zhang and   
                    Yi Zhou and   
                    Li Zhao and   
                     Haizhou Li   Transfer Learning From Speech Synthesis
                                  to Voice Conversion With Non-Parallel
                                  Training Data  . . . . . . . . . . . . . 1290--1302
                 Weipeng He and   
              Petr Motlicek and   
               Jean-Marc Odobez   Neural Network Adaptation and Data
                                  Augmentation for Multi-Speaker
                                  Direction-of-Arrival Estimation  . . . . 1303--1317
                  Yile Wang and   
                 Leyang Cui and   
                      Yue Zhang   Improving Skip-Gram Embeddings Using
                                  BERT . . . . . . . . . . . . . . . . . . 1318--1328
                  Linzhi Wu and   
                  Meishan Zhang   Deep Graph-Based Character-Level Chinese
                                  Dependency Parsing . . . . . . . . . . . 1329--1339
                     Ye Bai and   
                Jiangyan Yi and   
                Jianhua Tao and   
                Zhengqi Wen and   
              Zhengkun Tian and   
                    Shuai Zhang   Integrating Knowledge Into End-to-End
                                  Speech Recognition From External
                                  Text-Only Data . . . . . . . . . . . . . 1340--1351
             Byung Joon Cho and   
                 Hyung-Min Park   Convolutional Maximum-Likelihood
                                  Distortionless Response Beamforming With
                                  Steering Vector Estimation for Robust
                                  Speech Recognition . . . . . . . . . . . 1352--1367
         Daniel Michelsanti and   
              Zheng-Hua Tan and   
            Shi-Xiong Zhang and   
                    Yong Xu and   
                    Meng Yu and   
                    Dong Yu and   
                  Jesper Jensen   An Overview of Deep-Learning-Based
                                  Audio-Visual Speech Enhancement and
                                  Separation . . . . . . . . . . . . . . . 1368--1396
                 Gal Itzhak and   
              Jacob Benesty and   
                   Israel Cohen   On the Design of Differential Kronecker
                                  Product Beamformers  . . . . . . . . . . 1397--1410
                Zhongshu Ge and   
                   Liang Li and   
                     Tianshu Qu   Partially Matching Projection Decoding
                                  Method Evaluation Under Different
                                  Playback Conditions  . . . . . . . . . . 1411--1423
                  Sijie Mai and   
              Songlong Xing and   
                     Haifeng Hu   Analyzing Multimodal Sentiment Via
                                  Acoustic- and Visual-LSTM With
                                  Channel-Aware Temporal Convolution
                                  Network  . . . . . . . . . . . . . . . . 1424--1437
                   Tao Qian and   
              Meishan Zhang and   
                 Yinxia Lou and   
                     Daiwen Hua   A Joint Model for Named Entity
                                  Recognition With Sentence-Level Entity
                                  Type Attentions  . . . . . . . . . . . . 1438--1448
               Ryotaro Sato and   
                 Kenta Niwa and   
             Kazunori Kobayashi   Ambisonic Signal Processing DNNs
                                  Guaranteeing Rotation, Scale and Time
                                  Translation Equivariance . . . . . . . . 1449--1462
               Sooyeon Park and   
                  Jung-Woo Choi   Iterative Echo Labeling Algorithm With
                                  Convex Hull Expansion for Room Geometry
                                  Estimation . . . . . . . . . . . . . . . 1463--1478
           Aidan O. T. Hogg and   
            Christine Evers and   
          Alastair H. Moore and   
              Patrick A. Naylor   Overlapping Speaker Segmentation Using
                                  Multiple Hypothesis Tracking of
                                  Fundamental Frequency  . . . . . . . . . 1479--1490
               Rajib Sharma and   
               Israel Cohen and   
                 Baruch Berdugo   Controlling Elevation and Azimuth
                                  Beamwidths With Concentric Circular
                                  Microphone Arrays  . . . . . . . . . . . 1491--1502
                Run-Ze Wang and   
              Zhen-Hua Ling and   
               Jing-Bo Zhou and   
                          Yu Hu   A Multiple-Integration Encoder for
                                  Multi-Turn Text-to-SQL Semantic Parsing  1503--1513
                Shoukang Hu and   
                 Xurong Xie and   
               Shansong Liu and   
                 Jianwei Yu and   
                      Zi Ye and   
               Mengzhe Geng and   
                Xunying Liu and   
                     Helen Meng   Bayesian Learning of LF-MMI Trained Time
                                  Delay Neural Networks for Speech
                                  Recognition  . . . . . . . . . . . . . . 1514--1529
             Matteo Torcoli and   
           Thorsten Kastner and   
                   Jürgen Herre   Objective Measures of Perceptual Audio
                                  Quality Reviewed: an Evaluation of Their
                                  Application Domain Dependence  . . . . . 1530--1541
            Heinrich Dinkel and   
                 Shuai Wang and   
                  Xuenan Xu and   
                 Mengyue Wu and   
                         Kai Yu   Voice Activity Detection in the Wild: a
                                  Data-Driven Approach Using
                                  Teacher-Student Training . . . . . . . . 1542--1555
                 Songbin Li and   
               Jingang Wang and   
                   Peng Liu and   
                   Miao Wei and   
                   Qiandong Yan   Detection of Multiple Steganography
                                  Methods in Compressed Speech Based on
                                  Code Element Embedding, Bi-LSTM and CNN
                                  With Attention Mechanisms  . . . . . . . 1556--1569
                  Qianli Ma and   
               Jiangyue Yan and   
                 Zhenxi Lin and   
                 Liuhong Yu and   
                    Zipeng Chen   Deformable Self-Attention for Text
                                  Classification . . . . . . . . . . . . . 1570--1581
               Ya-Jie Zhang and   
                  Zhen-Hua Ling   Extracting and Predicting Word-Level
                                  Style Variations for Speech Synthesis    1582--1593
        Alexander Bohlender and   
                 Ann Spriet and   
               Wouter Tirry and   
                   Nilesh Madhu   Exploiting Temporal Context in CNN Based
                                  Multisource DOA Estimation . . . . . . . 1594--1608
               Kohei Yatabe and   
                Daichi Kitamura   Determined BSS Based on Time-Frequency
                                  Masking and Its Application to Harmonic
                                  Vector Analysis  . . . . . . . . . . . . 1609--1625
                Ji Won Yoon and   
             Hyeonseung Lee and   
             Hyung Yong Kim and   
                 Won Ik Cho and   
                    Nam Soo Kim   TutorNet: Towards Flexible Knowledge
                                  Distillation for End-to-End Speech
                                  Recognition  . . . . . . . . . . . . . . 1626--1638
               Prachi Singh and   
               Sriram Ganapathy   Self-Supervised Representation Learning
                                  With Path Integral Clustering for
                                  Speaker Diarization  . . . . . . . . . . 1639--1649
                Penghui Wei and   
                Jiahao Zhao and   
                      Wenji Mao   A Graph-to-Sequence Learning Framework
                                  for Summarizing Opinionated Texts  . . . 1650--1660
             Dovid Y. Levin and   
    Shmulik Markovich-Golan and   
                  Sharon Gannot   Near-Field Superdirectivity: an
                                  Analytical Perspective . . . . . . . . . 1661--1674
                Jia-Hao Hsu and   
             Ming-Hsiang Su and   
             Chung-Hsien Wu and   
                  Yi-Hsuan Chen   Speech Emotion Recognition Considering
                                  Nonverbal Vocalization in Affective
                                  Conversations  . . . . . . . . . . . . . 1675--1686
          Tomohiko Nakamura and   
             Shihori Kozuka and   
             Hiroshi Saruwatari   Time-Domain Audio Source Separation With
                                  Neural Networks Based on Multiresolution
                                  Analysis . . . . . . . . . . . . . . . . 1687--1701
                  Yun Zhang and   
                Yongguo Liu and   
                Jiajing Zhu and   
                     Xindong Wu   FSPRM: a Feature Subsequence Based
                                  Probability Representation Model for
                                  Chinese Word Embedding . . . . . . . . . 1702--1716
              Songxiang Liu and   
                 Yuewen Cao and   
                Disong Wang and   
                   Xixin Wu and   
                Xunying Liu and   
                     Helen Meng   Any-to-Many Voice Conversion With
                                  Location-Relative Sequence-to-Sequence
                                  Modeling . . . . . . . . . . . . . . . . 1717--1728
            Rafael A. Chiea and   
            Márcio H. Costa and   
              Júlio A. Cordioli   An Optimal Envelope-Based Noise
                                  Reduction Method for Cochlear Implants:
                                  an Upper Bound Performance Investigation 1729--1739
               Junliang Guo and   
               Zhirui Zhang and   
                   Linli Xu and   
                Boxing Chen and   
                    Enhong Chen   Adaptive Adapters: an Efficient Way to
                                  Incorporate BERT Into Neural Machine
                                  Translation  . . . . . . . . . . . . . . 1740--1751
                     Yi Luo and   
                   Cong Han and   
                 Nima Mesgarani   Group Communication With Context Codec
                                  for Lightweight Source Separation  . . . 1752--1761
                 Zhiwen Xie and   
                 Runjie Zhu and   
                    Jin Liu and   
              Guangyou Zhou and   
            Jimmy Xiangji Huang   Hierarchical Neighbor Propagation With
                                  Bidirectional Graph Attention Network
                                  for Relation Prediction  . . . . . . . . 1762--1773
                Xuehan Wang and   
              Jacob Benesty and   
              Jingdong Chen and   
             Gongping Huang and   
                   Israel Cohen   Beamforming With Cube Microphone Arrays
                                  Via Kronecker Product Decompositions . . 1774--1784
                     Ke Tan and   
                   DeLiang Wang   Towards Model Compression for Deep
                                  Learning Based Speech Enhancement  . . . 1785--1794
             Kristina Tesch and   
                  Timo Gerkmann   Nonlinear Spatial Filtering in
                                  Multichannel Speech Enhancement  . . . . 1795--1805
                    Rui Liu and   
              Berrak Sisman and   
               Guanglai Gao and   
                     Haizhou Li   Expressive TTS Training With Frame and
                                  Style Reconstruction Loss  . . . . . . . 1806--1818
               Jipeng Qiang and   
                   Xinyu Lu and   
                     Yun Li and   
                Yunhao Yuan and   
                     Xindong Wu   Chinese Lexical Simplification . . . . . 1819--1828
                  Andong Li and   
                 Wenzhe Liu and   
             Chengshi Zheng and   
                Cunhang Fan and   
                    Xiaodong Li   Two Heads are Better Than One: a
                                  Two-Stage Complex Spectral Mapping
                                  Approach for Monaural Speech Enhancement 1829--1843
             Eric C. Hamdan and   
             Filippo Maria Fazi   Weighted Orthogonal Vector Rejection
                                  Method for Loudspeaker-Based Binaural
                                  Audio Reproduction . . . . . . . . . . . 1844--1852
                     Ke Tan and   
             Xueliang Zhang and   
                   DeLiang Wang   Deep Learning Based Real-Time Speech
                                  Enhancement for Dual-Microphone Mobile
                                  Phones . . . . . . . . . . . . . . . . . 1853--1863
            Kunkun SongGong and   
                Huawei Chen and   
                     Wenwu Wang   Indoor Multi-Speaker Localization Based
                                  on Bayesian Nonparametrics in the
                                  Circular Harmonic Domain . . . . . . . . 1864--1880
            Aleksej Chinaev and   
              Philipp Thüne and   
                  Gerald Enzner   Double-Cross-Correlation Processing for
                                  Blind Sampling-Rate and Time-Offset
                                  Estimation . . . . . . . . . . . . . . . 1881--1896
                     Ye Bai and   
                Jiangyan Yi and   
                Jianhua Tao and   
              Zhengkun Tian and   
                Zhengqi Wen and   
                    Shuai Zhang   Fast End-to-End Speech Recognition Via
                                  Non-Autoregressive Models and
                                  Cross-Modal Knowledge Transferring From
                                  BERT . . . . . . . . . . . . . . . . . . 1897--1911
            Öykü Deniz Köse and   
                 Murat Saraçlar   Multimodal Representations for
                                  Synchronized Speech and Real-Time MRI
                                  Video Processing . . . . . . . . . . . . 1912--1924
             N. P. Narendra and   
             Björn Schuller and   
                     Paavo Alku   The Detection of Parkinson's Disease
                                  From Speech Using Voice Source
                                  Information  . . . . . . . . . . . . . . 1925--1936
                Robert Rehr and   
                  Timo Gerkmann   SNR-Based Features and Diverse Training
                                  Data for Robust DNN-Based Speech
                                  Enhancement  . . . . . . . . . . . . . . 1937--1949
               Nobutaka Ito and   
           Rintaro Ikeshita and   
             Hiroshi Sawada and   
              Tomohiro Nakatani   A Joint Diagonalization Based Efficient
                                  Approach to Underdetermined Blind Audio
                                  Source Separation Using the Multichannel
                                  Wiener Filter  . . . . . . . . . . . . . 1950--1965
                    Hao Fei and   
              Shengqiong Wu and   
                 Yafeng Ren and   
                    Donghong Ji   Second-Order Semantic Role Labeling With
                                  Global Structural Refinement . . . . . . 1966--1976
         Humberto M. Torres and   
            Mercedes Güemes and   
         Jorge A. Gurlekian and   
                  Diego A. Evin   F0 Perturbation Due to Articulatory
                                  Movements: Filtering, Characterization
                                  and Applications . . . . . . . . . . . . 1977--1986
             Khaled Koutini and   
         Hamid Eghbal-zadeh and   
                 Gerhard Widmer   Receptive Field Regularization
                                  Techniques for Audio Classification and
                                  Tagging With Deep Convolutional Neural
                                  Networks . . . . . . . . . . . . . . . . 1987--2000
             Zhong-Qiu Wang and   
               Peidong Wang and   
                   DeLiang Wang   Multi-microphone Complex Spectral
                                  Mapping for Utterance-wise and
                                  Continuous Speech Separation . . . . . . 2001--2014
               Mengjia Zhou and   
                Donghong Ji and   
                         Fei Li   Relation Extraction in Dialogues: a Deep
                                  Learning Model Based on the Generality
                                  and Specialty of Dialogue Text . . . . . 2015--2026
                Minh Nguyen and   
                 Gia H. Ngo and   
                  Nancy F. Chen   Domain-Shift Conditioning Using
                                  Adaptable Filtering Via Hierarchical
                                  Embeddings for Robust Chinese Spell
                                  Check  . . . . . . . . . . . . . . . . . 2027--2036
               Lior Madmoni and   
                 Shir Tibor and   
              Israel Nelken and   
                   Boaz Rafaely   The Effect of Partial Time-Frequency
                                  Masking of the Direct Sound on the
                                  Perception of Reverberant Speech . . . . 2037--2047
                Haibin Chen and   
                  Qianli Ma and   
                 Liuhong Yu and   
                 Zhenxi Lin and   
                   Jiangyue Yan   Corpus-Aware Graph Aggregation Network
                                  for Sequence Labeling  . . . . . . . . . 2048--2057
                Heming Wang and   
                   DeLiang Wang   Towards Robust Speech Super-Resolution   2058--2066
                 Jianwei Yu and   
            Shi-Xiong Zhang and   
                      Bo Wu and   
               Shansong Liu and   
                Shoukang Hu and   
               Mengzhe Geng and   
                Xunying Liu and   
                 Helen Meng and   
                        Dong Yu   Audio-Visual Multi-Channel Integration
                                  and Recognition of Overlapped Speech . . 2067--2082
           Olga Slizovskaia and   
                Gloria Haro and   
                   Emilia Gómez   Conditioned Source Separation for
                                  Musical Instrument Performances  . . . . 2083--2095
                 Xurong Xie and   
                Xunying Liu and   
                    Tan Lee and   
                       Lan Wang   Bayesian Learning for Deep Neural
                                  Network Adaptation . . . . . . . . . . . 2096--2110
Sankha Subhra Bhattacharjee and   
               Nithin V. George   Nearest Kronecker Product Decomposition
                                  Based Linear-in-The-Parameters Nonlinear
                                  Filters  . . . . . . . . . . . . . . . . 2111--2122
                Canguang Li and   
                Guohua Wang and   
                    Jin Cao and   
                         Yi Cai   A Multi-Agent Communication Based Model
                                  for Nested Named Entity Recognition  . . 2123--2136
                  Jonah Ong and   
                Ba Tuong Vo and   
                  Sven Nordholm   Blind Separation for Multiple Moving
                                  Sources With Labeled Random Finite Sets  2137--2151
                  Yixuan Su and   
                   Yan Wang and   
                   Deng Cai and   
                Simon Baker and   
              Anna Korhonen and   
                  Nigel Collier   PROTOTYPE-TO-STYLE: Dialogue Generation
                                  With Style-Aware Editing on Retrieval
                                  Memory . . . . . . . . . . . . . . . . . 2152--2161
         Alberto Bernardini and   
               Enrico Bozzo and   
           Federico Fontana and   
                  Augusto Sarti   A Wave Digital Newton--Raphson Method
                                  for Virtual Analog Modeling of Audio
                                  Circuits with Multiple One-Port
                                  Nonlinearities . . . . . . . . . . . . . 2162--2173
                   Gang Guo and   
                      Yi Yu and   
       Rodrigo C. de Lamare and   
            Zongsheng Zheng and   
                      Lu Lu and   
                  Qiangming Cai   Proximal Normalized Subband Adaptive
                                  Filtering for Acoustic Echo Cancellation 2174--2188
                 Juho Liski and   
              Aki Mäkivirta and   
                  Vesa Välimäki   Audibility of Group-Delay Equalization   2189--2201
        Farjana Sultana Mim and   
                Naoya Inoue and   
               Paul Reisert and   
               Hiroki Ouchi and   
                   Kentaro Inui   Corruption Is Not All Bad: Incorporating
                                  Discourse Structure Into Pre-Training
                                  via Corruption for Essay Scoring . . . . 2202--2215
                Dror Kipnis and   
                   Roee Diamant   Graph-Based Clustering of Dolphin
                                  Whistles . . . . . . . . . . . . . . . . 2216--2227
               Yuanyuan Liu and   
             Nelly Penttilä and   
            Tiina Ihalainen and   
             Juulia Lintula and   
              Rachel Convey and   
                   Okko Räsänen   Language-Independent Approach for
                                  Automatic Computation of Vowel
                                  Articulation Features in Dysarthric
                                  Speech Assessment  . . . . . . . . . . . 2228--2243
                  C. Medina and   
                  R. Coelho and   
                         L. Zão   Impulsive Noise Detection for Speech
                                  Enhancement in HHT Domain  . . . . . . . 2244--2253
          Iván López-Espejo and   
              Zheng-Hua Tan and   
                  Jesper Jensen   A Novel Loss Function and Training
                                  Strategy for Noise-Robust Keyword
                                  Spotting . . . . . . . . . . . . . . . . 2254--2266
               Shansong Liu and   
               Mengzhe Geng and   
                Shoukang Hu and   
                 Xurong Xie and   
                 Mingyu Cui and   
                 Jianwei Yu and   
                Xunying Liu and   
                     Helen Meng   Recent Progress in the CUHK Dysarthric
                                  Speech Recognition System  . . . . . . . 2267--2281
                  Juan Zhao and   
               Tianrui Zong and   
                 Yong Xiang and   
              Longxiang Gao and   
                Wanlei Zhou and   
                  Gleb Beliakov   Desynchronization Attacks Resilient
                                  Watermarking Method Based on Frequency
                                  Singular Value Coefficient Modification  2282--2295
         Mert Burkay Çöteli and   
          Hüseyin Hacıhabiboğlu   Sparse Representations With Legendre
                                  Kernels for DOA Estimation and Acoustic
                                  Source Separation  . . . . . . . . . . . 2296--2309
             Nicolas Furnon and   
             Romain Serizel and   
                 Slim Essid and   
                   Irina Illina   DNN-Based Mask Estimation for
                                  Distributed Speech Enhancement in
                                  Spatially Unconstrained Microphone
                                  Arrays . . . . . . . . . . . . . . . . . 2310--2323
            Or Haim Anidjar and   
             Itshak Lapidot and   
                 Chen Hajaj and   
                  Amit Dvir and   
                 Issachar Gilad   Hybrid Speech and Text Analysis Methods
                                  for Speaker Change Detection . . . . . . 2324--2338
                 Chuang Fan and   
                Chaofa Yuan and   
                    Lin Gui and   
                  Yue Zhang and   
                     Ruifeng Xu   Multi-Task Sequence Tagging for
                                  Emotion-Cause Pair Extraction Via Tag
                                  Distribution Refinement  . . . . . . . . 2339--2350
                Andy T. Liu and   
               Shang-Wen Li and   
                    Hung-yi Lee   TERA: Self-Supervised Learning of
                                  Transformer Encoder Representation for
                                  Speech . . . . . . . . . . . . . . . . . 2351--2366
              Guanlong Zhao and   
               Shaojin Ding and   
        Ricardo Gutierrez-Osuna   Converting Foreign Accent Speech Without
                                  a Reference  . . . . . . . . . . . . . . 2367--2381
     Kilian Schulze-Forster and   
        Clement S. J. Doire and   
               Gaël Richard and   
                  Roland Badeau   Phoneme Level Lyrics Alignment and
                                  Text-Informed Singing Voice Separation   2382--2395
              Shengqiong Wu and   
                    Hao Fei and   
                 Yafeng Ren and   
                    Bobo Li and   
                     Fei Li and   
                    Donghong Ji   High-Order Pair-Wise Aspect and Opinion
                                  Terms Extraction With Edge-Enhanced
                                  Syntactic Graph Convolution  . . . . . . 2396--2406
                  Jingyi Wu and   
                  Lin Shang and   
                   Xiaoying Gao   Sentiment Time Series Calibration for
                                  Event Detection  . . . . . . . . . . . . 2407--2420
               Kashif Munir and   
                   Hai Zhao and   
                      Zuchao Li   Learning Context-Aware Convolutional
                                  Filters for Implicit Discourse Relation
                                  Classification . . . . . . . . . . . . . 2421--2433
               Seokhwan Kim and   
              Hannes Schulz and   
         Chulaka Gunasekara and   
                Chiori Hori and   
            Abhinav Rastogi and   
           Luis Fernando D'Haro   Editorial: Special Issue on the Eighth
                                  Dialog System Technology Challenge . . . 2434--2436
              Byoungjae Kim and   
                Jungyun Seo and   
                 Myoung-Wan Koo   Randomly Wired Network Based on RoBERTa
                                  and Dialog History Attention for
                                  Response Selection . . . . . . . . . . . 2437--2442
                Jia-Chen Gu and   
                  Tianda Li and   
              Zhen-Hua Ling and   
                   Quan Liu and   
                 Zhiming Su and   
               Yu-Ping Ruan and   
                    Xiaodan Zhu   Deep Contextualized Utterance
                                  Representations for Response Selection
                                  and Dialogue Analysis  . . . . . . . . . 2443--2455
                Yun-Wei Chu and   
               Kuan-Yen Lin and   
              Chao-Chun Hsu and   
                     Lun-Wei Ku   End-to-End Recurrent Cross-Modality
                                  Attention for Video Dialogue . . . . . . 2456--2464
                     Kun Xu and   
                     Han Wu and   
               Linfeng Song and   
              Haisong Zhang and   
                 Linqi Song and   
                        Dong Yu   Conversational Semantic Role Labeling    2465--2475
                  Zekang Li and   
                 Zongjia Li and   
              Jinchao Zhang and   
                  Yang Feng and   
                       Jie Zhou   Bridging Text and Video: a Universal
                                  Multimodal Transformer for Audio-Visual
                                  Scene-Aware Dialog . . . . . . . . . . . 2476--2483
            Igor Shalyminov and   
         Alessandro Sordoni and   
              Adam Atkinson and   
                  Hannes Schulz   GRTr: Generative-Retrieval Transformers
                                  for Data-Efficient Dialogue Domain
                                  Adaptation . . . . . . . . . . . . . . . 2484--2492
                 Jiali Zeng and   
               Yongjing Yin and   
                   Yang Liu and   
                   Yubin Ge and   
                     Jinsong Su   Domain Adaptive Meta-Learning for
                                  Dialogue State Tracking  . . . . . . . . 2493--2501
                 Chen Zhang and   
                Grandee Lee and   
       Luis Fernando D'Haro and   
                     Haizhou Li   D-Score: Holistic Dialogue Evaluation
                                  Without Reference  . . . . . . . . . . . 2502--2516
           Shrikant Malviya and   
               Rohit Mishra and   
      Santosh Kumar Barnwal and   
             Uma Shanker Tiwary   HDRS: Hindi Dialogue Restaurant Search
                                  Corpus for Dialogue State Tracking in
                                  Task-Oriented Environment  . . . . . . . 2517--2528
               Seokhwan Kim and   
              Michel Galley and   
         Chulaka Gunasekara and   
                Sungjin Lee and   
              Adam Atkinson and   
                Baolin Peng and   
              Hannes Schulz and   
               Jianfeng Gao and   
                 Jinchao Li and   
              Mahmoud Adada and   
               Minlie Huang and   
               Luis Lastras and   
     Jonathan K. Kummerfeld and   
          Walter S. Lasecki and   
                Chiori Hori and   
              Anoop Cherian and   
               Tim K. Marks and   
            Abhinav Rastogi and   
               Xiaoxue Zang and   
           Srinivas Sunkara and   
                   Raghav Gupta   Overview of the Eighth Dialog System
                                  Technology Challenge: DSTC8  . . . . . . 2529--2540
             Myeongho Jeong and   
             Seungtaek Choi and   
               Jinyoung Yeo and   
                Seung-won Hwang   Label and Context Augmentation for
                                  Response Selection at DSTC8  . . . . . . 2541--2550
                   Qing Liu and   
                   Lei Chen and   
                  Yuan Yuan and   
                      Huarui Wu   History Reuse and Bag-of-Words Loss for
                                  Long Summary Generation  . . . . . . . . 2551--2560
                   Lu Zhang and   
             Mingjiang Wang and   
               Qiquan Zhang and   
              Xinsheng Wang and   
                       Ming Liu   PhaseDCN: a Phase-Enhanced Dual-Path
                                  Dilated Convolutional Network for
                                  Single-Channel Speech Enhancement  . . . 2561--2574
          Kazi Nazmul Haque and   
                 Rajib Rana and   
                 Jiajun Liu and   
          John H. L. Hansen and   
           Nicholas Cummins and   
               Carlos Busso and   
              Björn W. Schuller   Guided Generative Adversarial Neural
                                  Network for Representation Learning and
                                  Audio Generation Using Fewer Labelled
                                  Audio Data . . . . . . . . . . . . . . . 2575--2590
             Toru Nakashika and   
                   Kohei Yatabe   Gamma Boltzmann Machine for Audio
                                  Modeling . . . . . . . . . . . . . . . . 2591--2605
                 Xintong Li and   
                  Lemao Liu and   
                Zhaopeng Tu and   
                 Guanlin Li and   
                Shuming Shi and   
                 Max Q.-H. Meng   Attending From Foresight: a Novel
                                  Attention Mechanism for Neural Machine
                                  Translation  . . . . . . . . . . . . . . 2606--2616
              Hengshun Zhou and   
                     Jun Du and   
             Yuanyuan Zhang and   
                  Qing Wang and   
              Qing-Feng Liu and   
                   Chin-Hui Lee   Information Fusion in Attention Networks
                                  Using Adaptive and Multi-Level
                                  Factorized Bilinear Pooling for
                                  Audio-Visual Emotion Recognition . . . . 2617--2629
                  Yuling Li and   
                     Kui Yu and   
                   Yuhong Zhang   Learning Cross-Lingual Mappings in
                                  Imperfectly Isomorphic Embedding Spaces  2630--2642
                  Xiao Zhou and   
              Zhen-Hua Ling and   
                    Li-Rong Dai   UnitNet: a Sequence-to-Sequence Acoustic
                                  Model for Concatenative Speech Synthesis 2643--2655
                  Zihan Pan and   
                 Malu Zhang and   
                   Jibin Wu and   
               Jiadong Wang and   
                     Haizhou Li   Multi-Tone Phase Coding of Interaural
                                  Time Difference for Sound Source
                                  Localization With Spiking Neural
                                  Networks . . . . . . . . . . . . . . . . 2656--2670
               Ken O'Hanlon and   
                Mark B. Sandler   FifthNet: Structured Compact Neural
                                  Networks for Automatic Chord Recognition 2671--2682
             Simone Spagnol and   
           Riccardo Miccini and   
      Marius George Onofrei and   
          Runar Unnthorsson and   
               Stefania Serafin   Estimation of Spectral Notches From
                                  Pinna Meshes: Insights From a Simple
                                  Computational Model  . . . . . . . . . . 2683--2695
                Chenglin Xu and   
                    Wei Rao and   
                   Jibin Wu and   
                     Haizhou Li   Target Speaker Verification With
                                  Selective Auditory Attention for Single
                                  and Multi-Talker Speech  . . . . . . . . 2696--2709
                Adel Zahedi and   
   Michael Syskind Pedersen and   
             Jan Østergaard and   
 Thomas Ulrich Christiansen and   
              Lars Bramsløw and   
                  Jesper Jensen   Minimum Processing Beamforming . . . . . 2710--2724
              Xianghui Wang and   
                   Jie Chen and   
                Xiaoyi Chen and   
                   Jing Guo and   
                     Qian Xiang   Multichannel Iterative Noise Reduction
                                  Filters in the
                                  Short-Time-Fourier-Transform Domain
                                  Based on Kronecker Product Decomposition 2725--2740
                 Kai-Li Yin and   
                  Yi-Fei Pu and   
                          Lu Lu   Robust Q-Gradient Subband Adaptive
                                  Filter for Nonlinear Active Noise
                                  Control  . . . . . . . . . . . . . . . . 2741--2752
                 Jaeuk Byun and   
                  Jong Won Shin   Monaural Speech Separation Using Speaker
                                  Embedding From Preliminary Separation    2753--2763
                Xudong Zhao and   
             Gongping Huang and   
              Jingdong Chen and   
                  Jacob Benesty   On the Design of 3D Steerable
                                  Beamformers With Uniform Concentric
                                  Circular Microphone Arrays . . . . . . . 2764--2778
               Zifeng Cheng and   
               Zhiwei Jiang and   
                 Yafeng Yin and   
                      Na Li and   
                        Qing Gu   A Unified Target-Oriented
                                  Sequence-to-Sequence Model for
                                  Emotion-Cause Pair Extraction  . . . . . 2779--2791
                Hamid Azadi and   
   Mohammad-R. Akbarzadeh-T and   
           Hamid-R. Kobravi and   
                    Ali Shoeibi   Robust Voice Feature Selection Using
                                  Interval Type-2 Fuzzy AHP for Automated
                                  Diagnosis of Parkinson's Disease . . . . 2792--2802
                Yukiya Hono and   
              Kei Hashimoto and   
             Keiichiro Oura and   
          Yoshihiko Nankaku and   
                 Keiichi Tokuda   Sinsy: a Deep Neural Network-Based
                                  Singing Voice Synthesis System . . . . . 2803--2815
                  Jian Tang and   
                  Jie Zhang and   
                   Yan Song and   
             Ian McLoughlin and   
                    Li-Rong Dai   Multi-Granularity Sequence Alignment
                                  Mapping for Encoder-Decoder Based
                                  End-to-End ASR . . . . . . . . . . . . . 2816--2828
             Chongman Leong and   
                  Xuebo Liu and   
              Derek F. Wong and   
                  Lidia S. Chao   Exploiting Translation Model for
                                  Parallel Corpus Mining . . . . . . . . . 2829--2839
             Neil Zeghidour and   
                 David Grangier   Wavesplit: End-to-End Speech Separation
                                  by Speaker Clustering  . . . . . . . . . 2840--2849
                 Dino Oglic and   
            Zoran Cvetkovic and   
                  Peter Sollich   Learning Waveform-Based Acoustic Models
                                  Using Deep Variational Convolutional
                                  Neural Networks  . . . . . . . . . . . . 2850--2863
            Alexandru Nelus and   
                  Rainer Martin   Privacy-Preserving Audio Classification
                                  Using Variational Information Feature
                                  Extraction . . . . . . . . . . . . . . . 2864--2877
                     Hao Li and   
               DeLiang Wang and   
             Xueliang Zhang and   
                   Guanglai Gao   Recurrent Neural Networks and Acoustic
                                  Features for Frame-Level Signal-to-Noise
                                  Ratio Estimation . . . . . . . . . . . . 2878--2887
                    Yi Zhou and   
             Xiaoqing Zheng and   
                 Xuanjing Huang   Generating Responses With a Given
                                  Syntactic Pattern in Chinese Dialogues   2888--2898
          Viktor Gunnarsson and   
                 Mikael Sternad   Binaural Auralization of Microphone
                                  Array Room Impulse Responses Using
                                  Causal Wiener Filtering  . . . . . . . . 2899--2914
               Zuolong Chen and   
                Huawei Chen and   
                   Quansheng Tu   Sensor Imperfection Tolerance Analysis
                                  of Robust Linear Differential Microphone
                                  Arrays . . . . . . . . . . . . . . . . . 2915--2929
                 Yusheng Su and   
                     Xu Han and   
                 Yankai Lin and   
             Zhengyan Zhang and   
                Zhiyuan Liu and   
                    Peng Li and   
                   Jie Zhou and   
                    Maosong Sun   CSS-LM: a Contrastive Framework for
                                  Semi-Supervised Fine-Tuning of
                                  Pre-Trained Language Models  . . . . . . 2930--2941
           Tobias Kabzinski and   
                      Peter Jax   A Causality-Constrained Frequency-Domain
                                  Least-Squares Filter Design Method for
                                  Crosstalk Cancellation . . . . . . . . . 2942--2956
               Frank Zalkow and   
                 Meinard Müller   CTC-Based Learning of Chroma Features
                                  for Score Audio Music Retrieval  . . . . 2957--2971
              Teck Kai Chan and   
               Cheng Siong Chin   Multi-Branch Convolutional Macaron Net
                                  for Sound Event Detection  . . . . . . . 2972--2985
          Tedd Kourkounakis and   
         Amirhossein Hajavi and   
                     Ali Etemad   FluentNet: End-to-End Detection of
                                  Stuttered Speech Disfluencies With Deep
                                  Learning . . . . . . . . . . . . . . . . 2986--2999
                   Haoyu Li and   
              Junichi Yamagishi   Multi-Metric Optimization Using
                                  Generative Adversarial Networks for
                                  Near-End Speech Intelligibility
                                  Enhancement  . . . . . . . . . . . . . . 3000--3011
                  Zehao Lin and   
                 Shaobo Cui and   
                  Guodun Li and   
              Xiaoming Kang and   
                    Feng Ji and   
                 Fenglin Li and   
             Zhongzhou Zhao and   
               Haiqing Chen and   
                      Yin Zhang   Predict-Then-Decide: a Predictive
                                  Approach for Wait or Answer Task in
                                  Dialogue Systems . . . . . . . . . . . . 3012--3024
                Metin Calis and   
          Steven van de Par and   
           Richard Heusdens and   
     Richard Christian Hendriks   Localization Based on Enhanced Low
                                  Frequency Interaural Level Difference    3025--3039
         Christopher Liberatore   Native-Nonnative Voice Conversion by
                                  Residual Warping in a Sparse,
                                  Anchor-Based Representation  . . . . . . 3040--3051
             Shoichi Koyama and   
          Jesper Brunnström and   
                 Hayato Ito and   
               Natsuki Ueno and   
             Hiroshi Saruwatari   Spatial Active Noise Control Based on
                                  Kernel Interpolation of Sound Field  . . 3052--3063
               Jipeng Qiang and   
                     Yun Li and   
                     Yi Zhu and   
                Yunhao Yuan and   
                   Yang Shi and   
                     Xindong Wu   LSBert: Lexical Simplification Based on
                                  BERT . . . . . . . . . . . . . . . . . . 3064--3076
               Ningyu Zhang and   
                 Hongbin Ye and   
                Shumin Deng and   
                Chuanqi Tan and   
                 Mosha Chen and   
             Songfang Huang and   
                  Fei Huang and   
                    Huajun Chen   Contrastive Information Extraction With
                                  Generative Transformer . . . . . . . . . 3077--3088
                Jianyu Wang and   
             Shanzheng Guan and   
                 Shupei Liu and   
                 Xiao-Lei Zhang   Minimum-Volume Multichannel Nonnegative
                                  Matrix Factorization for Blind Audio
                                  Source Separation  . . . . . . . . . . . 3089--3103
             Alberto Carini and   
            Stefania Cecchi and   
         Alessandro Terenzi and   
                 Simone Orcioni   A Room Impulse Response Measurement
                                  Method Robust Towards Nonlinearities
                                  Based on Orthogonal Periodic Sequences   3104--3117
                  Jie Zhang and   
                   Changheng Li   Quantization-Aware Binaural MWF Based
                                  Noise Reduction Incorporating External
                                  Wireless Devices . . . . . . . . . . . . 3118--3131
                   Biru Zhu and   
              Xingyao Zhang and   
                    Ming Gu and   
                  Yangdong Deng   Knowledge Enhanced Fact Checking and
                                  Verification . . . . . . . . . . . . . . 3132--3143
            Mark A. Poletti and   
                   Paul D. Teal   A Superfast Toeplitz Matrix Inversion
                                  Method for Single- and Multi-Channel
                                  Inverse Filters and Its Application to
                                  Room Equalization  . . . . . . . . . . . 3144--3157
                 Guanlin Li and   
                  Lemao Liu and   
                Conghui Zhu and   
                   Rui Wang and   
                Tiejun Zhao and   
                    Shuming Shi   Detecting Source Contextual Barriers for
                                  Understanding Neural Machine Translation 3158--3169
              Chia-Chih Kuo and   
               Kuan-Yu Chen and   
                  Shang-Bao Luo   Audio-Aware Spoken Multiple-Choice
                                  Question Answering With Pre-Trained
                                  Language Models  . . . . . . . . . . . . 3170--3179
                    Rui Liu and   
                  Zheng Lin and   
                   Weiping Wang   Addressing Extraction and Generation
                                  Separately: Keyphrase Prediction With
                                  Pre-Trained Language Models  . . . . . . 3180--3191
                Jiangnan Li and   
              Hongliang Pan and   
                  Zheng Lin and   
                    Peng Fu and   
                   Weiping Wang   Sarcasm Detection with Commonsense
                                  Knowledge  . . . . . . . . . . . . . . . 3192--3201
                Runyan Yang and   
              Gaofeng Cheng and   
                Haoran Miao and   
                      Ta Li and   
             Pengyuan Zhang and   
                   Yonghong Yan   Keyword Search Using Attention-Based
                                  End-to-End ASR and Frame-Synchronous
                                  Phoneme Alignments . . . . . . . . . . . 3202--3215
             Tareq Alkhaldi and   
                Chenhui Chu and   
                Sadao Kurohashi   Flexibly Focusing on Supporting Facts,
                                  Using Bridge Links, and Jointly Training
                                  Specialized Modules for Multi-Hop
                                  Question Answering . . . . . . . . . . . 3216--3225
                   Wenyi Wu and   
                 Yegui Xiao and   
                Jianhui Lin and   
                  Liying Ma and   
            Khashayar Khorasani   An Efficient Filter Bank Structure for
                                  Adaptive Notch Filtering and
                                  Applications . . . . . . . . . . . . . . 3226--3241
              Xinsheng Wang and   
        Justin van der Hout and   
                  Jihua Zhu and   
      Mark Hasegawa-Johnson and   
             Odette Scharenborg   Synthesizing Spoken Descriptions of
                                  Images . . . . . . . . . . . . . . . . . 3242--3254
             Vincent W. Neo and   
            Christine Evers and   
              Patrick A. Naylor   Enhancement of Noisy Reverberant Speech
                                  Using Polynomial Matrix Eigenvalue
                                  Decomposition  . . . . . . . . . . . . . 3255--3266
       Riccardo Giampiccolo and   
     Mauro Giuseppe de Bari and   
         Alberto Bernardini and   
                  Augusto Sarti   Wave Digital Modeling and Implementation
                                  of Nonlinear Audio Circuits With Nullors 3267--3279
                   Xixin Wu and   
                 Yuewen Cao and   
                     Hui Lu and   
              Songxiang Liu and   
                Disong Wang and   
                 Zhiyong Wu and   
                Xunying Liu and   
                     Helen Meng   Speech Emotion Recognition Using
                                  Sequential Capsule Networks  . . . . . . 3280--3291
                  Yuan Gong and   
                Yu-An Chung and   
                    James Glass   PSLA: Improving Audio Tagging With
                                  Pretraining, Sampling, Labeling, and
                                  Aggregation  . . . . . . . . . . . . . . 3292--3306
              Licheng Zhang and   
               Zhendong Mao and   
                 Benfeng Xu and   
                  Quan Wang and   
                 Yongdong Zhang   Review and Arrange: Curriculum Learning
                                  for Natural Language Understanding . . . 3307--3320
                     Fei He and   
                    Ling He and   
                 Jing Zhang and   
                Yuanyuan Li and   
                       Xi Xiong   Automatic Detection of Affective
                                  Flattening in Schizophrenia: Acoustic
                                  Correlates to Sound Waves and Auditory
                                  Perception . . . . . . . . . . . . . . . 3321--3334
 Saoussen Mathlouthi Bouzid and   
       Chiraz Ben Othmane Zribi   Efficient Learning Approach for
                                  Pronominal Anaphora and Ellipsis
                                  Identification and Resolution in Arabic
                                  Texts  . . . . . . . . . . . . . . . . . 3335--3348
                Arda Yüksel and   
               Berke Uğurlu and   
                      Aykut Koç   Semantic Change Detection With Gaussian
                                  Word Embeddings  . . . . . . . . . . . . 3349--3361
                     Mei Li and   
                   Lu Xiang and   
              Xiaomian Kang and   
                  Yang Zhao and   
                    Yu Zhou and   
                 Chengqing Zong   Medical Term and Status Generation From
                                  Chinese Clinical Dialogue With
                                  Multi-Granularity Transformer  . . . . . 3362--3374
                 Yongwei Li and   
                Jianhua Tao and   
             Donna Erickson and   
                    Bin Liu and   
                   Masato Akagi   $ F_0 $-Noise-Robust Glottal Source and
                                  Vocal Tract Analysis Based on ARX-LF
                                  Model  . . . . . . . . . . . . . . . . . 3375--3383
               Xianwen Liao and   
            Yongzhong Huang and   
             Yongzhuang Wei and   
              Chenhao Zhang and   
                    Fu Wang and   
                      Yong Wang   Efficient Estimate of Sentence's
                                  Representation Based on the Difference
                                  Semantics Model  . . . . . . . . . . . . 3384--3399
           Kwang Myung Jeon and   
               Geon Woo Lee and   
               Nam Kyun Kim and   
                  Hong Kook Kim   TAU-Net: Temporal Activation $U$-Net
                                  Shared With Nonnegative Matrix
                                  Factorization for Speech Enhancement in
                                  Unseen Noise Environments  . . . . . . . 3400--3414
               Yi-Yang Ding and   
               Hao-Jian Lin and   
                Li-Juan Liu and   
              Zhen-Hua Ling and   
                          Yu Hu   Robustness of Speech Spoofing Detectors
                                  Against Adversarial Post-Processing of
                                  Voice Conversion . . . . . . . . . . . . 3415--3426
                    Yi Zhou and   
               Xiaohai Tian and   
                     Haizhou Li   Language Agnostic Speaker Embedding for
                                  Cross-Lingual Personalized Speech
                                  Generation . . . . . . . . . . . . . . . 3427--3439
                     Ju Lin and   
Adriaan J. de Lind van Wijngaarden and   
           Kuang-Ching Wang and   
               Melissa C. Smith   Speech Enhancement Using Multi-Stage
                                  Self-Attentive Temporal Convolutional
                                  Networks . . . . . . . . . . . . . . . . 3440--3450
               Wei-Ning Hsu and   
             Benjamin Bolte and   
       Yao-Hung Hubert Tsai and   
            Kushal Lakhotia and   
       Ruslan Salakhutdinov and   
            Abdelrahman Mohamed   HuBERT: Self-Supervised Speech
                                  Representation Learning by Masked
                                  Prediction of Hidden Units . . . . . . . 3451--3460
              Kouei Yamaoka and   
               Nobutaka Ono and   
                   Shoji Makino   Time-Frequency-Bin-Wise Linear
                                  Combination of Beamformers for
                                  Distortionless Signal Enhancement  . . . 3461--3475
             Zhong-Qiu Wang and   
             Gordon Wichern and   
               Jonathan Le Roux   Convolutive Prediction for Monaural
                                  Speech Dereverberation and
                                  Noisy-Reverberant Speaker Separation . . 3476--3490
                  Bing Yang and   
                   Hong Liu and   
                     Xiaofei Li   Learning Deep Direct-Path Relative
                                  Transfer Function for Binaural Sound
                                  Source Localization  . . . . . . . . . . 3491--3503
                 Yiming Cui and   
               Wanxiang Che and   
                   Ting Liu and   
                   Bing Qin and   
                    Ziqing Yang   Pre-Training With Whole Word Masking for
                                  Chinese BERT . . . . . . . . . . . . . . 3504--3514
                  Leda Sarı and   
      Mark Hasegawa-Johnson and   
                   Chang D. Yoo   Counterfactually Fair Automatic Speech
                                  Recognition  . . . . . . . . . . . . . . 3515--3525
            Zhuohuang Zhang and   
                    Yong Xu and   
                    Meng Yu and   
            Shi-Xiong Zhang and   
                Lianwu Chen and   
       Donald S. Williamson and   
                        Dong Yu   Multi-Channel Multi-Frame ADL-MVDR for
                                  Target Speech Separation . . . . . . . . 3526--3540
         Nils L. Westhausen and   
               Rainer Huber and   
         Hannah Baumgartner and   
               Ragini Sinha and   
                Jan Rennies and   
                 Bernd T. Meyer   Reduction of Subjective Listening Effort
                                  for TV Broadcast Signals With Recurrent
                                  Neural Networks  . . . . . . . . . . . . 3541--3550
               Shota Sasaki and   
                 Jun Suzuki and   
                   Kentaro Inui   Subword-Based Compact Reconstruction for
                                  Open-Vocabulary Neural Word Embeddings   3551--3564
               Xiaodong Cui and   
                  Wei Zhang and   
              Abdullah Kayi and   
                Mingrui Liu and   
             Ulrich Finkler and   
            Brian Kingsbury and   
                George Saon and   
                     David Kung   Asynchronous Decentralized Distributed
                                  Training of Acoustic Models  . . . . . . 3565--3576
              Junqing Zhang and   
                  Wen Zhang and   
          Jihui Aimee Zhang and   
Thushara Dheemantha Abhayapala and   
                    Lijun Zhang   Spatial Active Noise Control in Rooms
                                  Using Higher Order Sources . . . . . . . 3577--3591
               Bingzhi Chen and   
                     Qi Cao and   
                 Mixiao Hou and   
                Zheng Zhang and   
               Guangming Lu and   
                    David Zhang   Multimodal Emotion Recognition With
                                  Temporal and Semantic Consistency  . . . 3592--3603
                 S. Supraja and   
           Andy W. H. Khong and   
                    S. Tatinati   Regularized Phrase-Based Topic Model for
                                  Automatic Question Classification With
                                  Domain-Agnostic Class Labels . . . . . . 3604--3616
              Natsuko Maeda and   
         Filippo Maria Fazi and   
           Falk-Martin Hoffmann   Sound Field Reproduction With a
                                  Cylindrical Loudspeaker Array Using
                                  First Order Wall Reflections . . . . . . 3617--3630
                  Xugang Lu and   
                  Peng Shen and   
                    Yu Tsao and   
                  Hisashi Kawai   Coupling a Generative Model With a
                                  Discriminative Learning Framework for
                                  Speaker Verification . . . . . . . . . . 3631--3641
            Hannes Helmholz and   
             David Lou Alon and   
  Sebastià V. Amengual Garí and   
                    Jens Ahrens   Effects of Additive Noise in Binaural
                                  Rendering of Spherical Microphone Array
                                  Signals  . . . . . . . . . . . . . . . . 3642--3653
                Joanna Hong and   
                  Minsu Kim and   
                Se Jin Park and   
                    Yong Man Ro   Speech Reconstruction With Reminiscent
                                  Sound Via Visual Voice Memory  . . . . . 3654--3667
                Ran Weisman and   
                 Tom Shlomo and   
         Vladimir Tourbabin and   
               Paul Calamia and   
                   Boaz Rafaely   Robustness of Acoustic Rake Filters in
                                  Minimum Variance Beamforming . . . . . . 3668--3678
                  Junhao Xu and   
                 Jianwei Yu and   
                Shoukang Hu and   
                Xunying Liu and   
                     Helen Meng   Mixed Precision Low-Bit Quantization of
                                  Neural Network Language Models for
                                  Speech Recognition . . . . . . . . . . . 3679--3693
                  Jidong Ge and   
               Yunyun Huang and   
                Xiaoyu Shen and   
                 Chuanyi Li and   
                         Wei Hu   Learning Fine-Grained Fact-Article
                                  Correspondence in Legal Cases  . . . . . 3694--3706
              Qiuqiang Kong and   
                  Bochen Li and   
                Xuchen Song and   
                   Yuan Wan and   
                    Yuxuan Wang   High-Resolution Piano Transcription With
                                  Pedals by Regressing Onset and Offset
                                  Times  . . . . . . . . . . . . . . . . . 3707--3717
                      Anonymous   2021 Index IEEE/ACM
                                  Transactions on Audio, Speech, and
                                  Language Processing Vol. 29  . . . . . . 3718--3760

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 30, Number ??, 2022

                      Anonymous   IEEE Signal Processing Society . . . . . C2--C2
               Qianying Liu and   
                 Wenyu Guan and   
                  Sujian Li and   
                  Fei Cheng and   
           Daisuke Kawahara and   
                Sadao Kurohashi   RODA: Reverse Operation Based Data
                                  Augmentation for Solving Math Word
                                  Problems . . . . . . . . . . . . . . . . 1--11
                   Kai Zhen and   
                Jongmo Sung and   
                 Mi Suk Lee and   
            Seungkwon Beack and   
                      Minje Kim   Scalable and Efficient Neural Speech
                                  Coding: a Hybrid Design  . . . . . . . . 12--25
                   Sen Yang and   
                   Yang Liu and   
                 Dawei Feng and   
                   Dongsheng Li   Text Generation From Data With Dynamic
                                  Planning . . . . . . . . . . . . . . . . 26--34
             Stefan Liebich and   
                     Peter Vary   Occlusion Effect Cancellation in
                                  Headphones and Hearing Devices: The
                                  Sister of Active Noise Cancellation  . . 35--48
            Zhuosheng Zhang and   
                  Haojie Yu and   
                   Hai Zhao and   
                  Masao Utiyama   Which Apple Keeps Which Doctor Away?
                                  Colorful Word Representations With
                                  Visual Oracles . . . . . . . . . . . . . 49--59
                Zhenyu Wang and   
              John H. L. Hansen   Multi-Source Domain Adaptation for
                                  Text-Independent Forensic Speaker
                                  Recognition  . . . . . . . . . . . . . . 60--75
              Kengtao Zheng and   
                 Nankai Lin and   
                  Shengyi Jiang   Unsupervised Character Embedding
                                  Correction and Candidate Word Denoising  76--86
                    Bing Ma and   
                Haifeng Sun and   
                Jingyu Wang and   
                      Qi Qi and   
                   Jianxin Liao   Extractive Dialogue Summarization
                                  Without Annotation Based on Distantly
                                  Supervised Machine Reading Comprehension
                                  in Customer Service  . . . . . . . . . . 87--97
               Shengcai Liu and   
                    Ning Lu and   
                 Cheng Chen and   
                        Ke Tang   Efficient Combinatorial Optimization for
                                  Word-Level Adversarial Textual Attack    98--111
         Alessandro Terenzi and   
            Nicola Ortolani and   
               Inês Nolasco and   
          Emmanouil Benetos and   
                Stefania Cecchi   Comparison of Feature Extraction Methods
                                  for Sound-Based Classification of Honey
                                  Bee Activity . . . . . . . . . . . . . . 112--122
               Shuiyang Mao and   
                P. C. Ching and   
                        Tan Lee   Enhancing Segment-Based Speech Emotion
                                  Recognition by Iterative Self-Learning   123--134
Abdolreza Sabzi Shahrebabaki and   
            Giampiero Salvi and   
          Torbjørn Svendsen and   
       Sabato Marco Siniscalchi   Acoustic-to-Articulatory Mapping With
                                  Joint Optimization of Deep Speech
                                  Enhancement and Articulatory Inversion
                                  Models . . . . . . . . . . . . . . . . . 135--147
               Javier Jorge and   
              Adrià Giménez and   
Joan Albert Silvestre-Cerdà and   
               Jorge Civera and   
             Albert Sanchis and   
                    Alfons Juan   Live Streaming Speech Recognition Using
                                  Deep Bidirectional LSTM Acoustic Models
                                  and Interpolated Language Models . . . . 148--161
      Muhammed P. V. Shifas and   
             Cătălin Zorilă and   
               Yannis Stylianou   End-to-End Neural Based Modification of
                                  Noisy Speech for Speech-in-Noise
                                  Intelligibility Improvement  . . . . . . 162--173
            Joon-Young Yang and   
                Joon-Hyuk Chang   VACE-WPE: Virtual Acoustic Channel
                                  Expansion Based on Neural Networks for
                                  Weighted Prediction Error-Based Speech
                                  Dereverberation  . . . . . . . . . . . . 174--189
                Chenpeng Du and   
                         Kai Yu   Phone-Level Prosody Modelling With
                                  GMM-Based MDN for Diverse and
                                  Controllable Speech Synthesis  . . . . . 190--201
                  Haibin Wu and   
                      Xu Li and   
                Andy T. Liu and   
                 Zhiyong Wu and   
                 Helen Meng and   
                    Hung-Yi Lee   Improving the Adversarial Robustness for
                                  Speaker Verification by Self-Supervised
                                  Learning . . . . . . . . . . . . . . . . 202--217
                 Mixiao Hou and   
                Zheng Zhang and   
                     Qi Cao and   
                David Zhang and   
                   Guangming Lu   Multi-View Speech Emotion Recognition
                                  Via Collective Relation Construction . . 218--229
                Da-rong Liu and   
                Po-chun Hsu and   
               Yi-chen Chen and   
            Sung-feng Huang and   
             Shun-po Chuang and   
                   Da-yi Wu and   
                    Hung-yi Lee   Learning Phone Recognition From Unpaired
                                  Audio and Phone Sequences Based on
                                  Generative Adversarial Network . . . . . 230--243
                Yuting Zhao and   
             Mamoru Komachi and   
          Tomoyuki Kajiwara and   
                    Chenhui Chu   Word-Region Alignment-Guided Multimodal
                                  Neural Machine Translation . . . . . . . 244--259
            Zhuosheng Zhang and   
               Yiqing Zhang and   
                       Hai Zhao   Syntax-Aware Multi-Spans Generation for
                                  Reading Comprehension  . . . . . . . . . 260--268
                Pengfei Zhu and   
            Zhuosheng Zhang and   
                   Hai Zhao and   
                   Xiaoguang Li   DUMA: Reading Comprehension With
                                  Transposition Thinking . . . . . . . . . 269--279
                Jiayuan Xie and   
               Ningxin Peng and   
                     Yi Cai and   
                   Tao Wang and   
                  Qingbao Huang   Diverse Distractor Generation for
                                  Constructing High-Quality Multiple
                                  Choice Questions . . . . . . . . . . . . 280--291
                  Jie Zhang and   
                 Guanghui Zhang   A Parametric Unconstrained Beamformer
                                  Based Binaural Noise Reduction for
                                  Assistive Hearing  . . . . . . . . . . . 292--304
               Luca Turchet and   
                  Johan Pauwels   Music Emotion Recognition: Intention of
                                  Composers-Performers Versus Perception
                                  of Musicians, Non-Musicians, and
                                  Listening Machines . . . . . . . . . . . 305--316
                 Wenxin Hou and   
                    Han Zhu and   
                Yidong Wang and   
               Jindong Wang and   
                    Tao Qin and   
                  Renjun Xu and   
             Takahiro Shinozaki   Exploiting Adapters for Cross-Lingual
                                  Low-Resource Speech Recognition  . . . . 317--329
                 Kehai Chen and   
                   Rui Wang and   
              Masao Utiyama and   
                Eiichiro Sumita   Integrating Prior Translation Knowledge
                                  Into Neural Machine Translation  . . . . 330--339
                  Keqi Deng and   
              Gaofeng Cheng and   
                Runyan Yang and   
                   Yonghong Yan   Alleviating ASR Long-Tailed Problem by
                                  Decoupling the Learning of
                                  Representation and Classification  . . . 340--354
                  Zuchao Li and   
                 Junru Zhou and   
                   Hai Zhao and   
                   Kevin Parnow   HPSG-Inspired Joint Neural Constituent
                                  and Dependency Parsing in $ O(n^3) $
                                  Time Complexity  . . . . . . . . . . . . 355--366
                   Xuan Shi and   
               Erica Cooper and   
              Junichi Yamagishi   Use of Speaker Recognition Approaches
                                  for Learning and Evaluating Embedding
                                  Representations of Musical Instrument
                                  Sounds . . . . . . . . . . . . . . . . . 367--377
                Zengwei Yao and   
                 Wenjie Pei and   
               Fanglin Chen and   
               Guangming Lu and   
                    David Zhang   Stepwise-Refining Speech Separation
                                  Network via Fine-Grained Encoding in
                                  High-Order Latent Domain . . . . . . . . 378--393
                Yanmin Qian and   
                    Zhikai Zhou   Optimizing Data Usage for Low-Resource
                                  Speech Recognition . . . . . . . . . . . 394--403
Narla John Metilda Sagaya Mary and   
           Srinivasan Umesh and   
       Sandesh Varadaraju Katta   S-Vectors and TESA: Speaker Embeddings
                                  and a Speaker Authenticator Based on
                                  Transformer Encoder  . . . . . . . . . . 404--413
             Bengt J. Borgström   Bayesian Estimation of PLDA in the
                                  Presence of Noisy Training Labels, With
                                  Applications to Speaker Verification . . 414--428
                Menglong Lu and   
                 Zhen Huang and   
                 Binyang Li and   
              Yunxiang Zhao and   
                  Zheng Qin and   
                   DongSheng Li   SIFTER: a Framework for Robust Rumor
                                  Detection  . . . . . . . . . . . . . . . 429--442
                 Lantian Li and   
                  Dong Wang and   
                Jiawen Kang and   
                 Renyu Wang and   
                    Jing Wu and   
               Zhendong Gao and   
                      Xiao Chen   A Principle Solution for Enroll-Test
                                  Mismatch in Speaker Recognition  . . . . 443--455
                    Feiran Yang   Analysis of Deficient-Length
                                  Partitioned-Block Frequency-Domain
                                  Adaptive Filters . . . . . . . . . . . . 456--467
                  Hui Jiang and   
               Linfeng Song and   
                   Yubin Ge and   
               Fandong Meng and   
                Junfeng Yao and   
                     Jinsong Su   An AST Structure Enhanced Decoder for
                                  Code Generation  . . . . . . . . . . . . 468--476
           Anssi Kanervisto and   
            Ville Hautamäki and   
              Tomi Kinnunen and   
              Junichi Yamagishi   Optimizing Tandem Speaker Verification
                                  and Anti-Spoofing Systems  . . . . . . . 477--488
                     Xin Ni and   
                        Jia Ren   FC-U2-Net: a Novel Deep Neural Network
                                  for Singing Voice Separation . . . . . . 489--494
             Neil Zeghidour and   
            Alejandro Luebs and   
                Ahmed Omran and   
               Jan Skoglund and   
             Marco Tagliasacchi   SoundStream: an End-to-End Neural Audio
                                  Codec  . . . . . . . . . . . . . . . . . 495--507
         Wageesha Manamperi and   
     Thushara D. Abhayapala and   
                Jihui Zhang and   
       Prasanga N. Samarasinghe   Drone Audition: Sound Source
                                  Localization Using On-Board Microphones  508--519
                    Qian Li and   
                   Hao Peng and   
                 Jianxin Li and   
                     Jia Wu and   
              Yuanxing Ning and   
                Lihong Wang and   
               Philip S. Yu and   
                     Zheng Wang   Reinforcement Learning-Based Dialogue
                                  Guided Event Extraction to Exploit
                                  Argument Relations . . . . . . . . . . . 520--533
              Santiago Ruiz and   
       Toon van Waterschoot and   
                    Marc Moonen   Distributed Combined Acoustic Echo
                                  Cancellation and Noise Reduction in
                                  Wireless Acoustic Sensor and Actuator
                                  Networks . . . . . . . . . . . . . . . . 534--547
        Lukas Grinewitschus and   
                     Peter Jung   The Harmonic Shift Algorithm for
                                  Efficient Multi-Pitch Detection  . . . . 548--561
                   Ziyao Lu and   
                   Xiang Li and   
                   Yang Liu and   
                Chulun Zhou and   
                Jianwei Cui and   
                   Bin Wang and   
                  Min Zhang and   
                     Jinsong Su   Exploring Multi-Stage Information
                                  Interactions for Multi-Source Neural
                                  Machine Translation  . . . . . . . . . . 562--570
              Jingxuan Yang and   
                      Si Li and   
                  Sheng Gao and   
                        Jun Guo   CorefDPR: a Joint Model for Coreference
                                  Resolution and Dropped Pronoun Recovery
                                  in Chinese Conversations . . . . . . . . 571--581
        Timuçin Berk Atalay and   
               Zühre Sü Gül and   
               Enzo De Sena and   
            Zoran Cvetković and   
          Hüseyin Hacıhabiboğlu   Scattering Delay Network Simulator of
                                  Coupled Volume Acoustics . . . . . . . . 582--593
                   Yi Zhang and   
                     Lei Li and   
                 Yunfang Wu and   
                      Qi Su and   
                         Xu Sun   Alleviating the Knowledge-Language
                                  Inconsistency: a Study for Deep
                                  Commonsense Knowledge  . . . . . . . . . 594--604
                     Ke Tan and   
             Zhong-Qiu Wang and   
                   DeLiang Wang   Neural Spectrospatial Filtering  . . . . 605--621
                Qianren Mao and   
                 Jianxin Li and   
               Chenghua Lin and   
               Congwen Chen and   
                   Hao Peng and   
                Lihong Wang and   
                   Philip S. Yu   Adaptive Pre-Training and Collaborative
                                  Fine-Tuning: a Win-Win Strategy to
                                  Improve Review Analysis Tasks  . . . . . 622--634
               Zifeng Cheng and   
               Zhiwei Jiang and   
                 Yafeng Yin and   
                  Cong Wang and   
                        Qing Gu   Learning to Classify Open Intent via
                                  Soft Labeling and Manifold Mixup . . . . 635--645
                Xiaochun An and   
             Frank K. Soong and   
                        Lei Xie   Disentangling Style and Speaker
                                  Attributes for TTS Style Transfer  . . . 646--658
                Zhuang Chen and   
                    Tieyun Qian   Retrieve-and-Edit Domain Adaptation for
                                  End2End Aspect Based Sentiment Analysis  659--672
                   Jian Liu and   
                 Mengshi Yu and   
                Yufeng Chen and   
                       Jinan Xu   Cross-Domain Slot Filling as Machine
                                  Reading Comprehension: a New Perspective 673--685
               Yongkang Liu and   
              Qingbao Huang and   
                    Jing Li and   
                Linzhang Mo and   
                     Yi Cai and   
                        Qing Li   SSAP: Storylines and Sentiment Aware
                                  Pre-Trained Model for Story Ending
                                  Generation . . . . . . . . . . . . . . . 686--694
                  Ying Zhou and   
              Xuefeng Liang and   
                      Yu Gu and   
                  Yifei Yin and   
                   Longshan Yao   Multi-Classifier Interactive Learning
                                  for Ambiguous Speech Emotion Recognition 695--705
                 Poul Hoang and   
           Jan Mark de Haan and   
              Zheng-Hua Tan and   
                  Jesper Jensen   Multichannel Speech Enhancement With Own
                                  Voice-Based Interfering Speech
                                  Suppression for Hearing Assistive
                                  Devices  . . . . . . . . . . . . . . . . 706--720
                  Weijie Yu and   
                    Chen Xu and   
                     Jun Xu and   
                 Liang Pang and   
                    Ji-Rong Wen   Distribution Distance Regularized
                                  Sequence Representation for Text
                                  Matching in Asymmetrical Domains . . . . 721--733
                Heming Wang and   
                   DeLiang Wang   Neural Cascade Architecture With
                                  Triple-Domain Loss for Speech
                                  Enhancement  . . . . . . . . . . . . . . 734--743
       Riccardo R. De Lucia and   
           Antonio Canclini and   
            Fabio Antonacci and   
                  Augusto Sarti   Group Dictionary Equivalent Source
                                  Method for Sparse Nearfield Acoustic
                                  Holography . . . . . . . . . . . . . . . 744--757
                    Tong Ma and   
                   Ying Wei and   
                        Xin Lou   Reconfigurable Nonuniform Filter Bank
                                  for Hearing Aid Systems  . . . . . . . . 758--771
           Victoria Mingote and   
             Antonio Miguel and   
               Dayana Ribas and   
             Alfonso Ortega and   
                 Eduardo Lleida   aDCF Loss Function for Deep Metric
                                  Learning in End-to-End Text-Dependent
                                  Speaker Verification Systems . . . . . . 772--784
               Quansheng Tu and   
                    Huawei Chen   Theoretical Lower Bounds on the
                                  Performance of the First-Order
                                  Differential Microphone Arrays With
                                  Sensor Imperfections . . . . . . . . . . 785--801
                Taihui Wang and   
                Feiran Yang and   
                       Jun Yang   Convolutive Transfer Function-Based
                                  Multichannel Nonnegative Matrix
                                  Factorization for Overdetermined Blind
                                  Source Separation  . . . . . . . . . . . 802--815
                   Yi Zhang and   
              Guangyou Zhou and   
                 Zhiwen Xie and   
            Jimmy Xiangji Huang   HGEN: Learning Hierarchical
                                  Heterogeneous Graph Encoding for Math
                                  Word Problem Solving . . . . . . . . . . 816--828
            Eduardo Fonseca and   
              Xavier Favory and   
                 Jordi Pons and   
              Frederic Font and   
                   Xavier Serra   FSD50K: an Open Dataset of Human-Labeled
                                  Sound Events . . . . . . . . . . . . . . 829--852
                     Yi Lei and   
                  Shan Yang and   
              Xinsheng Wang and   
                        Lei Xie   MsEmoTTS: Multi-Scale Emotion Transfer,
                                  Prediction, and Control for Emotional
                                  Speech Synthesis . . . . . . . . . . . . 853--864
                   Tao Wang and   
                   Ruibo Fu and   
                Jiangyan Yi and   
                Jianhua Tao and   
                    Zhengqi Wen   NeuralDPS: Neural Deterministic Plus
                                  Stochastic Model With Multiband
                                  Excitation for Noise-Controllable
                                  Waveform Generation  . . . . . . . . . . 865--878
                Simon Stone and   
               Yingming Gao and   
                 Peter Birkholz   Articulatory Synthesis of Vocalized /r/
                                  Allophones in German . . . . . . . . . . 879--889
             Prashant Serai and   
              Vishal Sunder and   
            Eric Fosler-Lussier   Hallucination of Speech Recognition
                                  Errors With Sequence to Sequence
                                  Learning . . . . . . . . . . . . . . . . 890--900
                     Bin Wu and   
             Sakriani Sakti and   
              Jinsong Zhang and   
               Satoshi Nakamura   Modeling Unsupervised Empirical
                                  Adaptation by DPGMM and DPGMM-RNN Hybrid
                                  Model to Extract Perceptual Features for
                                  Low-Resource ASR . . . . . . . . . . . . 901--916
                   Mi Zhang and   
                Tieyun Qian and   
                       Bing Liu   Exploit Feature and Relation Hierarchy
                                  for Relation Extraction  . . . . . . . . 917--930
              Wenxiang Jiao and   
                  Xing Wang and   
                  Shilin He and   
                Zhaopeng Tu and   
                 Irwin King and   
                 Michael R. Lyu   Exploiting Inactive Examples for Natural
                                  Language Generation With Data
                                  Rejuvenation . . . . . . . . . . . . . . 931--943
                  Youzhi Tu and   
                    Man-Wai Mak   Aggregating Frame-Level Information in
                                  the Spectral Domain With Self-Attention
                                  for Speaker Embedding  . . . . . . . . . 944--957
                Zhixing Tan and   
                Zeyuan Yang and   
                 Meng Zhang and   
                    Qun Liu and   
                Maosong Sun and   
                       Yang Liu   Dynamic Multi-Branch Layers for
                                  On-Device Neural Machine Translation . . 958--967
                 Weiwei Lin and   
                    Man-Wai Mak   Mixture Representation Learning for Deep
                                  Speaker Embedding  . . . . . . . . . . . 968--978
                   Peng Zhu and   
                Dawei Cheng and   
              Fangzhou Yang and   
                 Yifeng Luo and   
            Dingjiang Huang and   
               Weining Qian and   
                    Aoying Zhou   Improving Chinese Named Entity
                                  Recognition by Large-Scale Syntactic
                                  Dependency Graph . . . . . . . . . . . . 979--991
               Xiaobo Liang and   
                   Lijun Wu and   
                  Juntao Li and   
                    Tao Qin and   
                  Min Zhang and   
                    Tie-Yan Liu   Multi-Teacher Distillation With Single
                                  Model for Neural Machine Translation . . 992--1002
              Xiaofeng Chen and   
                Guohua Wang and   
                Haopeng Ren and   
                     Yi Cai and   
              Ho-fung Leung and   
                       Tao Wang   Task-Adaptive Feature Fusion for
                                  Generalized Few-Shot Relation
                                  Classification in an Open World
                                  Environment  . . . . . . . . . . . . . . 1003--1015
                Yu-Chen Lin and   
                   Cheng Yu and   
                  Yi-Te Hsu and   
                 Szu-Wei Fu and   
                    Yu Tsao and   
                    Tei-Wei Kuo   SEOFP-NET: Compression and Acceleration
                                  of Deep Neural Networks for Speech
                                  Enhancement Using Sign-Exponent-Only
                                  Floating-Points  . . . . . . . . . . . . 1016--1031
          Tomohiro Nakatani and   
           Rintaro Ikeshita and   
          Keisuke Kinoshita and   
             Hiroshi Sawada and   
               Naoyuki Kamo and   
                    Shoko Araki   Switching Independent Vector Analysis
                                  and its Extension to Blind and Spatially
                                  Guided Convolutional Beamforming
                                  Algorithms . . . . . . . . . . . . . . . 1032--1047
               Jianhua Geng and   
                 Sifan Wang and   
                Qinglai Liu and   
                        Xin Lou   Multi-Level Time-Frequency Bins
                                  Selection for Direction of Arrival
                                  Estimation Using a Single Acoustic
                                  Vector Sensor  . . . . . . . . . . . . . 1048--1060
                 Qinzhuo Wu and   
                   Qi Zhang and   
                 Xuanjing Huang   Automatic Math Word Problem Generation
                                  With Topic-Expression Co-Attention
                                  Mechanism and Reinforcement Learning . . 1061--1072
              Michael Nigro and   
               Sridhar Krishnan   Multimodal System for Audio Scene Source
                                  Counting and Analysis  . . . . . . . . . 1073--1082
                 Yishu Peng and   
                Sheng Zhang and   
               Jiashu Zhang and   
                 Wei Xing Zheng   Combined-Sample Multiband-Structured
                                  Subband Filtering Algorithms . . . . . . 1083--1092
                Shoukang Hu and   
                 Xurong Xie and   
                 Mingyu Cui and   
                Jiajun Deng and   
               Shansong Liu and   
                 Jianwei Yu and   
               Mengzhe Geng and   
                Xunying Liu and   
                     Helen Meng   Neural Architecture Search for LF-MMI
                                  Trained Time Delay Neural Networks . . . 1093--1107
                Xudong Dang and   
                     Wen Ma and   
       Emanuël A. P. Habets and   
                    Hongyan Zhu   TDOA-Based Robust Sound Source
                                  Localization With Sparse Regularization
                                  in Wireless Acoustic Sensor Networks . . 1108--1123
                   Shan Gao and   
                   Jing Lin and   
                  Xihong Wu and   
                     Tianshu Qu   Sparse DNN Model for Frequency Expanding
                                  of Higher Order Ambisonics Encoding
                                  Process  . . . . . . . . . . . . . . . . 1124--1135
              Giovanni Pepe and   
         Leonardo Gabrielli and   
          Stefano Squartini and   
              Carlo Tripodi and   
                 Nicolò Strozzi   Deep Optimization of Parametric IIR
                                  Filters for Audio Equalization . . . . . 1136--1149
                    Moa Lee and   
                  Junmo Lee and   
                Joon-Hyuk Chang   Non-Autoregressive Fully Parallel Deep
                                  Convolutional Neural Speech Synthesis    1150--1159
               Liam Barrett and   
                 Junchao Hu and   
                   Peter Howell   Systematic Review of Machine Learning
                                  Approaches for Detecting Developmental
                                  Stuttering . . . . . . . . . . . . . . . 1160--1172
              Sang-Hoon Lee and   
             Hyeong-Rae Noh and   
             Woo-Jeoung Nam and   
                 Seong-Whan Lee   Duration Controllable Voice Conversion
                                  via Phoneme-Based Information Bottleneck 1173--1183
               Zhihong Shao and   
                Zhongqin Wu and   
                   Minlie Huang   AdvExpander: Generating Natural Language
                                  Adversarial Examples by Expanding Text   1184--1196
 Dhanunjaya Varma Devalraju and   
              Padmanabhan Rajan   Multiview Embeddings for Soundscape
                                  Classification . . . . . . . . . . . . . 1197--1206
               Chengyu Wang and   
                 Suyang Dai and   
                Yipeng Wang and   
                   Fei Yang and   
                Minghui Qiu and   
                 Kehan Chen and   
                   Wei Zhou and   
                      Jun Huang   ARoBERT: an ASR Robust Pre-Trained
                                  Language Model for Spoken Language
                                  Understanding  . . . . . . . . . . . . . 1207--1218
                  Jonah Ong and   
                Ba Tuong Vo and   
              Sven Nordholm and   
                  Ba-Ngu Vo and   
          Diluka Moratuwage and   
                 Changbeom Shim   Audio-Visual Based Online Multi-Source
                                  Separation . . . . . . . . . . . . . . . 1219--1234
                 Leyang Cui and   
                    Yafu Li and   
                      Yue Zhang   Label Attention Network for Structured
                                  Prediction . . . . . . . . . . . . . . . 1235--1248
             Sarinah Sutojo and   
                 Tobias May and   
              Steven van de Par   Segmentation of Multitalker Mixtures
                                  Based on Local Feature Contrasts and
                                  Auditory Glimpses  . . . . . . . . . . . 1249--1262
                    Hao Gao and   
                Xuelei Feng and   
                      Yong Shen   Weighted Loudspeaker Placement Method
                                  for Sound Field Reproduction . . . . . . 1263--1276
             Gongping Huang and   
              Jacob Benesty and   
               Israel Cohen and   
                  Jingdong Chen   Kronecker Product Multichannel Linear
                                  Filtering for Adaptive Weighted
                                  Prediction Error-Based Speech
                                  Dereverberation  . . . . . . . . . . . . 1277--1289
              Takehiro Sugimoto   Loudness-Level-Chasing Algorithm for
                                  Multiformat Live Audio Production  . . . 1290--1304
               Junshuang Wu and   
              Richong Zhang and   
                 Yongyi Mao and   
                   Jinpeng Huai   Dealing With Hierarchical Types and
                                  Label Noise in Fine-Grained Entity
                                  Typing . . . . . . . . . . . . . . . . . 1305--1318
                Anton Ragni and   
           Mark J. F. Gales and   
                Oliver Rose and   
         Katherine M. Knill and   
        Alexandros Kastanos and   
                  Qiujia Li and   
                 Preben M. Ness   Increasing Context for Estimating
                                  Confidence Scores in Automatic Speech
                                  Recognition  . . . . . . . . . . . . . . 1319--1329
               Zhongxin Bai and   
                Jianyu Wang and   
             Xiao-Lei Zhang and   
                  Jingdong Chen   End-to-End Speaker Verification via
                                  Curriculum Bipartite Ranking Weighted
                                  Binary Cross-Entropy . . . . . . . . . . 1330--1344
            Shang-Yi Chuang and   
              Hsin-Min Wang and   
                        Yu Tsao   Improved Lite Audio-Visual Speech
                                  Enhancement  . . . . . . . . . . . . . . 1345--1359
              Gaofeng Cheng and   
                Haoran Miao and   
                Runyan Yang and   
                  Keqi Deng and   
                   Yonghong Yan   ETEH: Unified Attention-Based End-to-End
                                  ASR and KWS Architecture . . . . . . . . 1360--1373
            Ashutosh Pandey and   
                   DeLiang Wang   Self-Attending RNN for Speech
                                  Enhancement to Improve Cross-Corpus
                                  Generalization . . . . . . . . . . . . . 1374--1385
                     Di Jin and   
                Shuyang Gao and   
               Seokhwan Kim and   
                   Yang Liu and   
              Dilek Hakkani-Tür   Towards Textual Out-of-Domain Detection
                                  Without In-Domain Labels . . . . . . . . 1386--1395
               K. Mrinalini and   
           P. Vijayalakshmi and   
                   T. Nagarajan   SBSim: a Sentence-BERT Similarity-Based
                                  Evaluation Metric for Indian Language
                                  Neural Machine Translation Systems . . . 1396--1406
             Changhong Wang and   
          Emmanouil Benetos and   
          Vincent Lostanlen and   
                    Elaine Chew   Adaptive Scattering Transforms for
                                  Playing Technique Recognition  . . . . . 1407--1421
                 Danwei Cai and   
               Weiqing Wang and   
                        Ming Li   Incorporating Visual Information in
                                  Audio Based Self-Supervised Speaker
                                  Recognition  . . . . . . . . . . . . . . 1422--1435
                     Yu Luo and   
                        Lina Pu   EC-ANC: Edge Case-Enhanced Active Noise
                                  Cancellation for True Wireless Stereo
                                  Earbuds  . . . . . . . . . . . . . . . . 1436--1447
                     Tao Li and   
              Xinsheng Wang and   
                 Qicong Xie and   
               Zhichao Wang and   
                        Lei Xie   Cross-Speaker Emotion Disentangling and
                                  Transfer for End-to-End Speech Synthesis 1448--1460
                 Yilin Zhao and   
            Zhuosheng Zhang and   
                       Hai Zhao   Reference Knowledgeable Network for
                                  Machine Reading Comprehension  . . . . . 1461--1473
                  Fu-Hao Yu and   
               Kuan-Yu Chen and   
                      Ke-Han Lu   Non-Autoregressive ASR Modeling Using
                                  Pre-Trained Language Models for Chinese
                                  Speech Recognition . . . . . . . . . . . 1474--1482
                 Yiming Cui and   
                   Ting Liu and   
               Wanxiang Che and   
               Zhigang Chen and   
                    Shijin Wang   Teaching Machines to Read, Answer and
                                  Explain  . . . . . . . . . . . . . . . . 1483--1492
            Shota Horiguchi and   
              Yusuke Fujita and   
            Shinji Watanabe and   
                  Yawen Xue and   
                   Paola García   Encoder-Decoder Based Attractors for
                                  End-to-End Neural Diarization  . . . . . 1493--1507
                  Chenda Li and   
                  Zhuo Chen and   
                    Yanmin Qian   Dual-Path Modeling With Memory Embedding
                                  Model for Continuous Speech Separation   1508--1520
                    Yu Tong and   
                Jingzhi Guo and   
                     Jizhe Zhou   Separation Inference: a Unified
                                  Framework for Word Segmentation in East
                                  Asian Languages  . . . . . . . . . . . . 1521--1530

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 31, Number ??, 2023

      Mrinmoy Bhattacharjee and   
          S. R. M. Prasanna and   
                Prithwijit Guha   Clean vs. Overlapped Speech-Music
                                  Detection Using Harmonic-Percussive
                                  Features and Multi-Task Learning . . . . 1--10
                Zhaojie Luo and   
               Shoufeng Lin and   
                    Rui Liu and   
                   Jun Baba and   
         Yuichiro Yoshikawa and   
               Hiroshi Ishiguro   Decoupling Speaker-Independent Emotions
                                  for Voice Conversion via Source-Filter
                                  Networks . . . . . . . . . . . . . . . . 11--24
              Jinchuan Tian and   
                 Jianwei Yu and   
                  Chao Weng and   
                Yuexian Zou and   
                        Dong Yu   Integrating Lattice-Free MMI Into
                                  End-to-End Speech Recognition  . . . . . 25--38
               Ravi Shankar and   
              Hsi-Wei Hsieh and   
             Nicolas Charon and   
           Archana Venkataraman   A Diffeomorphic Flow-Based Variational
                                  Framework for Multi-Speaker Emotion
                                  Conversion . . . . . . . . . . . . . . . 39--53
      Ryandhimas E. Zezario and   
                 Szu-Wei Fu and   
                   Fei Chen and   
            Chiou-Shann Fuh and   
              Hsin-Min Wang and   
                        Yu Tsao   Deep Learning-Based Non-Intrusive
                                  Multi-Objective Speech Assessment Model
                                  With Cross-Domain Features . . . . . . . 54--70
                 Xiaoyi Qin and   
                 Danwei Cai and   
                        Ming Li   Robust Multi-Channel Far-Field Speaker
                                  Verification Under Different In-Domain
                                  Data Availability Scenarios  . . . . . . 71--85
           Vikram C. Mathad and   
              Julie M. Liss and   
              Kathy Chapman and   
              Nancy Scherer and   
                  Visar Berisha   Consonant-Vowel Transition Models Based
                                  on Deep Learning for Objective
                                  Evaluation of Articulation . . . . . . . 86--95
                      Li Li and   
           Hirokazu Kameoka and   
                   Shoji Makino   FastMVAE2: On Improving and Accelerating
                                  the Fast Variational Autoencoder-Based
                                  Source Separation Algorithm for
                                  Determined Mixtures  . . . . . . . . . . 96--110
                   Jie Wang and   
                   Yan Yang and   
                   Keyu Liu and   
                Zhiping Zhu and   
                   Xiaorong Liu   M3S: Scene Graph Driven
                                  Multi-Granularity Multi-Task Learning
                                  for Multi-Modal NER  . . . . . . . . . . 111--120
              Marc Delcroix and   
     Jorge Bennasar Vazquez and   
             Tsubasa Ochiai and   
          Keisuke Kinoshita and   
            Yasunori Ohishi and   
                    Shoko Araki   SoundBeam: Target Sound Extraction
                                  Conditioned on Sound-Class Labels and
                                  Enrollment Clues for Increased
                                  Performance and Continuous Learning  . . 121--136
            Daisuke Niizumi and   
             Daiki Takeuchi and   
            Yasunori Ohishi and   
              Noboru Harada and   
                  Kunio Kashino   BYOL for Audio: Exploring Pre-Trained
                                  General-Purpose Audio Representations    137--151
                 Yingrui Xu and   
                    Hao Liu and   
                 Jingguo Ge and   
              Xiaodan Zhang and   
                Jingyuan Hu and   
                   Yulei Wu and   
                 Honglei Lv and   
                Hongbin Shi and   
                       Wei Zhou   Mining Weak Relations Between Reviews
                                  for Opinion Spam Detection . . . . . . . 152--162
           Yoshiki Masuyama and   
               Kohei Yatabe and   
             Kento Nagatomo and   
                Yasuhiro Oikawa   Online Phase Reconstruction via
                                  DNN-Based Phase Differences Estimation   163--176
                  Jiang Liu and   
                Donghong Ji and   
                  Jingye Li and   
               Dongdong Xie and   
                 Chong Teng and   
                 Liang Zhao and   
                         Fei Li   TOE: a Grid-Tagging Discontinuous NER
                                  Model Enhanced by Embedding Tag/Word
                                  Relations and More Fine-Grained
                                  Tags . . . . . . . . . . . . . . . . . . 177--187
                     Zhe Hu and   
                 Zhiwei Cao and   
              Hou Pong Chan and   
                Jiachen Liu and   
                Xinyan Xiao and   
                 Jinsong Su and   
                         Hua Wu   Controllable Dialogue Generation With
                                  Disentangled Multi-Grained Style
                                  Specification and Attribute Consistency
                                  Reward . . . . . . . . . . . . . . . . . 188--199
          Sondes Abderrazek and   
         Corinne Fredouille and   
                 Alain Ghio and   
              Muriel Lalain and   
          Christine Meunier and   
               Virginie Woisard   Interpreting Deep Representations of
                                  Phonetic Features via Neuro-Based
                                  Concept Detector: Application to Speech
                                  Disorders Due to Head and Neck Cancer    200--214
                  Jie Zhang and   
                    Rui Tao and   
                     Jun Du and   
                    Li-Rong Dai   Energy-Efficient Sparsity-Driven Speech
                                  Enhancement in Wireless Acoustic Sensor
                                  Networks . . . . . . . . . . . . . . . . 215--228
                Xianke Wang and   
                 Bowen Tian and   
               Weiming Yang and   
                     Wei Xu and   
                  Wenqing Cheng   MusicYOLO: a Vision-Based Framework for
                                  Automatic Singing Transcription  . . . . 229--241
               Yuanyuan Liu and   
     Mittapalle Kiran Reddy and   
             Nelly Penttila and   
            Tiina Ihalainen and   
                 Paavo Alku and   
                   Okko Rasanen   Automatic Assessment of
                                  Parkinson's Disease Using Speech
                                  Representations of Phonation and
                                  Articulation . . . . . . . . . . . . . . 242--255
              David Sudholt and   
                Alec Wright and   
               Cumhur Erkut and   
                  Vesa Valimaki   Pruning Deep Neural Network Models of
                                  Guitar Distortion Effects  . . . . . . . 256--264
               Fangkai Jiao and   
               Yangyang Guo and   
               Minlie Huang and   
                    Liqiang Nie   Enhanced Multi-Domain Dialogue State
                                  Tracker With Second-Order Slot
                                  Interactions . . . . . . . . . . . . . . 265--276
                   Hui Tian and   
                  Yiqin Qiu and   
         Wojciech Mazurczyk and   
                 Haizhou Li and   
                  Zhenxing Qian   STFF-SM: Steganalysis Model Based on
                                  Spatial and Temporal Feature Fusion for
                                  Speech Streams . . . . . . . . . . . . . 277--289
      Gopendra Vikram Singh and   
           Mauajama Firdaus and   
                 Asif Ekbal and   
          Pushpak Bhattacharyya   EmoInt-Trans: a Multimodal Transformer
                                  for Identifying Emotions and Intents in
                                  Social Conversations . . . . . . . . . . 290--300
                   De De Hu and   
              Huaiwen Zhang and   
                Feilong Bao and   
                       Rui Wang   Distributed Sampling Rate Offset
                                  Estimation Over Acoustic Sensor Networks
                                  Based on Asynchronous Network Newton
                                  Optimization . . . . . . . . . . . . . . 301--312
          David Diaz-Guerra and   
             Antonio Miguel and   
                Jose R. Beltran   Direction of Arrival Estimation of Sound
                                  Sources Using Icosahedral CNNs . . . . . 313--321
                Peiming Guo and   
                 Shen Huang and   
               Peijie Jiang and   
                Yueheng Sun and   
              Meishan Zhang and   
                      Min Zhang   Curriculum-Style Fine-Grained Adaption
                                  for Unsupervised Cross-Lingual
                                  Dependency Transfer  . . . . . . . . . . 322--332
      Naveen Kumar Desiraju and   
                Simon Doclo and   
                Markus Buck and   
                   Tobias Wolff   Joint Online Estimation of Early and
                                  Late Residual Echo PSD for Residual Echo
                                  Suppression  . . . . . . . . . . . . . . 333--344
               Guangzhi Sun and   
                 Chao Zhang and   
             Philip C. Woodland   Minimising Biasing Word Errors for
                                  Contextual ASR With the Tree-Constrained
                                  Pointer Generator  . . . . . . . . . . . 345--354
             Jonah Casebeer and   
          Nicholas J. Bryan and   
                Paris Smaragdis   Meta-AF: Meta-Learning for Adaptive
                                  Filters  . . . . . . . . . . . . . . . . 355--370
                 Yingwen Fu and   
                 Nankai Lin and   
                  Boyu Chen and   
                  Ziyu Yang and   
                  Shengyi Jiang   Cross-Lingual Named Entity Recognition
                                  for Heterogeneous Languages  . . . . . . 371--382
               Jun-You Wang and   
           Jyh-Shing Roger Jang   Training a Singing Transcription Model
                                  Using Connectionist Temporal
                                  Classification Loss and Cross-Entropy
                                  Loss . . . . . . . . . . . . . . . . . . 383--396
             Zhong-Qiu Wang and   
             Gordon Wichern and   
            Shinji Watanabe and   
               Jonathan Le Roux   STFT-Domain Neural Speech Enhancement
                                  With Very Low Algorithmic Latency  . . . 397--410
                      Yu Li and   
                   Bojie Hu and   
                   Jian Liu and   
                Yufeng Chen and   
                       Jinan Xu   A Neighborhood Re-Ranking Model With
                                  Relation Constraint for Knowledge Graph
                                  Completion . . . . . . . . . . . . . . . 411--425
            Alessio Miaschi and   
          Dominique Brunato and   
        Felice Dell'Orletta and   
                 Giulia Venturi   On Robustness and Sensitivity of a
                                  Neural Language Model: a Case Study on
                                  Italian L1 Learner Errors  . . . . . . . 426--438
                  Rong Xiao and   
                     Yu Wan and   
               Baosong Yang and   
                Haibo Zhang and   
                Huajin Tang and   
              Derek F. Wong and   
                    Boxing Chen   Towards Energy-Preserving Natural
                                  Language Understanding With Spiking
                                  Neural Networks  . . . . . . . . . . . . 439--447
                  Juan Zhao and   
               Tianrui Zong and   
                 Yong Xiang and   
              Longxiang Gao and   
                  Guang Hua and   
                Keshav Sood and   
                    Yushu Zhang   SSVS-SSVD Based Desynchronization
                                  Attacks Resilient Watermarking Method
                                  for Stereo Signals . . . . . . . . . . . 448--461
               Qiquan Zhang and   
               Xinyuan Qian and   
                Zhaoheng Ni and   
             Aaron Nicolson and   
    Eliathamby Ambikairajah and   
                     Haizhou Li   A Time-Frequency Attention Module for
                                  Neural Speech Enhancement  . . . . . . . 462--475
                Binhong Xie and   
                      Yu Li and   
               Hongyan Zhao and   
                   Lihu Pan and   
                     Enhui Wang   A Cross-Attention Fusion Based Graph
                                  Convolution Auto-Encoder for Open
                                  Relation Extraction  . . . . . . . . . . 476--485
              Qian-Bei Hong and   
             Chung-Hsien Wu and   
                  Hsin-Min Wang   Generalization Ability Improvement of
                                  Speaker Representation and
                                  Anti-Interference for Speaker
                                  Verification . . . . . . . . . . . . . . 486--499
                Xinglin Lyu and   
                  Junhui Li and   
                  Min Zhang and   
              Chenchen Ding and   
              Hideki Tanaka and   
                  Masao Utiyama   Refining History for Future-Aware Neural
                                  Machine Translation  . . . . . . . . . . 500--512
                   Mou Wang and   
                 Junqi Chen and   
             Xiao-Lei Zhang and   
               Susanto Rahardja   End-to-End Multi-Modal Speech
                                  Recognition on an Air and Bone Conducted
                                  Speech Corpus  . . . . . . . . . . . . . 513--524
       Asier López Zorrilla and   
          María Inés Torres and   
           Heriberto Cuayáhuitl   Audio Embedding-Aware Dialogue Policy
                                  Learning . . . . . . . . . . . . . . . . 525--538
               Xichen Shang and   
                Chuxin Chen and   
                Zipeng Chen and   
                      Qianli Ma   Modularized Mutuality Network for
                                  Emotion-Cause Pair Extraction  . . . . . 539--549
               Xinyuan Qian and   
             Zhengdong Wang and   
               Jiadong Wang and   
                Guohui Guan and   
                     Haizhou Li   Audio-Visual Cross-Attention Network for
                                  Robotic Speaker Tracking . . . . . . . . 550--562
             Kristina Tesch and   
                  Timo Gerkmann   Insights Into Deep Non-Linear Filters
                                  for Improved Multi-Channel Speech
                                  Enhancement  . . . . . . . . . . . . . . 563--575
          Thilo von Neumann and   
          Keisuke Kinoshita and   
        Christoph Boeddeker and   
              Marc Delcroix and   
           Reinhold Haeb-Umbach   Segment-Less Continuous Speech
                                  Separation of Meetings: Training and
                                  Evaluation Criteria  . . . . . . . . . . 576--589
           Davide Albertini and   
         Alberto Bernardini and   
             Federico Borra and   
            Fabio Antonacci and   
                  Augusto Sarti   Two-Stage Beamforming With Arbitrary
                                  Planar Arrays of Differential Microphone
                                  Array Units  . . . . . . . . . . . . . . 590--602
              Yi-Syuan Chen and   
               Yun-Zhu Song and   
                 Hong-Han Shuai   SPEC: Summary Preference Decomposition
                                  for Low-Resource Abstractive
                                  Summarization  . . . . . . . . . . . . . 603--618
              Yingying Xiao and   
               Shanmou Chen and   
           Qiangqiang Zhang and   
               Dongyuan Lin and   
               Minglin Shen and   
                Junhui Qian and   
                   Shiyuan Wang   Generalized Hyperbolic Tangent Based
                                  Random Fourier Conjugate Gradient Filter
                                  for Nonlinear Active Noise Control . . . 619--632
                     Jun Qi and   
         Chao-Han Huck Yang and   
                Pin-Yu Chen and   
                 Javier Tejedor   Exploiting Low-Rank Tensor-Train Deep
                                  Neural Networks Based on Riemannian
                                  Gradient Descent With Illustrations of
                                  Speech Processing  . . . . . . . . . . . 633--642
                     Bin Gu and   
                     Wu Guo and   
                      Jie Zhang   Memory Storable Network Based Feature
                                  Aggregation for Speaker Representation
                                  Learning . . . . . . . . . . . . . . . . 643--655
                 Takumi Abe and   
             Shoichi Koyama and   
               Natsuki Ueno and   
             Hiroshi Saruwatari   Amplitude Matching for Multizone Sound
                                  Field Control  . . . . . . . . . . . . . 656--669
             Mahdi Barhoush and   
              Ahmed Hallawa and   
                 Arne Peine and   
               Lukas Martin and   
                  Anke Schmeink   Localization-Driven Speech Enhancement
                                  in Noisy Multi-Speaker Hospital
                                  Environments Using Deep Learning and
                                  Meta Learning  . . . . . . . . . . . . . 670--683
                  Herman Kamper   Word Segmentation on Discovered Phone
                                  Units With Dynamic Programming and
                                  Self-Supervised Scoring  . . . . . . . . 684--694
               Changheng Li and   
             Jorge Martinez and   
     Richard Christian Hendriks   Joint Maximum Likelihood Estimation of
                                  Microphone Array Parameters for a
                                  Reverberant Single Source Scenario . . . 695--705
            Shota Horiguchi and   
            Shinji Watanabe and   
               Paola García and   
             Yuki Takashima and   
                Yohei Kawaguchi   Online Neural Diarization of Unlimited
                                  Numbers of Speakers Using Global and
                                  Local Attractors . . . . . . . . . . . . 706--720
                    Ling He and   
                     Jia Fu and   
                Yuanyuan Li and   
                   Xi Xiong and   
                     Jing Zhang   WNSA-Net: an Axial-Attention-Based
                                  Network for Schizophrenia Detection
                                  Using Wideband and Narrowband
                                  Spectrograms . . . . . . . . . . . . . . 721--733
             Anusha Prakash and   
                 Hema A. Murthy   Exploring the Role of Language Families
                                  for Building Indic Speech Synthesisers   734--747
         Mahdin Rohmatillah and   
                Jen-Tzung Chien   Hierarchical Reinforcement Learning With
                                  Guidance for Multi-Domain Dialogue
                                  Policy . . . . . . . . . . . . . . . . . 748--761
           Shahram Ghorbani and   
              John H. L. Hansen   Domain Expansion for End-to-End Speech
                                  Recognition: Applications for
                                  Accent/Dialect Speech  . . . . . . . . . 762--774
               Weidong Chen and   
               Xiaofen Xing and   
                Xiangmin Xu and   
               Jianxin Pang and   
                         Lan Du   SpeechFormer++: a Hierarchical Efficient
                                  Framework for Paralinguistic Speech
                                  Processing . . . . . . . . . . . . . . . 775--788
            Nicki Holighaus and   
          Günther Koliander and   
             Clara Hollomey and   
       Friedrich Pillichshammer   Grid-Based Decimation for Wavelet
                                  Transforms With Stably Invertible
                                  Implementation . . . . . . . . . . . . . 789--801
                 Weiwei Lin and   
                    Man-Wai Mak   Robust Speaker Verification Using Deep
                                  Weight Space Ensemble  . . . . . . . . . 802--812
                  Lin Zhang and   
                   Xin Wang and   
               Erica Cooper and   
             Nicholas Evans and   
              Junichi Yamagishi   The PartialSpoof Database and
                                  Countermeasures for the Detection of
                                  Short Fake Speech Segments Embedded in
                                  an Utterance . . . . . . . . . . . . . . 813--825
                    Jie Mei and   
                 Yufan Wang and   
                  Xinhui Tu and   
                  Ming Dong and   
                    Tingting He   Incorporating BERT With
                                  Probability-Aware Gate for Spoken
                                  Language Understanding . . . . . . . . . 826--834
             Tsubasa Ochiai and   
              Marc Delcroix and   
          Tomohiro Nakatani and   
                    Shoko Araki   Mask-Based Neural Beamforming for Moving
                                  Speakers With Self-Attention-Based
                                  Tracking . . . . . . . . . . . . . . . . 835--848
                 Rongzhi Gu and   
            Shi-Xiong Zhang and   
                Yuexian Zou and   
                        Dong Yu   Towards Unified All-Neural Beamforming
                                  for Time and Frequency Domain Speech
                                  Separation . . . . . . . . . . . . . . . 849--862
             Naotake Masuda and   
                  Daisuke Saito   Improving Semi-Supervised Differentiable
                                  Synthesizer Sound Matching for Practical
                                  Applications . . . . . . . . . . . . . . 863--875
              Erfan Loweimi and   
               Zhengjun Yue and   
                 Peter Bell and   
               Steve Renals and   
                Zoran Cvetkovic   Multi-Stream Acoustic Modelling Using
                                  Raw Real and Imaginary Parts of the
                                  Fourier Transform  . . . . . . . . . . . 876--890
             Bengt J. Borgström   A Generative Approach to Condition-Aware
                                  Score Calibration for Speaker
                                  Verification . . . . . . . . . . . . . . 891--901
        Irene Martín-Morató and   
              Annamaria Mesaros   Strong Labeling of Sound Events Using
                                  Crowdsourced Weak Labels and Annotator
                                  Competence Estimation  . . . . . . . . . 902--914
                Wenzhao Zhu and   
                    Lei Luo and   
                 Jinwei Sun and   
      Mads Græsbøll Christensen   A New Virtual Tracking Sub-Algorithm
                                  Based Hybrid Active Control System for
                                  Narrowband Noise With Impulsive
                                  Interference . . . . . . . . . . . . . . 915--926
            Thomas Deppisch and   
  Sebastià V. Amengual Garí and   
               Paul Calamia and   
                    Jens Ahrens   Direct and Residual Subspace
                                  Decomposition of Spatial Room Impulse
                                  Responses  . . . . . . . . . . . . . . . 927--942
               Eloi Moliner and   
                  Vesa Välimäki   BEHM-GAN: Bandwidth Extension of
                                  Historical Music Using Generative
                                  Adversarial Networks . . . . . . . . . . 943--956
              Martin Jälmby and   
             Filip Elvander and   
           Toon van Waterschoot   Low-Rank Room Impulse Response
                                  Estimation . . . . . . . . . . . . . . . 957--969
                   Hong Liu and   
                Yucheng Cai and   
                 Zhenru Lin and   
                 Zhijian Ou and   
                   Yi Huang and   
                    Junlan Feng   Variational Latent-State GPT for
                                  Semi-Supervised Task-Oriented Dialog
                                  Systems  . . . . . . . . . . . . . . . . 970--984
                      De Hu and   
                 Qintuya Si and   
                    Rui Liu and   
                    Feilong Bao   Distributed Sensor Selection for Speech
                                  Enhancement With Acoustic Sensor
                                  Networks . . . . . . . . . . . . . . . . 985--999
                 Yingke Zhu and   
                      Brian Mak   Bayesian Self-Attentive Speaker
                                  Embeddings for Text-Independent Speaker
                                  Verification . . . . . . . . . . . . . . 1000--1012
                  Yuying Li and   
                 Yuchen Liu and   
           Donald S. Williamson   A Composite T60 Regression and
                                  Classification Approach for Speech
                                  Dereverberation  . . . . . . . . . . . . 1013--1023
                Hanyi Zhang and   
              Longbiao Wang and   
               Kong Aik Lee and   
                   Meng Liu and   
                Jianwu Dang and   
                     Helen Meng   Meta-Generalization for Domain-Invariant
                                  Speaker Verification . . . . . . . . . . 1024--1036
               Shu-Tong Niu and   
                     Jun Du and   
                    Lei Sun and   
                      Yu Hu and   
                   Chin-Hui Lee   QDM-SSD: Quality-Aware Dynamic Masking
                                  for Separation-Based Speaker Diarization 1037--1049
                 Boyang Lyu and   
               Chunxiao Fan and   
                   Yue Ming and   
                 Panzi Zhao and   
                      Nannan Hu   En-HACN: Enhancing Hybrid Architecture
                                  With Fast Attention and Capsule Network
                                  for End-to-end Speech Recognition  . . . 1050--1062
                   Yang Liu and   
                 Haoqin Sun and   
                 Wenbo Guan and   
                   Yuqi Xia and   
                 Yongwei Li and   
              Masashi Unoki and   
                      Zhen Zhao   A Discriminative Feature Representation
                                  Method Based on Cascaded Attention
                                  Network With Adversarial Strategy for
                                  Speech Emotion Recognition . . . . . . . 1063--1074
                  Hao Zhang and   
                 Nianwen Si and   
                  Yaqi Chen and   
               Wenlin Zhang and   
                 Xukui Yang and   
                     Dan Qu and   
                Wei-Qiang Zhang   Improving Speech Translation by
                                  Cross-Modal Multi-Grained Contrastive
                                  Learning . . . . . . . . . . . . . . . . 1075--1086
              Wei-Cheng Lin and   
                   Carlos Busso   Sequential Modeling by Leveraging
                                  Non-Uniform Distribution of Speech
                                  Emotion  . . . . . . . . . . . . . . . . 1087--1099
       Achyut Mani Tripathi and   
                  Om Jee Pandey   Divide and Distill: New Outlooks on
                                  Knowledge Distillation for Environmental
                                  Sound Classification . . . . . . . . . . 1100--1113
                  Hao Zhang and   
            Ashutosh Pandey and   
                   DeLiang Wang   Low-Latency Active Noise Control Using
                                  Attentive Recurrent Network  . . . . . . 1114--1123
               Avital Bross and   
                  Sharon Gannot   Training-Based Multiple Source Tracking
                                  Using Manifold-Learning and Recursive
                                  Expectation-Maximization . . . . . . . . 1124--1140
                  Guimin Hu and   
                    Yi Zhao and   
                   Guangming Lu   Emotion Prediction Oriented Method With
                                  Multiple Supervisions for Emotion-Cause
                                  Pair Extraction  . . . . . . . . . . . . 1141--1152
           Reza Mohsenipour and   
          Daniel Massicotte and   
                   Wei-Ping Zhu   PI Control of Loudspeakers Based on
                                  Linear Fractional Order Model  . . . . . 1153--1162
                 Tim Lübeck and   
          Johannes M. Arend and   
           Christoph Pörschmann   Spatial Upsampling of Sparse Spherical
                                  Microphone Array Signals . . . . . . . . 1163--1174
                Jiajun Deng and   
                 Xurong Xie and   
                Tianzi Wang and   
                 Mingyu Cui and   
                 Boyang Xue and   
                Zengrui Jin and   
                  Guinan Li and   
                  Shujie Hu and   
                    Xunying Liu   Confidence Score Based Speaker
                                  Adaptation of Conformer Speech
                                  Recognition Systems  . . . . . . . . . . 1175--1190
            Hongsheng Zhang and   
                Jizhang Gan and   
                   Ting Liu and   
                  Kui Huang and   
                      Hong Yang   Coefficients-Switched Normalized
                                  Least-Mean-Squares Adaption in Echo
                                  Canceler of Sparse-Echo-Path . . . . . . 1191--1199
                Eric Guizzo and   
              Tillman Weyde and   
          Simone Scardapane and   
             Danilo Comminiello   Learning Speech Emotion Representations
                                  in the Quaternion Domain . . . . . . . . 1200--1212
                  Jiaqi Bai and   
                    Ze Yang and   
                  Jian Yang and   
              Hongcheng Guo and   
                     Zhoujun Li   KINet: Incorporating Relevant Facts Into
                                  Knowledge-Grounded Dialog Generation . . 1213--1222
               Haiquan Zhao and   
                   Yuan Gao and   
                   Yingying Zhu   Robust Subband Adaptive Filter
                                  Algorithms-Based Mixture Correntropy and
                                  Application to Acoustic Echo
                                  Cancellation . . . . . . . . . . . . . . 1223--1233
                 Chen Zhang and   
       Luis Fernando D'Haro and   
               Qiquan Zhang and   
          Thomas Friedrichs and   
                     Haizhou Li   PoE: a Panel of Experts for Generalized
                                  Automatic Dialogue Assessment  . . . . . 1234--1250
                  Qing Wang and   
                     Jun Du and   
                 Hua-Xin Wu and   
                    Jia Pan and   
                    Feng Ma and   
                   Chin-Hui Lee   A Four-Stage Data Augmentation Approach
                                  to ResNet-Conformer Based Acoustic
                                  Modeling for Sound Event Localization
                                  and Detection  . . . . . . . . . . . . . 1251--1264
                 Yingwen Fu and   
                 Nankai Lin and   
                 Xiaohui Yu and   
                  Shengyi Jiang   Self-Training With Double Selectors for
                                  Low-Resource Named Entity Recognition    1265--1275
     Kilian Schulze-Forster and   
               Gaël Richard and   
                Liam Kelley and   
        Clement S. J. Doire and   
                  Roland Badeau   Unsupervised Music Source Separation
                                  Using Differentiable Parametric Source
                                  Models . . . . . . . . . . . . . . . . . 1276--1289
               Yinggang Liu and   
                    Hong Fu and   
                   Ying Wei and   
                  Hanbing Zhang   Sound Event Classification Based on
                                  Frequency-Energy Feature Representation
                                  and Two-Stage Data Dimension Reduction   1290--1304
                  Ege Erdem and   
            Zoran Cvetković and   
          Hüseyin Hacıhabiboğlu   3D Perceptual Soundfield
                                  Reconstruction via Virtual Microphone
                                  Synthesis  . . . . . . . . . . . . . . . 1305--1317
               Dongyuan Shi and   
              Woon-Seng Gan and   
                   Bhan Lam and   
                    Xiaoyi Shen   A Frequency-Domain Output-Constrained
                                  Active Noise Control Algorithm Based on
                                  an Intuitive Circulant Convolutional
                                  Penalty Factor . . . . . . . . . . . . . 1318--1332
      Muhammed Zahid Ozturk and   
                 Chenshu Wu and   
                Beibei Wang and   
                     Min Wu and   
                  K. J. Ray Liu   RadioSES: mmWave-Based Audioradio Speech
                                  Enhancement and Separation System  . . . 1333--1347
              Jianwei Zhang and   
                 Julie Liss and   
           Suren Jayasuriya and   
                  Visar Berisha   Robust Vocal Quality Feature Embeddings
                                  for Dysphonic Voice Detection  . . . . . 1348--1359
            Ashutosh Pandey and   
                   DeLiang Wang   Attentive Training: a New Training
                                  Framework for Speech Enhancement . . . . 1360--1370
           Hirofumi Inaguma and   
               Tatsuya Kawahara   Alignment Knowledge Distillation for
                                  Online Streaming Attention-Based Speech
                                  Recognition  . . . . . . . . . . . . . . 1371--1385
     Mittapalle Kiran Reddy and   
                     Paavo Alku   Exemplar-Based Sparse Representations
                                  for Detection of Parkinson's Disease
                                  From Speech  . . . . . . . . . . . . . . 1386--1396
              Shunsuke Kita and   
             Yoshinobu Kajikawa   Sound Source Localization Inside a
                                  Structure Under Semi-Supervised
                                  Conditions . . . . . . . . . . . . . . . 1397--1408
                  Guowei Wu and   
                 Shipei Liu and   
                     Xiaoya Fan   The Power of Fragmentation: a
                                  Hierarchical Transformer Model for
                                  Structural Segmentation in Symbolic
                                  Music Generation . . . . . . . . . . . . 1409--1420
                 Xueqin Luo and   
             Gongping Huang and   
                   Jilu Jin and   
              Jingdong Chen and   
              Jacob Benesty and   
                  Wen Zhang and   
                Mengyao Zhu and   
                    Chunjian Li   Design of Maximum Directivity
                                  Beamformers With Linear Acoustic Vector
                                  Sensor Arrays  . . . . . . . . . . . . . 1421--1435
                 Ruchao Fan and   
                    Wei Chu and   
                 Peng Chang and   
                    Abeer Alwan   A CTC Alignment-Based Non-Autoregressive
                                  Transformer for End-to-End Automatic
                                  Speech Recognition . . . . . . . . . . . 1436--1448
                 Tianyou Li and   
                Siyuan Lian and   
                 Sipei Zhao and   
                    Jing Lu and   
                 Ian S. Burnett   Distributed Active Noise Control Based
                                  on an Augmented Diffusion FxLMS
                                  Algorithm  . . . . . . . . . . . . . . . 1449--1463
                Jiayuan Xie and   
                Wenhao Fang and   
              Qingbao Huang and   
                     Yi Cai and   
                       Tao Wang   Enhancing Paraphrase Question Generation
                                  With Prior Knowledge . . . . . . . . . . 1464--1475
                  Chen Chen and   
              Hansheng Hong and   
                    Jie Guo and   
                       Bin Song   Inter-Intra Modal Representation
                                  Augmentation With Trimodal Collaborative
                                  Disentanglement Network for Multimodal
                                  Sentiment Analysis . . . . . . . . . . . 1476--1488
                  Jian Yang and   
                  Yuwei Yin and   
                 Liqun Yang and   
                 Shuming Ma and   
              Haoyang Huang and   
             Dongdong Zhang and   
                   Furu Wei and   
                     Zhoujun Li   GTrans: Grouping and Fusing Transformer
                                  Layers for Neural Machine Translation    1489--1498
                     Xin Wu and   
                     Yi Cai and   
                 Zetao Lian and   
              Ho-fung Leung and   
                       Tao Wang   Generating Natural Language From Logic
                                  Expressions With Structural
                                  Representation . . . . . . . . . . . . . 1499--1510
                      Yi Li and   
                   Yang Sun and   
                 Wenwu Wang and   
              Syed Mohsen Naqvi   U-Shaped Transformer With Frequency-Band
                                  Aware Attention for Speech Enhancement   1511--1521
       Christian Antoñanzas and   
              Miguel Ferrer and   
             Maria de Diego and   
               Alberto Gonzalez   Remote Microphone Technique for Active
                                  Noise Control Over Distributed Networks  1522--1535
                     Yi Zhu and   
            Abhishek Tiwari and   
              João Monteiro and   
          Shruti Kshirsagar and   
            Tiago Henrique Falk   COVID-19 Detection via Fusion of
                                  Modulation Spectrum and Linear
                                  Prediction Speech Features . . . . . . . 1536--1549
                   Jijie Li and   
                 Kai Shuang and   
                  Jinyu Guo and   
                 Zengyi Shi and   
                   Hongman Wang   Enhancing Semantic Relation
                                  Classification With Shortest Dependency
                                  Path Reasoning . . . . . . . . . . . . . 1550--1560
                 Mao-Kui He and   
                     Jun Du and   
              Qing-Feng Liu and   
                   Chin-Hui Lee   ANSD-MA-MSE: Adaptive Neural Speaker
                                  Diarization Using Memory-Aware
                                  Multi-Speaker Embedding  . . . . . . . . 1561--1573
                Longting Xu and   
                Jichen Yang and   
             Chang Huai You and   
               Xinyuan Qian and   
                    Daiyu Huang   Device Features Based on Linear
                                  Transformation With Parallel Training
                                  Data for Replay Speech Detection . . . . 1574--1586
               Huajian Fang and   
              Dennis Becker and   
             Stefan Wermter and   
                  Timo Gerkmann   Integrating Uncertainty Into Neural
                                  Network-Based Speech Enhancement . . . . 1587--1600
                   Libo Qin and   
                    Xiao Xu and   
                 Lehan Wang and   
                  Yue Zhang and   
                   Wanxiang Che   Modularized Pre-Training for End-to-End
                                  Task-Oriented Dialogue . . . . . . . . . 1601--1610
               Hanlei Zhang and   
                     Hua Xu and   
               Shaojie Zhao and   
                   Qianrui Zhou   Learning Discriminative Representations
                                  and Decision Boundaries for Open Intent
                                  Detection  . . . . . . . . . . . . . . . 1611--1623
             Guangsheng Bao and   
                      Yue Zhang   A General Contextualized Rewriting
                                  Framework for Text Summarization . . . . 1624--1635
           Christoph Kirsch and   
               Stephan D. Ewert   A Universal Filter Approximation of Edge
                                  Diffraction for Geometrical Acoustics    1636--1651
                Peyman Goli and   
              Steven van de Par   Deep Learning-Based Speech Specific
                                  Source Localization by Using Binaural
                                  and Monaural Microphone Arrays in
                                  Hearing Aids . . . . . . . . . . . . . . 1652--1666
          Nguyen Binh Thien and   
          Yukoh Wakabayashi and   
                 Kenta Iwai and   
              Takanobu Nishiura   Inter-Frequency Phase Difference for
                                  Phase Reconstruction Using Deep Neural
                                  Networks and Maximum Likelihood  . . . . 1667--1680
    Srikanth Raj Chetupalli and   
           Emanuël A. P. Habets   Speaker Counting and Separation From
                                  Single-Channel Noisy Mixtures  . . . . . 1681--1692
             Guangyan Zhang and   
                   Ying Qin and   
               Wenjie Zhang and   
                  Jialun Wu and   
                     Mei Li and   
                  Yutao Gai and   
               Feijun Jiang and   
                        Tan Lee   iEmoTTS: Toward Robust Cross-Speaker
                                  Emotion Transfer and Control for Speech
                                  Synthesis Based on Disentanglement
                                  Between Prosody and Timbre . . . . . . . 1693--1705
                 Ruijie Tao and   
               Kong Aik Lee and   
            Rohan Kumar Das and   
            Ville Hautamäki and   
                     Haizhou Li   Self-Supervised Training of Speaker
                                  Encoder With Multi-Modal Diverse
                                  Positive Pairs . . . . . . . . . . . . . 1706--1719
              Dongchao Yang and   
                 Jianwei Yu and   
                 Helin Wang and   
                   Wen Wang and   
                  Chao Weng and   
                Yuexian Zou and   
                        Dong Yu   Diffsound: Discrete Diffusion Model for
                                  Text-to-Sound Generation . . . . . . . . 1720--1733
       Paul Konstantin Krug and   
             Peter Birkholz and   
          Branislav Gerazov and   
 Daniel Rudolph van Niekerk and   
                    Anqi Xu and   
                          Yi Xu   Artificial Vocal Learning Guided by
                                  Phoneme Recognition and Visual
                                  Information  . . . . . . . . . . . . . . 1734--1744
              Qian-Bei Hong and   
             Chung-Hsien Wu and   
                  Hsin-Min Wang   Decomposition and Reorganization of
                                  Phonetic Information for Speaker
                                  Embedding Learning . . . . . . . . . . . 1745--1757
               Wenbin Jiang and   
                         Kai Yu   Speech Enhancement With Integration of
                                  Neural Homomorphic Synthesis and
                                  Spectral Masking . . . . . . . . . . . . 1758--1770
                 Shu'ang Li and   
                  Xuming Hu and   
                     Li Lin and   
                  Aiwei Liu and   
                  Lijie Wen and   
                   Philip S. Yu   A Multi-Level Supervised Contrastive
                                  Learning Framework for Low-Resource
                                  Natural Language Inference . . . . . . . 1771--1783
                 Xiaoqing Zheng   Building Conventional ``Experts'' With a
                                  Dialogue Logic Programming Language  . . 1784--1796
                 Haitao Lin and   
                 Junnan Zhu and   
                   Lu Xiang and   
                Feifei Zhai and   
                    Yu Zhou and   
               Jiajun Zhang and   
                 Chengqing Zong   Topic-Oriented Dialogue Summarization    1797--1810
                 Haohan Guo and   
               Fenglong Xie and   
                   Xixin Wu and   
             Frank K. Soong and   
                     Helen Meng   MSMC-TTS: Multi-Stage Multi-Codebook
                                  VQ-VAE Based Neural TTS  . . . . . . . . 1811--1824
                    Bei Liu and   
             Zhengyang Chen and   
                    Yanmin Qian   Depth-First Neural Architecture With
                                  Attentive Feature Fusion for Efficient
                                  Speaker Verification . . . . . . . . . . 1825--1838
                  Ria Ghosh and   
              John H. L. Hansen   Bilateral Cochlear Implant Processing of
                                  Coding Strategies With CCi-MOBILE, an
                                  Open-Source Research Platform  . . . . . 1839--1850
                Aolong Zhou and   
                  Wen Zhang and   
                  Guojun Xu and   
                Xiaoyong Li and   
                Kefeng Deng and   
                  Junqiang Song   DBSA-Net: Dual Branch Self-Attention
                                  Network for Underwater Acoustic Signal
                                  Denoising  . . . . . . . . . . . . . . . 1851--1865
                 Weiwei Lin and   
                    Man-Wai Mak   Model-Agnostic Meta-Learning for Fast
                                  Text-Dependent Speaker Embedding
                                  Adaptation . . . . . . . . . . . . . . . 1866--1876
             Andrea Galassi and   
                Marco Lippi and   
                  Paolo Torroni   Multi-Task Attentive Residual Networks
                                  for Argument Mining  . . . . . . . . . . 1877--1892
                     Yi Luo and   
                     Jianwei Yu   Music Source Separation With Band-Split
                                  RNN  . . . . . . . . . . . . . . . . . . 1893--1901
          Keisuke Matsubara and   
             Takuma Okamoto and   
          Ryoichi Takashima and   
          Tetsuya Takiguchi and   
                Tomoki Toda and   
                  Hisashi Kawai   Harmonic-Net: Fundamental Frequency and
                                  Speech Rate Controllable Fast Neural
                                  Vocoder  . . . . . . . . . . . . . . . . 1902--1915
                    Yi Zhou and   
                Zhizheng Wu and   
               Xiaohai Tian and   
                     Haizhou Li   Optimization of Cross-Lingual Voice
                                  Conversion With Linguistics Losses to
                                  Reduce Foreign Accents . . . . . . . . . 1916--1926
                Qiu-Shi Zhu and   
                  Jie Zhang and   
             Zi-Qiang Zhang and   
                    Li-Rong Dai   A Joint Speech Enhancement and
                                  Self-Supervised Representation Learning
                                  Framework for Noise-Robust Speech
                                  Recognition  . . . . . . . . . . . . . . 1927--1939
                   Siqi Sun and   
             Korin Richmond and   
                       Hao Tang   Improving Seq2Seq TTS Frontends With
                                  Transcribed Speech Audio . . . . . . . . 1940--1952
                Shih-Lun Wu and   
                  Yi-Hsuan Yang   MuseMorphose: Full-Song and Fine-Grained
                                  Piano Music Style Transfer With One
                                  Transformer VAE  . . . . . . . . . . . . 1953--1967
                Xiaoxue Gao and   
          Chitralekha Gupta and   
                     Haizhou Li   PoLyScriber: Integrated Fine-Tuning of
                                  Extractor and Lyrics Transcriber for
                                  Polyphonic Music . . . . . . . . . . . . 1968--1981
              Zhicheng Lian and   
               Haonan Cheng and   
                   Jiawan Zhang   PQG-A2SA: Performance Quantification
                                  Guided Audio-to-Score Alignment for
                                  Orchestral Music . . . . . . . . . . . . 1982--1992
                  Jingen Ni and   
             Ningning Zhang and   
                      Haofen Li   Sparsity-Promoting Affine Projection
                                  Algorithm With Periodically-Updated Gain
                                  Matrix and Its Performance Analysis  . . 1993--2003
              Orchisama Das and   
      Sebastian J. Schlecht and   
                   Enzo De Sena   Grouped Feedback Delay Networks With
                                  Frequency-Dependent Coupling . . . . . . 2004--2015
                Xudong Zhao and   
             Gongping Huang and   
              Jingdong Chen and   
                  Jacob Benesty   Design of $2$D and $3$D Differential
                                  Microphone Arrays With a Multistage
                                  Framework  . . . . . . . . . . . . . . . 2016--2031
                Jia-Hao Hsu and   
               Jeremy Chang and   
              Min-Hsueh Kuo and   
                 Chung-Hsien Wu   Empathetic Response Generation Based on
                                  Plug-and-Play Mechanism With Empathy
                                  Perturbation . . . . . . . . . . . . . . 2032--2042
                Aditya Dutt and   
                     Paul Gader   Wavelet Multiresolution Analysis Based
                                  Speech Emotion Recognition System Using
                                  $1$D CNN LSTM Networks . . . . . . . . . 2043--2054
             Arturo Morales and   
                Juan I. Yuz and   
             Juan P. Cortés and   
         Javier G. Fontanet and   
                 Matías Zañartu   Glottal Airflow Estimation Using Neck
                                  Surface Acceleration and Low-Order
                                  Kalman Smoothing . . . . . . . . . . . . 2055--2066
                Yuya Hosoda and   
             Arata Kawamura and   
                   Youji Iiguni   Complex-Domain Pitch Estimation
                                  Algorithm for Narrowband Speech Signals  2067--2078
                Zhidong Liu and   
                  Junhui Li and   
                      Muhua Zhu   Alleviating Exposure Bias for Neural
                                  Machine Translation via Contextual
                                  Augmentation and Self Distillation . . . 2079--2089
              Hanan Beit-On and   
                 Tom Shlomo and   
                   Boaz Rafaely   Weighted Frequency Smoothing for
                                  Enhanced Speaker Localization  . . . . . 2090--2099
                   Shan Gao and   
                  Xihong Wu and   
                     Tianshu Qu   A Physical Model-Based Self-Supervised
                                  Learning Method for Signal Enhancement
                                  Under Reverberant Environment  . . . . . 2100--2110
                  Xue Jiang and   
               Xiulian Peng and   
                Huaying Xue and   
                 Yuan Zhang and   
                         Yan Lu   Latent-Domain Predictive Neural Speech
                                  Coding . . . . . . . . . . . . . . . . . 2111--2123
                Shumin Deng and   
              Jiacheng Yang and   
                 Hongbin Ye and   
                Chuanqi Tan and   
                 Mosha Chen and   
             Songfang Huang and   
                  Fei Huang and   
                Huajun Chen and   
                   Ningyu Zhang   LOGEN: Few-Shot Logical
                                  Knowledge-Conditioned Text Generation
                                  With Self-Training . . . . . . . . . . . 2124--2133
                Yuanzhi Liu and   
                     Min He and   
              Qingqing Yang and   
                  Gwanggil Jeon   An Unsupervised Framework With Attention
                                  Mechanism and Embedding Perturbed
                                  Encoder for Non-Parallel Text Sentiment
                                  Style Transfer . . . . . . . . . . . . . 2134--2144
                    Yang Ai and   
                  Zhen-Hua Ling   APNet: an All-Frame-Level Neural Vocoder
                                  Incorporating Direct Prediction of
                                  Amplitude and Phase Spectra  . . . . . . 2145--2157
                   Fei Zhao and   
                    Zhen Wu and   
                   Liang He and   
                     Xin-Yu Dai   Label-Correction Capsule Network for
                                  Hierarchical Text Classification . . . . 2158--2168
                Cem Subakan and   
            Mirco Ravanelli and   
            Samuele Cornell and   
           François Grondin and   
                   Mirko Bronzi   Exploring Self-Attention Mechanisms for
                                  Speech Separation  . . . . . . . . . . . 2169--2180
            Chenggang Zhang and   
               Jinjiang Liu and   
                     Hao Li and   
                 Xueliang Zhang   Neural Multi-Channel and
                                  Multi-Microphone Acoustic Echo
                                  Cancellation . . . . . . . . . . . . . . 2181--2192
                  Zheng Liu and   
                   Xin Kang and   
                       Fuji Ren   Dual-TBNet: Improving the Robustness of
                                  Speech Features via
                                  Dual-Transformer-BiLSTM for Speech
                                  Emotion Recognition  . . . . . . . . . . 2193--2203
              Sandro Cumani and   
                Salvatore Sarni   The Distributions of Uncalibrated
                                  Speaker Verification Scores: a
                                  Generative Model for Domain Mismatch and
                                  Trial-Dependent Calibration  . . . . . . 2204--2219
                      Xi Ai and   
                       Bin Fang   Cross-Modal Language Modeling in
                                  Multi-Motion-Informed Context for Lip
                                  Reading  . . . . . . . . . . . . . . . . 2220--2232
      Andreas Jonas Fuglsig and   
              Jesper Jensen and   
              Zheng-Hua Tan and   
 Lars Søndergaard Bertelsen and   
      Jens Christian Lindof and   
                 Jan Østergaard   Minimum Processing Near-End Listening
                                  Enhancement  . . . . . . . . . . . . . . 2233--2245
                 Zhiwen Xie and   
                 Runjie Zhu and   
                    Jin Liu and   
              Guangyou Zhou and   
            Jimmy Xiangji Huang   TARGAT: a Time-Aware Relational Graph
                                  Attention Model for Temporal Knowledge
                                  Graph Embedding  . . . . . . . . . . . . 2246--2258
              Cuilian Zhang and   
              Derek F. Wong and   
             Eddy S. K. Lei and   
                Runzhe Zhan and   
                  Lidia S. Chao   Obscurity-Quantified Curriculum Learning
                                  for Machine Translation Evaluation . . . 2259--2271
                  Yaxin Liu and   
                   Yan Zhou and   
                  Ziming Li and   
                Junlin Wang and   
                   Wei Zhou and   
                     Songlin Hu   HIM: an End-to-End Hierarchical
                                  Interaction Model for Aspect Sentiment
                                  Triplet Extraction . . . . . . . . . . . 2272--2285
          Yukoh Wakabayashi and   
              Kouei Yamaoka and   
                   Nobutaka Ono   Sound Field Interpolation for
                                  Rotation-Invariant Multichannel Array
                                  Signal Processing  . . . . . . . . . . . 2286--2298
        Jesper Kjær Nielsen and   
  Mads Græsbøll Christensen and   
            Jesper Bünsow Boldt   An Analysis of Traditional Noise Power
                                  Spectral Density Estimators Based on the
                                  Gaussian Stochastic Volatility Model . . 2299--2313
Karen Gissell Rosero Jacome and   
     Felipe Leonel Grijalva and   
          Bruno Sanches Masiero   Sound Events Localization and Detection
                                  Using Bio-Inspired Gammatone Filters and
                                  Temporal Convolutional Neural Networks   2314--2324
                   Lin Yuan and   
              Guoheng Huang and   
                Fenghuan Li and   
              Xiaochen Yuan and   
                Chi-Man Pun and   
                      Guo Zhong   RBA-GCN: Relational Bilevel Aggregation
                                  Graph Convolutional Network for Emotion
                                  Recognition  . . . . . . . . . . . . . . 2325--2337
              Samuel Poirot and   
              Stefan Bilbao and   
            Mitsuko Aramaki and   
                Sølvi Ystad and   
      Richard Kronland-Martinet   A Perceptually Evaluated Signal Model:
                                  Collisions Between a Vibrating Object
                                  and an Obstacle  . . . . . . . . . . . . 2338--2350
             Julius Richter and   
               Simon Welker and   
       Jean-Marie Lemercier and   
                Bunlong Lay and   
                  Timo Gerkmann   Speech Enhancement and Dereverberation
                                  With Diffusion-Based Generative Models   2351--2364
       Siarhei Y. Barysenka and   
             Vasili I. Vorobiov   SNR-Based Inter-Component Phase
                                  Estimation Using Bi-Phase Prior
                                  Statistics for Single-Channel Speech
                                  Enhancement  . . . . . . . . . . . . . . 2365--2381
              Jiandian Zeng and   
               Jiantao Zhou and   
                   Caishi Huang   Exploring Semantic Relations for Social
                                  Media Sentiment Analysis . . . . . . . . 2382--2394
         Fotios Drakopoulos and   
                 Sarah Verhulst   A Neural-Network Framework for the
                                  Design of Individualised Hearing-Loss
                                  Compensation . . . . . . . . . . . . . . 2395--2409
                  Xinbei Ma and   
            Zhuosheng Zhang and   
                       Hai Zhao   Enhanced Speaker-Aware Multi-Party
                                  Multi-Turn Dialogue Comprehension  . . . 2410--2423
               Tianrui Wang and   
                 Weibin Zhu and   
               Yingying Gao and   
               Shilei Zhang and   
                    Junlan Feng   Harmonic Attention for Monaural Speech
                                  Enhancement  . . . . . . . . . . . . . . 2424--2436
                    Lei Lei and   
               Guoshun Yuan and   
               Hongjiang Yu and   
                 Dewei Kong and   
                     Yuefeng He   Multilingual Customized Keyword Spotting
                                  Using Similar-Pair Contrastive Learning  2437--2447
                 Shaokai Li and   
                  Peng Song and   
                  Wenming Zheng   Multi-Source Discriminant Subspace
                                  Alignment for Cross-Domain Speech
                                  Emotion Recognition  . . . . . . . . . . 2448--2460
                 Yeqing Ren and   
               Haipeng Peng and   
                 Lixiang Li and   
               Xiaopeng Xue and   
                   Yang Lan and   
                    Yixian Yang   Generalized Voice Spoofing Detection via
                                  Integral Knowledge Amalgamation  . . . . 2461--2475
                  Xing Chen and   
                   Jie Wang and   
             Xiao-Lei Zhang and   
            Wei-Qiang Zhang and   
                     Kunde Yang   LMD: a Learnable Mask Network to Detect
                                  Adversarial Examples for Speaker
                                  Verification . . . . . . . . . . . . . . 2476--2490
               Benjamin Yen and   
               Yameizhen Li and   
                   Yusuke Hioka   Rotor Noise-Aware Noise Covariance
                                  Matrix Estimation for Unmanned Aerial
                                  Vehicle Audition . . . . . . . . . . . . 2491--2506
                Xuechen Liu and   
                   Xin Wang and   
              Md Sahidullah and   
                Jose Patino and   
             Héctor Delgado and   
              Tomi Kinnunen and   
       Massimiliano Todisco and   
          Junichi Yamagishi and   
             Nicholas Evans and   
            Andreas Nautsch and   
                   Kong Aik Lee   ASVspoof 2021: Towards Spoofed and
                                  Deepfake Speech Detection in the Wild    2507--2522
               Zalán Borsos and   
           Raphaël Marinier and   
             Damien Vincent and   
          Eugene Kharitonov and   
           Olivier Pietquin and   
               Matt Sharifi and   
             Dominik Roblek and   
             Olivier Teboul and   
             David Grangier and   
         Marco Tagliasacchi and   
                 Neil Zeghidour   AudioLM: a Language Modeling Approach to
                                  Audio Generation . . . . . . . . . . . . 2523--2533
                Xingfeng Li and   
                Xiaohan Shi and   
                 Desheng Hu and   
                 Yongwei Li and   
             Qingchen Zhang and   
              Zhengxia Wang and   
              Masashi Unoki and   
                   Masato Akagi   Music Theory-Inspired Acoustic
                                  Representation for Speech Emotion
                                  Recognition  . . . . . . . . . . . . . . 2534--2547
               Jiachen Lian and   
              Chunlei Zhang and   
   Gopala K. Anumanchipalli and   
                        Dong Yu   Unsupervised TTS Acoustic Modeling for
                                  TTS With Conditional Disentangled
                                  Sequential VAE . . . . . . . . . . . . . 2548--2557
              Arsalan Malik and   
              Nipun Agarwal and   
Harshavardhan Settibhaktini and   
    Ananthakrishna Chintanpalli   Predicting Level-Dependent Changes in
                                  Concurrent Vowel Scores Using the
                                  $2$D-CNN Models  . . . . . . . . . . . . 2558--2566
             Michael Krause and   
                 Meinard Müller   Hierarchical Classification for
                                  Instrument Activity Detection in
                                  Orchestral Music Recordings  . . . . . . 2567--2578
                Julie Meyer and   
        Sebastian Prepeliță and   
           Ali Khajeh-Saeed and   
            Michael Smirnov and   
                 Pablo Hoffmann   Verification on Head-Related Transfer
                                  Functions of a Snowman Model Simulated
                                  Using the Finite-Difference Time-Domain
                                  Method . . . . . . . . . . . . . . . . . 2579--2591
           Darius Petermann and   
             Gordon Wichern and   
Aswin Shanmugam Subramanian and   
             Zhong-Qiu Wang and   
               Jonathan Le Roux   Tackling the Cocktail Fork Problem for
                                  Separation and Transcription of
                                  Real-World Soundtracks . . . . . . . . . 2592--2605
                Hailong Cao and   
                   Liguo Li and   
                Conghui Zhu and   
                 Muyun Yang and   
                    Tiejun Zhao   Dual Word Embedding for Robust
                                  Unsupervised Bilingual Lexicon Induction 2606--2615
                   Lin Xiao and   
                  Pengyu Xu and   
              Mingyang Song and   
                Huafeng Liu and   
                Liping Jing and   
               Xiangliang Zhang   Triple Alliance Prototype Orthotist
                                  Network for Long-Tailed Multi-Label Text
                                  Classification . . . . . . . . . . . . . 2616--2628
                  Juhua Liu and   
              Qihuang Zhong and   
                 Liang Ding and   
                    Hua Jin and   
                      Bo Du and   
                    Dacheng Tao   Unified Instance and Knowledge Alignment
                                  Pretraining for Aspect-Based Sentiment
                                  Analysis . . . . . . . . . . . . . . . . 2629--2642
               Yiming Zhang and   
                    Hong Yu and   
                   Ruoyi Du and   
              Zheng-Hua Tan and   
                 Wenwu Wang and   
                  Zhanyu Ma and   
                      Yuan Dong   ACTUAL: Audio Captioning With Caption
                                  Feature Space Regularization . . . . . . 2643--2657
               Jakob Abeßer and   
          Sascha Grollmisch and   
                 Meinard Müller   How Robust are Audio Embeddings for
                                  Polyphonic Sound Event Tagging?  . . . . 2658--2667
                    Wei Xia and   
              John H. L. Hansen   Attention and DCT Based Global Context
                                  Modeling for Text-Independent Speaker
                                  Recognition  . . . . . . . . . . . . . . 2668--2679
              Takuya Hasumi and   
          Tomohiko Nakamura and   
          Norihiro Takamune and   
         Hiroshi Saruwatari and   
            Daichi Kitamura and   
               Yu Takahashi and   
                 Kazunobu Kondo   PoP-IDLMA: Product-of-Prior Independent
                                  Deeply Learned Matrix Analysis for
                                  Multichannel Music Source Separation . . 2680--2694
                    Ben Liu and   
                   Jun Wang and   
                Guanyuan Yu and   
                   Shaolei Chen   CUPVC: a Constraint-Based Unsupervised
                                  Prosody Transfer for Improving Telephone
                                  Banking Services . . . . . . . . . . . . 2695--2706
                  Guinan Li and   
                Jiajun Deng and   
               Mengzhe Geng and   
                Zengrui Jin and   
                Tianzi Wang and   
                  Shujie Hu and   
                 Mingyu Cui and   
                 Helen Meng and   
                    Xunying Liu   Audio-Visual End-to-End Multi-Channel
                                  Speech Separation, Dereverberation and
                                  Recognition  . . . . . . . . . . . . . . 2707--2723
       Jean-Marie Lemercier and   
             Julius Richter and   
               Simon Welker and   
                  Timo Gerkmann   StoRM: a Diffusion-Based Stochastic
                                  Regeneration Model for Speech
                                  Enhancement and Dereverberation  . . . . 2724--2737
                  Yen-Ju Lu and   
              Chia-Yu Chang and   
                   Cheng Yu and   
             Ching-Feng Liu and   
             Jeih-weih Hung and   
            Shinji Watanabe and   
                        Yu Tsao   Improving Speech Enhancement Performance
                                  by Leveraging Contextual Broad Phonetic
                                  Class Information  . . . . . . . . . . . 2738--2750
                Sungjae Kim and   
                  Yewon Kim and   
                  Jewoo Jun and   
                     Injung Kim   MuSE-SVS: Multi-Singer Emotional Singing
                                  Voice Synthesizer That Controls
                                  Emotional Intensity  . . . . . . . . . . 2751--2764
                  Xinxin Su and   
                 Zhen Huang and   
              Yunxiang Zhao and   
                 Yifan Chen and   
                   Yong Dou and   
                    Hengyue Pan   Recent Trends in Deep Learning Based
                                  Textual Emotion Cause Extraction . . . . 2765--2786
                   Junyu Lu and   
                Hongfei Lin and   
              Xiaokun Zhang and   
                Zhaoqing Li and   
              Tongyue Zhang and   
                Linlin Zong and   
                Fenglong Ma and   
                          Bo Xu   Hate Speech Detection via Dual
                                  Contrastive Learning . . . . . . . . . . 2787--2795
     Diego Marques do Carmo and   
          Ricardo A. Borsoi and   
          Márcio Holsbach Costa   Closed-Form Solution to the Multichannel
                                  Wiener Filter With Interaural Level
                                  Difference Preservation  . . . . . . . . 2796--2811
               Ya-Jie Zhang and   
                 Chao Zhang and   
                   Wei Song and   
            Zhengchen Zhang and   
                Youzheng Wu and   
                    Xiaodong He   Prosody Modelling With Pre-Trained
                                  Cross-Utterance Representations for
                                  Improved Speech Synthesis  . . . . . . . 2812--2823
              Ching-Yu Chiu and   
             Meinard Müller and   
       Matthew E. P. Davies and   
            Alvin Wen-Yu Su and   
                  Yi-Hsuan Yang   Local Periodicity-Based Beat Tracking
                                  for Expressive Classical Piano Music . . 2824--2835
                  Feng Chen and   
                      Ke Ma and   
                 Yapeng Mao and   
                 Desen Yang and   
                   Yi Zhang and   
                    Jie Shi and   
                   Shiqi Mo and   
               Gui Chenyang and   
                        Song Li   A Novel Method to Design Steerable
                                  Differential Beamformer Using Linear
                                  Acoustics Vector Sensor Array  . . . . . 2836--2849
               Tianyu Huang and   
              Weisheng Dong and   
                Fangfang Wu and   
                     Xin Li and   
                  Guangming Shi   Uncertainty-Driven Knowledge
                                  Distillation for Language Model
                                  Compression  . . . . . . . . . . . . . . 2850--2858
           Andrés Carofilis and   
             Enrique Alegre and   
            Eduardo Fidalgo and   
         Laura Fernández-Robles   Improvement of Accent Classification
                                  Models Through Grad-Transfer From
                                  Spectrograms and Gradient-Weighted Class
                                  Activation Mapping . . . . . . . . . . . 2859--2871
             Jacob Hollebon and   
             Filippo Maria Fazi   Higher-Order Stereophony . . . . . . . . 2872--2885
          Jeremy H. M. Wong and   
               Huayun Zhang and   
                  Nancy F. Chen   Modelling Inter-Rater Uncertainty in
                                  Spoken Language Assessment . . . . . . . 2886--2898
              Qinghua Zheng and   
                  Yuefei Wu and   
              Guangtao Wang and   
               Yanping Chen and   
                     Wei Wu and   
                  Zai Zhang and   
                    Bin Shi and   
                        Bo Dong   Exploring Interactive and Contrastive
                                  Relations for Nested Named Entity
                                  Recognition  . . . . . . . . . . . . . . 2899--2909
               Dongyuan Shi and   
              Woon-Seng Gan and   
                   Bhan Lam and   
              Zhengding Luo and   
                    Xiaoyi Shen   Transferable Latent of CNN-Based
                                  Selective Fixed-Filter Active Noise
                                  Control  . . . . . . . . . . . . . . . . 2910--2921
           Dorian Desblancs and   
          Vincent Lostanlen and   
               Romain Hennequin   Zero-Note Samba: Self-Supervised Beat
                                  Tracking . . . . . . . . . . . . . . . . 2922--2934
                 Nankai Lin and   
                 Yingwen Fu and   
               Xiaotian Lin and   
                  Dong Zhou and   
                 Aimin Yang and   
                  Shengyi Jiang   CL-XABSA: Contrastive Learning for
                                  Cross-Lingual Aspect-Based Sentiment
                                  Analysis . . . . . . . . . . . . . . . . 2935--2946
                Hanmeng Liu and   
                   Jian Liu and   
                 Leyang Cui and   
               Zhiyang Teng and   
                   Nan Duan and   
                  Ming Zhou and   
                      Yue Zhang   LogiQA 2.0 --- an Improved Dataset for
                                  Logical Reasoning in Natural Language
                                  Understanding  . . . . . . . . . . . . . 2947--2962
                Jiangyan Yi and   
                Jianhua Tao and   
                   Ruibo Fu and   
                   Tao Wang and   
             Chu Yuan Zhang and   
                 Chenglong Wang   Adversarial Multi-Task Learning for
                                  Mandarin Prosodic Boundary Prediction
                                  With Multi-Modal Embeddings  . . . . . . 2963--2973
                Ji Won Yoon and   
             Hyung Yong Kim and   
             Hyeonseung Lee and   
               Sunghwan Ahn and   
                    Nam Soo Kim   Oracle Teacher: Leveraging Target
                                  Information for Better Knowledge
                                  Distillation of CTC Models . . . . . . . 2974--2987
                Sufeng Duan and   
                   Hai Zhao and   
                 Dongdong Zhang   Syntax-Aware Data Augmentation for
                                  Neural Machine Translation . . . . . . . 2988--2999
              Tongzheng Liu and   
                  Zhihua Lu and   
     João Paulo J. da Costa and   
                        Tai Fei   A Hybrid Reverberation Model and Its
                                  Application to Joint Speech
                                  Dereverberation and Separation . . . . . 3000--3014
                 Junjun Guo and   
                  Junjie Ye and   
                  Yan Xiang and   
                    Zhengtao Yu   Layer-Level Progressive Transformer With
                                  Modality Difference Awareness for
                                  Multi-Modal Neural Machine Translation   3015--3026
                   Qian Tao and   
               Zhihao Xiong and   
                Bocheng Han and   
               Xiaoyang Fan and   
                        Lusi Li   A Novel Unsupervised Approach for
                                  Cross-Lingual Word Alignment in Low
                                  Isomorphic Embedding Spaces  . . . . . . 3027--3041
                   Jilu Jin and   
              Jacob Benesty and   
              Jingdong Chen and   
                 Gongping Huang   Differential Beamforming From a
                                  Geometric Perspective  . . . . . . . . . 3042--3054
      Alberto Palomo-Alonso and   
       David Casillas-Pérez and   
   Silvia Jiménez-Fernández and   
  Jose A. Portilla-Figueras and   
            Sancho Salcedo-Sanz   A Flexible Architecture Using Temporal,
                                  Spatial and Semantic Correlation-Based
                                  Algorithms for Story Segmentation of
                                  Broadcast News . . . . . . . . . . . . . 3055--3069
               Bolaji Yusuf and   
               Jan Černocký and   
                 Murat Saraçlar   End-to-End Open Vocabulary Keyword
                                  Search With Multilingual Neural
                                  Representations  . . . . . . . . . . . . 3070--3080
              Adrian Herzog and   
    Srikanth Raj Chetupalli and   
           Emanuël A. P. Habets   AmbiSep: Joint Ambisonic-to-Ambisonic
                                  Speech Separation and Noise Reduction    3081--3094
                Po-chun Hsu and   
                Da-rong Liu and   
                Andy T. Liu and   
                    Hung-yi Lee   Parallel Synthesis for Autoregressive
                                  Speech Generation  . . . . . . . . . . . 3095--3111
           Siddharth Dalmia and   
             Dmytro Okhonko and   
                 Mike Lewis and   
              Sergey Edunov and   
            Shinji Watanabe and   
              Florian Metze and   
           Luke Zettlemoyer and   
            Abdelrahman Mohamed   LegoNN: Building Modular Encoder-Decoder
                                  Models . . . . . . . . . . . . . . . . . 3112--3126
                Tom Gajecki and   
                 Waldo Nogueira   Deep Latent Fusion Layers for Binaural
                                  Speech Enhancement . . . . . . . . . . . 3127--3138
                Huawen Feng and   
                 Zhenxi Lin and   
                      Qianli Ma   Perturbation-Based Self-Supervised
                                  Attention for Attention Bias in Text
                                  Classification . . . . . . . . . . . . . 3139--3151
               Jiaxin Zhong and   
                 Tao Zhuang and   
                Mengtong Li and   
                  Ray Kirby and   
             Mahmoud Karimi and   
                    Jing Lu and   
                     Dong Zhang   Sidelobe Suppression for a Steerable
                                  Parametric Source Using the Sparse
                                  Random Array Technique . . . . . . . . . 3152--3161
                   Yan Fang and   
                     Wei Lu and   
               Xiaodong Liu and   
             Witold Pedrycz and   
                    Qi Lang and   
                   Jianhua Yang   CircularE: a Complex Space Circular
                                  Correlation Relational Model for Link
                                  Prediction in Knowledge Graph Embedding  3162--3175
                  Jie Zhang and   
                    Rui Tao and   
                     Jun Du and   
                    Li-Rong Dai   SDW-SWF: Speech Distortion Weighted
                                  Single-Channel Wiener Filter for Noise
                                  Reduction  . . . . . . . . . . . . . . . 3176--3189
                 Haozhou Li and   
                 Qinke Peng and   
                     Xu Mou and   
                  Ying Wang and   
                Zeyuan Zeng and   
           Muhammad Fiaz Bashir   Abstractive Financial News Summarization
                                  via Transformer-BiLSTM Encoder and Graph
                                  Attention-Based Decoder  . . . . . . . . 3190--3205
                Weitao Yuan and   
              Shengbei Wang and   
              Jianming Wang and   
              Masashi Unoki and   
                     Wenwu Wang   Unsupervised Deep Unfolded
                                  Representation Learning for Singing
                                  Voice Separation . . . . . . . . . . . . 3206--3220
             Zhong-Qiu Wang and   
            Samuele Cornell and   
               Shukjae Choi and   
                Younglo Lee and   
            Byeong-Yeol Kim and   
                Shinji Watanabe   TF-GridNet: Integrating Full- and
                                  Sub-Band Modeling for Speech Separation  3221--3236
              Marvin Tammen and   
                    Simon Doclo   Parameter Estimation Procedures for Deep
                                  Multi-Frame MVDR Filtering for
                                  Single-Microphone Speech Enhancement . . 3237--3248
                     Yi Lin and   
              Qingyang Wang and   
                Xincheng Yu and   
               Zichen Zhang and   
                Dongyue Guo and   
                     Jizhe Zhou   Towards Recognition for Radio-Echo
                                  Speech in Air Traffic Control: Dataset
                                  and a Contrastive Learning Approach  . . 3249--3262
       Diego Caviedes-Nozal and   
         Efren Fernandez-Grande   Spatio-Temporal Bayesian Regression for
                                  Room Impulse Response Reconstruction
                                  With Spherical Waves . . . . . . . . . . 3263--3277
                   Xinyu Hu and   
                    Xiaojun Wan   RST Discourse Parsing as Text-to-Text
                                  Generation . . . . . . . . . . . . . . . 3278--3289
                   Shun Lei and   
                Yixuan Zhou and   
                Liyang Chen and   
                 Zhiyong Wu and   
                   Xixin Wu and   
                Shiyin Kang and   
                     Helen Meng   MSStyleTTS: Multi-Scale Style Modeling
                                  With Hierarchical Context Information
                                  for Expressive Speech Synthesis  . . . . 3290--3303
    Pedro Izquierdo Lehmann and   
           Rodrigo F. Cádiz and   
            Carlos A. Sing Long   Towards Maximizing a Perceptual
                                  Sweet Spot for Spatial Sound With
                                  Loudspeakers . . . . . . . . . . . . . . 3304--3319
                    Han Zhu and   
                 Dongji Gao and   
              Gaofeng Cheng and   
               Daniel Povey and   
             Pengyuan Zhang and   
                   Yonghong Yan   Alternative Pseudo-Labeling for
                                  Semi-Supervised Automatic Speech
                                  Recognition  . . . . . . . . . . . . . . 3320--3330
              Junqing Zhang and   
                 Liming Shi and   
  Mads Græsbøll Christensen and   
                  Wen Zhang and   
                Lijun Zhang and   
                  Jingdong Chen   CGMM-Based Sound Zone Generation Using
                                  Robust Pressure Matching With ATF
                                  Perturbation Constraints . . . . . . . . 3331--3345
              Erfan Loweimi and   
          Andrea Carmantini and   
                 Peter Bell and   
               Steve Renals and   
                Zoran Cvetkovic   Phonetic Error Analysis Beyond Phone
                                  Error Rate . . . . . . . . . . . . . . . 3346--3361
               Runxuan Yang and   
                Yuyang Peng and   
                     Xiaolin Hu   A Fast High-Fidelity Source-Filter
                                  Vocoder With Lightweight Neural Modules  3362--3373
              Yuxiang Zhang and   
                    Zhuo Li and   
                  Jingze Lu and   
                    Hua Hua and   
               Wenchao Wang and   
                 Pengyuan Zhang   The Impact of Silence on Speech
                                  Anti-Spoofing  . . . . . . . . . . . . . 3374--3389
          Philippe Gonzalez and   
        Tommy Sonne Alstrøm and   
                     Tobias May   Assessing the Generalization Gap of
                                  Learning-Based Speech Enhancement
                                  Systems in Noisy and Reverberant
                                  Environments . . . . . . . . . . . . . . 3390--3403
                    Ziyi Xu and   
                 Ziyue Zhao and   
                Tim Fingscheidt   Coded Speech Quality Measurement by a
                                  Non-Intrusive PESQ-DNN . . . . . . . . . 3404--3417
                     Tao Li and   
                  Chenxu Hu and   
                  Jian Cong and   
                  Xinfa Zhu and   
                 Jingbei Li and   
                  Qiao Tian and   
                Yuping Wang and   
                        Lei Xie   DiCLET-TTS: Diffusion Model Based
                                  Cross-Lingual Emotion Transfer for
                                  Text-to-Speech --- a Study Between
                                  English and Mandarin . . . . . . . . . . 3418--3430
                  Xuexin Xu and   
                  Liang Shi and   
               Xunquan Chen and   
               Pingyuan Lin and   
                   Jie Lian and   
                Jinhui Chen and   
              Zhihong Zhang and   
               Edwin R. Hancock   Any-to-Any Voice Conversion With
                                  Multi-Layer Speaker Adaptation and
                                  Content Supervision  . . . . . . . . . . 3431--3445
                Chenpeng Du and   
                  Yiwei Guo and   
                   Xie Chen and   
                         Kai Yu   Speaker Adaptive Text-to-Speech With
                                  Timbre-Normalized Vector-Quantized
                                  Feature  . . . . . . . . . . . . . . . . 3446--3456
            Yash Kumar Atri and   
               Vikram Goyal and   
             Tanmoy Chakraborty   Multi-Document Summarization Using
                                  Selective Attention Span and
                                  Reinforcement Learning . . . . . . . . . 3457--3467
              Maochun Huang and   
               Chunmei Qing and   
                Junpeng Tan and   
                    Xiangmin Xu   Context-Based Adaptive Multimodal Fusion
                                  Network for Continuous Frame-Level
                                  Sentiment Prediction . . . . . . . . . . 3468--3477
      Sebastian J. Schlecht and   
             Jon Fagerström and   
                  Vesa Välimäki   Decorrelation in Feedback Delay Networks 3478--3487
                Jinliang Lu and   
                   Jiajun Zhang   Towards Unified Multi-Domain Machine
                                  Translation With Mixture of Domain
                                  Experts  . . . . . . . . . . . . . . . . 3488--3498
              Julien Hauret and   
             Thomas Joubaud and   
          Véronique Zimpfer and   
                      Éric Bavu   Configurable EBEN: Extreme Bandwidth
                                  Extension Network to Enhance
                                  Body-Conducted Speech Capture  . . . . . 3499--3512
                 Wanli Peng and   
                   Sheng Li and   
              Zhenxing Qian and   
                  Xinpeng Zhang   Text Steganalysis Based on Hierarchical
                                  Supervised Learning and Dual Attention
                                  Mechanism  . . . . . . . . . . . . . . . 3513--3526
                     Lin Xu and   
                Qixian Zhou and   
                  Jinlan Fu and   
                   See-Kiong Ng   CET2: Modelling Topic Transitions for
                                  Coherent and Engaging Knowledge-Grounded
                                  Conversations  . . . . . . . . . . . . . 3527--3536
             Vincent W. Neo and   
            Christine Evers and   
              Stephan Weiss and   
              Patrick A. Naylor   Signal Compaction Using Polynomial EVD
                                  for Spherical Array Processing With
                                  Applications . . . . . . . . . . . . . . 3537--3549
              Gerald Enzner and   
                   Svantje Voit   Hybrid-Frequency-Resolution Adaptive
                                  Kalman Filter for Online Identification
                                  of Long Acoustic Responses With Low
                                  Input-Output Latency . . . . . . . . . . 3550--3563
                  Shang Gao and   
                Maoshen Jia and   
               Dingding Yao and   
                      Jing Wang   Multi-Source Localization Using
                                  Optimized Time-Frequency Representation
                                  and Sparsity Component Analysis  . . . . 3564--3578
                      Qi He and   
                Mingjie Gao and   
          Ka Fai Cedric Yiu and   
                  Sven Nordholm   Distributed Microphone Array
                                  Localization Problem via SDP-SOCP Method 3579--3588
             Hiroshi Sawada and   
           Rintaro Ikeshita and   
          Keisuke Kinoshita and   
              Tomohiro Nakatani   Multi-Frame Full-Rank Spatial Covariance
                                  Analysis for Underdetermined Blind
                                  Source Separation and Dereverberation    3589--3602
             Hongyang Chang and   
                 Hongfei Xu and   
         Josef van Genabith and   
                 Deyi Xiong and   
                   Hongying Zan   JoinER-BART: Joint Entity and Relation
                                  Extraction With Constrained Decoding,
                                  Representation Reuse and Fusion  . . . . 3603--3616
                Xinqi Huang and   
                Yingsong Li and   
             Yuriy Zakharov and   
              Yongchun Miao and   
                 Zhixiang Huang   Squared Sine Adaptive Algorithm and Its
                                  Performance Analysis . . . . . . . . . . 3617--3628
                  Andong Li and   
                 Guochen Yu and   
             Chengshi Zheng and   
                 Wenzhe Liu and   
                    Xiaodong Li   A General Unfolding Speech Enhancement
                                  Method Motivated by Taylor's Theorem . . 3629--3646
                     Bin Gu and   
                  Jie Zhang and   
                         Wu Guo   A Dynamic Convolution Framework for
                                  Session-Independent Speaker Embedding
                                  Learning . . . . . . . . . . . . . . . . 3647--3658
               Daojian Zeng and   
                  Chao Zhao and   
                 Chao Jiang and   
               Jianling Zhu and   
                    Jianhua Dai   Document-Level Relation Extraction With
                                  Context Guided Mention Integration and
                                  Inter-Pair Reasoning . . . . . . . . . . 3659--3666
                      Lu Li and   
                Maoshen Jia and   
                  Jing Wang and   
                    Ruiyuan Cao   Multiple-Speech-Source DOA Estimation
                                  Based on Single-Source Cluster Detection 3667--3680
              Xiaoxiao Miao and   
                   Xin Wang and   
               Erica Cooper and   
          Junichi Yamagishi and   
             Natalia Tomashenko   Speaker Anonymization Using Orthogonal
                                  Householder Neural Network . . . . . . . 3681--3695
              Zhengshan Xue and   
              Xiaolei Zhang and   
                Tingxun Shi and   
                     Deyi Xiong   DetTrans: a Lightweight Framework to
                                  Detect and Translate Noisy Inputs
                                  Simultaneously . . . . . . . . . . . . . 3696--3705
                  Chang Liu and   
              Zhen-Hua Ling and   
                  Ling-Hui Chen   Pronunciation Dictionary-Free
                                  Multilingual Speech Synthesis Using
                                  Learned Phonetic Representations . . . . 3706--3716
               Reo Yoneyama and   
                Yi-Chiao Wu and   
                    Tomoki Toda   High-Fidelity and Pitch-Controllable
                                  Neural Vocoder Based on Unified
                                  Source-Filter Networks . . . . . . . . . 3717--3729
           Stefan Thaleiser and   
                  Gerald Enzner   Binaural-Projection Multichannel Wiener
                                  Filter for Cue-Preserving Binaural
                                  Speech Enhancement . . . . . . . . . . . 3730--3745
                 Yixin Wang and   
                    Wei Wei and   
               Xiangming Gu and   
              Xiaohong Guan and   
                        Ye Wang   Disentangled Adversarial Domain
                                  Adaptation for Phonation Mode Detection
                                  in Singing and Speech  . . . . . . . . . 3746--3759
               Yixuan Zhang and   
                Heming Wang and   
                   DeLiang Wang   $ F0 $ Estimation and Voicing Detection
                                  With Cascade Architecture in Noisy
                                  Speech . . . . . . . . . . . . . . . . . 3760--3770
              Zhengdao Zhao and   
                 Yuhua Wang and   
                 Guang Shen and   
                  Yuezhu Xu and   
                  Jiayuan Zhang   TDFNet: Transformer-Based Deep-Scale
                                  Fusion Network for Multimodal Emotion
                                  Recognition  . . . . . . . . . . . . . . 3771--3782
          Johannes M. Arend and   
       Christoph Pörschmann and   
           Stefan Weinzierl and   
               Fabian Brinkmann   Magnitude-Corrected and Time-Aligned
                                  Interpolation of Head-Related Transfer
                                  Functions  . . . . . . . . . . . . . . . 3783--3799
                   Desh Raj and   
               Daniel Povey and   
              Sanjeev Khudanpur   SURT 2.0: Advances in Transducer-Based
                                  Multi-Talker Speech Recognition  . . . . 3800--3813
                 Jiaming An and   
               Zixiang Ding and   
                      Ke Li and   
                        Rui Xia   Global-View and Speaker-Aware Emotion
                                  Cause Extraction in Conversations  . . . 3814--3823
                  Yuqin Lin and   
              Longbiao Wang and   
               Yanbing Yang and   
                    Jianwu Dang   CFDRN: a Cognition-Inspired Feature
                                  Decomposition and Recombination Network
                                  for Dysarthric Speech Recognition  . . . 3824--3836
               Rémi Blandin and   
                Simon Stone and   
          Angélique Remacle and   
             Vincent Didone and   
                 Peter Birkholz   A Comparative Study of $3$D and $1$D
                                  Acoustic Simulations of the Higher
                                  Frequencies of Speech  . . . . . . . . . 3837--3847
                  Qing Wang and   
                  Jixun Yao and   
                   Li Zhang and   
              Pengcheng Guo and   
                        Lei Xie   Timbre-Reserved Adversarial Attack in
                                  Speaker Identification . . . . . . . . . 3848--3858
                  Yachao Li and   
                  Junhui Li and   
                 Jing Jiang and   
                 Shimin Tao and   
                   Hao Yang and   
                      Min Zhang   P-Transformer: Towards Better
                                  Document-to-Document Neural Machine
                                  Translation  . . . . . . . . . . . . . . 3859--3870
                   Chao Xie and   
                    Tomoki Toda   Noisy-to-Noisy Voice Conversion Under
                                  Variations of Noisy Condition  . . . . . 3871--3882
               Zhichao Wang and   
              Xinsheng Wang and   
                 Qicong Xie and   
                     Tao Li and   
                    Lei Xie and   
                  Qiao Tian and   
                    Yuping Wang   MSM-VC: High-Fidelity Source Style
                                  Transfer for Non-Parallel Voice
                                  Conversion by Multi-Scale Style Modeling 3883--3895
                 Yilin Zhao and   
                   Hai Zhao and   
                    Sufeng Duan   Multi-Grained Evidence Inference for
                                  Multi-Choice Reading Comprehension . . . 3896--3907
                 Ye-Qian Du and   
                  Jie Zhang and   
                   Xin Fang and   
                Ming-Hui Wu and   
                 Zhou-Wang Yang   A Semi-Supervised Complementary Joint
                                  Training Approach for Low-Resource
                                  Speech Recognition . . . . . . . . . . . 3908--3921
               Changheng Li and   
            Richard C. Hendriks   Alternating Least-Squares-Based
                                  Microphone Array Parameter Estimation
                                  for a Single-Source Reverberant and
                                  Noisy Acoustic Scenario  . . . . . . . . 3922--3934
                   Kun Zhou and   
              Yuanhang Zhou and   
             Wayne Xin Zhao and   
                    Ji-Rong Wen   Learning to Perturb for Contrastive
                                  Learning of Unsupervised Sentence
                                  Representations  . . . . . . . . . . . . 3935--3944
                 Georg Götz and   
      Sebastian J. Schlecht and   
                   Ville Pulkki   Common-Slope Modeling of Late
                                  Reverberation  . . . . . . . . . . . . . 3945--3957
               Guanhua Chen and   
                Runzhe Zhan and   
              Derek F. Wong and   
                  Lidia S. Chao   Multi-Level Curriculum Learning for
                                  Multi-Turn Dialogue Generation . . . . . 3958--3967
             Yun-Yen Chuang and   
               Hung-Min Hsu and   
                  Kevin Lin and   
               Ray-I. Chang and   
                    Hung-Yi Lee   MetaEx-GAN: Meta Exploration to Improve
                                  Natural Language Generation via
                                  Generative Adversarial Networks  . . . . 3968--3980
               Chuxuan Tong and   
                   Xi Zheng and   
                 Jianhua Li and   
                 Xingjun Ma and   
              Longxiang Gao and   
                     Yong Xiang   Query-Efficient Black-Box Adversarial
                                  Attacks on Automatic Speech Recognition  3981--3992
                   Xixin Wu and   
                     Hui Lu and   
                     Kun Li and   
                 Zhiyong Wu and   
                Xunying Liu and   
                     Helen Meng   Hiformer: Sequence Modeling Networks
                                  With Hierarchical Attention Mechanisms   3993--4003
                  Ante Wang and   
               Linfeng Song and   
                 Lifeng Jin and   
                Junfeng Yao and   
                  Haitao Mi and   
                   Chen Lin and   
                 Jinsong Su and   
                        Dong Yu   D$^2$PSG: Multi-Party Dialogue Discourse
                                  Parsing as Sequence Generation . . . . . 4004--4013
                    Nan Gao and   
              Yongjian Wang and   
                  Peng Chen and   
                     Jijun Tang   Boosting Short Text Classification by
                                  Solving the OOV Problem  . . . . . . . . 4014--4024

IEEE\slash ACM Transactions on Audio, Speech, and Language Processing
Volume 32, Number ??, 2024

                 Jin Chu Wu and   
                Raghu N. Kacker   Statistical Analysis for Speaker
                                  Recognition Evaluation With Data
                                  Dependence and Three Score Distributions 1--14
               Yongwei Zhou and   
                 Junwei Bao and   
                Youzheng Wu and   
                Xiaodong He and   
                    Tiejun Zhao   Operation-Augmented Numerical Reasoning
                                  for Question Answering . . . . . . . . . 15--28
    Anurenjan Purushothaman and   
             Debottam Dutta and   
                Rohit Kumar and   
               Sriram Ganapathy   Speech Dereverberation With Frequency
                                  Domain Autoregressive Modeling . . . . . 29--38
                  Leyuan Qu and   
                  Taihao Li and   
            Cornelius Weber and   
      Theresa Pekarek-Rosin and   
                   Fuji Ren and   
                 Stefan Wermter   Disentangling Prosody Representations
                                  With Unsupervised Speech Reconstruction  39--54
      Mathias Bach Pedersen and   
         Søren Holdt Jensen and   
              Zheng-Hua Tan and   
                  Jesper Jensen   Data-Driven Non-Intrusive Speech
                                  Intelligibility Prediction Using Speech
                                  Presence Probability . . . . . . . . . . 55--67
                 Yuanbo Hou and   
                    Bo Kang and   
            Andrew Mitchell and   
                 Wenwu Wang and   
                  Jian Kang and   
              Dick Botteldooren   Cooperative Scene-Event Modelling for
                                  Acoustic Scene Classification  . . . . . 68--82
             Xiaotong Jiang and   
                 Peiwen You and   
                  Chen Chen and   
             Zhongqing Wang and   
                   Guodong Zhou   Exploring Scope Detection for
                                  Aspect-Based Sentiment Analysis  . . . . 83--94
                  Xuenan Xu and   
                   Zeyu Xie and   
                 Mengyue Wu and   
                         Kai Yu   Beyond the Status Quo: a Contemporary
                                  Survey of Advances and Challenges in
                                  Audio Captioning . . . . . . . . . . . . 95--112
          Federico Miotello and   
              Mirco Pezzoli and   
            Luca Comanducci and   
            Fabio Antonacci and   
                  Augusto Sarti   Deep Prior-Based Audio Inpainting Using
                                  Multi-Resolution Harmonic Convolutional
                                  Neural Networks  . . . . . . . . . . . . 113--123
    Cristian-Lucian Stanciu and   
              Jacob Benesty and   
       Constantin Paleologu and   
      Ruxandra-Liana Costea and   
        Laura-Maria Dogariu and   
                Silviu Ciochină   Decomposition-Based Wiener Filter Using
                                  the Kronecker Product and Conjugate
                                  Gradient Method  . . . . . . . . . . . . 124--138
                Huiyao Chen and   
                Yueheng Sun and   
              Meishan Zhang and   
                      Min Zhang   Automatic Noise Generation and Reduction
                                  for Text Classification  . . . . . . . . 139--150
                 Jiaming Xu and   
                   Jian Cui and   
                 Yunzhe Hao and   
                          Bo Xu   Multi-Cue Guided Semi-Supervised
                                  Learning Toward Target Speaker
                                  Separation in Real Environments  . . . . 151--163
                 Yang Xiang and   
       Jesper Lisby Højvang and   
  Morten Højfeldt Rasmussen and   
      Mads Græsbøll Christensen   A Two-Stage Deep Representation
                                  Learning-Based Speech Enhancement Method
                                  Using Variational Autoencoder and
                                  Adversarial Training . . . . . . . . . . 164--177
                    Xiao Li and   
                 Ruirui Liu and   
              Huichou Huang and   
                     Qingyao Wu   Contrastive Learning for Target Speaker
                                  Extraction With Attention-Based Fusion   178--188
               Xiaobo Liang and   
                  Runze Mao and   
                   Lijun Wu and   
                  Juntao Li and   
                  Min Zhang and   
                        Qing Li   Enhancing Low-Resource NLP by
                                  Consistency Training With Data and Model
                                  Perturbations  . . . . . . . . . . . . . 189--199
                Haisheng Lu and   
             Jiangnan Liang and   
                     Chuang Shi   Comments on ``Primary-Ambient Extraction
                                  Using Ambient Spectrum Estimation for
                                  Immersive Spatial Audio Reproduction''   200--202
               Szymon Drgas and   
              Lars Bramsløw and   
          Archontis Politis and   
            Gaurav Naithani and   
                Tuomas Virtanen   Dynamic Processing Neural Network
                                  Architecture for Hearing Loss
                                  Compensation . . . . . . . . . . . . . . 203--214
        Femke B. Gelderblom and   
        Tron Vedul Tronstad and   
          Torbjørn Svendsen and   
              Tor Andre Myrvoll   On the Predictive Power of Objective
                                  Intelligibility Metrics for the
                                  Subjective Performance of Deep Complex
                                  Convolutional Recurrent Speech
                                  Enhancement Networks . . . . . . . . . . 215--226
             Thomas Haubner and   
            Andreas Brendel and   
              Walter Kellermann   End-to-End Deep Learning-Based
                                  Adaptation Control for Linear Acoustic
                                  Echo Cancellation  . . . . . . . . . . . 227--238
             Congcong Jiang and   
                Tieyun Qian and   
                       Bing Liu   One General Teacher for Multi-Data
                                  Multi-Task: a New Knowledge Distillation
                                  Framework for Discourse Relation
                                  Analysis . . . . . . . . . . . . . . . . 239--249
        Khandokar Md. Nayem and   
           Donald S. Williamson   Attention-Based Speech Enhancement Using
                                  Human Quality Perception Modeling  . . . 250--260
                 Ying Zhang and   
               Fandong Meng and   
                Yufeng Chen and   
                   Jinan Xu and   
                       Jie Zhou   Complex Question Enhanced Transfer
                                  Learning for Zero-Shot Joint Information
                                  Extraction . . . . . . . . . . . . . . . 261--275
               Jingsong Yan and   
                    Piji Li and   
                Haibin Chen and   
               Junhao Zheng and   
                      Qianli Ma   Does the Order Matter? A Random
                                  Generative Way to Learn Label Hierarchy
                                  for Hierarchical Text Classification . . 276--285
   Georgios Paraskevopoulos and   
         Theodoros Kouzelis and   
          Georgios Rouvalis and   
      Athanasios Katsamanis and   
         Vassilis Katsouros and   
          Alexandros Potamianos   Sample-Efficient Unsupervised Domain
                                  Adaptation of Speech Recognition
                                  Systems: a Case Study for Modern Greek   286--299
            Ernesto Accolti and   
             Javier Gimenez and   
              Michael Vorländer   Uncertainties of Room Acoustics
                                  Simulation Due to Directivity Data of
                                  Musical Instruments  . . . . . . . . . . 300--309
           Yoshiki Masuyama and   
              Kouei Yamaoka and   
             Yuma Kinoshita and   
           Taishi Nakashima and   
                   Nobutaka Ono   Causal and Relaxed-Distortionless
                                  Response Beamforming for Online Target
                                  Source Extraction  . . . . . . . . . . . 310--324
         Rohit Prabhavalkar and   
               Takaaki Hori and   
            Tara N. Sainath and   
              Ralf Schlüter and   
                Shinji Watanabe   End-to-End Speech Recognition: a Survey  325--351
                   Yun Zhao and   
                   Dexi Liu and   
              Changxuan Wan and   
                 Xiping Liu and   
               Jian-yun Nie and   
                    Jiaming Liu   JMS-QA: a Joint Hierarchical
                                  Architecture for Mental Health Question
                                  Answering  . . . . . . . . . . . . . . . 352--363
                  Shiwen Ni and   
                  Jiawen Li and   
                   Min Yang and   
                    Hung-Yu Kao   DropAttack: a Random Dropped Weight
                                  Attack Adversarial Training for Natural
                                  Language Understanding . . . . . . . . . 364--373
               Tiantian Zhu and   
                   Yang Qin and   
                  Ming Feng and   
               Qingcai Chen and   
                 Baotian Hu and   
                     Yang Xiang   BioPRO: Context-Infused Prompt Learning
                                  for Biomedical Entity Linking  . . . . . 374--385
                 Jiapu Wang and   
                 Boyue Wang and   
                 Junbin Gao and   
                   Simin Hu and   
                  Yongli Hu and   
                     Baocai Yin   Multi-Level Interaction Based Knowledge
                                  Graph Completion . . . . . . . . . . . . 386--396
           Qiangqiang Zhang and   
               Dongyuan Lin and   
              Yingying Xiao and   
               Yunfei Zheng and   
                   Shiyuan Wang   Error Reused Filtered-$X$ Least Mean
                                  Square Algorithm for Active Noise
                                  Control  . . . . . . . . . . . . . . . . 397--412
                Zengrui Jin and   
               Mengzhe Geng and   
                Jiajun Deng and   
                Tianzi Wang and   
                  Shujie Hu and   
                  Guinan Li and   
                    Xunying Liu   Personalized Adversarial Data
                                  Augmentation for Dysarthric and Elderly
                                  Speech Recognition . . . . . . . . . . . 413--429
                   Jun Kong and   
                   Jin Wang and   
                   Xuejie Zhang   Adaptive Ensemble Self-Distillation With
                                  Consistent Gradients for Fast Inference
                                  of Pretrained Language Models  . . . . . 430--442
                Srđan Kitić and   
                  Jérôme Daniel   Blind Identification of Ambisonic
                                  Reduced Room Impulse Response  . . . . . 443--458
                 Qijie Shao and   
              Pengcheng Guo and   
                Jinghao Yan and   
                 Pengfei Hu and   
                        Lei Xie   Decoupling and Interacting Multi-Task
                                  Learning Network for Joint Speech and
                                  Accent Recognition . . . . . . . . . . . 459--470
                    Han Zhu and   
              Gaofeng Cheng and   
               Jindong Wang and   
                 Wenxin Hou and   
             Pengyuan Zhang and   
                   Yonghong Yan   Boosting Cross-Domain Speech Recognition
                                  With Self-Supervision  . . . . . . . . . 471--485
                  Yile Wang and   
                  Yue Zhang and   
                    Peng Li and   
                       Yang Liu   Gradual Syntactic Label Replacement for
                                  Language Model Pre-Training  . . . . . . 486--496
                 Penghui Ma and   
                Jianfeng Li and   
               Jingjing Pan and   
              Xiaofei Zhang and   
               Roberto Gil-Pita   Coherent Signal DOA Estimation With
                                  Coprime Array: Exploiting Signal
                                  Subspace Reconstructing Strategy . . . . 497--508
                 Emma Hamel and   
                  Nickvash Kani   Factors That Influence Automatic
                                  Recognition of African-American
                                  Vernacular English in Machine-Learning
                                  Models . . . . . . . . . . . . . . . . . 509--516
                 Jingbei Li and   
                   Sipan Li and   
                  Ping Chen and   
                Luwen Zhang and   
                    Yi Meng and   
                 Zhiyong Wu and   
                 Helen Meng and   
                  Qiao Tian and   
                Yuping Wang and   
                    Yuxuan Wang   Joint Multiscale Cross-Lingual Speaking
                                  Style Transfer With Bidirectional
                                  Attention Mechanism for Automatic
                                  Dubbing  . . . . . . . . . . . . . . . . 517--528
                   Bing Han and   
             Zhengyang Chen and   
                    Yanmin Qian   Self-Supervised Learning With
                                  Cluster-Aware-DINO for High-Performance
                                  Robust Speaker Verification  . . . . . . 529--541
             Kristina Tesch and   
                  Timo Gerkmann   Multi-Channel Speech Separation Using
                                  Spatially Selective Deep Non-Linear
                                  Filters  . . . . . . . . . . . . . . . . 542--553
               Hao-Chen Pei and   
                   Hao Fang and   
                    Xin Luo and   
                    Xin-Shun Xu   Gradformer: a Framework for Multi-Aspect
                                  Multi-Granularity Pronunciation
                                  Assessment . . . . . . . . . . . . . . . 554--563
              Garima Sharma and   
       Karthikeyan Umapathy and   
               Sridhar Krishnan   Time-Frequency Scattergrams for
                                  Biomedical Audio Signal Representation
                                  and Classification . . . . . . . . . . . 564--576
                  Zhibo Man and   
            Zengcheng Huang and   
                Yujie Zhang and   
                      Yu Li and   
              Yuanmeng Chen and   
                Yufeng Chen and   
                       Jinan Xu   WDSRL: Multi-Domain Neural Machine
                                  Translation With Word-Level
                                  Domain-Sensitive Representation Learning 577--590
               Chin-Po Chen and   
               Ho-Hsien Pan and   
         Susan Shur-Fen Gau and   
                   Chi-Chun Lee   Using Measures of Vowel Space for
                                  Autistic Traits Characterization . . . . 591--607
          Kevin Wilkinghoff and   
                    Frank Kurth   Why Do Angular Margin Losses Work Well
                                  for Semi-Supervised Anomalous Sound
                                  Detection? . . . . . . . . . . . . . . . 608--622
                  Aku Rouhe and   
                Tamás Grósz and   
                   Mikko Kurimo   Principled Comparisons for End-to-End
                                  Speech Recognition: Attention vs Hybrid
                                  at the $1000$-Hour Scale . . . . . . . . 623--638
                  Yile Wang and   
                      Yue Zhang   Lost in Context? On the Sense-Wise
                                  Variance of Contextualized Word
                                  Embeddings . . . . . . . . . . . . . . . 639--650
             Christoph Hold and   
               Ville Pulkki and   
          Archontis Politis and   
                  Leo McCormack   Compression of Higher-Order Ambisonic
                                  Signals Using Directional Audio Coding   651--665
               Shouhui Wang and   
                       Biao Qin   A Novel Joint Training Model for
                                  Knowledge Base Question Answering  . . . 666--679
                 Songbin Li and   
               Jingang Wang and   
                   Peng Liu and   
                         Ke Shi   SANet: a Compressed Speech Encoder and
                                  Steganography Algorithm Independent
                                  Steganalysis Deep Neural Network . . . . 680--690
                Tarek Kanan and   
         Amani AbedAlghafer and   
              Shadi AlZu'bi and   
             Bilal Hawashin and   
                Ala Mughaid and   
             Ghassan Kanaan and   
              M. M. Kamruzzaman   An Intelligent Health Care System for
                                  Detecting Drug Abuse in Social Media
                                  Platforms Based on Low Resource Language 691--703
  Alejandro Santorum Varela and   
        Svetlana Stoyanchev and   
               Simon Keizer and   
            Rama Doddipatla and   
                     Kate Knill   Entity Resolution in Situated Dialog
                                  With Unimodal and Multimodal
                                  Transformers . . . . . . . . . . . . . . 704--713
                   Huang He and   
                     Hua Lu and   
                   Siqi Bao and   
                   Fan Wang and   
                     Hua Wu and   
               Zheng-Yu Niu and   
                   Haifeng Wang   Learning to Select External Knowledge
                                  With Multi-Scale Negative Sampling . . . 714--720
                     Hua Lu and   
                   Zhen Guo and   
                Chanjuan Li and   
                 Yunyi Yang and   
                   Huang He and   
                       Siqi Bao   Towards Building an Open-Domain Dialogue
                                  System Incorporated With Internet Memes  721--726
                Jungwoo Lim and   
               Taesun Whang and   
                Dongyub Lee and   
                   Heuiseok Lim   Adaptive Multi-Domain Dialogue State
                                  Tracking on Spoken Conversations . . . . 727--732
               David Thulke and   
                Nico Daheim and   
           Christian Dugast and   
                    Hermann Ney   Task-Oriented Document-Grounded Dialog
                                  Systems by HLTPR@RWTH for DSTC9 and
                                  DSTC10 . . . . . . . . . . . . . . . . . 733--741
                     Han Wu and   
                     Kun Xu and   
                     Linqi Song   Structure-Aware Dialogue Modeling
                                  Methods for Conversational Semantic Role
                                  Labeling . . . . . . . . . . . . . . . . 742--752
                   Zhe Chen and   
              Hongcheng Liu and   
                        Yu Wang   DialogMCF: Multimodal Context Flow for
                                  Audio Visual Scene-Aware Dialog  . . . . 753--764
           Koichiro Yoshino and   
              Yun-Nung Chen and   
                 Paul Crook and   
              Satwik Kottur and   
                 Jinchao Li and   
          Behnam Hedayatnia and   
             Seungwhan Moon and   
              Zhengcong Fei and   
                  Zekang Li and   
              Jinchao Zhang and   
                  Yang Feng and   
                   Jie Zhou and   
               Seokhwan Kim and   
                   Yang Liu and   
                     Di Jin and   
      Alexandros Papangelis and   
     Karthik Gopalakrishnan and   
          Dilek Hakkani-Tur and   
            Babak Damavandi and   
          Alborz Geramifard and   
                Chiori Hori and   
                 Ankit Shah and   
                 Chen Zhang and   
                 Haizhou Li and   
                 João Sedoc and   
             Luis F. D'Haro and   
              Rafael Banchs and   
             Alexander Rudnicky   Overview of the Tenth Dialog System
                                  Technology Challenge: DSTC10 . . . . . . 765--778
        Shekhar Kumar Yadav and   
               Nithin V. George   Joint Dereverberation and Beamforming
                                  With Blind Estimation of the Shape
                                  Parameter of the Desired Source Prior    779--793
                Yanxiong Li and   
             Zhongjie Jiang and   
              Qisheng Huang and   
               Wenchang Cao and   
                     Jialong Li   Lightweight Speaker Verification Using
                                  Transformation Module With Feature
                                  Partition and Fusion . . . . . . . . . . 794--806
                  Yuhan Dai and   
               Zhirui Zhang and   
                  Yichao Du and   
               Shengcai Liu and   
                  Lemao Liu and   
                        Tong Xu   Datastore Distillation for Nearest
                                  Neighbor Machine Translation . . . . . . 807--817
                Changtao Li and   
                Feiran Yang and   
                       Jun Yang   A Two-Stage Approach to Quality
                                  Restoration of Bone-Conducted Speech . . 818--829
                   Jie Zhou and   
               Yuanbiao Lin and   
                   Qin Chen and   
                   Qi Zhang and   
             Xuanjing Huang and   
                       Liang He   CausalABSC: Causal Inference for Aspect
                                  Debiasing in Aspect-Based Sentiment
                                  Classification . . . . . . . . . . . . . 830--840
                 Ruiying Lu and   
                    Bo Chen and   
                 Dandan Guo and   
             Dongsheng Wang and   
                  Mingyuan Zhou   Hierarchical Topic-Aware Contextualized
                                  Transformers . . . . . . . . . . . . . . 841--852
                  Yaru Zhao and   
                   Bo Cheng and   
                Yakun Huang and   
                     Zhiguo Wan   FluGCF: a Fluent Dialogue Generation
                                  Model With Coherent Concept Entity Flow  853--867
              Changhao Ding and   
                Zhangjie Fu and   
            Zhongliang Yang and   
                      Qi Yu and   
                   Daqiu Li and   
                 Yongfeng Huang   Context-Aware Linguistic Steganography
                                  Model Based on Neural Machine
                                  Translation  . . . . . . . . . . . . . . 868--878
            Zainab Alhakeem and   
                 Se-In Jang and   
                  Hong-Goo Kang   Disentangled Representations in
                                  Local-Global Contexts for Arabic Dialect
                                  Identification . . . . . . . . . . . . . 879--890
               Jae-Hong Lee and   
                Joon-Hyuk Chang   Partitioning Attention Weight:
                                  Mitigating Adverse Effect of Incorrect
                                  Pseudo-Labels for Self-Supervised ASR    891--905
                 Ryo Fukuda and   
            Katsuhito Sudoh and   
               Satoshi Nakamura   Improving Speech Translation Accuracy
                                  and Time Efficiency With Fine-Tuned
                                  wav2vec 2.0-Based Speech Segmentation    906--916
            Seong-Gyun Leem and   
             Daniel Fulford and   
         Jukka-Pekka Onnela and   
                 David Gard and   
                   Carlos Busso   Selective Acoustic Feature Enhancement
                                  for Speech Emotion Recognition With
                                  Noisy Speech . . . . . . . . . . . . . . 917--929
        Alexander Bohlender and   
                 Ann Spriet and   
               Wouter Tirry and   
                   Nilesh Madhu   Spatially Selective Speaker Separation
                                  Using a DNN With a Location Dependent
                                  Feature Extraction . . . . . . . . . . . 930--945
                 Matan Karo and   
               Arie Yeredor and   
                 Itshak Lapidot   Compact Time-Domain Representation for
                                  Logical Access Spoofed Audio . . . . . . 946--958
                  Or Berebi and   
              Zamir Ben-Hur and   
             David Lou Alon and   
                   Boaz Rafaely   Analysis and Design of Head-Tracked
                                  Compensation for Bilateral Ambisonics    959--972
                   Wei Wang and   
                    Yanmin Qian   Universal Cross-Lingual Data Generation
                                  for Low Resource ASR . . . . . . . . . . 973--983
              Davide Berghi and   
           Philip J. B. Jackson   Leveraging Visual Supervision for
                                  Array-Based Active Speaker Detection and
                                  Localization . . . . . . . . . . . . . . 984--995
   Daniel Aleksander Krause and   
   Guillermo García-Barrios and   
          Archontis Politis and   
              Annamaria Mesaros   Binaural Sound Source Distance
                                  Estimation and Localization for a Moving
                                  Listener . . . . . . . . . . . . . . . . 996--1011
              Seung-Bin Kim and   
              Sang-Hoon Lee and   
              Ha-Yeong Choi and   
                 Seong-Whan Lee   Audio Super-Resolution With Robust
                                  Speech Representation Learning of Masked
                                  Autoencoder  . . . . . . . . . . . . . . 1012--1022
           Omer Musa Battal and   
                      Aykut Koç   Automatic Construction of Sememe
                                  Knowledge Bases From Machine Readable
                                  Dictionaries . . . . . . . . . . . . . . 1023--1035
              Varun Krishna and   
                  Tarun Sai and   
               Sriram Ganapathy   Representation Learning With Hidden Unit
                                  Clustering for Low Resource Speech
                                  Applications . . . . . . . . . . . . . . 1036--1047
              Zhengding Luo and   
               Dongyuan Shi and   
              Woon-Seng Gan and   
                    Qirui Huang   Delayless Generative Fixed-Filter Active
                                  Noise Control Based on Deep Learning and
                                  Bayesian Filter  . . . . . . . . . . . . 1048--1060
                  Zewen Chi and   
                Heyan Huang and   
                 Luyang Liu and   
                     Yu Bai and   
                Xiaoyan Gao and   
                  Xian-Ling Mao   Can Pretrained English Language Models
                                  Benefit Non-English NLP Systems in
                                  Low-Resource Scenarios?  . . . . . . . . 1061--1074
                    Rui Liu and   
                   Yifan Hu and   
                 Haolin Zuo and   
                Zhaojie Luo and   
              Longbiao Wang and   
                   Guanglai Gao   Text-to-Speech for Low-Resource
                                  Agglutinative Language With
                                  Morphology-Aware Language Model
                                  Pre-Training . . . . . . . . . . . . . . 1075--1087
                  Shu Jiang and   
                  Zuchao Li and   
                   Hai Zhao and   
                   Weiping Ding   Entity-Relation Extraction as Full
                                  Shallow Semantic Dependency Parsing  . . 1088--1099
                 Yoav Vered and   
                Stephen Elliott   A Parallel Analog and Digital Adaptive
                                  Feedforward Controller for Active Noise
                                  Control  . . . . . . . . . . . . . . . . 1100--1108
               Puning Zhang and   
              Rongjian Zhao and   
                 Boran Yang and   
                 Yuexian Li and   
                   Zhigang Yang   Integrated Syntactic and Semantic Tree
                                  for Targeted Sentiment Classification
                                  Using Dual-Channel Graph Convolutional
                                  Network  . . . . . . . . . . . . . . . . 1109--1124
                    Xu Wang and   
               Hainan Zhang and   
                 Shuai Zhao and   
              Hongshen Chen and   
                Zhuoye Ding and   
                 Zhiguo Wan and   
                   Bo Cheng and   
                     Yanyan Lan   Debiasing Counterfactual Context With
                                  Causal Inference for Multi-Turn Dialogue
                                  Reasoning  . . . . . . . . . . . . . . . 1125--1132
            Hoang Ngoc Chau and   
               Tien Dat Bui and   
            Huu Binh Nguyen and   
       Thanh Thi Hien Duong and   
              Quoc Cuong Nguyen   A Novel Approach to Multi-Channel Speech
                                  Enhancement Based on Graph Neural
                                  Networks . . . . . . . . . . . . . . . . 1133--1144
                  Yuchen Hu and   
                  Chen Chen and   
                 Qiushi Zhu and   
                 Eng Siong Chng   Wav2code: Restore Clean Speech
                                  Representations via Codebook Lookup for
                                  Noise-Robust ASR . . . . . . . . . . . . 1145--1156
               Tetsuya Ueda and   
          Tomohiro Nakatani and   
           Rintaro Ikeshita and   
          Keisuke Kinoshita and   
                Shoko Araki and   
                   Shoji Makino   Blind and Spatially-Regularized Online
                                  Joint Optimization of Source Separation,
                                  Dereverberation, and Noise Reduction . . 1157--1172
             Vibhav Agarwal and   
               Sourav Ghosh and   
           Harichandana BSS and   
             Himanshu Arora and   
         Barath Raj Kandur Raja   TrICy: Trigger-Guided Data-to-Text
                                  Generation With Intent Aware
                                  Attention-Copy . . . . . . . . . . . . . 1173--1184
        Christoph Boeddeker and   
Aswin Shanmugam Subramanian and   
             Gordon Wichern and   
       Reinhold Haeb-Umbach and   
               Jonathan Le Roux   TS-SEP: Joint Diarization and Separation
                                  Conditioned on Estimated Speaker
                                  Embeddings . . . . . . . . . . . . . . . 1185--1197
             Reza Varzandeh and   
                Simon Doclo and   
                 Volker Hohmann   Speech-Aware Binaural DOA Estimation
                                  Utilizing Periodicity and Spatial
                                  Features in Convolutional Neural
                                  Networks . . . . . . . . . . . . . . . . 1198--1213
              Yigitcan Özer and   
                 Meinard Müller   Source Separation of Piano Concertos
                                  Using Musically Motivated Augmentation
                                  Techniques . . . . . . . . . . . . . . . 1214--1225
               Lior Frenkel and   
           Shlomo E. Chazan and   
               Jacob Goldberger   Domain Adaptation Using Suitable Pseudo
                                  Labels for Speech Enhancement and
                                  Dereverberation  . . . . . . . . . . . . 1226--1236
                Jiahao Zhao and   
                  Wenji Mao and   
              Daniel Dajun Zeng   Disentangled Text Representation
                                  Learning With Information-Theoretic
                                  Perspective for Adversarial Robustness   1237--1247
                  Dong Zhou and   
                   Fang Lei and   
                     Lin Li and   
               Yongmei Zhou and   
                     Aimin Yang   Cross-Modal Interaction via
                                  Reinforcement Feedback for Audio-Lyrics
                                  Retrieval  . . . . . . . . . . . . . . . 1248--1260
                Xuechen Liu and   
              Md Sahidullah and   
               Kong Aik Lee and   
                  Tomi Kinnunen   Generalizing Speaker Verification for
                                  Spoof Awareness in the Embedding Space   1261--1273
                 Shiyao Cui and   
               Jiangxia Cao and   
                   Xin Cong and   
               Jiawei Sheng and   
                Quangang Li and   
                Tingwen Liu and   
                    Jinqiao Shi   Enhancing Multimodal Entity and Relation
                                  Extraction With Variational Information
                                  Bottleneck . . . . . . . . . . . . . . . 1274--1285
                 Yizhou Tan and   
                  Haojun Ai and   
               Shengchen Li and   
               Mark D. Plumbley   Acoustic Scene Classification Across
                                  Cities and Devices via Feature
                                  Disentanglement  . . . . . . . . . . . . 1286--1297
             Orel Ben Zaken and   
               Anurag Kumar and   
         Vladimir Tourbabin and   
                   Boaz Rafaely   Neural-Network-Based
                                  Direction-of-Arrival Estimation for
                                  Reverberant Speech --- The Importance of
                                  Energetic, Temporal, and Spatial
                                  Information  . . . . . . . . . . . . . . 1298--1309
            Changsheng Quan and   
                     Xiaofei Li   SpatialNet: Extensively Learning Spatial
                                  Information for Multichannel Joint
                                  Speech Separation, Denoising and
                                  Dereverberation  . . . . . . . . . . . . 1310--1323
               Matthew Baas and   
                  Herman Kamper   Disentanglement in a GAN for
                                  Unconditional Speech Synthesis . . . . . 1324--1335
                    Xian Li and   
                  Nian Shao and   
                     Xiaofei Li   Self-Supervised Audio Teacher-Student
                                  Transformer for Both Clip-Level and
                                  Frame-Level Tasks  . . . . . . . . . . . 1336--1351
                 Yifan Chen and   
              Gaofeng Cheng and   
                Runyan Yang and   
             Pengyuan Zhang and   
                   Yonghong Yan   Interrelate Training and Clustering for
                                  Online Speaker Diarization . . . . . . . 1352--1364
                 Sheng Feng and   
               Xiaoqian Zhu and   
                     Shuqing Ma   Masking Hierarchical Tokens for
                                  Underwater Acoustic Target Recognition
                                  With Self-Supervised Learning  . . . . . 1365--1379
              Yangyang Zhao and   
                    Kai Yin and   
                Zhenyu Wang and   
              Mehdi Dastani and   
                    Shihan Wang   Decomposed Deep $Q$-Network for Coherent
                                  Task-Oriented Dialogue Policy Learning   1380--1391
             Jayneel Parekh and   
             Sanjeel Parekh and   
         Pavlo Mozharovskyi and   
               Gaël Richard and   
           Florence d'Alché-Buc   Tackling Interpretability in Audio
                                  Classification Networks With
                                  Non-negative Matrix Factorization  . . . 1392--1405
               Xiuying Chen and   
                   Shen Gao and   
                 Mingzhe Li and   
               Qingqing Zhu and   
                    Xin Gao and   
               Xiangliang Zhang   Write Summary Step-by-Step: a Pilot
                                  Study of Stepwise Summarization  . . . . 1406--1415
               Changkai Lin and   
               Hongju Cheng and   
                  Qiang Rao and   
                      Yang Yang   M$^3$SA: Multimodal Sentiment Analysis
                                  Based on Multi-Scale Feature Extraction
                                  and Multi-Task Learning  . . . . . . . . 1416--1429
             Rui-Chen Zheng and   
                    Yang Ai and   
                  Zhen-Hua Ling   Incorporating Ultrasound Tongue Images
                                  for Audio-Visual Speech Enhancement  . . 1430--1444
             Ritujoy Biswas and   
             Karan Nathwani and   
                  Vinayak Abrol   Statistically Guided Near-End Speech
                                  Intelligibility Improvement Through
                                  Voice Transformation and Transfer
                                  Learning . . . . . . . . . . . . . . . . 1445--1456
                 Linhui Sun and   
                  Shuo Yuan and   
                 Aifei Gong and   
                     Lei Ye and   
                 Eng Siong Chng   Dual-Branch Modeling Based on
                                  State-Space Model for Speech Enhancement 1457--1467
            Alkis Koudounas and   
              Eliana Pastor and   
         Giuseppe Attanasio and   
            Vittorio Mazzia and   
              Manuel Giollo and   
             Thomas Gueudre and   
                Elisa Reale and   
              Luca Cagliero and   
              Sandro Cumani and   
             Luca de Alfaro and   
              Elena Baralis and   
                Daniele Amberti   Towards Comprehensive Subgroup
                                  Performance Analysis in Speech Models    1468--1480
              Wenmeng Xiong and   
              Changchun Bao and   
                  Jing Zhou and   
                Maoshen Jia and   
                  José Picheral   Joint DOA Estimation and Dereverberation
                                  Based on Multi-Channel Linear Prediction
                                  Filtering and Azimuth Sparsity . . . . . 1481--1493
              Yehav Alkaher and   
                   Israel Cohen   Howling Detection and Gain Control for
                                  Speech Reinforcement in a Noisy Car
                                  Cabin Environment  . . . . . . . . . . . 1494--1505
                  Xinfa Zhu and   
                     Yi Lei and   
                     Tao Li and   
              Yongmao Zhang and   
               Hongbin Zhou and   
                    Heng Lu and   
                        Lei Xie   METTS: Multilingual Emotional
                                  Text-to-Speech by Cross-Speaker and
                                  Cross-Lingual Emotion Transfer . . . . . 1506--1518
            Myeonghun Jeong and   
                Minchan Kim and   
            Byoung Jin Choi and   
                Jaesam Yoon and   
                   Won Jang and   
                    Nam Soo Kim   Transfer Learning for Low-Resource,
                                  Multi-Lingual, and Zero-Shot
                                  Multi-Speaker Text-to-Speech . . . . . . 1519--1530
                  Jiadi Yao and   
                   Hong Luo and   
                     Jun Qi and   
                 Xiao-Lei Zhang   Interpretable Spectrum Transformation
                                  Attacks to Speaker Recognition Systems   1531--1545
                 Xiang Chen and   
                     Lei Li and   
                   Yuqi Zhu and   
                Shumin Deng and   
                Chuanqi Tan and   
                  Fei Huang and   
                     Luo Si and   
               Ningyu Zhang and   
                    Huajun Chen   Sequence Labeling as Non-Autoregressive
                                  Dual-Query Set Generation  . . . . . . . 1546--1558
                    Lei Liu and   
                     Li Liu and   
                     Haizhou Li   Computation and Parameter Efficient
                                  Multi-Modal Fusion Transformer for Cued
                                  Speech Recognition . . . . . . . . . . . 1559--1572
       Adrián Barahona-Ríos and   
                    Tom Collins   NoiseBandNet: Controllable Time-Varying
                                  Neural Synthesis of Sound Effects Using
                                  Filterbanks  . . . . . . . . . . . . . . 1573--1585
                Siyuan Wang and   
                Zhongyu Wei and   
                 Jiarong Xu and   
                 Taishan Li and   
                     Zhihao Fan   Unifying Structure Reasoning and
                                  Language Pre-Training for Complex
                                  Reasoning Tasks  . . . . . . . . . . . . 1586--1595
                 Yijing Chu and   
                 Sipei Zhao and   
                   Feng Niu and   
             Yongzheng Dong and   
                    Yuezhe Zhao   A New Diffusion Filtered-$X$ Affine
                                  Projection Algorithm: Performance
                                  Analysis and Application in Windy
                                  Environment  . . . . . . . . . . . . . . 1596--1608
                  Yuquan Le and   
                   Zhe Quan and   
                Jiawei Wang and   
                     Da Cao and   
                       Kenli Li   $ R^2 $: a Novel Recall & Ranking
                                  Framework for Legal Judgment Prediction  1609--1622
             Xiaotong Jiang and   
                 Ruirui Bai and   
             Zhongqing Wang and   
                   Guodong Zhou   Cross-Domain Aspect-Based Sentiment
                                  Classification With Tripartite Graph
                                  Modeling . . . . . . . . . . . . . . . . 1623--1635
             Zhengyang Chen and   
                   Bing Han and   
                 Shuai Wang and   
                    Yanmin Qian   Attention-Based Encoder-Decoder
                                  End-to-End Neural Diarization With
                                  Embedding Enhancer . . . . . . . . . . . 1636--1649
              Chenfeng Miao and   
               Qingying Zhu and   
              Minchuan Chen and   
                     Jun Ma and   
               Shaojun Wang and   
                      Jing Xiao   EfficientTTS 2: Variational End-to-End
                                  Text-to-Speech Synthesis and Voice
                                  Conversion . . . . . . . . . . . . . . . 1650--1661
                Orel Peretz and   
                   Israel Cohen   Constant Elevation-Beamwidth Beamforming
                                  With Concentric Ring Arrays  . . . . . . 1662--1672
                Zhibin Quan and   
               Chi-Man Vong and   
                 Weili Zeng and   
                    Wankou Yang   The MorPhEMe Machine: an Addressable
                                  Neural Memory for Learning
                                  Knowledge-Regularized Deep
                                  Contextualized Chinese Embedding . . . . 1673--1686
                 Lijian Gao and   
                 Qirong Mao and   
                      Ming Dong   On Local Temporal Embedding for
                                  Semi-Supervised Sound Event Detection    1687--1698
                Xuehao Zhou and   
             Mingyang Zhang and   
                    Yi Zhou and   
                Zhizheng Wu and   
                     Haizhou Li   Accented Text-to-Speech Synthesis With
                                  Limited Data . . . . . . . . . . . . . . 1699--1711
           Vinay Kothapally and   
              John H. L. Hansen   Monaural Speech Dereverberation Using
                                  Deformable Convolutional Networks  . . . 1712--1723
                Taihui Wang and   
                Feiran Yang and   
                       Jun Yang   Multichannel Linear Prediction-Based
                                  Speech Dereverberation Considering
                                  Sparse and Low-Rank Priors . . . . . . . 1724--1735
            Saurabh Kataria and   
             Jesús Villalba and   
    Laureano Moro-Velázquez and   
              Piotr Żelasko and   
                    Najim Dehak   Time-Domain Speech Super-Resolution With
                                  GAN Based Modeling for Telephony Speaker
                                  Verification . . . . . . . . . . . . . . 1736--1749
             Marco Olivieri and   
                Amy Bastine and   
              Mirco Pezzoli and   
            Fabio Antonacci and   
        Thushara Abhayapala and   
                  Augusto Sarti   Acoustic Imaging With Circular
                                  Microphone Array: a New Approach for
                                  Sound Field Analysis . . . . . . . . . . 1750--1761
                Tengfei Liu and   
                  Yongli Hu and   
                 Junbin Gao and   
                Yanfeng Sun and   
                     Baocai Yin   Hierarchical Multi-Granularity
                                  Interaction Graph Convolutional Network
                                  for Long Document Classification . . . . 1762--1775
          Douglas O'Shaughnessy   Review of Methods for Automatic Speaker
                                  Verification . . . . . . . . . . . . . . 1776--1789
          Etienne Thuillier and   
               Craig T. Jin and   
                  Vesa Välimäki   HRTF Interpolation Using a Spherical
                                  Neural Process Meta-Learner  . . . . . . 1790--1802
                   Xun Gong and   
                      Yu Wu and   
                   Jinyu Li and   
                 Shujie Liu and   
                   Rui Zhao and   
                   Xie Chen and   
                    Yanmin Qian   Advanced Long-Content Speech Recognition
                                  With Factorized Neural Transducer  . . . 1803--1815
           Yoshiki Masuyama and   
              Kouei Yamaoka and   
             Takao Kawamura and   
                   Nobutaka Ono   Efficient Joint Optimization of Sampling
                                  Rate Offsets Using Entire Multichannel
                                  Signal . . . . . . . . . . . . . . . . . 1816--1828
              Takaaki Saeki and   
                Soumi Maiti and   
                 Xinjian Li and   
            Shinji Watanabe and   
       Shinnosuke Takamichi and   
             Hiroshi Saruwatari   Text-Inductive Graphone-Based Language
                                  Adaptation for Low-Resource Speech
                                  Synthesis  . . . . . . . . . . . . . . . 1829--1844
               Yingming Gao and   
             Peter Birkholz and   
                          Ya Li   Articulatory Copy Synthesis Based on the
                                  Speech Synthesizer VocalTractLab and
                                  Convolutional Recurrent Neural Networks  1845--1858
              Théo Mariotte and   
            Anthony Larcher and   
           Silvio Montrésor and   
               Jean-Hugh Thomas   Channel-Combination Algorithms for
                                  Robust Distant Voice Activity and
                                  Overlapped Speech Detection  . . . . . . 1859--1872
     Luciana M. X. de Souza and   
            Márcio H. Costa and   
           Renata Coelho Borges   Envelope-Based Multichannel Noise
                                  Reduction for Cochlear Implant
                                  Applications . . . . . . . . . . . . . . 1873--1884
                 Linjian Li and   
                     Yi Cai and   
                         Xin Wu   Unsupervised Disentanglement Learning
                                  Model for Exemplar-Guided Paraphrase
                                  Generation . . . . . . . . . . . . . . . 1885--1900
                  Amir Ivry and   
               Israel Cohen and   
                 Baruch Berdugo   A User-Centric Approach for Deep
                                  Residual-Echo Suppression in Double-Talk 1901--1914
                 Geng Zhang and   
                    Jin Liu and   
              Guangyou Zhou and   
               Kunsong Zhao and   
                 Zhiwen Xie and   
                       Bo Huang   Question-Directed Reasoning With
                                  Relation-Aware Graph Attention Network
                                  for Complex Question Answering Over
                                  Knowledge Graph  . . . . . . . . . . . . 1915--1927
                     Yu Yao and   
                  Peng Yang and   
             Guangzhen Zhao and   
                    Guoshun Yin   KGAgent: Learning a Deep Reinforced
                                  Agent for Keyphrase Generation . . . . . 1928--1940
                 Jiahong Li and   
                  Chenda Li and   
                   Yifei Wu and   
                    Yanmin Qian   Unified Cross-Modal Attention: Robust
                                  Audio-Visual Speech Recognition and
                                  Beyond . . . . . . . . . . . . . . . . . 1941--1953
               Mieszko Fraś and   
               Konrad Kowalczyk   Reverberant Source Separation Using NTF
                                  With Delayed Subsources and Spatial
                                  Priors . . . . . . . . . . . . . . . . . 1954--1967
                   Rui Wang and   
                      Li Li and   
                    Tomoki Toda   Dual-Channel Target Speaker Extraction
                                  Based on Conditional Variational
                                  Autoencoder and Directional Information  1968--1979