The Speech Signal Processing Workshop is an annual event held by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). The meeting has featured distinguished experts and scholars from around the world, and this year’s invited speakers include:
Dr. Tan Lee, The Chinese University of Hong Kong, Hong Kong
Dr. Yun-Nung Chen, National Taiwan University, Taiwan
Dr. Chi-Te Wang, Far Eastern Memorial Hospital, Taiwan
Dr. Fei Chen, Southern University of Science and Technology of China, China
Dr. Yen-Fu Cheng, Taipei Veterans General Hospital, Taiwan
Dr. Hantao Huang, MediaTek, Taiwan
Dr. Syu-Siang Wang, Research Center for Information Technology Innovation, Academia Sinica
The workshop will cover a wide variety of research topics in speech signal processing. Whether you come from industry or academia, if you are interested in speech processing, natural language processing, or music processing, this workshop is an event that you should not miss.
Because of the COVID-19 pandemic and its potential impacts, we have organized an online workshop to ensure the well-being of all participants. (Guidelines) Join us to experience the grand event!
Deep-Encoding Multimodal Techniques with Visual and Acoustic Annotation for Image Caption Generation (Year 1) - Jui-Feng Yeh Professor, NCYU
Spoken Multiple-Choice Question Answering Using Multimodal Convolutional Neural Networks - Kuan-Yu Chen Professor, NTUST
Towards Unsupervised Learning: Achievement on Unsupervised Speech Recognition - Hung-Yi Lee Professor, EE, NTU
A Binaural Auditory Scene Analysis Model Based on Deep Neural Network Perceptual Models (1/3) - Taishih Chi Professor, ECE, NCTU
EEG-based Wavelet Analysis for Epilepsy Detection on Stroke Patients - Lung-Hao Lee Professor, EE, NCU
A Spoken-Interaction Home Companion and Recommendation System for Senior Citizens - Chung-Hsien Wu Professor, CSIE, NCKU
Development of Text-to-Speech System Based on Deep Learning Technologies - Chen-Yu Chiang Assistant Professor, CE, NTU
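Development of Advanced Speech-Enabled Applications Based on Deep Learning: Automatic Transcription, Summarization, Corpus Construction, and Content Retrieval for Multilingual TV and Radio Programs - Yuan-Fu Liao Associate Professor, NTUT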
Voice Conversion and its Applications - Prof. Hsin-Min Wang, Academia Sinica
Developing Novel Objective Functions and Model Simplification Techniques for Deep-Learning-Based Speech Enhancement Systems - Prof. Yu Tsao, Research Center for Information Technology Innovation, Academia Sinica
06/19 Registration and payment.
08/02 Registration deadline.
08/03 Payment deadline.
08/07 SWS 2020!
Department of Biomedical Engineering, National Yang-Ming University
Association for Computational Linguistics and Chinese Language Processing
Research Center for Information Technology Innovation, Academia Sinica
Department of Information Technology & Communication, Shih Chien University
Ministry of Science and Technology
Tan Lee is currently an Associate Professor at the Department of Electronic Engineering, the Chinese University of Hong Kong (CUHK). He has been working on speech and language related research for over 20 years. His research covers spoken language technologies, speech enhancement and separation, audio and music processing, speech and language rehabilitation, and neurological basis of speech and language. He led the effort on developing Cantonese-focused spoken language technologies that have been widely licensed for industrial applications. His current work is focused on applying signal processing and machine learning methods to atypical speech and language that are related to different kinds of human communication and cognitive disorders. He is an Associate Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing and the EURASIP Journal on Advances in Signal Processing. He is the Vice Chair of ISCA Special Interest Group of Chinese Spoken Language Processing, and an Area Chair in the Technical Programme Committees of INTERSPEECH 2014, 2016 and 2018.
Yun-Nung (Vivian) Chen is currently an assistant professor in the Department of Computer Science & Information Engineering at National Taiwan University. She earned her Ph.D. degree from Carnegie Mellon University. Her research interests focus on spoken dialogue systems, language understanding, natural language processing, and multimodality. She has received Google Faculty Research Awards, the MOST Young Scholar Fellowship, the FAOS Young Scholar Innovation Award, Student Best Paper Awards, and the Distinguished Master Thesis Award. Prior to joining National Taiwan University, she worked in the Deep Learning Technology Center at Microsoft Research Redmond.
Dr. Syu Siang Wang received his Ph.D. degree in 2018 from the Graduate Institute of Communication Engineering, National Taiwan University. His Ph.D. research focused on wavelet speech enhancement and feature compression. He won the PhD Thesis Award at ACLCLP. In addition, he twice served as a summer intern: at the National Institute of Information and Communications Technology, Japan, in Sep. 2015, and at the Department of Electrical and Electronic Engineering, SUSTC, China, in Jun. 2016.
From August 2018 to July 2019, he was a postdoctoral researcher at the MOST Joint Research Center for AI Technology and All Vista Healthcare, where he developed algorithms for healthcare applications. Several papers were published based on this research.
Currently, he is a postdoctoral researcher at the Research Center for Information Technology Innovation, Academia Sinica. His research interests include speech and speaker recognition, acoustic modeling, audio coding, and bio-signal processing.
Fei Chen received the B.Sc. and M.Phil. degrees from the Department of Electronic Science and Engineering, Nanjing University in 1998 and 2001, respectively, and the Ph.D. degree from the Department of Electronic Engineering, The Chinese University of Hong Kong in 2005. He continued his research as a postdoctoral fellow and senior research fellow at the University of Texas at Dallas and The University of Hong Kong, and joined Southern University of Science and Technology (SUSTech) as a faculty member in 2014. Dr. Chen leads the speech processing research group at SUSTech, with a research focus on speech perception, speech intelligibility modeling, speech enhancement, and assistive hearing technology. He has published over 80 journal papers and over 80 conference papers in venues including IEEE journals and conferences, Interspeech, and the Journal of the Acoustical Society of America. He received the best presentation award at the 9th Asia Pacific Conference of Speech, Language and Hearing, and a 2011 National Organization for Hearing Research Foundation Research Award in the United States. Dr. Chen now serves as an associate editor or editorial board member of "Frontiers in Psychology", "Biomedical Signal Processing and Control", and "Physiological Measurement".
Yen-Fu Cheng is a surgeon-scientist at the Department of Medical Research and director of research and attending doctor at the Department of Otolaryngology-Head and Neck Surgery, Taipei Veterans General Hospital. He is also an adjunct assistant professor at the Institute of Brain Science/Faculty of Medicine, National Yang-Ming University. He is currently the Principal Investigator of the Laboratory of Auditory Physiology and Genetic Medicine.
Yen-Fu’s research focuses on auditory neuroscience and clinical otology. In basic research, he is dedicated to applying cutting-edge gene transfer and gene editing methods to understand and develop therapies for inner ear disorders. In clinical research, he is interested in using state-of-the-art methods to approach clinical otology issues, such as next-generation sequencing for genetic medicine and artificial intelligence for hearing-related diseases.
Yen-Fu received his medical degree from Taipei Medical University and his doctoral degree from Massachusetts Institute of Technology, where he studied Speech and Hearing Bioscience and Technology at the Harvard-MIT Division of Health Sciences and Technology. He was a post-doctoral research fellow at Harvard Medical School before starting his lab at VGH-TPE/NYMU.
Hantao Huang (S’14) received the B.S. and Ph.D. degrees from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2013 and 2018, respectively. Since 2018, he has been a Staff Engineer with MediaTek, Singapore, where he is involved in natural language processing algorithms, neural network compression and quantization for edge devices. His current research interests include speech recognition, machine-learning algorithms, and low power systems.
Dr. Chi-Te Wang received his MD degree from National Taiwan University, Taipei, Taiwan, in 2003. After resident training from 2003 to 2008, he joined Far Eastern Memorial Hospital as an attending physician. He received his PhD degree from the Institute of Epidemiology and Preventive Medicine at National Taiwan University in 2014. During his professional career, he visited Mount Sinai Hospital (NYC, 2009), Mayo Clinic (Arizona, 2012), Isshiki voice center (Kyoto, 2015), UC Davis voice and swallow center (Sacramento, 2018), and UCSF voice and swallow center (San Francisco, 2018) for continued exposure to expert practice. He is a corresponding member of the American Laryngological Society and a council member of the Taiwan Otolaryngological Society and Taiwan Voice Society. He has wide clinical and academic interests and has published a dozen papers in different fields, including phonosurgery, automatic detection and classification of voice disorders, real-time monitoring of phonation, and telepractice. He is the inventor of multiple international patents on voice detection, classification, and treatment. He co-hosted the Big Data Cup Challenge at the 2018 and 2019 IEEE International Conference on Big Data. He won the Society for Promotion of International Oto-Rhino-Laryngology (SPIO) Award in 2015, the Best Synergy Award of Far Eastern Group in 2018, and the National Innovation Award of Taiwan in 2019.
Time | Topic | Speaker | Host |
---|---|---|---|
08:30 - 09:00 | Check-in | | |
09:00 - 09:10 | Opening Remarks | NYMU President Steve H. S. Kuo; NCTU President Sin-Horng Chen | - |
09:10 - 09:50 | Deep Learning Approaches to Automatic Assessment of Speech and Language Impairment | Prof. Tan Lee | Prof. Hsin-Min Wang |
09:50 - 10:00 | Intermission | | |
10:00 - 10:40 | Towards Superhuman Conversational AI | Prof. Yun-Nung (Vivian) Chen | Prof. Yu Tsao |
10:40 - 10:50 | Intermission | | |
10:50 - 11:30 | Single- and multi-channel Speech Enhancement System | Dr. Syu Siang Wang | Dr. Li L P-H |
11:30 - 13:30 | MOST Outcomes Presentation | | |
13:30 - 14:10 | Minimum Acoustic Information Required for an Intelligible Speech | Prof. Fei Chen | Prof. Chi-Chun (Jeremy) Lee |
14:10 - 14:20 | Intermission | | |
14:20 - 15:00 | A New Era of Otology and Hearing Research: NGS, CRISPR, App, AI and Beyond | Dr. Yen-Fu Cheng | Dr. Wen-Huei Liao |
15:00 - 15:10 | Intermission | | |
15:10 - 15:50 | Make a Power-efficient Voice UI on Edge Devices | Dr. Hantao Huang | Prof. Yuan-Fu Liao |
15:50 - 16:00 | Intermission | | |
16:00 - 16:40 | Ambulatory Phonation Monitoring Using Wireless Microphone Based on Energy Envelope | Dr. Chi-Te Wang | Prof. Shih-Hau Fang |
16:40 - 17:00 | Close | Prof. Kun-Ching Wang / Prof. Ying-Hui Lai | - |
Speech is a natural and preferred means of expressing one’s thoughts and emotions for communication. Speech and language impairments negatively impact the daily life of a large population worldwide. Speech impairments are manifested in atypical articulation and phonation, while language impairments can be present across multiple linguistic levels in the use of spoken or written language. Timely and reliable assessment of the type and severity of impairment is crucial to effective treatment and rehabilitation. Conventionally, speech assessment is carried out by professional speech and language pathologists (SLPs). In view of the shortage of qualified SLPs with relevant linguistic and cultural background, objective assessment techniques based on acoustical signal analysis and machine learning models are expected to play an increasingly important role in assisting clinical assessment. This presentation will cover a series of our recent studies on applying deep learning models to automatic assessment of different types of speech and language impairments. The types of impairments that we have tackled include voice disorder in adults, phonology and articulation disorder in children, and neurological disorder in elderly people. All of our work focuses on spoken Cantonese. The use of Siamese networks and auto-encoder models has been investigated to address the challenges related to the scarcity of training speech and the absence of reliable labels. The findings from attempting an end-to-end approach to speech assessment will also be shared.
Even though conversational systems have attracted a lot of attention recently, current systems sometimes fail due to errors from different components. This talk presents potential directions for improvement: 1) we first focus on learning language embeddings specifically for practical scenarios for better robustness, and 2) we then propose a novel learning framework for natural language understanding and generation built on duality for better scalability. Both directions enhance the robustness and scalability of conversational systems and show potential for guiding future research.
Real-world environments always contain stationary and/or time-varying noises that are received together with speech signals by recording devices. The received noises inevitably degrade the performance of human-human and human-machine interfaces, and this issue has attracted significant attention over the years. To address this issue, an important front-end speech process, namely speech enhancement, extracts clean components from noisy input and can thereby improve the voice quality and intelligibility of noise-deteriorated speech. Speech enhancement systems can be split into two categories in terms of their physical configuration: single-channel and multi-channel. In a single-channel speech enhancement system, the speech waveform is recorded from a single microphone and then enhanced by a system derived from the temporal information of the input. In a multi-channel speech enhancement system, multiple microphones are used to record the input speech, and the system is designed by simultaneously exploiting the spatial diversity and temporal structures of the received signals. In this talk, we present our recent research achievements using machine learning and signal processing to improve speech perception for both configurations.
The speech signal carries a lot of redundant information for speech understanding, and many studies have shown that the loss of some acoustic information does not significantly affect speech intelligibility if important acoustic information is preserved. Due to their hearing loss, hearing-impaired listeners are unable to recognize some acoustic information (e.g., temporal fine structure). Hence, studying the minimum acoustic information required for intelligible speech in different listening environments could guide the design of novel assistive hearing technologies. In this talk, I will first introduce early work on the relative importance of commonly used acoustic cues for speech intelligibility, particularly work based on a vocoder model for speech intelligibility. Then, I will present recent studies towards reconstructing intelligible speech from cortical EEG signals, including Mandarin tone imagery and speech reconstruction.
The fields of clinical otology and hearing research are advancing at the forefront of innovation in medicine and technology. Promising progress in genetic medicine and digital technology has started to change traditional medical and hearing research. Next-generation sequencing, novel gene therapy vectors, CRISPR-Cas9 gene editing technologies, mobile-phone apps, and artificial intelligence all generate enormous creative energy. In this talk, I will introduce how these revolutionary technologies change physicians’ practice and research.
As privacy concerns grow, the voice user interface (UI) is in transition from the cloud to the edge device. However, deploying a neural-network-based voice/language model on edge devices with efficient power consumption is very challenging. In this talk, we will first introduce MediaTek NeuroPilot, which tackles this challenge at the platform level. Then, more specifically, we investigate the challenge from the algorithm perspective, including algorithm trends and deployment opportunities. Finally, we show some preliminary results on speech recognition and natural language understanding.
Voice disorders mainly result from chronic overuse or abuse, particularly for teachers or other occupational voice users. Previous studies have proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience associated with taping and wiring, and the lack of real-time processing, have limited its daily application.
In 2015, we founded a research group with experts from National Yang-Ming University, Yuan Ze University and Far Eastern Memorial Hospital. We proposed a system using a wireless microphone for real-time ambulatory voice monitoring. We invited 10 teachers to participate in the pilot study. We designed an adaptive threshold (AT) function to detect the presence of speech based on the energy envelope. All participants wore a wireless microphone during a teaching class (around 40-60 minutes) in a quiet classroom (background noise < 55 dB SPL). We developed software for manually labeling speech segments according to the time and frequency domains. We randomly selected 25 utterances (10 s each) from the recorded audio files to calculate the coefficients of the AT function via a genetic algorithm. Another five random utterances were used to test the accuracy of the ASD system, using manually labeled data as the ground truth. We measured the phonation ratio (speech frames / total frames) and the length of speech segments as proxies for the phonation habits of the users. We also mimicked scenarios with noisy backgrounds by manually mixing 4 different types of noise into the original recordings. An adjuvant noise reduction function using the log-MMSE algorithm was applied to counteract the effect of noise on detection accuracy.
The study results exhibited detection accuracy (for speech) ranging from 81% to 94%. Subsequent analyses revealed a phonation ratio between 50% and 78%, with most phonation segments shorter than 10 s. Although the presence of background noise reduced the accuracy of the ASD system (25% to 79%), the adjuvant noise reduction function can effectively improve the accuracy by up to 45.8%, especially under stable noise (e.g., white noise).
This study demonstrated good detection accuracy for the proposed system. Preliminary results for phonation ratio and speech segments were all comparable to those of previous research. Although the wireless microphone was susceptible to background noise, an additional noise reduction function can overcome this limitation. These results indicate that the proposed system can be applied to ambulatory voice monitoring for occupational voice users.
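The energy-envelope detection and phonation-ratio measurement described in this abstract can be illustrated with a small sketch. This is not the authors' system: the threshold rule used here (a running noise-floor estimate scaled by a constant), the parameters `alpha` and `k`, and every function name are assumptions made for illustration. The abstract's AT coefficients were fitted with a genetic algorithm, and that step is omitted.

```python
import math
import random


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (the energy envelope)."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_speech(energies, alpha=0.95, k=4.0):
    """Flag a frame as speech when its energy exceeds k times a running noise floor.

    alpha and k are hypothetical coefficients chosen by hand for this sketch.
    The noise floor is only updated on frames judged to be non-speech.
    """
    noise_floor = energies[0]
    flags = []
    for energy in energies:
        is_speech = energy > k * noise_floor
        if not is_speech:
            noise_floor = alpha * noise_floor + (1.0 - alpha) * energy
        flags.append(is_speech)
    return flags


def phonation_ratio(flags):
    """Speech frames divided by total frames, as defined in the abstract."""
    return sum(flags) / len(flags)


def speech_segment_lengths(flags, frame_sec):
    """Duration in seconds of each run of consecutive speech frames."""
    lengths, run = [], 0
    for flag in flags:
        if flag:
            run += 1
        elif run:
            lengths.append(run * frame_sec)
            run = 0
    if run:
        lengths.append(run * frame_sec)
    return lengths


# Synthetic demo: 3 s of low-level noise with a 200 Hz tone in the middle second.
random.seed(0)
SAMPLE_RATE = 8000
FRAME_LEN = 160  # 20 ms frames
signal = []
for n in range(3 * SAMPLE_RATE):
    sample = random.gauss(0.0, 0.01)
    if SAMPLE_RATE <= n < 2 * SAMPLE_RATE:
        sample += 0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    signal.append(sample)

energies = frame_energies(signal, FRAME_LEN)
flags = detect_speech(energies)
ratio = phonation_ratio(flags)
segments = speech_segment_lengths(flags, FRAME_LEN / SAMPLE_RATE)
print("phonation ratio:", round(ratio, 3))
print("segment lengths (s):", [round(s, 2) for s in segments])
```

A real recording would need noise reduction before this step, since a fixed multiple of the noise floor degrades quickly under non-stationary noise, which matches the accuracy drop the abstract reports for noisy backgrounds.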
 | ACLCLP Member | Non-Member |
---|---|---|
Full Registration | NT$200 | NT$300 |
Student | Free | NT$100 |
Sponsor | Free | |
Important Notes | Online registration: now through 8/02 (Sunday). 1. Post Office: 2. Credit Card: payment must be made before 8/03. | |
No.155, Sec.2, Linong Street, Taipei, 112 Taiwan, R.O.C.
200 University Road, Neimen, Kaohsiung 84550 Taiwan, R.O.C.
Research Center for Information Technology Innovation, Academia Sinica
128, Sec. 2, Academia Rd., Nangang Dist., Taipei 11529 Taiwan, R.O.C.
Institute of Information Science, Academia Sinica
02-2788-3799 #1714,1507
Research Center for Information Technology Innovation, Academia Sinica
02-2787-2300 #2787-2390
Cheng Hsin General Hospital
02-2826-4400
Department of Electrical Engineering, National Tsing Hua University
03-516-2439
Department of Otorhinolaryngology-Head and Neck Surgery, Taipei Veterans General Hospital
Department of Electronic Engineering, National Taipei University of Technology
02-2771-2171 #2247
Department of Electrical Engineering, Yuan Ze University
03-463-8800 #7125
For any questions, please email or call
02-2826-7000 #5491
Association for Computational Linguistics and Chinese Language Processing
02-2788-3799 #1502