SWS 2020 Speech Signal Processing Workshop

The Speech Signal Processing Workshop (SWS) is an annual event held by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). The workshop has featured distinguished experts and scholars from around the world, and this year’s invited speakers include:

Dr. Tan LEE, The Chinese University of Hong Kong, Hong Kong

Dr. Yun-Nung Chen, National Taiwan University, Taiwan

Dr. Chi-Te Wang, Far Eastern Memorial Hospital, Taiwan

Dr. Fei Chen, Southern University of Science and Technology, China

Dr. Yen-Fu Cheng, Taipei Veterans General Hospital, Taiwan

Dr. Hantao Huang, MediaTek, Taiwan

Dr. Syu-Siang Wang, Research Center for Information Technology Innovation, Academia Sinica, Taiwan

The workshop will cover a wide variety of research topics in speech signal processing. Whether you are from industry or academia, if you are interested in speech processing, natural language processing, or music processing, this workshop is an event you should not miss.

Because of the COVID-19 pandemic and its potential impacts, we have organized the workshop online to ensure the well-being of all participants. (Guidelines) Join us to experience the grand event!

MOST Outcomes Presentation

Deep-Encoding Multimodal Techniques with Visual and Acoustic Annotations for Image Caption Generation (Year 1) - Jui-Feng Yeh, Professor, NCYU

Spoken Multiple-Choice Question Answering Using Multimodal Convolutional Neural Networks - Kuan-Yu Chen, Professor, NTUST

Towards Unsupervised Learning: Achievements in Unsupervised Speech Recognition - Hung-Yi Lee, Professor, EE, NTU

Binaural Auditory Scene Analysis Model Based on a Deep-Neural-Network Perceptual Model (1/3) - Taishih Chi, Professor, ECE, NCTU

EEG-based Wavelet Analysis for Epilepsy Detection in Stroke Patients - Lung-Hao Lee, Professor, EE, NCU

Spoken Interactive Home Companion and Recommendation System for the Elderly - Chung-Hsien Wu, Professor, CSIE, NCKU

Development of Text-to-Speech System Based on Deep Learning Technologies - Chen-Yu Chiang, Assistant Professor, CE, NTU

Development of Advanced Deep-Learning-Based Speech-Enabled Applications: Automatic Transcription, Summary Extraction, Corpus Construction, and Content Retrieval for Multilingual TV and Radio Programs - Yuan-Fu Liao, Associate Professor, NTUT

Voice Conversion and Its Applications - Prof. Hsin-Min Wang, Academia Sinica

Novel Objective Functions and Model Simplification Techniques for Deep-Learning-Based Speech Enhancement Systems - Prof. Yu Tsao, Research Center for Information Technology Innovation, Academia Sinica

Important Dates

06/19 Registration and payment open.

08/02 Registration deadline.

08/03 Payment deadline.

08/07 SWS 2020!

Time and Location

August 07, 2020, Friday

National Yang-Ming University

Online Conference

News

Formosa Speech Recognition Challenge 2020 - Taiwanese ASR

Special Issue "Human Computer Interaction for Intelligent Systems" (Deadline: 15 December 2020)

ROCLING 2020, 32nd Conference on Computational Linguistics and Speech Processing, Sept. 24-26, 2020

Organizers

Department of Biomedical Engineering, National Yang-Ming University

Association for Computational Linguistics and Chinese Language Processing

Co-organizers

Research Center for Information Technology Innovation, Academia Sinica

Department of Information Technology & Communication, Shih Chien University

Ministry of Science and Technology

Keynote Speakers
Learn from these great fellows

Tan Lee

Associate Professor, CUHK

Tan Lee is currently an Associate Professor at the Department of Electronic Engineering, the Chinese University of Hong Kong (CUHK). He has been working on speech and language related research for over 20 years. His research covers spoken language technologies, speech enhancement and separation, audio and music processing, speech and language rehabilitation, and the neurological basis of speech and language. He led the effort on developing Cantonese-focused spoken language technologies that have been widely licensed for industrial applications. His current work focuses on applying signal processing and machine learning methods to atypical speech and language related to different kinds of human communication and cognitive disorders. He is an Associate Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing and the EURASIP Journal on Advances in Signal Processing. He is the Vice Chair of the ISCA Special Interest Group on Chinese Spoken Language Processing and served as an Area Chair on the Technical Programme Committees of INTERSPEECH 2014, 2016 and 2018.

Yun-Nung Chen

Assistant Professor, NTU

Yun-Nung (Vivian) Chen is currently an assistant professor in the Department of Computer Science & Information Engineering at National Taiwan University. She earned her Ph.D. degree from Carnegie Mellon University. Her research interests focus on spoken dialogue systems, language understanding, natural language processing, and multimodality. She received Google Faculty Research Awards, the MOST Young Scholar Fellowship, the FAOS Young Scholar Innovation Award, Student Best Paper Awards, and the Distinguished Master Thesis Award. Prior to joining National Taiwan University, she worked in the Deep Learning Technology Center at Microsoft Research Redmond.

Syu Siang Wang

PhD, Academia Sinica

Dr. Syu Siang Wang received his Ph.D. degree (2018) from the Graduate Institute of Communication Engineering, National Taiwan University. His Ph.D. research focused on wavelet speech enhancement and feature compression, and he won the PhD Thesis Award from ACLCLP. He also completed two internships: at the National Institute of Information and Communications Technology, Japan, in September 2015, and at the Department of Electrical and Electronic Engineering, SUSTC, China, in June 2016.

From August 2018 to July 2019, he was a postdoctoral researcher at the MOST Joint Research Center for AI Technology and All Vista Healthcare, where he worked on developing algorithms for healthcare applications. Several papers were published based on this research.

Currently, he is a postdoctoral researcher at the Research Center for Information Technology Innovation, Academia Sinica. His research interests include speech and speaker recognition, acoustic modeling, audio coding, and bio-signal processing.

Fei Chen

Associate Professor, SUSTC

Fei Chen received the B.Sc. and M.Phil. degrees from the Department of Electronic Science and Engineering, Nanjing University, in 1998 and 2001, respectively, and the Ph.D. degree from the Department of Electronic Engineering, The Chinese University of Hong Kong, in 2005. He continued his research as a postdoctoral researcher and senior research fellow at the University of Texas at Dallas and The University of Hong Kong, and joined the Southern University of Science and Technology (SUSTech) as a faculty member in 2014. Dr. Chen leads the speech processing research group at SUSTech, with a research focus on speech perception, speech intelligibility modeling, speech enhancement, and assistive hearing technology. He has published over 80 journal papers and over 80 conference papers in venues including IEEE journals and conferences, Interspeech, and the Journal of the Acoustical Society of America. He received the best presentation award at the 9th Asia Pacific Conference of Speech, Language and Hearing and a 2011 National Organization for Hearing Research Foundation Research Award in the United States. Dr. Chen currently serves as an associate editor or editorial board member of Frontiers in Psychology, Biomedical Signal Processing and Control, and Physiological Measurement.

Yen-Fu Cheng

PhD, Taipei Veterans General Hospital

Yen-Fu Cheng is a surgeon-scientist at the Department of Medical Research and director of research and attending doctor at the Department of Otolaryngology-Head and Neck Surgery, Taipei Veterans General Hospital. He is also an adjunct assistant professor at the Institute of Brain Science/Faculty of Medicine, National Yang-Ming University. He is currently the Principal Investigator of the Laboratory of Auditory Physiology and Genetic Medicine.

Yen-Fu’s research focuses on auditory neuroscience and clinical otology. In basic research, he is dedicated to applying cutting-edge gene transfer and gene editing methods to understand and develop therapies for inner ear disorders. In clinical research, he is interested in using state-of-the-art methods to address clinical otology issues, such as next-generation sequencing for genetic medicine and artificial intelligence for hearing-related diseases.

Yen-Fu received his medical degree from Taipei Medical University and his doctoral degree from the Massachusetts Institute of Technology, where he studied Speech and Hearing Bioscience and Technology at the Harvard-MIT Division of Health Sciences and Technology. He was a postdoctoral research fellow at Harvard Medical School before he started his lab at VGH-TPE/NYMU.

Hantao Huang

PhD, MediaTek, Taiwan

Hantao Huang (S’14) received the B.S. and Ph.D. degrees from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2013 and 2018, respectively. Since 2018, he has been a Staff Engineer with MediaTek, Singapore, where he is involved in natural language processing algorithms, neural network compression and quantization for edge devices. His current research interests include speech recognition, machine-learning algorithms, and low power systems.

Chi-Te Wang

PhD, Far Eastern Memorial Hospital

Dr. Chi-Te Wang received his MD degree from National Taiwan University, Taipei, Taiwan, in 2003. After residency training from 2003 to 2008, he joined Far Eastern Memorial Hospital as an attending physician. He received his PhD degree from the Institute of Epidemiology and Preventive Medicine at National Taiwan University in 2014. During his professional career, he visited Mount Sinai Hospital (NYC, 2009), Mayo Clinic (Arizona, 2012), Isshiki Voice Center (Kyoto, 2015), UC Davis Voice and Swallow Center (Sacramento, 2018), and UCSF Voice and Swallow Center (San Francisco, 2018) to continue broadening his clinical expertise. He is a corresponding member of the American Laryngological Society and a council member of the Taiwan Otolaryngological Society and the Taiwan Voice Society. He has wide clinical and academic interests and has published a dozen papers in different fields, including phonosurgery, automatic detection and classification of voice disorders, real-time monitoring of phonation, and telepractice. He is the inventor of multiple international patents on voice detection, classification, and treatment. He co-hosted the Big Data Cup Challenges at the 2018 and 2019 IEEE International Conference on Big Data. He won the Society for Promotion of International Oto-Rhino-Laryngology (SPIO) Award in 2015, the Best Synergy Award of Far Eastern Group in 2018, and the National Innovation Award of Taiwan in 2019.

Agenda

Time	Topic	Speaker	Host
08:30 - 09:00	Check-in
09:00 - 09:10	Opening Remarks	NYMU President Steve H. S. Kuo / NCTU President Sin-Horng Chen	-
09:10 - 09:50	Deep Learning Approaches to Automatic Assessment of Speech and Language Impairment	Prof. Tan Lee	Prof. Hsin-Min Wang
09:50 - 10:00	Intermission
10:00 - 10:40	Towards Superhuman Conversational AI	Prof. Yun-Nung (Vivian) Chen	Prof. Yu Tsao
10:40 - 10:50	Intermission
10:50 - 11:30	Single- and Multi-channel Speech Enhancement System	Dr. Syu Siang Wang	Dr. Li L P-H
11:30 - 13:30	MOST Outcomes Presentation
13:30 - 14:10	Minimum Acoustic Information Required for an Intelligible Speech	Prof. Fei Chen	Prof. Chi-Chun (Jeremy) Lee
14:10 - 14:20	Intermission
14:20 - 15:00	A New Era of Otology and Hearing Research: NGS, CRISPR, App, AI and Beyond	Dr. Yen-Fu Cheng	Dr. Wen-Huei Liao
15:00 - 15:10	Intermission
15:10 - 15:50	Make a Power-efficient Voice UI on Edge Devices	Dr. Hantao Huang	Prof. Yuan-Fu Liao
15:50 - 16:00	Intermission
16:00 - 16:40	Ambulatory Phonation Monitoring Using Wireless Microphone Based on Energy Envelope	Dr. Chi-Te Wang	Prof. Shih-Hau Fang
16:40 - 17:00	Close	Prof. Kun-Ching Wang / Prof. Ying-Hui Lai	-

08:30 ~ 09:00

Check-in

09:00 ~ 09:10

Opening Remarks

NYMU President Steve H. S. Kuo / NCTU President Sin-Horng Chen

09:10 ~ 09:50

Deep Learning Approaches to Automatic Assessment of Speech and Language Impairment

Prof. Tan Lee

Speech is a natural and preferred means of expressing one’s thoughts and emotions for communication purposes. Speech and language impairments negatively impact the daily lives of a large population worldwide. Speech impairments manifest as atypical articulation and phonation, while language impairments can be present across multiple linguistic levels in the use of spoken or written language. Timely and reliable assessment of the type and severity of impairment is crucial to effective treatment and rehabilitation. Conventionally, speech assessment is carried out by professional speech and language pathologists (SLPs). Given the shortage of qualified SLPs with relevant linguistic and cultural backgrounds, objective assessment techniques based on acoustic signal analysis and machine learning models are expected to play an increasingly important role in assisting clinical assessment. This presentation will cover a series of our recent studies on applying deep learning models to the automatic assessment of different types of speech and language impairments. The impairments we have tackled include voice disorders in adults, phonology and articulation disorders in children, and neurological disorders in elderly people. All of our work focuses on spoken Cantonese. Siamese networks and auto-encoder models have been investigated to address the challenges of scarce training speech and the absence of reliable labels. Findings from our attempts at an end-to-end approach to speech assessment will also be shared.

HOST： Prof. Hsin-Min Wang

09:50 ~ 10:00

Intermission

10:00 ~ 10:40

Towards Superhuman Conversational AI

Prof. Yun-Nung (Vivian) Chen

Although conversational systems have attracted a lot of attention recently, current systems sometimes fail due to errors from different components. This talk presents two potential directions for improvement: 1) learning language embeddings specifically for practical scenarios, for better robustness; and 2) a novel learning framework for natural language understanding and generation built on their duality, for better scalability. Both directions enhance the robustness and scalability of conversational systems and point to promising areas for future research.

HOST： Prof. Yu Tsao

10:40 ~ 10:50

Intermission

10:50 ~ 11:30

Single- and Multi-channel Speech Enhancement System

Dr. Syu Siang Wang

Real-world environments always contain stationary and/or time-varying noise that is received together with speech signals by recording devices. This noise inevitably degrades the performance of human-human and human-machine interfaces, and the issue has attracted significant attention over the years. Speech enhancement, an important front-end process that extracts clean components from noisy input, addresses this issue by improving the quality and intelligibility of noise-corrupted speech. Speech enhancement systems can be split into two categories according to their physical configuration: single-channel and multi-channel. In a single-channel system, the speech waveform is recorded with a single microphone and then enhanced by a system derived from the temporal information of the input. In a multi-channel system, the input speech is recorded with multiple microphones, and the system is designed to exploit the spatial diversity and the temporal structure of the received signals simultaneously. In this talk, we present our recent work using machine learning and signal processing to improve speech perception in both configurations.
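The abstract does not say which multi-channel method the speaker uses. As a hedged illustration of what exploiting spatial diversity can mean, the sketch below implements a classic delay-and-sum beamformer, a textbook baseline rather than the system presented in the talk. It assumes the arrival delay of the target speech at each microphone is already known as a whole number of samples; `delay_and_sum` is a hypothetical helper name.

```python
import numpy as np


def delay_and_sum(channels: np.ndarray, delays: list[int]) -> np.ndarray:
    """Classic delay-and-sum beamformer (illustrative baseline).

    channels: array of shape (num_mics, num_samples).
    delays:   assumed-known arrival delay of the target speech at each
              microphone, in whole samples (0 <= d < num_samples).
    """
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples)
    for sig, d in zip(channels, delays):
        # Advance each channel by its delay so the target speech lines up
        # across microphones; the last d samples are left as zeros.
        aligned = np.zeros(num_samples)
        aligned[: num_samples - d] = sig[d:]
        out += aligned
    # Aligned speech adds coherently, while uncorrelated noise partly averages out.
    return out / num_mics


# Usage: a 440 Hz tone reaching four microphones with different delays,
# each channel corrupted by independent white noise.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 440 * t)
delays = [0, 3, 7, 12]
mics = np.stack([np.roll(speech, d) + 0.5 * rng.standard_normal(fs) for d in delays])
enhanced = delay_and_sum(mics, delays)

valid = fs - max(delays)  # samples where every channel contributed
single_mic_noise = np.var(mics[0, :valid] - speech[:valid])
beamformed_noise = np.var(enhanced[:valid] - speech[:valid])
print(f"residual noise variance: single mic {single_mic_noise:.3f}, beamformed {beamformed_noise:.3f}")
```

With uncorrelated noise at each microphone, averaging M aligned channels reduces the residual noise power by roughly a factor of M. Real systems must also estimate the delays (for example from the array geometry) and cope with reverberation and correlated noise, which this sketch ignores.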

HOST： Dr. Li L P-H

11:30 ~ 13:30

Lunch

13:30 ~ 14:10

Minimum Acoustic Information Required for an Intelligible Speech

Prof. Fei Chen

The speech signal carries a lot of redundant information for speech understanding, and many studies have shown that the loss of some acoustic information does not significantly affect speech intelligibility if the important acoustic information is preserved. Because of their hearing loss, hearing-impaired listeners are unable to perceive some acoustic information (e.g., temporal fine structure). Hence, studying the minimum acoustic information required for intelligible speech in different listening environments could guide the design of novel assistive hearing technologies. In this talk, I will first introduce early work on the relative importance of commonly used acoustic cues for speech intelligibility, particularly with a vocoder model for speech intelligibility. Then, I will present recent studies on reconstructing intelligible speech from cortical EEG signals, including Mandarin tone imagery and speech reconstruction.
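The abstract mentions a vocoder model but does not specify it. In intelligibility research a common choice is a noise-excited envelope vocoder, which preserves each frequency band's temporal envelope while discarding its temporal fine structure. The sketch below is a minimal version under assumed settings: idealized FFT band-pass filters, a Hilbert envelope smoothed at a hypothetical 160 Hz cutoff, and illustrative band edges. It is not the speaker's model, and the function names are hypothetical.

```python
import numpy as np


def band_filter(x: np.ndarray, fs: float, lo: float, hi: float) -> np.ndarray:
    """Idealized band-pass filter: zero every FFT bin outside [lo, hi) Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def hilbert_envelope(x: np.ndarray) -> np.ndarray:
    """Magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))


def noise_vocoder(x, fs, band_edges, env_cutoff=160.0, seed=0):
    """Keep each band's temporal envelope; replace its fine structure with noise."""
    rng = np.random.default_rng(seed)
    out = np.zeros(len(x))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = band_filter(x, fs, lo, hi)
        # Smoothed envelope (low-pass at env_cutoff Hz), clipped to stay non-negative.
        env = np.maximum(band_filter(hilbert_envelope(band), fs, 0.0, env_cutoff), 0.0)
        carrier = band_filter(rng.standard_normal(len(x)), fs, lo, hi)
        # Modulate the band-limited noise, then re-filter to keep the energy in band.
        out += band_filter(env * carrier, fs, lo, hi)
    # Match the overall RMS level of the input.
    out_rms = np.sqrt(np.mean(out ** 2))
    if out_rms > 0:
        out *= np.sqrt(np.mean(x ** 2)) / out_rms
    return out


# Usage: vocode a 1 s amplitude-modulated two-tone test signal into 4 bands.
fs = 16000
t = np.arange(fs) / fs
x = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * (
    np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)
)
band_edges = [100.0, 500.0, 1000.0, 2000.0, 4000.0]
y = noise_vocoder(x, fs, band_edges)
```

Varying the number of bands or the envelope cutoff is one way such studies control how much acoustic information survives. A practical implementation would usually use proper IIR or FIR filters rather than whole-signal FFT masking.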

HOST： Prof. Chi-Chun (Jeremy) Lee

14:10 ~ 14:20

Intermission

14:20 ~ 15:00

A New Era of Otology and Hearing Research: NGS, CRISPR, App, AI and Beyond

Dr. Yen-Fu Cheng

The fields of clinical otology and hearing research are advancing at the forefront of innovation in medicine and technology. Promising progress in genetic medicine and digital technology has started to change traditional medical and hearing research. Next-generation sequencing (NGS), novel gene therapy vectors, CRISPR-Cas9 gene editing technologies, mobile-phone apps, and artificial intelligence are all generating enormous creative energy. In this talk, I will introduce how these revolutionary technologies are changing physicians’ practice and research.

HOST： Dr. Wen-Huei Liao

15:00 ~ 15:10

Intermission

15:10 ~ 15:50

Make a Power-efficient Voice UI on Edge Devices

Dr. Hantao Huang

As privacy becomes an increasing concern, voice user interfaces (UIs) are transitioning from the cloud to edge devices. However, deploying a neural-network-based voice/language model on edge devices with efficient power consumption is very challenging. In this talk, we will first introduce MediaTek NeuroPilot, which tackles this challenge at the platform level. Then, more specifically, we investigate the challenge from the algorithm perspective, including algorithm trends and deployment opportunities. Finally, we show some preliminary results on speech recognition and natural language understanding.

HOST： Prof. Yuan-Fu Liao

15:50 ~ 16:00

Intermission

16:00 ~ 16:40

Ambulatory Phonation Monitoring Using Wireless Microphone Based on Energy Envelope

Dr. Chi-Te Wang

Voice disorders mainly result from chronic overuse or abuse, particularly among teachers and other occupational voice users. Previous studies have proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience associated with taping and wiring, and the lack of real-time processing, have limited its daily application.

In 2015, we founded a research group with experts from National Yang-Ming University, Yuan Ze University, and Far Eastern Memorial Hospital. We proposed a system that uses a wireless microphone for real-time ambulatory voice monitoring and invited 10 teachers to participate in a pilot study. We designed an adaptive threshold (AT) function to detect the presence of speech based on the energy envelope. Each participant wore a wireless microphone while teaching a class (around 40-60 minutes) in a quiet classroom (background noise < 55 dB SPL). We developed software for manually labeling speech segments in the time and frequency domains. We randomly selected 25 utterances (10 s each) from the recorded audio files to calculate the coefficients of the AT function via a genetic algorithm. Another five random utterances were used to test the accuracy of the ASD system, with manually labeled data as the ground truth. We measured the phonation ratio (speech frames / total frames) and the length of speech segments as proxies for the users' phonation habits. We also mimicked noisy backgrounds by manually mixing 4 different types of noise into the original recordings. An adjuvant noise reduction function using the log-MMSE algorithm was applied to counteract the effect of noise on detection accuracy.

The results showed speech detection accuracy ranging from 81% to 94%. Subsequent analyses revealed phonation ratios between 50% and 78%, with most phonation segments shorter than 10 s. Although background noise reduced the accuracy of the ASD system (to between 25% and 79%), the adjuvant noise reduction function effectively improved accuracy by up to 45.8%, especially under stationary noise (e.g., white noise).

This study demonstrated good detection accuracy for the proposed system. Preliminary results for the phonation ratio and speech segments were comparable to those of previous research. Although the wireless microphone was susceptible to background noise, the additional noise reduction function can overcome this limitation. These results indicate that the proposed system can be applied to ambulatory voice monitoring for occupational voice users.
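The abstract defines the phonation ratio as speech frames divided by total frames, but it does not give the form of the adaptive threshold (AT) function or its genetic-algorithm-fitted coefficients. The sketch below therefore assumes a simple stand-in: each frame's energy in dB is compared against a local noise floor (the minimum energy within a surrounding window) plus a fixed margin. The 20 ms frame length, the window size, the 10 dB margin, and all function names are hypothetical, not the study's.

```python
import numpy as np


def frame_energy_db(x: np.ndarray, frame_len: int) -> np.ndarray:
    """Energy envelope: mean power of non-overlapping frames, in dB."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)


def detect_speech(energy_db: np.ndarray, margin_db: float = 10.0, half_window: int = 50) -> np.ndarray:
    """Hypothetical adaptive threshold: local noise floor plus a fixed margin.

    The noise floor for each frame is the minimum frame energy within
    +/- half_window frames, so the threshold follows slow changes in background level.
    """
    is_speech = np.zeros(len(energy_db), dtype=bool)
    for i in range(len(energy_db)):
        lo, hi = max(0, i - half_window), min(len(energy_db), i + half_window + 1)
        noise_floor = energy_db[lo:hi].min()
        is_speech[i] = energy_db[i] > noise_floor + margin_db
    return is_speech


def phonation_ratio(is_speech: np.ndarray) -> float:
    """Phonation ratio = speech frames / total frames."""
    return float(np.mean(is_speech))


def segment_durations(is_speech: np.ndarray, frame_sec: float) -> list[float]:
    """Durations (s) of consecutive runs of speech frames."""
    durations, run = [], 0
    for flag in is_speech:
        if flag:
            run += 1
        elif run:
            durations.append(run * frame_sec)
            run = 0
    if run:
        durations.append(run * frame_sec)
    return durations


# Usage: 4 s of synthetic audio, with "speech" (a loud tone) from 1-2 s and 3-3.5 s.
fs, frame_len = 16000, 320  # 20 ms frames
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(4 * fs)
t = np.arange(4 * fs) / fs
voiced = ((t >= 1.0) & (t < 2.0)) | ((t >= 3.0) & (t < 3.5))
x[voiced] += 0.5 * np.sin(2 * np.pi * 440 * t[voiced])

is_speech = detect_speech(frame_energy_db(x, frame_len))
ratio = phonation_ratio(is_speech)
segments = segment_durations(is_speech, frame_len / fs)
print(f"phonation ratio {ratio:.3f}, segments (s): {segments}")
```

In the study, the AT coefficients were fitted by a genetic algorithm against manually labeled data; in a sketch like this one, the margin and window size would be the parameters to tune, and a noise-reduction step such as the log-MMSE function mentioned in the abstract could be applied to the audio before the energy envelope is computed.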

HOST： Prof. Shih-Hau Fang

16:40 ~ 17:00

Close

Prof. Ying-Hui Lai / Prof. Kun-Ching Wang

Registration

	ACLCLP Member	Non-Member
Full Registration	NT$200	NT$300
Student	Free	NT$100
Sponsor	Free
Important Notes	"ACLCLP member" means an active member of the Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會). Registration: online registration is open now until 8/02 (Sunday). Payment: 1. Post Office: Account #19166251, Title: 「中華民國計算語言學學會」; pay before 7/31. 2. Credit card: pay before 8/03.

Contact Us

GENERAL CO-CHAIR

No.155, Sec.2, Linong Street, Taipei, 112 Taiwan, R.O.C.

200 University Road, Neimen, Kaohsiung 84550 Taiwan, R.O.C.

TECHNICAL PROGRAM CO-CHAIRS

Research Center for Information Technology Innovation, Academia Sinica
128, Sec. 2, Academia Rd., Nangang Dist., Taipei 11529 Taiwan, R.O.C.

SESSION CHAIRS

Institute of Information Science, Academia Sinica
02-2788-3799 #1714,1507

Research Center for Information Technology Innovation, Academia Sinica
02-2787-2300 #2787-2390

Cheng Hsin General Hospital

02-2826-4400

Department of Electrical Engineering, National Tsing Hua University
03-516-2439

Department of Otorhinolaryngology-Head and Neck Surgery, Taipei Veterans General Hospital

Department of Electronic Engineering, National Taipei University of Technology
02-2771-2171 #2247

Department of Electrical Engineering, Yuan Ze University
03-463-8800 #7125

WORKSHOP EMAIL

For any questions, please email or call

02-2826-7000 #5491

REGISTRATION

Association for Computational Linguistics and Chinese Language Processing

02-2788-3799 #1502

Sponsors

Research Center for Information Technology Innovation, Academia Sinica

APrevent Medical

Vapor Co., Ltd.

JEN SOUND

Eastern Electronics

IEA Electro-Acoustic Technology
