SWS 2020 Speech Signal Processing Workshop

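The Speech Signal Processing Workshop is an annual academic exchange event organized by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). This year's invited speakers are Prof. Tan Lee (李丹) of the Chinese University of Hong Kong, Prof. Fei Chen (陳霏) of Southern University of Science and Technology, China, Prof. Yun-Nung Chen (陳縕儂) of the Department of Computer Science and Information Engineering at National Taiwan University, Dr. Yen-Fu Cheng (鄭彥甫) of the Department of Medical Research at Taipei Veterans General Hospital, Dr. Hantao Huang (黄瀚韜) of MediaTek, Dr. Syu-Siang Wang (王緒翔) of the Research Center for Information Technology Innovation at Academia Sinica, and Dr. Chi-Te Wang (王棨德) of Far Eastern Memorial Hospital. The talks cover speech signal processing, hardware development for speech technology, natural language processing, and applications of speech in medicine. It is an event not to be missed by experts and scholars in Taiwan's academia and industry who are interested in these areas.

In addition to the talks above, the workshop also hosts presentations of results from National Science Council (國科會) research projects, to promote the sharing of technology and experience between academia and industry and to discuss new research and application directions for the related processing techniques. The workshop is therefore expected to substantially raise the technical level of the domestic digital signal processing industry and to be of great benefit to the advancement of engineering technology.

In response to COVID-19, this year's workshop is held as an online meeting (detailed participation instructions). Everyone is welcome to take part!

Workshop invitations were sent out on 2020/08/06 at 5:00 PM.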

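Ministry of Science and Technology (MOST)
Outcomes Presentation

Prof. 葉瑞峰, Department of Computer Science and Information Engineering, National Chiayi University: Deep-encoding multimodal techniques with visual and acoustic annotation for image caption generation (Year 1)

Prof. 陳冠宇, National Taiwan University of Science and Technology: Neural-network-based language models: innovation, future, and applications

Prof. 李宏毅, Department of Electrical Engineering and Graduate Institute, National Taiwan University: Towards unsupervised speech understanding

Prof. 冀泰石, Department of Electrical Engineering, National Chiao Tung University: A binaural auditory scene analysis model based on deep-neural-network perceptual models (1/3)

Prof. 李龍豪, Department of Electrical Engineering, National Central University: EEG-based wavelet analysis for epilepsy detection in stroke patients

Prof. 吳宗憲, National Cheng Kung University: A spoken interactive home companion and recommendation system for the elderly

Prof. 江振宇, Department of Communication Engineering, National Taipei University: A text-to-speech system developed with deep learning techniques

Prof. 廖元甫, Department of Electronic Engineering, National Taipei University of Technology: Deep-learning-based development of advanced speech-enabled applications: automatic transcription and summary extraction for multilingual TV and radio programs, corpus construction, and content retrieval

Prof. 王新民, Institute of Information Science, Academia Sinica: Voice conversion and its applications

Prof. 曹昱, Research Center for Information Technology Innovation, Academia Sinica: Novel objective functions and model simplification techniques for deep-learning-based speech enhancement systems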

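Important Dates

06/19 Registration and payment open

08/02 Registration deadline

08/03 Payment deadline

08/07 SWS 2020!

Date and Venue

Friday, August 7, 2020

National Yang-Ming University

Online meeting

Organizers

Department of Biomedical Engineering, National Yang-Ming University

Association for Computational Linguistics and Chinese Language Processing (ACLCLP)

Co-organizers

Research Center for Information Technology Innovation, Academia Sinica

Department of Information Technology and Communication, Shih Chien University

Engineering and Technology Promotion Center, Department of Engineering and Technologies, Ministry of Science and Technology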

Keynote Speakers
Learn from these great fellows

Prof. Tan Lee (李丹)

The Chinese University of Hong Kong

Tan Lee is currently an Associate Professor at the Department of Electronic Engineering, the Chinese University of Hong Kong (CUHK). He has been working on speech and language related research for over 20 years. His research covers spoken language technologies, speech enhancement and separation, audio and music processing, speech and language rehabilitation, and neurological basis of speech and language. He led the effort on developing Cantonese-focused spoken language technologies that have been widely licensed for industrial applications. His current work is focused on applying signal processing and machine learning methods to atypical speech and language that are related to different kinds of human communication and cognitive disorders. He is an Associate Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing and the EURASIP Journal on Advances in Signal Processing. He is the Vice Chair of ISCA Special Interest Group of Chinese Spoken Language Processing, and an Area Chair in the Technical Programme Committees of INTERSPEECH 2014, 2016 and 2018.

Prof. Yun-Nung (Vivian) Chen (陳縕儂)

Department of Computer Science and Information Engineering, National Taiwan University

Yun-Nung (Vivian) Chen is currently an assistant professor in the Department of Computer Science & Information Engineering at National Taiwan University. She earned her Ph.D. degree from Carnegie Mellon University. Her research interests focus on spoken dialogue systems, language understanding, natural language processing, and multimodality. She received Google Faculty Research Awards, MOST Young Scholar Fellowship, FAOS Young Scholar Innovation Award, Student Best Paper Awards, and the Distinguished Master Thesis Award. Prior to joining National Taiwan University, she worked in the Deep Learning Technology Center at Microsoft Research Redmond.

Dr. Syu-Siang Wang (王緒翔)

Research Center for Information Technology Innovation, Academia Sinica

Dr. Syu-Siang Wang received the Ph.D. degree in 2018 from the Graduate Institute of Communication Engineering, National Taiwan University. His Ph.D. research was on wavelet speech enhancement and feature compression. He won the PhD Thesis Award at ACLCLP. In addition, he twice served as a summer intern: at the National Institute of Information and Communications Technology, Japan, in Sep. 2015, and at the Department of Electrical and Electronic Engineering, SUSTC, China, in Jun. 2016.

From August 2018 to July 2019, he was a postdoctoral researcher at the MOST Joint Research Center for AI Technology and All Vista Healthcare, where he worked on developing algorithms for healthcare applications. Several papers were published based on his research achievements.

Currently, he is a postdoctoral researcher in the Research Center for Information Technology Innovation, Academia Sinica. His research interests include speech and speaker recognition, acoustic modeling, audio coding, and bio-signal processing.

Prof. Fei Chen (陳霏)

Southern University of Science and Technology, China

Fei Chen received the B.Sc. and M.Phil. degrees from the Department of Electronic Science and Engineering, Nanjing University in 1998 and 2001, respectively, and the Ph.D. degree from the Department of Electronic Engineering, The Chinese University of Hong Kong in 2005. He continued his research as a postdoctoral researcher and senior research fellow at the University of Texas at Dallas and The University of Hong Kong, and joined Southern University of Science and Technology (SUSTech) as a faculty member in 2014. Dr. Chen leads the speech processing research group at SUSTech, with a research focus on speech perception, speech intelligibility modeling, speech enhancement, and assistive hearing technology. He has published over 80 journal papers and over 80 conference papers in IEEE journals and conferences, Interspeech, the Journal of the Acoustical Society of America, and other venues. He received the best presentation award at the 9th Asia Pacific Conference of Speech, Language and Hearing, and a 2011 National Organization for Hearing Research Foundation Research Award in the United States. Dr. Chen now serves as an associate editor or editorial board member of "Frontiers in Psychology", "Biomedical Signal Processing and Control", and "Physiological Measurement".

Dr. Yen-Fu Cheng (鄭彥甫)

Department of Medical Research, Taipei Veterans General Hospital

Yen-Fu Cheng is a surgeon-scientist at the Department of Medical Research and director of research and attending doctor at the Department of Otolaryngology-Head and Neck Surgery, Taipei Veterans General Hospital. He is also an adjunct assistant professor at the Institute of Brain Science/Faculty of Medicine, National Yang-Ming University. He is currently the Principal Investigator of the Laboratory of Auditory Physiology and Genetic Medicine.

Yen-Fu's research focuses on auditory neuroscience and clinical otology. In basic research, he is dedicated to applying cutting-edge gene transfer and gene editing methods to understand and develop therapies for inner ear disorders. In clinical research, he is interested in using state-of-the-art methods to approach clinical otology issues, such as next-generation sequencing for genetic medicine and artificial intelligence for hearing-related diseases.

Yen-Fu received his medical degree from Taipei Medical University and his doctoral degree from Massachusetts Institute of Technology, where he studied Speech and Hearing Bioscience and Technology at the Harvard-MIT Division of Health Sciences and Technology. He was a post-doctoral research fellow at Harvard Medical School before starting his lab at VGH-TPE/NYMU.

Dr. Hantao Huang (黄瀚韜)

MediaTek

Hantao Huang (S’14) received the B.S. and Ph.D. degrees from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2013 and 2018, respectively. Since 2018, he has been a Staff Engineer with MediaTek, Singapore, where he is involved in natural language processing algorithms, neural network compression and quantization for edge devices. His current research interests include speech recognition, machine-learning algorithms, and low power systems.

Dr. Chi-Te Wang (王棨德)

Department of Otolaryngology, Far Eastern Memorial Hospital

Dr. Chi-Te Wang received his MD degree from the National Taiwan University, Taipei, Taiwan, in 2003. After resident training from 2003 to 2008, he joined Far Eastern Memorial Hospital as an attending physician. He received his PhD degree from the Institute of Epidemiology and Preventive Medicine at National Taiwan University in 2014. During his professional career, he visited Mount Sinai Hospital (NYC, 2009), Mayo Clinic (Arizona, 2012), Isshiki voice center (Kyoto, 2015), UC Davis voice and swallow center (Sacramento, 2018), and UCSF voice and swallow center (San Francisco, 2018) for continued exposure to expert practice. He is a corresponding member of the American Laryngological Society and a council member of the Taiwan Otolaryngological Society and Taiwan Voice Society. He has wide clinical and academic interests, and has published a dozen papers in different fields, including phonosurgery, automatic detection and classification of voice disorders, real-time monitoring of phonation, and telepractice. He is the inventor of multiple international patents on voice detection, classification, and treatments. He co-hosted the Big Data Cup Challenge at the 2018 and 2019 IEEE International Conference on Big Data. He is the winner of the Society for Promotion of International Oto-Rhino-Laryngology (SPIO) Award in 2015, the Best Synergy Award of Far Eastern Group in 2018, and the National Innovation Award of Taiwan in 2019.

Program

Workshop Handbook

Time	Topic	Speaker	Host
08:30 - 09:00	Check-in
09:00 - 09:10	Opening Remarks	President 郭旭崧 / President 陳信宏	-
09:10 - 09:50	Deep Learning Approaches to Automatic Assessment of Speech and Language Impairment	Prof. Tan Lee	Prof. 王新民
09:50 - 10:00	Intermission
10:00 - 10:40	Towards Superhuman Conversational AI	Prof. Yun-Nung Chen	Prof. 曹昱
10:40 - 10:50	Intermission
10:50 - 11:30	Single- and Multi-channel Speech Enhancement System	Dr. Syu-Siang Wang	Dr. 力博宏
11:30 - 13:30	MOST Outcomes Presentation
13:30 - 14:10	Minimum Acoustic Information Required for an Intelligible Speech	Prof. Fei Chen	Prof. 李祈均
14:10 - 14:20	Intermission
14:20 - 15:00	A New Era of Otology and Hearing Research: NGS, CRISPR, App, AI and Beyond	Dr. Yen-Fu Cheng	Dr. 廖文輝
15:00 - 15:10	Intermission
15:10 - 15:50	Make a Power-efficient Voice UI on Edge Devices	Dr. Hantao Huang	Prof. 廖元甫
15:50 - 16:00	Intermission
16:00 - 16:40	Ambulatory Phonation Monitoring Using Wireless Microphone Based on Energy Envelope	Dr. Chi-Te Wang	Prof. 方士豪
16:40 - 17:00	Closing	Prof. 賴穎暉 / Prof. 王坤卿	-

08:30 ~ 09:00

Check-in

09:00 ~ 09:10

Opening Remarks

President 郭旭崧 / President 陳信宏

09:10 ~ 09:50

Deep Learning Approaches to Automatic Assessment of Speech and Language Impairment

Prof. Tan Lee

Speech is a natural and preferred means of expressing one's thoughts and emotions for communication. Speech and language impairments negatively affect the daily lives of a large population worldwide. Speech impairments are manifested in atypical articulation and phonation, while language impairments can be present across multiple linguistic levels in the use of spoken or written language. Timely and reliable assessment of the type and severity of impairment is crucial to effective treatment and rehabilitation. Conventionally, speech assessment is carried out by professional speech and language pathologists (SLPs). In view of the shortage of qualified SLPs with relevant linguistic and cultural background, objective assessment techniques based on acoustical signal analysis and machine learning models are expected to play an increasingly important role in assisting clinical assessment. This presentation will cover a series of our recent studies on applying deep learning models to automatic assessment of different types of speech and language impairments. The types of impairments that we have tackled include voice disorder in adults, phonology and articulation disorder in children, and neurological disorder in elderly people. All of our work is focused on spoken Cantonese. The use of Siamese networks and auto-encoder models has been investigated to address the challenges related to the scarcity of training speech and the absence of reliable labels. The findings from attempting an end-to-end approach to speech assessment will also be shared.

HOST: Prof. 王新民

09:50 ~ 10:00

Intermission

10:00 ~ 10:40

Towards Superhuman Conversational AI

Prof. Yun-Nung Chen

Even though conversational systems have attracted a lot of attention recently, current systems sometimes fail due to errors from different components. This talk presents potential directions for improvement: 1) we first focus on learning language embeddings specifically for practical scenarios for better robustness, and 2) we then propose a novel learning framework for natural language understanding and generation on top of duality for better scalability. Both directions enhance the robustness and scalability of conversational systems, showing their potential to guide future research.

HOST: Prof. 曹昱

10:40 ~ 10:50

Intermission

10:50 ~ 11:30

Single- and Multi-channel Speech Enhancement System

Dr. Syu-Siang Wang

Real-world environments always contain stationary and/or time-varying noises that are received together with speech signals by recording devices. The received noises inevitably degrade the performance of human-human and human-machine interfaces, and this issue has attracted significant attention over the years. To address this issue, an important front-end speech process, namely speech enhancement, extracts clean components from noisy input and can improve the voice quality and intelligibility of noise-deteriorated speech. Speech enhancement systems can be split into two categories in terms of their physical configuration: single-channel and multi-channel. In a single-channel speech enhancement system, the speech waveform is recorded from a single microphone and then enhanced by a system derived from the temporal information of the input. In a multi-channel speech enhancement system, multiple microphones are used to record the input speech, and the system is designed by simultaneously exploiting the spatial diversity and temporal structure of the received signals. In this talk, we present our recent research achievements using machine learning and signal processing to improve speech perception for both configurations.

HOST: Dr. 力博宏

11:30 ~ 13:30

Lunch

13:30 ~ 14:10

Minimum Acoustic Information Required for an Intelligible Speech

Prof. Fei Chen

The speech signal carries a lot of redundant information for speech understanding, and many studies have shown that the loss of some acoustic information does not significantly affect speech intelligibility if important acoustic information is preserved. Due to their hearing loss, hearing-impaired listeners are unable to recognize some acoustic information (e.g., temporal fine structure). Hence, studying the acoustic information minimally required for intelligible speech in different listening environments could guide the design of novel assistive hearing technologies. In this talk, I will first introduce early work on the relative importance of commonly used acoustic cues for speech intelligibility, particularly on a vocoder model for speech intelligibility. Then, I will present recent studies towards reconstructing intelligible speech from cortical EEG signals, including Mandarin tone imagery and speech reconstruction.

HOST: Prof. 李祈均

14:10 ~ 14:20

Intermission

14:20 ~ 15:00

A New Era of Otology and Hearing Research: NGS, CRISPR, App, AI and Beyond

Dr. Yen-Fu Cheng

The fields of clinical otology and hearing research are advancing at the forefront of innovation in medicine and technology. Promising progress in genetic medicine and digital technology has started to change traditional medical and hearing research. Next-generation sequencing, novel gene therapy vectors, CRISPR-Cas9 gene editing technologies, mobile-phone apps, and artificial intelligence all generate enormous creative energy. In this talk, I will introduce how these revolutionary technologies change physicians' practice and research.

HOST: Dr. 廖文輝

15:00 ~ 15:10

Intermission

15:10 ~ 15:50

Make a Power-efficient Voice UI on Edge Devices

Dr. Hantao Huang

As privacy becomes an increasing concern, the voice user interface (UI) is moving from the cloud to the edge device. However, deploying a neural-network-based voice/language model on edge devices with efficient power consumption is very challenging. In this talk, we will first introduce MediaTek NeuroPilot, which tackles this challenge at the platform level. Then, more specifically, we investigate the challenge from the algorithm perspective, including algorithm trends and deployment opportunities. Finally, we show some preliminary results on speech recognition and natural language understanding.

HOST: Prof. 廖元甫

15:50 ~ 16:00

Intermission

16:00 ~ 16:40

Ambulatory Phonation Monitoring Using Wireless Microphone Based on Energy Envelope

Dr. Chi-Te Wang

Voice disorders mainly result from chronic overuse or abuse, particularly for teachers or other occupational voice users. Previous studies have proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience associated with taping and wiring, and the lack of real-time processing, have limited its daily application.

Starting in 2015, we founded a research group collaborating with experts from National Yang-Ming University, Yuan Ze University, and Far Eastern Memorial Hospital. We proposed a system using a wireless microphone for real-time ambulatory voice monitoring. We invited 10 teachers to participate in the pilot study. We designed an adaptive threshold (AT) function to detect the presence of speech based on the energy envelope. All participants wore a wireless microphone during a teaching class (around 40-60 minutes) in a quiet classroom (background noise < 55 dB SPL). We developed software for manually labeling speech segments according to the time and frequency domains. We randomly selected 25 utterances (10 s each) from the recorded audio files for calculating the coefficients of the AT function via a genetic algorithm. Another five random utterances were used for testing the accuracy of the ASD system, using manually labeled data as the ground truth. We measured the phonation ratio (speech frames / total frames) and the length of speech segments as a proxy for the phonation habits of the users. We also mimicked noisy backgrounds by manually mixing 4 different types of noise into the original recordings. An adjuvant noise reduction function using the Log MMSE algorithm was applied to counteract the influence of noise on detection accuracy.

The study results exhibited detection accuracy (for speech) ranging from 81% to 94%. Subsequent analyses revealed a phonation ratio between 50% and 78%, with most phonation segments shorter than 10 s. Although the presence of background noise reduced the accuracy of the ASD system (25% to 79%), the adjuvant noise reduction function effectively improved the accuracy by up to 45.8%, especially under stable noise (e.g. white noise).

This study demonstrated good detection accuracy for the proposed system. Preliminary results for phonation ratio and speech segments were all comparable to those of previous research. Although the wireless microphone was susceptible to background noise, an additional noise reduction function can overcome this limitation. These results indicate that the proposed system can be applied to ambulatory voice monitoring for occupational voice users.
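
As a rough illustration of the energy-envelope idea in the abstract above, the following Python sketch marks frames as speech when their energy exceeds a threshold, then computes the phonation ratio (speech frames / total frames) and the lengths of the speech segments. It is not the authors' system. The threshold rule (a fixed fraction `alpha` between the quietest and loudest frame), the frame length, the function names, and the synthetic test signal are all assumptions made for illustration. The adaptive threshold function in the talk has coefficients fitted by a genetic algorithm, which is not reproduced here.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (a crude energy envelope)."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_speech(energies, alpha=0.1):
    """Flag a frame as speech when its energy exceeds a simple threshold.

    Assumption: the threshold sits a fraction `alpha` of the way from the
    quietest to the loudest frame. This stands in for the talk's adaptive
    threshold function, whose exact form is not given in the abstract.
    """
    lo, hi = min(energies), max(energies)
    threshold = lo + alpha * (hi - lo)
    return [e > threshold for e in energies]


def phonation_ratio(flags):
    """Speech frames divided by total frames."""
    return sum(flags) / len(flags)


def segment_lengths(flags, frame_sec):
    """Duration in seconds of each run of consecutive speech frames."""
    lengths, run = [], 0
    for is_speech in flags:
        if is_speech:
            run += 1
        elif run:
            lengths.append(run * frame_sec)
            run = 0
    if run:
        lengths.append(run * frame_sec)
    return lengths


def tone(n_samples, amplitude, freq_hz=200, sample_rate=8000):
    """Synthetic stand-in for voiced speech: a plain sine tone."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


# Hypothetical test signal at 8 kHz: silence, tone, silence, tone.
signal = [0.0] * 800 + tone(1600, 0.5) + [0.0] * 800 + tone(800, 0.5)
energies = frame_energies(signal, frame_len=400)  # 400 samples = 50 ms frames
flags = detect_speech(energies)
print("phonation ratio:", phonation_ratio(flags))
print("segment lengths (s):", segment_lengths(flags, frame_sec=0.05))
```

On real recordings a min/max-based threshold like this one is sensitive to background noise, which is consistent with the abstract's point that a noise reduction step was needed before detection.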

HOST: Prof. 方士豪

16:40 ~ 17:00

Closing

Prof. 賴穎暉 / Prof. 王坤卿

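Registration

Register Now

Category	ACLCLP member	Non-member
General	NT$200	NT$300
Student	Free	NT$100
Sponsor	Free
Notes	"ACLCLP member" refers to a current member of the Association for Computational Linguistics and Chinese Language Processing (中華民國計算語言學學會). Registration method and period: online registration, from now until 8/02 (Sunday). Payment methods: 1. Postal giro transfer: account number 19166251, account name 「中華民國計算語言學學會」; please transfer the registration fee by 7/31. 2. Online credit card payment: please complete the card payment of the registration fee by 8/03.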

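Contact Us

GENERAL CO-CHAIRS

Department of Biomedical Engineering, National Yang-Ming University
No. 155, Sec. 2, Linong St., Beitou Dist., Taipei City 11221

Department of Information Technology and Communication, Shih Chien University
No. 200, University Rd., Neimen Dist., Kaohsiung City 84550

TECHNICAL PROGRAM CO-CHAIRS

Research Center for Information Technology Innovation, Academia Sinica
No. 128, Sec. 2, Academia Rd., Nangang Dist., Taipei City 11529

SESSION CHAIRS

Institute of Information Science, Academia Sinica
02-2788-3799 #1714,1507

Research Center for Information Technology Innovation, Academia Sinica
02-2787-2300 #2787-2390

Cheng Hsin General Hospital

02-2826-4400

Department of Electrical Engineering, National Tsing Hua University
03-516-2439

Department of Otolaryngology-Head and Neck Surgery, Taipei Veterans General Hospital

Department of Electronic Engineering, National Taipei University of Technology
02-2771-2171 #2247

Department of Electrical Engineering, Yuan Ze University
03-463-8800 #7125

WORKSHOP EMAIL

For any questions, please contact us via this email address or by phone

02-2826-7000 #5491

REGISTRATION

Association for Computational Linguistics and Chinese Language Processing (ACLCLP)

02-2788-3799 #1502

Research Center for Information Technology Innovation, Academia Sinica	宇康生科	維膜助聽器	建聲聽覺
泰山電子	易祿達科技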
