SWS 2020 Speech Signal Processing Workshop

The Speech Signal Processing Workshop is an annual academic event organized by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP). This year's invited speakers include Prof. 李丹 (Tan Lee, The Chinese University of Hong Kong), Prof. 陳霏 (Fei Chen, Southern University of Science and Technology, China), Prof. 陳縕儂 (Yun-Nung Chen, Department of Computer Science and Information Engineering, National Taiwan University), Dr. 鄭彥甫 (Yen-Fu Cheng, Department of Medical Research, Taipei Veterans General Hospital), Dr. Hantao Huang (MediaTek), Dr. 王緒翔 (Syu-Siang Wang, Research Center for Information Technology Innovation, Academia Sinica), and Dr. 王棨德 (Chi-Te Wang, Far Eastern Memorial Hospital). The talks cover speech signal processing, speech technology hardware development, natural language processing, and medical applications of speech, making this an event not to be missed by interested researchers and practitioners in Taiwan's academia and industry.

In addition to the invited talks, the workshop will also host presentations of the outcomes of Ministry of Science and Technology (MOST) research projects, to promote the exchange of techniques and experience between academia and industry and to jointly discuss new research and application directions in signal processing. The workshop is therefore expected to substantially raise the technical level of Taiwan's digital signal processing industry, with significant benefit to the advancement of engineering technology.

In response to COVID-19, this year's workshop will be held as an online conference (see the detailed participation instructions). Everyone is welcome to attend!

MOST Outcomes Presentation

Prof. 葉瑞峰, Department of Computer Science and Information Engineering, National Chiayi University: Deep Multimodal Encoding with Visual and Acoustic Annotations for Image Caption Generation (Year 1)

Prof. 陳冠宇, National Taiwan University of Science and Technology: Neural Network-Based Language Models: Innovations, Futures, and Applications

Prof. 李宏毅, Department of Electrical Engineering, National Taiwan University: Toward Unsupervised Speech Understanding

Prof. 冀泰石, Department of Electrical Engineering, National Chiao Tung University: A Binaural Auditory Scene Analysis Model Based on Deep Neural Network Perceptual Models (1/3)

Prof. 李龍豪, Department of Electrical Engineering, National Central University: EEG-Based Wavelet Analysis for Seizure Detection in Stroke Patients

Prof. 吳宗憲, National Cheng Kung University: A Spoken-Interaction Home Companion and Recommendation System for Senior Citizens

Prof. 江振宇, Department of Communication Engineering, National Taipei University: Text-to-Speech Systems Developed with Deep Learning Techniques

Prof. 廖元甫, Department of Electronic Engineering, National Taipei University of Technology: Advanced Deep-Learning-Based Speech-Enabled Applications: Automatic Transcription, Summarization, Corpus Construction, and Content Retrieval for Multilingual TV and Radio Programs

Prof. 王新民, Institute of Information Science, Academia Sinica: Voice Conversion and Its Applications

Prof. 曹昱, Research Center for Information Technology Innovation, Academia Sinica: Novel Objective Functions and Model Compression Techniques for Deep-Learning-Based Speech Enhancement Systems

Important Dates

06/19 Registration and payment open

08/02 Registration deadline

08/03 Payment deadline

08/07 SWS 2020!

Date and Venue

Friday, August 7, 2020
National Yang-Ming University
Online conference


Organizers

Department of Biomedical Engineering, National Yang-Ming University

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)

Co-organizers

Research Center for Information Technology Innovation, Academia Sinica

Department of Information Technology and Communication, Shih Chien University

Engineering Technology Promotion Center, Department of Engineering and Technologies, Ministry of Science and Technology

Keynote Speakers
Learn from these distinguished speakers

李丹 (Tan Lee)
Associate Professor, The Chinese University of Hong Kong

Tan Lee is currently an Associate Professor at the Department of Electronic Engineering, the Chinese University of Hong Kong (CUHK). He has been working on speech- and language-related research for over 20 years. His research covers spoken language technologies, speech enhancement and separation, audio and music processing, speech and language rehabilitation, and the neurological basis of speech and language. He led the effort on developing Cantonese-focused spoken language technologies that have been widely licensed for industrial applications. His current work focuses on applying signal processing and machine learning methods to atypical speech and language related to different kinds of human communication and cognitive disorders. He is an Associate Editor for the IEEE/ACM Transactions on Audio, Speech, and Language Processing and the EURASIP Journal on Advances in Signal Processing. He is the Vice Chair of the ISCA Special Interest Group on Chinese Spoken Language Processing, and served as an Area Chair on the Technical Programme Committees of INTERSPEECH 2014, 2016, and 2018.

陳縕儂 (Yun-Nung Chen)
Assistant Professor, Department of Computer Science and Information Engineering, National Taiwan University

Yun-Nung (Vivian) Chen is currently an assistant professor in the Department of Computer Science & Information Engineering at National Taiwan University. She earned her Ph.D. degree from Carnegie Mellon University. Her research interests focus on spoken dialogue systems, language understanding, natural language processing, and multimodality. She has received Google Faculty Research Awards, the MOST Young Scholar Fellowship, the FAOS Young Scholar Innovation Award, Student Best Paper Awards, and the Distinguished Master Thesis Award. Prior to joining National Taiwan University, she worked in the Deep Learning Technology Center at Microsoft Research Redmond.

王緒翔 (Syu-Siang Wang)
Postdoctoral Researcher, Research Center for Information Technology Innovation, Academia Sinica

Dr. Syu-Siang Wang received his Ph.D. degree (2018) from the Graduate Institute of Communication Engineering, National Taiwan University, with research on wavelet-based speech enhancement and feature compression. He won the ACLCLP Ph.D. Thesis Award. In addition, he was twice a summer intern: at the National Institute of Information and Communications Technology, Japan, in September 2015, and at the Department of Electrical and Electronic Engineering, SUSTC, China, in June 2016.

From August 2018 to July 2019, he was a postdoctoral researcher at the MOST Joint Research Center for AI Technology and All Vista Healthcare, where he worked on developing algorithms for healthcare applications and published several papers based on this research.

Currently, he is a postdoctoral researcher at the Research Center for Information Technology Innovation, Academia Sinica. His research interests include speech and speaker recognition, acoustic modeling, audio coding, and bio-signal processing.

陳霏 (Fei Chen)
Associate Professor, Southern University of Science and Technology, China

Fei Chen received the B.Sc. and M.Phil. degrees from the Department of Electronic Science and Engineering, Nanjing University, in 1998 and 2001, respectively, and the Ph.D. degree from the Department of Electronic Engineering, The Chinese University of Hong Kong, in 2005. He continued his research as a postdoctoral fellow and senior research fellow at the University of Texas at Dallas and The University of Hong Kong, and joined Southern University of Science and Technology (SUSTech) as a faculty member in 2014. Dr. Chen leads the speech processing research group at SUSTech, with research focused on speech perception, speech intelligibility modeling, speech enhancement, and assistive hearing technology. He has published over 80 journal papers and over 80 conference papers in IEEE journals and conferences, Interspeech, the Journal of the Acoustical Society of America, and elsewhere. He received the best presentation award at the 9th Asia Pacific Conference of Speech, Language and Hearing, and a 2011 National Organization for Hearing Research Foundation Research Award in the United States. Dr. Chen serves as an associate editor or editorial board member of Frontiers in Psychology, Biomedical Signal Processing and Control, and Physiological Measurement.

鄭彥甫 (Yen-Fu Cheng)
Physician, Department of Medical Research, Taipei Veterans General Hospital

Yen-Fu Cheng is a surgeon-scientist at the Department of Medical Research, and director of research and attending physician at the Department of Otolaryngology-Head and Neck Surgery, Taipei Veterans General Hospital. He is also an adjunct assistant professor at the Institute of Brain Science/Faculty of Medicine, National Yang-Ming University, and is currently the Principal Investigator of the Laboratory of Auditory Physiology and Genetic Medicine.

Yen-Fu’s research focuses on auditory neuroscience and clinical otology. In basic research, he is dedicated to applying cutting-edge gene transfer and gene editing methods to understand and develop therapies for inner ear disorders. In clinical research, he is interested in using state-of-the-art methods to approach clinical otology issues, such as next-generation sequencing for genetic medicine and artificial intelligence for hearing-related diseases.

Yen-Fu received his medical degree from Taipei Medical University and his doctoral degree from the Massachusetts Institute of Technology, where he studied Speech and Hearing Bioscience and Technology at the Harvard-MIT Division of Health Sciences and Technology. He was a postdoctoral research fellow at Harvard Medical School before starting his lab at VGH-TPE/NYMU.

Hantao Huang
Ph.D., MediaTek

Hantao Huang (S’14) received the B.S. and Ph.D. degrees from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2013 and 2018, respectively. Since 2018, he has been a Staff Engineer with MediaTek, Singapore, where he is involved in natural language processing algorithms, neural network compression and quantization for edge devices. His current research interests include speech recognition, machine-learning algorithms, and low power systems.

王棨德 (Chi-Te Wang)
Attending Physician and Associate Professor, Department of Otolaryngology, Far Eastern Memorial Hospital

Dr. Chi-Te Wang received his MD degree from National Taiwan University, Taipei, Taiwan, in 2003. After residency training from 2003 to 2008, he joined Far Eastern Memorial Hospital as an attending physician. He received his PhD degree from the Institute of Epidemiology and Preventive Medicine at National Taiwan University in 2014. During his professional career, he visited Mount Sinai Hospital (NYC, 2009), Mayo Clinic (Arizona, 2012), the Isshiki voice center (Kyoto, 2015), the UC Davis voice and swallow center (Sacramento, 2018), and the UCSF voice and swallow center (San Francisco, 2018) for continued exposure to expert practice. He is a corresponding member of the American Laryngological Society and a council member of the Taiwan Otolaryngological Society and the Taiwan Voice Society. He has wide clinical and academic interests and has published a dozen papers across different fields, including phonosurgery, automatic detection and classification of voice disorders, real-time monitoring of phonation, and telepractice. He is the inventor of multiple international patents on voice detection, classification, and treatment. He co-hosted the Big Data Cup Challenge at the 2018 and 2019 IEEE International Conference on Big Data. He won the Society for Promotion of International Oto-Rhino-Laryngology (SPIO) Award in 2015, the Best Synergy Award of the Far Eastern Group in 2018, and the National Innovation Award of Taiwan in 2019.

Conference Program

Time Topic Speaker Host
08:30 - 09:00 Registration
09:00 - 09:10 Opening Remarks President 郭旭崧 -
09:10 - 09:50 Deep Learning Approaches to Automatic Assessment of Speech and Language Impairment Prof. 李丹 Prof. 王新民
09:50 - 10:00 Intermission
10:00 - 10:40 Towards Superhuman Conversational AI Prof. 陳縕儂 Prof. 曹昱
10:40 - 10:50 Intermission
10:50 - 11:30 Ambulatory Phonation Monitoring Using Wireless Microphone Based on Energy Envelope Dr. 王棨德 Dr. 力博宏
11:30 - 13:30 MOST Outcomes Presentation
13:30 - 14:10 Minimum Acoustic Information Required for an Intelligible Speech Prof. 陳霏 Prof. 李祈均
14:10 - 14:20 Intermission
14:20 - 15:00 A New Era of Otology and Hearing Research: NGS, CRISPR, App, AI and Beyond Dr. 鄭彥甫 Dr. 廖文輝
15:00 - 15:10 Intermission
15:10 - 15:50 Make a Power-efficient Voice UI on Edge Devices Dr. Hantao Huang Prof. 廖元甫
15:50 - 16:00 Intermission
16:00 - 16:40 Single- and Multi-channel Speech Enhancement System Dr. 王緒翔 Prof. 方士豪
16:40 - 17:00 Closing Remarks Prof. 賴穎暉, Prof. 王坤卿 -

Opening Remarks

President 郭旭崧

Deep Learning Approaches to Automatic Assessment of Speech and Language Impairment

Prof. Tan Lee

Speech is a natural and preferred means of expressing one’s thoughts and emotions for communication. Speech and language impairments negatively impact the daily life of a large population worldwide. Speech impairments are manifested as atypical articulation and phonation, while language impairments can be present across multiple linguistic levels in the use of spoken or written language. Timely and reliable assessment of the type and severity of impairment is crucial to effective treatment and rehabilitation. Conventionally, speech assessment is carried out by professional speech and language pathologists (SLPs). In view of the shortage of qualified SLPs with the relevant linguistic and cultural background, objective assessment techniques based on acoustic signal analysis and machine learning models are expected to play an increasingly important role in assisting clinical assessment. This presentation will cover a series of our recent studies on applying deep learning models to the automatic assessment of different types of speech and language impairments. The types of impairments we have tackled include voice disorders in adults, phonology and articulation disorders in children, and neurological disorders in elderly people. All of our work focuses on spoken Cantonese. The use of Siamese networks and auto-encoder models has been investigated to address the challenges of scarce training speech and the absence of reliable labels. Findings from attempting an end-to-end approach to speech assessment will also be shared.

HOST: Prof. 王新民

Intermission

Towards Superhuman Conversational AI

Prof. 陳縕儂 (Yun-Nung Chen)

Although conversational systems have attracted a lot of attention recently, current systems sometimes fail due to errors from their different components. This talk presents potential directions for improvement: 1) we first focus on learning language embeddings specifically for practical scenarios for better robustness, and 2) we then propose a novel learning framework for natural language understanding and generation built on their duality for better scalability. Both directions enhance the robustness and scalability of conversational systems, suggesting promising future research areas.

HOST: Prof. 曹昱

Intermission

Single- and Multi-channel Speech Enhancement System

Dr. 王緒翔 (Syu-Siang Wang)

Real-world environments always contain stationary and/or time-varying noises that are received together with speech signals by recording devices. The received noise inevitably degrades the performance of human-human and human-machine interfaces, and this issue has attracted significant attention over the years. To address it, an important front-end process, namely speech enhancement, extracts the clean components from noisy input and can improve the quality and intelligibility of noise-deteriorated speech. Speech enhancement systems can be split into two categories in terms of physical configuration: single- and multi-channel. In a single-channel system, the speech waveform is recorded by a single microphone and then enhanced by a system derived from the temporal information of the input. In a multi-channel system, multiple microphones record the input speech, and the system is designed by simultaneously exploiting the spatial diversity and temporal structure of the received signals. In this talk, we present our recent research on using machine learning and signal processing to improve speech perception in both configurations.
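The single-channel case can be illustrated with a classic spectral-subtraction baseline (a minimal NumPy sketch for illustration only, not the system presented in this talk; the frame sizes, noise-estimation window, and spectral floor are arbitrary choices):

```python
import numpy as np

def spectral_subtraction(noisy, sr, frame_len=512, hop=256, noise_frames=10):
    """Single-channel speech enhancement via magnitude spectral subtraction.

    The noise spectrum is estimated from the first `noise_frames` frames,
    which are assumed to contain noise only.
    """
    window = np.hanning(frame_len)
    # Frame the signal (trailing samples beyond the last full frame are dropped).
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Noise magnitude estimate from the leading noise-only frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract, flooring at a small fraction of the noise estimate.
    enhanced_mag = np.maximum(mag - noise_mag, 0.05 * noise_mag)

    # Overlap-add resynthesis using the noisy phase.
    frames_out = np.fft.irfft(enhanced_mag * np.exp(1j * phase),
                              n=frame_len, axis=1)
    enhanced = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i in range(n_frames):
        enhanced[i * hop : i * hop + frame_len] += frames_out[i] * window
        norm[i * hop : i * hop + frame_len] += window ** 2
    return enhanced / np.maximum(norm, 1e-3)
```

A multi-channel system would additionally exploit inter-microphone delays (spatial information), e.g. via beamforming, before or instead of this purely temporal-spectral processing.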

HOST: Prof. 方士豪

Minimum Acoustic Information Required for an Intelligible Speech

Prof. Fei Chen

The speech signal carries a lot of information that is redundant for speech understanding, and many studies have shown that the loss of some acoustic information does not significantly affect speech intelligibility as long as the important acoustic information is preserved. Due to their hearing loss, hearing-impaired listeners are unable to access some acoustic information (e.g., temporal fine structure). Hence, studying the acoustic information minimally required for intelligible speech in different listening environments can guide the design of novel assistive hearing technologies. In this talk, I will first introduce early work on the relative importance of commonly used acoustic cues for speech intelligibility, particularly a vocoder model of speech intelligibility. Then, I will present recent studies toward reconstructing intelligible speech from cortical EEG signals, including Mandarin tone imagery and speech reconstruction.
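The vocoder paradigm referred to above can be sketched as a noise-excited channel vocoder, which discards temporal fine structure and keeps only the per-band envelopes (a minimal NumPy illustration, not the stimuli used in the cited studies; the band edges, channel count, and envelope smoothing are arbitrary assumptions):

```python
import numpy as np

def noise_vocode(speech, sr, n_channels=4, env_cutoff=50.0):
    """Noise-excited channel vocoder: in each frequency band, replace the
    fine structure with band-limited noise modulated by the band envelope."""
    n = len(speech)
    spec = np.fft.rfft(speech)
    freqs = np.fft.rfftfreq(n, 1.0 / sr)

    # Log-spaced band edges between 80 Hz and 6 kHz.
    edges = np.geomspace(80.0, 6000.0, n_channels + 1)

    rng = np.random.default_rng(0)
    noise_spec = np.fft.rfft(rng.standard_normal(n))

    # Moving-average length approximating the envelope low-pass cutoff.
    win = max(1, int(sr / env_cutoff))

    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(np.where(band_mask, spec, 0), n=n)
        # Envelope extraction: rectification + smoothing.
        env = np.convolve(np.abs(band), np.ones(win) / win, mode="same")
        # Band-limited noise carrier modulated by the envelope.
        carrier = np.fft.irfft(np.where(band_mask, noise_spec, 0), n=n)
        out += env * carrier
    return out
```

Varying `n_channels` in such a sketch is the classic way to probe how much envelope information suffices for intelligibility.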

HOST: Prof. 李祈均

Intermission

A New Era of Otology and Hearing Research: NGS, CRISPR, App, AI and Beyond

Dr. 鄭彥甫 (Yen-Fu Cheng)

The fields of clinical otology and hearing research are advancing at the forefront of innovation in medicine and technology. Promising progress in genetic medicine and digital technology has started to change traditional medical and hearing research. Next-generation sequencing, novel gene therapy vectors, CRISPR-Cas9 gene editing technologies, mobile-phone apps, and artificial intelligence all generate enormous creative energy. In this talk, I will introduce how these revolutionary technologies are changing physicians’ practice and research.

HOST: Dr. 廖文輝

Intermission

Make a Power-efficient Voice UI on Edge Devices

Hantao Huang 博士

As privacy becomes a growing concern, the voice user interface (UI) is transitioning from the cloud to the edge device. However, deploying a neural-network-based voice/language model on edge devices with efficient power consumption is very challenging. In this talk, we will first introduce MediaTek NeuroPilot, which tackles this challenge at the platform level. Then we will examine the algorithm perspective, including algorithm trends and deployment opportunities. Finally, we will show some preliminary results on speech recognition and natural language understanding.

HOST: Prof. 廖元甫

Intermission

Ambulatory Phonation Monitoring Using Wireless Microphone Based on Energy Envelope

Dr. 王棨德 (Chi-Te Wang)

Voice disorders mainly result from chronic vocal overuse or abuse, particularly among teachers and other occupational voice users. Previous studies have proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience of taping and wiring and the lack of real-time processing have limited its daily application.

Starting in 2015, we founded a research group collaborating with experts from National Yang-Ming University, Yuan Ze University, and Far Eastern Memorial Hospital, and proposed a system using a wireless microphone for real-time ambulatory voice monitoring. We invited 10 teachers to participate in the pilot study. We designed an adaptive threshold (AT) function to detect the presence of speech based on the energy envelope. Each participant wore a wireless microphone during a teaching class (around 40-60 minutes) in a quiet classroom (background noise < 55 dB SPL). We developed software for manually labeling speech segments in the time and frequency domains. We randomly selected 25 utterances (10 s each) from the recorded audio files to fit the coefficients of the AT function via a genetic algorithm. Another five randomly selected utterances were used to test the accuracy of the automatic speech detection (ASD) system, using the manually labeled data as ground truth. We measured the phonation ratio (speech frames / total frames) and the length of speech segments as proxies for the users' phonation habits. We also mimicked noisy-background scenarios by manually mixing four different types of noise into the original recordings. An adjuvant noise reduction function using the log-MMSE algorithm was applied to counteract the influence of noise on detection accuracy.
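The detection-and-measurement step above can be sketched in simplified form; here a fixed energy-threshold rule stands in for the study's genetically optimized adaptive-threshold (AT) function, and the frame length and threshold multiplier are illustrative assumptions:

```python
import numpy as np

def detect_speech(signal, sr, frame_ms=30, alpha=1.5):
    """Frame-level speech detection from the energy envelope.

    A frame is labeled as speech when its energy exceeds `alpha` times a
    noise-floor estimate (here, the median energy of the quietest 20% of
    frames). This fixed rule is a stand-in for the adaptive-threshold
    function fitted by a genetic algorithm in the study.
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)

    # Noise-floor estimate from the lowest-energy frames.
    floor = np.median(np.sort(energy)[: max(1, n_frames // 5)])
    return energy > alpha * floor

def phonation_ratio(labels):
    """Phonation ratio = speech frames / total frames."""
    return labels.mean()
```

Run lengths of consecutive `True` labels would similarly give the speech-segment durations reported in the study.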

The results showed a detection accuracy (for speech) ranging from 81% to 94%. Subsequent analyses revealed a phonation ratio between 50% and 78%, with most phonation segments shorter than 10 s. Although the presence of background noise reduced the accuracy of the ASD system (to 25%-79%), the adjuvant noise reduction function effectively improved the accuracy by up to 45.8%, especially under stable noise (e.g., white noise).

This study demonstrated the good detection accuracy of the proposed system. Preliminary results on phonation ratio and speech segments were comparable to those of previous research. Although the wireless microphone was susceptible to background noise, the additional noise reduction function can overcome this limitation. These results indicate that the proposed system can be applied to ambulatory voice monitoring for occupational voice users.

HOST: Dr. 力博宏

Closing Remarks

Prof. 賴穎暉 and Prof. 王坤卿

Registration

Register Now


Category | ACLCLP Member | Non-member
General | NT$200 | NT$300
Student | Free | NT$100
Sponsoring organizations | Free
Notes
  • "ACLCLP member" refers to a member in good standing of the Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
  • Registration method and period:
  • Online registration: from now until 8/02 (Sunday).

  • Payment methods:
  • 1. Postal remittance:
    Account no. 19166251, account name "The Association for Computational Linguistics and Chinese Language Processing"; please remit the registration fee by 7/31.

    2. Online credit card: please complete payment by 8/03.

Contact Us

GENERAL CO-CHAIRS
Prof. 賴穎暉
yh.lai@gm.ym.edu.tw

Department of Biomedical Engineering, National Yang-Ming University
No. 155, Sec. 2, Linong St., Beitou District, Taipei 11221, Taiwan

Prof. 王坤卿
kunching@g2.usc.edu.tw

Department of Information Technology and Communication, Shih Chien University
No. 200, University Rd., Neimen District, Kaohsiung 84550, Taiwan

TECHNICAL PROGRAM CO-CHAIRS
Prof. 曹昱
yu.tsao@citi.sinica.edu.tw

Research Center for Information Technology Innovation, Academia Sinica
No. 128, Sec. 2, Academia Rd., Nangang District, Taipei 11529, Taiwan

SESSION CHAIRS
Prof. 王新民
whm@iis.sinica.edu.tw

Institute of Information Science, Academia Sinica
02-2788-3799 #1714,1507

Prof. 曹昱
yu.tsao@citi.sinica.edu.tw

Research Center for Information Technology Innovation, Academia Sinica
02-2787-2300 #2787-2390

Dr. 力博宏

Cheng Hsin General Hospital

02-2826-4400

Prof. 李祈均
cclee@ee.nthu.edu.tw

Department of Electrical Engineering, National Tsing Hua University
03-516-2439

Dr. 廖文輝
ent@vghtpe.gov.tw

Department of Otolaryngology-Head and Neck Surgery, Taipei Veterans General Hospital

Prof. 廖元甫
yfliao@ntut.edu.tw

Department of Electronic Engineering, National Taipei University of Technology
02-2771-2171 #2247

Prof. 方士豪
shfang@saturn.yzu.edu.tw

Department of Electrical Engineering, Yuan Ze University
03-463-8800 #7125

WORKSHOP EMAIL
sws2020.ymspeech@gmail.com

For any questions, please contact us via this email address or by phone:

02-2826-7000 #5491

REGISTRATION
Ms. 黃琪
aclclp@hp.iis.sinica.edu.tw

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)

02-2788-3799 #1502

© Speech and Physiological Signal Processing Laboratory