Posts in the 'Speech' category

List: Speech (2)

TechNOTE

Mel spectrogram explained

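Feeding raw audio data directly into a model leads to too many parameters and a very large data footprint, so mel spectrograms are commonly used instead. Let's take a proper look at what they are! 1. Loading the audio file: loading a wav file with a sampling rate of 24000 gives the following. A sampling rate of 24000 means the audio signal was sampled 24000 times per second. 2. STFT (Short Time Fourier Transform): next, we apply the STFT (Short Time Fourier Transform) to this data. What is the STFT? Before that, let's look at what the Fourier transform is.. Fourier transform? www.youtube.com/w..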
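To make the sampling rate and STFT steps above concrete, here is a minimal sketch. It is not the post's own code: the synthetic 1000 Hz tone stands in for the wav file, and the frame length, hop length, Hann window, and helper names (`make_tone`, `stft_magnitude`) are all assumptions chosen for illustration. It uses only the Python standard library with a naive DFT, which is slow; real code would normally use an FFT routine from an audio or numerical library.

```python
import cmath
import math

SAMPLE_RATE = 24000  # samples per second, as in the post
FRAME_LEN = 240      # assumed frame size: 10 ms at 24 kHz
HOP_LEN = 120        # assumed hop size: 50% overlap between frames


def make_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Synthesize a sine tone (hypothetical stand-in for the loaded wav file)."""
    n_samples = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def stft_magnitude(signal, frame_len=FRAME_LEN, hop_len=HOP_LEN):
    """Return one magnitude spectrum per frame, computed with a naive DFT."""
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# 0.1 s of audio at 24000 samples per second is 2400 samples.
signal = make_tone(freq_hz=1000, duration_s=0.1)
spec = stft_magnitude(signal)

bin_width_hz = SAMPLE_RATE / FRAME_LEN  # each bin spans 100 Hz
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])

print("samples:", len(signal))
print("frames x bins:", len(spec), "x", len(spec[0]))
print("peak frequency in first frame (Hz):", peak_bin * bin_width_hz)
```

The result is a time-by-frequency grid: each row is a short slice of the signal, and each column is a frequency band. A mel spectrogram is built on top of this grid by regrouping the frequency bins onto the mel scale, a step this sketch does not cover.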

Speech 2020. 11. 20. 12:00

[Paper review] A brief review of EATS

DeepMind has released a new paper on speech synthesis! It is END-TO-END ADVERSARIAL TEXT-TO-SPEECH, although rather than "end-to-end" (the term is too loosely defined), I think "1-stage speech synthesis" would be the more accurate description. Anyway, it looks like it was rejected from this year's NeurIPS and resubmitted to ICLR. deepmind.com/research/publications/End-to-End-Adversarial-Text-to-Speech End-to-End Adversarial Text-to-Speech Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of ..

Speech 2020. 10. 15. 22:53

