1
00:00:00,000 --> 00:00:02,850
[Narrator] - Polly is the opposite of Transcribe.

2
00:00:02,850 --> 00:00:06,030
You turn text into speech using deep learning.

3
00:00:06,030 --> 00:00:08,910
This allows you to create applications that will talk.

4
00:00:08,910 --> 00:00:09,787
For example, it says,

5
00:00:09,787 --> 00:00:11,190
"Hello, my name is Stephane

6
00:00:11,190 --> 00:00:13,380
and this is a demo of Amazon Polly."

7
00:00:13,380 --> 00:00:15,600
And then with Polly, it would generate audio,

8
00:00:15,600 --> 00:00:17,280
which I can play right here.

9
00:00:17,280 --> 00:00:18,270
Hi.

10
00:00:18,270 --> 00:00:22,050
My name is Stephane and this is a demo of Amazon Polly.

11
00:00:22,050 --> 00:00:23,790
I think I'm better at speaking than that

12
00:00:23,790 --> 00:00:26,070
but (chuckles) this gives you a good demo, right?

13
00:00:26,070 --> 00:00:29,010
And you can play with it on the console.

14
00:00:29,010 --> 00:00:30,630
So Amazon Polly can do more.

15
00:00:30,630 --> 00:00:32,820
It can use lexicons and SSML.

16
00:00:32,820 --> 00:00:36,090
So the first one is to customize the pronunciation

17
00:00:36,090 --> 00:00:39,270
of words with pronunciation lexicons.

18
00:00:39,270 --> 00:00:42,690
For example, if there is a stylized word such as Stephane

19
00:00:42,690 --> 00:00:45,750
but the E is a 3 and the A is a 4,

20
00:00:45,750 --> 00:00:50,040
Amazon Polly might say "S-T-3-P-H-4-N-E,"

21
00:00:50,040 --> 00:00:52,050
which is not how it should be pronounced;

22
00:00:52,050 --> 00:00:53,370
it should be pronounced Stephane.

23
00:00:53,370 --> 00:00:56,160
And so, therefore, you can create a lexicon for this.

24
00:00:56,160 --> 00:00:58,830
Or, for acronyms, for example, any time

25
00:00:58,830 --> 00:01:02,700
it sees AWS, instead of saying "A-W-S"

26
00:01:02,700 --> 00:01:05,580
it should say the full "Amazon Web Services."

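As an illustration of the two cases just described, here is a minimal sketch of what such a lexicon file might contain. It assumes the W3C Pronunciation Lexicon Specification (PLS) format with `<grapheme>`/`<alias>` pairs; the helper name `build_alias_lexicon` and the written form "St3ph4ne" are made up for this example and are not shown in the lecture.

```python
import xml.etree.ElementTree as ET
from typing import Mapping
from xml.sax.saxutils import escape

# Namespace defined by the W3C Pronunciation Lexicon Specification.
PLS_NAMESPACE = "http://www.w3.org/2005/01/pronunciation-lexicon"


def build_alias_lexicon(aliases: Mapping[str, str], lang: str = "en-US") -> str:
    """Return a PLS document mapping each written form to a spoken alias.

    Hypothetical helper for illustration only.
    """
    lexemes = "\n".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(written)}</grapheme>\n"
        f"    <alias>{escape(spoken)}</alias>\n"
        "  </lexeme>"
        for written, spoken in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NAMESPACE}"\n'
        f'         alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}\n"
        "</lexicon>\n"
    )


# The two examples from the lecture: an acronym and a stylized name.
lexicon_xml = build_alias_lexicon({
    "AWS": "Amazon Web Services",
    "St3ph4ne": "Stephane",
})

# Sanity check that the generated document is well-formed XML.
ET.fromstring(lexicon_xml.encode("utf-8"))
print(lexicon_xml)
```

The printed document is the kind of file one would save and upload as a lexicon; the name it is stored under is then what a synthesis request refers to.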
27
00:01:05,580 --> 00:01:07,200
So then you upload the lexicons

28
00:01:07,200 --> 00:01:10,830
and you use them in the SynthesizeSpeech operation.

29
00:01:10,830 --> 00:01:12,300
The second feature you need to know about

30
00:01:12,300 --> 00:01:14,520
is the SSML feature,

31
00:01:14,520 --> 00:01:17,760
which is called Speech Synthesis Markup Language.

32
00:01:17,760 --> 00:01:21,690
And this enables more customization to how speech is made.

33
00:01:21,690 --> 00:01:23,010
So you can, for example,

34
00:01:23,010 --> 00:01:26,730
emphasize specific words or phrases,

35
00:01:26,730 --> 00:01:29,130
or you use phonetic pronunciation,

36
00:01:29,130 --> 00:01:31,800
or you want to include breathing sounds or whispering,

37
00:01:31,800 --> 00:01:34,800
or you want to use the Newscaster speaking style.

38
00:01:34,800 --> 00:01:37,620
So all of it can be done using this markup language,

39
00:01:37,620 --> 00:01:41,010
and so instead of generating the speech from plain text

40
00:01:41,010 --> 00:01:44,070
you can include a whisper and it will start whispering,

41
00:01:44,070 --> 00:01:45,660
and so on, okay?

42
00:01:45,660 --> 00:01:49,770
So, remember, for pronunciation of stylized words

43
00:01:49,770 --> 00:01:52,710
or acronyms, use pronunciation lexicons.

44
00:01:52,710 --> 00:01:55,890
And for more customization

45
00:01:55,890 --> 00:01:59,520
on how words are being pronounced, for example,

46
00:01:59,520 --> 00:02:02,850
whispering or phonetic pronunciation, and so on,

47
00:02:02,850 --> 00:02:05,463
then use the SSML Markup Language.

48
00:02:06,510 --> 00:02:09,330
So if I go into the Amazon Polly service,

49
00:02:09,330 --> 00:02:12,600
this is where I can turn text into lifelike speech.

50
00:02:12,600 --> 00:02:13,890
So we can try it.

51
00:02:13,890 --> 00:02:17,880
So we can use, for example, the neural network, okay.

52
00:02:17,880 --> 00:02:20,490
And this is the most natural and human-like speech possible

53
00:02:20,490 --> 00:02:22,200
and we can choose the voice we want.

54
00:02:22,200 --> 00:02:23,610
So, here's the text.

55
00:02:23,610 --> 00:02:25,990
So I will say, "Hey, my name is Stephane

56
00:02:27,330 --> 00:02:29,880
and I love AWS."

57
00:02:29,880 --> 00:02:30,990
Let's see what happens.

58
00:02:30,990 --> 00:02:33,093
So if we listen to this, it will say:

59
00:02:35,520 --> 00:02:40,020
Hi, my name is Stephane and I love AWS.

60
00:02:40,020 --> 00:02:41,520
So that's pretty cool, right?

61
00:02:41,520 --> 00:02:45,330
And here with SSML, and so, for example, let's add a break,

62
00:02:45,330 --> 00:02:48,510
so I will say, "Hey, my name is Joanna,"

63
00:02:48,510 --> 00:02:49,770
and then I open a break.

64
00:02:49,770 --> 00:02:51,600
I say, "Break time equals,"

65
00:02:51,600 --> 00:02:54,450
and this is part of the SSML Language,

66
00:02:54,450 --> 00:02:55,540
and then slash

67
00:02:56,670 --> 00:02:57,690
and this.

68
00:02:57,690 --> 00:03:01,110
So I say, "Hey, break this for three seconds."

69
00:03:01,110 --> 00:03:02,730
So it would say:

70
00:03:02,730 --> 00:03:05,400
Hi, my name is Joanna.

71
00:03:05,400 --> 00:03:06,500
Now there's a break.

72
00:03:08,670 --> 00:03:11,010
I will read any text you type here.

73
00:03:11,010 --> 00:03:14,130
And this is how you control the speech itself

74
00:03:14,130 --> 00:03:16,173
using the SSML Markup Language.

75
00:03:17,190 --> 00:03:20,782
And lastly, if we want to say,

76
00:03:20,782 --> 00:03:22,090
"Hey, I love

77
00:03:24,366 --> 00:03:26,700
AWS" right here, so we say, "I love AWS,"

78
00:03:26,700 --> 00:03:29,220
and we'll just have one second of break.

79
00:03:29,220 --> 00:03:30,090
Let's listen to this.

80
00:03:30,090 --> 00:03:32,553
Hi, my name is Joanna.

81
00:03:33,930 --> 00:03:35,520
I love AWS.

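The one-second break typed into the console above could be written out roughly as follows. This is a sketch, not the exact text used in the demo: `ssml_with_break` is a hypothetical helper, and the `request` dictionary only illustrates parameters of the SynthesizeSpeech operation (`Text`, `TextType`, `VoiceId`, `Engine`, `OutputFormat`); no call to AWS is made here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def ssml_with_break(before: str, after: str, seconds: int) -> str:
    """Wrap two sentences in <speak> with a timed <break> between them.

    Hypothetical helper for illustration only.
    """
    return (
        f"<speak>{escape(before)} "
        f'<break time="{seconds}s"/> '
        f"{escape(after)}</speak>"
    )


ssml = ssml_with_break("Hi, my name is Joanna.", "I love AWS.", 1)

# Sanity check that the markup is well-formed XML before sending it anywhere.
ET.fromstring(ssml)
print(ssml)

# Parameters one might pass to SynthesizeSpeech. With boto3 installed and
# credentials configured (both assumptions), this would look like:
#   boto3.client("polly").synthesize_speech(**request)
request = {
    "Text": ssml,
    "TextType": "ssml",      # tells Polly the text is SSML, not plain text
    "VoiceId": "Joanna",     # the voice heard in the demo
    "Engine": "neural",
    "OutputFormat": "mp3",
}
```

Changing `seconds` to 3 gives the three-second pause from the first attempt in the demo.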
82
00:03:35,520 --> 00:03:38,160
Okay, what if you want to say, not AWS,

83
00:03:38,160 --> 00:03:41,670
but Amazon Web Services? In that case you would need to go

84
00:03:41,670 --> 00:03:45,540
into additional settings and then customize pronunciation.

85
00:03:45,540 --> 00:03:47,610
And here you would need to apply a lexicon

86
00:03:47,610 --> 00:03:52,610
and upload a lexicon to convert AWS into Amazon Web Services.

87
00:03:52,620 --> 00:03:55,740
So, trust me, you just need to create a file,

88
00:03:55,740 --> 00:03:57,840
and then upload it, create a lexicon,

89
00:03:57,840 --> 00:04:00,390
and then automatically, whatever you set, so, for example,

90
00:04:00,390 --> 00:04:02,430
whenever it finds AWS,

91
00:04:02,430 --> 00:04:05,160
it will just say Amazon Web Services.

92
00:04:05,160 --> 00:04:06,960
And that's it for Amazon Polly.

93
00:04:06,960 --> 00:04:08,910
You should know everything there is to know for the exam.

94
00:04:08,910 --> 00:04:09,870
I hope you liked it

95
00:04:09,870 --> 00:04:11,820
and I will see you in the next lecture.