1
00:00:00,05 --> 00:00:03,00
- [Instructor] I'm going to show you a demo

2
00:00:03,00 --> 00:00:06,06
of a machine learning audio classification model.

3
00:00:06,06 --> 00:00:08,00
And we are going to use this platform

4
00:00:08,00 --> 00:00:09,06
called Teachable Machine.

5
00:00:09,06 --> 00:00:13,07
So let's pick Audio Project.

6
00:00:13,07 --> 00:00:16,08
So a machine learning model

7
00:00:16,08 --> 00:00:21,02
is basically classifying different sets of data.

8
00:00:21,02 --> 00:00:24,09
So we are going to focus on an AI machine learning model

9
00:00:24,09 --> 00:00:27,05
of sound, voice, or audio.

10
00:00:27,05 --> 00:00:30,07
And Teachable Machine that I'm using here is from Google.

11
00:00:30,07 --> 00:00:32,08
It's a free and experimental part of Google.

12
00:00:32,08 --> 00:00:34,05
So you don't need to sign into Google

13
00:00:34,05 --> 00:00:37,04
and they're not collecting any data.

14
00:00:37,04 --> 00:00:41,07
So we're going to do a live training of an Edge AI model.

15
00:00:41,07 --> 00:00:43,02
Right now when we do training,

16
00:00:43,02 --> 00:00:45,05
it is going to be on a computer.

17
00:00:45,05 --> 00:00:48,05
It can be hosted on an Edge device

18
00:00:48,05 --> 00:00:50,04
or it can be hosted on the cloud.

19
00:00:50,04 --> 00:00:54,05
So when it sits as an inference model on an Edge device,

20
00:00:54,05 --> 00:00:55,04
it becomes Edge AI.

21
00:00:55,04 --> 00:01:00,02
So we're going to classify different sounds.

22
00:01:00,02 --> 00:01:04,04
So sound, once recorded, is nothing but digitized data

23
00:01:04,04 --> 00:01:06,03
of different frequencies.

24
00:01:06,03 --> 00:01:08,04
Let's organize and group different sounds

25
00:01:08,04 --> 00:01:12,09
and build a model to recognize one particular sound.

26
00:01:12,09 --> 00:01:15,09
I want you to think about your environment,

27
00:01:15,09 --> 00:01:18,03
be it a factory or a home setting.

28
00:01:18,03 --> 00:01:20,07
Sound is everywhere.

29
00:01:20,07 --> 00:01:24,00
How can you use it to create a useful use case?

30
00:01:24,00 --> 00:01:28,04
Say water is dripping and a pipe is leaking

31
00:01:28,04 --> 00:01:30,08
or water is overflowing in a dam.

32
00:01:30,08 --> 00:01:35,01
Dripping versus overflowing water are different sounds.

33
00:01:35,01 --> 00:01:37,00
A squeaky floor in construction

34
00:01:37,00 --> 00:01:38,09
would make a totally different sound.

35
00:01:38,09 --> 00:01:41,05
So think about the sound in your environment

36
00:01:41,05 --> 00:01:43,06
and what you want to separate out.

37
00:01:43,06 --> 00:01:45,07
In this demo, we want to build a model

38
00:01:45,07 --> 00:01:48,03
to identify, alert you,

39
00:01:48,03 --> 00:01:50,05
or predict something that's going to break

40
00:01:50,05 --> 00:01:53,01
based on identifying a sound.

41
00:01:53,01 --> 00:01:55,00
So materials could be delivered

42
00:01:55,00 --> 00:01:56,08
by an autonomous mobile robot

43
00:01:56,08 --> 00:01:59,02
because it's filled a bin or a pallet

44
00:01:59,02 --> 00:02:00,09
and you may want to stop it.

45
00:02:00,09 --> 00:02:02,09
That's a different kind of sound.

46
00:02:02,09 --> 00:02:06,02
So you could automate this or notify someone.

47
00:02:06,02 --> 00:02:08,00
Those are all product features.

48
00:02:08,00 --> 00:02:09,06
And these are all my examples.

49
00:02:09,06 --> 00:02:10,09
Later, we will do a challenge

50
00:02:10,09 --> 00:02:14,07
and you get to build on it with your own examples.

51
00:02:14,07 --> 00:02:17,01
So coming back to machine learning,

52
00:02:17,01 --> 00:02:20,02
machine learning is about organizing datasets

53
00:02:20,02 --> 00:02:22,09
into multiple groups or classes.

54
00:02:22,09 --> 00:02:28,03
So here, we have one class called background noise.

55
00:02:28,03 --> 00:02:31,07
So all I'm going to do is click on mic.

56
00:02:31,07 --> 00:02:34,02
I've already given permission to use my mic.

57
00:02:34,02 --> 00:02:35,04
So I'm going to be quiet,

58
00:02:35,04 --> 00:02:38,09
so my background is going to be recorded for 20 seconds.

59
00:02:38,09 --> 00:02:42,09
So then I just press extract sample

60
00:02:42,09 --> 00:02:46,05
and it says Teachable Machine wants a minimum of 20 samples.

61
00:02:46,05 --> 00:02:49,00
So it's extracted the samples for me.

62
00:02:49,00 --> 00:02:51,04
So I'm done.

63
00:02:51,04 --> 00:02:54,09
So next, I'm going to create a class called bell.

64
00:02:54,09 --> 00:02:57,04
See, here what we are going to do,

65
00:02:57,04 --> 00:02:59,08
the class can be called any name, it's just a name,

66
00:02:59,08 --> 00:03:02,02
but it's a group of one type of data

67
00:03:02,02 --> 00:03:04,01
that I'm going to collect.

68
00:03:04,01 --> 00:03:07,03
What I want to do is live training with you.

69
00:03:07,03 --> 00:03:09,06
So I have actually brought a real bell.

70
00:03:09,06 --> 00:03:11,09
Do you hear that?

71
00:03:11,09 --> 00:03:13,00
Yeah.

72
00:03:13,00 --> 00:03:17,00
So you have a choice to click upload

73
00:03:17,00 --> 00:03:21,03
and you could have your dataset of some sound that you want

74
00:03:21,03 --> 00:03:23,08
to be classified and identified.

75
00:03:23,08 --> 00:03:26,03
You could very well use that.

76
00:03:26,03 --> 00:03:30,05
Or you can click on mic and it will record,

77
00:03:30,05 --> 00:03:32,07
just like we did with the background sound,

78
00:03:32,07 --> 00:03:36,03
it can record a sound that you're going to produce.

79
00:03:36,03 --> 00:03:40,09
So I have brought a bell with me and live in this demo,

80
00:03:40,09 --> 00:03:43,06
I'm going to generate a minimum of eight samples,

81
00:03:43,06 --> 00:03:47,00
we can do more, of the bell sound.

82
00:03:47,00 --> 00:03:48,06
So think of what you have.

83
00:03:48,06 --> 00:03:51,03
I could have brought my ukulele here,

84
00:03:51,03 --> 00:03:53,08
it won't fit in this room for the demo,

85
00:03:53,08 --> 00:03:56,06
but you can pretty much bring anything.

86
00:03:56,06 --> 00:03:59,05
Think of an object around you, what noise you're looking for,

87
00:03:59,05 --> 00:04:00,09
or you look at your work and say,

88
00:04:00,09 --> 00:04:03,00
I want to capture this noise

89
00:04:03,00 --> 00:04:05,06
and record that sound, create a dataset

90
00:04:05,06 --> 00:04:07,05
and bring it and upload it.

91
00:04:07,05 --> 00:04:13,08
Okay, so let's do the live collection of the bell sound.

92
00:04:13,08 --> 00:04:14,09
Here we go.

93
00:04:14,09 --> 00:04:17,08
And it's saying that it will record

94
00:04:17,08 --> 00:04:19,06
only two seconds at a time.

95
00:04:19,06 --> 00:04:22,05
So I'm going to do that multiple times.

96
00:04:22,05 --> 00:04:25,06
So right when I'm clicking record,

97
00:04:25,06 --> 00:04:29,01
I'm going to make the bell sound.

98
00:04:29,01 --> 00:04:31,04
(bell ringing)

99
00:04:31,04 --> 00:04:35,06
Okay, and extract two samples and it says eight minimum,

100
00:04:35,06 --> 00:04:38,07
so I'm going to do that more times.

101
00:04:38,07 --> 00:04:41,03
(bell ringing)

102
00:04:41,03 --> 00:04:43,05
Extract sample.

103
00:04:43,05 --> 00:04:47,02
(bell ringing)

104
00:04:47,02 --> 00:04:48,03
Here we go.

105
00:04:48,03 --> 00:04:52,03
(bell ringing)

106
00:04:52,03 --> 00:04:53,07
One more time.

107
00:04:53,07 --> 00:04:56,06
(bell ringing)

108
00:04:56,06 --> 00:04:59,01
I just want to make 20 samples,

109
00:04:59,01 --> 00:05:01,06
so I'm going to do one more time.

110
00:05:01,06 --> 00:05:04,00
(bell ringing)

111
00:05:04,00 --> 00:05:05,02
Here we go.

112
00:05:05,02 --> 00:05:08,01
So we have about 10 samples of the bell

113
00:05:08,01 --> 00:05:10,08
and we're ready with two classes.

114
00:05:10,08 --> 00:05:14,07
So machine learning is about classifying

115
00:05:14,07 --> 00:05:18,00
multiple sounds into different buckets called classes.

116
00:05:18,00 --> 00:05:19,02
And that's what we've done.

117
00:05:19,02 --> 00:05:22,04
If you want, we can add a third class and a fourth class

118
00:05:22,04 --> 00:05:25,02
and test the model, right?

119
00:05:25,02 --> 00:05:29,01
So we're just going to click on train.

120
00:05:29,01 --> 00:05:31,08
I want to show you one thing before we do train.

121
00:05:31,08 --> 00:05:36,07
Here, the defaults are set and epoch is a number.

122
00:05:36,07 --> 00:05:39,04
The default, as a best-practice heuristic, is

123
00:05:39,04 --> 00:05:42,00
to have 50 as the epoch count.

124
00:05:42,00 --> 00:05:43,03
So I'm going to leave it at that.

125
00:05:43,03 --> 00:05:46,06
Epoch is basically the number of times

126
00:05:46,06 --> 00:05:49,07
each data sample in here, each sound in here.

127
00:05:49,07 --> 00:05:51,06
Can you see how beautifully it is digitized

128
00:05:51,06 --> 00:05:54,04
and you can see the digital sound difference?

129
00:05:54,04 --> 00:05:58,08
For each of these, how many times it is scanned

130
00:05:58,08 --> 00:06:01,01
for learning by the machine learning model

131
00:06:01,01 --> 00:06:03,09
is the epoch count, okay?

132
00:06:03,09 --> 00:06:05,07
So I'm going to leave it at that.

133
00:06:05,07 --> 00:06:08,02
I'm just going to say train the model.

134
00:06:08,02 --> 00:06:10,08
And it says it's preparing the training data.

135
00:06:10,08 --> 00:06:11,08
It's very little data.

136
00:06:11,08 --> 00:06:14,06
It should do that very, very quickly.

137
00:06:14,06 --> 00:06:18,02
And did the FFT box.

138
00:06:18,02 --> 00:06:19,08
So what do you think is happening

139
00:06:19,08 --> 00:06:21,09
right here on the right side?

140
00:06:21,09 --> 00:06:26,06
So this is where the output of this model is produced.

141
00:06:26,06 --> 00:06:29,00
I have left the microphone on.

142
00:06:29,00 --> 00:06:32,08
So as I'm talking, it is capturing sound

143
00:06:32,08 --> 00:06:33,09
and it's running inference and saying,

144
00:06:33,09 --> 00:06:36,04
I have a trained model, let me test it.

145
00:06:36,04 --> 00:06:39,04
So it is testing my sound

146
00:06:39,04 --> 00:06:42,09
and it's trying to place it as background or bell.

147
00:06:42,09 --> 00:06:44,05
Why is it?

148
00:06:44,05 --> 00:06:46,04
What's going on here?

149
00:06:46,04 --> 00:06:50,04
So if I'm quiet,

150
00:06:50,04 --> 00:06:54,04
okay, 99% confidence it's background noise.

151
00:06:54,04 --> 00:06:57,05
And if I ring a bell, (bell ringing)

152
00:06:57,05 --> 00:07:00,03
Ooh, 100% confidence it's bell.

153
00:07:00,03 --> 00:07:03,00
So it's able to recognize the bell and background sounds.

154
00:07:03,00 --> 00:07:07,09
Those are in its learning, but every AI is narrow AI,

155
00:07:07,09 --> 00:07:11,01
which means it knows only what it's been trained on

156
00:07:11,01 --> 00:07:12,06
in its training data.

157
00:07:12,06 --> 00:07:15,00
So since I've given only background sound and bell sound,

158
00:07:15,00 --> 00:07:16,02
it knows only that.

159
00:07:16,02 --> 00:07:18,06
So when I'm talking, it hears my voice

160
00:07:18,06 --> 00:07:22,07
and it is thinking, is this background or is this bell?

161
00:07:22,07 --> 00:07:24,01
And it is trying to fit that

162
00:07:24,01 --> 00:07:26,09
into the model's learning brain.

163
00:07:26,09 --> 00:07:29,04
So that's what is happening.

164
00:07:29,04 --> 00:07:32,09
So we have collected our data.

165
00:07:32,09 --> 00:07:35,01
We have organized them in classes.

166
00:07:35,01 --> 00:07:41,08
We have trained the model and we have now tested that.

167
00:07:41,08 --> 00:07:45,08
So one thing to remember is since it's recognizing my voice

168
00:07:45,08 --> 00:07:50,01
as one of the two classes it has,

169
00:07:50,01 --> 00:07:52,04
you'll have to think about this at work.

170
00:07:52,04 --> 00:07:56,02
If you want a particular sound to be captured,

171
00:07:56,02 --> 00:07:59,01
like many examples that I gave you earlier,

172
00:07:59,01 --> 00:08:01,06
you might want to create an extra class

173
00:08:01,06 --> 00:08:03,04
of other things that happen.

174
00:08:03,04 --> 00:08:06,04
If a person would be walking, that walking sound,

175
00:08:06,04 --> 00:08:08,01
or there will be a train whistle

176
00:08:08,01 --> 00:08:09,06
that happens at a certain time.

177
00:08:09,06 --> 00:08:12,01
If there are going to be other sounds

178
00:08:12,01 --> 00:08:14,08
from which you want to isolate the sound,

179
00:08:14,08 --> 00:08:18,09
you might want to train the model on a lot of other sounds,

180
00:08:18,09 --> 00:08:21,02
the background disturbances and people walking

181
00:08:21,02 --> 00:08:22,05
and that kind of thing

182
00:08:22,05 --> 00:08:25,01
so that it makes your model more accurate

183
00:08:25,01 --> 00:08:27,03
when you actually give your sample.

184
00:08:27,03 --> 00:08:30,04
It's not trying to force-fit it into that one particular bucket

185
00:08:30,04 --> 00:08:32,07
because now it's learned other things

186
00:08:32,07 --> 00:08:34,09
and it will say, no, no, this is a person walking.

187
00:08:34,09 --> 00:08:36,01
This is the train whistle.

188
00:08:36,01 --> 00:08:38,00
It's not what I'm looking for.

189
00:08:38,00 --> 00:08:40,07
So you can create a lot of different classes

190
00:08:40,07 --> 00:08:44,01
simply by clicking Add a class.

191
00:08:44,01 --> 00:08:46,04
And the more data that you give,

192
00:08:46,04 --> 00:08:49,07
the more accurate the model generally becomes, okay?

193
00:08:49,07 --> 00:08:51,05
So the next step, the final step

194
00:08:51,05 --> 00:08:52,08
is we're going to export the model.

195
00:08:52,08 --> 00:08:55,00
So we trained the model.

196
00:08:55,00 --> 00:08:56,06
So typically at work,

197
00:08:56,06 --> 00:08:59,03
one person will not be doing all of this.

198
00:08:59,03 --> 00:09:01,04
I want you to learn Edge AI.

199
00:09:01,04 --> 00:09:03,02
I want you to learn the different models.

200
00:09:03,02 --> 00:09:05,03
I want you to learn training and inference

201
00:09:05,03 --> 00:09:07,02
and think about the product and everything

202
00:09:07,02 --> 00:09:09,04
so that you get a very solid foundation

203
00:09:09,04 --> 00:09:11,01
in Edge AI in this course.

204
00:09:11,01 --> 00:09:12,06
So when we export the model,

205
00:09:12,06 --> 00:09:14,01
what we are doing is we're just trying

206
00:09:14,01 --> 00:09:15,09
to use the model we created.

207
00:09:15,09 --> 00:09:18,05
So when we're training, it's a training model.

208
00:09:18,05 --> 00:09:19,08
Then we tested it.

209
00:09:19,08 --> 00:09:22,03
And then now we're actually using this

210
00:09:22,03 --> 00:09:23,04
as an inference model.

211
00:09:23,04 --> 00:09:26,04
An inference model is nothing but a trained model

212
00:09:26,04 --> 00:09:28,06
ready for production, right?

213
00:09:28,06 --> 00:09:32,00
So we could upload the model.

214
00:09:32,00 --> 00:09:33,09
So what's happening when we upload the model,

215
00:09:33,09 --> 00:09:35,08
it is actually sent to the cloud

216
00:09:35,08 --> 00:09:38,05
and Teachable Machine actually hosts this somewhere

217
00:09:38,05 --> 00:09:39,08
at this URL.

218
00:09:39,08 --> 00:09:41,01
You can copy this.

219
00:09:41,01 --> 00:09:42,04
What do you think you can do with this?

220
00:09:42,04 --> 00:09:43,08
You train the model.

221
00:09:43,08 --> 00:09:46,01
You want it to recognize the bell sound

222
00:09:46,01 --> 00:09:50,02
or the bin sound or a dripping water sound.

223
00:09:50,02 --> 00:09:52,09
And you could actually put this on the cloud

224
00:09:52,09 --> 00:09:55,05
and share this link with somebody

225
00:09:55,05 --> 00:09:57,09
who is taking this to your customer

226
00:09:57,09 --> 00:09:59,07
and say, hey, let's test this out

227
00:09:59,07 --> 00:10:01,02
with real sound in the field.

228
00:10:01,02 --> 00:10:04,01
You could do that remotely in a cloud setting

229
00:10:04,01 --> 00:10:06,02
because you're going to share this link.

230
00:10:06,02 --> 00:10:08,09
So that's one thing you can do by putting it in the cloud.

231
00:10:08,09 --> 00:10:13,06
Or you could actually click download and download this model

232
00:10:13,06 --> 00:10:15,02
and then put it in a mobile app

233
00:10:15,02 --> 00:10:17,04
or somewhere else where you want to integrate it

234
00:10:17,04 --> 00:10:18,08
into your workflow.

235
00:10:18,08 --> 00:10:20,07
And they also give you the source code.

236
00:10:20,07 --> 00:10:24,03
You can copy that or have your data scientists copy that

237
00:10:24,03 --> 00:10:26,05
and then build on this

238
00:10:26,05 --> 00:10:29,08
or integrate it into other products that you're using

239
00:10:29,08 --> 00:10:31,09
to create the right kind of experience you want.

240
00:10:31,09 --> 00:10:36,03
So when this AI model recognizes a sound,

241
00:10:36,03 --> 00:10:38,09
what kind of automation do you want it to do?

242
00:10:38,09 --> 00:10:40,05
What kind of things do you want to do?

243
00:10:40,05 --> 00:10:42,08
So I'm excited to see what you're going to do

244
00:10:42,08 --> 00:10:45,03
with the audio classification on an Edge device.

245
00:10:45,03 --> 00:10:47,05
The minute you put it on an Edge device,

246
00:10:47,05 --> 00:10:50,00
it becomes Edge AI running inference.

247
00:10:50,00 --> 00:10:50,08
Just remember that.

248
00:10:50,08 --> 00:10:54,07
As for the rest, we trained a beautiful model and we are ready.

249
00:10:54,07 --> 00:10:57,00
So let me close this window.

250
00:10:57,00 --> 00:10:59,01
Next, I'm going to give you a challenge

251
00:10:59,01 --> 00:11:02,04
and you're going to be able to do this for yourself.

252
00:11:02,04 --> 00:11:04,04
There's also a handout with the same thing.

253
00:11:04,04 --> 00:11:06,03
So with the same steps,

254
00:11:06,03 --> 00:11:09,06
so you can actually go learn the steps to do this

255
00:11:09,06 --> 00:11:11,02
and practice more.

256
00:11:11,02 --> 00:11:14,00
And I'm more excited to see

257
00:11:14,00 --> 00:11:16,05
what kind of things you're going to do at work.

258
00:11:16,05 --> 00:11:18,04
So at the end of this course,

259
00:11:18,04 --> 00:11:20,04
like when you scroll down to the bottom of the course,

260
00:11:20,04 --> 00:11:23,06
when you finish, you will see an option saying Sudha Live

261
00:11:23,06 --> 00:11:27,01
because I do a live stream on LinkedIn Live

262
00:11:27,01 --> 00:11:29,05
once a month and talk to my students.

263
00:11:29,05 --> 00:11:30,09
So you can come there,

264
00:11:30,09 --> 00:11:32,09
share the different challenges you face,

265
00:11:32,09 --> 00:11:34,01
the solutions you've built,

266
00:11:34,01 --> 00:11:36,09
ask questions or just continue your learning.

267
00:11:36,09 --> 00:11:39,09
And you can also join my Business School of AI

268
00:11:39,09 --> 00:11:41,00
learner community

269
00:11:41,00 --> 00:11:43,02
where you can just come and tag or ask questions

270
00:11:43,02 --> 00:11:45,05
or the easiest thing, you can even post a question

271
00:11:45,05 --> 00:11:49,00
in the question tab of the LinkedIn Learning app.