1
00:00:00,180 --> 00:00:07,410
Hello all. Before going ahead in this session, let's have a quick recap of what we have done till now

2
00:00:07,410 --> 00:00:14,400
in this project: from importing data, to performing sentiment analysis with respect to positive sentences

3
00:00:14,400 --> 00:00:21,270
as well as negative sentences, doing lots of data cleaning, doing lots of preprocessing, this word

4
00:00:21,270 --> 00:00:24,150
cloud visual, and various other things.

5
00:00:24,400 --> 00:00:29,190
We have also analyzed to what type of user Amazon is going to recommend more products.

6
00:00:29,640 --> 00:00:36,600
And at the end, we have this beautiful box plot, which is exactly a distribution of the length of the

7
00:00:36,600 --> 00:00:38,190
customer feedback.

8
00:00:38,580 --> 00:00:46,260
Then we have this count plot of my score feature. And in this particular session we have this assignment

9
00:00:46,260 --> 00:00:50,640
in which I have to analyze what exactly the behavior of the customers is.

10
00:00:51,030 --> 00:00:59,190
Because if you consider this final data frame for this analysis, in this final you

11
00:00:59,190 --> 00:01:00,180
have a column.

12
00:01:01,780 --> 00:01:08,230
Which is exactly this text. So using this feature, you can definitely come up

13
00:01:08,230 --> 00:01:12,290
with a conclusion about what exactly the behavior of our customers is.

14
00:01:12,610 --> 00:01:14,920
So, very first, what do we have to do next?

15
00:01:15,220 --> 00:01:16,320
Let me show you a thing.

16
00:01:16,330 --> 00:01:18,250
Say, on this final,

17
00:01:18,250 --> 00:01:20,650
I have to access this text.

18
00:01:20,740 --> 00:01:27,610
And if I'm going to execute the cell, you will see here I have all this. So the very first thing

19
00:01:27,670 --> 00:01:28,590
that you have to do.

20
00:01:28,600 --> 00:01:33,520
You have to convert all these words into lowercase characters.

21
00:01:33,940 --> 00:01:40,350
So the question you guys can ask is why there is a need to convert all this stuff into lowercase.

22
00:01:40,570 --> 00:01:46,950
Listen, at some place, let's say, this "dog" is in the form of, let's say, caps, or you can say capital

23
00:01:46,960 --> 00:01:50,890
letters, but let's say somewhere else it is in the form of small letters.

24
00:01:51,280 --> 00:01:53,430
So that definitely impacts a lot.

25
00:01:53,440 --> 00:01:57,520
That's why you have to convert all your words into lowercase.

26
00:01:57,890 --> 00:02:04,840
So if you have to convert this into lowercase, you have to just use this .str.lower()

27
00:02:04,870 --> 00:02:07,630
and you have to just execute it.

28
00:02:07,760 --> 00:02:12,920
Now you will see over here all your text gets converted into lowercase.
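
As a rough sketch of why lowercasing matters, assuming a made-up list of words rather than the actual review data, counting words before and after lowercasing shows how the casing variants collapse into one entry. With a pandas text column, `.str.lower()` does the same thing across every row.

```python
from collections import Counter

# Hypothetical sample words; "Dog", "dog" and "DOG" differ only in case.
words = ["Dog", "dog", "DOG", "cat"]

# Without lowercasing, each casing is counted as a separate word.
raw_counts = Counter(words)

# After lowercasing, all three spellings collapse into a single entry.
lower_counts = Counter(word.lower() for word in words)

print(raw_counts)
print(lower_counts)
```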

29
00:02:13,240 --> 00:02:16,280
Now what you have to do, you have to store it somewhere else.

30
00:02:16,280 --> 00:02:18,760
So I'm going to say I have to store it in,

31
00:02:18,760 --> 00:02:21,370
let's say, final of text.

32
00:02:21,850 --> 00:02:24,880
Now, what we have to do, we have to just execute it.

33
00:02:25,210 --> 00:02:32,010
Now, after doing all these things, let's say I have to remove some extra or some special characters.

34
00:02:32,410 --> 00:02:34,600
So let's say I'm going to show you a thing.

35
00:02:34,600 --> 00:02:43,210
Let's say in this final of text, let's say I'm going to show you the data at this index 164.

36
00:02:43,570 --> 00:02:46,830
You will see this is all the data you have over here.

37
00:02:47,020 --> 00:02:53,670
And if I'm going to use that concept using the re module's function, which is this sub, or substitute, and here I'm

38
00:02:53,680 --> 00:03:03,520
going to say, except a to z and capital A to Z, whatever I have, I have to just replace it with this

39
00:03:03,560 --> 00:03:04,050
space.

40
00:03:04,150 --> 00:03:05,930
That's what this substitute will do.

41
00:03:05,950 --> 00:03:12,130
And here I have to mention that on this text I have to perform this operation. That's it. And again,

42
00:03:12,130 --> 00:03:19,930
if I'm going to execute it, now you will see over here, in this entire text, this 100 has

43
00:03:20,230 --> 00:03:21,120
disappeared.

44
00:03:21,370 --> 00:03:26,820
But, as we can all see, this 100 matters a lot over here.

45
00:03:26,830 --> 00:03:32,040
It means you can't use this logic to manipulate this data.
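
To make the drawback concrete, here is a minimal sketch of that substitution, assuming a made-up review string (not row 164 of the actual data). The pattern keeps only letters, so the number goes away together with the punctuation.

```python
import re

# Hypothetical review text, already lowercased.
text = "i paid 100 dollars for this, and it's great!"

# Replace every character that is not a letter (a-z or A-Z) with a space.
letters_only = re.sub("[^a-zA-Z]", " ", text)

print(letters_only)
```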

46
00:03:32,050 --> 00:03:40,010
It means you have to use your own logic, because whenever your built-in modules

47
00:03:40,100 --> 00:03:42,850
aren't going to help you, at that time

48
00:03:42,850 --> 00:03:44,800
you have to write your own logic.

49
00:03:45,190 --> 00:03:50,660
That's where your programming, your problem solving is going to help you a lot.

50
00:03:50,680 --> 00:03:55,770
So let's say, very first, I have to define my own punctuation.

51
00:03:55,780 --> 00:03:59,380
So here I'm going to say my own punctuations, or special characters.

52
00:03:59,390 --> 00:04:02,140
Whatever name you want to assign, it's all up to you.

53
00:04:02,330 --> 00:04:05,320
Let's say my own punctuation marks are nothing

54
00:04:05,320 --> 00:04:07,150
but, let's say, the very first one,

55
00:04:07,150 --> 00:04:13,870
this one. Let's say I have this bracket; after that, let's say I have this one.

56
00:04:13,870 --> 00:04:16,450
I have some big brackets.

57
00:04:16,450 --> 00:04:17,110
I have some

58
00:04:17,110 --> 00:04:20,920
curly braces, I have this one.

59
00:04:20,920 --> 00:04:24,400
Let's add this one or whatever you want to assign.

60
00:04:24,400 --> 00:04:26,040
It's all up to you.

61
00:04:26,230 --> 00:04:31,570
Let's say these are all my punctuation marks that I'm going to

62
00:04:31,570 --> 00:04:32,500
define over here.

63
00:04:32,500 --> 00:04:38,850
This question mark, let's say this slash, whatever you want to add; it's all up to you.

64
00:04:38,870 --> 00:04:47,950
And other special characters, like the exclamation mark, this at sign, this hash, this dollar sign, this percentage sign,

65
00:04:48,370 --> 00:04:56,110
and, let's say, this star, and all these kinds of punctuation that you want to assign over

66
00:04:56,110 --> 00:04:56,540
here.

67
00:04:56,590 --> 00:05:04,510
Now, what we have to do: let's say I have this in my data, and now I have to iterate

68
00:05:04,510 --> 00:05:05,140
on this data.

69
00:05:05,180 --> 00:05:12,850
So I'm going to say for character in data, or for i in data; let me make it more user friendly: for

70
00:05:12,850 --> 00:05:20,230
character in data. And whatever character you have, you then have to put a condition: if character not

71
00:05:20,230 --> 00:05:24,200
in my punctuations that I have defined earlier.

72
00:05:24,580 --> 00:05:31,540
So if it is not in punctuations, then only I'm going to define, or you can say, then only I'm going

73
00:05:31,540 --> 00:05:33,130
to consider it in my string.

74
00:05:33,130 --> 00:05:35,940
So for that, you have to define that particular string.

75
00:05:36,400 --> 00:05:40,480
So I'm just going to say this string is nothing but... let's define it.

76
00:05:40,480 --> 00:05:45,070
Let's say this string is no_punctuation.

77
00:05:45,110 --> 00:05:47,460
It is exactly my blank string.

78
00:05:47,740 --> 00:05:51,130
So here I'm going to say no_punctuation

79
00:05:51,130 --> 00:05:56,440
equals no_punctuation plus character. Concatenate:

80
00:05:56,440 --> 00:05:59,230
you have to concatenate each and every

81
00:05:59,230 --> 00:05:59,660
character.

82
00:06:00,520 --> 00:06:01,180
That's the

83
00:06:01,300 --> 00:06:04,460
whole idea behind my concatenation.

84
00:06:04,780 --> 00:06:09,030
Now, what we have to do: we have to print this no_punctuation string.

85
00:06:09,280 --> 00:06:14,640
So for this, I'm just going to print my no_punctuation, and I have to just execute the

86
00:06:14,650 --> 00:06:15,060
cell.

87
00:06:15,100 --> 00:06:22,920
Now you will see this stuff gets executed, and this 100 is still in your data.

88
00:06:22,930 --> 00:06:28,030
And if you are going to compare both stuff, these are almost similar.

89
00:06:28,030 --> 00:06:30,270
But there is a minor difference.

90
00:06:30,640 --> 00:06:36,820
It is not going to remove this numerical data, because in these punctuations I haven't defined that,

91
00:06:37,030 --> 00:06:40,300
yes, I have to eliminate this numerical data.

92
00:06:40,780 --> 00:06:47,350
But if I'm going to use this substitute function of the re module, it would eliminate my numerical data

93
00:06:47,350 --> 00:06:47,750
as well.

94
00:06:48,070 --> 00:06:51,000
So that's a drawback of the substitute function.
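
The character-by-character loop described above can be sketched roughly like this. The exact set of characters in `punctuations` and the sample `data` string are assumptions for illustration; the point is that digits survive because they are not in the list.

```python
# A hand-picked set of characters to drop; this exact list is an assumption.
punctuations = "()[]{}?/!@#$%*"

# Hypothetical review text containing both punctuation and numbers.
data = "i paid 100 dollars (really!) for this #1 product"

# Start from a blank string and keep only characters not in the list.
no_punctuation = ""
for character in data:
    if character not in punctuations:
        no_punctuation = no_punctuation + character

print(no_punctuation)
```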

95
00:06:51,010 --> 00:06:59,260
You guys can also see. In a similar way, you can approach each and every entry of your text feature

96
00:06:59,590 --> 00:07:01,930
by using these blocks of code.

97
00:07:02,260 --> 00:07:07,780
So what I'm going to do, I'm just going to say, very first, let me import a module, which is exactly

98
00:07:07,780 --> 00:07:11,110
my string. That's it. In this string module,

99
00:07:11,380 --> 00:07:14,140
I have something very useful for you.

100
00:07:14,170 --> 00:07:16,960
This is exactly my punctuation.

101
00:07:16,990 --> 00:07:17,700
It exactly

102
00:07:17,740 --> 00:07:21,490
gives me all the punctuations defined by Python.

103
00:07:21,760 --> 00:07:23,990
But first I have to store it somewhere.

104
00:07:24,460 --> 00:07:27,910
I'm just going to say that these are my punctuations.

105
00:07:27,940 --> 00:07:33,560
Now what we have to do: we have to basically copy all this stuff.

106
00:07:34,180 --> 00:07:36,010
Let me just paste over here.

107
00:07:36,310 --> 00:07:39,680
So here I'm going to say here I have to remove this one.

108
00:07:40,030 --> 00:07:47,440
So this time I'm just going to say for character in... let me create a function over here.

109
00:07:47,650 --> 00:07:50,590
So I'm just going to provide the right indentation now.

110
00:07:50,860 --> 00:07:59,740
And this time my function is def remove_punctuation, so remove_punctuation, and

111
00:08:00,130 --> 00:08:08,450
whatever value I'm going to pass over here, it will just remove all the punctuation from that review.

112
00:08:08,470 --> 00:08:11,140
That's what this function will do.

113
00:08:11,260 --> 00:08:12,070
That simple.

114
00:08:12,250 --> 00:08:16,390
Then I have to simply return this. That's it. Just execute it.
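
Put together, the function might look something like this sketch. It uses Python's built-in `string.punctuation`; the sample `reviews` list is made up, and a plain list comprehension stands in for the data frame column.

```python
import string

def remove_punctuation(review):
    """Return the review with every character in string.punctuation removed."""
    no_punctuation = ""
    for character in review:
        if character not in string.punctuation:
            no_punctuation = no_punctuation + character
    return no_punctuation

# Hypothetical reviews standing in for the text column.
reviews = ["great taste!!!", "cost me 100 dollars..."]
cleaned = [remove_punctuation(review) for review in reviews]

print(cleaned)
```

On a pandas Series, the same function can be passed to `.apply(remove_punctuation)` and the result assigned back to the column.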

115
00:08:16,420 --> 00:08:23,800
Now what we have to do: we have to apply this function on my text feature. For this, I'm going to say

116
00:08:23,830 --> 00:08:29,440
final of text, which is exactly this one, dot apply.

117
00:08:29,710 --> 00:08:37,990
And I have to apply this remove_punctuation, and whatever manipulation it will do,

118
00:08:38,290 --> 00:08:41,230
I have to update my final text as well.

119
00:08:41,530 --> 00:08:44,050
So here I'm going to say final of text

120
00:08:44,440 --> 00:08:45,900
equals this one.

121
00:08:45,910 --> 00:08:47,170
So just execute it.

122
00:08:47,170 --> 00:08:49,850
All the stuff gets executed over here.

123
00:08:50,290 --> 00:08:55,930
Now you will see, if I'm again going to call, let's say, final.head() over there,

124
00:08:56,290 --> 00:08:59,570
You will see this is exactly your data.

125
00:09:00,580 --> 00:09:07,600
Now, again, I'm going to copy this final text at index 164 over here.

126
00:09:07,840 --> 00:09:11,180
And let me show you this data again in front of you.

127
00:09:11,200 --> 00:09:19,750
So this is exactly my entire data that has been modified just because of this function that I have written

128
00:09:19,750 --> 00:09:26,170
over here. Now, you still have to do better preprocessing on your data, because in this data you will

129
00:09:26,170 --> 00:09:26,740
figure out

130
00:09:26,980 --> 00:09:32,170
you have many stopwords, like "was", "as", "the", "are".

131
00:09:32,320 --> 00:09:37,030
These are exactly all the stopwords that we have to remove. So for

132
00:09:37,030 --> 00:09:38,110
this, what I'm going to do:

133
00:09:38,110 --> 00:09:39,910
I'm just going to say, very first,

134
00:09:39,910 --> 00:09:47,770
I have to import my NLP library, which is exactly NLTK, which is exactly the Natural Language

135
00:09:48,070 --> 00:09:48,720
Toolkit.

136
00:09:48,730 --> 00:09:52,390
So if you haven't installed it, you can install it using pip.

137
00:09:53,110 --> 00:10:01,840
Now I'm going to say from this nltk.corpus, I have something which is exactly my stopwords.

138
00:10:01,840 --> 00:10:06,280
So, very first, I'm just going to import all this stuff. Just execute it.

139
00:10:06,310 --> 00:10:13,570
Now what we have to do: let's say, in this text, I have to remove all my

140
00:10:13,570 --> 00:10:14,890
stopwords. So for

141
00:10:14,890 --> 00:10:16,570
this, very first,

142
00:10:16,570 --> 00:10:17,080
let's say,

143
00:10:17,080 --> 00:10:18,400
I'm just going to store it.

144
00:10:18,430 --> 00:10:20,560
I'll say this is my data.

145
00:10:20,560 --> 00:10:24,960
It's all up to you, whatever name you want to assign. And let me print it.

146
00:10:25,330 --> 00:10:25,680
Yeah.

147
00:10:26,020 --> 00:10:33,520
So now I'm just going to say for word in data, I have to access each and every word.

148
00:10:33,910 --> 00:10:40,210
Then I'm just going to say, very first, I have to convert this into a list.

149
00:10:40,540 --> 00:10:48,730
So I'm going to say data.split, and I have to split it on the basis of the space separator. Once all these

150
00:10:48,730 --> 00:10:56,950
things happen, then if I'm going to fetch this word, then on this word I'm going to put a condition

151
00:10:56,950 --> 00:10:59,720
that if this particular word.

152
00:10:59,740 --> 00:11:00,970
So if word

153
00:11:01,260 --> 00:11:06,640
not in my stopwords, then only I'm going to consider this word.

154
00:11:06,750 --> 00:11:09,020
That's what my logic is here.

155
00:11:09,030 --> 00:11:17,130
I'm going to say word not in set of stopwords, because you need your unique words. Here

156
00:11:17,130 --> 00:11:21,990
I would say set of stopwords.words, because...

157
00:11:22,170 --> 00:11:29,400
Yeah, here .words, because you need the stopwords of the English language. Here

158
00:11:29,400 --> 00:11:33,110
I'm going to say I need the stopwords of the English language.

159
00:11:33,360 --> 00:11:36,630
That's what these blocks of code will do for us.

160
00:11:37,080 --> 00:11:43,010
Then whatever word it returns to me, I'm just going to keep it there.

161
00:11:43,080 --> 00:11:47,760
So this is exactly my code for the list comprehension.

162
00:11:47,760 --> 00:11:51,690
So if I'm going to execute it, you will see you have all your words.

163
00:11:51,790 --> 00:11:54,530
You don't have any stopwords in this list.
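
As a sketch of that list comprehension: the small `stop_words` set below is a hand-written stand-in (an assumption, so the example runs without any download), whereas the session builds the set from NLTK's English stopword list via `stopwords.words("english")`. The sample sentence and the name `kept_words` are also made up.

```python
# A tiny stand-in stopword set (assumption); NLTK's English list is far larger.
stop_words = {"was", "as", "the", "are", "is", "a", "this"}

# Hypothetical review text, already lowercased and stripped of punctuation.
data = "this was the best coffee as far as i can tell"

# Split on spaces and keep only the words that are not stopwords.
kept_words = [word for word in data.split(" ") if word not in stop_words]

print(kept_words)
```

Using a set rather than a list keeps each `not in` check fast, which matters once the same check runs over every word of every review.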

164
00:11:54,750 --> 00:11:57,930
Now, what you have to do: you have to concatenate it.

165
00:11:58,110 --> 00:12:02,970
So let's say I'm going to say all these words are in my arr list.

166
00:12:03,090 --> 00:12:08,850
Now, what I'm going to do: let's say, very first, I would define some blank string, let's say str,

167
00:12:09,180 --> 00:12:12,400
and then I'm going to iterate on this arr list.

168
00:12:12,400 --> 00:12:20,460
So I'm going to say for wd, or word, in arr. Whatever wd I'm going to fetch, or you can say whatever

169
00:12:20,460 --> 00:12:25,560
element I'm going to fetch from this list, I have to concatenate into this string. For this,

170
00:12:25,560 --> 00:12:33,380
I'm going to say str equals str plus wd, and after it I have to append a space as well.

171
00:12:33,750 --> 00:12:41,100
So here I'm going to say str equals str plus a space. That's all up to you.

172
00:12:41,460 --> 00:12:46,200
Now what I'm going to do: I'm just going to print this str as well.

173
00:12:46,320 --> 00:12:47,550
Just execute it.

174
00:12:47,940 --> 00:12:52,230
This is exactly that word that you have over here.

175
00:12:52,230 --> 00:12:57,750
But here you have some stopwords, whereas here you don't have any stopwords.

176
00:12:58,110 --> 00:13:06,090
So you have to apply a similar approach with respect to each and every text.

177
00:13:06,480 --> 00:13:12,090
So what I'm going to do now: I'm just going to define a function over here. Let's say this function's

178
00:13:12,090 --> 00:13:18,540
name is remove_stopwords.

179
00:13:18,540 --> 00:13:26,130
And here, whatever review it receives, it will remove the stopwords from that particular review.

180
00:13:26,400 --> 00:13:27,780
That's what this function will do.

181
00:13:28,140 --> 00:13:28,740
Simple.

182
00:13:29,010 --> 00:13:32,030
So I'm just going to copy all this stuff.

183
00:13:32,150 --> 00:13:40,620
Just going to paste it over here. Then, rather than using these blocks of code, you guys can also use join, which is

184
00:13:40,620 --> 00:13:46,860
what we have used already when we performed sentiment analysis with respect to our data. Here

185
00:13:46,860 --> 00:13:55,230
I have to say I have to simply join. You assign proper small brackets over here. Then simply I have to

186
00:13:55,230 --> 00:13:58,320
return it. That's it. Then just execute it.
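
A rough sketch of such a function, using join instead of the manual concatenation loop. The name `STOP_WORDS` and its contents are a hand-written stand-in (an assumption); with NLTK you would build the set once, outside the function, from `stopwords.words("english")`.

```python
# Stand-in stopword set (assumption); build it once, outside the function.
STOP_WORDS = {"was", "as", "the", "are", "is", "a", "this"}

def remove_stopwords(review):
    """Return the review with stopwords dropped and the rest joined by spaces."""
    kept = [word for word in review.split() if word not in STOP_WORDS]
    return " ".join(kept)

print(remove_stopwords("this was the best coffee"))
```

Building the set once avoids recreating it for every review, which is where most of the waiting time goes when the function runs over thousands of rows.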

187
00:13:58,320 --> 00:14:00,380
Now, what do you have to do?

188
00:14:00,540 --> 00:14:08,150
You have to apply this function, which is exactly my remove_stopwords, to my text column.

189
00:14:08,160 --> 00:14:16,440
So for this I'm going to say final of text; very first I have to access this text, dot apply. What do you

190
00:14:16,440 --> 00:14:17,040
have to apply?

191
00:14:17,040 --> 00:14:18,780
You have to apply this function. Simple.

192
00:14:19,200 --> 00:14:24,900
Once all your stuff gets applied, then you have to update this text as well.

193
00:14:25,290 --> 00:14:26,610
Just execute it.

194
00:14:26,850 --> 00:14:34,250
It will take a couple of seconds to execute, because internally all these blocks of code, this join,

195
00:14:34,260 --> 00:14:35,070
this list

196
00:14:35,070 --> 00:14:39,200
comprehension code, will run with respect to your two thousand rows.

197
00:14:39,210 --> 00:14:45,870
So it will definitely take somewhere around one minute, depending upon your system's specification and what type

198
00:14:45,870 --> 00:14:49,950
of data you have, whether it is bulky or not.

199
00:14:49,950 --> 00:14:53,130
Now you will see all this stuff gets executed.

200
00:14:53,610 --> 00:14:54,810
Let me show you a thing.

201
00:14:54,810 --> 00:15:00,600
And on this, let's say, if I'm going to access this text at, let's say, any random index that you

202
00:15:00,600 --> 00:15:04,640
want to assign over here, let's say 45. Just execute it.

203
00:15:04,680 --> 00:15:09,950
Now you will see you don't have any stopwords in your data.

204
00:15:10,200 --> 00:15:15,320
So, up to some extent, your data is ready.

205
00:15:15,330 --> 00:15:18,270
I'm not saying your data is completely ready.

206
00:15:18,660 --> 00:15:24,990
So it means you still have to do lots of analysis, lots of preprocessing on your data to come up with

207
00:15:24,990 --> 00:15:26,070
this conclusion.

208
00:15:26,310 --> 00:15:28,800
what exactly the behavior of customers is.

209
00:15:28,860 --> 00:15:31,260
So that's all about this session. In the upcoming session,

210
00:15:31,260 --> 00:15:35,010
we are going to continue doing lots of preprocessing on our data.

211
00:15:35,310 --> 00:15:36,250
That's all about it.

212
00:15:36,290 --> 00:15:37,990
Hope you love the session very much.

213
00:15:38,550 --> 00:15:39,210
Thank you.

214
00:15:39,330 --> 00:15:40,360
Have a nice day.

215
00:15:40,680 --> 00:15:41,550
Keep learning.

216
00:15:41,550 --> 00:15:42,420
Keep growing.

217
00:15:42,750 --> 00:15:43,560
Keep practicing.