1 00:00:00,180 --> 00:00:07,410 Hello, all. Before going ahead in this session, let's have a quick recap of what we have done so far 2 00:00:07,410 --> 00:00:14,400 in this project: from importing data to performing sentiment analysis with respect to positive sentences 3 00:00:14,400 --> 00:00:21,270 as well as negative sentences, doing lots of data cleaning, doing lots of preprocessing, the word 4 00:00:21,270 --> 00:00:24,150 cloud visualization and many other things. 5 00:00:24,400 --> 00:00:29,190 We have also analyzed to which type of user Amazon is going to recommend more products. 6 00:00:29,640 --> 00:00:36,600 And at the end, we have this beautiful box plot, which shows exactly the distribution of the length of the 7 00:00:36,600 --> 00:00:38,190 customer feedback. 8 00:00:38,580 --> 00:00:46,260 Then we have this count plot of the score feature, and in this particular session we have this assignment 9 00:00:46,260 --> 00:00:50,640 in which I have to analyze what exactly the behavior of the customers is. 10 00:00:51,030 --> 00:00:59,190 Because if we consider final as the data frame for this analysis, then in this final you 11 00:00:59,190 --> 00:01:00,180 have a column, 12 00:01:01,780 --> 00:01:08,230 which is exactly this Text. So using this feature, you can definitely come up 13 00:01:08,230 --> 00:01:12,290 with a conclusion about what exactly the behavior of our customers is. 14 00:01:12,610 --> 00:01:14,920 So, very first, what do we have to do next? 15 00:01:15,220 --> 00:01:16,320 Let me show you something. 16 00:01:16,330 --> 00:01:18,250 Say, on this final, 17 00:01:18,250 --> 00:01:20,650 I have to access this Text. 18 00:01:20,740 --> 00:01:27,610 And if I execute the cell, you will see here I have all this. So the very first thing 19 00:01:27,670 --> 00:01:28,590 that you have to do: 20 00:01:28,600 --> 00:01:33,520 you have to convert all these words into lowercase characters.
21 00:01:33,940 --> 00:01:40,350 So the question you guys can ask is why there is a need to convert all this stuff into lowercase. 22 00:01:40,570 --> 00:01:46,950 Listen: at some place, say, this word is in the form of, let's say, caps, or you can say capital 23 00:01:46,960 --> 00:01:50,890 letters, but, let's say, somewhere else it is in the form of small letters. 24 00:01:51,280 --> 00:01:53,430 So that definitely impacts a lot, because the two forms would be treated as different words. 25 00:01:53,440 --> 00:01:57,520 That's why you have to convert all your words into lowercase. 26 00:01:57,890 --> 00:02:04,840 So to convert this into lowercase, you just have to use this .str.lower() 27 00:02:04,870 --> 00:02:07,630 and just execute it. 28 00:02:07,760 --> 00:02:12,920 Now you will see over here all your text gets converted into lowercase. 29 00:02:13,240 --> 00:02:16,280 Now you have to store it somewhere. 30 00:02:16,280 --> 00:02:18,760 So I'm going to say I have to store it in, 31 00:02:18,760 --> 00:02:21,370 let's say, final['Text']. 32 00:02:21,850 --> 00:02:24,880 Now we just have to execute it. 33 00:02:25,210 --> 00:02:32,010 Now, after doing all these things, let's say I have to remove some extra or special characters. 34 00:02:32,410 --> 00:02:34,600 So let me show you something. 35 00:02:34,600 --> 00:02:43,210 Let's say in this final['Text'], I'm going to show you the data at index 164. 36 00:02:43,570 --> 00:02:46,830 You will see all the data over here. 37 00:02:47,020 --> 00:02:53,670 And if I use the regex concept, using the substitute function of the re module, re.sub, and here I'm 38 00:02:53,680 --> 00:03:03,520 going to say: anything except a to z and capital A to Z, that is the pattern [^a-zA-Z], I just have to replace with a 39 00:03:03,560 --> 00:03:04,050 space. 40 00:03:04,150 --> 00:03:05,930 That's what this substitute will do.
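A minimal sketch of the two steps just described, run on a single made-up review with only the standard library. On the whole column, the lowercasing step would be `final['Text'].str.lower()` in pandas (assuming, as in the lecture, a data frame named `final`); the sample string and variable names here are illustrative only.

```python
import re

# Hypothetical sample review (not from the lecture's dataset).
review = "This Product Is GREAT, I Paid 100 Dollars!"

# Step 1: lowercase, so "Great" and "great" count as the same token.
lowered = review.lower()

# Step 2: replace every character that is not a letter with a space.
# The pattern keeps A-Z as in the lecture, although after .lower() only a-z
# can occur. Digits are not letters, so the "100" is wiped out as well.
letters_only = re.sub("[^a-zA-Z]", " ", lowered)

print(lowered)
print(letters_only.split())
```

Because `[^a-zA-Z]` also matches digits, the `100` is lost; this is the drawback the lecture points out next.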
41 00:03:05,950 --> 00:03:12,130 And here I have to mention the text on which I have to perform this operation. That's it. And again, 42 00:03:12,130 --> 00:03:19,930 if I execute it now, you will see over here that in this entire text, this 100 gets 43 00:03:20,230 --> 00:03:21,120 removed. 44 00:03:21,370 --> 00:03:26,820 But as we can all see, this 100 matters a lot over here. 45 00:03:26,830 --> 00:03:32,040 It means you can't use this logic to manipulate this data. 46 00:03:32,050 --> 00:03:40,010 It means you have to use your own logic, because whenever a built-in module 47 00:03:40,100 --> 00:03:42,850 isn't going to help you, at that time 48 00:03:42,850 --> 00:03:44,800 you have to write your own logic. 49 00:03:45,190 --> 00:03:50,660 That's where your programming and problem-solving skills are going to help you a lot. 50 00:03:50,680 --> 00:03:55,770 So, very first, I have to define my own punctuation. 51 00:03:55,780 --> 00:03:59,380 So here I'm going to call it my own punctuations, or special characters. 52 00:03:59,390 --> 00:04:02,140 Whatever name you want to assign, it's all up to you. 53 00:04:02,330 --> 00:04:05,320 Let's say my own punctuation marks are nothing 54 00:04:05,320 --> 00:04:07,150 but, let's say, the very first one, 55 00:04:07,150 --> 00:04:13,870 this one; let's say I have this bracket, and after it, let's say I have this one. 56 00:04:13,870 --> 00:04:16,450 I have some big brackets. 57 00:04:16,450 --> 00:04:17,110 I have some 58 00:04:17,110 --> 00:04:20,920 curly braces, I have this one. 59 00:04:20,920 --> 00:04:24,400 Let's add this one, or whatever you want to assign. 60 00:04:24,400 --> 00:04:26,040 It's all up to you. 61 00:04:26,230 --> 00:04:31,570 Let's say these are all the punctuation marks that I'm going to define 62 00:04:31,570 --> 00:04:32,500 over here:
63 00:04:32,500 --> 00:04:38,850 question mark, let's say this slash, whatever you want; it's all up to you. 64 00:04:38,870 --> 00:04:47,950 And other special characters like the exclamation mark, the at sign, the hash, the dollar sign, the percentage sign 65 00:04:48,370 --> 00:04:56,110 and, let's say, the star, and all these kinds of punctuation that you want to assign over 66 00:04:56,110 --> 00:04:56,540 here. 67 00:04:56,590 --> 00:05:04,510 Now let's say I have to apply this to my data, so now I have to iterate 68 00:05:04,510 --> 00:05:05,140 over this data. 69 00:05:05,180 --> 00:05:12,850 So I'm going to say for character in data, or for i in data; let me make it more user-friendly: for 70 00:05:12,850 --> 00:05:20,230 character in data. And for whatever character you have, you have to put a condition: if character not 71 00:05:20,230 --> 00:05:24,200 in my punctuations that I defined earlier. 72 00:05:24,580 --> 00:05:31,540 So only if it is not in the punctuations am I going to 73 00:05:31,540 --> 00:05:33,130 consider it in my string. 74 00:05:33,130 --> 00:05:35,940 So you have to define that particular string. 75 00:05:36,400 --> 00:05:40,480 So I'm just going to define this string. 76 00:05:40,480 --> 00:05:45,070 Let's say this string is no_punctuation. 77 00:05:45,110 --> 00:05:47,460 It is just an empty string. 78 00:05:47,740 --> 00:05:51,130 So here I'm going to say no_punctuation 79 00:05:51,130 --> 00:05:56,440 equals no_punctuation concatenated with the character. 80 00:05:56,440 --> 00:05:59,230 You have to concatenate each and every 81 00:05:59,230 --> 00:05:59,660 character. 82 00:06:00,520 --> 00:06:01,180 That's the whole 83 00:06:01,300 --> 00:06:04,460 idea behind my concatenation.
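The hand-rolled filter just described can be sketched as follows. The punctuation set and the sample review are hypothetical; choose whatever characters suit your data. Digits are deliberately left out of the set, so numbers such as `100` survive.

```python
# A hand-picked set of punctuation/special characters (hypothetical; pick your own).
# Digits are not included, so numbers are kept.
my_punctuations = "()[]{};:'\",.<>/?!@#$%^&*_~"

data = "wow!!! the (best) speaker, 100% worth it..."  # hypothetical sample review

# Keep a character only if it is NOT in the hand-picked set,
# concatenating the survivors into a new string.
no_punctuation = ""
for character in data:
    if character not in my_punctuations:
        no_punctuation = no_punctuation + character

print(no_punctuation)
```

Repeated `+` on strings works, but `"".join(ch for ch in data if ch not in my_punctuations)` builds the same result in one pass and is the more idiomatic form.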
84 00:06:04,780 --> 00:06:09,030 Now we have to print this no_punctuation string. 85 00:06:09,280 --> 00:06:14,640 So for this, I'm just going to print my no_punctuation, and I just have to execute the 86 00:06:14,650 --> 00:06:15,060 cell. 87 00:06:15,100 --> 00:06:22,920 Now you will find that this gets executed, and this 100 is still in your data. 88 00:06:22,930 --> 00:06:28,030 And if you compare both outputs, they are almost the same. 89 00:06:28,030 --> 00:06:30,270 But there is one minor difference: 90 00:06:30,640 --> 00:06:36,820 this approach does not remove the numerical data, because I haven't defined digits in these punctuations. 91 00:06:37,030 --> 00:06:40,300 Yes, I don't want to eliminate this numerical data. 92 00:06:40,780 --> 00:06:47,350 But if I use this substitute function of the re module with that pattern, it would eliminate my numerical data 93 00:06:47,350 --> 00:06:47,750 as well. 94 00:06:48,070 --> 00:06:51,000 So that's the drawback of the substitute approach here. 95 00:06:51,010 --> 00:06:59,260 In a similar way, you can apply this to your Text feature, to each and every text entry, 96 00:06:59,590 --> 00:07:01,930 by using these blocks of code. 97 00:07:02,260 --> 00:07:07,780 So, very first, let me import a module, which is exactly 98 00:07:07,780 --> 00:07:11,110 string. And in this string module, 99 00:07:11,380 --> 00:07:14,140 I have something very useful for you. 100 00:07:14,170 --> 00:07:16,960 This is exactly string.punctuation. 101 00:07:16,990 --> 00:07:17,700 Exactly. 102 00:07:17,740 --> 00:07:21,490 It gives me all the ASCII punctuation characters defined by Python. 103 00:07:21,760 --> 00:07:23,990 So first I have to store it somewhere. 104 00:07:24,460 --> 00:07:27,910 I'm just going to say that these are my punctuations. 105 00:07:27,940 --> 00:07:33,560 Now we basically have to copy all this stuff.
106 00:07:34,180 --> 00:07:36,010 Let me just paste it over here. 107 00:07:36,310 --> 00:07:39,680 Here I have to remove this one. 108 00:07:40,030 --> 00:07:47,440 So this time, instead of just writing for character in, let me create a function over here. 109 00:07:47,650 --> 00:07:50,590 So I'm just going to provide the right indentation now. 110 00:07:50,860 --> 00:07:59,740 And this time I define a function named remove_punctuation, and 111 00:08:00,130 --> 00:08:08,450 whatever value I pass to it, it will just remove all the punctuation from that review. 112 00:08:08,470 --> 00:08:11,140 That's what this function will do. 113 00:08:11,260 --> 00:08:12,070 That simple. 114 00:08:12,250 --> 00:08:16,390 Then I simply have to return this. That's it; just execute it. 115 00:08:16,420 --> 00:08:23,800 Now we have to apply this function to my Text feature. For this, I'm going to say 116 00:08:23,830 --> 00:08:29,440 final['Text'], which is exactly this one, .apply, 117 00:08:29,710 --> 00:08:37,990 and I have to apply this remove_punctuation. And whatever elimination it does, 118 00:08:38,290 --> 00:08:41,230 I have to update my final['Text'] with it as well. 119 00:08:41,530 --> 00:08:44,050 So here I'm going to say final['Text'] 120 00:08:44,440 --> 00:08:45,900 equals this one. 121 00:08:45,910 --> 00:08:47,170 So just execute it. 122 00:08:47,170 --> 00:08:49,850 All the stuff gets executed over here. 123 00:08:50,290 --> 00:08:55,930 Now, if I again go and run, let's say, final.head() over there, 124 00:08:56,290 --> 00:08:59,570 you will see this is exactly your data. 125 00:09:00,580 --> 00:09:07,600 Now let's again copy this final['Text'][164] over here. 126 00:09:07,840 --> 00:09:11,180 And let me show you this data again.
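A sketch of `remove_punctuation` built on `string.punctuation`, written with `str.join` rather than the character-by-character concatenation loop (the two produce the same string). The reviews are hypothetical stand-ins for the Text column; with pandas, the column update would be `final['Text'] = final['Text'].apply(remove_punctuation)`. Two side effects are worth checking on real data: `string.punctuation` contains the apostrophe and the slash, so `didn't` becomes `didnt` and `5/5` becomes `55`.

```python
import string

# string.punctuation is the standard library's ASCII punctuation:
# !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~   (digits are not included)
punctuations = string.punctuation


def remove_punctuation(review):
    """Return the review with every string.punctuation character dropped."""
    return "".join(ch for ch in review if ch not in punctuations)


# Hypothetical reviews standing in for the Text column.
reviews = ["great product!!! 5/5", "didn't work... (refund requested)"]
cleaned = [remove_punctuation(r) for r in reviews]
print(cleaned)
```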
127 00:09:11,200 --> 00:09:19,750 So this is exactly my entire data that has been modified just because of this function that I have written 128 00:09:19,750 --> 00:09:26,170 over here. Now, you still have to do better preprocessing on your data, because in this data you will 129 00:09:26,170 --> 00:09:26,740 find out 130 00:09:26,980 --> 00:09:32,170 you have many stopwords like "was", "as", "they", "are". 131 00:09:32,320 --> 00:09:37,030 These are exactly the stopwords that we have to remove. So for 132 00:09:37,030 --> 00:09:38,110 this, what I'm going to do: 133 00:09:38,110 --> 00:09:39,910 I'm just going to say, very first, 134 00:09:39,910 --> 00:09:47,770 I have to import my NLTK library, which is exactly nltk, the Natural Language 135 00:09:48,070 --> 00:09:48,720 Toolkit. 136 00:09:48,730 --> 00:09:52,390 So if you haven't installed it, you can install it using pip. 137 00:09:53,110 --> 00:10:01,840 Now I'm going to say: from this nltk.corpus, I have something which is exactly stopwords. 138 00:10:01,840 --> 00:10:06,280 So very first I'm just going to import all this stuff; just execute it. 139 00:10:06,310 --> 00:10:13,570 Now let's say in this text I have to remove all these 140 00:10:13,570 --> 00:10:14,890 stopwords. So for 141 00:10:14,890 --> 00:10:16,570 this, very first, 142 00:10:16,570 --> 00:10:17,080 that's it, 143 00:10:17,080 --> 00:10:18,400 I'm just going to store it. 144 00:10:18,430 --> 00:10:20,560 I'll say this is my data. 145 00:10:20,560 --> 00:10:24,960 It's all up to you, whatever name you want to assign, and let me print it. 146 00:10:25,330 --> 00:10:25,680 Yeah. 147 00:10:26,020 --> 00:10:33,520 So now I'm just going to say for word in data, I have to access each and every word. 148 00:10:33,910 --> 00:10:40,210 But very first, I have to convert this into a list.
149 00:10:40,540 --> 00:10:48,730 So I'm going to say data.split(), and I have to split it on the basis of the separator. Once all these 150 00:10:48,730 --> 00:10:56,950 things happen, if I fetch this word, then on this word I'm going to put a condition 151 00:10:56,950 --> 00:10:59,720 that if this particular word, 152 00:10:59,740 --> 00:11:00,970 so if word 153 00:11:01,260 --> 00:11:06,640 not in my stopwords, then only am I going to consider this word. 154 00:11:06,750 --> 00:11:09,020 That's what my logic is here. 155 00:11:09,030 --> 00:11:17,130 I'm going to say word not in set of stopwords, because a set gives you unique words and fast lookups here. 156 00:11:17,130 --> 00:11:21,990 I would say set of stopwords.words, because, 157 00:11:22,170 --> 00:11:29,400 yeah, here .words, because you need the stopwords of the English language here. 158 00:11:29,400 --> 00:11:33,110 I'm going to say I need stopwords of the English language: 'english'. 159 00:11:33,360 --> 00:11:36,630 That's what these blocks of code will do for us. 160 00:11:37,080 --> 00:11:43,010 Then whatever words it returns, I'm just going to put in a list. 161 00:11:43,080 --> 00:11:47,760 So this is exactly my list comprehension code. 162 00:11:47,760 --> 00:11:51,690 So if I execute it, you will see you have all your words. 163 00:11:51,790 --> 00:11:54,530 You don't have any stopwords in this list. 164 00:11:54,750 --> 00:11:57,930 Now you have to concatenate them. 165 00:11:58,110 --> 00:12:02,970 So let's say all these words are in my list. 166 00:12:03,090 --> 00:12:08,850 Now, what I'm going to do: very first, I would define an empty string, let's say start, 167 00:12:09,180 --> 00:12:12,400 and then I'm going to iterate over this list. 168 00:12:12,400 --> 00:12:20,460 So I'm going to say for wd, or word, in the list, whatever; whatever wd
I'm going to fetch, or you can say whatever 169 00:12:20,460 --> 00:12:25,560 element I fetch from this list, I have to concatenate into this string. For this, 170 00:12:25,560 --> 00:12:33,380 I'm going to say start equals start plus wd, and after it I have to add a space as well. 171 00:12:33,750 --> 00:12:41,100 So here I'm going to say start equals start plus wd plus a space; that's all up to you. 172 00:12:41,460 --> 00:12:46,200 Now I'm just going to print this start as well. 173 00:12:46,320 --> 00:12:47,550 Just execute it. 174 00:12:47,940 --> 00:12:52,230 These are exactly the words that you have over here. 175 00:12:52,230 --> 00:12:57,750 But here you have some stopwords, while here you don't have any stopwords. 176 00:12:58,110 --> 00:13:06,090 So you have to apply a similar approach to each and every text. 177 00:13:06,480 --> 00:13:12,090 So now I'm just going to define a function over here. Let me name this function 178 00:13:12,090 --> 00:13:18,540 remove_stopwords. 179 00:13:18,540 --> 00:13:26,130 And whatever review it receives, it will remove the stopwords from that particular review. 180 00:13:26,400 --> 00:13:27,780 That's what this function will do. 181 00:13:28,140 --> 00:13:28,740 Simple. 182 00:13:29,010 --> 00:13:32,030 So I'm just going to copy all this stuff, 183 00:13:32,150 --> 00:13:40,620 and paste it over here. Then, instead of using these blocks of code, you guys can also use join, which 184 00:13:40,620 --> 00:13:46,860 we have already used when we performed sentiment analysis on our data. Here 185 00:13:46,860 --> 00:13:55,230 I simply have to join, and assign proper parentheses over here. Then I simply have to 186 00:13:55,230 --> 00:13:58,320 return it. That's it; then just execute it. 187 00:13:58,320 --> 00:14:00,380 Now, what do you have to do?
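A sketch of `remove_stopwords` using a list comprehension plus `" ".join`, next to the concatenation loop so the two can be compared. NLTK may not be installed everywhere, so the stopword set below is a small hypothetical stand-in; the commented lines show the NLTK calls the lecture relies on. Like the lecture's version, this assumes the text is already lowercased, since the membership check is case-sensitive.

```python
# In the lecture the list comes from NLTK (needs a one-time corpus download):
#     import nltk
#     nltk.download("stopwords")
#     from nltk.corpus import stopwords
#     STOP_WORDS = set(stopwords.words("english"))
# To keep this sketch runnable without NLTK, a tiny hypothetical stand-in is used.
STOP_WORDS = {"i", "the", "is", "was", "as", "they", "are", "a", "it", "this"}


def remove_stopwords(review, stop_words=STOP_WORDS):
    """Drop stopwords from a whitespace-split review and rejoin with single spaces."""
    words = [word for word in review.split() if word not in stop_words]
    return " ".join(words)


text = "this speaker is great and the sound was as good as they are loud"

# The concatenation loop described in the lecture keeps the same words,
# but leaves a trailing space after the last one.
start = ""
for wd in text.split():
    if wd not in STOP_WORDS:
        start = start + wd + " "

print(remove_stopwords(text))
print(repr(start))
```

The only difference between the two results is the trailing space the loop leaves behind, which `join` avoids.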
188 00:14:00,540 --> 00:14:08,150 You have to apply this function, which is exactly my remove_stopwords, to my Text column. 189 00:14:08,160 --> 00:14:16,440 So for this I'm going to say final; very first I have to access this Text, then .apply. What do you 190 00:14:16,440 --> 00:14:17,040 have to apply? 191 00:14:17,040 --> 00:14:18,780 You have to apply this function. Simple. 192 00:14:19,200 --> 00:14:24,900 Once all your stuff gets applied, you have to update this Text column as well. 193 00:14:25,290 --> 00:14:26,610 Just execute it. 194 00:14:26,850 --> 00:14:34,250 It will take a couple of seconds to execute, because internally all these blocks of code, the join 195 00:14:34,260 --> 00:14:35,070 and this list 196 00:14:35,070 --> 00:14:39,200 comprehension code, will run for each of your two thousand rows. 197 00:14:39,210 --> 00:14:45,870 So it may take up to around a minute, depending upon your system's specification and on what type 198 00:14:45,870 --> 00:14:49,950 of data you have, whether it is relatively bulky or not. 199 00:14:49,950 --> 00:14:53,130 Now you will see all this stuff gets executed. 200 00:14:53,610 --> 00:14:54,810 Let me show you something. 201 00:14:54,810 --> 00:15:00,600 On this, let's say I'm going to access this Text at, let's say, any random index that you 202 00:15:00,600 --> 00:15:04,640 want over here, let's say 45. Just execute it. 203 00:15:04,680 --> 00:15:09,950 Now you will see you don't have any stopwords in your data. 204 00:15:10,200 --> 00:15:15,320 So, to some extent, your data is ready. 205 00:15:15,330 --> 00:15:18,270 I'm not saying your data is completely ready. 206 00:15:18,660 --> 00:15:24,990 It means you still have to do lots of analysis and lots of preprocessing on your data to come up with 207 00:15:24,990 --> 00:15:26,070 a conclusion about 208 00:15:26,310 --> 00:15:28,800 what exactly the behavior of customers is.
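On why this step takes noticeable time: if `set(stopwords.words('english'))` is written inside the comprehension's condition, as the spoken code suggests, Python rebuilds that set once per word rather than once overall. The sketch below makes this visible by counting calls to a hypothetical `load_stopwords` stand-in over 2,000 made-up rows; with pandas, the column update itself would be `final['Text'] = final['Text'].apply(remove_stopwords)`.

```python
calls = 0


def load_stopwords():
    """Hypothetical stand-in for stopwords.words("english"); counts its own calls."""
    global calls
    calls += 1
    return ["i", "the", "is", "was", "are", "a", "it", "this"]


def remove_stopwords_slow(review):
    # set(...) sits inside the condition, so it is rebuilt for EVERY word.
    return " ".join(w for w in review.split() if w not in set(load_stopwords()))


STOP_WORDS = set(load_stopwords())  # built exactly once


def remove_stopwords_fast(review):
    # Only a set lookup per word; nothing is rebuilt.
    return " ".join(w for w in review.split() if w not in STOP_WORDS)


# 2,000 hypothetical rows, roughly the size mentioned in the lecture.
reviews = ["this is great", "the sound was loud"] * 1000

before = calls
fast = [remove_stopwords_fast(r) for r in reviews]
fast_builds = calls - before

before = calls
slow = [remove_stopwords_slow(r) for r in reviews]
slow_builds = calls - before

print(fast_builds, slow_builds)
```

Both versions return identical text; hoisting the set out of the function only removes the repeated rebuilding, which for these 2,000 rows amounts to 7,000 extra set constructions.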
209 00:15:28,860 --> 00:15:31,260 So that's all for this session. In the upcoming session, 210 00:15:31,260 --> 00:15:35,010 we are going to continue doing lots of preprocessing on our data. 211 00:15:35,310 --> 00:15:36,250 That's all about it. 212 00:15:36,290 --> 00:15:37,990 Hope you enjoyed the session very much. 213 00:15:38,550 --> 00:15:39,210 Thank you. 214 00:15:39,330 --> 00:15:40,360 Have a nice day. 215 00:15:40,680 --> 00:15:41,550 Keep learning. 216 00:15:41,550 --> 00:15:42,420 Keep growing. 217 00:15:42,750 --> 00:15:43,560 Keep practicing.