1 00:00:01,530 --> 00:00:03,950 This session is about variable transformation. 2 00:00:04,800 --> 00:00:10,190 We are trying to derive maximum value out of the data we have got. 3 00:00:11,410 --> 00:00:12,910 In variable transformation, 4 00:00:13,210 --> 00:00:21,130 you are not adding any data, right; you are only trying to make the data more useful, and different techniques 5 00:00:21,170 --> 00:00:23,890 are available for variable transformation. 6 00:00:25,250 --> 00:00:32,060 There is transformation of continuous variables, and there is transformation of categorical variables, right; both 7 00:00:32,060 --> 00:00:39,150 we will see in this session. In the case of a continuous variable, which is numeric, right, 8 00:00:39,170 --> 00:00:48,710 you can transform the data by converting it from the numeric value to its logarithm, square root, or cube root. 9 00:00:49,580 --> 00:00:56,720 Right, through this we are essentially changing the nature of the relationship or the nature of the distribution 10 00:00:56,720 --> 00:00:57,470 of the dataset. 11 00:00:58,280 --> 00:01:00,710 That's what we are doing through this exercise. 12 00:01:01,400 --> 00:01:11,780 Binning is transforming numeric data into a category, right; we already saw the examples of binning and logarithmic transformation 13 00:01:12,590 --> 00:01:14,720 in treating outlier data. 14 00:01:15,380 --> 00:01:16,870 So that's something we saw already. 15 00:01:17,930 --> 00:01:26,030 In the case of categorical data, the transformation to numeric values happens through the process of 16 00:01:26,030 --> 00:01:27,790 what is known as encoding, OK? 17 00:01:28,220 --> 00:01:35,210 In fact, the transformation is more needed for categorical variables than for numeric variables. 18 00:01:35,870 --> 00:01:42,450 This is because categorical variables as such cannot be used directly in most machine learning models. 19 00:01:42,470 --> 00:01:49,910 Hence, it becomes a necessity for us to convert the categorical variables into numeric variables. 20 00:01:50,870 --> 00:01:51,170 Right.
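The continuous-variable transformations mentioned here (logarithm, square root, cube root, and binning) might look roughly like the following. This is a minimal sketch using only the Python standard library; the `charges` values, the bin `edges`, and the `labels` are hypothetical and are not taken from the session.

```python
import math
from bisect import bisect_right

# Hypothetical, right-skewed numeric values (for example, insurance charges).
charges = [1000.0, 2700.0, 8000.0, 27000.0, 64000.0]

# Each transform compresses large values more than small ones,
# which changes the shape of the distribution.
log_charges = [math.log(x) for x in charges]        # natural log; requires x > 0
sqrt_charges = [math.sqrt(x) for x in charges]      # requires x >= 0
cbrt_charges = [x ** (1.0 / 3.0) for x in charges]  # cube root; valid here because x >= 0

# Binning: map each numeric value to a category using hypothetical cut points.
edges = [5000.0, 30000.0]           # assumed boundaries between bins
labels = ["low", "medium", "high"]  # one label per bin
binned = [labels[bisect_right(edges, x)] for x in charges]
```

None of these transforms adds information; they only re-express the same values, which matches the point made in the session that no data is being added.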
21 00:01:51,980 --> 00:02:01,460 So if you see the insurance example, the ones that are shown in green, right, they are non-numeric, 22 00:02:01,880 --> 00:02:08,840 categorical; the ones that are shown in red are continuous numeric variables. 23 00:02:09,380 --> 00:02:16,970 So all the ones that are shown in red, the numeric variables, I can transform them if needed 24 00:02:16,970 --> 00:02:20,650 into logarithmic, square root, or cube root. 25 00:02:21,170 --> 00:02:21,530 Right. 26 00:02:21,770 --> 00:02:30,440 Whereas the ones in green, the non-numeric ones, I have to mandatorily convert them, transform them into numeric 27 00:02:30,440 --> 00:02:35,690 data; only then can I use them for machine learning models. 28 00:02:36,200 --> 00:02:36,550 Right. 29 00:02:36,860 --> 00:02:41,200 So in the case of non-numeric or categorical data, it's more of a necessity. 30 00:02:42,470 --> 00:02:42,820 Right. 31 00:02:44,360 --> 00:02:48,260 So how do you convert? Python comes with a preloaded library. 32 00:02:49,040 --> 00:02:53,880 As you can see in the example here, I have sex, smoker, region, right? 33 00:02:53,900 --> 00:02:58,570 These are categorical variables. Using the encoding process, 34 00:02:58,690 --> 00:03:06,170 I import preprocessing and then I use preprocessing.LabelEncoder, and then I transform, and you can see that 35 00:03:06,380 --> 00:03:15,710 the categorical data is now converted into numeric data, and this data can now be used for generating 36 00:03:15,710 --> 00:03:16,400 machine learning 37 00:03:16,400 --> 00:03:19,610 models. Very simple, right? 38 00:03:20,120 --> 00:03:23,420 So variable transformation is a very important process, 39 00:03:24,320 --> 00:03:31,730 more important in the case of categorical variables, because I cannot use categorical variables directly 40 00:03:31,730 --> 00:03:37,700 in machine learning models; they must first be converted into numeric values.
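The encoding step described here could be sketched as follows, assuming the library being referred to is scikit-learn's `preprocessing` module (the session does not name it explicitly). The rows in `data` are hypothetical stand-ins for the sex, smoker, and region columns of the insurance example.

```python
from sklearn import preprocessing

# Hypothetical rows standing in for the categorical columns of the insurance data.
data = {
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
}

encoded = {}
for column, values in data.items():
    # Fit a separate encoder per column so each column gets its own 0..n-1 codes.
    encoder = preprocessing.LabelEncoder()
    encoded[column] = encoder.fit_transform(values).tolist()

print(encoded)
```

`LabelEncoder` assigns integer codes in the sorted order of the class names, so the numbers carry no meaningful ranking. scikit-learn documents `LabelEncoder` as intended for target labels and provides `OrdinalEncoder` and `OneHotEncoder` for input features, so this sketch simply follows the approach described in the session.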
41 00:03:38,790 --> 00:03:46,180 OK, so that completes the session on variable transformation, right? 42 00:03:46,290 --> 00:03:53,100 In fact, with that session we have completed our exploratory data analysis online program. 43 00:03:53,550 --> 00:03:58,660 So just to summarize, we started with understanding the data, a basic understanding of the data. 44 00:03:59,340 --> 00:04:07,680 Then we did univariate analysis, then bivariate analysis to see the relationship between two variables. 45 00:04:08,040 --> 00:04:13,240 We also saw the concept of collinearity, clearly something we don't want in our dataset. 46 00:04:13,890 --> 00:04:20,370 Then we also saw missing value treatment, how to handle missing values, how to handle outliers, 47 00:04:20,730 --> 00:04:27,980 how to transform variables; all this we saw as part of exploratory data analysis. 48 00:04:28,380 --> 00:04:28,710 Right. 49 00:04:29,070 --> 00:04:36,930 So if you really see, the last three are what I would call feature engineering, because I'm trying to get maximum 50 00:04:36,930 --> 00:04:44,460 value out of whatever data I have, and I want to make the data more useful for my purposes. 51 00:04:45,700 --> 00:04:46,040 Right. 52 00:04:46,810 --> 00:04:52,990 So that completes the session on exploratory data analysis. 53 00:04:53,830 --> 00:04:54,320 Thank you. 54 00:04:54,610 --> 00:04:56,160 And I wish you the best of luck.