1 00:00:00,330 --> 00:00:04,650 Hello all, so welcome to this data analytics project for finance. 2 00:00:05,010 --> 00:00:12,390 So in this series, basically, I'm going to consider bank personal loan data, and here I'm going 3 00:00:12,390 --> 00:00:19,830 to perform some kind of analysis so that I can fetch some insights and some valuable information from 4 00:00:19,830 --> 00:00:21,750 this huge chunk of data. 5 00:00:22,020 --> 00:00:28,360 So this exactly is my data set on which I'm going to perform some sort of analysis over here. 6 00:00:28,680 --> 00:00:35,130 So first of all, you have to go through the various basic steps of analysis. 7 00:00:35,130 --> 00:00:41,310 Very first, you have to read your data, then you have to go through the preprocessing stage 8 00:00:41,310 --> 00:00:46,560 of the data, where you are going to deal with the missing values and various other features as well. 9 00:00:46,830 --> 00:00:52,350 So I'm just going to open my notebook, and this is exactly my very first step. 10 00:00:52,350 --> 00:00:59,580 I have to import some of my libraries for manipulation, for numerical computation, for visualizations, 11 00:00:59,580 --> 00:01:02,710 as well as for interactive results. 12 00:01:03,060 --> 00:01:08,760 So what I'm going to do is I'm just going to import all my libraries over here, so if I have 13 00:01:08,760 --> 00:01:15,550 to import my pandas, I'm going to say import pandas as pd, and pd is basically the alias. 14 00:01:15,870 --> 00:01:18,300 So the second one is basically my NumPy. 15 00:01:19,050 --> 00:01:24,000 So I'm going to import my numpy and create an alias as np. 16 00:01:24,270 --> 00:01:34,620 And the third one is basically my basic library for visualization purposes, which is my matplotlib, and so on. 17 00:01:34,620 --> 00:01:40,320 For attractive graphs, you can import your seaborn, and for 18 00:01:40,320 --> 00:01:44,710 even interactive visualizations, you can import your plotly as well. 
19 00:01:45,030 --> 00:01:47,790 So I'm just going to execute it. Very first, 20 00:01:47,800 --> 00:01:50,020 I have to read my data. 21 00:01:50,370 --> 00:01:54,920 So you will see this exactly is my actual data set. 22 00:01:54,930 --> 00:01:59,970 So I'm just going to copy this entire path because my data is available somewhere over here. 23 00:02:00,600 --> 00:02:05,960 So now I have to use a function known as read_excel. 24 00:02:05,970 --> 00:02:10,600 And in this, basically, very first, I have to pass my entire path. 25 00:02:10,860 --> 00:02:16,620 So you can paste that path over here to get to your dataset. 26 00:02:17,340 --> 00:02:20,990 So this is basically my actual data. 27 00:02:21,000 --> 00:02:27,970 And now you have to pass the sheet name that you want to read. 28 00:02:28,350 --> 00:02:32,610 So if you are going to open this data, this is exactly the data. 29 00:02:33,030 --> 00:02:39,850 And in the very first sheet, we have a description, and in the second sheet we have this entire data. 30 00:02:40,110 --> 00:02:44,320 So here we have to basically pass the index of that sheet. 31 00:02:44,550 --> 00:02:52,410 So my index is basically one, because index zero goes for the description and index one goes for 32 00:02:52,440 --> 00:02:53,150 the data. 33 00:02:53,790 --> 00:02:55,640 So it will basically return me a data frame. 34 00:02:56,250 --> 00:03:03,450 So I'm going to store it in df, and let's say I'm going to call head on my data frame. 35 00:03:03,930 --> 00:03:12,270 So it will basically return me data having some number of rows and some columns, and to check what are 36 00:03:12,270 --> 00:03:20,010 the rows and columns in your data, you will see somewhere around five thousand rows and the number of columns. 37 00:03:20,020 --> 00:03:27,240 And to check whether I have any null values in it or not, you can call isnull().sum() 38 00:03:27,600 --> 00:03:32,610 to get the sum of all the null values in your data. 
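The loading and null-check steps described above can be sketched roughly as follows. This is a minimal sketch, not the presenter's exact notebook: the helper name `load_loan_data`, its `path` argument, and the tiny stand-in frame are assumptions for illustration. Only `read_excel` with sheet index 1, `head`, and `isnull().sum()` come from the walkthrough; `shape` is one common way to get the row and column counts.

```python
import pandas as pd


def load_loan_data(path):
    """Read the second sheet (index 1) of the workbook; sheet 0 holds the description.

    `path` is a placeholder for wherever the Excel file lives on your machine.
    """
    return pd.read_excel(path, sheet_name=1)


# Tiny stand-in frame with made-up values, so the checks run without the file.
df = pd.DataFrame({"Age": [25, 45, 39], "Income": [49, 34, 11]})

first_rows = df.head()           # first rows of the frame
n_rows, n_cols = df.shape        # (number of rows, number of columns)
null_counts = df.isnull().sum()  # count of missing values per column

print(first_rows)
print(n_rows, n_cols)
print(null_counts)
```

With the real workbook, `df = load_loan_data(...)` would replace the stand-in frame, using whatever path the file actually has.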
39 00:03:33,060 --> 00:03:41,550 So now you will see over here you don't have any missing values in your data, but you will see, 40 00:03:41,910 --> 00:03:43,500 yeah, ID 41 00:03:43,530 --> 00:03:48,960 and the zip code don't make sense for analysis purposes. 42 00:03:49,110 --> 00:03:56,720 So I can simply remove these features, or I can simply drop both of the features. For this, 43 00:03:56,730 --> 00:03:58,520 I'm going to say df.drop. 44 00:03:58,740 --> 00:04:06,060 And in this list I have to mention both the features: the very first one is my ID and the next 45 00:04:06,060 --> 00:04:09,410 one is basically my zip code. 46 00:04:09,660 --> 00:04:14,250 So you can either copy from here or you can type it manually. 47 00:04:14,580 --> 00:04:21,860 And now you have to set your axis parameter, because I have to remove these as entire columns. 48 00:04:22,200 --> 00:04:28,060 Then there is the inplace parameter, which is responsible for updating your data frame. 49 00:04:28,530 --> 00:04:32,370 So now you can set it as True and just execute it. 50 00:04:32,370 --> 00:04:33,600 If you are going to check 51 00:04:33,930 --> 00:04:36,970 what are the different columns right now available in my data, 52 00:04:37,290 --> 00:04:40,160 you will see I don't have any ID 53 00:04:40,170 --> 00:04:44,530 and zip code features right now available in my data. 54 00:04:44,820 --> 00:04:51,110 Let's say for advanced visuals, I'm going to import my plotly library. 55 00:04:51,120 --> 00:04:59,520 So import plotly.express, and I'm going to create an alias as px. 56 00:05:00,500 --> 00:05:03,990 So now my plotly is imported in my cell. 57 00:05:04,010 --> 00:05:11,030 So let's open the assignment, and this is exactly the assignment. The very first task is my data preparation, 58 00:05:11,330 --> 00:05:16,100 in which I have to go through the very basic steps, such as checking the missing values. 
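The column removal described earlier in this part can be sketched like this. The column spellings `"ID"` and `"ZIP Code"` and all the row values are assumptions for illustration; check `df.columns` for the exact names in the real sheet.

```python
import pandas as pd

# Made-up rows; "ID" and "ZIP Code" are assumed spellings of the two columns.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Age": [25, 45, 39],
    "ZIP Code": [91107, 90089, 94720],
    "Income": [49, 34, 11],
})

# axis=1 drops columns rather than rows; inplace=True modifies df itself
# instead of returning a new frame.
df.drop(["ID", "ZIP Code"], axis=1, inplace=True)

remaining_columns = list(df.columns)
print(remaining_columns)
```

An equivalent form that avoids `inplace` is `df = df.drop(columns=["ID", "ZIP Code"])`; which one to use is mostly a matter of taste.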
59 00:05:16,320 --> 00:05:21,570 I also removed all the features that, you know, don't make sense at all. 60 00:05:21,590 --> 00:05:29,900 And the second one is basically I have to use the five-point summary concept to get a description about 61 00:05:29,900 --> 00:05:31,110 your data. 62 00:05:31,520 --> 00:05:35,200 So what exactly is the five-point summary? 63 00:05:35,210 --> 00:05:42,620 The very first one in the five-point summary is my minimum value, which stands for the zeroth percentile 64 00:05:42,750 --> 00:05:43,140 of the data. 65 00:05:43,340 --> 00:05:50,270 And the second one is basically my 25th percentile, and the third one is my 50th percentile of the data, which 66 00:05:50,270 --> 00:05:51,960 is termed as the median. 67 00:05:52,280 --> 00:05:57,410 The fourth one is basically my seventy-fifth percentile, which is termed as Q3. 68 00:05:57,710 --> 00:06:05,600 And the last one is of course my maximum value, or you can say the hundredth percentile of the data, so you 69 00:06:05,600 --> 00:06:12,260 can achieve this whole concept by just calling a box plot on your data. 70 00:06:12,500 --> 00:06:19,270 So for this, I'm going to use plotly in such scenarios for interactive results. 71 00:06:19,280 --> 00:06:27,410 So from plotly, you have to call a function known as box, which gives me a beautiful box plot. 72 00:06:27,590 --> 00:06:33,110 So if you check its parameters, the very first parameter is the data frame, and then you have to 73 00:06:33,110 --> 00:06:39,060 pass your data on the X and Y axis depending upon what exactly you want. 74 00:06:39,440 --> 00:06:41,370 So the very first one is my data. 75 00:06:41,720 --> 00:06:49,580 So my data frame is basically df, and then on the X axis I have to pass various data, or you can pass it 76 00:06:49,580 --> 00:06:51,330 on the Y axis as well. 
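The five-point summary defined above can also be computed directly, without a plot. The following is a small sketch; the helper name `five_point_summary` and the sample values are made up for illustration. It uses pandas' default linear interpolation for quantiles, so the quartiles may differ slightly from what a box plot shows on hover if the plotting library uses another quartile method.

```python
import pandas as pd


def five_point_summary(series):
    """Return min, Q1, median, Q3 and max of a numeric pandas Series."""
    # Percentiles 0, 25, 50, 75 and 100, in that order.
    minimum, q1, median, q3, maximum = series.quantile(
        [0.0, 0.25, 0.5, 0.75, 1.0]
    ).tolist()
    return {"min": minimum, "q1": q1, "median": median, "q3": q3, "max": maximum}


# Made-up sample values.
values = pd.Series([1, 2, 3, 4, 5])
summary = five_point_summary(values)
print(summary)
```

For a whole frame, `df.describe()` reports the same five numbers per column, along with the count, mean, and standard deviation.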
77 00:06:51,710 --> 00:07:00,410 So let's say I am going to set my y-axis data, so I'm going to use the five-point summary for, let's say, 78 00:07:00,410 --> 00:07:07,370 all these features; just copy from here and just, let's say, paste it here. 79 00:07:07,820 --> 00:07:14,480 So these are exactly my, let's say I'm going to remove this CCAvg, which is responsible for credit 80 00:07:14,480 --> 00:07:15,620 card average. 81 00:07:16,400 --> 00:07:21,640 And now I have five features over here and I'm going to visualize this. 82 00:07:21,860 --> 00:07:23,830 So let's say this is my figure. 83 00:07:23,930 --> 00:07:32,180 I'm going to store it in fig, and I have to just call show on this figure to visualize my beautiful 84 00:07:32,210 --> 00:07:34,270 and interactive box plot. 85 00:07:34,700 --> 00:07:37,490 So it will take a couple of seconds to execute. 86 00:07:37,520 --> 00:07:40,330 So this is exactly my beautiful box plot. 87 00:07:40,660 --> 00:07:47,000 So if you are going to hover your mouse, now you will see for income, I have minimum value eight, 88 00:07:47,030 --> 00:07:54,410 which is this one, and Q1, which is basically my twenty-fifth percentile of the data, is thirty-nine, and 89 00:07:54,410 --> 00:08:00,690 the median is sixty-four, which is basically my 50th percentile of the data, and Q3, which is my seventy-fifth 90 00:08:00,710 --> 00:08:05,140 percentile, and it stands at ninety-eight. 91 00:08:05,450 --> 00:08:11,590 And this is exactly my maximum value, which is two twenty-four. 92 00:08:11,780 --> 00:08:20,000 So these all are my five-point summary for my data; similarly for age, similarly for experience, similarly 93 00:08:20,000 --> 00:08:24,800 for family, and similarly in the case of education as well. 94 00:08:25,190 --> 00:08:32,120 So that's a brief idea of how you can visualize your visuals, how you can draw conclusions from your visuals, and what 95 00:08:32,120 --> 00:08:36,690 type of visuals you guys can basically perform on your data. 
96 00:08:36,710 --> 00:08:44,240 So if you have to conclude from this visual, you can see this five-point summary suggests that experience 97 00:08:44,240 --> 00:08:46,000 has some negative values. 98 00:08:46,280 --> 00:08:51,470 So we have to fix this because experience can't have negative values. 99 00:08:51,830 --> 00:08:58,570 So we can see the minimum, max, mean, and the standard deviation for all the other features as well. 100 00:08:58,910 --> 00:09:06,110 And income, you will see income has too much noise and it is slightly skewed right. 101 00:09:06,410 --> 00:09:11,390 It means it has high positive outliers in income. 102 00:09:11,750 --> 00:09:19,340 And if you are going to conclude about age and experience, you will see age and experience are 103 00:09:19,360 --> 00:09:21,860 more or less equally distributed. 104 00:09:21,920 --> 00:09:25,130 So that's the type of conclusion you can draw, or 105 00:09:25,130 --> 00:09:28,310 you can infer, from this box plot. 106 00:09:28,320 --> 00:09:31,400 So that's all for this discussion. In the upcoming session, 107 00:09:31,640 --> 00:09:37,850 we are going to deal with some of the analysis as well as this experience, because it contains right 108 00:09:37,850 --> 00:09:39,410 now some negative values. 109 00:09:39,800 --> 00:09:41,610 So hopefully you will love this session. 110 00:09:41,630 --> 00:09:42,280 Thank you. 111 00:09:42,290 --> 00:09:43,150 Have a nice day. 112 00:09:43,370 --> 00:09:46,700 Keep learning, keep going and keep motivating.
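The two conclusions drawn from the box plot (negative experience values and right-skewed income) can be cross-checked numerically. This is a small sketch on made-up rows; the column names `"Experience"` and `"Income"` are assumed, and it does not show how the session's follow-up actually fixes the negative values.

```python
import pandas as pd

# Made-up rows: two negative experience values and one large income outlier.
df = pd.DataFrame({
    "Experience": [-1, 5, 10, -2, 20],
    "Income": [10, 20, 30, 40, 200],
})

# Count the rows where experience is below zero.
negative_experience = int((df["Experience"] < 0).sum())

# Sample skewness: a positive value indicates a longer right tail.
income_skew = float(df["Income"].skew())

print(negative_experience)
print(income_skew)
```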