0 1 00:00:11,870 --> 00:00:12,680 Hello everyone. 1 2 00:00:12,680 --> 00:00:16,990 Welcome to another video of the malware analysis and reverse engineering course. 2 3 00:00:17,180 --> 00:00:24,110 Starting from this particular video we are now going to focus on the portable executable file format, which 3 4 00:00:24,110 --> 00:00:27,980 is basically the executable file format for the Windows operating system. 4 5 00:00:28,580 --> 00:00:36,350 So if you remember, in our previous videos we primarily focused on understanding the cyber kill chain, 5 6 00:00:36,590 --> 00:00:42,740 to really understand how a cybercrime operation works in the digital space. 6 7 00:00:42,740 --> 00:00:50,630 After that, we looked at some of the common delivery mechanisms by which malware or weaponized files 7 8 00:00:50,660 --> 00:00:52,820 can be delivered onto your system. 8 9 00:00:52,820 --> 00:00:59,300 We looked at things like analyzing spear phishing e-mails, looking at e-mail headers and stuff like that. 9 10 00:00:59,300 --> 00:01:06,910 After that we looked at e-mail attachments, which consisted of things like document files or PDF files. 10 11 00:01:06,920 --> 00:01:14,000 We then looked at the structure of these document files to really understand how we can analyze whether 11 12 00:01:14,000 --> 00:01:18,420 there is something suspicious or bad inside the file. 12 13 00:01:18,470 --> 00:01:26,420 Now, we all know that spear phishing is one of the most common ways of delivering malware 13 14 00:01:26,420 --> 00:01:28,160 onto your system. 14 15 00:01:28,180 --> 00:01:34,850 Usually those spear phishing e-mails would either have a document file as an attachment or 15 16 00:01:34,850 --> 00:01:42,680 contain a hyperlink which, when you click it, would redirect you to some website where a document 16 17 00:01:42,680 --> 00:01:48,710 is hosted, and from there it will try to download the document and execute it on your machine.
17 18 00:01:48,710 --> 00:01:57,440 So document files basically serve as a first-level payload or the entry point onto the system. Those document 18 19 00:01:57,440 --> 00:02:02,410 files would not be the actual malware that infects your machine. 19 20 00:02:02,540 --> 00:02:10,080 They are the first-level payload, which serves as the delivery mechanism for the second-stage payload. 20 21 00:02:10,280 --> 00:02:13,070 The second-stage payload is the actual malware. 21 22 00:02:13,610 --> 00:02:19,880 It's actually an executable file, a binary file which will run on your Windows operating system. 22 23 00:02:20,300 --> 00:02:22,650 Just to give you an example: 23 24 00:02:23,390 --> 00:02:29,330 there is a difference between how you install software and how you modify a document on a machine, 24 25 00:02:29,330 --> 00:02:29,900 right? 25 26 00:02:29,990 --> 00:02:35,870 In the case of document modification, you load the document in a separate piece of software, you edit the document, 26 27 00:02:35,900 --> 00:02:37,280 you save it and that's it. 27 28 00:02:37,430 --> 00:02:44,240 Whereas in the case of software, you actually install it on your machine because it requires certain special 28 29 00:02:44,240 --> 00:02:46,900 rights before it can run on your machine. 29 30 00:02:47,030 --> 00:02:55,020 That's the reason why documents can only serve as a delivery mechanism. A weaponized document can help 30 31 00:02:55,130 --> 00:03:02,030 deliver an executable malware onto the machine, but until the said executable lands on the machine, 31 32 00:03:02,030 --> 00:03:10,350 your machine is still not completely infected. So that's why analyzing portable executable files is the 32 33 00:03:10,350 --> 00:03:13,010 most crucial part of malware analysis. 33 34 00:03:13,050 --> 00:03:19,920 So the processes and steps that we learned in previous videos were more like a stepping 34 35 00:03:19,920 --> 00:03:25,520 stone.
I did not want to jump directly into analyzing executable files. 35 36 00:03:25,740 --> 00:03:31,540 I want you guys to really understand the flow of how an attack happens so that you can effectively 36 37 00:03:31,540 --> 00:03:32,980 respond to threats. 37 38 00:03:33,360 --> 00:03:40,410 So moving forward we are primarily going to focus on executable files, especially on Windows machines. 38 39 00:03:43,550 --> 00:03:47,650 So, the PE file format or Portable Executable file format. 39 40 00:03:47,660 --> 00:03:57,320 What exactly is a PE file? Portable executable is nothing but a data structure format for executables, 40 41 00:03:57,330 --> 00:04:01,880 object code and DLLs in Windows operating systems. 41 42 00:04:02,030 --> 00:04:04,910 It works for both 32-bit and 64-bit versions. 42 43 00:04:05,210 --> 00:04:11,330 So every executable that exists on a Windows system is a portable executable file. 43 44 00:04:11,420 --> 00:04:15,020 It has to be in that structure so that Windows can really understand: 44 45 00:04:15,050 --> 00:04:15,410 OK, 45 46 00:04:15,440 --> 00:04:18,560 this is an executable file that I should run. 46 47 00:04:18,680 --> 00:04:20,920 Let's load it into memory, 47 48 00:04:20,960 --> 00:04:27,140 link it with the different drivers or the different required DLLs, and start executing it to really 48 49 00:04:27,470 --> 00:04:30,130 see what the executable is doing on the machine. 49 50 00:04:30,350 --> 00:04:36,560 So all the software, everything that you install on your Windows machine, is nothing but a portable 50 51 00:04:36,560 --> 00:04:38,880 executable file. 51 52 00:04:39,500 --> 00:04:46,010 DLLs are also in the portable executable file format because DLLs are also a kind of executable. 52 53 00:04:46,370 --> 00:04:49,770 So DLL basically stands for dynamic link library.
53 54 00:04:49,780 --> 00:04:55,490 They are nothing but reusable code that is present in the Windows environment, which 54 55 00:04:55,490 --> 00:05:04,220 you can use if you want to create software on the Windows operating system. 55 56 00:05:04,220 --> 00:05:12,530 So a PE file consists of a number of headers and sections that tell the Windows dynamic linker 56 57 00:05:12,530 --> 00:05:14,900 how to map that file into memory. 57 58 00:05:14,930 --> 00:05:21,520 So what happens is, when you run a portable executable file, it's actually on the disk. 58 59 00:05:21,610 --> 00:05:30,800 So before it can be executed it has to be mapped into memory. So the PE file structure tells the Windows 59 60 00:05:30,800 --> 00:05:38,270 operating system how to map that file on disk into memory and how to locate all the different 60 61 00:05:38,280 --> 00:05:45,350 resources or all the different DLLs that the portable executable file needs. 61 62 00:05:45,350 --> 00:05:50,900 It's a data structure that gives the Windows OS loader the information it requires in order to manage 62 63 00:05:50,940 --> 00:05:53,360 the wrapped executable code. 63 64 00:05:53,420 --> 00:05:59,850 It includes dynamic library references for linking, and API export and import tables. 64 65 00:05:59,880 --> 00:06:04,630 That's very critical information and we'll be looking at actual examples 65 66 00:06:04,790 --> 00:06:08,670 in this particular video or in the next videos going forward. 66 67 00:06:08,960 --> 00:06:16,460 The PE data structure includes things like the DOS header, DOS stub, PE file header, optional header and a bunch 67 68 00:06:16,460 --> 00:06:19,280 of sections and so on. 68 69 00:06:19,280 --> 00:06:22,860 Let's quickly look at an example. 69 70 00:06:22,880 --> 00:06:28,330 So this is a very simplified representation of the PE file structure. 70 71 00:06:28,340 --> 00:06:32,810 This is what the different sections of the file look like.
71 72 00:06:32,870 --> 00:06:38,180 This is a very simplified version, so it starts with a DOS header. 72 73 00:06:38,180 --> 00:06:46,650 Then it has the PE signature or PE header. Then it has the COFF header, or the Common Object File Format header. 73 74 00:06:47,120 --> 00:06:54,890 So COFF is nothing but a common format for executable and object files that originated on Unix systems, 74 75 00:06:55,370 --> 00:07:04,610 and the PE file format is nothing but an executable file format built for the Windows 75 76 00:07:04,670 --> 00:07:07,100 environment on top of COFF. 76 77 00:07:07,490 --> 00:07:14,930 Similarly, we have the ELF file format for the Linux environment and Mach-O for 77 78 00:07:15,560 --> 00:07:16,710 Mac operating systems. So after the COFF header, 78 79 00:07:16,710 --> 00:07:22,780 we have the optional header, then we have the sections and how those sections are mapped into 79 80 00:07:22,790 --> 00:07:23,090 memory. 80 81 00:07:25,990 --> 00:07:33,960 So if you look at the PE file from a very high level, it primarily has two parts. 81 82 00:07:34,000 --> 00:07:40,660 One is the headers and the other one is the sections. The headers basically consist of things 82 83 00:07:40,660 --> 00:07:45,630 like the DOS header, PE header, optional header and the section table. 85 86 00:07:59,660 --> 00:08:02,300 So let us look at a very simple example. 86 87 00:08:02,320 --> 00:08:07,940 We'll be using a tool called CFF Explorer or common file format explorer. 87 88 00:08:07,940 --> 00:08:13,200 This is a very simple tool to parse the PE/COFF file format. 88 89 00:08:13,430 --> 00:08:20,170 If you have followed the previous videos then this will come pre-installed with your FLARE VM setup. 89 90 00:08:20,240 --> 00:08:24,960 If not, you can just search for it on Google and download this particular tool.
90 91 00:08:24,980 --> 00:08:26,300 It's easy to find. 91 92 00:08:26,420 --> 00:08:35,040 So let's load an executable. You can basically load any executable into CFF Explorer 92 93 00:08:35,040 --> 00:08:37,580 that is available on your Windows machine. 93 94 00:08:37,620 --> 00:08:45,570 Our whole purpose is to just look at the different sections of the file and look at the hex version 94 95 00:08:45,570 --> 00:08:52,500 of the file and how we can quickly identify, by looking at the hex bytes, that OK, this is a portable executable 95 96 00:08:52,500 --> 00:08:52,910 file. 96 97 00:08:53,670 --> 00:09:02,590 So it's expanded. Once the PE file gets loaded in CFF Explorer, this is what it will look like. On the 97 98 00:09:02,590 --> 00:09:08,210 left-hand side you'll see all the parsed headers and sections that it could identify. 98 99 00:09:08,290 --> 00:09:11,270 Then there will be a bunch of other options as well. 99 100 00:09:11,290 --> 00:09:21,160 Let's look at the hex editor. So this is what a PE file looks like in a hex editor. 100 101 00:09:21,170 --> 00:09:27,200 It starts with MZ as the magic bytes. 101 102 00:09:27,350 --> 00:09:34,540 So this is the first and foremost way to identify whether the file is a PE file or not. 102 103 00:09:34,970 --> 00:09:44,040 So if we go to our presentation, we mentioned that the header of a PE file would begin with the DOS header, 103 104 00:09:44,300 --> 00:09:47,930 and this is the DOS header that we are talking about. 104 105 00:09:47,990 --> 00:09:52,240 So MZ is nothing but the magic number at the start of the PE file. 105 106 00:09:52,340 --> 00:09:56,040 Then it consists of a DOS stub, which contains the message 106 107 00:09:56,390 --> 00:09:59,300 "This program cannot run in DOS mode". 107 108 00:09:59,300 --> 00:10:01,380 This is more like a legacy thing. 108 109 00:10:01,460 --> 00:10:07,050 If the PE file gets executed in a DOS environment, this is the error that would come up.
109 110 00:10:07,070 --> 00:10:12,430 It's not something which is used these days, but you know, it's really difficult to remove it 110 111 00:10:12,440 --> 00:10:14,400 without breaking a bunch of software. 111 112 00:10:14,630 --> 00:10:17,950 So it's more like a legacy thing, it's not really useful. 112 113 00:10:19,370 --> 00:10:26,930 If you move down further you'll see that there is a signature called PE. The portable executable file 113 114 00:10:27,020 --> 00:10:29,990 header begins after the DOS stub ends. 114 115 00:10:30,350 --> 00:10:38,990 So the magic number of a portable executable file is PE 00 00, so if you look at the hex representation 115 116 00:10:39,020 --> 00:10:42,920 it's 50 45 00 00. 116 117 00:10:43,160 --> 00:10:47,240 So that's our PE header, the one that we have highlighted here. 117 118 00:10:47,720 --> 00:10:54,670 So this is how we can easily identify that this is a portable executable file that can run inside a 118 119 00:10:54,680 --> 00:10:55,610 Windows environment. 119 120 00:10:56,000 --> 00:11:03,500 So if you keep moving down you'll have things like the optional header, which contains a lot 120 121 00:11:03,500 --> 00:11:10,670 of other details about the file, and we will look at the optional header in much more detail in the next video 121 122 00:11:10,940 --> 00:11:18,080 because it requires a lot more detailed parsing and separate tools to really look at what these random 122 123 00:11:18,200 --> 00:11:23,600 values actually translate into. Text, data, resource: 123 124 00:11:23,600 --> 00:11:26,440 this is basically the section piece, 124 125 00:11:26,450 --> 00:11:30,920 this particular part of the PE file. So that's where the sections part starts. 125 126 00:11:30,920 --> 00:11:37,760 This would contain all the code-related information, all the API-related information about the PE file. 126 127 00:11:37,760 --> 00:11:44,290 So this was just a quick demonstration to get you started with analyzing PE files.
127 128 00:11:44,300 --> 00:11:51,070 Understanding how we can recognize a PE file by looking at its magic bytes and things like that. 128 129 00:11:51,080 --> 00:11:52,370 Thanks a lot for watching the video.
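The magic-byte check shown in the hex editor can also be sketched in a few lines of Python. This is only a minimal sketch, not a full PE parser. It assumes one detail that the video does not spell out: the 4-byte little-endian value at offset 0x3C of the DOS header (the e_lfanew field) gives the file offset of the PE signature. The function names `looks_like_pe` and `build_sample` are made up for this illustration, and `build_sample` produces a fake header for demonstration, not a runnable executable.

```python
import struct

DOS_MAGIC = b"MZ"               # hex 4D 5A, first two bytes of the DOS header
PE_SIGNATURE = b"PE\x00\x00"    # hex 50 45 00 00, start of the PE header
E_LFANEW_OFFSET = 0x3C          # DOS header field holding the PE signature offset


def looks_like_pe(data: bytes) -> bool:
    """Return True if data starts with MZ and has PE\\0\\0 where e_lfanew points."""
    # The DOS header is 64 bytes long, so anything shorter cannot be a PE file.
    if len(data) < 0x40 or data[:2] != DOS_MAGIC:
        return False
    # Read the 4-byte little-endian offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    # Slicing past the end of the buffer yields fewer than 4 bytes, so this is False.
    return data[pe_offset:pe_offset + 4] == PE_SIGNATURE


def build_sample(pe_offset: int = 0x80) -> bytes:
    """Hypothetical helper: build a tiny fake MZ/PE header for demonstration only."""
    buf = bytearray(pe_offset + 4)
    buf[0:2] = DOS_MAGIC
    struct.pack_into("<I", buf, E_LFANEW_OFFSET, pe_offset)
    buf[pe_offset:pe_offset + 4] = PE_SIGNATURE
    return bytes(buf)
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them to `looks_like_pe`. A tool like CFF Explorer does far more validation than this; the sketch only mirrors the two checks done by eye in the hex editor.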