1 00:00:00,110 --> 00:00:02,620 Okay, so let's use the Athena service 2 00:00:02,620 --> 00:00:06,910 to query our data in Amazon S3 using the SQL query language. 3 00:00:06,910 --> 00:00:10,310 So we are getting into Athena, and, as we can see, 4 00:00:10,310 --> 00:00:12,430 this is the new editor. 5 00:00:12,430 --> 00:00:14,070 And, so, before you run your first query, 6 00:00:14,070 --> 00:00:17,570 you need to set up a query result location in Amazon S3. 7 00:00:17,570 --> 00:00:21,150 So let's click on View settings and then Manage. 8 00:00:21,150 --> 00:00:25,090 And then, here, when to specify a location of query results. 9 00:00:25,090 --> 00:00:27,810 So what I have to do is to go into Amazon S3, 10 00:00:27,810 --> 00:00:29,540 create a bucket, and I'll call this one 11 00:00:29,540 --> 00:00:34,540 stephane-demo-athena-eu-central-1, 12 00:00:35,460 --> 00:00:36,760 and that should be unique. 13 00:00:38,430 --> 00:00:40,130 And then we'll create this bucket. 14 00:00:41,330 --> 00:00:43,230 Now the bucket is created, I can click on View details. 15 00:00:43,230 --> 00:00:45,740 Here is my buckets, and I can just copy 16 00:00:45,740 --> 00:00:47,090 the bucket name right here. 17 00:00:48,760 --> 00:00:53,760 And I'm going to just enter s3:// and then my bucket name. 18 00:00:55,180 --> 00:00:57,350 Just click on Save, and now my query result location 19 00:00:57,350 --> 00:01:00,210 is going to be in this S3 buckets. 20 00:01:00,210 --> 00:01:02,010 So, now that we have this done, 21 00:01:02,010 --> 00:01:04,810 we can refresh this page, and we can see 22 00:01:04,810 --> 00:01:07,520 that there are already some Data Sources, 23 00:01:07,520 --> 00:01:11,500 the AwsDataCatalog, and some Databases we can use, okay. 24 00:01:11,500 --> 00:01:13,340 But what I'm going to do is that I'm going to create 25 00:01:13,340 --> 00:01:14,952 my own database for it to work 26 00:01:14,952 --> 00:01:17,440 because you may not necessarily have this. 27 00:01:17,440 --> 00:01:19,100 So, for this, it's pretty easy. 28 00:01:19,100 --> 00:01:23,260 We're going to create our S3 access logs in Amazon S3. 29 00:01:23,260 --> 00:01:24,820 So I've made my bucket called 30 00:01:24,820 --> 00:01:27,610 demo-s3-access-logs-stephane-2020, 31 00:01:27,610 --> 00:01:31,770 which contains my access logs in this S3 logs folder. 32 00:01:31,770 --> 00:01:34,130 So what I have to do now is to actually create a database, 33 00:01:34,130 --> 00:01:37,320 and create this table that will present the objects in here. 34 00:01:37,320 --> 00:01:38,550 So, how to do so? 35 00:01:38,550 --> 00:01:40,470 Well, let's go into our code. 36 00:01:40,470 --> 00:01:42,390 And the first line we have under S3 events, 37 00:01:42,390 --> 00:01:45,980 athena-s3-access-logs-sql, is to create this database. 38 00:01:45,980 --> 00:01:47,720 So we'll paste this in, 39 00:01:47,720 --> 00:01:50,233 create database, and then click on Run. 40 00:01:51,210 --> 00:01:53,730 And this creates a database successfully, 41 00:01:53,730 --> 00:01:56,180 so now we have a specific database, here, 42 00:01:56,180 --> 00:01:57,393 called s3_access_logs_db. 43 00:01:58,320 --> 00:02:01,040 Next, we need to create an external table 44 00:02:01,040 --> 00:02:03,340 that represents the bucket logs. 45 00:02:03,340 --> 00:02:06,530 So let me paste, copy and paste this. 46 00:02:06,530 --> 00:02:08,729 And the question is, how did I get this? 47 00:02:08,729 --> 00:02:13,730 Well, if you type athena s3 access logs, on the internet, 48 00:02:14,730 --> 00:02:18,840 you will have a link, and this link will actually give you 49 00:02:18,840 --> 00:02:23,840 the query units you do to create these proper access logs. 50 00:02:24,010 --> 00:02:26,880 So the one thing I have to change is under the location, 51 00:02:26,880 --> 00:02:30,060 when to change a target bucket name and the prefix. 52 00:02:30,060 --> 00:02:33,260 Well, it turns out that my properties will give me this 53 00:02:33,260 --> 00:02:34,860 so I'll copy this S3 URI, 54 00:02:34,860 --> 00:02:37,500 which has the bucket name and the prefix. 55 00:02:37,500 --> 00:02:41,610 So I will paste this in right here, and we're good to go. 56 00:02:41,610 --> 00:02:45,570 Now let's run this, and now the query is successful. 57 00:02:45,570 --> 00:02:48,105 So, now, under this database, we see there's a table called 58 00:02:48,105 --> 00:02:51,290 mybuckets_logs, with a bunch of columns. 59 00:02:51,290 --> 00:02:55,220 So if I click on this, and click on Preview Table, 60 00:02:55,220 --> 00:02:57,160 now this is going to run a select star 61 00:02:57,160 --> 00:02:59,500 from my table, limit 10. 62 00:02:59,500 --> 00:03:03,750 And, at the bottom, we see that it returns 10 rows, okay, 63 00:03:03,750 --> 00:03:06,700 and these 10 rows represents my access logs, 64 00:03:06,700 --> 00:03:07,880 so this is super handy. 65 00:03:07,880 --> 00:03:11,170 Now I get all my access logs data in this table, 66 00:03:11,170 --> 00:03:12,450 super, super handy. 67 00:03:12,450 --> 00:03:15,110 So this is just scratching the surface because now, 68 00:03:15,110 --> 00:03:16,990 using Athena, we can run some more 69 00:03:16,990 --> 00:03:20,090 interesting types of queries. 70 00:03:20,090 --> 00:03:21,980 So, as we can see right now, we have done nothing. 71 00:03:21,980 --> 00:03:24,170 We didn't set up any database, per se, 72 00:03:24,170 --> 00:03:26,410 we didn't provision database, this is all serverless okay. 73 00:03:26,410 --> 00:03:28,530 All these queries are run by the Athena service 74 00:03:28,530 --> 00:03:31,630 directly on the data that sits within our S3 buckets, 75 00:03:31,630 --> 00:03:33,990 which is, I think, extremely powerful. 76 00:03:33,990 --> 00:03:36,390 So, in this query, we're going to say 77 00:03:36,390 --> 00:03:38,450 we want to get the requesturi_operation, 78 00:03:38,450 --> 00:03:41,050 the httpstatus, count star, to count how many, 79 00:03:41,050 --> 00:03:43,010 FROM this table, and we're going to do a GROUP BY. 80 00:03:43,010 --> 00:03:44,920 So this is a more advanced SQL query, 81 00:03:44,920 --> 00:03:47,580 but it's gonna give us a log of reports 82 00:03:47,580 --> 00:03:50,360 of the status codes as well as how many times they occurred. 83 00:03:50,360 --> 00:03:53,510 So, we can see, for example, that a GET 404 84 00:03:53,510 --> 00:03:58,510 happened 49 times, and a HEAD 200 happened 81 time, 85 00:03:58,540 --> 00:04:01,290 and so on, or GET 403 happened 14 times. 86 00:04:01,290 --> 00:04:03,360 So, this could help us, for example, 87 00:04:03,360 --> 00:04:07,160 do a report into if visioning what types of queries, 88 00:04:07,160 --> 00:04:09,540 or what types of requests do not happen, 89 00:04:09,540 --> 00:04:13,310 or do not succeed, on our buckets, so super handy. 90 00:04:13,310 --> 00:04:14,780 And one last, for example, 91 00:04:14,780 --> 00:04:17,260 we saw that there were a few, four or three errors, 92 00:04:17,260 --> 00:04:19,610 so it can say, hey, this is when it's unauthorized, 93 00:04:19,610 --> 00:04:21,760 this is very fishy, either someone is not authorized 94 00:04:21,760 --> 00:04:24,530 to do something, and, cool, we want to know who and why. 95 00:04:24,530 --> 00:04:27,060 Or if someone is trying to get unauthorized access 96 00:04:27,060 --> 00:04:29,980 to our buckets and we wanted to just deep dive into it. 97 00:04:29,980 --> 00:04:33,630 So, again, we run this query and then we have 44 rows, 98 00:04:33,630 --> 00:04:36,320 which are not present here because the data is probably 99 00:04:36,320 --> 00:04:39,200 very, very big, but we have 44 rows, okay, 100 00:04:39,200 --> 00:04:42,057 and these rows will represent what type of query 101 00:04:42,057 --> 00:04:46,640 generated and ended up in a 403, which is super handy. 102 00:04:46,640 --> 00:04:49,590 So, again, the whole power, here, of Athena 103 00:04:49,590 --> 00:04:52,300 is that we're able to literally query data on Amazon S3, 104 00:04:52,300 --> 00:04:53,530 do some complex queries directly, 105 00:04:53,530 --> 00:04:55,130 without setting up any servers, 106 00:04:55,130 --> 00:04:56,560 without transforming our data, 107 00:04:56,560 --> 00:04:58,460 just by setting up the right data formats, 108 00:04:58,460 --> 00:05:00,250 and specifying what the data is, 109 00:05:00,250 --> 00:05:02,930 Athena is able to help us with all these things, okay. 110 00:05:02,930 --> 00:05:05,144 Finally, in Athena, you can see the recent queries 111 00:05:05,144 --> 00:05:07,690 as well as saved queries, if you wanted to, 112 00:05:07,690 --> 00:05:09,530 and edit the settings if you wanted to encrypt 113 00:05:09,530 --> 00:05:12,040 the query results in the target buckets. 114 00:05:12,040 --> 00:05:14,120 So that's it for this lecture, I hope you liked it, 115 00:05:14,120 --> 00:05:16,070 and I will see you in the next lecture.