1 00:00:00,250 --> 00:00:03,200 Hello, guys, and welcome to this ZTM project. 2 00:00:03,690 --> 00:00:06,420 We will build together, step-by-step, 3 00:00:06,730 --> 00:00:11,980 line-by-line, an LLM-powered question-answering application for custom or 4 00:00:11,990 --> 00:00:13,100 private documents. 5 00:00:14,210 --> 00:00:16,260 Before we get started with LangChain, 6 00:00:16,510 --> 00:00:21,800 Pinecone, OpenAI, and all these amazing technologies, I'd like to show you a demo 7 00:00:21,810 --> 00:00:24,080 of what they can do as we progress. 8 00:00:25,170 --> 00:00:27,320 We'll be following a learning-by-doing 9 00:00:27,330 --> 00:00:32,340 approach, and what you see next is what we'll be doing together throughout this project. 10 00:00:32,870 --> 00:00:38,020 We won't jump directly into the project, because I also want you to build a strong 11 00:00:38,030 --> 00:00:42,560 foundation in LangChain so that you can create custom LLM applications. 12 00:00:43,150 --> 00:00:47,020 After a short introduction to LangChain, we'll dive into its main components. 13 00:00:47,570 --> 00:00:52,780 We'll talk about LLM wrappers, chains, and agents, and how to combine them with 14 00:00:52,790 --> 00:00:56,580 the Pinecone vector datastore and with OpenAI's models. 15 00:00:57,270 --> 00:01:02,180 After this big section, we'll move on to the actual question-answering project. 16 00:01:03,230 --> 00:01:06,680 Let's take a look at a short preview of what we'll build. 17 00:01:07,210 --> 00:01:10,740 Let's imagine that you want to start a business in the EU. 18 00:01:11,870 --> 00:01:18,040 All EU business entities must follow the GDPR regulation, which was implemented by 19 00:01:18,050 --> 00:01:23,060 the European Union to protect the privacy and personal data of EU citizens.
20 00:01:23,070 --> 00:01:29,480 It standardizes data protection laws across all EU member states and imposes 21 00:01:29,490 --> 00:01:35,640 strict rules on controlling and processing personally identifiable information. 22 00:01:36,290 --> 00:01:41,420 If you don't comply, you could be fined hundreds of thousands of euros, and your 23 00:01:41,430 --> 00:01:42,740 business could be closed. 24 00:01:43,330 --> 00:01:44,280 This is not a joke. 25 00:01:45,810 --> 00:01:47,900 Here is the GDPR website. 26 00:01:48,570 --> 00:01:50,380 As you can see, there are tens of 27 00:01:50,390 --> 00:01:54,740 chapters, and the PDF file is 88 pages long. 28 00:01:55,890 --> 00:01:58,060 You could probably spend a few full days 29 00:01:58,070 --> 00:02:02,600 reading and understanding it, or you could hire a specialized lawyer to 30 00:02:02,610 --> 00:02:03,620 explain it to you. 31 00:02:03,810 --> 00:02:05,400 And that's not a cheap job. 32 00:02:06,850 --> 00:02:11,900 Alternatively, you could build an LLM application that learns the content of 33 00:02:11,910 --> 00:02:16,320 this document so that you can ask it questions about anything related to it. 34 00:02:16,330 --> 00:02:21,140 It's like having that expensive lawyer in front of you, but for free. 35 00:02:22,150 --> 00:02:27,020 I used this example because it's something that's well known and of 36 00:02:27,030 --> 00:02:28,100 interest in Europe. 37 00:02:28,670 --> 00:02:30,600 But you could use any other private 38 00:02:30,610 --> 00:02:36,260 document: instruction manuals, legal or accounting documents, medical studies, 39 00:02:36,690 --> 00:02:38,100 treatment plans, and so on. 40 00:02:38,470 --> 00:02:41,420 The AI model was not trained on this set 41 00:02:41,430 --> 00:02:44,940 of data because it's not public or is too recent. 42 00:02:45,470 --> 00:02:47,840 Once you build the application, you can 43 00:02:47,850 --> 00:02:49,760 use it with any other document.
44 00:02:50,330 --> 00:02:54,100 I'm loading the 88-page PDF into a 45 00:02:54,110 --> 00:02:55,120 LangChain document. 46 00:02:56,530 --> 00:02:58,740 I'm running the code in this cell. 47 00:03:00,270 --> 00:03:01,600 It's loading the file. 48 00:03:04,710 --> 00:03:07,120 Now I'm splitting it into smaller chunks. 49 00:03:10,260 --> 00:03:12,970 There are 856 chunks. 50 00:03:17,050 --> 00:03:20,360 Next, I'll embed the chunks into numeric 51 00:03:20,370 --> 00:03:24,100 vectors and insert them into a Pinecone index. 52 00:03:24,970 --> 00:03:26,180 I'm running the code. 53 00:03:27,790 --> 00:03:33,460 It's embedding the chunks, creating the Pinecone index, and inserting both the 54 00:03:33,470 --> 00:03:36,520 chunks and the embeddings into the Pinecone index. 55 00:03:37,410 --> 00:03:40,820 It takes some time, so I'm pausing the video until it's done. 56 00:03:41,910 --> 00:03:46,940 It is done, and the embeddings are now in the Pinecone index. 57 00:03:48,310 --> 00:03:49,240 The vector embeddings. 58 00:03:50,550 --> 00:03:52,340 Let's ask a few questions. 59 00:03:53,090 --> 00:03:54,660 I'm running this piece of code. 60 00:03:55,450 --> 00:03:57,900 And the first question will be, what are 61 00:03:57,910 --> 00:04:00,260 the main points described in the document? 62 00:04:02,230 --> 00:04:04,000 And I've got the answer. 63 00:04:05,410 --> 00:04:05,800 OK. 64 00:04:06,670 --> 00:04:09,000 Let's ask, what is GDPR? 65 00:04:10,370 --> 00:04:11,960 What is GDPR? 66 00:04:19,880 --> 00:04:22,390 And it says that GDPR stands for General 67 00:04:22,400 --> 00:04:27,410 Data Protection Regulation, and that it is an EU law. 68 00:04:27,760 --> 00:04:28,150 Very good. 69 00:04:29,520 --> 00:04:30,550 The next question. 70 00:04:31,160 --> 00:04:35,210 Tell me more about the requirement for consent, please. 71 00:04:38,470 --> 00:04:39,100 Very good. 72 00:04:40,130 --> 00:04:40,920 And the last question.
73 00:04:41,550 --> 00:04:43,520 What about the right to be forgotten? 74 00:04:44,350 --> 00:04:45,720 This is something very interesting. 75 00:04:52,140 --> 00:04:53,100 This is the answer. 76 00:04:54,910 --> 00:04:57,760 All the answers were given only based on 77 00:04:57,770 --> 00:04:59,440 the content of that document. 78 00:05:00,510 --> 00:05:02,980 You can use any other document in any 79 00:05:02,990 --> 00:05:04,080 format instead. 80 00:05:05,030 --> 00:05:07,900 This is amazing, and in my opinion, this 81 00:05:07,910 --> 00:05:11,100 opens up an infinite number of use cases. 82 00:05:12,190 --> 00:05:14,860 We'll probably add a fancy web interface 83 00:05:14,870 --> 00:05:16,940 to the application, and that's it. 84 00:05:17,730 --> 00:05:18,460 Great. 85 00:05:18,830 --> 00:05:21,040 Now we'll move on and dive into LangChain. 86 00:05:21,290 --> 00:05:22,300 Let's get started.
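The demo flow described in this video (split the document into chunks, embed the chunks as vectors, store them in an index, then retrieve the most relevant chunks for a question) can be sketched in plain Python. This is only a toy illustration under stated assumptions, not the code shown in the video: `split_into_chunks`, `embed`, `cosine`, and `ToyIndex` are hypothetical names, the word-count "embedding" stands in for a real embedding model, the in-memory list stands in for a Pinecone index, and the sample text is made up for the example. No LangChain, Pinecone, or OpenAI calls are used.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=12, overlap=3):
    """Split text into overlapping windows of words (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy embedding: a sparse word-count vector.

    Assumption: a real application would call an embedding model here.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """In-memory stand-in for a vector index (not the Pinecone API)."""

    def __init__(self):
        self.items = []  # list of (vector, chunk_text) pairs

    def upsert(self, chunks):
        """Embed each chunk and store the vector together with its text."""
        for chunk in chunks:
            self.items.append((embed(chunk), chunk))

    def query(self, question, top_k=2):
        """Return the top_k chunks most similar to the question."""
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]


# Made-up sample text standing in for the loaded PDF.
document = (
    "Consent must be freely given and specific. "
    "A person can ask for erasure, which is the right to be forgotten. "
    "A data breach must be reported to the supervisory authority."
)

index = ToyIndex()
index.upsert(split_into_chunks(document, chunk_size=12, overlap=3))

question = "What about the right to be forgotten?"
context = index.query(question, top_k=1)

# In the real application, the retrieved chunks and the question would be
# sent to an LLM; here we only build the prompt text.
prompt = "Answer using only this context:\n" + "\n".join(context) + "\n\nQuestion: " + question
print(prompt)
```

The point of the sketch is the shape of the pipeline: answers are grounded in the retrieved chunks rather than in whatever the model saw during training, which is why the same application can be reused with a different document.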