So I've been trying to get up to speed on AI so I spent the last few days digesting content from all over the net.
I'm ready to get stuck in and build my first AI project, and one of those "ChatGPT but with your own documents" apps seems ideal.
So here's what looks like the general architecture of such a chatbot app:
Setup
Create an OpenAI account and get your API key
Create a vector database, e.g. PostgreSQL with pgvector or Redis
Create a table to hold all your embeddings (OpenAI's vectors require 1536)
Process your documents
Gather all your documents ad convert them into text
-- Convert line breaks and tabs to spaces (OpenAI’s recommendation)Break the document content into chunks
-- Set it to something small, e.g. 1000 words (look up tokens for more specific settings)Use OpenAI’s API to create embeddings for each chunk
Save each chunk into the vector database
Create your user-facing app
Create a form to ask questions and display responses
Write your prompts
Information gathering prompt: A prompt for OpenAI to keep asking questions until it has all the information it needs
Summarise question prompt: Another prompt to summarise the question
Answer prompt: A prompt that combines the final form of the question along with the relevant chunk of information
Chat
When someone asks a question, send the question and the Information gathering prompt to the OpenAI Chat API
Repeat until OpenAI responds that it has enough info
Send the chat transcript to OpenAI with the Summarise question prompt
Answer question
Use the OpenAI API to convert the summarised question into an embedding
Search the vector database for the chunk that would most likely contain the answer
-- Use the cosine distance function as recommended by OpenAIUse the Answer prompt to send the summarised question and the chunk to the OpenAI Completion API to get the answer
Wait for OpenAI to respond and display the answer to the user
Does that sound right? Would love some feedback from people who have built this.
And if you want to follow along as I build my first AI product, you can follow me on Twitter at https://twitter.com/farez.