A didactic RAG pipeline built with LangChain, featuring HuggingFace embeddings, FAISS vector store, and local FLAN-T5 model with optional OpenAI fallback. Designed for teaching modern RAG concepts step by step with a minimal, reproducible architecture.
Education-focused approach with curated small documents, simple FAISS indexing, and local FLAN-T5 LLM so students can run everything on CPU/GPU without external services.
Designed specifically for learning and teaching RAG concepts with clear, step-by-step documentation and minimal complexity to understand core principles.
Optimized for fast execution with lightweight models (MiniLM embeddings, FLAN-T5) that can run efficiently on limited hardware resources.
Modular design allowing easy experimentation with different components: embeddings, vector stores, and language models.
Runs entirely locally with optional cloud integrations, perfect for students and developers without API dependencies.
The pipeline follows a clean, linear flow from document ingestion to answer generation, with each component clearly separated for educational clarity.
Curated collection of educational texts covering churn analysis, NPS metrics, LangChain framework, and RAG concepts.
Clean and chunk documents into simple string segments optimized for embedding generation and semantic search.
Convert text chunks to high-dimensional vectors using sentence-transformers/all-MiniLM-L6-v2.
Store and index embeddings with Facebook's FAISS library for fast, in-memory similarity search.
Perform top-k similarity search to find most relevant document chunks for any given query.
Process retrieved context and user query through FLAN-T5 local LLM via RetrievalQA chain.
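The retrieval at the heart of steps 4-5 boils down to ranking chunk vectors by similarity to the query vector. A minimal, dependency-free sketch of top-k cosine-similarity retrieval (toy 3-dimensional vectors stand in for real 384-dimensional MiniLM embeddings; FAISS performs the same ranking with optimized index structures):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunks most similar to the query."""
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]

# Toy "embeddings" for three document chunks
chunks = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.9, 0.2, 0.1]]
print(top_k([1.0, 0.0, 0.0], chunks, k=2))  # → [0, 2]
```

The retriever created below with `search_kwargs={"k": 2}` does exactly this: it returns the two chunks whose embeddings score highest against the query embedding.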
Step-by-step breakdown of the core pipeline implementation with actual code examples from the project.
# 1) Sample educational documents
docs = [
"Churn is customer cancellation and represents significant business impact...",
"NPS, or Net Promoter Score, measures customer satisfaction and loyalty...",
"LangChain is a powerful library for building LLM-powered applications...",
"RAG combines information retrieval with text generation for enhanced accuracy...",
]
# 2) Setup embeddings and vector store
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
# Initialize lightweight, fast embeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# Create FAISS vector store and retriever
vectorstore = FAISS.from_texts(docs, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
# 3) Setup local FLAN-T5 model
from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline
gen_pipeline = pipeline("text2text-generation", model="google/flan-t5-base", max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=gen_pipeline)
# 4) Create complete RAG chain
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
# 5) Process queries and generate answers
response = qa_chain.invoke({"query": "What does churn mean?"})
print(response["result"])
Small, curated collection of educational strings about churn, NPS, LangChain, RAG, and embeddings - perfect for demonstrating retrieval concepts.
Uses MiniLM for lightweight, efficient text-to-vector conversion that runs smoothly on CPU with high-quality representations.
Facebook's FAISS library provides lightning-fast similarity search with configurable top-k retrieval for finding relevant context.
Google's FLAN-T5 model runs locally via HuggingFace Transformers, eliminating API dependencies while providing solid text generation.
RetrievalQA chain orchestrates the entire pipeline, seamlessly combining document retrieval with language model generation.
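Conceptually, the chain just stuffs the retrieved chunks into a prompt before calling the model. A dependency-free sketch with stub components (the function names here are illustrative; the real chain uses LangChain's prompt templates internally):

```python
def rag_answer(query, retrieve, generate, k=2):
    """Minimal RAG loop: retrieve top-k chunks, stuff them into a prompt, generate."""
    context = "\n".join(retrieve(query, k))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

# Stubs standing in for the FAISS retriever and the FLAN-T5 pipeline
def fake_retrieve(query, k):
    return ["Churn is customer cancellation.", "NPS measures loyalty."][:k]

def fake_generate(prompt):
    return "Churn means customers cancelling a service."

print(rag_answer("What does churn mean?", fake_retrieve, fake_generate))
# → Churn means customers cancelling a service.
```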
Optional integration with OpenAI models for enhanced performance when API access is available.
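One simple way to wire the fallback is to branch on whether an API key is configured. A sketch (the helper name and the backend labels are illustrative, not from the project):

```python
import os

def pick_llm_backend() -> str:
    """Choose a generation backend for the RAG chain.

    Prefers OpenAI when an API key is present in the environment;
    otherwise stays fully local with FLAN-T5.
    """
    if os.environ.get("OPENAI_API_KEY"):
        return "openai"
    return "flan-t5-local"
```

The returned label can then decide which `llm` object gets passed to `RetrievalQA.from_chain_type`, keeping the rest of the pipeline unchanged.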
Carefully selected technologies optimized for educational use, performance, and accessibility.
Core language: Python
Orchestration: LangChain
Local LLM: FLAN-T5 (google/flan-t5-base)
Vector Search: FAISS
Models & Embeddings
Embedding model: sentence-transformers/all-MiniLM-L6-v2
Fast query processing with optimized local model inference.
High precision in retrieving contextually relevant documents.
Complete independence from external APIs.
Lightweight architecture suitable for educational environments.
Comprehensive testing approach to ensure system reliability and educational value.
import time
# Test questions with expected topics
test_questions = [
("What does churn mean?", "churn"),
("How does RAG work?", "RAG"),
("What is NPS?", "NPS"),
("Explain LangChain benefits", "LangChain"),
]
# Evaluation metrics
def evaluate_system(qa_chain, test_questions):
    results = []
    for question, expected_topic in test_questions:
        start_time = time.time()
        result = qa_chain.invoke({"query": question})
        response_time = time.time() - start_time
        response_text = result["result"].lower()
        is_relevant = expected_topic.lower() in response_text
        results.append({
            "question": question,
            "response_time": response_time,
            "relevance": is_relevant,
            "answer": result["result"],
        })
        print(f"❓ {question}")
        print(f"✅ {result['result']}")
        print(f"⏱️ {response_time:.2f}s")
        print(f"🎯 Relevant: {is_relevant}\n")
    return results
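The per-question records returned by `evaluate_system` can be rolled up into two headline numbers. A small aggregation helper (a sketch; this helper is not part of the project code):

```python
def summarize_results(results):
    """Aggregate evaluate_system output into average latency and relevance rate."""
    n = len(results)
    avg_time = sum(r["response_time"] for r in results) / n
    relevance_rate = sum(1 for r in results if r["relevance"]) / n
    return {"avg_response_time": avg_time, "relevance_rate": relevance_rate}

# Example with two hand-written records
demo = [
    {"question": "q1", "response_time": 1.0, "relevance": True, "answer": "..."},
    {"question": "q2", "response_time": 3.0, "relevance": False, "answer": "..."},
]
print(summarize_results(demo))  # → {'avg_response_time': 2.0, 'relevance_rate': 0.5}
```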
Validates that responses accurately reflect information from retrieved documents, ensuring factual consistency.
Measures how well the retrieval system finds contextually appropriate documents for different query types.
Assesses response coherence, grammatical correctness, and overall quality of generated text.
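A crude but automatable proxy for the faithfulness check is lexical overlap: measure how much of the answer's vocabulary appears in the retrieved context. This is a sketch only; serious faithfulness evaluation typically uses an LLM judge or an NLI model:

```python
def context_overlap(answer: str, context: str) -> float:
    """Fraction of answer words that also occur in the retrieved context.

    A rough faithfulness signal: values near 0 suggest the model
    ignored the retrieved documents.
    """
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)

print(context_overlap(
    "churn is customer cancellation",
    "churn means customer cancellation and lost revenue",
))  # → 0.75
```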
Well-organized codebase following Python best practices with comprehensive tooling.
Miguel_LLM-educacional/
├── 📁 config/                # Hydra configuration files
│   ├── main.yaml             # Main configuration
│   ├── model/                # Model parameters
│   └── process/              # Processing parameters
├── 📁 data/                  # Project data
│   ├── raw/                  # Raw input data
│   ├── processed/            # Cleaned data
│   └── final/                # Final datasets
├── 📁 notebooks/             # Jupyter notebooks
│   └── miguel_llm.ipynb
├── 📁 src/                   # Source code
│   ├── __init__.py
│   ├── process.py            # Data processing
│   ├── train_model.py        # Model training
│   └── utils.py              # Utility functions
├── 📁 tests/                 # Automated tests
├── pyproject.toml            # Poetry dependencies
└── README.md                 # Project documentation
Simple setup process to get the MIGUEL system running locally in minutes.
# 1. Clone the repository
git clone https://github.com/bcmaymonegalvao/Miguel_LLM-educacional.git
cd Miguel_LLM-educacional

# 2. Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -U pip
pip install faiss-cpu sentence-transformers langchain langchain-community transformers torch

# 4. Run the notebook
jupyter notebook notebooks/miguel_llm.ipynb
One-click execution in Google Colab with pre-configured environment and GPU acceleration.
Complete local setup with minimal dependencies, perfect for offline development and learning.
Comprehensive Jupyter notebooks with step-by-step explanations and learning exercises.
Dive into the code, experiment with the live demo, or reach out to discuss machine learning projects.