Enterprises increasingly want AI systems that understand their internal language, policies, and documents, without exposing sensitive data to public cloud models. Traditional approaches like keyword search or basic RAG systems often fall short when consistency, reasoning, and domain understanding matter.
The Unsloth framework changes this equation: it enables teams to fine-tune state-of-the-art open-source large language models directly on company data using consumer-grade GPUs, dramatically reducing cost, complexity, and infrastructure dependency. For over two decades, I’ve been igniting change and delivering scalable tech solutions that elevate organisations to new heights. My expertise transforms challenges into opportunities, inspiring businesses to thrive in the digital age.
This tech concept presents a production-ready, step-by-step workflow to fine-tune a private LLM using Unsloth, starting from raw PDF and Word documents and ending with a deployable, company-aware model. The approach is designed for startups, enterprises, and regulated industries that require data sovereignty, performance, and scalability—all without relying on expensive cloud GPUs.
Architecture Overview
End-to-End Pipeline
Documents → Text Extraction → Instruction Dataset → Unsloth Fine-Tuning → Private LLM Deployment

This approach creates a domain-trained model that understands company language and policies, rather than merely retrieving text like traditional search systems.
Step 1: Choose the Base Model
Selecting the Right Foundation Model
For enterprise documents, prioritize quality, context length, and GPU efficiency.
Recommended open-source models:
- Meta LLaMA 3 (8B)
- Mistral 7B Instruct
- Gemma 7B IT
We use LLaMA 3 8B, which balances strong reasoning with manageable VRAM requirements.
Step 2: Environment Setup
Hardware Requirements
- NVIDIA GPU with at least ~16 GB VRAM (24 GB recommended)
- 32 GB system RAM
- Ubuntu 22.04 or compatible Linux distribution
Dependency Installation
```bash
pip install unsloth
pip install torch transformers datasets accelerate peft
pip install pypdf python-docx
```

This setup enables document parsing, dataset creation, and Unsloth-optimized training.
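Before training, it is worth confirming that PyTorch can see the GPU and that it has enough memory. A minimal check, assuming a single CUDA device:

```python
import torch

# Verify that a CUDA-capable GPU is visible to PyTorch
assert torch.cuda.is_available(), "No CUDA GPU detected"

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")

# ~16 GB is a practical minimum for 4-bit LLaMA 3 8B fine-tuning with LoRA
if vram_gb < 16:
    print("Warning: less than 16 GB VRAM; reduce batch size or sequence length")
```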
Step 3: Extract Text from PDF and Word Documents
Organizing the Data Directory
data/
├── pdfs/
├── docs/
└── raw_text/
PDF Text Extraction
```python
from pypdf import PdfReader
import os

def extract_pdf_text(path):
    reader = PdfReader(path)
    # extract_text() can return None for image-only pages, so fall back to ""
    return "\n".join(page.extract_text() or "" for page in reader.pages)

for file in os.listdir("data/pdfs"):
    text = extract_pdf_text(f"data/pdfs/{file}")
    with open(f"data/raw_text/{file}.txt", "w") as f:
        f.write(text)
```

Word Document Extraction
```python
from docx import Document
import os

def extract_docx_text(path):
    doc = Document(path)
    return "\n".join(p.text for p in doc.paragraphs)

for file in os.listdir("data/docs"):
    text = extract_docx_text(f"data/docs/{file}")
    with open(f"data/raw_text/{file}.txt", "w") as f:
        f.write(text)
```

At this stage, all company documents are converted into raw text files.
Step 4: Clean and Chunk the Text
Preparing Text for LLM Training
Language models train more effectively on structured, consistent text blocks.
```python
import re

def clean_text(text):
    # Collapse runs of whitespace (including newlines) into single spaces
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

def chunk_text(text, chunk_size=800):
    # Split into blocks of ~800 words to stay under the 2048-token limit used later
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
```

Chunking preserves context while avoiding token overflow during training.
Step 5: Convert Company Data into an Instruction Dataset
Why Instruction Formatting Matters
Instruction tuning teaches the model how to respond, not just what to memorize.
Standard JSONL Structure
```json
{
  "instruction": "Explain the company leave policy",
  "input": "",
  "output": "Employees are entitled to..."
}
```

Dataset Generation Script
```python
import json
import os

dataset = []
for file in os.listdir("data/raw_text"):
    with open(f"data/raw_text/{file}") as f:
        text = clean_text(f.read())
    chunks = chunk_text(text)
    for chunk in chunks:
        dataset.append({
            "instruction": "Summarize the following company document",
            "input": chunk,
            # NOTE: the chunk itself is used as a placeholder output here;
            # for stronger results, replace it with a genuine summary or a
            # reviewed question-answer pair derived from the chunk.
            "output": chunk
        })

with open("company_dataset.jsonl", "w") as f:
    for item in dataset:
        f.write(json.dumps(item) + "\n")
```

This step converts unstructured documents into supervised training data.
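A quick sanity check before training helps catch empty files or malformed records; this snippet simply counts the entries and previews one.

```python
import json

# Load the generated dataset back in and inspect it
with open("company_dataset.jsonl") as f:
    records = [json.loads(line) for line in f]

print(f"Total training examples: {len(records)}")
print(json.dumps(records[0], indent=2)[:500])  # preview the first record
```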
Step 6: Load the Model with Unsloth
Applying Unsloth Optimizations
```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3-8B",
    max_seq_length=2048,
    dtype=torch.float16,
    load_in_4bit=True
)
```

Unsloth dramatically reduces VRAM usage while accelerating training speed.
Step 7: Apply LoRA for Memory-Efficient Fine-Tuning
Low-Rank Adaptation Configuration
```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none"
)
```

LoRA ensures fast convergence and keeps the base model frozen.
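To confirm that only the adapter weights are trainable while the 8B base weights stay frozen, PEFT-wrapped models expose a helper that prints the parameter breakdown:

```python
# Typically reports well under 1% of all parameters as trainable for r=16
model.print_trainable_parameters()
```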
Step 8: Prepare the Dataset for Training
Prompt Formatting for Instruction Tuning
```python
from datasets import load_dataset

dataset = load_dataset("json", data_files="company_dataset.jsonl")

def format_prompt(example):
    # Append the tokenizer's EOS token so the model learns where a response ends
    return {
        "text": f"""### Instruction:
{example['instruction']}
### Input:
{example['input']}
### Response:
{example['output']}""" + tokenizer.eos_token
    }

dataset = dataset.map(format_prompt)
```

This structure aligns the dataset with instruction-following LLM behavior.
Step 9: Train the Model Using Unsloth
Training Configuration
```python
from transformers import TrainingArguments
from trl import SFTTrainer

# trl's SFTTrainer tokenizes the "text" field for us; the plain HF Trainer
# would require manual tokenization and a data collator.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=1000,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        output_dir="./company_llm",
        save_steps=500,
        save_total_limit=2
    )
)
trainer.train()
```

Unsloth’s optimizations enable efficient fine-tuning even on a single GPU.
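To verify how much VRAM the run actually consumed, PyTorch's memory counters can be printed after training; this is a diagnostic sketch, not part of the training loop itself.

```python
import torch

# Peak GPU memory reserved by PyTorch during the training run
peak_gb = torch.cuda.max_memory_reserved() / 1024**3
print(f"Peak reserved GPU memory: {peak_gb:.1f} GB")
```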
Step 10: Save and Export the Fine-Tuned Model
Persisting the LoRA Adapter
```python
model.save_pretrained("company_llm_lora")
tokenizer.save_pretrained("company_llm_lora")
```

The output is a lightweight adapter that can be reused or merged later.
Step 11: Run Inference with the Private LLM
Testing Company-Specific Queries
```python
FastLanguageModel.for_inference(model)

# For best results, match the Instruction/Input/Response template used in training
prompt = """### Instruction:
What is the company leave policy?
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The model now responds using internal company knowledge and tone.
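For repeated testing, the prompt construction and decoding can be wrapped in a small helper; `ask` here is a hypothetical convenience function, not part of Unsloth.

```python
def ask(question, max_new_tokens=200):
    # Hypothetical helper: rebuilds the Instruction/Response template used in training
    prompt = f"### Instruction:\n{question}\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(ask("Summarize the data retention policy"))
```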
Step 12: Deployment Options
Secure and Scalable Deployment Paths
- vLLM for high-throughput inference
- Text Generation Inference for production APIs
- llama.cpp after merging LoRA adapters
All of these options run fully on-prem, keeping model weights and company data inside your own infrastructure.
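Engines such as llama.cpp expect a single merged checkpoint rather than a separate LoRA adapter. Unsloth ships export helpers for this; the exact method names and quantization options below depend on your Unsloth version, so treat this as a sketch rather than a definitive recipe.

```python
# Merge the LoRA adapter into 16-bit base weights for vLLM / TGI
# (method name and save_method values depend on the installed Unsloth version)
model.save_pretrained_merged("company_llm_merged", tokenizer, save_method="merged_16bit")

# Or export a quantized GGUF file for llama.cpp
model.save_pretrained_gguf("company_llm_gguf", tokenizer, quantization_method="q4_k_m")
```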
Best Practices: Fine-Tuning vs RAG
Choosing the Right Strategy
| Use Case | Recommended Approach |
|---|---|
| Policies, tone, internal language | Fine-tuning with Unsloth |
| Frequently changing information | Retrieval-Augmented Generation (RAG) |
| Highest accuracy | Fine-tuning combined with RAG |
My Tech Advice: Unsloth fundamentally lowers the barrier to building private, high-quality large language models trained on proprietary company data. By combining efficient fine-tuning techniques like LoRA with aggressive memory and speed optimizations, teams can now train powerful LLMs on internal PDFs and Word documents using a single on-prem GPU.
For organizations aiming to move beyond generic AI assistants and toward true internal intelligence, Unsloth offers a practical, cost-effective, and production-ready path forward.
Ready to build your own AI tech? Try the above tech concept, or contact me for tech advice!
#AskDushyant
Note: The names and information mentioned are based on my personal experience; however, they do not represent any formal statement. The examples and pseudo code are for illustration only. You must modify and experiment with the concept to meet your specific needs.
#TechConcept #TechAdvice #UnslothAI #LLMFineTuning #PrivateLLM #EnterpriseAI #OnPremAI #OpenSourceAI #LLaMA3 #LoRA #ConsumerGPU #SecureAI #DataPrivacyAI #AIModelTraining
