Enterprises increasingly want AI systems that understand their internal language, policies, and documents, without exposing sensitive data to public cloud models. Traditional approaches like keyword search or basic RAG systems often fall short when consistency, reasoning, and domain understanding matter.
The Unsloth framework changes this equation: it enables teams to fine-tune state-of-the-art open-source large language models directly on company data using consumer-grade GPUs, dramatically reducing cost, complexity, and infrastructure dependency. For over two decades, I’ve been igniting change and delivering scalable tech solutions that elevate organisations to new heights. My expertise transforms challenges into opportunities, inspiring businesses to thrive in the digital age.
This tech concept presents a production-ready, step-by-step workflow to fine-tune a private LLM using Unsloth, starting from raw PDF and Word documents and ending with a deployable, company-aware model. The approach is designed for startups, enterprises, and regulated industries that require data sovereignty, performance, and scalability—all without relying on expensive cloud GPUs.
Architecture Overview
End-to-End Pipeline
Documents → Text Extraction → Instruction Dataset → Unsloth Fine-Tuning → Private LLM Deployment

This approach creates a domain-trained model that understands company language and policies, rather than merely retrieving text like traditional search systems.
Step 1: Choose the Base Model
Selecting the Right Foundation Model
For enterprise documents, prioritize quality, context length, and GPU efficiency.
Recommended open-source models:
- Meta LLaMA 3 (8B)
- Mistral 7B Instruct
- Gemma 7B IT
We use LLaMA 3 8B, which balances strong reasoning with manageable VRAM requirements.
Step 2: Environment Setup
Hardware Requirements
- NVIDIA GPU with at least ~16 GB VRAM (24 GB recommended)
- 32 GB system RAM
- Ubuntu 22.04 or compatible Linux distribution
Dependency Installation
```bash
pip install unsloth
pip install torch transformers datasets accelerate peft
pip install pypdf python-docx
```

This setup enables document parsing, dataset creation, and Unsloth-optimized training.
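Before training, it is worth confirming that PyTorch can see the GPU and that it has enough memory. A minimal check, assuming a single CUDA device:

```python
import torch

# Verify that a CUDA-capable GPU is visible to PyTorch
assert torch.cuda.is_available(), "No CUDA GPU detected"

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")

# ~16 GB is a practical minimum for 4-bit LLaMA 3 8B fine-tuning with LoRA
if vram_gb < 16:
    print("Warning: less than 16 GB VRAM; reduce batch size or sequence length")
```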
Step 3: Extract Text from PDF and Word Documents
Organizing the Data Directory
data/
├── pdfs/
├── docs/
└── raw_text/
PDF Text Extraction
```python
from pypdf import PdfReader
import os

def extract_pdf_text(path):
    reader = PdfReader(path)
    # extract_text() can return None for image-only pages, so fall back to ""
    return "\n".join(page.extract_text() or "" for page in reader.pages)

for file in os.listdir("data/pdfs"):
    text = extract_pdf_text(f"data/pdfs/{file}")
    with open(f"data/raw_text/{file}.txt", "w") as f:
        f.write(text)
```

Word Document Extraction
```python
from docx import Document
import os

def extract_docx_text(path):
    doc = Document(path)
    return "\n".join(p.text for p in doc.paragraphs)

for file in os.listdir("data/docs"):
    text = extract_docx_text(f"data/docs/{file}")
    with open(f"data/raw_text/{file}.txt", "w") as f:
        f.write(text)
```

At this stage, all company documents are converted into raw text files.
Step 4: Clean and Chunk the Text
Preparing Text for LLM Training
Language models train more effectively on structured, consistent text blocks.
```python
import re

def clean_text(text):
    # Collapse runs of whitespace (including newlines) into single spaces
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

def chunk_text(text, chunk_size=800):
    # Split into blocks of ~800 words to stay under the 2048-token limit used later
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]
```

Chunking preserves context while avoiding token overflow during training.
Step 5: Convert Company Data into an Instruction Dataset
Why Instruction Formatting Matters
Instruction tuning teaches the model how to respond, not just what to memorize.
Standard JSONL Structure
```json
{
  "instruction": "Explain the company leave policy",
  "input": "",
  "output": "Employees are entitled to..."
}
```

Dataset Generation Script
```python
import json
import os

dataset = []
for file in os.listdir("data/raw_text"):
    with open(f"data/raw_text/{file}") as f:
        text = clean_text(f.read())
    chunks = chunk_text(text)
    for chunk in chunks:
        dataset.append({
            "instruction": "Summarize the following company document",
            "input": chunk,
            # NOTE: the chunk itself is used as a placeholder output here;
            # for stronger results, replace it with a genuine summary or a
            # reviewed question-answer pair derived from the chunk.
            "output": chunk
        })

with open("company_dataset.jsonl", "w") as f:
    for item in dataset:
        f.write(json.dumps(item) + "\n")
```

This step converts unstructured documents into supervised training data.
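A quick sanity check before training helps catch empty files or malformed records; this snippet simply counts the entries and previews one.

```python
import json

# Load the generated dataset back in and inspect it
with open("company_dataset.jsonl") as f:
    records = [json.loads(line) for line in f]

print(f"Total training examples: {len(records)}")
print(json.dumps(records[0], indent=2)[:500])  # preview the first record
```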
Step 6: Load the Model with Unsloth
Applying Unsloth Optimizations
```python
from unsloth import FastLanguageModel
import torch

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3-8B",
    max_seq_length=2048,
    dtype=torch.float16,
    load_in_4bit=True
)
```

Unsloth dramatically reduces VRAM usage while accelerating training speed.
Step 7: Apply LoRA for Memory-Efficient Fine-Tuning
Low-Rank Adaptation Configuration
```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none"
)
```

LoRA ensures fast convergence and keeps the base model frozen.
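To confirm that only the adapter weights are trainable while the 8B base weights stay frozen, PEFT-wrapped models expose a helper that prints the parameter breakdown:

```python
# Typically reports well under 1% of all parameters as trainable for r=16
model.print_trainable_parameters()
```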
Step 8: Prepare the Dataset for Training
Prompt Formatting for Instruction Tuning
```python
from datasets import load_dataset

dataset = load_dataset("json", data_files="company_dataset.jsonl")

def format_prompt(example):
    # Append the tokenizer's EOS token so the model learns where a response ends
    return {
        "text": f"""### Instruction:
{example['instruction']}
### Input:
{example['input']}
### Response:
{example['output']}""" + tokenizer.eos_token
    }

dataset = dataset.map(format_prompt)
```

This structure aligns the dataset with instruction-following LLM behavior.
Step 9: Train the Model Using Unsloth
Training Configuration
```python
from transformers import TrainingArguments
from trl import SFTTrainer

# trl's SFTTrainer tokenizes the "text" field for us; the plain HF Trainer
# would require manual tokenization and a data collator.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=1000,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        output_dir="./company_llm",
        save_steps=500,
        save_total_limit=2
    )
)
trainer.train()
```

Unsloth’s optimizations enable efficient fine-tuning even on a single GPU.
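To verify how much VRAM the run actually consumed, PyTorch's memory counters can be printed after training; this is a diagnostic sketch, not part of the training loop itself.

```python
import torch

# Peak GPU memory reserved by PyTorch during the training run
peak_gb = torch.cuda.max_memory_reserved() / 1024**3
print(f"Peak reserved GPU memory: {peak_gb:.1f} GB")
```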
Step 10: Save and Export the Fine-Tuned Model
Persisting the LoRA Adapter
```python
model.save_pretrained("company_llm_lora")
tokenizer.save_pretrained("company_llm_lora")
```

The output is a lightweight adapter that can be reused or merged later.
Step 11: Run Inference with the Private LLM
Testing Company-Specific Queries
```python
FastLanguageModel.for_inference(model)

# For best results, match the Instruction/Input/Response template used in training
prompt = """### Instruction:
What is the company leave policy?
### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The model now responds using internal company knowledge and tone.
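For repeated testing, the prompt construction and decoding can be wrapped in a small helper; `ask` here is a hypothetical convenience function, not part of Unsloth.

```python
def ask(question, max_new_tokens=200):
    # Hypothetical helper: rebuilds the Instruction/Response template used in training
    prompt = f"### Instruction:\n{question}\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(ask("Summarize the data retention policy"))
```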
Step 12: Deployment Options
Secure and Scalable Deployment Paths
- vLLM for high-throughput inference
- Text Generation Inference for production APIs
- llama.cpp after merging LoRA adapters
All of these options run fully on-prem, keeping model weights and company data inside your own infrastructure.
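Engines such as llama.cpp expect a single merged checkpoint rather than a separate LoRA adapter. Unsloth ships export helpers for this; the exact method names and quantization options below depend on your Unsloth version, so treat this as a sketch rather than a definitive recipe.

```python
# Merge the LoRA adapter into 16-bit base weights for vLLM / TGI
# (method name and save_method values depend on the installed Unsloth version)
model.save_pretrained_merged("company_llm_merged", tokenizer, save_method="merged_16bit")

# Or export a quantized GGUF file for llama.cpp
model.save_pretrained_gguf("company_llm_gguf", tokenizer, quantization_method="q4_k_m")
```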
Best Practices: Fine-Tuning vs RAG
Choosing the Right Strategy
| Use Case | Recommended Approach |
|---|---|
| Policies, tone, internal language | Fine-tuning with Unsloth |
| Frequently changing information | Retrieval-Augmented Generation (RAG) |
| Highest accuracy | Fine-tuning combined with RAG |
My Tech Advice: Unsloth fundamentally lowers the barrier to building private, high-quality large language models trained on proprietary company data. By combining efficient fine-tuning techniques like LoRA with aggressive memory and speed optimizations, teams can now train powerful LLMs on internal PDFs and Word documents using a single on-prem GPU.
For organizations aiming to move beyond generic AI assistants and toward true internal intelligence, Unsloth offers a practical, cost-effective, and production-ready path forward.
Ready to build your own AI tech? Try the above tech concept, or contact me for tech advice!
#AskDushyant
Note: The names and information mentioned are based on my personal experience; however, they do not represent any formal statement. The examples and pseudo code are for illustration only. You must modify and experiment with the concept to meet your specific needs.
#TechConcept #TechAdvice #UnslothAI #LLMFineTuning #PrivateLLM #EnterpriseAI #OnPremAI #OpenSourceAI #LLaMA3 #LoRA #ConsumerGPU #SecureAI #DataPrivacyAI #AIModelTraining
