
Fine-Tuning a Language Model on Company Policy Documents: A Step-by-Step Guide

In today’s fast-paced corporate environment, employees often have questions about company policies—from attendance rules to leave entitlements and codes of conduct. While traditional intranets and HR portals provide static information, generative AI offers a more interactive way to access policy information.

For over 20 years, I’ve been building the future of tech, from writing millions of lines of code to leading transformative initiatives that fuel remarkable business growth. As an advisor, I help businesses harness AI to make a real-world impact. In this tech concept, I’ll walk through how to fine-tune a language model to answer questions about your company policies, using Microsoft Word documents as the source material.

Why Fine-Tune a Policy Assistant?

Before we dive into the implementation, let’s consider the benefits:

  • Instant, accurate responses to employee policy questions
  • Consistent interpretation of policy language across the organization
  • Reduced HR workload for basic policy inquiries
  • 24/7 availability for distributed teams across time zones

Technical Approach Overview

We’ll be using:

  • Python-docx for document processing
  • Microsoft’s compact but powerful Phi-2 model (2.7B parameters)
  • Parameter-Efficient Fine-Tuning (PEFT) with LoRA adapters
  • Hugging Face’s Transformers and Datasets libraries

Step 1: Policy Document Structure

Our sample company_policies.docx contains three main sections with subsections:

1. Attendance Policy
   - Work Hours: 9 AM to 6 PM
   - Late Arrival: 3 late marks = 1 warning
   - Remote Work: Max 2 days/week with manager approval

2. Leave Policy
   - Annual Leave: 15 days/year
   - Sick Leave: 10 days/year (medical certificate required after 3 days)

3. Code of Conduct
   - Dress Code: Business casual
   - Social Media: No confidential information sharing

This hierarchical structure is common in policy documents and provides natural segmentation for our training data.
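If you don’t have a real policy document handy, the following sketch generates a stand-in company_policies.docx with python-docx; the file name and section text simply mirror the structure above. Note the leading title line: the section-splitting regex in Step 3 expects each numbered heading to start on a new line.

from docx import Document

# Build a minimal stand-in policy document mirroring the structure above (for testing only)
doc = Document()
doc.add_heading("Company Policies", level=1)  # Title line so "1. ..." starts on a new line
doc.add_paragraph("1. Attendance Policy")
doc.add_paragraph("Work Hours: 9 AM to 6 PM")
doc.add_paragraph("Late Arrival: 3 late marks = 1 warning")
doc.add_paragraph("Remote Work: Max 2 days/week with manager approval")
doc.add_paragraph("2. Leave Policy")
doc.add_paragraph("Annual Leave: 15 days/year")
doc.add_paragraph("Sick Leave: 10 days/year (medical certificate required after 3 days)")
doc.add_paragraph("3. Code of Conduct")
doc.add_paragraph("Dress Code: Business casual")
doc.add_paragraph("Social Media: No confidential information sharing")
doc.save("company_policies.docx")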

Step 2: Environment Setup

pip install python-docx transformers datasets torch sentencepiece peft accelerate

Step 3: Document Processing Pipeline

The first challenge is extracting and structuring the policy content:

from docx import Document
import re
from transformers import (
    AutoTokenizer, AutoModelForCausalLM, TrainingArguments,
    Trainer, DataCollatorForLanguageModeling
)
from datasets import Dataset
import torch

def extract_policy_text(doc_path):
    """Extracts all text from a Word document"""
    doc = Document(doc_path)
    return '\n'.join(para.text for para in doc.paragraphs if para.text.strip())

def structure_policy_data(text):
    """Splits document into structured sections"""
    sections = re.split(r'\n\d+\. ', text)[1:]  # Split on numbered headings; assumes a title line precedes "1. ..."
    return {
        section[:section.find('\n')].strip(): section[section.find('\n'):].strip()
        for section in sections
    }

def create_training_examples(structured_data):
    """Generates training examples in instruction-response format"""
    examples = []
    for title, content in structured_data.items():
        # Policy explanation examples
        examples.append({"text": f"Explain our company {title}: {content}"})

        # Q&A examples
        for point in [p.strip() for p in content.split('\n') if p.strip()]:
            examples.append({
                "text": f"What is our policy regarding {title.lower()}? {point}"
            })
    return examples

This pipeline:

  1. Extracts raw text from the Word document
  2. Structures it by policy sections
  3. Creates multiple training examples per policy item in both explanatory and Q&A formats
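Running these functions end to end on the sample document is a quick sanity check before any training. Assuming the file name used above, a snippet like this prints the detected sections and a few generated examples:

# Sanity-check the processing pipeline on the sample document
raw_text = extract_policy_text("company_policies.docx")
structured = structure_policy_data(raw_text)
examples = create_training_examples(structured)

print(list(structured.keys()))   # e.g. ['Attendance Policy', 'Leave Policy', 'Code of Conduct']
print(len(examples), "training examples")
for example in examples[:3]:
    print(example["text"])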

Step 4: Dataset Preparation

We convert our examples into a Hugging Face Dataset and tokenize them appropriately:

# Convert to dataset
dataset = Dataset.from_list(create_training_examples(
    structure_policy_data(
        extract_policy_text("company_policies.docx")
    )
))

# Tokenization setup
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Tokenize the entire dataset
tokenized_dataset = dataset.map(
    lambda examples: tokenizer(examples["text"], truncation=True, max_length=512),
    batched=True
)
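A quick look at one tokenized record confirms that the text survives the tokenize/decode round trip and stays under the 512-token limit:

# Inspect one tokenized record
sample = tokenized_dataset[0]
print(len(sample["input_ids"]), "tokens")
print(tokenizer.decode(sample["input_ids"])[:120])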

Step 5: Efficient Fine-Tuning with LoRA

To maximize performance while minimizing compute requirements, we use Parameter-Efficient Fine-Tuning (PEFT) with LoRA:

from peft import LoraConfig, get_peft_model

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    trust_remote_code=True,
    torch_dtype=torch.float16
)

# Configure LoRA
peft_config = LoraConfig(
    r=8,  # Rank
    lora_alpha=16,
    target_modules=["Wqkv", "out_proj"],  # Attention projections in Phi-2's custom code; newer Transformers-native Phi checkpoints use q_proj/k_proj/v_proj/dense
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA to model
model = get_peft_model(model, peft_config)

# Training configuration
training_args = TrainingArguments(
    output_dir="./policy_model",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3,
    logging_steps=10,
    save_strategy="epoch"
)

# Causal-LM data collator: pads each batch and copies input_ids into labels so the Trainer can compute loss
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

# Start training
trainer.train()
trainer.save_model("fine_tuned_policy_model")

This configuration trains only about 0.1% of the model’s parameters while achieving strong performance on our policy QA task.
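You can verify the exact fraction for your configuration with PEFT’s built-in helper:

# Report trainable vs. total parameters for the LoRA-wrapped model
model.print_trainable_parameters()
# Typically prints something like: trainable params: ... || all params: ... || trainable%: ~0.1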

Step 6: Querying the Policy Assistant

After training, we can create a simple pipeline for policy questions:

from transformers import pipeline

policy_qa = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device="cuda" if torch.cuda.is_available() else "cpu"
)

def ask_policy_question(question):
    response = policy_qa(
        f"Question: {question}\nAnswer:",
        max_length=200,
        do_sample=True,
        temperature=0.7
    )
    return response[0]['generated_text']

Example usage:

print(ask_policy_question("What is the remote work policy?"))

Expected output (the generated text will vary):

Question: What is the remote work policy?
Answer: According to our Attendance Policy, employees may work remotely 
for a maximum of 2 days per week with prior manager approval. 
This is subject to team requirements and individual performance.
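In a later session the trained model won’t be in memory. Assuming the adapter was saved to fine_tuned_policy_model as above, a sketch like this reloads the base model, attaches the adapter, and can then be dropped into the same pipeline:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Reload the base model and attach the saved LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", trust_remote_code=True, torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base_model, "fine_tuned_policy_model")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# Optionally fold the adapter weights into the base model for faster inference
model = model.merge_and_unload()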

Advanced Considerations

For production deployment, you might want to:

  1. Add policy citation references in responses (a rough sketch follows this list)
  2. Implement confidence thresholds for uncertain answers
  3. Create a feedback loop to improve responses over time
  4. Add multi-turn conversation capabilities
  5. Implement access controls for sensitive policies
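As a rough, hypothetical sketch of the first item: ask_with_citation below reuses the dictionary returned by structure_policy_data and appends the section whose text overlaps most with the question. This is deliberately naive keyword matching; a production system would use proper retrieval over the source documents.

def ask_with_citation(question, structured_data):
    """Hypothetical helper: answer a question and cite the best-matching policy section."""
    answer = ask_policy_question(question)
    question_words = set(question.lower().split())
    # Pick the section whose title + text shares the most words with the question
    best_title = max(
        structured_data,
        key=lambda title: len(
            question_words & set((title + " " + structured_data[title]).lower().split())
        ),
    )
    return f"{answer}\n(Source: {best_title})"

structured = structure_policy_data(extract_policy_text("company_policies.docx"))
print(ask_with_citation("What is the remote work policy?", structured))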

My Tech Advice: This approach provides a practical way to transform static policy documents into an interactive assistant. The key advantages include:

  • Accuracy: Model responses stay grounded in the actual policy text
  • Efficiency: LoRA makes fine-tuning affordable
  • Maintainability: Easy to update when policies change
  • Scalability: Can handle hundreds of simultaneous queries

The same methodology can be extended to other document types like employee handbooks, compliance guidelines, or technical documentation.

#AskDushyant
Note: The names and information mentioned are based on my personal experience and publicly available data; however, they do not represent any formal statement. The examples and pseudo code are for illustration only; you must adapt and experiment with the concept to meet your specific needs.
#TechConcept #TechAdvice #AIFineTuning #HRTech  #PolicyAutomation #NaturalLanguageProcessing #LoRA #CorporateAI
