
Mastering Hugging Face Model Caching on Windows: Locations, Configuration, and Optimization

If you’re working with Hugging Face’s transformers and peft libraries on Windows, you’ve likely seen messages or warnings related to model caching, symlinks, and environment variables. This guide demystifies how Hugging Face handles model storage, how to change the cache locations, and how to resolve common issues — especially on Windows.

What Is Model Caching in Hugging Face?

When you use Hugging Face libraries like transformers, models and tokenizers are downloaded and cached locally so they don’t need to be re-downloaded for future use. This caching system improves speed and efficiency, particularly when experimenting or using multiple scripts or notebooks.

By default, Hugging Face caches models in the following directories (a short snippet after the locations shows how to print the paths your installation actually resolves):

Hugging Face Hub Cache

This cache stores downloaded models and datasets from the Hugging Face Hub:

  • Windows: C:\Users\<YourUsername>\.cache\huggingface\hub\ (or whichever directory you have configured, e.g. H:\HuggingFace\Cache\hub\)
  • Linux/macOS: ~/.cache/huggingface/hub/

Transformers-Specific Cache

This stores transformers-specific files like:

  • pytorch_model.bin
  • tokenizer.json
  • config.json

Location:

  • Windows: C:\Users\<YourUsername>\.cache\huggingface\transformers
  • Linux/macOS: ~/.cache/huggingface/transformers/
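To confirm where your installation will actually look, you can print the resolved paths programmatically. The sketch below assumes reasonably recent versions of huggingface_hub and transformers; the constant names have been renamed or relocated in some releases, so treat it as a starting point rather than a stable API.

# Print the cache directories the installed libraries will actually use.
# Note: these constants have moved between library versions.
from huggingface_hub.constants import HUGGINGFACE_HUB_CACHE
from transformers.utils import TRANSFORMERS_CACHE

print("Hub cache:         ", HUGGINGFACE_HUB_CACHE)
print("Transformers cache:", TRANSFORMERS_CACHE)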

Symlinks Warning on Windows

You may encounter a warning like this when using Hugging Face:

huggingface_hub\file_download.py: UserWarning:
`huggingface_hub` cache-system uses symlinks by default...
Caching will still work but in a degraded version...

What It Means:

Hugging Face uses symlinks (shortcuts) to reduce duplicated files and save space. However, Windows often restricts symlink creation unless one of the following is true (a quick self-test follows the list):

  • Developer Mode is enabled, or
  • Python is run as an Administrator
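If you are unsure whether your environment can create symlinks, a quick self-test like the sketch below (plain standard-library Python, nothing Hugging Face-specific) tells you before the library does: it simply tries to create a symlink in a temporary directory and reports the result.

import os
import tempfile

# Quick check: can this Python process create symlinks?
# On Windows this usually fails unless Developer Mode is enabled
# or the process runs with administrator privileges.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    link = os.path.join(tmp, "link.txt")
    open(target, "w").close()
    try:
        os.symlink(target, link)
        print("Symlinks supported - caching will work at full efficiency.")
    except OSError:
        print("Symlinks not supported - caching still works, but files may be duplicated.")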

How to Enable Symlinks on Windows

Option 1: Enable Developer Mode

  1. Press Win + R, type: ms-settings:developers
  2. Enable Developer Mode

If “For Developers” is missing from Settings, Developer Mode can also be enabled through the Windows Registry (advanced users only).

Option 2: Run Python as Administrator

  • Right-click your terminal or IDE (e.g., VS Code) → Run as Administrator
  • This allows symlink creation without Developer Mode.

Option 3: Suppress the Warning

You can safely ignore the warning, but to suppress it:

import os

# Set this before importing huggingface_hub / transformers;
# the variable is read when the library is imported.
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"

Or set it globally via environment variables:

  • Variable name: HF_HUB_DISABLE_SYMLINKS_WARNING
  • Value: 1

Changing Cache Directory Locations

1. Change Hugging Face Hub Cache (HF_HOME)

Set the HF_HOME environment variable (an end-to-end sketch follows these options):

In Python (temporary):
import os

# Must run before importing transformers / huggingface_hub,
# because the cache location is resolved when they are imported.
os.environ["HF_HOME"] = "D:/MyCustomHFCache"
System-wide (Windows):
  • Open Environment Variables
  • Add:
    • Name: HF_HOME
    • Value: D:\MyCustomHFCache
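Here is a minimal end-to-end sketch of the temporary approach. It assumes you have network access, and it uses hf_hub_download from huggingface_hub purely to confirm that files land in the new location; the repo and filename are just examples.

import os

# Redirect the whole Hugging Face cache before any HF library is imported.
os.environ["HF_HOME"] = "D:/MyCustomHFCache"

from huggingface_hub import hf_hub_download

# Download a small file and print where it was cached;
# the printed path should now start with D:/MyCustomHFCache.
path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(path)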

2. Change Transformers Cache (TRANSFORMERS_CACHE)

In Python (again, set it before importing transformers; note that recent transformers releases are moving towards HF_HOME and away from TRANSFORMERS_CACHE):
import os
os.environ["TRANSFORMERS_CACHE"] = "D:/MyTransformersCache"
System-wide:
  • Add a new Environment Variable:
    • Name: TRANSFORMERS_CACHE
    • Value: D:\MyTransformersCache

Managing Cache Storage

You can inspect or clean up your cache folder manually (or programmatically, as shown after this list):

  • Check the size:
    • Navigate to .cache/huggingface and right-click → Properties
  • Delete old models:
    • You can safely delete folders like models--..., snapshots, or specific model subfolders
    • Hugging Face will re-download them when needed
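If you prefer to do this programmatically, huggingface_hub ships cache-inspection utilities. The sketch below uses scan_cache_dir to list cached repositories and their sizes; deleting revisions works through the same API, with the revision hash shown as a hypothetical placeholder.

from huggingface_hub import scan_cache_dir

# Scan the hub cache and report what is taking up space.
cache_info = scan_cache_dir()
print(f"Total cache size: {cache_info.size_on_disk / 1e9:.2f} GB")

for repo in cache_info.repos:
    print(f"{repo.repo_id}: {repo.size_on_disk / 1e6:.1f} MB")

# To free space, delete specific cached revisions (hash is a placeholder):
# strategy = cache_info.delete_revisions("<revision-hash>")
# strategy.execute()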

Example: Loading a Fine-Tuned PEFT Model with LoRA

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel, PeftConfig

# Load the PEFT config from a locally saved LoRA adapter directory
peft_model_path = "fine_tuned_policy_model"
config = PeftConfig.from_pretrained(peft_model_path)

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Apply LoRA adapter
model = PeftModel.from_pretrained(base_model, peft_model_path)
model.eval()

# Inference
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("Once upon a time,", max_new_tokens=50)[0]["generated_text"])

Summary

Task                      | Solution
Default cache path        | ~/.cache/huggingface/ (Windows: C:\Users\<You>\.cache\...)
Change cache location     | Use HF_HOME or TRANSFORMERS_CACHE env vars
Enable symlinks           | Turn on Developer Mode or run Python as Admin
Suppress symlink warnings | Set HF_HUB_DISABLE_SYMLINKS_WARNING=1
Clean up old models       | Manually delete from cache folder

My Tech Advice: Managing Hugging Face’s cache and understanding how to optimize it — especially on Windows — is key to a smoother ML workflow. With a few tweaks, you can avoid annoying warnings, save disk space, and keep things running efficiently.

Ready to optimise your Hugging Face setup? Try the above tech concept, or contact me for tech advice!

#AskDushyant
Note: The names and information mentioned are based on my personal experience and publicly available data; however, they do not represent any formal statement.
#TechConcept #TechAdvice #HuggingFace #AI #Configuration
