
Mastering Hugging Face Model Caching on Windows: Locations, Configuration, and Optimization

If you’re working with Hugging Face’s transformers and peft libraries on Windows, you’ve likely seen messages or warnings related to model caching, symlinks, and environment variables. This guide demystifies how Hugging Face handles model storage, how to change the cache locations, and how to resolve common issues — especially on Windows.

What Is Model Caching in Hugging Face?

When you use Hugging Face libraries like transformers, models and tokenizers are downloaded and cached locally so they don’t need to be re-downloaded for future use. This caching system improves speed and efficiency, particularly when experimenting or using multiple scripts or notebooks.

By default, Hugging Face caches models in the following directories (a short snippet after the locations shows how to print the paths your installation actually resolves):

Hugging Face Hub Cache

This cache stores downloaded models and datasets from the Hugging Face Hub:

  • Windows: C:\Users\<YourUsername>\.cache\huggingface\hub\ (or whichever directory you have configured, e.g. H:\HuggingFace\Cache\hub\)
  • Linux/macOS: ~/.cache/huggingface/hub/

Transformers-Specific Cache

This stores transformers-specific files like:

  • pytorch_model.bin
  • tokenizer.json
  • config.json

Location:

  • Windows: C:\Users\<YourUsername>\.cache\huggingface\transformers
  • Linux/macOS: ~/.cache/huggingface/transformers/
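To confirm where your installation will actually look, you can print the resolved paths programmatically. The sketch below assumes reasonably recent versions of huggingface_hub and transformers; the constant names have been renamed or relocated in some releases, so treat it as a starting point rather than a stable API.

# Print the cache directories the installed libraries will actually use.
# Note: these constants have moved between library versions.
from huggingface_hub.constants import HUGGINGFACE_HUB_CACHE
from transformers.utils import TRANSFORMERS_CACHE

print("Hub cache:         ", HUGGINGFACE_HUB_CACHE)
print("Transformers cache:", TRANSFORMERS_CACHE)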

Symlinks Warning on Windows

You may encounter a warning like this when using Hugging Face:

huggingface_hub\file_download.py: UserWarning:
`huggingface_hub` cache-system uses symlinks by default...
Caching will still work but in a degraded version...

What It Means:

Hugging Face uses symlinks (shortcuts) to reduce duplicated files and save space. However, Windows often restricts symlink creation unless one of the following is true (a quick self-test follows the list):

  • Developer Mode is enabled, or
  • Python is run as an Administrator
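If you are unsure whether your environment can create symlinks, a quick self-test like the sketch below (plain standard-library Python, nothing Hugging Face-specific) tells you before the library does: it simply tries to create a symlink in a temporary directory and reports the result.

import os
import tempfile

# Quick check: can this Python process create symlinks?
# On Windows this usually fails unless Developer Mode is enabled
# or the process runs with administrator privileges.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "target.txt")
    link = os.path.join(tmp, "link.txt")
    open(target, "w").close()
    try:
        os.symlink(target, link)
        print("Symlinks supported - caching will work at full efficiency.")
    except OSError:
        print("Symlinks not supported - caching still works, but files may be duplicated.")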

How to Enable Symlinks on Windows

Option 1: Enable Developer Mode

  1. Press Win + R, type: ms-settings:developers
  2. Enable Developer Mode

If “For Developers” is missing from Settings, Developer Mode can also be enabled through the Windows Registry (advanced users only).

Option 2: Run Python as Administrator

  • Right-click your terminal or IDE (e.g., VS Code) → Run as Administrator
  • This allows symlink creation without Developer Mode.

Option 3: Suppress the Warning

You can safely ignore the warning, but to suppress it:

import os

# Set this before importing huggingface_hub / transformers;
# the variable is read when the library is imported.
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"

Or set it globally via environment variables:

  • Variable name: HF_HUB_DISABLE_SYMLINKS_WARNING
  • Value: 1

Changing Cache Directory Locations

1. Change Hugging Face Hub Cache (HF_HOME)

Set the HF_HOME environment variable (an end-to-end sketch follows these options):

In Python (temporary):
import os

# Must run before importing transformers / huggingface_hub,
# because the cache location is resolved when they are imported.
os.environ["HF_HOME"] = "D:/MyCustomHFCache"
System-wide (Windows):
  • Open Environment Variables
  • Add:
    • Name: HF_HOME
    • Value: D:\MyCustomHFCache
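Here is a minimal end-to-end sketch of the temporary approach. It assumes you have network access, and it uses hf_hub_download from huggingface_hub purely to confirm that files land in the new location; the repo and filename are just examples.

import os

# Redirect the whole Hugging Face cache before any HF library is imported.
os.environ["HF_HOME"] = "D:/MyCustomHFCache"

from huggingface_hub import hf_hub_download

# Download a small file and print where it was cached;
# the printed path should now start with D:/MyCustomHFCache.
path = hf_hub_download(repo_id="gpt2", filename="config.json")
print(path)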

2. Change Transformers Cache (TRANSFORMERS_CACHE)

In Python (again, set it before importing transformers; note that recent transformers releases are moving towards HF_HOME and away from TRANSFORMERS_CACHE):
import os
os.environ["TRANSFORMERS_CACHE"] = "D:/MyTransformersCache"
System-wide:
  • Add a new Environment Variable:
    • Name: TRANSFORMERS_CACHE
    • Value: D:\MyTransformersCache

Managing Cache Storage

You can inspect or clean up your cache folder manually (or programmatically, as shown after this list):

  • Check the size:
    • Navigate to .cache/huggingface and right-click → Properties
  • Delete old models:
    • You can safely delete folders like models--..., snapshots, or specific model subfolders
    • Hugging Face will re-download them when needed
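If you prefer to do this programmatically, huggingface_hub ships cache-inspection utilities. The sketch below uses scan_cache_dir to list cached repositories and their sizes; deleting revisions works through the same API, with the revision hash shown as a hypothetical placeholder.

from huggingface_hub import scan_cache_dir

# Scan the hub cache and report what is taking up space.
cache_info = scan_cache_dir()
print(f"Total cache size: {cache_info.size_on_disk / 1e9:.2f} GB")

for repo in cache_info.repos:
    print(f"{repo.repo_id}: {repo.size_on_disk / 1e6:.1f} MB")

# To free space, delete specific cached revisions (hash is a placeholder):
# strategy = cache_info.delete_revisions("<revision-hash>")
# strategy.execute()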

Example: Loading a Fine-Tuned PEFT Model with LoRA

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel, PeftConfig

# Load the PEFT config from a locally saved LoRA adapter directory
peft_model_path = "fine_tuned_policy_model"
config = PeftConfig.from_pretrained(peft_model_path)

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Apply LoRA adapter
model = PeftModel.from_pretrained(base_model, peft_model_path)
model.eval()

# Inference
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("Once upon a time,", max_new_tokens=50)[0]["generated_text"])

Summary

Task                      | Solution
Default cache path        | ~/.cache/huggingface/ (Windows: C:\Users\<You>\.cache\...)
Change cache location     | Use HF_HOME or TRANSFORMERS_CACHE env vars
Enable symlinks           | Turn on Developer Mode or run Python as Admin
Suppress symlink warnings | Set HF_HUB_DISABLE_SYMLINKS_WARNING=1
Clean up old models       | Manually delete from cache folder

My Tech Advice: Managing Hugging Face’s cache and understanding how to optimize it — especially on Windows — is key to a smoother ML workflow. With a few tweaks, you can avoid annoying warnings, save disk space, and keep things running efficiently.

Ready to optimise your Hugging Face setup? Try the above tech concept, or contact me for tech advice!

#AskDushyant
Note: The names and information mentioned are based on my personal experience and publicly available data; however, they do not represent any formal statement.
#TechConcept #TechAdvice #HuggingFace #AI #Configuration
