Storing PDF, Word, or document files in a database is a common requirement in modern web applications, especially in document management systems, HR software, legal tech, or fintech platforms. While cloud storage is a popular approach, storing files directly in the database offers several benefits in terms of consistency, auditing, and control.
I’ve spent 20+ years empowering businesses to achieve extraordinary results through strategic technology adoption and transformative leadership. My experience, from writing millions of lines of code to leading major initiatives, is dedicated to helping them realise their full potential.
In this tech concept, you’ll learn the core concepts and techniques for storing document files directly inside a database using MySQL, PostgreSQL, and MongoDB. We’ll cover use cases, data types, and performance considerations, along with conceptual examples to help guide your implementation.
What Does It Mean to Store a File in a Database?
Databases store structured data—like strings, numbers, and dates—but they also support storing binary data through specific data types. Files such as PDFs or DOCX documents are first read as byte streams and then stored in fields that support binary content.
You can store documents in one of two ways:
- Store the file’s binary content in the database using a binary or BLOB (Binary Large Object) field.
- Store only the file path or reference, while keeping the actual file in cloud or local storage.
This post focuses on the first technique: storing file content in the database itself.
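As a minimal sketch of that first technique, the Python code below reads a file as raw bytes and stores it alongside its metadata in one transaction. It uses the standard library's sqlite3 module purely as a stand-in so the example runs anywhere. The helper names (create_schema, store_file, load_document) are hypothetical. With a MySQL or PostgreSQL driver the pattern is the same, although drivers such as PyMySQL and psycopg2 use %s placeholders instead of ?.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def create_schema(conn: sqlite3.Connection) -> None:
    # BLOB plays the role of MySQL's LONGBLOB or PostgreSQL's BYTEA in this stand-in
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            mimetype TEXT NOT NULL,
            data BLOB NOT NULL,
            uploaded_at TEXT NOT NULL
        )
        """
    )


def store_file(conn: sqlite3.Connection, path: str, mimetype: str) -> int:
    content = Path(path).read_bytes()  # the file is read as a raw byte string
    # "with conn" commits on success and rolls back on error, so the bytes
    # and their metadata are written together or not at all
    with conn:
        cur = conn.execute(
            # Placeholders bind the bytes as a parameter; never format them into SQL
            "INSERT INTO documents (name, mimetype, data, uploaded_at) "
            "VALUES (?, ?, ?, ?)",
            (Path(path).name, mimetype, content,
             datetime.now(timezone.utc).isoformat()),
        )
    return cur.lastrowid


def load_document(conn: sqlite3.Connection, doc_id: int):
    # Returns (name, mimetype, data) with data as bytes, or None if not found
    return conn.execute(
        "SELECT name, mimetype, data FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()
```

To serve a file back, an application would typically send `data` as the response body with the stored `mimetype` as its Content-Type.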
Benefits and Trade-Offs
Advantages:
- Data Integrity: A file and its metadata can be written in the same transaction, so they are stored atomically.
- Centralized Backup: All files and data exist in the same backup system.
- Access Control: Security policies can be applied through database permissions.
- Auditability: Logging and monitoring can track access and changes.
Disadvantages:
- Performance Overhead: Databases are not optimized for streaming large files.
- Increased Storage Size: File content inflates the database, so backups and replication take longer.
- Scalability Limitations: For very large or frequent file uploads, performance may degrade.
How Different Databases Handle File Storage
Let’s look at how major databases support storing files, including the best practices and data types to use.
MySQL: Storing Files Using BLOB or LONGBLOB
MySQL supports storing binary data using the BLOB family of types. Choose the appropriate type based on the maximum expected file size.
- TINYBLOB: Up to 255 bytes
- BLOB: Up to 64 KB (65,535 bytes)
- MEDIUMBLOB: Up to 16 MB
- LONGBLOB: Up to 4 GB
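Each limit is a power of two minus one (255; 65,535; 16,777,215; and 4,294,967,295 bytes). Assuming you know the largest file your application will accept, a small helper like the hypothetical smallest_blob_type below can pick the smallest type that fits.

```python
# Maximum byte length of each MySQL BLOB type, smallest first
MYSQL_BLOB_LIMITS = [
    ("TINYBLOB", 2**8 - 1),     # 255 bytes
    ("BLOB", 2**16 - 1),        # 65,535 bytes
    ("MEDIUMBLOB", 2**24 - 1),  # 16,777,215 bytes
    ("LONGBLOB", 2**32 - 1),    # 4,294,967,295 bytes
]


def smallest_blob_type(max_file_size: int) -> str:
    """Return the smallest MySQL BLOB type that can hold max_file_size bytes."""
    if max_file_size < 0:
        raise ValueError("file size cannot be negative")
    for type_name, limit in MYSQL_BLOB_LIMITS:
        if max_file_size <= limit:
            return type_name
    raise ValueError(f"{max_file_size} bytes exceeds the LONGBLOB limit")


# A 10 MB upload cap (10 * 1024 * 1024 bytes) fits in MEDIUMBLOB
print(smallest_blob_type(10 * 1024 * 1024))  # MEDIUMBLOB
```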
Recommended Schema:
```sql
CREATE TABLE documents (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    mimetype VARCHAR(100),
    data LONGBLOB,
    uploaded_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
Best Practices:
- Use prepared statements or ORM libraries to insert binary data.
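- Raise `max_allowed_packet` on both the server and the client above your largest expected file, because each INSERT must fit in a single packet. Its upper bound is 1 GB, which in practice caps single-statement uploads below LONGBLOB’s 4 GB limit.
- Do not index BLOB columns. Index only relevant metadata (e.g., `name` or `uploaded_at`).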
PostgreSQL: Using BYTEA or Large Object Storage
PostgreSQL provides two main ways to store binary files:
Option 1: BYTEA
BYTEA stores binary data directly in the table, similar to MySQL’s BLOB.
Example Table:
```sql
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    name TEXT,
    mimetype TEXT,
    data BYTEA,
    uploaded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
Use `encode()` and `decode()` to convert between BYTEA and text encodings such as hex or base64 when needed (for example, `encode(data, 'base64')`).
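With PostgreSQL’s default `bytea_output` setting of `hex`, a BYTEA value is displayed as `\x` followed by two hex digits per byte, and `encode(data, 'hex')` returns the same digits without the prefix. The Python sketch below mirrors that format, which can help when comparing values seen in psql or logs with the bytes your application holds. The helper names are hypothetical; in normal use your driver converts BYTEA to and from bytes-like values for you.

```python
def to_bytea_hex(data: bytes) -> str:
    # Same text PostgreSQL shows for a BYTEA value when bytea_output = 'hex'
    return "\\x" + data.hex()


def from_bytea_hex(text: str) -> bytes:
    # Inverse of to_bytea_hex: strip the \x prefix and parse the hex digits
    if not text.startswith("\\x"):
        raise ValueError("expected a hex-format bytea literal starting with \\x")
    return bytes.fromhex(text[2:])


# In SQL, encode(data, 'hex') returns these digits without the \x prefix
print(to_bytea_hex(b"%PDF"))  # \x25504446
```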
Option 2: Large Object (LO) Storage
This method stores file content in a separate system catalog (`pg_largeobject`) and references it from your table via an `OID`.
Example:
```sql
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    name TEXT,
    mimetype TEXT,
    file_oid OID
);
```
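Use `lo_import` and `lo_export`, or your client driver’s large-object API, to insert and retrieve data. The server-side SQL functions read and write files on the database server’s filesystem, so application code often relies on the client-side interface instead.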
Best Practices:
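- Prefer BYTEA for small to medium files (<1 MB).
- Use Large Objects for files that exceed a few megabytes.
- Regularly clean up orphaned large objects when records are deleted, since deleting a row does not delete the object it references (the `vacuumlo` utility or the `lo` extension’s `lo_manage` trigger can help).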
MongoDB: Using BinData or GridFS
MongoDB, a NoSQL document database, offers two approaches to handle binary data.
Option 1: BinData (Binary Field)
If the whole document (file bytes plus metadata) stays under MongoDB’s 16 MB BSON document limit, you can store the file directly as a binary field.
Example Document:
```javascript
{
  "_id": ObjectId("..."),
  "name": "report.pdf",
  "mimetype": "application/pdf",
  "data": BinData(0, "<base64_encoded_bytes>"),
  "uploaded_at": ISODate("2025-06-05T12:00:00Z")
}
```
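The base64 string inside `BinData(0, ...)` is only a display encoding; MongoDB stores the raw bytes. As a standard-library sketch, the code below builds the same document in MongoDB Extended JSON (a format mongoimport accepts), where `$binary` with `subType` `"00"` corresponds to `BinData(0, ...)`. The helper name and the simplified size check are assumptions: the real limit applies to the whole encoded BSON document, so metadata counts too. With a driver such as PyMongo you would simply pass a Python `bytes` value, which is stored as binary subtype 0.

```python
import base64
import json

# MongoDB's maximum BSON document size: 16 MB (16 * 1024 * 1024 bytes)
BSON_MAX_DOCUMENT_SIZE = 16 * 1024 * 1024


def to_extended_json(name: str, mimetype: str, content: bytes) -> str:
    # Simplified check on the raw bytes only; name, mimetype and BSON
    # field overhead also count toward the limit, so leave some headroom
    if len(content) >= BSON_MAX_DOCUMENT_SIZE:
        raise ValueError("file too large for a single document; use GridFS")
    doc = {
        "name": name,
        "mimetype": mimetype,
        # Extended JSON form of BinData(0, "<base64>"); subtype 0 = generic binary
        "data": {
            "$binary": {
                "base64": base64.b64encode(content).decode("ascii"),
                "subType": "00",
            }
        },
    }
    return json.dumps(doc)
```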
Option 2: GridFS for Large Files
GridFS is MongoDB’s specification for storing files that exceed the 16 MB document limit. It splits files into chunks and stores them in two collections:
- `fs.files`: stores metadata
- `fs.chunks`: stores the actual binary data in chunks of 255 KB by default
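MongoDB drivers perform this split for you, but a small sketch makes the layout concrete. The standard-library Python code below mimics how a file is divided into `fs.chunks` documents (`files_id`, `n`, `data`) plus one `fs.files` document (`length`, `chunkSize`, `uploadDate`, `filename`) using the default 255 KB chunk size. It illustrates the data layout only and is not a driver implementation; a real driver would use an ObjectId rather than the plain string ID assumed here.

```python
from datetime import datetime, timezone

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunkSize: 261,120 bytes


def split_for_gridfs(file_id: str, filename: str, content: bytes,
                     chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped like fs.files and fs.chunks."""
    files_doc = {
        "_id": file_id,
        "length": len(content),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
        "filename": filename,
    }
    # Chunk n holds bytes [n * chunk_size, (n + 1) * chunk_size);
    # only the last chunk may be shorter
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": content[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(content), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs) -> bytes:
    # Concatenate chunks in order of n to rebuild the original file
    return b"".join(c["data"] for c in sorted(chunk_docs, key=lambda c: c["n"]))
```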
Best Practices:
- Use GridFS when handling large documents, media, or backup files.
- Rely on MongoDB drivers, which handle splitting and reconstruction automatically.
- GridFS supports streaming uploads and downloads and reading a byte range without fetching the whole file, which is not possible with BinData.
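Range reads are possible because each chunk covers a fixed byte range, so only the overlapping chunks need to be fetched. The sketch below shows that arithmetic, assuming the chunks are available in a dict keyed by their `n` value (a hypothetical stand-in for querying `fs.chunks`). In practice a driver such as PyMongo exposes this through a seekable file object, so you rarely write it yourself.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunkSize


def chunk_span(start: int, length: int, chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Return (first_n, last_n, offset) for reading `length` bytes from `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length > 0")
    first_n = start // chunk_size
    last_n = (start + length - 1) // chunk_size  # chunk holding the final byte
    return first_n, last_n, start - first_n * chunk_size


def read_range(chunks_by_n: dict, file_length: int, start: int, length: int,
               chunk_size: int = DEFAULT_CHUNK_SIZE) -> bytes:
    # Clamp the request so it never runs past the end of the file
    length = min(length, file_length - start)
    if length <= 0:
        return b""
    first_n, last_n, offset = chunk_span(start, length, chunk_size)
    # Only the chunks overlapping the range are touched, not the whole file
    buf = b"".join(chunks_by_n[n] for n in range(first_n, last_n + 1))
    return buf[offset:offset + length]
```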
Summary Table: Database-Specific Techniques
Database | Method | Data Type | Best For | Max Size |
---|---|---|---|---|
MySQL | Direct binary storage | BLOB / LONGBLOB | Small–medium files | 4 GB (LONGBLOB); one statement is capped by max_allowed_packet (1 GB max) |
PostgreSQL | BYTEA or Large Object (lo) | BYTEA / OID | BYTEA: <1 MB, lo: large files | BYTEA: 1 GB; lo: 4 TB |
MongoDB | BinData or GridFS | BinData / GridFS | BinData: <16 MB, GridFS: >16 MB | BinData: 16 MB per document; GridFS: no fixed per-file limit |
Additional Considerations
- Encryption
  - For sensitive data like legal contracts or medical documents, encrypt the content before storing it, for example with AES or the Fernet recipe from Python’s `cryptography` library.
- Compression
  - Reduce storage and transmission time by compressing files (e.g., with GZIP) before storing them. DOCX files are already ZIP archives and many PDFs compress their internal streams, so savings for these formats are often small.
- Indexing
  - Never index binary fields. Instead, create indexes on fields like `name`, `mimetype`, `uploaded_at`, or `owner_id`.
- Backup and Replication
  - Ensure that your database backup solution accounts for binary fields. Binary-heavy tables increase backup sizes and can slow replication.
- Chunking and Streaming
  - MongoDB (via GridFS) and PostgreSQL Large Objects support chunked reads and writes natively. When using MySQL BLOBs or PostgreSQL BYTEA with very large files, consider application-layer chunking or an alternate file storage strategy.
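For MySQL BLOBs or PostgreSQL BYTEA, application-layer chunking can be as simple as a child table with one row per fixed-size piece. The sketch below uses Python’s sqlite3 module as a runnable stand-in; the `document_chunks` table, its columns, and the 1 MB chunk size are hypothetical choices to adapt. Reading the rows back in `seq` order lets the application stream a file without holding it all in memory.

```python
import sqlite3
from typing import BinaryIO, Iterator

CHUNK_SIZE = 1024 * 1024  # hypothetical 1 MB per row; tune for your database


def create_chunk_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS document_chunks (
            document_id INTEGER NOT NULL,
            seq INTEGER NOT NULL,      -- position of the chunk within the file
            data BLOB NOT NULL,
            PRIMARY KEY (document_id, seq)
        )
        """
    )


def write_chunks(conn: sqlite3.Connection, document_id: int,
                 stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Read the stream piece by piece and store each piece as one row."""
    seq = 0
    with conn:  # all chunks commit together, or none do
        while block := stream.read(chunk_size):
            conn.execute(
                "INSERT INTO document_chunks (document_id, seq, data) "
                "VALUES (?, ?, ?)",
                (document_id, seq, block),
            )
            seq += 1
    return seq  # number of chunks written


def iter_chunks(conn: sqlite3.Connection, document_id: int) -> Iterator[bytes]:
    """Yield the file's chunks in order, one row at a time."""
    cur = conn.execute(
        "SELECT data FROM document_chunks WHERE document_id = ? ORDER BY seq",
        (document_id,),
    )
    for (data,) in cur:
        yield data
```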
When to Use Database-Based File Storage
Use this method if:
- File size is relatively small (<10MB)
- Files require transactional consistency with other data
- You want a single backup, replication, and security mechanism
- You need audit trails for file access and changes
Avoid if:
- Files are large or streamed frequently
- File access performance is a top priority
- You already use a cloud-based file storage system (S3, Azure Blob, etc.)
My Tech Advice: Storing PDFs and Word documents in a database is a practical technique for managing small to medium files with strong consistency and security requirements. MySQL, PostgreSQL, and MongoDB each offer robust mechanisms for handling binary data, with different trade-offs in performance, scalability, and complexity.
Understanding these techniques helps you make informed decisions for your architecture, especially when working on enterprise apps, HR systems, or document workflows.
Ready to build your own tech solution? Try the above tech concept, or contact me for tech advice!
#AskDushyant
Note: The names and information mentioned are based on my personal experience; however, they do not represent any formal statement. Pseudo code is for illustration only; you must modify and experiment with the concept to meet your specific needs.
#TechConcept #TechAdvice #Database #FileSystem #MySQL #MongoDB #MariaDB #PostgreSQL