Storing PDF or DOC files directly in a database can be essential for enterprise applications, document management systems, and cloud-based platforms. Java provides robust file I/O APIs and, through JDBC and the MongoDB Java driver, integrates with databases such as MySQL, PostgreSQL, and MongoDB.
For over two decades, I’ve been igniting change and delivering scalable tech solutions that elevate organisations to new heights. My expertise transforms challenges into opportunities, inspiring businesses to thrive in the digital age. This tech concept breaks down the complete workflow of how Java processes and stores PDF or DOC files, using a different technique for each database system. Real-world code examples are included for each approach.
Java File Storage Workflow: From Disk to Database
Whether you’re working with MySQL, PostgreSQL, or MongoDB, the file storage workflow in Java typically involves:
- Reading the file in binary format using FileInputStream or Files.readAllBytes().
- Converting the file to a byte array or binary stream.
- Storing the binary data in a supported type or mechanism: BLOB, BYTEA, BinData, or GridFS.
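The first two steps can be sketched with the standard library alone. This minimal example, assuming Java 9+ for InputStream.readAllBytes(), reads the same file both ways and checks that the bytes match; the class name BinaryRead and the temporary sample file are illustrative only.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BinaryRead {

    // Option 1: load the whole file into memory in one call (fine for small files)
    public static byte[] readAll(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Option 2: read through a FileInputStream; InputStream.readAllBytes() needs Java 9+
    public static byte[] readViaStream(Path file) throws IOException {
        try (InputStream in = new FileInputStream(file.toFile())) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // hypothetical sample content standing in for a real PDF
        Path tmp = Files.createTempFile("sample", ".pdf");
        Files.write(tmp, "%PDF-1.7 demo".getBytes(StandardCharsets.US_ASCII));
        byte[] a = readAll(tmp);
        byte[] b = readViaStream(tmp);
        System.out.println(a.length + " bytes, identical: " + Arrays.equals(a, b)); // 13 bytes, identical: true
        Files.delete(tmp);
    }
}
```

A byte array from Files.readAllBytes() suits setBytes() and BSON Binary; for large files, it is usually better to keep the FileInputStream open and hand it straight to setBinaryStream() or GridFS instead of materialising the whole array.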
Storing Files in MySQL Using Java
MySQL Table Setup
CREATE TABLE documents (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255),
type VARCHAR(50),
content LONGBLOB
);
Java Approach
In MySQL, use a LONGBLOB column to store binary data; a LONGBLOB value can hold up to 4 GB. Java can insert the data using PreparedStatement.setBinaryStream() or setBytes().
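To see why the table uses LONGBLOB rather than a plain BLOB, compare the maximum sizes of MySQL's four BLOB types. The sketch below (the class and method names are hypothetical) picks the smallest type that fits a given file size; a typical 2 MB PDF already overflows BLOB's 64 KB limit.

```java
public class MySqlBlobType {

    // Maximum byte lengths of MySQL's binary large object column types
    static final long TINYBLOB_MAX = 255L;            // 2^8 - 1
    static final long BLOB_MAX = 65_535L;             // 2^16 - 1 (about 64 KB)
    static final long MEDIUMBLOB_MAX = 16_777_215L;   // 2^24 - 1 (about 16 MB)
    static final long LONGBLOB_MAX = 4_294_967_295L;  // 2^32 - 1 (about 4 GB)

    // Returns the smallest BLOB type that can hold a file of the given size
    public static String smallestBlobType(long sizeBytes) {
        if (sizeBytes < 0) throw new IllegalArgumentException("negative size");
        if (sizeBytes <= TINYBLOB_MAX) return "TINYBLOB";
        if (sizeBytes <= BLOB_MAX) return "BLOB";
        if (sizeBytes <= MEDIUMBLOB_MAX) return "MEDIUMBLOB";
        if (sizeBytes <= LONGBLOB_MAX) return "LONGBLOB";
        throw new IllegalArgumentException("too large for any MySQL BLOB type");
    }

    public static void main(String[] args) {
        long twoMbPdf = 2L * 1024 * 1024;      // hypothetical 2 MB PDF
        long fortyMbScan = 40L * 1024 * 1024;  // hypothetical 40 MB scanned document
        System.out.println(smallestBlobType(twoMbPdf));    // MEDIUMBLOB
        System.out.println(smallestBlobType(fortyMbScan)); // LONGBLOB
    }
}
```

Whatever the column type, a single INSERT must also fit within the server's max_allowed_packet setting.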
Java Example Code
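import java.io.File;
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// DB_URL, USER and PASSWORD are your own connection settings; the enclosing method must declare IOException and SQLException
File file = new File("sample.pdf");
// try-with-resources closes the stream, statement and connection even if the insert fails
try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD);
     PreparedStatement ps = conn.prepareStatement("INSERT INTO documents (name, type, content) VALUES (?, ?, ?)");
     FileInputStream fis = new FileInputStream(file)) {
    ps.setString(1, file.getName());
    ps.setString(2, "application/pdf");
    // stream the file; the long-length overload avoids the (int) cast
    ps.setBinaryStream(3, fis, file.length());
    ps.executeUpdate();
}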
Storing Files in PostgreSQL Using Java
PostgreSQL supports two main approaches for storing binary files: BYTEA and Large Objects (LO) referenced by an OID.
Option 1: BYTEA Field
PostgreSQL Table Setup
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
name TEXT,
type TEXT,
content BYTEA
);
Java Code Example
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// BYTEA values are read fully into memory and are capped at 1 GB per field
byte[] fileBytes = Files.readAllBytes(Paths.get("sample.docx"));
try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD);
     PreparedStatement ps = conn.prepareStatement("INSERT INTO documents (name, type, content) VALUES (?, ?, ?)")) {
    ps.setString(1, "sample.docx");
    ps.setString(2, "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
    ps.setBytes(3, fileBytes);
    ps.executeUpdate();
}
Option 2: Large Object (OID)
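For large files or streamed access, PostgreSQL provides the Large Object API. The file data lives in the pg_largeobject system catalog and your table stores only its OID, so the example below assumes the documents table has a content_oid column of type OID.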
Java Code Example
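import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD)) {
    // Large Object calls only work inside a transaction
    conn.setAutoCommit(false);
    try {
        // unwrap() is safer than a cast when a connection pool wraps the connection
        LargeObjectManager lobj = conn.unwrap(PGConnection.class).getLargeObjectAPI();
        long oid = lobj.createLO(LargeObjectManager.READ | LargeObjectManager.WRITE);
        LargeObject obj = lobj.open(oid, LargeObjectManager.WRITE);
        try (FileInputStream fis = new FileInputStream("sample.pdf")) {
            byte[] buf = new byte[2048];
            int n;
            // copy chunk by chunk so the whole file is never held in memory
            while ((n = fis.read(buf)) != -1) {
                obj.write(buf, 0, n);
            }
        } finally {
            obj.close();
        }
        try (PreparedStatement ps = conn.prepareStatement("INSERT INTO documents (name, type, content_oid) VALUES (?, ?, ?)")) {
            ps.setString(1, "sample.pdf");
            ps.setString(2, "application/pdf");
            ps.setLong(3, oid);
            ps.executeUpdate();
        }
        conn.commit();
    } catch (Exception e) {
        // roll back so a failed upload leaves neither an orphaned large object nor a row
        conn.rollback();
        throw e;
    }
}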
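The read/write loop in the Large Object example is a general chunked-copy pattern. Pulled out into a standalone helper (ChunkedCopy is a hypothetical name), it might look like this; only one buffer's worth of data is in memory at a time. pgjdbc's LargeObject also offers getOutputStream(), so, assuming that stream behaves like any other OutputStream, the same helper could write into a large object directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedCopy {

    // Copies 'in' to 'out' through a fixed-size buffer and returns the number of bytes copied
    public static long copy(InputStream in, OutputStream out, int chunkSize) throws IOException {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        byte[] buf = new byte[chunkSize];
        long total = 0;
        int n;
        // read() returns -1 at end of stream; a short read is normal, and only n bytes are valid
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5000];  // hypothetical payload standing in for a file
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 2048);
        System.out.println(copied + " bytes copied"); // 5000 bytes copied
    }
}
```

With a 2048-byte buffer, the 5000-byte payload is written as two full chunks and one final 904-byte chunk.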
Storing Files in MongoDB Using Java
MongoDB can store files either directly as binary data inside a document, subject to its 16 MB maximum document size, or through GridFS for larger files or streamed access.
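The choice between the two options comes down to MongoDB's 16 MB (16 MiB) BSON document limit, which applies to the whole document rather than just the file bytes. A minimal decision helper might look like this; the class name and the 64 KB headroom reserved for the other fields are arbitrary assumptions, not MongoDB settings.

```java
public class MongoStorageChoice {

    // MongoDB's maximum BSON document size: 16 MiB
    static final long BSON_MAX_BYTES = 16L * 1024 * 1024;

    // Hypothetical safety margin for the other fields (_id, name, type) and BSON overhead
    static final long HEADROOM_BYTES = 64L * 1024;

    public enum Storage { EMBEDDED_BINARY, GRIDFS }

    // Embed the file only if it leaves room for the rest of the document
    public static Storage choose(long fileSizeBytes) {
        return fileSizeBytes <= BSON_MAX_BYTES - HEADROOM_BYTES
                ? Storage.EMBEDDED_BINARY
                : Storage.GRIDFS;
    }

    public static void main(String[] args) {
        System.out.println(choose(3L * 1024 * 1024));   // EMBEDDED_BINARY
        System.out.println(choose(16L * 1024 * 1024));  // GRIDFS
    }
}
```

A file of exactly 16 MB goes to GridFS here, because the surrounding fields would push the document past the limit.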
Option 1: Binary Field (For Files < 16MB)
Java Code Example
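import com.mongodb.client.*;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.bson.BsonBinarySubType;
import org.bson.Document;
import org.bson.types.Binary;

byte[] fileBytes = Files.readAllBytes(Paths.get("sample.pdf"));
// MongoClients.create() replaces the legacy new MongoClient(host, port) constructor
try (MongoClient mongoClient = MongoClients.create("mongodb://localhost:27017")) {
    MongoDatabase database = mongoClient.getDatabase("mydb");
    MongoCollection<Document> collection = database.getCollection("files");
    // the whole document, binary content included, must stay under 16 MB
    Document doc = new Document("name", "sample.pdf")
            .append("type", "application/pdf")
            .append("content", new Binary(BsonBinarySubType.BINARY, fileBytes));
    collection.insertOne(doc);
}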
Option 2: GridFS (For Large Files > 16MB)
Java Code Example
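import com.mongodb.client.*;
import com.mongodb.client.gridfs.*;
import com.mongodb.client.gridfs.model.GridFSUploadOptions;
import java.io.FileInputStream;
import org.bson.Document;
import org.bson.types.ObjectId;

try (MongoClient mongoClient = MongoClients.create();
     FileInputStream streamToUploadFrom = new FileInputStream("sample.pdf")) {
    MongoDatabase database = mongoClient.getDatabase("mydb");
    // stores file data in pdfFiles.chunks and file info in pdfFiles.files
    GridFSBucket gridFSBucket = GridFSBuckets.create(database, "pdfFiles");
    GridFSUploadOptions options = new GridFSUploadOptions()
            .chunkSizeBytes(1024 * 1024) // 1 MB chunks instead of the 255 KB default
            .metadata(new Document("type", "application/pdf"));
    ObjectId fileId = gridFSBucket.uploadFromStream("sample.pdf", streamToUploadFrom, options);
}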
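GridFS splits an upload into chunk documents of chunkSizeBytes each, so the number of chunks is the file size divided by the chunk size, rounded up. The sketch below (GridFsChunks is a hypothetical name) works this out for a hypothetical 40 MB file, comparing 1 MB chunks with the 255 KB GridFS default.

```java
public class GridFsChunks {

    // GridFS default chunk size: 255 KiB
    static final int DEFAULT_CHUNK_BYTES = 255 * 1024;

    // Number of documents written to <bucket>.chunks: ceil(size / chunkSize)
    public static long chunkCount(long fileSizeBytes, int chunkSizeBytes) {
        if (chunkSizeBytes <= 0) throw new IllegalArgumentException("chunk size must be positive");
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }

    public static void main(String[] args) {
        long fileSize = 40L * 1024 * 1024;  // hypothetical 40 MiB scanned PDF
        System.out.println(chunkCount(fileSize, 1024 * 1024));         // 40 with 1 MiB chunks
        System.out.println(chunkCount(fileSize, DEFAULT_CHUNK_BYTES)); // 161 with the default size
    }
}
```

Larger chunks mean fewer documents in the chunks collection; smaller chunks give finer-grained reads when streaming only part of a file.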
Comparison Table: Techniques by Database
Database | File Type | Storage Format | Java Method |
---|---|---|---|
MySQL | PDF/DOCX | LONGBLOB | setBinaryStream() or setBytes() |
PostgreSQL | PDF/DOCX | BYTEA / Large Object | setBytes() or LargeObjectManager |
MongoDB | PDF/DOCX | Binary / GridFS | new Binary(BsonBinarySubType, byte[]) or GridFSBucket.uploadFromStream() |
Best Practices and Considerations
- Always use prepared statements for safe and efficient binary data insertion.
- Use GridFS in MongoDB when files exceed 16MB or need streamed access.
- Add metadata fields like filename, MIME type, and timestamps to improve retrieval and indexing.
- Stream large files rather than reading them fully into memory if scalability is a concern.
- Ensure that your database configuration supports large file uploads (e.g., max_allowed_packet in MySQL).
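The metadata advice can be made concrete with a small helper that derives the MIME type from the file extension and stamps an upload time. This is a sketch: it only knows the PDF, DOC, and DOCX types used in this article, falls back to application/octet-stream, and the class name and field names (name, type, size, uploadedAt) are hypothetical choices that mirror the columns and document fields above.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class DocumentMetadata {

    // Registered MIME types for the formats covered here; anything else is generic binary
    public static String mimeTypeFor(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) return "application/pdf";
        if (lower.endsWith(".docx")) return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
        if (lower.endsWith(".doc")) return "application/msword";
        return "application/octet-stream";
    }

    // Builds the name/type/size/timestamp fields to store next to the binary content
    public static Map<String, Object> metadataFor(String fileName, long sizeBytes, Instant uploadedAt) {
        Map<String, Object> meta = new LinkedHashMap<>();
        meta.put("name", fileName);
        meta.put("type", mimeTypeFor(fileName));
        meta.put("size", sizeBytes);
        meta.put("uploadedAt", uploadedAt.toString()); // ISO-8601 text, e.g. 1970-01-01T00:00:00Z
        return meta;
    }

    public static void main(String[] args) {
        System.out.println(metadataFor("Report.DOCX", 48_213, Instant.now()));
    }
}
```

In JDBC these values map onto ps.setString() calls for the name and type columns; in MongoDB the map can be passed to new Document(map) and stored alongside the content or as GridFS metadata.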
My Tech Advice: Java makes it straightforward to process and store PDF or DOC files across various databases. Whether you’re working with relational systems like MySQL or PostgreSQL, or document-oriented stores like MongoDB, the key lies in using binary-safe I/O methods and database-specific APIs. By choosing the right storage format (LONGBLOB, BYTEA, OID, or GridFS), you can ensure scalability, performance, and compatibility across platforms. Ready to build your own tech solution? Try the above tech concept, or contact me for tech advice!
#AskDushyant
Note: The names and information mentioned are based on my personal experiences; however, they do not represent any formal statement. The examples and pseudo code are for illustration only. You must modify and experiment with the concept to meet your specific needs.
#TechConcept #TechAdvice