How Java Processes PDF and DOC Files for Storage in MySQL, PostgreSQL, and MongoDB

Storing PDF or DOC files directly into a database can be essential for enterprise applications, document management systems, and cloud-based platforms. Java provides robust APIs for file handling and seamless integration with various databases, including MySQL, PostgreSQL, and MongoDB.

For over two decades, I’ve been igniting change and delivering scalable tech solutions that elevate organisations to new heights. My expertise transforms challenges into opportunities, inspiring businesses to thrive in the digital age. This tech concept breaks down the complete workflow of how Java processes and stores PDF or DOC files, using a different technique for each database system. Real-world code examples are included for each approach.

Java File Storage Workflow: From Disk to Database

Whether you’re working with MySQL, PostgreSQL, or MongoDB, the file storage workflow in Java typically involves:

  1. Reading the file in binary format using FileInputStream or Files.readAllBytes().
  2. Converting the file to a byte array or binary stream.
  3. Storing the binary data into a supported data type: BLOB, BYTEA, BinData, or GridFS.
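
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration rather than the only way to do it: the class name FileBytes and its method names are hypothetical, and I/O errors are rethrown as unchecked exceptions purely to keep the sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: two ways to turn a file into binary data.
final class FileBytes {

    // Steps 1 and 2 in one call; suitable for files that comfortably fit in memory.
    static byte[] readWhole(Path path) {
        try {
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drains any binary stream into a byte array using a fixed-size buffer.
    // The stream is closed when draining finishes or fails.
    static byte[] drain(InputStream in) {
        try (InputStream source = in;
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For very large files, you would normally skip the byte array and hand the stream itself to the database driver.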

Storing Files in MySQL Using Java

MySQL Table Setup

CREATE TABLE documents (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    type VARCHAR(50),
    content LONGBLOB
);

Java Approach

In MySQL, use a LONGBLOB column to store binary data. Java can insert binary data using PreparedStatement.setBinaryStream() or setBytes().

Java Example Code

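// Requires: java.io.File, java.io.FileInputStream,
//           java.sql.Connection, java.sql.DriverManager, java.sql.PreparedStatement
File file = new File("sample.pdf");
String sql = "INSERT INTO documents (name, type, content) VALUES (?, ?, ?)";

// try-with-resources closes the stream, statement, and connection even if the insert fails
try (FileInputStream fis = new FileInputStream(file);
     Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD);
     PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setString(1, file.getName());
    ps.setString(2, "application/pdf");
    // The long overload avoids the (int) cast, which would overflow for files over 2 GB
    ps.setBinaryStream(3, fis, file.length());
    ps.executeUpdate();
}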
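
Why LONGBLOB rather than a plain BLOB? MySQL has four BLOB types with different size ceilings, and a typical PDF of a few megabytes already exceeds what BLOB can hold. The sketch below picks the smallest type that fits a given file size. The class and method names are hypothetical, and the sketch only looks at column capacity; other limits such as max_allowed_packet still apply.

```java
// Hypothetical helper: picks the smallest MySQL BLOB type that can hold a file.
final class MySqlBlobTypes {
    // Maximum payload sizes in bytes for MySQL's four BLOB types.
    static final long TINYBLOB_MAX = 255L;            // 2^8 - 1
    static final long BLOB_MAX = 65_535L;             // 2^16 - 1
    static final long MEDIUMBLOB_MAX = 16_777_215L;   // 2^24 - 1
    static final long LONGBLOB_MAX = 4_294_967_295L;  // 2^32 - 1

    static String smallestTypeFor(long sizeBytes) {
        if (sizeBytes < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        if (sizeBytes <= TINYBLOB_MAX) return "TINYBLOB";
        if (sizeBytes <= BLOB_MAX) return "BLOB";
        if (sizeBytes <= MEDIUMBLOB_MAX) return "MEDIUMBLOB";
        if (sizeBytes <= LONGBLOB_MAX) return "LONGBLOB";
        throw new IllegalArgumentException("too large for any MySQL BLOB type");
    }
}
```

In practice, declaring the column as LONGBLOB up front, as in the table above, saves you from a schema change when larger documents arrive.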

Storing Files in PostgreSQL Using Java

PostgreSQL supports two main approaches for storing binary files: BYTEA and Large Objects (LO) with OID.

Option 1: BYTEA Field

PostgreSQL Table Setup

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    name TEXT,
    type TEXT,
    content BYTEA
);

Java Code Example

// Requires: java.nio.file.Files, java.nio.file.Paths,
//           java.sql.Connection, java.sql.DriverManager, java.sql.PreparedStatement
byte[] fileBytes = Files.readAllBytes(Paths.get("sample.docx"));
String sql = "INSERT INTO documents (name, type, content) VALUES (?, ?, ?)";

// try-with-resources closes the statement and connection even if the insert fails
try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD);
     PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setString(1, "sample.docx");
    ps.setString(2, "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
    ps.setBytes(3, fileBytes);
    ps.executeUpdate();
}

Option 2: Large Object (OID)

For large files or streamed access, PostgreSQL provides the Large Object API.

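Java Code Example

// Requires: java.io.FileInputStream, java.sql.*, org.postgresql.PGConnection,
//           org.postgresql.largeobject.LargeObject, org.postgresql.largeobject.LargeObjectManager
// Assumes the documents table has a content_oid OID column in place of content BYTEA.
try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASSWORD)) {
    conn.setAutoCommit(false); // Large Object calls must run inside a transaction

    LargeObjectManager lobj = conn.unwrap(PGConnection.class).getLargeObjectAPI();
    long oid = lobj.createLO(LargeObjectManager.READ | LargeObjectManager.WRITE);

    // Stream the file into the large object in 2 KB chunks
    try (FileInputStream fis = new FileInputStream("sample.pdf")) {
        LargeObject obj = lobj.open(oid, LargeObjectManager.WRITE);
        byte[] buf = new byte[2048];
        int s;
        while ((s = fis.read(buf, 0, buf.length)) > 0) {
            obj.write(buf, 0, s);
        }
        obj.close();
    }

    // Store the OID alongside the metadata so the object can be found later
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO documents (name, type, content_oid) VALUES (?, ?, ?)")) {
        ps.setString(1, "sample.pdf");
        ps.setString(2, "application/pdf");
        ps.setLong(3, oid);
        ps.executeUpdate();
    }
    conn.commit();
}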

Storing Files in MongoDB Using Java

MongoDB allows storing files either directly as binary data or through GridFS for larger files or streamed access.
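
The choice between the two options comes down to MongoDB's 16 MB limit on a single BSON document. That limit covers the whole document, not just the file bytes, so it helps to leave some room for the name, type, and other fields. Here is a minimal sketch of that decision. The class name, the enum, and the 64 KB headroom value are my own assumptions for illustration, not anything defined by the MongoDB driver.

```java
// Hypothetical helper: decides how a file of a given size should be stored in MongoDB.
final class MongoStorageChoice {
    enum Strategy { BINARY_FIELD, GRIDFS }

    // MongoDB's maximum BSON document size (16 MB).
    static final long MAX_BSON_DOCUMENT_BYTES = 16L * 1024 * 1024;

    // Assumed allowance for field names and metadata; the exact value is arbitrary.
    static final long METADATA_HEADROOM_BYTES = 64L * 1024;

    static Strategy choose(long fileSizeBytes) {
        if (fileSizeBytes < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        long inlineLimit = MAX_BSON_DOCUMENT_BYTES - METADATA_HEADROOM_BYTES;
        return fileSizeBytes <= inlineLimit ? Strategy.BINARY_FIELD : Strategy.GRIDFS;
    }
}
```

You may still prefer GridFS for smaller files if you need to stream them instead of loading them fully into memory.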

Option 1: Binary Field (For Files < 16MB)

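Java Code Example

// Requires the MongoDB Java sync driver: com.mongodb.client.MongoClient, MongoClients,
// MongoCollection, MongoDatabase; org.bson.Document, org.bson.BsonBinarySubType,
// org.bson.types.Binary; java.nio.file.Files, java.nio.file.Paths
// MongoClients.create() replaces the legacy new MongoClient(host, port) constructor
try (MongoClient mongoClient = MongoClients.create("mongodb://localhost:27017")) {
    MongoDatabase database = mongoClient.getDatabase("mydb");
    MongoCollection<Document> collection = database.getCollection("files");

    byte[] fileBytes = Files.readAllBytes(Paths.get("sample.pdf"));
    Document doc = new Document("name", "sample.pdf")
        .append("type", "application/pdf")
        .append("content", new Binary(BsonBinarySubType.BINARY, fileBytes));

    collection.insertOne(doc);
}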

Option 2: GridFS (For Large Files > 16MB)

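Java Code Example

// Requires the MongoDB Java sync driver: com.mongodb.client.*, com.mongodb.client.gridfs.*,
// com.mongodb.client.gridfs.model.GridFSUploadOptions, org.bson.Document,
// org.bson.types.ObjectId; java.io.FileInputStream, java.io.InputStream
// try-with-resources closes both the client and the file stream
try (MongoClient mongoClient = MongoClients.create();
     InputStream streamToUploadFrom = new FileInputStream("sample.pdf")) {
    MongoDatabase database = mongoClient.getDatabase("mydb");
    GridFSBucket gridFSBucket = GridFSBuckets.create(database, "pdfFiles");

    GridFSUploadOptions options = new GridFSUploadOptions()
            .chunkSizeBytes(1024 * 1024) // 1 MB chunks instead of the 255 KB default
            .metadata(new Document("type", "application/pdf"));

    // The returned ObjectId identifies the stored file for later download
    ObjectId fileId = gridFSBucket.uploadFromStream("sample.pdf", streamToUploadFrom, options);
}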

Comparison Table: Techniques by Database

Database   | File Type | Storage Format       | Java Method
MySQL      | PDF/DOCX  | LONGBLOB             | setBinaryStream() or setBytes()
PostgreSQL | PDF/DOCX  | BYTEA / Large Object | setBytes() or LargeObjectManager
MongoDB    | PDF/DOCX  | Binary / GridFS      | Binary(BsonBinarySubType) or GridFS

Best Practices and Considerations

  • Always use prepared statements for safe and efficient binary data insertion.
  • Use GridFS in MongoDB when files exceed 16MB or need streamed access.
  • Add metadata fields like filename, MIME type, and timestamps to improve retrieval and indexing.
  • Stream large files rather than reading into memory if scalability is a concern.
  • Ensure that your database configuration supports large file uploads (e.g., max_allowed_packet in MySQL).
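
To illustrate the metadata point from the list above, here is a small sketch that derives the filename, MIME type, size, and timestamp you might store next to the binary content. The class DocumentMetadata and its fields are hypothetical, and the extension-to-MIME mapping only covers the three formats discussed in this post.

```java
import java.time.Instant;
import java.util.Locale;
import java.util.Map;

// Hypothetical value class holding the metadata stored alongside the file content.
final class DocumentMetadata {
    private static final Map<String, String> MIME_BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "doc", "application/msword",
            "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");

    final String filename;
    final String mimeType;
    final long sizeBytes;
    final Instant uploadedAt;

    private DocumentMetadata(String filename, String mimeType, long sizeBytes, Instant uploadedAt) {
        this.filename = filename;
        this.mimeType = mimeType;
        this.sizeBytes = sizeBytes;
        this.uploadedAt = uploadedAt;
    }

    // Maps a file extension to a MIME type, falling back to a generic binary type.
    static String mimeTypeFor(String filename) {
        int dot = filename.lastIndexOf('.');
        String extension = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return MIME_BY_EXTENSION.getOrDefault(extension, "application/octet-stream");
    }

    static DocumentMetadata of(String filename, long sizeBytes, Instant uploadedAt) {
        return new DocumentMetadata(filename, mimeTypeFor(filename), sizeBytes, uploadedAt);
    }
}
```

These values map directly onto the name and type columns in the relational examples, or onto the metadata document passed to GridFS.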

My Tech Advice: Java makes it straightforward to process and store PDF or DOC files across various databases. Whether you’re working with relational systems like MySQL or PostgreSQL, or document-oriented stores like MongoDB, the key lies in using binary-safe I/O methods and database-specific APIs. By choosing the right storage format (LONGBLOB, BYTEA, OID, or GridFS), you can ensure scalability, performance, and compatibility across platforms.

Ready to build your own tech solution? Try the above tech concept, or contact me for tech advice!

#AskDushyant
Note: The names and information mentioned are based on my personal experiences; however, they do not represent any formal statement. The examples and pseudocode are for illustration only. You must modify and experiment with the concept to meet your specific needs.
#TechConcept #TechAdvice
