Every application needs to store its data at some point, be it in a database or on a file system. Sometimes it’s more convenient to store files in the database rather than on the file system, and MongoDB is one database that lets us do so.
MongoDB is a NoSQL document database, meaning it stores data in the form of documents. A single document in MongoDB can be at most 16 MB, so storing a file of up to 16 MB is not a big deal. To store heavier files that exceed 16 MB, MongoDB provides a specification called GridFS, which the official drivers implement. GridFS divides your file into chunks of 255 kB by default (the default was originally 256 kB) and stores each chunk in the database as a separate document.
Under the hood, GridFS uses two collections in the database you are currently using, named fs.chunks and fs.files by default. The first stores the 255 kB chunks of each file. The second stores one document per file holding the file’s metadata, which is used to locate its chunks.
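To make the two-collection layout concrete, here is a minimal sketch that splits a byte string the way GridFS does, without talking to a database. The field names (`length`, `chunkSize`, `uploadDate`, `files_id`, `n`, `data`) follow the GridFS layout; the function name, the file id, and the sample file are made up for illustration.

```python
from datetime import datetime, timezone

CHUNK_SIZE = 255 * 1024  # 261,120 bytes, the default chunk size mentioned above


def split_into_gridfs_docs(file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Return (files_doc, chunk_docs) shaped like the two GridFS collections."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),      # total size in bytes
        "chunkSize": chunk_size,  # size of every chunk except possibly the last
        "uploadDate": datetime.now(timezone.utc),
    }
    chunk_docs = [
        {
            "files_id": file_id,  # points back at the metadata document
            "n": n,               # zero-based position of this chunk in the file
            "data": data[offset:offset + chunk_size],
        }
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


# A hypothetical 600,000-byte file ends up as three chunk documents.
files_doc, chunk_docs = split_into_gridfs_docs("demo-id", "video.bin", bytes(600_000))
print(len(chunk_docs), [len(c["data"]) for c in chunk_docs])
# 3 [261120, 261120, 77760]
```

In a real deployment the driver does this splitting for you; the sketch only shows what ends up in each collection.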
When to use GridFS
GridFS is a simple file system abstraction on top of MongoDB. If you’re familiar with Amazon S3, GridFS is a very similar abstraction. So why does a document-oriented database like MongoDB provide a file layer abstraction? It turns out there are some very good reasons:
1. Storing user-generated file content
A large number of web applications allow users to upload files. Historically, when working with relational databases, these user-generated files were stored on the file system, separate from the database. This creates a number of problems. How do you replicate the files to all of the servers that need them? How do you delete every copy when a file is deleted? How do you back up the files for safety and disaster recovery? GridFS solves these problems by storing the files along with the database, so your database backup also backs up your files. Also, thanks to MongoDB replication, a copy of your files is stored on each replica. Deleting a file is as easy as deleting an object in the database.
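As a rough sketch of what this looks like with the Python driver, the helpers below store and delete an upload through PyMongo’s `GridFSBucket`. The connection URI, the database name `myapp`, the `owner` metadata field, and the helper names are all assumptions for illustration, not part of GridFS itself.

```python
import io


def open_bucket(uri="mongodb://localhost:27017", db_name="myapp"):
    """Connect and return a GridFSBucket (uri and db_name are placeholders)."""
    # Imported lazily so the helpers below can be read and tested without a server.
    from pymongo import MongoClient
    import gridfs

    return gridfs.GridFSBucket(MongoClient(uri)[db_name])


def save_upload(bucket, filename, data, owner):
    """Store an uploaded file and return the id GridFS assigned to it."""
    return bucket.upload_from_stream(
        filename,
        io.BytesIO(data),
        metadata={"owner": owner},  # free-form metadata kept on the file document
    )


def delete_upload(bucket, file_id):
    """Remove the file document together with all of its chunks."""
    bucket.delete(file_id)
```

With a running MongoDB instance, usage would look something like `bucket = open_bucket()` followed by `file_id = save_upload(bucket, "avatar.png", png_bytes, owner="alice")`, where `png_bytes` is whatever your web framework hands you for the upload.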
2. Accessing portions of file content
When a file is uploaded to GridFS, it is split into chunks of 255 kB (by default) that are stored separately. So, when you need to read only a certain range of bytes from the file, only the chunks covering that range are brought into memory, not the whole file. This is extremely useful when dealing with large media content that needs to be selectively read or edited.
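The arithmetic behind a ranged read is simple enough to sketch. `chunks_for_range` below computes which chunk numbers cover a byte range, and `read_range` shows one way to do the read with PyMongo, assuming `bucket` is a `gridfs.GridFSBucket`: the stream returned by `open_download_stream` supports `seek` and `read`, and the driver is expected to fetch only the chunks it needs. Both function names are hypothetical.

```python
CHUNK_SIZE = 255 * 1024  # default GridFS chunk size in bytes


def chunks_for_range(start, length, chunk_size=CHUNK_SIZE):
    """Return the chunk numbers (the `n` field) covering [start, start + length)."""
    if length <= 0:
        return []
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    return list(range(first, last + 1))


def read_range(bucket, file_id, start, length):
    """Read `length` bytes starting at byte `start` without reading the whole file."""
    stream = bucket.open_download_stream(file_id)
    try:
        stream.seek(start)
        return stream.read(length)
    finally:
        stream.close()


print(chunks_for_range(300_000, 1_000))  # [1]
print(chunks_for_range(261_000, 500))    # [0, 1]
```

The second call straddles the boundary at byte 261,120, which is why it touches two chunks even though it reads only 500 bytes.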
3. Storing documents greater than 16MB in MongoDB
MongoDB caps the size of a single document at 16 MB. So, if you have content larger than 16 MB, you can store it using GridFS.
4. Overcoming file system limitations
If you’re storing a large number of files, you’ll need to consider file system limitations such as the maximum number of files per directory. With GridFS, you don’t need to worry about file system limits. Also, with GridFS and MongoDB sharding, you can distribute your files across different servers without significantly increasing the operational complexity.
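For the sharding part, a minimal sketch of the admin commands involved is shown below. The compound key `{files_id: 1, n: 1}` is one of the shard keys the MongoDB documentation suggests for the chunks collection; the database name `myapp` and the helper name are placeholders, and the snippet only builds the command documents rather than running them.

```python
def gridfs_shard_commands(db_name, bucket_name="fs"):
    """Build the admin commands that shard a GridFS chunks collection."""
    return [
        {"enableSharding": db_name},
        {
            "shardCollection": f"{db_name}.{bucket_name}.chunks",
            # Orders chunks by file id first, then by position within the file.
            "key": {"files_id": 1, "n": 1},
        },
    ]


for cmd in gridfs_shard_commands("myapp"):
    print(cmd)
    # Against a real sharded cluster you would run: client.admin.command(cmd)
```

Here `client` would be a `pymongo.MongoClient` connected to a `mongos` router, which is an assumption about your deployment.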
GridFS limitations
Serving files along with your database content can significantly churn your memory working set. If you don’t want to disturb your working set, it might be best to serve your files from a different MongoDB server.
File serving performance will be slower than serving the file natively from your web server and file system. However, the added management benefits might be worth the slowdown.
GridFS does not provide a way to update a file atomically. If you need this, you’ll have to maintain multiple versions of your files and pick the right version.
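One way to approach the multiple-versions workaround with PyMongo is to upload each new version under the same filename and let readers ask for a revision by name. This is a sketch under the assumption that `bucket` is a `gridfs.GridFSBucket`; the helper names are made up. In `open_download_stream_by_name`, revision `-1` means the most recent upload and `0` the original.

```python
import io


def publish_version(bucket, filename, data):
    """Upload `data` as a new revision; older revisions remain readable."""
    return bucket.upload_from_stream(filename, io.BytesIO(data))


def read_version(bucket, filename, revision=-1):
    """Read one revision: -1 is the newest, 0 the oldest, -2 the one before the newest."""
    stream = bucket.open_download_stream_by_name(filename, revision=revision)
    try:
        return stream.read()
    finally:
        stream.close()
```

Readers that always request revision `-1` should keep seeing the previous version until the new upload has finished, at which point they switch to the new one. Old revisions are not removed automatically, so you would need to delete them yourself once they are no longer needed.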
With GridFS, MongoDB has a practical way of handling files that exceed the 16 MB document limit. GridFS, though not perfect, is still great for use cases where you want to keep your files and metadata consistent.
A typical use case for GridFS is a geographically distributed replica set, where MongoDB can automatically distribute files and their metadata to a number of mongod instances and facilities.
I would love to hear what you guys think of GridFS. Share your feedback in the comments below.