Understanding & Managing Disk Space On Your MongoDB Server
This blog post was published on our ScaleGrid blog and discusses why you need to be aware of MongoDB storage statistics and how you can compact or repair the database to handle fragmentation.
Disk storage is a critical resource for any scalable database system. The performance of disk-based databases depends on how data is managed on disk. MongoDB supports pluggable storage engines that handle storage management. These storage engines initially store all documents sequentially. As the database grows and write operations accumulate, this contiguous space gets fragmented into smaller blocks with chunks of free space in between. The usual response is to increase the disk size; however, there are alternatives that let you regain the free space without scaling the disk.
How Large is Your Database Really?
You should always keep an eye on the amount of free disk space on your production server. It is also prudent to know your database size when you are paying for it on a cloud platform. MongoDB provides the db.stats() command, which reports storage statistics for the current database. Two of its fields matter most here:
dataSize: The total size in bytes of the uncompressed data held in this database.
storageSize: The total amount of disk space allocated to all collections in the database.
The response of db.stats() depends on the storage engine in use. Version-specific descriptions of these metrics are available in the MongoDB documentation.
Why can storageSize be so much larger than dataSize? This is due to the fragmentation of data files explained earlier. MongoDB tries to reuse the free space between fragmented data whenever possible and does not release it to the operating system. With WiredTiger, however, storageSize may be smaller than dataSize if compression is enabled.
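To make the comparison concrete, here is a minimal sketch that derives a storage-to-data ratio from a db.stats() result. The sample document and the helper name storageToDataRatio are illustrative assumptions, not output from a real server; in the mongo shell you would pass the result of db.stats() instead.

```javascript
// Hypothetical sample shaped like a db.stats() result; the values are made up.
const sampleStats = { dataSize: 400000000, storageSize: 1000000000 };

// Ratio of disk space allocated to collections versus uncompressed data size.
// A ratio well above 1 suggests allocated space that holds no live data;
// a ratio below 1 usually means compression is shrinking the data on disk.
function storageToDataRatio(stats) {
  if (!stats || !(stats.dataSize > 0)) {
    return null; // empty database: the ratio is undefined
  }
  return stats.storageSize / stats.dataSize;
}

const ratio = storageToDataRatio(sampleStats);
console.log(ratio); // 2.5
```

A single ratio is only a rough signal; it is most useful when tracked over time for the same database.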
If a large chunk of data is deleted from a collection and the collection never reuses that space for new documents, the space needs to be returned to the operating system so that it can be used by other databases or collections. You will need to run a compact or repair operation to defragment the disk space and regain the usable free space.
Compacting MongoDB
The MongoDB compact operation rewrites all documents and indexes in a collection into contiguous blocks of disk space. However, this operation blocks all other operations on the database to which the collection belongs, so for a standalone server it is recommended to run it during a maintenance window. For replica sets, you should run it in a rolling fashion for each shard: compact all secondaries first, and then the primary. This way, database availability is not affected. The syntax of the command is:
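As a minimal sketch, compact is a database command whose value is the name of the collection to compact, and it is typically issued from the mongo shell with db.runCommand. The helper name buildCompactCommand and the collection name "orders" are placeholders introduced for illustration.

```javascript
// Build the command document for compacting one collection.
// "compact" is the MongoDB command name; its value is the collection name.
function buildCompactCommand(collectionName) {
  if (typeof collectionName !== "string" || collectionName.length === 0) {
    throw new Error("collection name must be a non-empty string");
  }
  return { compact: collectionName };
}

// "orders" is a placeholder collection name.
const compactCommand = buildCompactCommand("orders");
console.log(JSON.stringify(compactCommand)); // {"compact":"orders"}

// In the mongo shell, connected to the database that owns the collection:
//   db.runCommand(compactCommand)
```

Because the command works on one collection at a time, compacting a whole database means repeating it for each collection.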
1. MMAPv1
The compaction operation defragments data files and indexes, but it does not release space to the operating system. It is still useful for creating more contiguous space that MongoDB can reuse. It does not help, however, when free disk space is already very low.
Up to 2 GB of additional disk space is required during the compaction operation.
A database-level lock is held during the compaction operation.
2. WiredTiger
The WiredTiger engine provides compression by default, so it consumes less disk space than MMAPv1.
The compact process releases the free space to the operating system.
Minimal disk space is required to run the compact operation.
WiredTiger also blocks all operations on the database, as the compact operation needs a database-level lock.
If you are running WiredTiger, we recommend that you run the compact operation when storage usage has reached 80% of the disk size. You can do this by triggering the ‘Compact’ operation from our details page.
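The 80% guideline can be expressed as a simple check. This is only a sketch: the function name shouldCompact and the byte values are hypothetical, and how you obtain the used and total disk sizes depends on your environment.

```javascript
// Fraction of the disk at which compaction is recommended (80%, per the guideline above).
const COMPACT_THRESHOLD = 0.8;

// Returns true when used storage has reached the threshold share of the disk.
function shouldCompact(usedBytes, diskBytes, threshold = COMPACT_THRESHOLD) {
  if (!(diskBytes > 0)) {
    throw new Error("disk size must be a positive number of bytes");
  }
  return usedBytes / diskBytes >= threshold;
}

// Hypothetical 100 GB disk.
console.log(shouldCompact(85e9, 100e9)); // true
console.log(shouldCompact(50e9, 100e9)); // false
```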
Repairing MongoDB
The MongoDB repair operation fixes errors and inconsistencies in data storage. It is similar to the fsck command for a file system, and it ensures data integrity after an unexpected shutdown or crash. However, if journaling is enabled on the server, a repair is not required: the server uses the journal to return to a clean state automatically after restart. If your database has been corrupted, a repair will not save the corrupt data, so it is not recommended to use this operation for data recovery when you have other options. For MMAPv1, repairDatabase is the only way to reclaim disk space, provided that your database is not corrupted and you have enough free space for the repair operation. The syntax of the command is:
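As a hedged sketch, repairDatabase is issued against the database to be repaired, either as a command document or through the shell helper. The variable name repairCommand is illustrative, and both forms apply only to MongoDB releases that still ship the command.

```javascript
// Command document for repairing the database the shell is currently using.
const repairCommand = { repairDatabase: 1 };
console.log(JSON.stringify(repairCommand)); // {"repairDatabase":1}

// In the mongo shell, after switching to the target database with "use <database>":
//   db.runCommand(repairCommand)
// or the equivalent shell helper:
//   db.repairDatabase()
```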
This command compacts all collections in the database and recreates all indexes.
The job requires free disk space equal to the size of your current data set plus 2 gigabytes.
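The space requirement above lends itself to a quick pre-check before starting a repair. The sketch below assumes a gigabyte means 2^30 bytes; the function names and the example sizes are hypothetical.

```javascript
const GIB = 1024 * 1024 * 1024;       // assumption: 1 gigabyte = 2^30 bytes
const REPAIR_OVERHEAD_BYTES = 2 * GIB; // the extra 2 gigabytes the repair needs

// Free space needed: current data set size plus the fixed overhead.
function requiredFreeSpaceForRepair(dataSetBytes) {
  return dataSetBytes + REPAIR_OVERHEAD_BYTES;
}

// True when the available free space covers the repair requirement.
function canRepair(dataSetBytes, freeBytes) {
  return freeBytes >= requiredFreeSpaceForRepair(dataSetBytes);
}

// Hypothetical 10 GiB data set.
console.log(requiredFreeSpaceForRepair(10 * GIB) / GIB); // 12
console.log(canRepair(10 * GIB, 11 * GIB)); // false
console.log(canRepair(10 * GIB, 12 * GIB)); // true
```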
At ScaleGrid, we use the repairDatabase operation to reclaim free space for MMAPv1 engine clusters.