I'm testing MongoDB sharding with a sharded collection and a for loop that inserts around 1M documents, to see how chunk splitting and moving work.

I'm surprised that MongoDB starts splitting chunks after only a few documents, well before 64MB.

After the 1M (and some) inserts, I have these stats:

mongos> db.users.getShardDistribution()

Shard rs0 at rs0/mongod00.local.net:2000,mongod01.local.net:2001,mongod02.local.net:2002
 data : 84.76MiB docs : 1010010 chunks : 9
 estimated data per chunk : 9.41MiB
 estimated docs per chunk : 112223

Totals
 data : 84.76MiB docs : 1010010 chunks : 9
 Shard rs0 contains 100% data, 100% docs in cluster, avg obj size on shard : 88B

Why did MongoDB create 9 chunks for a collection that contains only about 85MB?

I didn't change the default chunk size parameter:

mongos> db.settings.findOne()
{ "_id" : "chunksize", "value" : 64 }   

Thanks,

Max.

Maxime Fouilleul

1 Answer

The mongos processes control when automatic splitting happens (you can also pre-split, and split manually). The heuristic they use is a bit more complicated than I am about to describe, but you can use it as a rough guide:

  • Each mongos keeps track of data written to a particular chunk
  • At ~20% of the max chunk size (default is 64MB), it will try an autosplit
  • If valid split points are found and returned (by splitVector), it will split
  • If not, it will try again at 40% (+20%)

Hence, if you have inserted enough data for valid split points to be found, you will see splitting before the 32MB mark. The number of documents is also a valid reason to split; it is not all size related.
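To make the thresholds concrete, here is a rough sketch in plain JavaScript (so it runs in Node without a cluster). It models only the simplified steps listed above, not the real mongos logic. `simulateChunkWrites` and the `canSplit` callback (a stand-in for whatever splitVector would return) are names made up for illustration, and the 88-byte document size is taken from the stats in the question.

```javascript
// Rough model of the simplified autosplit heuristic described above.
// NOT the actual mongos implementation; all names here are hypothetical.
const MAX_CHUNK_BYTES = 64 * 1024 * 1024; // default chunk size (64MB)
const ATTEMPT_FRACTION = 0.2;             // try a split every ~20% of max size

// Simulates writing docCount documents of docBytes each into one chunk.
// canSplit(bytesWritten) stands in for splitVector finding valid split points.
// Returns the list of split attempts, stopping at the first successful one.
function simulateChunkWrites(docCount, docBytes, maxChunkBytes, canSplit) {
  const step = ATTEMPT_FRACTION * maxChunkBytes;
  let nextAttempt = step; // first attempt at ~20% of the max chunk size
  let written = 0;
  const attempts = [];
  for (let i = 1; i <= docCount; i++) {
    written += docBytes;
    if (written >= nextAttempt) {
      const ok = canSplit(written);
      attempts.push({ docs: i, bytes: written, split: ok });
      if (ok) return attempts; // chunk is split; this model stops here
      nextAttempt += step;     // otherwise try again another 20% later
    }
  }
  return attempts;
}

// Case 1: split points are found at the first attempt.
const eager = simulateChunkWrites(1010010, 88, MAX_CHUNK_BYTES, () => true);
console.log("first attempt:", eager[0]);

// Case 2: no valid split points are ever found, so the model keeps retrying.
const never = simulateChunkWrites(1010010, 88, MAX_CHUNK_BYTES, () => false);
console.log("attempts without a split:", never.length);
```

In this model the first attempt already happens at roughly 12.8MB written, which is why a split well before 64MB is not surprising. The real heuristic is more involved, so treat the numbers as an order-of-magnitude guide only.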

If you do not want this to happen, you can start the mongos you will be using to write data with the --noAutoSplit option. Alternatively, you could run with a config server down, which makes the metadata read only, so every attempted split fails. Neither is generally recommended for production, but they can sometimes be needed for testing purposes.

Adam C