Menu Close

My first MongoDB performance investigation

I was just about to go into a meeting with two chaps from MongoDB when I was asked by a colleague to help with a slow query on MongoDB. This was the first time I have had to investigate a slow query on MongoDB, so I was not exactly certain where to start looking.  I decided to start by asking the usual questions to try and narrow down what was going on, such as which environment, which shard was being hit, what the query was and good and bad times and timings. According to the application team, the query having the problem appeared to be the following:

db.getCollection('<collection>').find({"$or" : [{ "<sharding column>" : { "$gt" : 0 },"<date column>" : { "$lte" : new Date(<some date>),"$gte" : new Date(-<some other date>) }, "<ID column>" : {"$in" : ["<some ID>"] } }, { "<sharding column>" : 0,"<ID column>" : { "$gt" : "<some ID>", "$in" :["<some ID>"] }, "<date column>" : { "$lte" : new Date(<some date>), "$gte" : new Date(-<some other date>) } }, { "$and" : [{ "<sharding column>" : 0 }, { "<ID column>" : "<some ID>" }, { "<date column>" : { "$lte" : new Date(<some date>), "$gt" :ISODate("<some ISODate>") } }, { "<ID column>" : {"$in" : ["<some ID>"] } }] }] }) 

Due to the ‘”<sharding column>” : { “$gt” : 0 }’ section, this meant that it would hit every shard which is not a great idea, but can be necessary. This makes life rather difficult as, at least in 3.4, the MongoS does not log slow queries, only the shards.  I would therefore need to look at each shard’s logs individually to work out what was going on, or at least so I thought. However it was then determined by the product that this was not client impacting and so I suggested that we leave the investigation to the next morning.

I therefore left the investigation and went to the meeting with the MongoDB chaps. Both were interesting and knew a lot about MongoDB and its future. We then went for drinks and dinner. One of the chaps was American and so we had some interesting discussions about gun control, Donald Trump and Brexit.  The following morning I started looking at the problem again. The first thing I did was use one of the many JSON formatters, https://jsonformatter.curiousconcept.com/ in this case, to make the relevant section of the query rather more easy to read:

{
   "$or":[
      {
         "<sharding column>":{
            "$gt":0
         },
         "<date column>":{
            "$lte":new Date(<some date>),
            "$gte":new Date(-<some other date>)
         },
         "<ID column>":{
            "$in":[
               "<some ID>"
            ]
         }
      },
      {
         "<sharding column>":0,
         "<ID column>":{
            "$gt":"<some ID>",
            "$in":[
               "<some ID>"
            ]
         },
         "<date column>":{
            "$lte":new Date(<some date>),
            "$gte":new Date(-<some other date>)
         }
      },
      {
         "$and":[
            {
               "<sharding column>":0
            },
            {
               "<ID column>":"<some ID>"
            },
            {
               "<date column>":{
                  "$lte":new Date(<some date>),
                  "$gt":ISODate("<some ISODate>")
               }
            },
            {
               "<ID column>":{
                  "$in":[
                     "<some ID>"
                  ]
               }
            }
         ]
      }
   ]
}

The first thing I spied was the second and third sections of the “OR” had a predicate that was impossible to match – the ID being both greater than <some ID> and also IN <some ID>. From subsequent discussions, it would appear that this may well be an artefact of the application logging, rather than a problem with the actual query.  Although I had gone through various MongoDB courses including query performance tuning, I was unsure of the performance related commands, but I did remember the “explain()” command, which you simply append to the query. This resulted in a rather large output as there are eleven shards and each one gets a section in the output. At the top is a nice section containing the parsed query which is even more easily understood than the formatted version above. Not only that, it is actually what MongoDB understands the query to be, rather than what I might think the query to be:

MongoDB Enterprise mongos> <query>.explain().queryPlanner.winningPlan.shards[0].parsedQuery
{
        "$or" : [
                {
                        "$and" : [
                                {
                                        "<sharding column>" : {
                                                "$eq" : 0
                                        }
                                },
                                {
                                        "<ID column>" : {
                                                "$eq" : "<some ID>"
                                        }
                                },
                                {
                                        "<ID column>" : {
                                                "$eq" : "<some ID>"
                                        }
                                },
                                {
                                        "<date column>" : {
                                                "$lte" : ISODate("1970-01-01T00:00:00Z")
                                        }
                                },
                                {
                                        "<date column>" : {
                                                "$gt" : ISODate("<some ISODate last year>")
                                        }
                                }
                        ]
                },
                {
                        "$and" : [
                                {
                                        "<sharding column>" : {
                                                "$eq" : 0
                                        }
                                },
                                {
                                        "<ID column>" : {
                                                "$eq" : "<some ID>"
                                        }
                                },
                                {
                                        "<date column>" : {
                                                "$lte" : ISODate("1970-01-01T00:00:00Z")
                                        }
                                },
                                {
                                        "<ID column>" : {
                                                "$gt" : "<some ID>"
                                        }
                                },
                                {
                                        "<date column>" : {
                                                "$gte" : ISODate("1970-01-01T00:00:00Z")
                                        }
                                }
                        ]
                },
                {
                        "$and" : [
                                {
                                        "<ID column>" : {
                                                "$eq" : "<some ID>"
                                        }
                                },
                                {
                                        "<date column>" : {
                                                "$lte" : ISODate("1970-01-01T00:00:00Z")
                                        }
                                },
                                {
                                        "<sharding column>" : {
                                                "$gt" : 0
                                        }
                                },
                                {
                                        "<date column>" : {
                                                "$gte" : ISODate("1970-01-01T00:00:00Z")
                                        }
                                }
                        ]
                }
        ]
}

One thing of note in the broader output is that there were apparently no rejected plans:

MongoDB Enterprise mongos> <query>.explain().queryPlanner.rejectedPlans
MongoDB Enterprise mongos>

This seems strange given there are two possible indexes of use as well as a collection scan:

MongoDB Enterprise mongos> db.<collection>.getIndexes();
[
        {
                "v" : 2,
                "key" : {
                        "<sharding column>" : 1,
                        "<ID column>" : 1,
                        "<date column>" : 1
                },
                "name" : "<some name>",
                "ns" : "<database>.<collection>"
        },
        {
                "v" : 2,
                "key" : {
                        "<date column>" : 1,
                        "<ID column>" : 1
                },
                "name" : "<some other name>",
                "ns" : "<database>.<collection>",
                "background" : true
        },
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "<some other name still>",
                "ns" : "<database>.<collection>"
        }
]

I left this observation for later investigations and looked at the winning plan:

MongoDB Enterprise mongos> <query>.explain().queryPlanner.winningPlan.shards[0].winningPlan
{
        "stage" : "SUBPLAN",
        "inputStage" : {
                "stage" : "SHARDING_FILTER",
                "inputStage" : {
                        "stage" : "OR",
                        "inputStages" : [
                                {
                                        "stage" : "FETCH",
                                        "filter" : {
                                                "<sharding column>" : {
                                                        "$eq" : 0
                                                }
                                        },
                                        "inputStage" : {
                                                "stage" : "IXSCAN",
                                                "keyPattern" : {
                                                        "<date column>" : 1,
                                                        "<ID column>" : 1
                                                },
                                                "indexName" : "<index on date and id>",
                                                "isMultiKey" : false,
                                                "multiKeyPaths" : {
                                                        "<date column>" : [ ],
                                                        "<ID column>" : [ ]
                                                },
                                                "isUnique" : false,
                                                "isSparse" : false,
                                                "isPartial" : false,
                                                "indexVersion" : 2,
                                                "direction" : "forward",
                                                "indexBounds" : {
                                                        "<date column>" : [ ],
                                                        "<ID column>" : [
                                                                "[\"<some ID>\", \"<some ID>\"]"
                                                        ]
                                                }
                                        }
                                },
                                {
                                        "stage" : "FETCH",
                                        "filter" : {
                                                "<sharding column>" : {
                                                        "$eq" : 0
                                                }
                                        },
                                        "inputStage" : {
                                                "stage" : "IXSCAN",
                                                "keyPattern" : {
                                                        "<date column>" : 1,
                                                        "<ID column>" : 1
                                                },
                                                "indexName" : "<index on date and id>",
                                                "isMultiKey" : false,
                                                "multiKeyPaths" : {
                                                        "<date column>" : [ ],
                                                        "<ID column>" : [ ]
                                                },
                                                "isUnique" : false,
                                                "isSparse" : false,
                                                "isPartial" : false,
                                                "indexVersion" : 2,
                                                "direction" : "forward",
                                                "indexBounds" : {
                                                        "<date column>" : [
                                                                "[new Date(0), new Date(0)]"
                                                        ],
                                                        "<ID column>" : [ ]
                                                }
                                        }
                                },
                                {
                                        "stage" : "FETCH",
                                        "filter" : {
                                                "<sharding column>" : {
                                                        "$gt" : 0
                                                }
                                        },
                                        "inputStage" : {
                                                "stage" : "IXSCAN",
                                                "keyPattern" : {
                                                        "<date column>" : 1,
                                                        "<ID column>" : 1
                                                },
                                                "indexName" : "<index on date and id>",
                                                "isMultiKey" : false,
                                                "multiKeyPaths" : {
                                                        "<date column>" : [ ],
                                                        "<ID column>" : [ ]
                                                },
                                                "isUnique" : false,
                                                "isSparse" : false,
                                                "isPartial" : false,
                                                "indexVersion" : 2,
                                                "direction" : "forward",
                                                "indexBounds" : {
                                                        "<date column>" : [
                                                                "[new Date(0), new Date(0)]"
                                                        ],
                                                        "<ID column>" : [
                                                                "[\"<some ID>\", \"<some ID>\"]"
                                                        ]
                                                }
                                        }
                                }
                        ]
                }
        }
}

Each section of the “OR” is using the same index which led me to think that possibly MongoDB could only use one index to satisfy such a query, but subsequent investigations showed this to be incorrect. Another point is that even though there is an index on all the predicates including the <sharding column> and there is an equality predicate for it in two of the “OR” sections, the index chosen does not include the <sharding column>.  This seemed rather strange.  I looked at the MongoDB documentation for some more pointers as I knew I could get execution statistics. To get these you need to pass the parameter “allPlansExecution” to explain: However the query I was given seemed to be a bad example as it returned no rows:

MongoDB Enterprise mongos> <query>.explain("allPlansExecution").executionStats.nReturned
0

This was all getting a bit confusing.  By this time I had been contacted by the development team and so I asked them to provide a better example.  They had observed that when they added a projection for all the columns in the query, it ran much faster. They also gave me appropriate examples of the query. These versions did indeed return a large number of rows, about 300,000. When I tested it, the version with the project did indeed run much faster than the version without, but then I tested the query without the projection first and as both returned the same results, these would be in cache. The execution statistics showed that both versions were doing exactly the same work, so it was probably that the second execution was finding all the data in cache. A second execution of the version without the projection ran in a similar amount of time to the version with the projection.

To check whether I needed to look at all shards or just a subset, I looked at which shards were returning most of the data:

MongoDB Enterprise mongos> <query>.explain("allPlansExecution").executionStats.executionStages.shards.forEach(function(shd) { print(shd.shardName + ": " + shd.executionStages.nReturned + " in " + shd.executionStages.executionTimeMillisEstimate + "ms");}); 
<shard name>: 0 in 10ms 
<shard name>: 0 in 0ms 
<shard name>: 287883 in 2442ms 
<shard name>: 0 in 10ms 
<shard name>: 0 in 0ms 
<shard name>: 0 in 11ms 
<shard name>: 0 in 10ms 
<shard name>: 0 in 0ms 
<shard name>: 0 in 10ms 
<shard name>: 0 in 10ms 
<shard name>: 42140 in 162ms

Clearly then, only one shard needs to be looked at.  The relevant section of the execution plan for that shard is displayed below, although this is taken from some time after the original run and so the number of rows and more importantly the timings are different. In the original run it took 98 seconds to return.  Note also that the data was changing somewhat, so the number of rows returned does vary between executions.

{
        "shardName" : "<shard name>",
        "executionSuccess" : true,
        "executionStages" : {
                "stage" : "SUBPLAN",
                "nReturned" : 287895,
                "executionTimeMillisEstimate" : 3185,
                "works" : 576456,
                "advanced" : 287895,
                "needTime" : 288560,
                "needYield" : 0,
                "saveState" : 4529,
                "restoreState" : 4529,
                "isEOF" : 1,
                "invalidates" : 0,
                "inputStage" : {
                        "stage" : "FETCH",
                        "nReturned" : 287895,
                        "executionTimeMillisEstimate" : 3185,
                        "works" : 576455,
                        "advanced" : 287895,
                        "needTime" : 288560,
                        "needYield" : 0,
                        "saveState" : 4518,
                        "restoreState" : 4518,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "docsExamined" : 287895,
                        "alreadyHasObj" : 0,
                        "inputStage" : {
                                "stage" : "SHARDING_FILTER",
                                "nReturned" : 287895,
                                "executionTimeMillisEstimate" : 874,
                                "works" : 576455,
                                "advanced" : 287895,
                                "needTime" : 288560,
                                "needYield" : 0,
                                "saveState" : 4518,
                                "restoreState" : 4518,
                                "isEOF" : 1,
                                "invalidates" : 0,
                                "chunkSkips" : 0,
                                "inputStage" : {
                                        "stage" : "SORT_MERGE",
                                        "nReturned" : 287895,
                                        "executionTimeMillisEstimate" : 322,
                                        "works" : 576455,
                                        "advanced" : 287895,
                                        "needTime" : 288560,
                                        "needYield" : 0,
                                        "saveState" : 4518,
                                        "restoreState" : 4518,
                                        "isEOF" : 1,
                                        "invalidates" : 0,
                                        "sortPattern" : {
                                                "<sharding column>" : 1,
                                                "<ID column>" : 1,
                                                "<date column>" : 1
                                        },
                                        "dupsTested" : 287895,
                                        "dupsDropped" : 0,
                                        "inputStages" : [
                                                {
                                                        "stage" : "IXSCAN",
                                                        "nReturned" : 0,
                                                        "executionTimeMillisEstimate" : 0,
                                                        "works" : 1,
                                                        "advanced" : 0,
                                                        "needTime" : 0,
                                                        "needYield" : 0,
                                                        "saveState" : 4518,
                                                        "restoreState" : 4518,
                                                        "isEOF" : 1,
                                                        "invalidates" : 0,
                                                        "keyPattern" : {
                                                                "<sharding column>" : 1,
                                                                "<ID column>" : 1,
                                                                "<date column>" : 1
                                                        },
                                                        "indexName" : "<sharding column, date and id>",
                                                        "isMultiKey" : false,
                                                        "multiKeyPaths" : {
                                                                "<sharding column>" : [ ],
                                                                "<ID column>" : [ ],
                                                                "<date column>" : [ ]
                                                        },
                                                        "isUnique" : false,
                                                        "isSparse" : false,
                                                        "isPartial" : false,
                                                        "indexVersion" : 2,
                                                        "direction" : "forward",
                                                        "indexBounds" : {
                                                                "<sharding column>" : [
                                                                        "[0.0, 0.0]"
                                                                ],
                                                                "<ID column>" : [ ],
                                                                "<date column>" : [
                                                                        "[new Date(<small date>), new Date(<large date>)]"
                                                                ]
                                                        },
                                                        "keysExamined" : 0,
                                                        "seeks" : 1,
                                                        "dupsTested" : 0,
                                                        "dupsDropped" : 0,
                                                        "seenInvalidated" : 0
                                                },
                                                {
                                                        "stage" : "IXSCAN",
                                                        "nReturned" : 287895,
                                                        "executionTimeMillisEstimate" : 171,
                                                        "works" : 287896,
                                                        "advanced" : 287895,
                                                        "needTime" : 0,
                                                        "needYield" : 0,
                                                        "saveState" : 4518,
                                                        "restoreState" : 4518,
                                                        "isEOF" : 1,
                                                        "invalidates" : 0,
                                                        "keyPattern" : {
                                                                "<sharding column>" : 1,
                                                                "<ID column>" : 1,
                                                                "<date column>" : 1
                                                        },
                                                        "indexName" : "<sharding column, date and id>",
                                                        "isMultiKey" : false,
                                                        "multiKeyPaths" : {
                                                                "<sharding column>" : [ ],
                                                                "<ID column>" : [ ],
                                                                "<date column>" : [ ]
                                                        },
                                                        "isUnique" : false,
                                                        "isSparse" : false,
                                                        "isPartial" : false,
                                                        "indexVersion" : 2,
                                                        "direction" : "forward",
                                                        "indexBounds" : {
                                                                "<sharding column>" : [
                                                                        "[0.0, 0.0]"
                                                                ],
                                                                "<ID column>" : [
                                                                        "[\"<some ID>\", \"<some ID>\"]"
                                                                ],
                                                                "<date column>" : [
                                                                        "[new Date(<small date>), new Date(<large date>)]"
                                                                ]
                                                        },
                                                        "keysExamined" : 287895,
                                                        "seeks" : 1,
                                                        "dupsTested" : 0,
                                                        "dupsDropped" : 0,
                                                        "seenInvalidated" : 0
                                                },
                                                {
                                                        "stage" : "IXSCAN",
                                                        "nReturned" : 0,
                                                        "executionTimeMillisEstimate" : 10,
                                                        "works" : 663,
                                                        "advanced" : 0,
                                                        "needTime" : 662,
                                                        "needYield" : 0,
                                                        "saveState" : 4518,
                                                        "restoreState" : 4518,
                                                        "isEOF" : 1,
                                                        "invalidates" : 0,
                                                        "keyPattern" : {
                                                                "<sharding column>" : 1,
                                                                "<ID column>" : 1,
                                                                "<date column>" : 1
                                                        },
                                                        "indexName" : "<sharding column, date and id>",
                                                        "isMultiKey" : false,
                                                        "multiKeyPaths" : {
                                                                "<sharding column>" : [ ],
                                                                "<ID column>" : [ ],
                                                                "<date column>" : [ ]
                                                        },
                                                        "isUnique" : false,
                                                        "isSparse" : false,
                                                        "isPartial" : false,
                                                        "indexVersion" : 2,
                                                        "direction" : "forward",
                                                        "indexBounds" : {
                                                                "<sharding column>" : [
                                                                        "(0.0, inf.0]"
                                                                ],
                                                                "<ID column>" : [
                                                                        "[\"<some ID>\", \"<some ID>\"]"
                                                                ],
                                                                "<date column>" : [
                                                                        "[new Date(<small date>), new Date(<large date>)]"
                                                                ]
                                                        },
                                                        "keysExamined" : 662,
                                                        "seeks" : 663,
                                                        "dupsTested" : 0,
                                                        "dupsDropped" : 0,
                                                        "seenInvalidated" : 0
                                                }
                                        ]
                                }
                        }
                }
        }
}

Reading this from the bottom up, we have a number of input stages:

1. IXSCAN: An index is scanned for each of the three sections of the OR.
2. SORT_MERGE: The results of the index scans are sorted and merged.
3. SHARDING_FILTER: The documentation is not overly clear on this section, but it appears to be the removal of documents that should not be on the relevant shard because of instance chunk migrations (https://stackoverflow.com/questions/42387742/mongodb-sharding-filter-in-plan)
4. FETCH: The documents are retrieved from the collection.  This was taking 96 of the 98 seconds in the first execution.
5. SUBPLAN: I am unsure what this stage is and the documentation is not clear. However it does not appear to take any time, although the saveState and restoreStage counters are incremented.

It would seem therefore that the query is running slowly because it is returning a large amount of data.  If for instance “limit(100)” is added to the query, then only the first 100 matches are returned from each of the three index accesses and so the number of documents accessed in the FETCH stage is vastly reduced and the query runs much faster.  I passed this information on and I thought that was the end of the investigation, but later that afternoon I was contacted by the developers again saying that they were getting strange results from the query.  From their testing and some additional test of mine it seemed that with “explain” appended to it, it returned very quickly, with “limit” appended it returned reasonably well as did it with “explain(‘allPlansExecution’)”, however it was taking ages otherwise. I thought this looked like an effect of the query plan cache and so started investigating that. Again, my first port of call was the documentation. The relevant call to look up a query in the plan cache is:

db.<collection>.getPlanCache().getPlansByQuery(<query>,<sort>,<projection>)

I plugged in the query but got no results.  What did that mean? The documentation states that only queries with more than one viable plan are stored in the query cache. However what does viable mean? I set about performing a few tests using the helpful “db.<collection>.getPlanCache().listQueryShapes()” command which is also useful when trying to get the syntax right for the “getPlansByQuery” command. This determined a few things:

  • The “limit” function is ignored for the purposes of the plan cache. A query with or without it will generate the same plan as far as the plan cache is concerned.  It does however have an affect on how fast the query is.
  • The query “shape” is what is important, not the exact query text. The exact meaning of this is unclear, but it is made up of the query section, the sort section and the projection section. Modifying the order of the columns in the sort will generate a new plan as will modifying the direction, ascending or descending. However modifying the order of the clauses in an OR statement will not.
  • If you specify “explain” no plan will be added to the plan cache and the plan cache is completely bypassed.
  • There are some optimisations that go on when you specify “explain(‘allPlansExecution’)”. With it specified, a query may complete even though without it it would have exceeded the 32MB sort limit. In such a case the time taken to return the result with the explain specified was significantly smaller than the time taken to produce the error without explain specified.

Much as this determined a number of things about the plan cache, it did not determine what “viable” meant, nor did it help much with tuning the query. It did however explain the strange behaviour observed by the development team.  I tried several more attempts to get the behaviour of the unmodified query the product was suggesting, but I always got quick results with “limit” specified, whether or not a projection was specified. I also frequently exceeded the 32MB sort limit unless I specified a limit or used “explain(‘allPlansExecution’)”. My testing was of course not helped by the fact that after running the query once the data was then in the cache.  After failing to get any more useful information from my testing and the plan cache, I remembered that there was a slow query log and so set about discovering how to use that.   This would at least remove the issues around the cache as the product were reporting frequent slow runs.  I took a look at the documentation on profiling and had a go. I very quickly discovered that:

  • Profiling can only be enabled on a shard, not at the MongoS level.

This is of course rather annoying when you have 11 shards. However,  I knew that for the particular version of the query I was testing, one shard did most of the work, and so I enabled profiling on that shard. After a rather frustrating period, I discovered the following:

  • Profiling is enabled on a per database level, not system wide.

This little gem of information can save you wasting quite a bit of time trying to understand why you were just not capturing any executions.

There are three levels for the profiler:
0 Off
1 Enabled with a “slowms” filter, recording only those executions taking “slowms” milliseconds or longer
2 Enabled for all executions

Once I had determined how to get the profiler working I thought I would get the product to try a few of their slow queries so that I could actually capture what was going on, rather than simply trying my own version of a query. This however meant that I had to enable profiling on all the shards. I decided that logging on and doing this for each one was rather tedious and so I created a script to enable and disable profiling as well as one to get the results.

This generated some very useful information about executions, but there is a lot of it and of course it is only for executions that took longer than “slowms”. The output is per shard, so if it takes above “slowms” on one shard, but below it on all the others, you will only get output from the shard for which it executed for longer than “slowms”. The profiler information proved very useful. The relevant two sections are given below:

# Execution without projection
{
	"stage" : "FETCH",
	"filter" : {
		"<sharding column>" : {
			"$gt" : 0
		}
	},
	"nReturned" : 102672,
	"executionTimeMillisEstimate" : 52602,
	"works" : 939414,
	"advanced" : 102672,
	"needTime" : 836741,
	"needYield" : 0,
	"saveState" : 7779,
	"restoreState" : 7778,
	"isEOF" : 1,
	"invalidates" : 0,
	"docsExamined" : 102672,
	"alreadyHasObj" : 0,
	"inputStage" : {
		"stage" : "IXSCAN",
		"nReturned" : 102672,
		"executionTimeMillisEstimate" : 18848,
		"works" : 939414,
		"advanced" : 102672,
		"needTime" : 836741,
		"needYield" : 0,
		"saveState" : 7779,
		"restoreState" : 7778,
		"isEOF" : 1,
		"invalidates" : 0,
		"keyPattern" : {
			"<date column>" : 1,
			"<ID column>" : 1
		},
		"indexName" : "<date and id>",
		"isMultiKey" : false,
		"multiKeyPaths" : {
			"<date column>" : [ ],
			"<ID column>" : [ ]
		},
		"isUnique" : false,
		"isSparse" : false,
		"isPartial" : false,
		"indexVersion" : 2,
		"direction" : "forward",
		"indexBounds" : {
			"<date column>" : [
				"[new Date(-<some other date>), new Date(<some date>)]"
			],
			"<ID column>" : [
				"[\"48947638\", \"48947638\"]"
			]
		},
		"keysExamined" : 939413,
		"seeks" : 836742,
		"dupsTested" : 0,
		"dupsDropped" : 0,
		"seenInvalidated" : 0
	}
}

Do not be concerned about the values given in the “indexBounds” section, valid date values were used. In the query section prior to the above you see dates specified as “ISODate(“0NaN-NaN-NaNTNaN:NaN:NaNZ”)” which again look like a problem, but these would appear to simply be an artefact of the profiler. If we now look at what is happening when a projection is used, we see the following:

# Execution with projection
{
	"stage" : "IXSCAN",
	"nReturned" : 5000,
	"executionTimeMillisEstimate" : 0,
	"works" : 5575,
	"advanced" : 5000,
	"needTime" : 575,
	"needYield" : 0,
	"saveState" : 84,
	"restoreState" : 83,
	"isEOF" : 0,
	"invalidates" : 0,
	"keyPattern" : {
		"<sharding column>" : 1,
		"<ID column>" : 1,
		"<date column>" : 1
	},
	"indexName" : "<sharding column, date and id>",
	"isMultiKey" : false,
	"multiKeyPaths" : {
		"<sharding column>" : [ ],
		"<ID column>" : [ ],
		"<date column>" : [ ]
	},
	"isUnique" : false,
	"isSparse" : false,
	"isPartial" : false,
	"indexVersion" : 2,
	"direction" : "forward",
	"indexBounds" : {
		"<sharding column>" : [
			"(0, inf.0]"
		],
		"<ID column>" : [
			"[\"48947638\", \"48947638\"]"
		],
		"<date column>" : [
			"[new Date(-<some other date>), new Date(<some date>)]"
		]
	},
	"keysExamined" : 5575,
	"seeks" : 576,
	"dupsTested" : 0,
	"dupsDropped" : 0,
	"seenInvalidated" : 0
}

Both of these sections are for the same part of the “OR”.  There are two important points to note. Firstly the execution without projection uses an index on “<date column>” and “<ID column>” and then filters on the “<sharding column>”. The execution with projection however just used an index on “<sharding column>”, “<ID column>” and “<date column>”. Secondly, the query without projection returned all 102,672 documents whereas the query with projection returned 5,000. Given that the limit for this query was 10,000, returning 5,000 would seem to be a strange thing to do, but there is also the batch size which was set to 5,000. In the query without projection, the limit, or rather the batch size, was not applied until after all the documents have been read from the collection and were being sorted

If we look at the output from an “explain(‘allPlansExecution’)” run of the query without projection, the relevant section is:

{
	"stage" : "IXSCAN",
	"nReturned" : 10000,
	"executionTimeMillisEstimate" : 20,
	"works" : 10575,
	"advanced" : 10000,
	"needTime" : 575,
	"needYield" : 0,
	"saveState" : 161,
	"restoreState" : 161,
	"isEOF" : 0,
	"invalidates" : 0,
	"keyPattern" : {
		"<sharding column>" : 1,
		"<ID column>" : 1,
		"<date column>" : 1
	},
	"indexName" : "<sharding column, date and id>",
	"isMultiKey" : false,
	"multiKeyPaths" : {
		"<sharding column>" : [ ],
		"<ID column>" : [ ],
		"<date column>" : [ ]
	},
	"isUnique" : false,
	"isSparse" : false,
	"isPartial" : false,
	"indexVersion" : 2,
	"direction" : "forward",
	"indexBounds" : {
		"<sharding column>" : [
			"(0.0, inf.0]"
		],
		"<ID column>" : [
			"[\"<some id value>\", \"<some id value>\"]"
		],
		"<date column>" : [
			"[new Date(-<some date>), new Date(<some date>)]"
		]
	},
	"keysExamined" : 10575,
	"seeks" : 576,
	"dupsTested" : 0,
	"dupsDropped" : 0,
	"seenInvalidated" : 0
}

Clearly a different result to that of the profiler, which explains a lot.  Much as the profiler left a number of questions, it did provide the answer to why it was taking so long to access the data without a projection as the product has been saying. The solution was therefore to add a hint:

<query>.hint({"<sharding column>" : 1,"<ID column>" : 1,"<date column>" : 1})

With the hint in place, the queries with or without projection used the specified index and ran very fast. Note that it appears you can only hint a query to use one index. This may cause problems if different parts of a query run fastest when using different indexes.

Having done some further digging and confirming a few things with MongoDB support, the answers to some of the questions arising from this investigation are:

  • The limit clause is only taken into account when generating the original execution plan. As the limit clause does not affect the query “shape”, changing or removing the limit clause does not change the plan chosen.
  • When “explain()” and a number of other functions are used, the query cache is bypassed completely. It is neither read from nor written to. Therefore explain can give a different winning plan to an actual execution of the query.
  • The reason I was failing to get any useful information from getPlansByQuery about the plans being used was down to a bug that is not fixed until 4.0. If the query starts with an “$or” (rooted $or as the support chap put it) then the command will not return any results. This does not however affect the actual query planner, only the output of the getPlansByQuery command.

As one final point, the MongoDB support chap did point me at a very insightful slideshare detailing how the MongoDB query planner works: