Symptom
- Baseline created from AWR is not used even though the relevant plan had been used recently.
Cause
- Baseline was created from plan generated in 11.2.0.2
Investigation
I was asked by a colleague to look at a query in a QA database that had changed plan; a baseline created from the previous plan did not seem to be working. The baseline had subsequently been dropped, so my first task was to recreate it. The SQL was a relatively simple “insert into … select … from”. It did have a WITH clause and a UNION, but otherwise was not too complex.
I took a look at the AWR history to find when the plan changed and determined a snapshot when the previous plan had run. I then created a baseline from it using the baseline work instruction I created for the team. With that done, I ran an explain of the SQL with a 10053 trace running. The explain plan did not use the baseline and so I took a look at the trace file to see if I could work out why. The baseline was identified as I could see the following line:
SPM: statement found in SMB
However it was not used as it could not be reproduced:
SPM: failed to reproduce selected accepted plan on re-parse, planId = ...
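For reference, the recreation and trace steps above can be sketched roughly as follows, assuming the usual AWR → SQL tuning set → SPM route (I am not reproducing the team's exact work instruction); the tuning set name, snapshot numbers and sql_id are placeholders:

```sql
-- Sketch only: load the historical plan from AWR into a SQL tuning set,
-- then into a baseline, then explain with a 10053 trace running.
-- 'MY_STS', the snapshot range and <sql_id> are placeholders.

exec dbms_sqltune.create_sqlset(sqlset_name => 'MY_STS');

declare
  cur dbms_sqltune.sqlset_cursor;
begin
  open cur for
    select value(p)
    from   table(dbms_sqltune.select_workload_repository(
                   begin_snap   => 1234,
                   end_snap     => 1235,
                   basic_filter => q'[sql_id = '<sql_id>']')) p;
  dbms_sqltune.load_sqlset(sqlset_name => 'MY_STS', populate_cursor => cur);
end;
/

declare
  plans pls_integer;
begin
  plans := dbms_spm.load_plans_from_sqlset(sqlset_name => 'MY_STS');
end;
/

-- Explain the statement with the optimizer (10053) trace enabled.
alter session set events '10053 trace name context forever, level 1';
explain plan for <original SQL>;
alter session set events '10053 trace name context off';
```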
I therefore took a look at the hints that made up the baseline:
...
hint num 1 len 27 text: IGNORE_OPTIM_EMBEDDED_HINTS
hint num 2 len 37 text: OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
hint num 3 len 22 text: DB_VERSION('11.2.0.2')
hint num 4 len 8 text: ALL_ROWS
...
Hints 2 and 3 were most likely the reason that the plan could not be reproduced; I had seen this situation before. My assumption is that the plan was first captured by the AWR when the database was 11.2.0.2. The AWR only stores one copy of each execution plan, so even though the database has since been upgraded to 12.1, what is stored in the AWR will not have changed. This is, however, an assumption rather than something I have proved. I tried adding all the hints except 2 and 3, but that did not reproduce the original plan. It may be that the plan produced would have been reasonable with just those hints, but as one of the tables involved in the query was used to store data temporarily and was now empty, I had no way of knowing.
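Incidentally, the outline hints stored against a baseline can also be inspected without a 10053 trace, via DBMS_XPLAN; the handle and filter below are placeholders:

```sql
-- Find the baseline, then display it together with its outline hints.
select sql_handle, plan_name, enabled, accepted
from   dba_sql_plan_baselines
where  sql_text like '%<distinctive fragment of the SQL>%';

select * from table(
  dbms_xplan.display_sql_plan_baseline(
    sql_handle => '<sql_handle>',
    format     => 'TYPICAL +OUTLINE'));
```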
My colleague mentioned that production was performing well, so I suggested that we could create a baseline there and import it into QA. Being production, this would require a change, although it would have had no impact as it would simply force the optimiser to choose the plan it was going to choose anyway. However, having made the suggestion, I decided it would probably be easier to generate the plan on QA through hints, and so suggested we hold off. It subsequently turned out that production was using the “good” plan that QA had originally been using. As QA will have been a clone of production, it is almost certain that any baseline created on production would have had exactly the same problem as the QA baseline.
Having suggested we hold off on the baseline creation in production, I attempted to work out why the query was running so slowly with the new plan, and so took a look at the historical SQL Monitoring reports.
Clearly the problem was with the index fast full scan. But why would performing an index fast full scan 430 times, reading 38MB and getting back 297 rows, cause a query to take almost an hour? It was only when I looked at the “Time and Wait Statistics” that I realised all the time was being spent in “PL/SQL & Java”. I was already aware that there was a function being called in the SQL, as when I had run an explain plan as my own user it complained about the function not existing. I had needed to set my current_schema to get round this whilst still keeping the original SQL text. If we look at the predicate information for the INDEX FAST FULL SCAN in the execution plan that was being used, we can see how this has come about:
filter("<alias>"."<column>"=DECODE(:B1,'<column>',:B2,'<other column>',"<function>"(:B3)))
This is a “filter” rather than an “access” condition, meaning that all the rows are brought back from the index and then the filter is applied. For every row, the DECODE, and possibly the function, will be executed. This is why the execution is taking so long: not because the database is performing too many logical or physical I/Os, but because the function is relatively expensive and the optimiser has not taken this into account.
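As an aside, the current_schema workaround mentioned earlier amounts to no more than this; the schema name is a placeholder:

```sql
-- Run the explain as your own user but resolve objects (including the
-- function) in the owning schema; <schema> is a placeholder.
alter session set current_schema = <schema>;
explain plan for <original SQL>;
```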
With this identified, I set about trying to force the optimiser to pick the good plan by adding some hints to the SQL. As it was an INSERT and contained a WITH clause and a UNION, I knew I would most likely need hints with query block names, but I wanted to start with simple hints and so added two “FULL” hints to the two parts of the query joined by the UNION. This was to stop the “INDEX FULL SCAN”s of the first table seen in the SQL Monitoring report above. This got me a much closer plan, but it still had an INDEX FAST FULL SCAN rather than the INDEX RANGE SCAN I was aiming for.
...
| 5 | HASH JOIN | | 1 | 60 | 279 (2)| 00:00:01 |
| 6 | TABLE ACCESS FULL | <table 1> | 1 | 24 | 2 (0)| 00:00:01 |
| 7 | TABLE ACCESS FULL | <table 2> | 85760 | 3015K| 276 (2)| 00:00:01 |
| 8 | INDEX FAST FULL SCAN| <index> | 1 | 8 | 2 (0)| 00:00:01 |
... (repeated for the second section)
Rather than continue inserting hints into the individual query blocks, given previous experience, I decided to move the hints to the top level. I therefore ran an explain plan and used the call below to get the query block names. Note that this is what is in @?/rdbms/admin/utlxpls but with “serial” replaced with “ALL”.
select plan_table_output from table(dbms_xplan.display('plan_table',null,'ALL'));
This gives you all the usual plan information plus the query block names:
Query Block Name / Object Alias (identified by operation id):
1 - SET$1
4 - SEL$FD9F4DFE
6 - SEL$FD9F4DFE / <table 1>@SEL$1
7 - SEL$FD9F4DFE / <table 2>@SEL$2
8 - SEL$3 / <table alias/name>@SEL$3
9 - SEL$5BF935F8
11 - SEL$5BF935F8 / <table 1>@SEL$4
12 - SEL$5BF935F8 / <table 2>@SEL$2
13 - SEL$5 / <table alias/name>@SEL$5
I used this information to convert my two “FULL” hints, previously inserted into the appropriate selects, into query-block-named hints placed at the top of the query, immediately after the “INSERT”:
-- Original form inserted into relevant select statements
FULL(<table alias>)
-- New form immediately after INSERT key word
FULL(@"SEL$FD9F4DFE" "<table 1>"@"SEL$1")
FULL(@"SEL$5BF935F8" "<table 1>"@"SEL$4")
As I had hoped, this produced exactly the same execution plan. I then needed to reverse the join order of two of the tables to match the original plan. I could have inserted a USE_HASH(<table 2>, <table 1>) hint, but that would simply tell the optimiser to use a hash join for those tables, not to join them in the order specified. I could have used a SWAP_JOIN_INPUTS hint, but I decided to go with the more common LEADING hint and so added the following to the SQL:
leading(@"SEL$FD9F4DFE" <table 2>@SEL$2 "<table 1>"@"SEL$1")
leading(@"SEL$5BF935F8" <table 2>@SEL$2 "<table 1>"@"SEL$4")
This swapped the join order and so I was then only left with changing the INDEX FAST FULL SCANs into INDEX RANGE SCANs. I therefore added the following two hints:
INDEX_RS_ASC(@SEL$3 <table alias/name>@SEL$3)
INDEX_RS_ASC(@SEL$5 <table alias/name>@SEL$5)
With all six hints now specified, the query was using the original plan. All that was left was to make the actual query use the good plan. I could have used a baseline for this, but that would have required executing the SQL. As it was an insert, and I hate doing anything that creates, deletes or modifies actual data, even if I then immediately roll it back, I decided to use a SQL patch instead. I therefore created the following anonymous block:
begin
  sys.dbms_sqldiag_internal.i_create_patch(
    sql_text  => q'[<full SQL text>]',
    hint_text => q'[ FULL(@"SEL$5BF935F8" "<table 1>"@"SEL$4") FULL(@"SEL$FD9F4DFE" "<table 1>"@"SEL$1") leading(@"SEL$FD9F4DFE" <table 2>@SEL$2 "<table 1>"@"SEL$1") leading(@"SEL$5BF935F8" <table 2>@SEL$2 "<table 1>"@"SEL$4") INDEX_RS_ASC(@SEL$3 <table alias/name>@SEL$3) INDEX_RS_ASC(@SEL$5 <table alias/name>@SEL$5)]',
    name      => '<some name>');
end;
/
I ran that (it must be done as SYSDBA) and then ran an explain of the original SQL, which showed the SQL patch being picked up. I then waited for the next execution, which did indeed use the patch. My colleague then created a baseline from that execution.
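Checking that the patch exists and is being picked up can be done along these lines, assuming the standard dictionary view and plan output; the name matches whatever was supplied to i_create_patch:

```sql
-- Confirm the patch was created...
select name, status from dba_sql_patches where name = '<some name>';

-- ...and that the plan picks it up: the Note section of the output
-- should contain: SQL patch "<some name>" used for this statement
explain plan for <original SQL>;
select * from table(dbms_xplan.display);
```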