Recently I was called into a shop that had a 4 terabyte
database, who claimed they didn’t have anywhere near that much data. Upon
investigation, they were correct. They had about 100 gig of data, and about 3.9
T of indexes.
Is this too much? Well, in order to make that kind of
judgement call there are several factors to consider:
- Are the indexes
interfering with writes (i.e. is the number of indexes having a negative
factor on insert / update performance)?
- Are the indexes actually
being used?
- Are any indexes
redundant?
The first one (write performance) can be identified by
checking length of the write queues; almost any monitoring tool can get you
this information.
The second can be identified with a script you can find on
the web which counts the number of times an index has been accessed since the
last time the server was booted. Of course, you want to the server to have been
up long enough that the system tables are aware of weekend / month end processing.
If you can’t find one readily, you can use the one I have on my blog:
Mssqlperformance.blogspot.com
The final one is the tool I used to reduce the database size
for that customer from 4T to 300M without even doing “what’s being used”
analysis… just by eliminating redundant indexes.
Many people find a script on the web (I have one in the blog
mentioned above) which will look at the system tables and recommend missing
indexes. The problem is, it will give you 6 similar indexes rather than one
combined index. You have to do this yourself. If you automatically add all the
indexes the server suggests… well, you can end up with 4T of indexes for 100G
of data.
So how do we know what’s redundant?
Hopefully, you are old enough to remember a phone book. A
phone book is an index on last name, first name, middle name. But what other
indexes might that be used for? A lookup on just last name, first name
could use the same phone book (and often does), as would a lookup just to find
last name.
So if you have three indexes:
- Last name, first name,
middle name
- Last name, first name
- Last name
You can visually look at these and KNOW that the last 2 are
redundant because the first index can do the work that either of the second two
could do.
As you start looking at data management view (DMV)
information, you often get index recommendations along these lines:
- Create index idx on
TableA (Column1)
- Create index idx on
TableA (Column1) include (ColumnA, ColumnB)
- Create index idx on
TableA (Column1, Column2) include (ColumnC, ColumnD)
Sometimes you’ll get 8 or 10 of these.
So, index 1 can easily be eliminated because index 2 covers
it.
Index 2 can be combined into index 3 like this:
- Create index idx on
TableA (Column1, Column2) include (ColumnA, ColumnB , ColumnC, ColumnD)
So… in summary, you can pull index recommendations from the
DMVs, and they are good recommendations, but review them with an eye for detail
before automatically applying them.