Friday, December 21, 2012

performance?

A recent forum post pushed enough of my buttons that I chose to respond, and the topics themselves were interesting, so I'm copying the post here.



1) Science vs. Art

People frequently describe tuning as an art rather than a science, but what is the scientific method? You postulate a theory, test it, and re-postulate if necessary. That's exactly what we do when we tune.

2) What's a valid baseline?

I remember shocking a group of about 100 recent Ivy League grads at a talk I gave a while back when I stated that business is not like a college exam: there is no "right" or "wrong"; there is instead "the solution works" or "the solution doesn't work," and there may be multiple correct (i.e., working) answers.

In this case, you need to be able to answer questions along the lines of "Are all of the potential bottlenecks on our system wide enough to accommodate the current expected usage and the identified expected growth?" If the answer is yes, you're in good shape. If you're looking for harder numbers, my personal preference is to stay at or below 70% of resource capacity at normal peak demand; that leaves you room for unexpected peaks. If you're above 80%, you need to tune or go shopping. (This is a VERY general rule, of course, and needs to be applied with knowledge of your system.)
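To make that rule of thumb concrete, here's a minimal Python sketch. The resource names and peak-utilization figures are made up for illustration; substitute whatever your monitoring actually reports at normal peak demand.

```python
# A sketch of the 70%/80% headroom rule described above.
# The metrics and numbers below are hypothetical.

PEAK_UTILIZATION = {   # fraction of capacity used at normal peak demand
    "cpu": 0.62,
    "disk_io": 0.74,
    "memory": 0.85,
    "network": 0.40,
}

def assess(resource: str, used: float) -> str:
    """<=70% is comfortable headroom, 70-80% is worth watching, >80% means tune or shop."""
    if used <= 0.70:
        return f"{resource}: {used:.0%} used - comfortable headroom"
    elif used <= 0.80:
        return f"{resource}: {used:.0%} used - watch closely, plan to tune"
    else:
        return f"{resource}: {used:.0%} used - tune or buy more capacity"

for name, used in PEAK_UTILIZATION.items():
    print(assess(name, used))
```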

If you can't answer these questions, you need to reexamine your work so that you can do predictive analysis.

3) Where do we start tuning?

About 6-7 years ago, after doing little but tuning for 15 years, I changed my approach. Before interviewing users about their issues, or looking at perfmon (which gives us a snapshot view of a very tight time slice), I install a tool. The one I currently use is Confio Ignite -- subjectivity warning here, as we resell this tool, but I examined a dozen others before choosing Ignite. After the tool starts collecting, I interview the users about what they think their problems are, and then start looking at what the tool is collecting, starting at a macro scale and moving to the micro scale based upon findings at the higher level. This may lead me to anything from disk issues to specific queries run by specific users at specific times of the day.
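As a rough illustration of that macro-to-micro drill-down (this is not Ignite's actual interface or data model, just hypothetical wait-time samples), the sketch below first aggregates wait time by category, then drills into the worst category by hour, user, and query:

```python
# Macro-to-micro drill-down over made-up wait samples.
from collections import Counter

# Each sample: (hour_of_day, wait_category, user, query, wait_ms) -- hypothetical fields.
samples = [
    (9,  "disk I/O", "accounting", "SELECT ... FROM gl_detail",  1200),
    (9,  "CPU",      "web_app",    "EXEC usp_get_orders",         300),
    (10, "disk I/O", "accounting", "SELECT ... FROM gl_detail",  1500),
    (14, "locking",  "batch_job",  "UPDATE inventory ...",        900),
    (14, "disk I/O", "reporting",  "SELECT ... FROM sales_cube",  800),
]

# Macro view: where is the wait time going overall?
by_category = Counter()
for hour, category, user, query, wait_ms in samples:
    by_category[category] += wait_ms
print("Total wait by category:", dict(by_category))

# Micro view: within the worst category, which hour/user/query hurts most?
worst_category, _ = by_category.most_common(1)[0]
detail = Counter()
for hour, category, user, query, wait_ms in samples:
    if category == worst_category:
        detail[(hour, user, query)] += wait_ms

print(f"Drill-down into '{worst_category}':")
for (hour, user, query), total in detail.most_common():
    print(f"  {hour:02d}:00  {user:<12} {total:>6} ms  {query}")
```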