Some very, very clever people have developed elaborate Excel spreadsheets to model every aspect of hunter dps. There are several different ones (all of which give different results), but the most well-known and heavily reviewed is Shandra’s Spreadsheet, and it is a thing of beauty.
The way it works is you enter your gear, your gems, your enchants, your talents, your pet, your pet talents, your glyphs, your shot rotation, and all the buffs that you would have in a raid, and the spreadsheet calculates what your theoretical dps could be against a raid boss.
The spreadsheets tend to be particularly good at determining which piece of gear is better than another. I think the most valuable use of the spreadsheets is determining EAP (equivalent attack power) values.
However, these spreadsheets have given rise to an unfortunate beast I call Spreadsheet Theorycrafters.
Basically what happens is someone sits down and changes talents (or glyphs, or rotations) around one by one on the spreadsheet and looks to see how each changes the dps result, and then figures the one with the highest dps number must be the highest dps talent build.
I’ve even seen hunter sites where this is the entire basis of their theorycrafting, testing, recommended builds, rotations… everything. They just post spreadsheet results. It’s so… it’s just unfathomable to me.
Needless to say, I don’t approve of spreadsheet theorycrafting.
First of all, again, these spreadsheets are things of beauty and they are shockingly good at what they do, especially when it comes to gear or stat comparisons. They provide an excellent data point — but it’s still just one data point.
Here are some of the flaws with spreadsheet theorycrafting:
- There are several different spreadsheets, and all give different results. Each one says it’s not perfect, but it’s the best. They can’t all be the most accurate.
- Flaws are often found: the hunter community is constantly finding errors and these are constantly being corrected (at least in the best of the spreadsheets). The errors are found when someone does their own theorycrafting, or in-game testing, and finds some discrepancy with the complicated formulas in the spreadsheet.
- They model a perfect world: all theorycrafting models some kind of ideal where you don’t make mistakes, or fights last a certain length, or you don’t have to run out of AOEs, or you never get stunned, you aren’t losing seconds and debuffs to switching targets, etc. The problem with just putting info into a spreadsheet is that since you aren’t doing the math, you aren’t necessarily aware of which elements will be strongly affected by real world situations (like the Chimera Shot glyph) and which will be only mildly affected.
- They overweight some things: back when I was SV before the mana regen nerf, I would generally always be at full mana in 25-man raid boss fights. I regenerated more than I spent. However, the spreadsheets all thought I would run out, and so they way overvalued Int, Spirit, and mana restoration stuff – meaning the spreadsheet said that mp5 would increase my dps, which was factually untrue.
- They’re only as good as their input: I’ve gotten several emails now from people using spreadsheets and getting strange results. Turns out they had incorrectly entered some of the raid buffs, or duplicated a shot in their rotation. A single mistake in the data you enter can significantly skew all of your “theorycrafting” results.
- Ultimately you need controlled, repeatable, and falsifiable in-game testing to prove results. And if that testing is substantially different from the spreadsheet results – well, I can assure you that it’s the spreadsheet that’s wrong. The damage that you do in-game is, in fact, the damage that you would do in-game. In other words, you need science!
Theorycrafting and Testing
In the world seen through Frostheim goggles, any kind of theorycrafting is useless without testing. But people don’t like testing because it’s a pain in the ass: it takes forever, and it’s super expensive. The problem is that most anything you’re looking at (a talent build, a glyph, etc.) actually changes your dps by far, far less than the swing you get from RNG (the random number generator).
For example: I test out my shot rotation on the target dummy. I do 3,300 dps. I do it again, and I do 2,900 dps. Nothing has changed at all, that’s just RNG at work. So now I’m testing a glyph that theory tells me will gain me 30-50 dps. How will I know that an increase or decrease is the glyph, vs RNG?
The answer is that you have to fire your shot rotation over literally thousands of shots (ideally you’re using cheap vendor ammo for this). And that’s just to test one glyph, or one talent build. Then you do it all over again. As Party Girl says, “You’re always in Ironforge. Are you at the target dummy again?”
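To get a feel for why it takes thousands of shots, here is a back-of-the-napkin sketch in Python. It leans on assumptions that are purely illustrative, not measured: crits are the only source of shot-to-shot randomness, crit chance is 40%, and a crit deals double damage. The 40 dps gain and the roughly 3,100 dps baseline are just the middle of the example numbers above, and the function names are made up for this sketch.

```python
import math


def crit_only_spread(crit_chance, crit_multiplier=2.0):
    """Spread (standard deviation / mean) of a single shot's damage when the
    only randomness is whether the shot crits. Base damage cancels out."""
    mean = 1.0 + crit_chance * (crit_multiplier - 1.0)
    sd = (crit_multiplier - 1.0) * math.sqrt(crit_chance * (1.0 - crit_chance))
    return sd / mean


def shots_needed(relative_gain, spread, z=2.0):
    """Shots per configuration so that the expected gain is about z standard
    errors of the difference between the two measured averages."""
    return math.ceil(2.0 * (z * spread / relative_gain) ** 2)


# ASSUMED inputs: 40% crit, crits deal double damage, and a glyph worth
# about 40 dps on a baseline of about 3,100 dps (roughly a 1.3% gain).
spread = crit_only_spread(0.40)
print(shots_needed(40 / 3100, spread))
```

Under these assumptions the answer comes out at several thousand shots with the glyph and several thousand more without it. The count also scales with the inverse square of the gain, so a glyph worth half as much needs about four times as many shots.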
How I Make DPS Choices
Here are the Frostheim steps to evaluating talents, glyphs, abilities, etc.:
- Sniff Test: first thing is just to look at stuff and determine which ones won’t make the cut. If something increases my health by 10%, I know that won’t have any impact on my dps. This is also the stage where I sit around for a while and try to think up clever ways to take advantage of abilities, or combinations of abilities.
- Paper Napkin Theorycraft: the next step is I do some crude and simple calculations to see approximately where things stand. If there was something that was on the fence on the sniff test, I’ll go ahead and eliminate it if it sucks at this stage. Mostly I’m determining what order to test in. This step is often done while driving.
- Collect Data: next step is a whole ton of target dummy testing to collect my baseline data for stuff like glyphs (dps totals without glyphs, percentage of damage from each shot, stats of each shot, etc.)
- Theorycrafting: Then I sit down and do the number crunching. As I’ve said before, the math here isn’t hard. The hard part is setting up your equations to take everything into account. The most common theorycrafting errors come from people who just set up their equations wrong so they double up on something, or leave something out. This is Data Point 1.
- Testing: Next is the really really painful part of actually testing in-game. I do testing on the target dummy, because it is the only perfectly controlled environment we have (assuming no one else is attacking it). I usually do this with raid buffs. This is Data Point 2.
- Spreadsheet Checking: I also plug the data into a spreadsheet and see what it has to say. This is Data Point 3.
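As a rough illustration of how the baseline data feeds the number crunching, here is a minimal Python sketch of the napkin-level version. It is not the actual set of equations behind any guide. The damage totals, the 10% bonus, and the function names are all placeholder assumptions, and it ignores anything the glyph might change about the rotation itself.

```python
def damage_shares(damage_by_shot):
    """Turn total damage per shot type into each shot's share of the whole."""
    total = sum(damage_by_shot.values())
    return {shot: dmg / total for shot, dmg in damage_by_shot.items()}


def glyph_gain(baseline_dps, share, bonus):
    """Estimated dps gained when one shot's damage goes up by `bonus`
    (0.10 means 10%) and that shot makes up `share` of total damage."""
    return baseline_dps * share * bonus


# PLACEHOLDER baseline data from an imaginary target dummy session.
baseline = {
    "Auto Shot": 400_000,
    "Chimera Shot": 250_000,
    "Aimed Shot": 150_000,
    "Other": 200_000,
}
shares = damage_shares(baseline)

# A hypothetical glyph adding 10% to Chimera Shot, on a 3,100 dps baseline.
gain = glyph_gain(3100, shares["Chimera Shot"], 0.10)
print(round(gain, 1))
```

The point of collecting shares first is that a big percentage bonus on a shot that does a small slice of your damage can be worth less than a small bonus on a shot that does most of it.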
Now I have three data points to compare. If they all agree, then it’s easy to smile and say my work is done; however, if one of them disagrees, then it’s time to go back and try to find out why one is wrong. I could have made an error in my Theorycrafting – it happens. The spreadsheet could be wrong – it happens a decent amount. The in-game data could actually be wrong too! Perhaps the presence of raid buffs would radically alter the result, rather than scale it across all options evenly. That also must be investigated.
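The comparison itself can be sketched in a few lines of Python. This is only an assumed way to pick which data point to look at first, not a verdict on which one is wrong. The tolerance, the example numbers, and the function name are all hypothetical.

```python
from statistics import median


def odd_one_out(estimates, tolerance):
    """Return None if all estimates agree within `tolerance` dps, otherwise
    the name of the estimate farthest from the median of the group."""
    values = list(estimates.values())
    if max(values) - min(values) <= tolerance:
        return None
    mid = median(values)
    return max(estimates, key=lambda name: abs(estimates[name] - mid))


# HYPOTHETICAL dps gains for one glyph from the three data points.
estimates = {"theorycraft": 42.0, "dummy test": 38.0, "spreadsheet": 95.0}
print(odd_one_out(estimates, tolerance=15.0))  # spreadsheet
```

Whichever source gets flagged, the next job is still to work out why it disagrees, because any of the three can be the one that is off.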
The point is any one of these data points could be wrong, and you won’t know without more data points. And actually, let me stress something else here: if you raid Ulduar and do 4k dps on XT, then change glyphs and do 4.9k dps the next week, that is not a data point! That is just a change, and you have zero way to know that the change is due to your glyph. You’ve now got RNG spread across 10-25 players and their buff/debuff procs, performance, etc., not to mention how often you have to run, switch targets, etc. If it’s not controlled and repeatable, it’s just not science.
I’m not trying to bash the spreadsheets here. They’re valuable, and especially valuable for comparing gear and generating EAP values. But they get less and less accurate the farther you stray from gear, and are useless at modeling a running fight.
So that is how our guides are made, for the most part. This is also why it takes me a while to get new ones up when a major hunter change happens. And this is also why I get annoyed as all heck when someone responds to a guide with something like “No that sux look at mai build I did 4.7kdps aimed glyph roks.”