Aug 30, 2012 at 8:48 PM
Edited Aug 30, 2012 at 9:12 PM
Okay guys, today I started profiling to try and figure out where all the time was going in sprite and text rendering, and I made some good progress!
For people interested in my methodology, I started with PIX to see if there was anything in the GPU that might be causing a bottleneck. Nope. From ClearRenderTargetView() to SpriteBatch->End(), there's approximately 10ms of nothing. So I started looking at the CPU. Since I'm developing in VS 2010 Express, I had to use the standalone profiling tools and follow the instructions described in this article:
And here are my suggested sources of pain:
1. Disable Run Time Checks
Identified by the mysterious "_RTC_" calls. An easy one to miss and an easy one to fix.
Either make sure you are building in Release mode, or go into the DirectXTK Project Properties -> C/C++ -> Code Generation and set Basic Runtime Checks to Default.
With so many function calls within function calls, this little flag almost tripled my frame rate.
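If you need to keep a Debug build but still want the checks out of a hot path, MSVC also has a runtime_checks pragma that switches them off for a region of a source file. This is only a sketch of that alternative: SumSquares is a made-up stand-in for a hot function, not anything from DirectXTK, and the pragma has no effect unless the build uses /RTC in the first place.

```cpp
#include <cassert>

#ifdef _MSC_VER
#pragma runtime_checks("", off)      // MSVC only: disable all /RTC checks from here on
#endif

// Hypothetical hot function, standing in for a per-sprite inner loop.
int SumSquares(const int* values, int count)
{
    assert(values != nullptr || count == 0);

    int total = 0;
    for (int i = 0; i < count; i++)
        total += values[i] * values[i];
    return total;
}

#ifdef _MSC_VER
#pragma runtime_checks("", restore)  // back to whatever the compiler options specify
#endif
```

The pragma has to sit at file scope, outside any function body. Changing the project setting is still the simpler fix if you don't need the checks at all.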
Hard to believe, since the C++ documentation says the container calls in this check run in constant time, but the profiler doesn't lie. By commenting out the if statement at line 446 of SpriteBatch.cpp...
//if (mSpriteTextureReferences.empty() || texture != mSpriteTextureReferences.back().Get())
... and letting the app just keep adding duplicate texture references for each new sprite instance, I actually got a moderate performance boost: a 25-50% higher frame rate.
This was a fun one. Line 152 of SpriteFont.cpp searches the vector of available glyphs in the Sprite Font to find the glyph matching the requested character.
Not the most efficient operation on a vector, so I tried adding a map, from characters to glyphs, and populating it at initialization, to make the search operation a little snappier. This gained me another 50% boost in frame rate.
// Line 42
std::map<wchar_t,Glyph&> charGlyphMap; // *
Glyph const* defaultGlyph;
// Line 93
glyphs.assign(glyphData, glyphData + glyphCount);
for(int i = 0; i < glyphCount; i++)
// Line 152
auto glyph = charGlyphMap.lower_bound(character);
if (glyph != charGlyphMap.end() && glyph->second.Character == character)
However, despite the improvement, my profiler now tells me that std::_Tree.lower_bound() is still the biggest drain, suggesting further optimizations can still be made in the way the FindGlyph() method is implemented. I tried replacing map with hash_map, but that turned out to be significantly slower, undoing the 50% improvement.
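For anyone who wants to try the map idea outside of SpriteFont.cpp, here is a minimal self-contained sketch. It is not the DirectXTK code: the Glyph struct is cut down to a Character field plus a placeholder Advance, and GlyphIndex and SetDefaultCharacter are names made up for the example. It also stores pointers in the map instead of references, and uses find() since only an exact match is wanted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical, cut-down glyph record; only Character matters for the lookup.
struct Glyph
{
    wchar_t Character;
    int Advance; // placeholder payload
};

class GlyphIndex
{
public:
    GlyphIndex(Glyph const* glyphData, std::size_t glyphCount)
        : glyphs(glyphData, glyphData + glyphCount), defaultGlyph(nullptr)
    {
        assert(glyphData != nullptr || glyphCount == 0);

        // Build the index once, after the vector is final, so the
        // stored pointers are never invalidated by a reallocation.
        for (std::size_t i = 0; i < glyphs.size(); i++)
            charGlyphMap[glyphs[i].Character] = &glyphs[i];
    }

    // The map holds pointers into our own vector, so copying is disabled.
    GlyphIndex(GlyphIndex const&) = delete;
    GlyphIndex& operator=(GlyphIndex const&) = delete;

    void SetDefaultCharacter(wchar_t character)
    {
        auto it = charGlyphMap.find(character);
        defaultGlyph = (it != charGlyphMap.end()) ? it->second : nullptr;
    }

    Glyph const* FindGlyph(wchar_t character) const
    {
        auto it = charGlyphMap.find(character);
        if (it != charGlyphMap.end())
            return it->second;
        if (defaultGlyph)
            return defaultGlyph;
        throw std::out_of_range("Character not in font");
    }

private:
    std::vector<Glyph> glyphs;
    std::map<wchar_t, Glyph const*> charGlyphMap;
    Glyph const* defaultGlyph;
};
```

Usage is just constructing a GlyphIndex from the glyph array and calling FindGlyph(character) per character. A std::map lookup is still a tree walk, so the cost grows with the logarithm of the glyph count.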
It seems like the faster you can make FindGlyph(), the better. Still, my SpriteBatch performance is now much more acceptable, even on my 1GHz tablet.
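One direction that might be worth trying is a flat lookup table indexed directly by character code, which removes the tree walk entirely. I have not measured this against SpriteBatch, so treat it as an untested sketch: Glyph is again a cut-down stand-in, and FlatGlyphIndex, kTableSize and kNoGlyph are made-up names. It assumes character codes below 65536, and treats anything else as missing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, cut-down glyph record; only Character matters for the lookup.
struct Glyph
{
    wchar_t Character;
    int Advance; // placeholder payload
};

const std::size_t kTableSize = 65536; // one slot per 16-bit character code
const int kNoGlyph = -1;              // marks a character with no glyph

class FlatGlyphIndex
{
public:
    FlatGlyphIndex(Glyph const* glyphData, std::size_t glyphCount)
        : glyphs(glyphData, glyphData + glyphCount), table(kTableSize, kNoGlyph)
    {
        assert(glyphData != nullptr || glyphCount == 0);

        // Store vector indices rather than pointers, so the object stays copyable.
        for (std::size_t i = 0; i < glyphs.size(); i++)
        {
            std::size_t code = static_cast<std::size_t>(glyphs[i].Character);
            if (code < kTableSize)
                table[code] = static_cast<int>(i);
        }
    }

    // Returns nullptr when the character has no glyph in the table.
    Glyph const* FindGlyph(wchar_t character) const
    {
        std::size_t code = static_cast<std::size_t>(character);
        if (code >= kTableSize)
            return nullptr;

        int slot = table[code];
        return (slot == kNoGlyph) ? nullptr : &glyphs[slot];
    }

private:
    std::vector<Glyph> glyphs;
    std::vector<int> table;
};
```

The trade-off is memory: 65536 int slots per font, in exchange for a lookup that is a bounds check and two array reads. The caller would still need to substitute the default glyph when FindGlyph returns nullptr.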
Hope this helps people!
(P.S. Direct2D was a dead end. Worse framerate than ever, and wouldn't even run on Windows 7)