SpriteBatch Performance

Aug 16, 2012 at 2:01 PM

Hi,
When I draw text using SpriteFont, the frame rate drops from 60 to 17! That is a huge performance hit for drawing a single page of text.
What can I do to get better performance? Use Direct2D?
Thanks for your help.

I use the following font :
MakeSpriteFont.exe "Verdana" Verdana10.spritefont /FontSize:1 /CharacterRegion:0x0000-0x00FF /DefaultCharacter:0x0023

m_spriteBatch->Begin(SpriteSortMode_Deferred);
for (int i = 0; i < 50; i++)
Font->DrawString(m_spriteBatch.get(), L"HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO HELLO ", XMFLOAT2(10, 10 + i * 15), Colors::AliceBlue);
m_spriteBatch->End();

Aug 24, 2012 at 11:16 AM

Hi Lena,

I've been having a similar problem. I'm trying to create a 2D game using the DirectXTK and SpriteBatch/SpriteFont, but I'm seeing similar performance drops whenever I add too many sprites or items of text. At the moment, I'm struggling to reach 500 sprites with a decent frame rate, which is unacceptable. I've tried optimizations like using only one SpriteBatch, calling Begin() and End() only once per frame, and refraining from calling MeasureString() too often. These things all help, but not as much as I would like.

I think my best hope is to start investigating Direct2D. I'll let you know my findings.

Coordinator
Aug 24, 2012 at 3:47 PM

Could you profile on one of these machines to find out where the time is going?

On every system I have tried, SpriteBatch can easily draw tens of thousands of sprites at full framerate, so I think something pathological must be going wrong on these particular machines.

Aug 28, 2012 at 12:29 AM

You could render the text/page on a rendertarget once, and use that texture to show the page.

It will be fast even on old and slow tablets.
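A minimal sketch of that idea, with the Direct3D part left out: the expensive render runs only when the text changes, and every other frame reuses the cached result. CachedPage and its renderToTexture callback are hypothetical names, not part of DirectXTK; in a real app the callback would bind the render target and issue the SpriteFont draw calls.

```cpp
#include <functional>
#include <string>

// Hypothetical helper: re-renders a page of text only when its content changes.
class CachedPage
{
public:
    // renderToTexture stands in for "bind the render target and draw the text".
    explicit CachedPage(std::function<void(const std::wstring&)> renderToTexture)
        : mRender(renderToTexture), mDirty(true)
    {
    }

    // Marks the cache stale only if the text actually differs.
    void SetText(const std::wstring& text)
    {
        if (text != mText)
        {
            mText = text;
            mDirty = true;
        }
    }

    // Call once per frame. Returns true if the page had to be re-rendered.
    bool Prepare()
    {
        if (!mDirty)
        {
            return false;
        }
        mRender(mText);
        mDirty = false;
        return true;
    }

private:
    std::function<void(const std::wstring&)> mRender;
    std::wstring mText;
    bool mDirty;
};
```

After Prepare(), the frame would draw the cached texture as a single sprite instead of one sprite per character.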

Aug 30, 2012 at 9:48 PM
Edited Aug 30, 2012 at 10:12 PM

Okay guys, today I started profiling to try and figure out where all the time was going in sprite and text rendering, and I made some good progress!

For people interested in my methodology, I started with PIX to see if there was anything in the GPU that might be causing a bottleneck. Nope. From ClearRenderTargetView() to SpriteBatch->End(), there's approximately 10ms of nothing. So I started looking at the CPU. Since I'm developing in VS 2010 Express, I had to use the standalone profiling tools and follow the instructions described in this article:

http://www.codeproject.com/Articles/144643/Profiling-of-C-Applications-in-Visual-Studio-for-F

And here are my suggested sources of pain:

1. Disable Run Time Checks

Identified by the mysterious "_RTC_" calls in the profile. An easy one to miss and an easy one to fix. Either make sure you are building in Release mode, or go into the DirectXTK Project Properties -> C/C++ -> Code Generation and set Basic Runtime Checks to Default.

With so many function calls within function calls, this little flag almost tripled my frame rate.
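One way to catch this before profiling is a small startup check. This is only a sketch: it relies on the __MSVC_RUNTIME_CHECKS macro, which the Microsoft compiler predefines when one of the /RTC options is on, and both function names are made up for illustration.

```cpp
#include <cstdio>

// Reports whether this translation unit was compiled with MSVC's /RTC
// options. On other compilers, or without /RTC, the macro is absent.
inline bool BuiltWithRuntimeChecks()
{
#ifdef __MSVC_RUNTIME_CHECKS
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: call once at startup, before taking any timings.
inline void WarnIfRuntimeChecksEnabled()
{
    if (BuiltWithRuntimeChecks())
    {
        std::puts("Warning: built with /RTC; frame timings will be pessimistic.");
    }
}
```

The check only covers the file it is compiled in, so it would need to live in the DirectXTK project itself to say anything about how the library was built.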

2. std::vector.back()

Hard to believe, since the C++ documentation claims this method is constant time, but the profiler doesn't lie. By commenting out the if statement at line 446 of SpriteBatch.cpp...

//if (mSpriteTextureReferences.empty() || texture != mSpriteTextureReferences.back().Get())
//{
    mSpriteTextureReferences.emplace_back(texture);
//}

... and letting the app just keep adding duplicate texture references for each new sprite instance, I actually got a moderate performance boost: a 25-50% higher frame rate.
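A possible middle ground, which I have not profiled, is to keep the duplicate check but compare against a cached raw pointer instead of calling back(). The sketch below uses a hypothetical Texture type and TextureRefList class as stand-ins; the real SpriteBatch stores ComPtr references, and this version assumes textures are never null.

```cpp
#include <cstddef>
#include <vector>

struct Texture {};  // hypothetical stand-in for ID3D11ShaderResourceView

// Hypothetical container: records a texture only when it differs from the
// previously added one, without touching the vector to find that out.
class TextureRefList
{
public:
    TextureRefList() : mLast(nullptr) {}

    // Assumes texture is non-null; a null texture would be skipped.
    void Add(Texture* texture)
    {
        if (texture != mLast)
        {
            mRefs.push_back(texture);
            mLast = texture;
        }
    }

    void Clear()
    {
        mRefs.clear();
        mLast = nullptr;
    }

    std::size_t Size() const { return mRefs.size(); }

private:
    std::vector<Texture*> mRefs;
    Texture* mLast;  // last texture added, cached to avoid calling back()
};
```

For text this keeps the list at one entry per run of characters, since every glyph of a font comes from the same texture.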

3. std::lower_bound()

This was a fun one. Line 152 of SpriteFont.cpp searches the vector of available glyphs in the Sprite Font to find the glyph matching the requested character. Not the most efficient operation on a vector, so I tried adding a map, from characters to glyphs, and populating it at initialization, to make the search operation a little snappier. This gained me another 50% boost in frame rate.

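// Line 42
// Fields
ComPtr<ID3D11ShaderResourceView> texture;
std::vector<Glyph> glyphs;
std::map<wchar_t, Glyph const*> charGlyphMap; // * added: character -> glyph lookup
Glyph const* defaultGlyph;
float lineSpacing;

// Line 93
glyphs.assign(glyphData, glyphData + glyphCount);

// Populate the map once. Pointers are stored rather than references, since
// references are not valid container elements; they stay valid because
// glyphs is not resized after this point.
for (size_t i = 0; i < glyphs.size(); i++)
{
	charGlyphMap.insert(std::pair<wchar_t, Glyph const*>(glyphs[i].Character, &glyphs[i]));
}

// Line 152
auto glyph = charGlyphMap.lower_bound(character);

// lower_bound returns the first entry with key >= character, so check for an exact match.
if (glyph != charGlyphMap.end() && glyph->first == character)
{
    return glyph->second;
}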

However, despite the improvement, my profiler now tells me that std::_Tree.lower_bound() is still the biggest drain, suggesting further optimizations can still be made in the way the FindGlyph() method is implemented. I tried replacing map with hash_map, but that turned out to be significantly slower, undoing the 50% improvement.

It seems like the faster you can make FindGlyph(), the better. Still, my SpriteBatch performance is now much more acceptable, even on my 1GHz tablet.
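One direction I have not tried: since the font in the first post only covers 0x0000-0x00FF, the lookup could be a flat array indexed by character code, which avoids any search at all. This is only a sketch under that assumption. The Glyph struct is a trimmed-down stand-in, and GlyphTable, Find and tableSize are hypothetical names, not DirectXTK APIs. It needs a C++11 compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, trimmed-down glyph record; the real one has more fields.
struct Glyph
{
    std::uint32_t Character;
    float XAdvance;
};

// Direct-index glyph lookup for fonts with a small, dense character range.
class GlyphTable
{
public:
    // Assumes glyph characters fall below tableSize (0x100 covers 0x0000-0x00FF).
    // Glyphs outside that range are ignored by this sketch.
    GlyphTable(const std::vector<Glyph>& glyphs,
               std::uint32_t defaultCharacter,
               std::size_t tableSize = 0x100)
        : mGlyphs(glyphs), mTable(tableSize, nullptr), mDefault(nullptr)
    {
        for (std::size_t i = 0; i < mGlyphs.size(); ++i)
        {
            const Glyph& g = mGlyphs[i];
            if (g.Character < mTable.size())
            {
                mTable[g.Character] = &g;
            }
        }
        if (defaultCharacter < mTable.size())
        {
            mDefault = mTable[defaultCharacter];
        }
    }

    // The table stores pointers into mGlyphs, so copying is disabled.
    GlyphTable(const GlyphTable&) = delete;
    GlyphTable& operator=(const GlyphTable&) = delete;

    // Constant-time lookup: one bounds check and one array read.
    // Falls back to the default glyph, which may be nullptr.
    const Glyph* Find(wchar_t character) const
    {
        const std::size_t index = static_cast<std::size_t>(character);
        if (index < mTable.size() && mTable[index] != nullptr)
        {
            return mTable[index];
        }
        return mDefault;
    }

private:
    std::vector<Glyph> mGlyphs;         // owned copy of the glyph data
    std::vector<const Glyph*> mTable;   // character code -> glyph, or nullptr
    const Glyph* mDefault;
};
```

The trade-off is memory: a table for the full 16-bit range would hold 65536 pointers, so for large or sparse character sets the sorted vector or map is probably still the better fit.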

Hope this helps people!

(P.S. Direct2D was a dead end. Worse framerate than ever, and wouldn't even run on Windows 7)