[sldev] About Viewer Performance
Andy Estes
aestes at acm.org
Mon Mar 19 12:05:42 PDT 2007
Dave,
A hearty thanks for the great primer on the viewer performance issues!
Andy
-----Original Message-----
From: sldev-bounces at lists.secondlife.com
[mailto:sldev-bounces at lists.secondlife.com] On Behalf Of Dave Parks
Sent: Monday, March 19, 2007 12:46 PM
To: Second Life Developer Mailing List
Subject: [sldev] About Viewer Performance
So we are moving towards a multi-threaded model. Currently in FL,
textures are decoded on their own thread, which means the second core
tends to go idle. The next good thing to do would be to put
LLPipeline::renderGeom on its own thread, which basically requires
making the octree thread safe. Not an arduous task, but far from
simple. If done properly, this means one core would be feeding the GL
all the render batches (of which there are thousands, about two or three
per texture) while the other core runs particles, avatar animations, and
prepares the next set of batches. Render batches change a lot: pretty
much every time a prim changes texture or LOD, all the render batches
for its node are rebuilt. A lot of work has been done in
LLVolumeGeometryManager::rebuildGeom to make building the batches take
as little time as possible, but it's not optimal.
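
A minimal sketch of the hand-off described above, where one thread prepares the next set of batches while another feeds the current set to the GL. Everything here (RenderBatch, BatchExchange, publish, acquire) is a hypothetical name for illustration and not part of the viewer source; the real work of making the octree thread safe is not shown.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for one render batch; not a viewer type.
struct RenderBatch {
    int textureId;
    int vertexCount;
};

// Hypothetical hand-off point between a batch-building thread and a
// render thread. Only the swap of the batch list is guarded by the lock.
class BatchExchange {
public:
    // Called by the thread that prepares batches for the next frame.
    void publish(std::vector<RenderBatch> batches) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = std::move(batches);
        hasPending_ = true;
    }

    // Called by the render thread at the start of a frame. Returns true
    // and replaces `out` when a new set was published; otherwise leaves
    // `out` untouched so the previous batches can be drawn again.
    bool acquire(std::vector<RenderBatch>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasPending_) {
            return false;
        }
        out.swap(pending_);
        pending_.clear();
        hasPending_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    std::vector<RenderBatch> pending_;
    bool hasPending_ = false;
};
```

The render thread would call acquire() once per frame and then issue its draw calls without holding the lock, so the builder is only ever blocked for the duration of a vector swap.
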
In order to keep the GL pipeline from stalling too badly, vertex data
copies are synchronized via LLVertexBuffer::clientCopy. Currently,
clientCopy is only called at the end of a graphics frame. This
basically means that the first frame something changes, it's rendered
from client memory, which is suboptimal, but causes fewer pipeline
stalls. The viewer will spend about 5 ms of every frame just copying
vertex data into VBOs, which should be plenty, but for some reason it
takes certain GL implementations many, many frames to catch up. I
experimented with different modes of copying data and found that
shadowing vertex buffers in client memory and updating VBOs via
glBufferSubData allowed the most data to be copied in the least amount
of time, but it's worth looking into again. It's subtle, but if
clientCopy could be reworked so that all client data makes it into the
GL before the main render pipeline, it would remove most of the pipeline
stalls, and the entire render frame would get much shorter. You can
already see the difference of having the copy complete by watching the
fast timer console. Once time is no longer being spent copying vertex
data, the time spent rendering is cut in half.
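
A minimal sketch of the "shadow in client memory, update via glBufferSubData" approach mentioned above. ShadowedBuffer and its methods are hypothetical and are not LLVertexBuffer; the upload callback stands in for a call to glBufferSubData(GL_ARRAY_BUFFER, offset, size, data) on an already bound VBO, so the sketch compiles without a GL context.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical helper: keeps a client-memory copy of a vertex buffer and
// remembers which byte range changed since the last flush.
class ShadowedBuffer {
public:
    // upload(offset, size, data): assumed to wrap glBufferSubData.
    using Upload =
        std::function<void(std::size_t, std::size_t, const void*)>;

    explicit ShadowedBuffer(std::size_t bytes)
        : shadow_(bytes), dirtyBegin_(bytes), dirtyEnd_(0) {}

    // Writes into client memory only and widens the dirty range.
    // Returns false if the write would run past the end of the buffer.
    bool write(std::size_t offset, const void* src, std::size_t size) {
        if (offset > shadow_.size() || size > shadow_.size() - offset) {
            return false;
        }
        if (size == 0) {
            return true;
        }
        std::memcpy(shadow_.data() + offset, src, size);
        dirtyBegin_ = std::min(dirtyBegin_, offset);
        dirtyEnd_ = std::max(dirtyEnd_, offset + size);
        return true;
    }

    // Sends the whole dirty range in a single upload call and resets it.
    // Returns the number of bytes uploaded (0 if nothing changed).
    std::size_t flush(const Upload& upload) {
        if (dirtyBegin_ >= dirtyEnd_) {
            return 0;
        }
        const std::size_t size = dirtyEnd_ - dirtyBegin_;
        upload(dirtyBegin_, size, shadow_.data() + dirtyBegin_);
        dirtyBegin_ = shadow_.size();
        dirtyEnd_ = 0;
        return size;
    }

private:
    std::vector<unsigned char> shadow_;
    std::size_t dirtyBegin_;
    std::size_t dirtyEnd_;
};
```

Coalescing to one range per buffer trades some redundant bytes for fewer GL calls; whether that is the right trade on a given driver is exactly the kind of thing that would need measuring.
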
LLOctreeTraveler::traverse is a likely candidate for optimization, as
speeding it up speeds up every octree operation (frustum/occlusion
culling, rebinning, etc.). It would be worthwhile to experiment with a
non-recursive traversal implementation.
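
A minimal sketch of what a non-recursive traversal could look like, using an explicit stack instead of the call stack. Node and traverseIterative are hypothetical and much simpler than the viewer's octree classes; the visitor returning false to skip a subtree is an assumption meant to mimic a failed culling test.

```cpp
#include <array>
#include <memory>
#include <vector>

// Hypothetical, simplified octree node; not the viewer's node class.
struct Node {
    int id = 0;
    std::array<std::unique_ptr<Node>, 8> children;
};

// Visits nodes depth-first with an explicit stack. `visit` is called once
// per reached node and returns false to skip that node's children.
template <typename Visitor>
void traverseIterative(const Node* root, Visitor visit) {
    std::vector<const Node*> stack;
    if (root != nullptr) {
        stack.push_back(root);
    }
    while (!stack.empty()) {
        const Node* node = stack.back();
        stack.pop_back();
        if (!visit(*node)) {
            continue;  // subtree rejected, e.g. outside the frustum
        }
        // Push in reverse so child 0 is popped (and visited) first.
        for (int i = 7; i >= 0; --i) {
            if (node->children[i]) {
                stack.push_back(node->children[i].get());
            }
        }
    }
}
```

The stack vector could be kept between calls to avoid reallocating every frame; whether this beats the recursive version in practice would have to be checked with the fast timers.
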
Oh, and the rendering of trees is terrible. Good opportunity to make
the open source community look good there. It's no secret that we
licensed SpeedTree a while back and have been planning on integrating
it. Those folks aren't too keen on the idea of open sourcing their
library, so it will be another one of those terrible situations where
you guys essentially get left out in the cold, unless *ahem* someone can
provide a compelling alternative to SpeedTree that is open source.
In summary, the candidates for optimization are (in no particular order):
- Put LLPipeline::renderGeom on its own thread
- LLVertexBuffer::clientCopy - make it optimal and find the optimal
location from which to call it
- LLVolumeGeometryManager::rebuildGeom - build better batches, build
them faster
- LLOctreeTraveler::traverse - the faster the tree is, the better
Honorable mention but slightly out of my domain for discussion:
- Particle simulation (can't touch particle rendering, sorry). I ported
the particle rendering to point sprites, but it turns out 90% of the
particle systems out there won't port, so it was a wasted effort.
- Avatar animation - I'm sure there's some low-hanging fruit in the
avatar animation system. In fact, one dev here claims it used to be a
ton faster than it is, so there might be a one-line bug in there slowing
things down.
- Flexible object updates - Ugh.
Now, after all that, I'm going to humbly request that folks make
tracking down stability issues a priority. I think stability is much
more important than performance right now, so while you're plowing
through the viewer making performance improvements, try to fix any
crashes you might experience along the way.
Happy hunting.