[sldev] About Viewer Performance

Andy Estes aestes at acm.org
Mon Mar 19 12:05:42 PDT 2007


Dave,

A hearty thanks for the great primer on the viewer performance issues! 

Andy

-----Original Message-----
From: sldev-bounces at lists.secondlife.com
[mailto:sldev-bounces at lists.secondlife.com] On Behalf Of Dave Parks
Sent: Monday, March 19, 2007 12:46 PM
To: Second Life Developer Mailing List
Subject: [sldev] About Viewer Performance

So we are moving towards a multi-threaded model.  Currently in FL, 
textures are decoded on their own thread, but that alone is not enough 
work to keep the second core from going idle.  The next good thing to 
do would be to put 
LLPipeline::renderGeom on its own thread, which basically requires 
making the octree thread safe.  Not an arduous task, but far from 
simple.  If done properly, this means one core would be feeding the GL 
all the render batches (of which there are thousands, about two or three 
per texture) while the other core runs particles, avatar animations, and 
prepares the next set of batches.  Render batches change a lot: pretty 
much every time a prim changes texture or LOD, all the render batches 
for its node are rebuilt.  A lot of work has been done in 
LLVolumeGeometryManager::rebuildGeom to make building the batches take 
as little time as possible, but it's not optimal. 
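
To make the split between the two cores concrete, here is a minimal 
sketch of a one-slot handoff between an update thread that builds 
batches and a render thread that feeds them to the GL.  RenderBatch and 
BatchExchange are hypothetical names for illustration, not viewer 
classes, and the sketch says nothing about making the octree itself 
thread safe.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for one render batch (not the viewer's type).
struct RenderBatch {
    std::uint32_t texture_id;
    std::uint32_t first_index;
    std::uint32_t index_count;
};

// One-slot exchange: the update thread publishes the next frame's
// batches while the render thread is still drawing the previous set.
class BatchExchange {
public:
    // Update thread: blocks until the previous set has been taken.
    void publish(std::vector<RenderBatch> batches) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !ready_; });
        pending_ = std::move(batches);
        ready_ = true;
        cv_.notify_all();
    }

    // Render thread: blocks until a set has been published.
    std::vector<RenderBatch> acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return ready_; });
        assert(ready_);  // guaranteed by the wait predicate
        std::vector<RenderBatch> out = std::move(pending_);
        pending_.clear();  // leave the moved-from slot in a known state
        ready_ = false;
        cv_.notify_all();
        return out;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::vector<RenderBatch> pending_;
    bool ready_ = false;
};
```

With this shape the render thread only ever holds the lock long enough 
to take ownership of a vector, so neither side waits on the other's 
real work unless it gets a full frame ahead.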

In order to keep the GL pipeline from stalling too badly, vertex data 
copies are synchronized via LLVertexBuffer::clientCopy.  Currently, 
clientCopy is only called at the end of a graphics frame.  This 
basically means that the first frame something changes, it's rendered 
from client memory, which is suboptimal, but causes fewer pipeline 
stalls.  The viewer will spend about 5 ms of every frame just copying 
vertex data into VBOs, which should be plenty, but for some reason it 
takes certain GL implementations many, many frames to catch up.  I 
experimented with different modes of copying data and found that 
shadowing vertex buffers in client memory and updating VBOs via 
glBufferSubData allowed the most data to be copied in the least amount 
of time, but it's worth looking into again.  It's subtle, but if 
clientCopy could be reworked so that all client data makes it into the 
GL before the main render pipeline, it would remove most of the pipeline 
stalls, and the entire render frame would get much shorter.  You can 
already see the difference of having the copy complete by watching the 
fast timer console.  Once time is no longer being spent copying vertex 
data, the time spent rendering is cut in half.
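
The shadowing approach described above can be sketched roughly as 
follows.  This is not the LLVertexBuffer implementation: 
ShadowedVertexBuffer is a hypothetical class, and the upload callback 
is a stand-in for a glBufferSubData call on the bound VBO so that the 
sketch runs without a GL context.  Writes land in the client-memory 
shadow and grow a dirty range; a single flush, placed wherever the copy 
is meant to happen, pushes that range to the GL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical sketch: a client-memory shadow of a VBO that tracks
// the byte range modified since the last flush.
class ShadowedVertexBuffer {
public:
    // (byte offset, byte count, source pointer).  In a real renderer
    // this would wrap glBufferSubData(GL_ARRAY_BUFFER, ...); that
    // wiring is an assumption and is left to the caller.
    using UploadFn =
        std::function<void(std::size_t, std::size_t, const void*)>;

    explicit ShadowedVertexBuffer(std::size_t bytes)
        : shadow_(bytes), dirty_begin_(bytes), dirty_end_(0) {}

    // Copy into the shadow only; nothing touches the GL here.
    void write(std::size_t offset, const void* src, std::size_t size) {
        assert(offset <= shadow_.size() &&
               size <= shadow_.size() - offset);
        if (size == 0) return;
        std::memcpy(shadow_.data() + offset, src, size);
        dirty_begin_ = std::min(dirty_begin_, offset);
        dirty_end_ = std::max(dirty_end_, offset + size);
    }

    bool dirty() const { return dirty_begin_ < dirty_end_; }

    // Upload the dirty range in one call; returns bytes uploaded.
    std::size_t flush(const UploadFn& upload) {
        if (!dirty()) return 0;
        const std::size_t size = dirty_end_ - dirty_begin_;
        upload(dirty_begin_, size, shadow_.data() + dirty_begin_);
        dirty_begin_ = shadow_.size();  // reset to "nothing dirty"
        dirty_end_ = 0;
        return size;
    }

private:
    std::vector<unsigned char> shadow_;
    std::size_t dirty_begin_;
    std::size_t dirty_end_;
};
```

Tracking one merged range keeps the bookkeeping trivial at the cost of 
re-uploading untouched bytes between two distant writes; whether that 
trade is right depends on the GL implementation, which is exactly the 
part worth measuring.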

LLOctreeTraveler::traverse is a likely candidate for optimization, as 
speeding it up speeds up every octree operation (frustum/occlusion 
culling, rebinning, etc.).  It would be worthwhile to experiment with a 
non-recursive traversal implementation.
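
As a starting point for that experiment, a non-recursive traversal can 
be written with an explicit stack.  The sketch below uses a 
hypothetical OctreeNode rather than the viewer's octree classes, and 
assumes the visitor returns false when a node's whole subtree can be 
skipped (as a culling pass would).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical node type for illustration only.
struct OctreeNode {
    int id = 0;
    std::array<std::unique_ptr<OctreeNode>, 8> children;
};

// Depth-first, pre-order traversal without recursion.  visit(node)
// returns false to skip that node's children.
template <typename Visit>
void traverse_iterative(const OctreeNode* root, Visit visit) {
    if (!root) return;
    std::vector<const OctreeNode*> stack;
    stack.push_back(root);
    while (!stack.empty()) {
        const OctreeNode* node = stack.back();
        stack.pop_back();
        assert(node != nullptr);  // only non-null nodes are pushed
        if (!visit(*node)) continue;
        // Push in reverse so child 0 is popped (visited) first.
        for (std::size_t i = node->children.size(); i-- > 0;) {
            if (node->children[i]) {
                stack.push_back(node->children[i].get());
            }
        }
    }
}
```

In a real traveler the stack would presumably be reused across frames 
so that it stops allocating once it has grown to the tree's working 
depth.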

Oh, and the rendering of trees is terrible.  Good opportunity to make 
the open source community look good there.  It's no secret that we 
licensed SpeedTree a while back and have been planning on integrating 
it.  Those folks aren't too keen on the idea of open sourcing their 
library, so it will be another one of those terrible situations where 
you guys essentially get left out in the cold, unless *ahem* someone can 
provide a compelling alternative to SpeedTree that is open source. 

In summary, the candidates for optimization are (in no particular order):
- Put LLPipeline::renderGeom on its own thread
- LLVertexBuffer::clientCopy - make it optimal and find the optimal 
location from which to call it
- LLVolumeGeometryManager::rebuildGeom - build better batches, build 
them faster
- LLOctreeTraveler::traverse - the faster the tree is, the better

Honorable mention but slightly out of my domain for discussion:
- Particle simulation (can't touch particle rendering, sorry).  I ported 
the particle rendering to point sprites, but it turns out 90% of the 
particle systems out there won't port, so it was a wasted effort.
- Avatar animation - I'm sure there's some low-hanging fruit in the 
avatar animation system.  In fact, one dev here claims it used to be a 
ton faster than it is, so there might be a one-line bug in there slowing 
things down.
- Flexible object updates - Ugh.

Now, after all that, I'm going to humbly request that folks make 
tracking down stability issues a priority.  I think stability is much 
more important than performance right now, so while you're plowing 
through the viewer making performance improvements, try to fix any 
crashes you might experience along the way.

Happy hunting.