[sldev] Optimization target: Avatar skinning, LLViewerJointMesh, matrix multiply

James Cook james at lindenlab.com
Sun Apr 15 19:39:34 PDT 2007


This looks great.  On what system are you running it?  What headers do
you include to get the _MM_TRANSPOSE4_PS macro?
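
For reference on the header question: with GCC, Clang and MSVC the
_MM_TRANSPOSE4_PS macro is defined in <xmmintrin.h>, the SSE intrinsics
header. A minimal sketch of its use follows. The function name
transpose4x4 and the row-major float[16] layout are assumptions made for
illustration; they are not taken from the viewer source or from the
benchmark under discussion.

```cpp
#include <xmmintrin.h>  // SSE intrinsics; also defines _MM_TRANSPOSE4_PS

// Hypothetical helper: transposes a row-major 4x4 float matrix in place.
// Unaligned loads/stores are used so the caller's array needs no
// particular alignment.
inline void transpose4x4(float m[16])
{
    __m128 r0 = _mm_loadu_ps(m + 0);
    __m128 r1 = _mm_loadu_ps(m + 4);
    __m128 r2 = _mm_loadu_ps(m + 8);
    __m128 r3 = _mm_loadu_ps(m + 12);

    // The macro rewrites its four arguments in place, so they must be
    // named __m128 variables rather than expressions.
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    _mm_storeu_ps(m + 0,  r0);
    _mm_storeu_ps(m + 4,  r1);
    _mm_storeu_ps(m + 8,  r2);
    _mm_storeu_ps(m + 12, r3);
}
```

If the matrix is known to be 16-byte aligned, _mm_load_ps and
_mm_store_ps could replace the unaligned variants.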

I'm surprised that Blend DS1 is faster than the others, even counting
the cost of recopying all the vector3 data into vector4s!

Unfortunately, in the LLViewerJointMesh::updateGeometry() function I
believe we are both reading from and writing to packed arrays of
vector3s with other data interleaved for OpenGL.  So writing out 4
floats to get the 3 we want (which is what I think _mm_storeu_ps does)
will overwrite whatever data follows each vector3.
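
_mm_storeu_ps does write all four lanes (16 bytes). One possible way
around the overwrite is to store only the three lanes that are wanted,
using two narrower SSE stores. The sketch below assumes the result sits
in lanes 0..2 of an __m128; the helper name store3 is hypothetical, and
whether the extra store costs more than it saves has not been measured
here.

```cpp
#include <xmmintrin.h>

// Hypothetical helper: writes only the x, y, z lanes of v to dst[0..2]
// and leaves dst[3] (the next interleaved field) untouched.
inline void store3(float* dst, __m128 v)
{
    // Store the low two lanes (x, y) with a single 64-bit write.
    _mm_storel_pi(reinterpret_cast<__m64*>(dst), v);

    // Move lane 2 (z) down into lane 0, then store that one float.
    _mm_store_ss(dst + 2, _mm_movehl_ps(v, v));
}
```

Neither store requires 16-byte alignment, so this also works on tightly
packed vector3 arrays.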

In blendDS1, is it faster to copy through memory backwards (VERTEX_COUNT
-> 0) than forwards (0 -> VERTEX_COUNT)?

I'll play with this a little bit tomorrow.

This rocks!

James

Dzonatas wrote:
> For reference, I aligned the data, which got rid of the unaligned
> stores. I changed the call method, so the compiler would generate
> in-line code. The result:
> 
> Initializing   0.04 sec
> Blend DS1      1.75 sec
> Blend 2        3.28 sec
> Blend 3        4.08 sec
> Blend 4        3.28 sec
> 
> Blend DS1 is almost a 2x speed-up from Blend 2 & 4.
