[sldev] Optimization target: Avatar skinning, LLViewerJointMesh,
matrix multiply
James Cook
james at lindenlab.com
Sun Apr 15 19:39:34 PDT 2007
This looks great. On what system are you running it? What headers do
you include to get the _MM_TRANSPOSE4_PS macro?
I'm surprised that Blend DS1 is faster than the others, even counting
the cost of recopying all the vector3 data to vector4s!
Unfortunately, in the LLViewerJointMesh::updateGeometry() function I
believe we are both reading from and writing to packed arrays of
vector3s with other data interleaved for OpenGL. So writing out 4
floats to get the 3 we want (which is what I think _mm_storeu_ps does)
will overwrite whatever data follows each vector3.
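
One possible way around the 4-float store, sketched here as an
illustration and not taken from the viewer source: write the low two
lanes with _mm_storel_pi and the third lane with _mm_store_ss, so the
float after the vector3 is never touched. The PackedVertex layout and
the store3 helper are hypothetical names invented for this example; the
real interleaved layout in updateGeometry() may differ.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_storel_pi, _mm_store_ss

// Hypothetical interleaved layout (an assumption for illustration only):
// a vector3 position followed directly by unrelated per-vertex data.
struct PackedVertex
{
    float pos[3];
    float other;  // stands in for whatever data follows the vector3
};

// Store only the low three lanes of v to dst[0..2]; dst[3] is not written.
inline void store3(float* dst, __m128 v)
{
    // Write lanes 0 and 1 (8 bytes). No 16-byte alignment is required.
    _mm_storel_pi(reinterpret_cast<__m64*>(dst), v);
    // Move lane 2 down into lane 0, then write that single float.
    _mm_store_ss(dst + 2, _mm_movehl_ps(v, v));
}
```

This costs two stores and a shuffle per vertex instead of one store, so
whether it still beats the scalar path would have to be measured.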
In Blend DS1, is it faster to copy through memory backwards
(VERTEX_COUNT -> 0) than forwards (0 -> VERTEX_COUNT)?
I'll play with this a little bit tomorrow.
This rocks!
James
Dzonatas wrote:
> For reference, I aligned the data, which got rid of the unaligned
> stores. I changed the call method, so the compiler would generate
> in-line code. The result:
>
> Initializing   0.04 sec
> Blend DS1      1.75 sec
> Blend 2        3.28 sec
> Blend 3        4.08 sec
> Blend 4        3.28 sec
>
> Blend DS1 is almost a 2x speed-up from Blend 2 & 4.