[sldev] Optimization target: Avatar skinning, LLViewerJointMesh,
matrix multiply
Dzonatas
dzonatas at dzonux.net
Sun Apr 15 14:23:27 PDT 2007
James Cook wrote:
> This time, with actual attachment. :-P
I ran some tests. The bottleneck appears to be o.setVec() and the
additions.
Here are the timings for my code with o.setVec() and the addition step
commented out (listed as Blend DS1):
Initializing   0.06 sec
Blend DS1      1.45 sec
Blend 2        3.31 sec
Blend 3        4.06 sec
Blend 4        3.33 sec
The interleaving is not complete yet, but I thought the result for the
multiplication alone was interesting. I expected it to be faster.
Here is the code:
typedef float v4sf __attribute__ ((vector_size (16)));

struct DSMatrix4
{
    union {
        v4sf  v[4];
        float mMatrix[4][4];
    };
} __attribute__ ((aligned (16)));

union DSVector3
{
    float unit[4];
    v4sf  v;
} __attribute__ ((aligned (16)));

void multiplyDS1(const DSVector3& a, const DSMatrix4& b, LLVector3& o)
{
    DSMatrix4 j;
    j.v[0] = a.v * b.v[0];
    j.v[1] = a.v * b.v[1];
    j.v[2] = a.v * b.v[2];
    // o.setVec(j.mMatrix[VX][VX] + j.mMatrix[VY][VX] +
    //          j.mMatrix[VZ][VX] + b.mMatrix[VW][VX],
    //          j.mMatrix[VX][VY] + j.mMatrix[VY][VY] +
    //          j.mMatrix[VZ][VY] + b.mMatrix[VW][VY],
    //          j.mMatrix[VX][VZ] + j.mMatrix[VY][VZ] +
    //          j.mMatrix[VZ][VZ] + b.mMatrix[VW][VZ]);
}

void blendDS1(LLVector3* in, LLVector3* out)
{
    extern void randomize_floats(float*);
    DSMatrix4 blend;
    DSVector3 DSin[VERTEX_COUNT+1];

    // Copy the input vertices into the 16-byte-aligned layout.
    for (int k = VERTEX_COUNT; --k >= 0;)
    {
        DSin[k].unit[VX] = in[k].mV[VX];
        DSin[k].unit[VY] = in[k].mV[VY];
        DSin[k].unit[VZ] = in[k].mV[VZ];
    }

    randomize_floats(&blend.mMatrix[0][0]);

    for (int loop = 0; loop < LOOP_COUNT; loop++)
    {
        for (int i = 0; i < VERTEX_COUNT; i++)
        {
            multiplyDS1(DSin[i], blend, out[i]);
        }
    }
}
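
For comparison, here is a minimal sketch of one way the scalar additions might be avoided. It is not the code that was timed above, and it has not been benchmarked. The names Mat4, loadMat4, transformPoint, transformPointScalar and selfCheck are hypothetical and do not exist in the viewer source. The idea is to broadcast each input component across all four lanes so that the sums happen lane-wise. Note that multiplyDS1 multiplies a.v element-wise by each row, which pairs lane i of the input with column i of every row, so the commented-out sum would not give the same result as the usual vector-times-matrix transform computed here.

```cpp
// Sketch only; assumes GCC or Clang vector extensions on a target
// with 128-bit SIMD. All names below are hypothetical.
#include <cmath>

typedef float v4sf __attribute__ ((vector_size (16)));

struct Mat4
{
    v4sf row[4];    // one SIMD register per matrix row
};

// Load a plain 4x4 float array into the row-vector layout.
inline Mat4 loadMat4(const float m[4][4])
{
    Mat4 r;
    for (int i = 0; i < 4; ++i)
    {
        v4sf t = { m[i][0], m[i][1], m[i][2], m[i][3] };
        r.row[i] = t;
    }
    return r;
}

// o = a.x*row0 + a.y*row1 + a.z*row2 + row3 (row vector times matrix,
// implicit w of 1). Lanes 0..2 of the result hold x, y, z.
inline v4sf transformPoint(const float a[3], const Mat4& b)
{
    const v4sf x = { a[0], a[0], a[0], a[0] };
    const v4sf y = { a[1], a[1], a[1], a[1] };
    const v4sf z = { a[2], a[2], a[2], a[2] };
    return x * b.row[0] + y * b.row[1] + z * b.row[2] + b.row[3];
}

// Scalar reference for the same transform.
inline void transformPointScalar(const float a[3], const float m[4][4],
                                 float o[3])
{
    for (int c = 0; c < 3; ++c)
        o[c] = a[0]*m[0][c] + a[1]*m[1][c] + a[2]*m[2][c] + m[3][c];
}

// Returns true when the vector and scalar versions agree on a sample.
inline bool selfCheck()
{
    const float m[4][4] = { { 1,  2,  3, 0 },
                            { 4,  5,  6, 0 },
                            { 7,  8,  9, 0 },
                            { 10, 11, 12, 1 } };
    const float a[3] = { 1, 2, 3 };
    float ref[3];
    transformPointScalar(a, m, ref);
    const v4sf r = transformPoint(a, loadMat4(m));
    for (int c = 0; c < 3; ++c)
        if (std::fabs(r[c] - ref[c]) > 1e-5f)
            return false;
    return true;
}
```

Whether this is actually faster than the scalar path would have to be measured with the same harness as the Blend tests. The broadcasts have a cost of their own, and the result still has to be stored into an LLVector3.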