[sldev] Optimization target: Avatar skinning, LLViewerJointMesh, matrix multiply

Dzonatas dzonatas at dzonux.net
Sun Apr 15 14:23:27 PDT 2007


James Cook wrote:
> This time, with actual attachment.  :-P
I ran some tests. It appears there is a bottleneck with o.setVec() and 
the addition.

Here are the timings for my code with o.setVec() and the addition 
step commented out (under DS1).

Initializing
0.06 sec
Blend DS1
1.45 sec
Blend 2
3.31 sec
Blend 3
4.06 sec
Blend 4
3.33 sec

The interleaving is not complete yet, but I thought the result for the 
multiplication alone was interesting. I expected it to be faster.

Here is the code:


typedef float v4sf __attribute__ ((vector_size (16)));

struct DSMatrix4
{
    union {
    v4sf v[4];
    float mMatrix[4][4];
    };
} __attribute__ ((aligned (16)));

union DSVector3
{
    float unit[4];
    v4sf v;
} __attribute__ ((aligned (16)));


void multiplyDS1(const DSVector3& a, const DSMatrix4& b, LLVector3& o)
{
    DSMatrix4 j;
    j.v[0] = a.v * b.v[0];
    j.v[1] = a.v * b.v[1];
    j.v[2] = a.v * b.v[2];
//    o.setVec(j.mMatrix[VX][VX] + j.mMatrix[VY][VX] + j.mMatrix[VZ][VX] + b.mMatrix[VW][VX],
//             j.mMatrix[VX][VY] + j.mMatrix[VY][VY] + j.mMatrix[VZ][VY] + b.mMatrix[VW][VY],
//             j.mMatrix[VX][VZ] + j.mMatrix[VY][VZ] + j.mMatrix[VZ][VZ] + b.mMatrix[VW][VZ]);
}

void blendDS1(LLVector3* in, LLVector3* out)
{
    extern void randomize_floats(float*);
    DSMatrix4 blend;
    DSVector3 DSin[VERTEX_COUNT+1];
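    // Copy the input vertices into the 16-byte aligned layout.
    for (int k = VERTEX_COUNT; --k >= 0;)
    {
        DSin[k].unit[VX] = in[k].mV[VX];
        DSin[k].unit[VY] = in[k].mV[VY];
        DSin[k].unit[VZ] = in[k].mV[VZ];
        DSin[k].unit[VW] = 0.0f;  // keep the unused fourth lane defined
    }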
    randomize_floats(&blend.mMatrix[0][0]);
    for (int loop = 0; loop < LOOP_COUNT; loop++)
    {
        for (int i = 0; i < VERTEX_COUNT; i++)
        {
            multiplyDS1(DSin[i], blend, out[i]);
        }
    }
}
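
For comparison, here is a minimal sketch of one conventional way to keep 
both the multiply and the addition in vector registers: broadcast each 
component of the input across all four lanes, scale the matching matrix 
row, and sum the rows. This is a different formulation from multiplyDS1 
above, not the code that was timed. The names SketchMatrix4, 
SketchVector3, splat and transformPoint are hypothetical, the matrix is 
assumed to be row-major with the translation in row 3, and the union 
access relies on GCC's documented type-punning behaviour.

```cpp
// GCC/Clang vector extension: four packed floats, same typedef as above.
typedef float v4sf __attribute__ ((vector_size (16)));

// Hypothetical stand-ins for DSMatrix4 / DSVector3.
struct SketchMatrix4
{
    union {
        v4sf v[4];            // matrix rows as SIMD vectors
        float mMatrix[4][4];  // the same rows as scalars
    };
} __attribute__ ((aligned (16)));

union SketchVector3
{
    float unit[4];  // x, y, z, plus one pad lane
    v4sf v;
} __attribute__ ((aligned (16)));

// Build a vector whose four lanes all equal s.
static inline v4sf splat(float s)
{
    v4sf r = { s, s, s, s };
    return r;
}

// Row vector (x, y, z, 1) times a row-major matrix:
// out = x*row0 + y*row1 + z*row2 + row3, computed lane by lane.
static inline SketchVector3 transformPoint(const SketchVector3& a,
                                           const SketchMatrix4& b)
{
    SketchVector3 o;
    o.v = splat(a.unit[0]) * b.v[0]
        + splat(a.unit[1]) * b.v[1]
        + splat(a.unit[2]) * b.v[2]
        + b.v[3];
    return o;
}
```

With this layout the three output components come out of a single vector 
sum, so only the final store into an LLVector3 would need scalar access. 
Whether that is actually faster than the scalar path would have to be 
measured with the same harness.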
