[sldev] RFC: Vectorisation control patch

Dzonatas dzonatas at dzonux.net
Thu Aug 16 08:32:46 PDT 2007


Paul TBBle Hampson wrote:
>> There were more tested. 
>>     
>
> Not a single one of those is an Altivec machine, which is what I was
> talking about.
>   
AltiVec machines were included under "There were more tested."



> Either way, I think it's a good vote in support of trying all the
> relevant hand-tuned vectorisations, the gcc autovectorisation, and the
> non-vectorised code at each run. I'm not totally clear on how expensive
> that is, but I presume not very, given that to be an optimisation target
> this routine would have to be called an awful lot anyway.
>   
Auto-vectorization is supported only by certain compilers, so it can't 
be relied on universally for a general open source project. It also 
lacks some of the operations needed for fast transformations, which 
means we still have to hand-code the vectorization for those.

Auto-vectorization is nice to speed up general loops, but it doesn't 
make sense to use it to get the fastest inner loop.
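
To illustrate the distinction, here is a minimal sketch. It is not the 
code from llviewerjointmesh_sse2.cpp; the function names (scale_add, 
transform_scalar, transform_simd) and the column-major matrix layout are 
assumptions made up for this example. The first loop is the kind of 
general loop a compiler may auto-vectorize; the last function is a 
hand-coded SSE transform, with a scalar fallback where SSE is not 
available.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// A plain element-wise loop: a typical candidate for auto-vectorization.
void scale_add(float* dst, const float* a, const float* b, float s, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] * s + b[i];
}

// Scalar reference: column-major 4x4 matrix times a 4-component vector.
void transform_scalar(const float m[16], const float v[4], float out[4])
{
    for (int r = 0; r < 4; ++r)
        out[r] = m[r] * v[0] + m[4 + r] * v[1] + m[8 + r] * v[2] + m[12 + r] * v[3];
}

// Hand-coded version: broadcast each vector component and accumulate
// it against one matrix column, four rows at a time.
void transform_simd(const float m[16], const float v[4], float out[4])
{
#ifdef SKETCH_HAVE_SSE
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(m), _mm_set1_ps(v[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 4), _mm_set1_ps(v[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 8), _mm_set1_ps(v[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m + 12), _mm_set1_ps(v[3])));
    _mm_storeu_ps(out, acc);
#else
    // No SSE on this target: fall back to the scalar reference.
    transform_scalar(m, v, out);
#endif
}
```

Both transform functions should give the same result for the same 
input; only the way the work is expressed differs.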

If you look at the llviewerjointmesh_sse2.cpp version that I posted to 
this thread, you'll see there is no way to auto-vectorize that code. 
That is the version that is not being distributed with the viewer right 
now. It is several times faster according to the test results, but 
because of cache pollution that speedup is not achieved inside the SL 
process and its threads. It does make a good proof of concept for 
showing how dramatically cache pollution changes the results.
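
For the idea quoted above of trying each implementation at run time, a 
rough sketch of timing-based selection might look like the following. 
The names (Kernel, kernel_plain, kernel_unrolled, pick_fastest) are 
hypothetical and not from the viewer source. Note that the timing loop 
runs each candidate on the same small buffer over and over, so it 
measures a warm cache; a winner picked this way may not be the winner 
inside the real process, which is exactly the cache pollution problem.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical common signature shared by all candidate implementations.
using Kernel = void (*)(float* dst, const float* src, std::size_t n);

// Straightforward reference loop.
void kernel_plain(float* dst, const float* src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * 2.0f;
}

// Manually unrolled variant, standing in for a hand-tuned version.
void kernel_unrolled(float* dst, const float* src, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i]     * 2.0f;
        dst[i + 1] = src[i + 1] * 2.0f;
        dst[i + 2] = src[i + 2] * 2.0f;
        dst[i + 3] = src[i + 3] * 2.0f;
    }
    for (; i < n; ++i)  // leftover elements
        dst[i] = src[i] * 2.0f;
}

// Time each candidate on a scratch buffer and return the quickest one.
Kernel pick_fastest(const std::vector<Kernel>& candidates, std::size_t n, int reps)
{
    assert(!candidates.empty());
    std::vector<float> src(n, 1.0f), dst(n, 0.0f);
    Kernel best = candidates.front();
    auto best_time = std::chrono::steady_clock::duration::max();
    for (Kernel k : candidates) {
        const auto start = std::chrono::steady_clock::now();
        for (int r = 0; r < reps; ++r)
            k(dst.data(), src.data(), n);
        const auto elapsed = std::chrono::steady_clock::now() - start;
        if (elapsed < best_time) {
            best_time = elapsed;
            best = k;
        }
    }
    return best;
}
```

The selection would be done once at startup and the returned function 
pointer used for every later call, so the per-call cost is one 
indirect call.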

I did happen to read that AMD has proposed a way to profile each machine 
more easily, including ways to detect cache performance:

http://developer.amd.com/assets/HardwareExtensionsforLightweightProfilingPublic20070720.pdf

-- 
Power to Change the Void