[sldev] RFC: Vectorisation control patch
Dzonatas
dzonatas at dzonux.net
Thu Aug 16 08:32:46 PDT 2007
Paul TBBle Hampson wrote:
>> There were more tested.
>>
>
> Not a single one of those is an Altivec machine, which is what I was
> talking about.
>
AltiVec machines fell under "There were more tested."
> Either way, I think it's a good vote in support of trying all the
> relevant hand-tuned vectorisations, the gcc autovectorisation, and the
> non-vectorised code at each run. I'm not totally clear on how expensive
> that is, but I presume not very, given that to be an optimisation target
> this routine would have to be called an awful lot anyway.
>
Auto-vectorization is only supported by certain compilers, so it can't be
relied on universally for general open-source builds. It also lacks some
of the operations needed for fast transformations, which means we still
have to hand-code the vectorization for those.
Auto-vectorization is nice for speeding up general loops, but it doesn't
make sense to rely on it to get the fastest inner loop.
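To make the hand-coded versus auto-vectorized distinction concrete, here is a minimal sketch in C. It is not the code from llviewerjointmesh_sse2.cpp; the function names, the column-major Mat4 layout and the 4-float vertex format are assumptions for illustration. transform_scalar is a plain loop that a compiler may auto-vectorize (in gcc via -ftree-vectorize, which -O3 enables), transform_sse keeps the matrix columns in SSE registers and broadcasts each vertex component with shuffles, and the hypothetical pick_fastest times whichever paths were compiled in and returns the quickest, along the lines of the "try each one at run time" suggestion quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE/SSE2 intrinsics (also pulls in _MM_SHUFFLE) */
#endif

/* Assumed layout: column-major 4x4 matrix, element (row r, col c) = m[c*4 + r]. */
typedef struct { float m[16]; } Mat4;

/* Hypothetical signature shared by all paths; in and out must not overlap. */
typedef void (*transform_fn)(const Mat4 *M, const float *in, float *out, size_t n);

/* Plain C loop over 4-component vertices; whether it gets auto-vectorized
   depends entirely on the compiler and flags. */
static void transform_scalar(const Mat4 *M, const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        const float *v = &in[4 * i];
        for (int r = 0; r < 4; ++r)
            out[4 * i + r] = M->m[r] * v[0] + M->m[4 + r] * v[1]
                           + M->m[8 + r] * v[2] + M->m[12 + r] * v[3];
    }
}

#if defined(__SSE2__)
/* Hand-coded path: matrix columns stay in registers; each input component
   is broadcast with a shuffle and accumulated with multiply/add. */
static void transform_sse(const Mat4 *M, const float *in, float *out, size_t n)
{
    const __m128 c0 = _mm_loadu_ps(&M->m[0]);
    const __m128 c1 = _mm_loadu_ps(&M->m[4]);
    const __m128 c2 = _mm_loadu_ps(&M->m[8]);
    const __m128 c3 = _mm_loadu_ps(&M->m[12]);
    for (size_t i = 0; i < n; ++i) {
        const __m128 v = _mm_loadu_ps(&in[4 * i]);
        __m128 acc = _mm_mul_ps(c0, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0)));
        acc = _mm_add_ps(acc, _mm_mul_ps(c1, _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1))));
        acc = _mm_add_ps(acc, _mm_mul_ps(c2, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2))));
        acc = _mm_add_ps(acc, _mm_mul_ps(c3, _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3))));
        _mm_storeu_ps(&out[4 * i], acc);
    }
}
#endif

/* Time each compiled-in path on a sample workload and return the fastest.
   clock() measures processor time; coarse, but enough for a sketch. */
static transform_fn pick_fastest(const Mat4 *M, const float *in, float *out,
                                 size_t n, int reps)
{
    assert(reps > 0);
    transform_fn candidates[2];
    int count = 0;
    candidates[count++] = transform_scalar;
#if defined(__SSE2__)
    candidates[count++] = transform_sse;
#endif
    transform_fn best = candidates[0];
    clock_t best_time = 0;
    for (int k = 0; k < count; ++k) {
        clock_t start = clock();
        for (int r = 0; r < reps; ++r)
            candidates[k](M, in, out, n);
        clock_t elapsed = clock() - start;
        if (k == 0 || elapsed < best_time) {
            best = candidates[k];
            best_time = elapsed;
        }
    }
    return best;
}
```

Timing in isolation like this is noisy, and a path that wins on a quiet machine may not win inside a busy process, so any run-time choice should be treated as a heuristic.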
If you look at the llviewerjointmesh_sse2.cpp version that I posted to
this thread, you'll see there is no way to auto-vectorize that code.
That code is not currently distributed with the viewer. According to the
test results it is several times faster, but because of cache pollution
that speed is not achieved inside the SL process and its threads. It
does make a good proof of concept for showing how dramatically cache
pollution changes performance.
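A rough way to see the cache-pollution effect described above is to time the same small kernel with a warm cache and again after streaming through a large buffer, as other threads in the process might. This is a hypothetical harness, not the one behind the test results mentioned here; the kernel, the 64-byte cache-line size and the buffer sizes are assumptions, and wall-clock timing via C11 timespec_get is only a crude proxy for real cache measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a hot inner loop: sum a small working set. */
static float kernel(const float *data, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += data[i];
    return s;
}

/* Simulate other work evicting our data: read-modify-write one byte per
   (assumed) 64-byte cache line across a large buffer. */
static void pollute(unsigned char *junk, size_t bytes)
{
    for (size_t i = 0; i < bytes; i += 64)
        junk[i]++;
}

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average seconds per kernel call. If polluted is nonzero, the junk buffer
   is streamed before each call so the kernel starts with a cold cache.
   Results are accumulated into *sink so the calls cannot be optimized out. */
static double time_kernel(const float *data, size_t n, unsigned char *junk,
                          size_t junk_bytes, int reps, int polluted, float *sink)
{
    assert(reps > 0);
    double total = 0.0;
    for (int r = 0; r < reps; ++r) {
        if (polluted)
            pollute(junk, junk_bytes);
        double start = now_seconds();
        *sink += kernel(data, n);
        total += now_seconds() - start;
    }
    return total / reps;
}
```

Calling time_kernel once with polluted = 0 and once with polluted = 1, using a junk buffer several times larger than the machine's last-level cache, gives a rough sense of how much of the inner loop's speed depends on its data staying cached.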
I happened to read that AMD has proposed hardware extensions that would
make it easier to profile each machine, including ways to detect cache
performance:
http://developer.amd.com/assets/HardwareExtensionsforLightweightProfilingPublic20070720.pdf
--
Power to Change the Void