SSE (was: Re: [sldev] Linux non-x86 SSE build script)

Dzonatas dzonatas at dzonux.net
Wed Aug 8 09:33:44 PDT 2007


Paul TBBle Hampson wrote:
> I see that now. I'm not convinced that's a good solution (do I really
> want to include two classes in my binary that simply forward to the
> third class?) since it's also rather binary, using LL_VECTORIZE for
> what is really LL_SSE/SSE2.
>   

Sounds like you want "#if LL_VECTORIZE && LL_SSE2".

LL_VECTORIZE is only a trigger to favor vectorizable code that uses 
compiler intrinsics, whether SSE, SSE2, AltiVec, or other.  If it is 
unset (or set to 0), then the _vec files are used for alignment and 
data width compatibility.
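
A minimal sketch of that gating, for illustration only. LL_VECTORIZE and 
LL_SSE2 are the names from this thread; the demo_* helpers are hypothetical 
and are not the actual viewer code. Both flags default to 0 here, so the 
portable path is what gets built unless the build system sets them.

```cpp
#include <cassert>

// Only LL_VECTORIZE and LL_SSE2 come from this thread. Default both to 0
// so the portable path is used unless the build sets them.
#ifndef LL_VECTORIZE
#define LL_VECTORIZE 0
#endif
#ifndef LL_SSE2
#define LL_SSE2 0
#endif

#if LL_VECTORIZE && LL_SSE2
#include <emmintrin.h>  // SSE2 intrinsics header (also pulls in the SSE ones)

// Intrinsic path (hypothetical helper): one packed add over four floats.
inline void demo_add4(const float* a, const float* b, float* out)
{
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
inline const char* demo_add4_path() { return "sse2-intrinsics"; }
#else
// Portable path (hypothetical helper): plain loop, no intrinsics.
inline void demo_add4(const float* a, const float* b, float* out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
}
inline const char* demo_add4_path() { return "portable"; }
#endif

// Whichever path was compiled in, the result must be the same.
inline void demo_check()
{
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float out[4];
    demo_add4(a, b, out);
    assert(out[0] == 11.0f && out[1] == 22.0f);
    assert(out[2] == 33.0f && out[3] == 44.0f);
}
```

Callers only ever see demo_add4(); which body they get is decided once, at 
compile time, by the two flags.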

> Then again, the SSE and SSE2 code appears identical.
>   
At the C++ level they are almost the same. However, look at the compiled 
code and you'll notice major differences. That is due to the extra 
registers and data width that SSE2 provides. One small change can have a 
major impact on the entire loop there.

> Even so, it precludes for example an explicit Altivec implementation.
>   
The GCC compiler optimized the _vec version quite nicely for AltiVec. 
I'm sure a hand-crafted version could do better, but given the cost and 
time of implementation and the questionable portability of 
hand-vectorized code, the AltiVec code was dropped. The _vec version has 
provided a 4x speedup as is. GCC (actually Xcode) detected the 
alignment in _vec and optimized accordingly.
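
To illustrate the kind of code a compiler can vectorize on its own, here 
is a rough sketch of a 16-byte-aligned 4-float type with a plain loop. 
DemoVec4 and demo_add are hypothetical names, not the _vec classes 
themselves, and alignas is a modern C++11 spelling used here as an 
assumption. Whether the loop actually becomes SIMD code depends on the 
compiler, target, and optimization flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-float vector type. The 16-byte alignment is what lets a
// compiler assume aligned loads and stores when it vectorizes.
struct alignas(16) DemoVec4
{
    float v[4];
};

// Plain element-wise add: no intrinsics, just a fixed-length loop that an
// auto-vectorizer may turn into a single packed add.
inline DemoVec4 demo_add(const DemoVec4& a, const DemoVec4& b)
{
    DemoVec4 r;
    for (std::size_t i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

// True if the pointer sits on a 16-byte boundary.
inline bool demo_is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

The same source builds on any architecture; the alignment is a hint the 
optimizer can use, not a requirement for correctness.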

> Hmm. By 'detected' I mean 'looked up by ARCH', I guess.
>   
That is all in llv4math.h.

Since not all compilers define the same auto-directives, it is better 
that the code defaults to compatible unoptimized versions than to break 
builds.
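
A sketch of that "default to the compatible version" idea, assuming 
detection is done from macros the compiler predefines. __SSE__, __SSE2__ 
and __ALTIVEC__ are set by GCC-style compilers, and _M_IX86_FP and _M_X64 
by MSVC. The DEMO_* names are hypothetical, and this is not a copy of 
what llv4math.h does.

```cpp
#include <cstring>

// Pick a SIMD flavor from compiler-predefined macros. Anything the chain
// does not recognize falls through to the generic, unoptimized setting
// instead of failing the build.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  define DEMO_SIMD_NAME "sse2"
#  define DEMO_USE_INTRINSICS 1
#elif defined(__SSE__) || (defined(_M_IX86_FP) && _M_IX86_FP == 1)
#  define DEMO_SIMD_NAME "sse"
#  define DEMO_USE_INTRINSICS 1
#elif defined(__ALTIVEC__)
#  define DEMO_SIMD_NAME "altivec"
#  define DEMO_USE_INTRINSICS 1
#else
#  define DEMO_SIMD_NAME "generic"
#  define DEMO_USE_INTRINSICS 0
#endif

// Report which branch was selected at compile time.
inline const char* demo_simd_name() { return DEMO_SIMD_NAME; }

// True when no known directive matched and the portable code is in use.
inline bool demo_uses_generic_path() { return DEMO_USE_INTRINSICS == 0; }
```

The important part is the final #else: an unknown compiler gets slower 
code, not a compile error.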


-- 
Power to Change the Void

