SSE (was: Re: [sldev] Linux non-x86 SSE build script)
Dzonatas
dzonatas at dzonux.net
Wed Aug 8 09:33:44 PDT 2007
Paul TBBle Hampson wrote:
> I see that now. I'm not convinced that's a good solution (do I really
> want to include two classes in my binary that simply forward to the
> third class?) since it's also rather binary, using LL_VECTORIZE for
> what is really LL_SSE/SSE2.
>
Sounds like you want "#if LL_VECTORIZE && LL_SSE2".
LL_VECTORIZE is only a trigger to favor vectorizable code that uses
compiler intrinsics, whether SSE, SSE2, AltiVec, or other. If it
is unset (or set to 0), then the _vec files are used for alignment and
data width compatibility.
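As a rough sketch of that kind of guard (not the actual viewer code): the function names below are hypothetical, and the two macros default to 0 here only so the snippet compiles on its own. A real build would set them from the build script.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the build script normally defines these; default to "off".
#ifndef LL_VECTORIZE
#define LL_VECTORIZE 0
#endif
#ifndef LL_SSE2
#define LL_SSE2 0
#endif

#if LL_VECTORIZE && LL_SSE2
#include <emmintrin.h>  // SSE2 intrinsics; also pulls in the SSE ones

// Hypothetical helper: adds two float arrays four lanes at a time.
inline void add_floats(const float* a, const float* b, float* out, std::size_t count)
{
    assert(count % 4 == 0);  // this sketch does not handle a tail
    for (std::size_t i = 0; i < count; i += 4)
    {
        __m128 va = _mm_loadu_ps(a + i);  // unaligned loads keep the sketch simple
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
}
inline const char* add_floats_path() { return "sse2"; }
#else
// Portable fallback: plain C++ the compiler is free to vectorize on its own.
inline void add_floats(const float* a, const float* b, float* out, std::size_t count)
{
    assert(count % 4 == 0);  // same contract as the intrinsic path
    for (std::size_t i = 0; i < count; ++i)
        out[i] = a[i] + b[i];
}
inline const char* add_floats_path() { return "generic"; }
#endif
```

Building with -DLL_VECTORIZE=1 -DLL_SSE2=1 on an SSE2-capable target selects the intrinsic path; anything else falls through to the generic loop.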
> Then again, the SSE and SSE2 code appears identical.
>
At the C++ level they are almost the same. However, look at the compiled
code and you'll notice major differences. That is due to the extra
registers and data width that SSE2 provides. One small change can have a
major impact on the entire loop there.
> Even so, it precludes for example an explicit Altivec implementation.
>
The GCC compiler optimized the _vec version quite nicely for AltiVec.
I'm sure a hand-crafted version could do better. Given cost and time of
implementation and questions of portability of hand-crafted code
vectorization, the AltiVec code was dropped. The _vec version has
provided a 4x speedup, as is. GCC (actually Xcode) detected the
alignment in _vec and optimized accordingly.
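To illustrate the idea of alignment-friendly plain code, here is a minimal sketch. It is not the real _vec source: Vec4Sketch and add_vec4 are made-up names, and alignas is a C++11 feature used here for brevity. Whether a given compiler actually vectorizes the loop depends on its version and flags.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-float vector, aligned to 16 bytes so a compiler can
// map it onto one SIMD register (SSE, AltiVec, or other).
struct alignas(16) Vec4Sketch
{
    float v[4];
};

// No intrinsics: a fixed-width inner loop over aligned data, which is
// the shape auto-vectorizers tend to recognize.
inline void add_vec4(const Vec4Sketch* a, const Vec4Sketch* b,
                     Vec4Sketch* out, std::size_t n)
{
    assert(a != nullptr && b != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i)
    {
        for (int lane = 0; lane < 4; ++lane)
            out[i].v[lane] = a[i].v[lane] + b[i].v[lane];
    }
}
```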
> Hmm. By 'detected' I mean 'looked up by ARCH', I guess.
>
That is all in llv4math.h.
Since not all compilers define the same auto-directives, it is better
that the code defaults to compatible unoptimized versions than to break
builds.
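A hedged sketch of that default-to-compatible pattern follows. It is not a copy of llv4math.h: the SKETCH_* names are invented for illustration. The predefined macros tested (__SSE2__, __ALTIVEC__, _M_X64, _M_IX86_FP) are ones GCC and MSVC set, but other compilers may set none of them, in which case everything resolves to 0.

```cpp
// Detect SSE2 only when the compiler positively announces it.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0  // unknown compiler or target: assume no SSE2
#endif

// Same approach for AltiVec.
#if defined(__ALTIVEC__)
#define SKETCH_HAS_ALTIVEC 1
#else
#define SKETCH_HAS_ALTIVEC 0
#endif

// Reports which path a build would take; "generic" is the safe default.
inline const char* detected_isa()
{
#if SKETCH_HAS_SSE2
    return "SSE2";
#elif SKETCH_HAS_ALTIVEC
    return "AltiVec";
#else
    return "generic";
#endif
}
```

Because every test is an opt-in, a compiler that defines none of these still builds; it just gets the unoptimized path.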
--
Power to Change the Void