[sldev] Re: SSE
Paul TBBle Hampson
Paul.Hampson at Pobox.com
Thu Aug 9 06:15:36 PDT 2007
On Wed, Aug 08, 2007 at 09:33:44AM -0700, Dzonatas wrote:
> Paul TBBle Hampson wrote:
> >I see that now. I'm not convinced that's a good solution (do I really
> >want to include two classes in my binary that simply forward to the
> >third class?) since it's also rather binary, using LL_VECTORIZE for
> >what is really LL_SSE/SSE2.
> Sounds like you want "#if LL_VECTORIZE && LL_SSE2".
> LL_VECTORIZE is only a trigger to favor vectorizable code that uses
> compiler intrinsics, whether it be SSE, SSE2, AltiVec, or other. If
> it is unset (or set to 0), then the _vec files are used for alignment
> and data width compatibility.
Indeed. It makes sense to have an LL_VECTORIZE flag that enables and
disables vectorisation wholesale.
Of course, all existing uses of LL_VECTORIZE appear to be protecting
LL_SSE or LL_SSE2 code.
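
For illustration, here is a minimal sketch of the nested guard Dzonatas
suggests, where LL_VECTORIZE acts as the wholesale switch and LL_SSE2
selects the specific implementation. The function name add4 and the
explicit defaults of 0 are assumptions made for this example, not code
from the viewer source.

```cpp
// Hypothetical defaults for this sketch. An undefined macro already
// evaluates to 0 inside #if, but explicit defaults keep -Wundef quiet.
#ifndef LL_VECTORIZE
#define LL_VECTORIZE 0
#endif
#ifndef LL_SSE2
#define LL_SSE2 0
#endif

#if LL_VECTORIZE && LL_SSE2
#include <emmintrin.h>  // SSE2 intrinsics header (also pulls in SSE)
#endif

// Hypothetical helper: element-wise sum of four floats, out[i] = a[i] + b[i].
inline void add4(const float* a, const float* b, float* out)
{
#if LL_VECTORIZE && LL_SSE2
    // Intrinsics path, compiled only when both flags are set.
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    // Portable path; an auto-vectorizing compiler may still optimize it.
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

With both flags unset the portable loop is what gets built, so turning
LL_VECTORIZE off disables every intrinsics path at once.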
> >Then again, the SSE and SSE2 code appears identical.
> At the C++ level they are almost the same. However, look at the
> compiled code and you'll notice major differences. That is due to the
> extra registers and datawidth that SSE2 provides. One small change can
> have a major impact on the entire loop there.
Ah, that explains that then...
> >Even so, it precludes for example an explicit Altivec implementation.
> The GCC compiler optimized the _vec version quite nicely for AltiVec.
> I'm sure a hand-crafted version could do better. Given cost and time
> of implementation and questions of portability of hand-crafted code
> vectorization, the AltiVec code was dropped. The _vec version has
> provided a 4x speedup, as is. GCC (actually xcode) detected the
> alignment in _vec and optimized accordingly.
Now that you mention that it's xcode, that makes sense. The
auto-vectorisation is enabled on xcode's gcc by the compiler flags in
SConstruct.
> >Hmm. By 'detected' I mean 'looked up by ARCH', I guess.
> That is all in llv4math.h
The problem is once we're looking at the headers, it's too late. I want
the SConstruct to not _try_ to compile the _sse and _sse2 versions.
Is there a sane/portable way to check the definitions a compiler will
produce from SConstruct? Or shall I just assume i686 means SSE2, and
PowerPC means Altivec?
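
One possible approach, which the thread does not settle on, is to have
the build compile a tiny probe with the same compiler and flags, then
read back what it reports. That avoids guessing from the architecture
name: Pentium III-class i686 CPUs have SSE but not SSE2, and G3-class
PowerPC CPUs have no AltiVec. The sketch below relies on the predefined
macros __SSE2__, __SSE__ and __ALTIVEC__ (GCC) and _M_X64 / _M_IX86_FP
(MSVC). The function name detected_simd is hypothetical.

```cpp
#include <string>

// Hypothetical probe: reports the widest SIMD extension the compiler
// says it is targeting under the flags it was invoked with.
inline std::string detected_simd()
{
#if defined(__SSE2__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "sse2";
#elif defined(__SSE__) || (defined(_M_IX86_FP) && _M_IX86_FP == 1)
    return "sse";
#elif defined(__ALTIVEC__)
    return "altivec";
#else
    // Unknown compiler or no SIMD flags: fall back to the plain version.
    return "none";
#endif
}
```

Wrapped in a main() that prints the result, this could be run from a
configure-style check (SCons Configure contexts have TryCompile and
TryRun for that purpose), and the _sse and _sse2 sources added to the
build only when the probe reports them. A compiler that defines none of
these macros lands on "none", which matches the advice to default to the
compatible unoptimized versions.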
> Since not all compilers define the same auto-directives, it is better
> that the code defaults to compatible unoptimized versions than to
> break builds.
I'm not totally clear what you mean here, but I assume you're talking
about compilers other than gcc...?
--
-----------------------------------------------------------
Paul "TBBle" Hampson, B.Sc, LPI, MCSE
Very-later-year Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
Paul.Hampson at Pobox.com
Of course Pacman didn't influence us as kids. If it did,
we'd be running around in darkened rooms, popping pills and
listening to repetitive music.
-- Kristian Wilson, Nintendo, Inc, 1989
License: http://creativecommons.org/licenses/by/2.1/au/
-----------------------------------------------------------