[sldev] RFC: Vectorisation control patch

Dzonatas dzonatas at dzonux.net
Mon Aug 13 09:21:08 PDT 2007


Paul TBBle Hampson wrote:
> In fact, as mentioned I'm getting 10% less speed out of the
> auto-vectorised _vec.cpp than out of the non-vectorised version. I dunno
> if that's the autovectorisation code actually doing a better job of
> vectorising the regular code, or what...
>

We have found that the results vary too much from run to run.

Most of that phenomenon is related to the terrible thread schedulers of 
many OSes. It appears recent Linux versions have tried to address a few 
of the issues, but the main slowdown is that the cache space used by 
hardware acceleration gets polluted easily on thread switches. There is 
currently no way for a modern OS thread scheduler to detect or govern 
such cache usage.

The rule of thumb for now is that if the core uses HT, then pair the 
core with two separate tasks, one per hardware thread. Don't let both 
sides of the core hit the same cache with totally unrelated data, or we 
get the problems seen now.
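
As a rough illustration of that rule of thumb, here is a minimal 
Linux-only sketch of pinning the calling thread to one logical CPU, so 
that a chosen task stays on a chosen side of an HT core. The helper 
names allowed_cpus() and pin_current_thread() are made up for this 
example and are not part of the viewer source; sched_getaffinity() and 
sched_setaffinity() are the glibc calls. Which tasks to pair on a core 
is still up to the application.

```cpp
// Linux-only sketch: pin the calling thread to one logical CPU.
// allowed_cpus() and pin_current_thread() are hypothetical helper names.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes the cpu_set_t macros in <sched.h>
#endif
#include <sched.h>
#include <vector>

// Return the logical CPUs the calling thread is currently allowed to use.
// An empty vector means the affinity mask could not be read.
std::vector<int> allowed_cpus() {
    std::vector<int> cpus;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return cpus;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) {
            cpus.push_back(cpu);
        }
    }
    return cpus;
}

// Restrict the calling thread to a single logical CPU.
// Returns true on success, false if the index is out of range or the
// kernel rejects the request.
bool pin_current_thread(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        return false;
    }
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Each worker thread would call pin_current_thread() once at startup with 
the logical CPU it was assigned. On Linux, the two logical CPUs that 
share a core can be read from 
/sys/devices/system/cpu/cpuN/topology/thread_siblings_list, which is one 
way to decide which two tasks end up on the same core.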

The OS needs to give up the thread scheduler and just take requests for 
control of individual cores.

-- 
Power to Change the Void

