[sldev] getting serious about software.

Dirk Moerenhout blakar at gmail.com
Thu Jun 21 02:34:42 PDT 2007


You do realise SL spends a lot of time on math? It's a 3D application,
not an OS. If you audit the code for "what is necessary" you'll hardly
get any speedup, because cutting the math means cutting a feature.
Sure, you can discuss whether the effect of that feature is worth its
cost in CPU time, but it's not stray or redundant code.

For the libraries we have the same issue. Which libraries are eating
cycles? Again those doing lots of math. What they do is again
necessary so an audit will keep them. You may upgrade to a newer
version but it'll never yield big results.

My personal approach to optimisation for this kind of application is:
--- avoid complex math where possible ---
This can be achieved in several ways. A few examples:
* Hard code all numbers that are known in advance (custom asm can
sometimes still beat this when the CPU has the constant built in)
* Make sure your code has short cuts for the most used calculations
(if something gets called 90% of the time with the same parameters it
should have the result hard coded and only perform the calculation in
the other cases)
* Cache results if they are calculated over and over (if you know that
at a certain point in time you call the same function with the same
input a lot you should make it possible for it to return one of the
recent results).
* Abuse your knowledge of IEEE numbers. Numbers are still made of
bits you can manipulate directly (write custom code for things that
can be done by bypassing true math)
* Make sure what you're doing is the most efficient mathematical way
to achieve your goal. If you calculate things in a long-winded way
the compiler will never optimise it, as it has little to no clue about
the rules of math.
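
A minimal sketch of three of the points above: a hard-coded short cut
for the common case, a one-entry result cache, and an IEEE bit trick.
Everything here is an assumption made for illustration: fov_scale, its
"90 degrees is the common case" premise and abs_via_bits are
hypothetical names, not code from the viewer.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical example: tangent of half a field of view given in degrees.
// Assumption for illustration: most callers pass exactly 90 degrees.
// Not thread safe, because the cache lives in static variables.
float fov_scale(float fov_degrees)
{
    // Short cut: tan(45 degrees) is 1, so the common case needs no math.
    if (fov_degrees == 90.0f)
        return 1.0f;

    // One-entry cache of the most recent result. It starts out valid
    // because tan(0) is 0.
    static float last_input = 0.0f;
    static float last_result = 0.0f;
    if (fov_degrees == last_input)
        return last_result;

    // Constant known in advance: pi / 360 converts degrees to half-angle
    // radians in a single multiply.
    const float kHalfDegToRad = 3.14159265358979f / 360.0f;
    last_input = fov_degrees;
    last_result = std::tan(fov_degrees * kHalfDegToRad);
    return last_result;
}

// IEEE 754 single precision keeps the sign in the top bit, so the
// absolute value can be taken by clearing that bit instead of comparing
// and negating.
float abs_via_bits(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to read the bits
    bits &= 0x7FFFFFFFu;                  // clear the sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

Whether any of these beats the straightforward version on a given
compiler and CPU is something only a profiler can tell you.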

--- redesign complex systems ---
Some code may yield the wished-for (mathematical) result, but you
could have achieved something of similar accuracy with a lot less
hassle. Compression is an example. Your compression algorithm should
be chosen so that it gives you the best balance between speed and
space. If 10% more compression costs you 50% more time then there's
little reason to go for it.

--- write custom code for your math ---
All modern CPUs support SIMD instructions aimed at 3D. Compilers
don't convert your vector math for you; you need to do it yourself.
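
As a rough sketch of what "doing it yourself" can look like, here is a
four-component multiply-add written with SSE intrinsics, with a plain
scalar fallback for builds without SSE. Vec4, scale_add and the
EXAMPLE_HAVE_SSE macro are hypothetical names invented for this
example.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define EXAMPLE_HAVE_SSE 1
#else
#define EXAMPLE_HAVE_SSE 0
#endif

// Hypothetical vector type. The 16-byte alignment is what allows the
// aligned SSE load and store below.
struct alignas(16) Vec4 { float v[4]; };

// Computes a * scale + b for all four components.
Vec4 scale_add(const Vec4& a, float scale, const Vec4& b)
{
    Vec4 out;
#if EXAMPLE_HAVE_SSE
    __m128 va = _mm_load_ps(a.v);    // load four floats at once
    __m128 vb = _mm_load_ps(b.v);
    __m128 vs = _mm_set1_ps(scale);  // broadcast scale to all four lanes
    _mm_store_ps(out.v, _mm_add_ps(_mm_mul_ps(va, vs), vb));
#else
    // Scalar fallback: same result, one component at a time.
    for (std::size_t i = 0; i < 4; ++i)
        out.v[i] = a.v[i] * scale + b.v[i];
#endif
    return out;
}
```

The gain comes from handling four floats per instruction, so it only
pays off where the data is already laid out as aligned groups of four.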

To do all of the above in the most efficient way you're best off
taking a profiler and checking whether what is using the CPU falls
within the things described above. The above is not an exhaustive
list; it's just an ordered list of what I can think of at the moment.

Note also that in the end doing customisation geared towards chips is
_fun_ for some of us. For example, I've replaced one of the functions
with a piece of custom asm that is 70x faster than the current code
while yielding the exact same result. I don't care that the code will
probably mean no more than a 0.1% increase in speed overall. I bumped
into it and thought "no way that this is efficient". A bit of thinking
later I had created a neat trick based on how IEEE numbers are stored
and it made my day.
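
Why a 70x local speedup can mean so little overall follows from
Amdahl's law. A small sketch, assuming purely for illustration that
the replaced function accounts for 0.1% of total run time (that
fraction is not a measured figure):

```cpp
// Overall speedup when a fraction `fraction` (0..1) of total run time
// is made `local_speedup` times faster (Amdahl's law).
double overall_speedup(double fraction, double local_speedup)
{
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup);
}
```

With those assumed numbers, overall_speedup(0.001, 70.0) comes out at
about 1.001, i.e. roughly 0.1% faster overall.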

Kind regards,

Dirk aka Blakar Ogre


> I am becoming increasingly distressed about all this focus on chip-level
> optimizations. At best we're talking about a speedup of around 1.3.
> Auditing the code would probably close 2/3rds the open bug reports and
> yield a speedup between 5 and 10 times.
>
> Here are a few articles of interest:
>
> http://www.eros-os.org/essays/reliability/paper.html
>
> The lesson I take from Eros is that to make a piece of software FAST,
> you first put all your effort into making it verifiably reliable and
> removing every stray and redundant line of code you can find.
>
> Next, examine how your app interacts with the libraries on your system.
> Are you using the right libraries? Are you taking full advantage of the
> library's features?
>
> Once you've done that, go to the manufacturer's performance manuals:
>
> http://developer.amd.com/devguides.jsp
>
> These manuals tell you how best to use the language and the compiler to
> generate more efficient binaries before writing off any CPUs.
>
> Next, you identify the most performance critical modules and generate
> dynamic libraries compiled for each of the major architectures.
>
> Finally, if you still need more that your compiler can't give you, you
> write assembly language kernels for the stuff that absolutely positively
> needs to be screaming fast and the code the compiler makes blows donkey
> chunks.
>
> --
> Opera: Sing it loud! :o(  )>-<

