[sldev] QueryPerformanceCounter() & related issues VWR-940,
VWR-962, VWR975
Paul TBBle Hampson
Paul.Hampson at Pobox.com
Sat Jun 2 22:47:11 PDT 2007
On Sat, Jun 02, 2007 at 11:55:10AM -0700, Dzonatas wrote:
> Paul TBBle Hampson wrote:
> >So that assumption has no basis here.
> >
> Correct.
> >On the other hand, the simplicity was asserted by the person who wrote
> >the PowerPC assembly routine I used in my patch. And authorative as he
> >might be for his code, I'm not sure he can speak for anyone else who's
> >implemented functionally identical code.
> A quick google reveals similar code used elsewhere. Actually, a cmpw instruction instead of a cmplw seems more common. Hmm.
> >Main change I did was move rdtsc and likewise code into the header
> >>file, llfasttimers.h, so that code can now assemble in-line. Eh...
> >>saves a little overhead.
> >As per coding standard, please provide profiling justification for this.
> Consider that the code that calls this function is in-line (in the class header), it creates a bias in the compiler to not in-line any of the code -- even the code of the class header. The reason that it
> originally exists in the object module rather than the header may be based on the pentium 1 or pre VS6.0 era. That is a easy presumption because of the emit usage. The instruction rdtsc was undocumented, and the
> only sure way to use it was to wrap it up in a call structure with a stack. Newer compilers know about rdtsc and how to handle its usage, so there is no need to wrap it anymore unless you want to time individual
> assembly instructions and need to prevent out-of-order execution.
Huh. I was under the impression that it was the other way around,
putting short code in the class header made it bias _towards_ inlining.
I'm not sure about this whole Pentium-1 argument you've put forward, I
doubt slviewer's that old (At least here in Australia, the Pentium 1
predated the public arrival of the Internet).
Mind you, I haven't got a _good_ reason why gcc sample code uses the
.byte version instead of the rdtsc instruction itself. I'm assuming the
assembler would come up with the same answer either way.
> >It's already hard enough to make sense of as it is, and moving things
> >around like that makes it _very_ hard to review a patch. Once the code
> >is done and working, profile and inline it as neccessary.
> I've profiled the viewer and found that there is high cache pollution.
> A polluted cache will not let any of the viewer run at expected speed.
> One step to fix this is to reduce bandwidth bottlenecks. The change to
> make this more in-line will help localize data access. For one, more
> in-line code avoids a global call stack and further page faults. Such
> isolated cache fetches are not easy to profile, if they can be
> isolated, due to the nature of live data being much more random with
> the overhead of 50 threads.
OK, but that's not what the coding-standard requires. I haven't looked,
but if it's being called from inlined code, then it prolly should be
inlined.
My reading of the Coding Standard implies numerical proof,
presumably showing (a) that that code is a hotspot; and (b) that
the proposed change fixes that hotspot.
(I could be wrong about that... but experience suggests that
optimisation without numerical support is wasted effort)
This code isn't yet functionally complete, so it _can't_ have been
proved to be a hotspot yet...
I still feel this's something that should be done _after_ the patch
review, not during.
I would however be particularly interested in the method of profiling
you use, so I can try applying it to my build here, and work out why
_it_ is so painfully slow.
> General data alignment is another way to reduce cache pollution, but we are aways from that being worthwhile. I've turned on alignment and optimizations for more recent processors. With the viewer completely
> compiled that way, I've seen code run 2 to 3 times faster. However, that would mean maintenance overhead for separate binaries for similar processors, which I've learned has been discussed and passed on.
Oh, I like the idea of 2-3x speed improvement. It's basically not
currently playable on my 1.5GHz 512MB PowerBook. -_-
> We need incentives to improve end user hardware.
We need what now? I'm just not even going to think too hard about that.
> >cpu MHz only exists on x86 and AMD64. It's "clock" on PowerPC.
> Thanks. I'll add that.
> >/proc might not be the best place to fetch the current CPU speed from,
> >/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq is prolly more
> >reliable,
> Between what Tofu and a google recommends, this appears to be the preferred order:
> Test /sys/devices/system/cpu/cpu0/cpuinfo_cur_freq, else
> Test /sys/devices/system/cpu/cpu0/scaling_cur_freq, else
> Test /proc/cpuinfo, else
> Test gettimeofday() & rdtsc (or likewise), else
> Test gettimeofday() & expect 1Mhz
Looks good to me.
> > ##else
> >
> >> U64 LLFastTimer::countsPerSecond()
> >> {
> >>@@ -108,6 +62,8 @@
> >> {
> >> CProcessor proc;
> >> sCPUClockFrequency = proc.GetCPUFrequency(50);
> >>+ if(!sCPUClockFrequency)
> >>+ sCPUClockFrequency = 1000000.0;
> >> return U64(sCPUClockFrequency);
> >>
> >I think this rather confusingly ties "has a high-performance counter"
> >code to the "has a sensible GetCPUFrequency value" code.
> > This is fine on win32 (has both), Linux x86 (has both), Linux AMD64 (has
> >both), and PowerPC (Special-cased with #define, never hits this).
> >Also the high performance counters falling back to gettime of day, will
> >be correct.
> >But it took me two readings to realise that, and I suggest it's a rather
> >tangled way of approaching the problem.
> Yep. Needs work.
> Perhaps, it is best to call LLCPUInfo.getMhz() instead of a direct call to CProcessor::GetCPUFrequency(). That way the best frequency determination can be more localized to LLCPUInfo, and fallback can be in
> CProcessor::GetCPUFrequency().
> ...almost quietly fixed a bug here.
I think so, yes. Over the course of this discussion, a voice in the back
of my head has been helpfully suggesting that we've got two classes that
are trying to do the same thing...
--
-----------------------------------------------------------
Paul "TBBle" Hampson, B.Sc, LPI, MCSE
On-hiatus Asian Studies student, ANU
The Boss, Bubblesworth Pty Ltd (ABN: 51 095 284 361)
Paul.Hampson at Pobox.com
Of course Pacman didn't influence us as kids. If it did,
we'd be running around in darkened rooms, popping pills and
listening to repetitive music.
-- Kristian Wilson, Nintendo, Inc, 1989
License: http://creativecommons.org/licenses/by/2.1/au/
-----------------------------------------------------------
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
Url : http://lists.secondlife.com/pipermail/sldev/attachments/20070603/e128e080/attachment.pgp
More information about the SLDev
mailing list