[sldev] [VWR] Heap checkers
Robin Cornelius
robin.cornelius at gmail.com
Thu Jul 31 03:55:26 PDT 2008
Following on from last Thursday's meeting (Rob's Hours), an action
point to ping sldev with this was raised, so here goes.
We currently know there is some memory/heap corruption occurring
within the viewer. I for one see this as random crashes, which can
often include new() or malloc() crashing, or double free errors. I
also know that the corruption is difficult to track down: the end
results depend on the compiler used, the arch (32/64 bit) etc., and
are all related to the padding at the start/end of an allocated heap
block, the arch alignment, and also what happens to be next in memory.
2 such issues have been identified so far but this was done the hard way:-
Issue 1) The abstracted gl system (llrender.cpp) has two function
calls that can and do overrun the array. LLRender::color4ub() and
LLRender::texCoord2f() can both write to element 4096 of a 4096
element array. LLRender::vertex3f() causes the issue by setting the
array index to 4096, but protects itself from this (and typically all
3 are called as a group). Big thanks to Carjay for that one. This
issue is triggered for tortured prims and sculpties, but due to memory
alignment etc. it does not seem to cause many problems on i386; it can
be critical to the GL context on 64 bit builds.
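To make the off-by-one in issue 1 concrete, here is a minimal sketch of
the pattern with a bounds check added to every writer. It is not the
llrender.cpp code: BatchSketch, its members and run_batch() are
hypothetical names, and only the 4096 element size and the "shared
index advanced by the vertex call" shape are taken from the description
above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the batching buffers; not the real LLRender.
struct BatchSketch {
    static constexpr std::size_t kMax = 4096;
    std::array<unsigned, kMax> colors{};
    std::array<float, kMax> verts{};
    std::size_t count = 0;     // shared write index, advanced by vertex()
    std::size_t rejected = 0;  // writes refused because the buffer was full

    // Checked writer: refuses index kMax instead of writing past the end.
    bool color(unsigned c) {
        if (count >= kMax) { ++rejected; return false; }
        colors[count] = c;
        return true;
    }

    // Guards itself, but can leave count == kMax for the next color() call,
    // which is why every writer sharing the index needs the same check.
    bool vertex(float v) {
        if (count >= kMax) { ++rejected; return false; }
        verts[count++] = v;
        assert(count <= kMax);
        return true;
    }
};

// Pushes n (color, vertex) pairs and reports how many writes were refused.
std::size_t run_batch(std::size_t n) {
    BatchSketch b;
    for (std::size_t i = 0; i < n; ++i) {
        b.color(0xffffffffu);
        b.vertex(1.0f);
    }
    return b.rejected;
}
```

With exactly 4096 pairs nothing is refused; the 4097th pair is where an
unchecked color write would land on element 4096. A real fix would more
likely flush the batch when it fills up than drop the write.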
Issue 2) Creating 2 XUI input events. For instance, managing to click
the "Teleport" button twice can cause the xui event to be triggered
twice. To do so takes a broken mouse (yay, mine does this from time to
time) which generates 2 very close click events, or hitting enter and
clicking Teleport at the same time. Despite this being an obvious
attempt to break something, there is an uncaught use of a freed memory
block in the teleport code, as the xui handler passes a pointer (by
value) to an asset which is used to make a message and then freed. The
2nd entry uses the same pointer (which now points to a freed memory
location) and tries to run the code again. Strangely, this causes a
crash not there but on the next new() or malloc() call, as the
memory allocation code barfs up due to heap corruption.
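One way to make a handler like the one in issue 2 safe against a
repeated event is to let it own the payload and null the pointer in the
same step that frees it. The sketch below assumes that shape;
TeleportRequest and TeleportHandlerSketch are hypothetical names and do
not correspond to the actual viewer classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload standing in for the data the handler is given.
struct TeleportRequest {
    std::string destination;
};

// Hypothetical handler: owns the payload and releases it exactly once.
class TeleportHandlerSketch {
public:
    explicit TeleportHandlerSketch(std::string dest)
        : mRequest(std::make_unique<TeleportRequest>(
              TeleportRequest{std::move(dest)})) {}

    // Returns true if a message was built, false if this was a repeat event.
    bool onClick() {
        if (!mRequest) {
            return false;  // second event: payload already consumed
        }
        mLastSent = mRequest->destination;  // stands in for building the message
        ++mSent;
        mRequest.reset();  // frees the payload and nulls the pointer together
        return true;
    }

    int sentCount() const { return mSent; }
    const std::string& lastSent() const { return mLastSent; }

private:
    std::unique_ptr<TeleportRequest> mRequest;
    std::string mLastSent;
    int mSent = 0;
};
```

A second onClick() on the same handler then returns false instead of
touching freed memory. This only helps if both events reach the same
owning object; a raw pointer copied by value into two queued events
would still dangle.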
The point here is that a good heap checking utility could catch these,
so we started a discussion of possible utilities that may be of use
last week, and things like duma, tcmalloc and valgrind came up, to name
a few Linux ones off the top of my head. I also found that there is a
Windows heap utility that can be enabled with the gflags tool on a per
process basis.
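For anyone unfamiliar with what such a checker looks for, here is a toy
sketch of the guard-byte idea: pad each allocation with a known pattern
and verify it later. It is not how any of the tools named above are
implemented; the guard_sketch namespace and all names in it are made up
for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

namespace guard_sketch {

// Guard size is a multiple of the strictest alignment so the user pointer
// stays suitably aligned.
constexpr std::size_t kGuard = alignof(std::max_align_t);
constexpr unsigned char kPattern = 0xAB;

// Records the requested size in front of the leading guard.
struct alignas(std::max_align_t) Header {
    std::size_t size;
};

// Layout: [Header][front guard][n user bytes][rear guard]
void* checked_alloc(std::size_t n) {
    auto* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(Header) + kGuard + n + kGuard));
    if (!raw) return nullptr;
    Header h{n};
    std::memcpy(raw, &h, sizeof h);
    unsigned char* user = raw + sizeof(Header) + kGuard;
    std::memset(user - kGuard, kPattern, kGuard);  // front guard
    std::memset(user + n, kPattern, kGuard);       // rear guard
    return user;
}

// True if neither guard region has been overwritten.
bool guards_intact(const void* p) {
    const auto* user = static_cast<const unsigned char*>(p);
    Header h;
    std::memcpy(&h, user - kGuard - sizeof(Header), sizeof h);
    const unsigned char* front = user - kGuard;
    const unsigned char* rear = user + h.size;
    for (std::size_t i = 0; i < kGuard; ++i) {
        if (front[i] != kPattern || rear[i] != kPattern) return false;
    }
    return true;
}

// Checks the guards, releases the block, and reports whether it was clean.
bool checked_free(void* p) {
    bool ok = guards_intact(p);
    std::free(static_cast<unsigned char*>(p) - kGuard - sizeof(Header));
    return ok;
}

}  // namespace guard_sketch
```

A write one element past the end (as in issue 1) lands in the rear
guard and is reported at the next check. The damage is only noticed
when a check runs, not at the faulting write, and a stray write that
skips over the guard is missed entirely.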
So the questions remain: any other good utilities that people know of,
even platform dependent ones? 98% of the code is common anyway, so
an error caught somewhere is probably affecting more than one system.
The issue with the utilities is run time speed: the really thorough
checkers which put page boundaries between allocs are very, very slow
and use lots and lots of memory. I have not so far been able to get
duma to even get to the viewer logon screen. tcmalloc runs without
heapcheck, but with heapcheck gtk threads barfs and the viewer will
not start. The other issue is that if the run time speed IS that bad
the viewer will never hold a connection to the server, and getting
fully in world is really a requirement. Any ideas here on how to
resolve this? The Microsoft heap checker has so far not caught
anything, including issue 1 above, and also fails to get in/stay
in world (due to viewer execution speed) 4/5 times.
tcmalloc's heap monitor does produce some interesting info that could
help catch memory leaks or profile the viewer's memory usage, see
output at http://www.byteme.org.uk/heap/
Regards
Robin