[sldev] OSX mutex-related lockups
Trevor Powell
trevor at gridbug.org
Sat Aug 4 21:35:21 PDT 2007
This may have been raised before. I'm pretty new to SL development
and haven't read all of the list archives yet, but I haven't seen
any comments about it so far. As an extra caveat, I'm still working
with the 1.18.0.6 code and haven't looked at the source for 1.18.1.2
yet. For all I know, this has already been addressed, in which case
please disregard everything that follows. :)
While testing various patches in an OS X build, I've noticed an issue
with mutexes. Inside llthread.cpp, line 269, llMutexes create their
internal mutex objects using the "APR_THREAD_MUTEX_DEFAULT" mutex
behaviour.
Apparently, under Win32 (I'm reliably informed), this default
behaviour is the equivalent of APR_THREAD_MUTEX_NESTED, whereas under
OSX (I've determined through testing), it's the equivalent of
APR_THREAD_MUTEX_UNNESTED. I haven't tested under Linux, but I
suspect that it's probably treated as UNNESTED there, as well.
So this means that if code within a single thread tries to lock a
single llMutex twice before unlocking it again, it will appear to run
correctly in all Win32 builds, but will cause a lockup when that same
code runs under OS X.
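A minimal sketch of the difference, assuming plain POSIX threads rather than APR (so it says nothing about how APR maps its flags on any given platform). The helper name relock_result is made up for illustration. PTHREAD_MUTEX_RECURSIVE stands in for NESTED behaviour, and PTHREAD_MUTEX_ERRORCHECK stands in for UNNESTED, so that the second lock reports an error instead of hanging the way a plain non-recursive mutex would:

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper, not part of the SL source: locks one mutex of the
// given pthread type twice from the calling thread and returns the result
// code of the second lock attempt.
int relock_result(int mutex_type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, mutex_type);
    assert(rc == 0);

    pthread_mutex_t mutex;
    rc = pthread_mutex_init(&mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    rc = pthread_mutex_lock(&mutex);          // first lock always succeeds
    assert(rc == 0);
    (void)rc;                                 // silence unused warning under NDEBUG

    int second = pthread_mutex_lock(&mutex);  // same thread, same mutex

    // Release once per successful lock, then clean up.
    if (second == 0)
        pthread_mutex_unlock(&mutex);
    pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return second;
}
```

Calling relock_result(PTHREAD_MUTEX_RECURSIVE) should return 0, while relock_result(PTHREAD_MUTEX_ERRORCHECK) should return EDEADLK. With PTHREAD_MUTEX_NORMAL the second lock would simply never return, which is the kind of lockup described above.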
In my local build, I've modified the llMutex class to explicitly
request the 'NESTED' mutex type, and this seems to have resolved a
couple of frequent OS X lockups I've suffered while testing various
patches. I'd propose making this change part of the official
source; I figure that anything which makes the different platforms
work more in the same way can only be a good thing, right?
Alternatively, it'd be just as good to switch the official source to
explicitly use 'UNNESTED' mutexes and so share the OS X lockups with
the Win32 folks, so that Win32-using patch authors become able to
debug their own mutex issues, which are currently invisible to them.
The base code appears to have all been written assuming that mutexes
will follow the UNNESTED mutex behaviour anyway; it's only been in
patches where I've seen code which assumed mutexes work the other
way. But I don't actually have any preference one way or the other,
as long as we all have the same mutex behaviour on the various
platforms when we're done.
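If the UNNESTED route were taken, one way to make a same-thread re-lock fail loudly on every platform, instead of hanging on some and passing silently on others, might look like the sketch below. It uses standard C++ rather than APR, and CheckedMutex is a hypothetical name, not a class from the SL source:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical debugging aid: a non-recursive mutex that throws when the
// thread that already owns it tries to lock it a second time.
class CheckedMutex
{
public:
    void lock()
    {
        const std::thread::id self = std::this_thread::get_id();
        // mOwner can only equal our own id if this thread stored it in an
        // earlier lock() and has not yet called unlock().
        if (mOwner.load() == self)
            throw std::logic_error("CheckedMutex re-locked by owning thread");
        mMutex.lock();
        mOwner.store(self);
    }

    void unlock()
    {
        // Unlocking from a thread that does not own the mutex is a bug.
        assert(mOwner.load() == std::this_thread::get_id());
        mOwner.store(std::thread::id());  // reset to "no owner"
        mMutex.unlock();
    }

private:
    std::mutex mMutex;
    std::atomic<std::thread::id> mOwner{std::thread::id()};
};
```

Because it provides lock() and unlock(), the class works with std::lock_guard<CheckedMutex>. A nested lock attempt then surfaces as an exception at the offending call site, which is easier to trace than a frozen viewer.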
It's also worth noting that there are a few mutexes created in SL
which don't go through the llMutex class, most notably in llapr.cpp
and in llpumpio.cpp. These also currently use the DEFAULT behaviour,
and probably also ought to be switched to a specific intended mutex
behaviour (either NESTED or UNNESTED), instead of letting the
different platforms treat mutexes in whatever strange manner they
happen to have set up as the default.
Anybody have thoughts on this? Or better, confirmation that this
actually is a real potential problem in the base code, and that I
haven't made some terrible newbie blunder and totally misinterpreted
all my test results? :)
Trev