IMs and group IMs (was Re: [sldev] Frequent bugs with difficult repros)

jwolk at lindenlab.com jwolk at lindenlab.com
Sun Mar 2 11:37:49 PST 2008


I'll chime in here with some information.  These are not excuses, just
information.

Double IMs, out-of-order IMs and other "strange" behaviors are bugs, and
the protocol used (UDP or TCP) does not by itself guarantee whether a
given IM will succeed or fail.  These bugs will hopefully be fixed at some
point in the future, but they have been happening for a long, long time,
even with the old centralized userserver, and they are very difficult to
fix in a backwards-compatible way.

Synopsis:

A centralized server is simpler, but it cannot scale and is definitely
not a solution.  There are many problems with the current structure and
the distributed chat module.  I hope to fix some design flaws in the
distributed chat module sometime soon (TM), but some of the changes
needed to help group IM are vast, and none of them, much like the rest
of SL, are easy.  Also, some of the group IM "features" that SL currently
has (like "passively" joining group IM sessions when you log in) are
strange and make group IM work much harder than it really needs to.

Details:

Now, some stuff about the userserver.  The userserver worked "better" in
terms of group IM because it was centralized and kept a persistent
"connection" to all of the users logged into SL at any given moment.
Therefore, when a group IM needed to be delivered to someone, the
userserver simply sent an IM on a connection that it knew already
existed.

The code path used to be: viewer -> userserver -> (message the viewer of
every member in the group)

Unfortunately, as Kelly said, this way of keeping an active connection to
every user probably wouldn't have made it past 6k concurrent users.
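
To make the old path concrete, here is a rough sketch in Python.  It is
only an illustration: the class and method names (UserServer, connect,
join, send_group_im) are made up for this example and are not the actual
userserver code, and each "connection" is modeled as a plain list.

```python
class UserServer:
    """Toy model of a centralized server with one connection per online user."""

    def __init__(self):
        self.connections = {}  # user -> list standing in for that viewer's connection
        self.groups = {}       # group -> set of member users

    def connect(self, user):
        # Hypothetical login step: the server now holds a connection to this user.
        self.connections[user] = []

    def join(self, group, user):
        self.groups.setdefault(group, set()).add(user)

    def send_group_im(self, sender, group, text):
        """Deliver to every online member except the sender; return the count."""
        delivered = 0
        for member in self.groups.get(group, ()):
            inbox = self.connections.get(member)
            if member != sender and inbox is not None:
                inbox.append((sender, group, text))
                delivered += 1
        return delivered
```

Delivery is a single dictionary lookup per member, which is why this was
simple, and also why the one server had to hold a connection for every
logged-in user.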

Currently, only an SL region has an active connection to you and your
viewer.  Therefore, all requests to and from your viewer *HAVE* to go
through the region you're connected to, and that region can change from
time to time (when you cross into another region or teleport).

Therefore, just in terms of code paths or the number of processes needed
to send a message around, the current distributed architecture is much
more complicated.

viewer -> region -> distributed chat module -> (send to all regions that
have a member of the group in them) -> those regions deliver to the
members' viewers

Therefore, if the region you're in, or any region the other members of
the group are in, has a problem (being slow or overwhelmed), there can be
a problem sending the IM.  Likewise, if the chat module has a problem,
there will be a problem sending the IM.  On top of that, the distributed
chat module has to look up which region each member is in and send a
message there.  Basically, there are many more moving parts now than
there were before.
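
The extra moving parts can be sketched in Python as follows.  Again, this
is only an illustration under my own assumptions: Region, ChatModule, the
"healthy" flag and the method names are hypothetical and do not reflect
the real chat module's code or protocol.

```python
class Region:
    """Toy region: the only process holding connections to viewers in it."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy  # False stands in for "slow or overwhelmed"
        self.inboxes = {}       # user -> list of messages delivered to the viewer

    def deliver(self, user, message):
        if not self.healthy or user not in self.inboxes:
            return False
        self.inboxes[user].append(message)
        return True


class ChatModule:
    """Toy distributed chat module: looks up each member's region, then fans out."""

    def __init__(self):
        self.groups = {}    # group -> set of member users
        self.presence = {}  # user -> Region the user is currently connected to

    def move(self, user, region):
        # Login, region crossing or teleport: the user's connection changes region.
        old = self.presence.get(user)
        inbox = old.inboxes.pop(user, []) if old is not None else []
        region.inboxes[user] = inbox
        self.presence[user] = region

    def send_group_im(self, sender, group, text):
        """Return {member: delivered?} for every other member of the group."""
        sender_region = self.presence.get(sender)
        if sender_region is None or not sender_region.healthy:
            return {}  # the message never leaves the sender's own region
        results = {}
        for member in self.groups.get(group, ()):
            if member == sender:
                continue
            region = self.presence.get(member)  # per-member region lookup
            results[member] = (region is not None
                               and region.deliver(member, (sender, group, text)))
        return results
```

In this sketch a single unhealthy region is enough to make delivery fail
for the members in it, even though the sender's region and the chat module
are fine.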

Now, I will be the first to say that the distributed chat module has some
design flaws, and that both the chat module and the general "flow" of
group IMs can be streamlined and fixed.  But these changes are hard,
especially when they have to be made in a backwards-compatible way.

-Jonathan Linden
