[opensource-dev] Group IMs and scalability.

Carlo Wood carlo at alinoe.com
Sat Apr 17 07:18:25 PDT 2010


On Sat, Apr 17, 2010 at 12:02:57AM -0700, Erik Anderson wrote:
> Hey, if you're looking for a review of message queueing agents, I ran across an
> SL review of MQs a while back when trying to choose one for our company's back
> end COMET server.  It had value in my research and may have for someone trying
> to come up with chat alternatives...
> 
> http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes

Um... that page links to a page of mine and says:

  http://www.xs4all.nl/~carlo17/irc/run-irc.htm [Carlo Wood's notes on IRC].
  Of note is the fact that IRC does not guarantee message order. 


I must say that I can't remember ever having said that.
In fact, IRC does guarantee message order from the same source
(which is all you can ask for), and obviously also between
messages where one is a reaction to another.

For example:

Person A asks "What does LL stand for?"
Person B answers "Linden Lab"

Then those messages ARE guaranteed to be seen in that order
by everyone.

The *current* group IM system being used in SL manages to
reverse even how I see my own messages! I wonder how often
different people see messages in a different order :/


The page also says that the largest IRC channels "top out" near 3400 members.
The reason for that is that in the IRC protocol every part and join is
sent to all members. I added an extension to the protocol (that can be
set as a channel MODE) to delay join/part messages until someone actually
talks. Thus, instead of:

12:10	foo JOIN #channel
12:11	bar JOIN #channel
12:15	xyz JOIN #channel
12:17	foo PRIVMSG #channel :Hello

You see:

12:17	foo JOIN #channel
12:17	foo PRIVMSG #channel :Hello

I've seen channels with up to 20,000 users work fine
like that (large "events").
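
A minimal sketch of that delayed-join idea, for illustration only.
It is not the code of the actual extension: the function name
delay_joins, the (time, nick, text) tuple layout, the single-channel
simplification, and the choice to silently drop the PART of someone
who never spoke are all assumptions.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, str]  # (time, nick, raw command text)


def delay_joins(events: List[Event]) -> List[Event]:
    """Return what channel members see when JOINs are held back until first speech."""
    pending: Dict[str, str] = {}   # nick -> JOIN text not yet announced
    shown: List[Event] = []
    for time, nick, text in events:
        verb = text.split()[0]
        if verb == "JOIN":
            # Remember the join, but do not broadcast it yet.
            pending[nick] = text
        elif verb == "PART" and nick in pending:
            # Joined and left without speaking: nobody ever saw the join.
            del pending[nick]
        else:
            if nick in pending:
                # First activity: announce the join now, at the current time.
                shown.append((time, nick, pending.pop(nick)))
            shown.append((time, nick, text))
    return shown
```

Fed the four events of the first log above, this returns only the
JOIN and PRIVMSG of foo, both at 12:17, as in the second log.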

Nevertheless, even that has a limit, of course:
every message that is spoken on the channel / group still
has to be sent to every participant on the channel / group.

However, an interesting fact is that groups do not have
to grow indefinitely: *active* participants, who both read and
write messages, can read, say, ten times as fast as they
can type. That means that beyond 10 active participants the
communication will start to slow down, because the humans
themselves are busy catching up.

Non-active participants, those that haven't said anything for
the longest time (you can simply order them like that), do NOT
need real-time information: you could just queue messages
(limited to, say, 1 per second per group, no matter how large
the group is) for ... 10 minutes, and thus 600 messages, and
then start to drop messages to those that have been silent
the longest time.
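
The numbers above can be written down as a small sketch. The
constant names and the function drop_order are hypothetical; the
only inputs taken from the text are the rate of 1 message per
second per group and the 10 minute window.

```python
from typing import Dict, List

RATE_LIMIT_PER_SECOND = 1      # at most one message per second per group
WINDOW_SECONDS = 10 * 60       # how long messages are queued for idle members
MAX_BACKLOG = RATE_LIMIT_PER_SECOND * WINDOW_SECONDS   # 600 messages


def drop_order(last_spoke: Dict[str, float]) -> List[str]:
    """Members in the order they would lose messages: silent longest first."""
    return sorted(last_spoke, key=lambda member: last_spoke[member])
```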

Thus, a new message comes in (that person is bumped to
the top of the active participant list). The message is
queued for output. The output thread starts sending the
message to the top of the active participants list, working
its way down; meaning: it puts the message in the per-member
output queues, and stops doing that if the total amount of
memory is exceeded (note that only one pointer per message
is needed, so you can serve a LOT of users that way).
If the thread is still busy when a new message comes in,
it also stops and starts with the new message, provided
that it sent the previous message at least to those that were
active in the past 10 minutes (which will be at most around 10
people (and not grow indefinitely), see above); otherwise
the new message is queued (for 1 microsecond) until it
has done that.
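
A single-threaded sketch of that fan-out, under stated assumptions:
the class GroupFanout, its method names, and the max_refs budget
(counted in queued references rather than bytes) are made up for
illustration, the sender is treated like any other recipient, and
the preemption by a newer message is left out entirely.

```python
from collections import OrderedDict, deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[str, str]  # (sender, text)


class GroupFanout:
    def __init__(self, members: List[str], max_refs: int) -> None:
        # Most recently active member first.
        self.order: "OrderedDict[str, None]" = OrderedDict((m, None) for m in members)
        self.queues: Dict[str, Deque[Message]] = {m: deque() for m in members}
        self.max_refs = max_refs   # memory budget, in queued references
        self.refs = 0              # references currently queued

    def post(self, sender: str, text: str) -> List[str]:
        """Queue one message, most recently active members first."""
        self.order[sender] = None
        self.order.move_to_end(sender, last=False)   # bump sender to the top
        self.queues.setdefault(sender, deque())
        message = (sender, text)   # one shared object; queues hold references
        reached = []
        for member in self.order:
            if self.refs >= self.max_refs:
                break              # budget exhausted: the most idle miss out
            self.queues[member].append(message)
            self.refs += 1
            reached.append(member)
        return reached

    def drain(self, member: str) -> List[Message]:
        """Hand a member's queued messages to the network layer."""
        queue = self.queues[member]
        sent = list(queue)
        self.refs -= len(sent)
        queue.clear()
        return sent
```

With members a, b, c, d and max_refs=3, a message from c reaches
c, a and b; d, at the bottom of the list, is the one skipped.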

That scales to an infinitely sized group, limited only,
and *automatically*, by the human digestion speed
(if too many messages occur, people will bail out and
therefore automatically be skipped after 10 minutes).

The only limit then remaining is the problem of having
a file descriptor open per member, even for those that idle:
in order to know if they are still idle you have to read
their file descriptor... and as we all know, the kernel
implementation of watching many sockets isn't optimal on
every operating system. There is a limit there somewhere
between 4000 and 20000 open file descriptors per machine.

Hence, in order to make groups really scalable, you'll
have to kick idling people out. For example:
Someone logs into SL, is considered active and is added
to all his 500 groups (or maybe the viewer will allow
people to say which groups they want to join *automatically*
in the future :p.  Setting a limit to the number of
automatically joined groups makes sense; setting a limit
to the groups you can be a member of does not). He does not
read or react to lots of those groups, so he is kicked
out of very active groups after 10 minutes, and out of
non-active groups with thousands of members that are logged
in, after -say- one hour.
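
As a sketch, the kick policy could look like the following. The
two timeouts come from the text; the function name
members_to_kick and the group_is_active flag are assumptions, and
how a group gets classified as "very active" is left to the caller.

```python
from typing import Dict, List

ACTIVE_GROUP_TIMEOUT = 10 * 60   # seconds of silence tolerated in a busy group
QUIET_GROUP_TIMEOUT = 60 * 60    # seconds of silence tolerated in a quiet group


def members_to_kick(last_activity: Dict[str, float], now: float,
                    group_is_active: bool) -> List[str]:
    """Return the members whose silence exceeds the timeout for this group."""
    timeout = ACTIVE_GROUP_TIMEOUT if group_is_active else QUIET_GROUP_TIMEOUT
    return sorted(member for member, seen in last_activity.items()
                  if now - seen >= timeout)
```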

Note that typically those latter groups are of the type
"access groups", groups that are used for access to a sim,
not for chatting. It makes sense to treat those completely
differently. At the very least I'd say you shouldn't join those
automatically at login.

-- 
Carlo Wood <carlo at alinoe.com>

