[sldev] Preferred CAPS timeouts and retries?

Ryan Williams rdw at lindenlab.com
Mon May 21 20:43:24 PDT 2007


John Hurliman wrote:
> It seems like someone has been doing upgrades on the CAPS servers 
> these days and they perform a lot better than they used to (the 
> turning point seemed to be shortly after group IMs were converted to 
> CAPS and everyone complained). The 502 errors are back but that's not 
> actually a problem, at least we can connect to the servers now most of 
> the time. The other few times when an initial connection doesn't 
> succeed, it seems to be because of a server problem and the next 
> couple of connection attempts will also likely fail. I've seen up to 
> 20 failed connections before a successful one (using 30 second 
> timeouts), and one libsecondlife user is reporting 40 failed 
> connections followed by a successful one (using 60 second timeouts).
>
For the EventQueueGet capability, 30 second timeouts with a 502 are 
normal and expected.  We're using Comet 
(http://alex.dojotoolkit.org/?p=545) to push messages from the simulator 
to the viewer over HTTP.  This functionality has been mostly unused, so 
most of the time the viewer just waits for nothing, causing a timeout 
and 502, and then it retries. This way the viewer always has an HTTP 
connection open to the simulator.  When the simulator actually does have 
a message, it sends it down the pipe right away, and then I think you 
don't get a 502.  As you've noticed, the case where the simulator has a 
message is increasingly common, and will become more so as we build out 
our LLSD messaging infrastructure.  We should probably make an effort to 
not emit a 502 for expected behavior like this, but as I understand it, 
that will be tricky since Squid is acting as an intermediary and we 
might not be able to change its behavior for this one case.
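
For client authors, the polling behavior described above could be sketched 
roughly like this. It is an illustration only, not the viewer's actual code: 
the `fetch` callable, the `poll_event_queue` name, the assumption that a 
delivered message arrives as a 200, and the backoff numbers are all 
hypothetical.

```python
import time
from typing import Callable, Tuple

# Hypothetical transport: returns (http_status, body). A real client would
# make an HTTP request to the EventQueueGet capability URL with a timeout
# a little longer than 30 seconds.
Fetch = Callable[[], Tuple[int, bytes]]


def poll_event_queue(
    fetch: Fetch,
    on_event: Callable[[bytes], None],
    max_polls: int,
    max_failures: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Long-poll loop that treats a 502 as "nothing to report".

    Returns the number of events delivered. Raises RuntimeError after
    max_failures consecutive unexpected results.
    """
    delivered = 0
    failures = 0
    for _ in range(max_polls):
        try:
            status, body = fetch()
        except OSError:
            # Connection refused, reset, and so on: count as a real failure.
            status, body = 0, b""

        if status == 200:
            # Assumption: a queued message comes back as a normal 200.
            on_event(body)
            delivered += 1
            failures = 0
        elif status == 502:
            # Expected idle timeout: reconnect right away, not a failure.
            failures = 0
        else:
            # Anything else is unexpected: back off (assumed policy, capped
            # at 30 seconds) and give up after too many in a row.
            failures += 1
            if failures >= max_failures:
                raise RuntimeError("event queue unreachable")
            sleep(min(base_delay * 2 ** (failures - 1), 30.0))
    return delivered
```

The point of the sketch is only that a 502 on this one capability resets the 
failure count instead of incrementing it, so an idle queue never looks like a 
dead server.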

You shouldn't be seeing this for other caps, though, so do tell if you 
are.  The capability services aren't under any unusual load, to my 
knowledge, so you shouldn't be seeing performance problems from them.

Has there been other discussion about the caps?  It seems like the Event 
Queue is a topic that needs more documentation.

-RYaN

> Is 60 seconds too short of a timeout for establishing the initial CAPS 
> connection? What is a preferable timeout? After a certain number of 
> retries should we just give up to prevent the already half-dead server 
> from getting hit even more, or just keep trying until it works 
> (although only some bots have the luxury of waiting around for 40 
> minutes before teleport and group IMs will start functioning)?
>
> John Hurliman


