[sldev] Grid redundancy in face of disaster (Re: Navigations and Landmark Project)
Joshua Bell
josh at lindenlab.com
Wed Apr 23 09:13:13 PDT 2008
Soft wrote:
> Any simulator (machine) on the simulator VPN, with the correct
> certificates, and with the correct simulator binary can bring up any
> region (island/"sim").
Modulo different hardware classes (and simulator-to-CPU ratio), as Jason
pointed out in a fork of this thread. That's also easily tweaked in our
management tool, but pricing for the host capabilities (i.e. Quality of
Service) comes into it.
> Pragmatically, we limit machines to specific
> data centers. This is done because adjacent regions chatter back and
> forth considerably and there's no sense in sending that traffic across
> the country.
>
Specifically, it makes region crossings much less prone to
rubber-banding since the handoff is faster.
> The address move project that I believe you're thinking about was for
> changing simulators' IP addresses. That's a bit more touchy
This was problematic because, prior to 1.21 Server (which
we'll deploy some day, I swear...), the redundancy worked like this:
sim hosts running simulators that weren't hosting regions would
poll the central database asking "got anything for me to run?" We call
these "spares". Too many spares and you have too many machines polling
the database. Too few spares (or too infrequent polling) and you don't
have enough reserve capacity to handle the demand when you shut down a
few hundred sim hosts for maintenance work.
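To make the trade-off concrete, here is a back-of-the-envelope sketch. The function names and all numbers are made up for illustration; they are not actual grid figures, and the model assumes every spare polls at the same fixed interval.

```python
def db_polls_per_second(num_spares: int, poll_interval_s: float) -> float:
    """Steady-state query rate the central database sees from idle spares.

    Assumes each spare issues exactly one query per poll interval.
    """
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return num_spares / poll_interval_s


def worst_case_pickup_delay_s(poll_interval_s: float) -> float:
    """Longest time a region can wait before some spare asks for work.

    With plain polling, a region that becomes available just after a poll
    has to wait for the next one, so the bound is a full interval.
    """
    return poll_interval_s


# Hypothetical numbers: 2000 spares polling every 10 seconds.
print(db_polls_per_second(2000, 10.0))   # 200.0 queries per second
print(worst_case_pickup_delay_s(10.0))   # up to 10.0 seconds before pickup
```

Adding spares or polling faster raises the first number; polling slower raises the second. That is the squeeze described above.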
The 1.21 Server update replaces this polling with a service based around
an HTTP "long poll", so we can in theory have an arbitrary number of
spares, and the glorious future of "half the grid dies, but the grid
just routes around the problem" has arrived. Turning this service on
will be done slowly with lots of monitoring and will happen after the
1.21 rollout itself, to avoid fireworks.
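For anyone unfamiliar with the long-poll pattern, here is a minimal in-process sketch of the idea. It is not the 1.21 service or its API: the class and method names are hypothetical, and a thread-safe queue stands in for the HTTP request that the server would hold open until work appears or a timeout passes.

```python
import queue
import threading
from typing import Optional


class AssignmentService:
    """Hypothetical stand-in for a long-poll endpoint (no real HTTP here)."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[str]" = queue.Queue()

    def assign(self, region: str) -> None:
        """Called when a region needs a host; wakes one waiting spare."""
        self._pending.put(region)

    def long_poll(self, timeout_s: float) -> Optional[str]:
        """Block until a region is available or the timeout expires.

        A waiting spare costs nothing while idle, unlike a spare that
        re-queries the database on every interval.
        """
        try:
            return self._pending.get(timeout=timeout_s)
        except queue.Empty:
            return None  # the spare would simply issue another long poll


def demo() -> Optional[str]:
    """One spare waits; a region shows up while it is waiting."""
    service = AssignmentService()
    picked = []
    spare = threading.Thread(
        target=lambda: picked.append(service.long_poll(timeout_s=5.0))
    )
    spare.start()
    service.assign("region-42")  # hypothetical region name
    spare.join()
    return picked[0]
```

Calling `demo()` hands "region-42" to the waiting spare as soon as it is assigned, without the spare having issued repeated queries in the meantime.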
Joshua