[sldev] SSE2 / Ouch

Joshua Bell josh at lindenlab.com
Mon Jul 9 11:16:25 PDT 2007


Nicholaz Beresford wrote:
> What worries me most is that a change like this is rolled out without
> mention in the changes, not with a note like "well, we're trying
> something daring which might cause speed benefit long term, but also
> trouble, but you can roll back and please let us know about any
> problems".

My fault. It should have been listed in the notes. Here's a post-mortem:

Internally, we have multiple (poorly named) mainline branches, "release" 
and "release-candidate" (you'll see these names on 
http://wiki.secondlife.com/wiki/Source_downloads). Changes that have 
passed QA merge into one of those branches, per the following rules:

"release" is compatible with the main grid; at any point, a viewer built 
from "release" can connect with the main grid, and simulators could be 
built from "release" that could be deployed in a rolling-restart to the 
main grid.

"release-candidate" is maintained as a superset of release (so all 
changes to release are merged up into r-c), including all of the 
compatibility breakers (if any). It is deployed to the Beta grid and, on 
a "big deploy", to the main grid (at which point the changes are merged 
down into release).
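
To make the two rules above concrete, here's a rough sketch in Python. It is only an illustration of the branching rules as described, not our actual tooling; the class name, method names, and change labels are all made up for the example.

```python
# Illustrative model only: BranchModel and its methods are hypothetical names,
# not part of any real Linden Lab tooling.
class BranchModel:
    def __init__(self):
        self.release = set()            # always compatible with the main grid
        self.release_candidate = set()  # superset of release, plus breakers

    def merge(self, change, breaks_compatibility):
        """Merge a change that has passed QA into the appropriate branch(es)."""
        # Everything ends up in release-candidate, either directly or by
        # being merged up from release.
        self.release_candidate.add(change)
        if not breaks_compatibility:
            self.release.add(change)

    def big_deploy(self):
        """On a big deploy, release-candidate changes are merged down."""
        self.release |= self.release_candidate

    def is_superset(self):
        """The invariant: release-candidate contains everything in release."""
        return self.release <= self.release_candidate


branches = BranchModel()
branches.merge("bug fix", breaks_compatibility=False)
branches.merge("protocol change", breaks_compatibility=True)
```

After the two merges, "release" holds only the bug fix while "release-candidate" holds both; calling big_deploy() brings the two branches level again.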

Here's what happened specifically:

Several bug fixes were made, passed QA, and were merged into "release". 
Following post-merge testing, we released the 1.17.1 optional viewer 
(and later on, picking up one more fix, 1.17.2). The SSE changes passed 
QA (mistake #1 - we apparently didn't test as comprehensively as we 
should have), and were merged into "release". Subsequently, several more 
bug fixes were made and merged into "release". Following post-merge 
testing (of both those fixes and the SSE changes), we released 1.17.3, 
built out of "release".

If you read between the lines, you'll note that at any point we 
potentially have two release codelines up in the air - 1.18.0 and 1.17.3. In 
our internal Jira we tag changes with what version they'll go out with. 
Prior to deciding to release 1.17.3, all of the changes (including the 
SSE changes) were marked as being included in 1.18.0. When we decided to 
also release 1.17.3, I went back and marked the included changes 
(viewer-side only) as being in 1.17.3 (and implicitly as 1.18.0, 
although they shouldn't show in the 1.18.0 notes since they're not 
"new"). Mistake #2 - I missed marking the SSE changes in Jira as being 
in 1.17.3. So they weren't included in the release notes.
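
For the curious, here's a small Python sketch of how mistake #2 plays out. It assumes release notes are produced by filtering changes on their version tags, which is a simplification; the change IDs, field names, and function names are invented for illustration and don't reflect our actual Jira setup.

```python
# Hypothetical data: IDs, summaries, and field names are made up.
changes = [
    {"id": "FIX-1", "summary": "Bug fix", "versions": {"1.17.3", "1.18.0"}},
    # The SSE change shipped in 1.17.3 but was never tagged with it.
    {"id": "SSE-1", "summary": "SSE changes", "versions": {"1.18.0"}},
]


def release_notes(changes, version):
    """List summaries of every change tagged with the given version."""
    return [c["summary"] for c in changes if version in c["versions"]]


def shipped_but_untagged(changes, shipped_ids, version):
    """Find changes present in a build that lack the matching version tag."""
    return [
        c["id"]
        for c in changes
        if c["id"] in shipped_ids and version not in c["versions"]
    ]


notes = release_notes(changes, "1.17.3")
missed = shipped_but_untagged(changes, {"FIX-1", "SSE-1"}, "1.17.3")
```

Here notes contains only "Bug fix", while missed flags "SSE-1". A cross-check of this kind, comparing what was actually merged against what was tagged, is one possible way to catch the omission; it is a suggestion for illustration, not something we have in place.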

(In my limited defense, I was just back from a fun vacation so my head 
wasn't screwed on straight. *sigh*)

> Either nobody was aware that it might cause trouble, or it slipped
> into 1.17.3 accidentally or it was just performed as a silent
> experiment ... and I'm not sure which of those alternatives is worse.

Nothing so nefarious as the latter two. Exclusion from the release notes 
was an oversight - I simply missed tagging those changes as "also in 
1.17.3" in our internal Jira. However, the belief was that the code was 
of sufficient quality to release - it was re-tested at various stages. 
Unfortunately, it appears that those tests were not as comprehensive as 
they should have been.

As a total aside: a big part of why us Lindens are psyched up about the 
"message liberation" project is that it moves us out of the realm of 
monolithic, lock-step client/server upgrades being the norm (and hence, 
the focus of most of our process), and will force us to revamp our 
processes to focus more on asynchronous updates.

Joshua


