[sldev] SSE2 / Ouch
Joshua Bell
josh at lindenlab.com
Mon Jul 9 11:16:25 PDT 2007
Nicholaz Beresford wrote:
> What worries me most is that a change like this is rolled out without
> mention in the changes, not with a note like "well, we're trying
> something daring which might cause speed benefit long term, but also
> trouble, but you can roll back and please let us know about any
> problems".
My fault. It should have been listed in the notes. Here's a post-mortem:
Internally, we have multiple (poorly named) mainline branches, "release"
and "release-candidate" (you'll see these names on
http://wiki.secondlife.com/wiki/Source_downloads). Changes that have
passed QA merge into one of those branches, per the following rules:
"release" is compatible with the main grid; at any point, a viewer built
from "release" can connect with the main grid, and simulators could be
built from "release" that could be deployed in a rolling-restart to the
main grid.
"release-candidate" is maintained as a superset of release (so all
changes to release are merged up into r-c), including all of the
compatibility breakers (if any). It is deployed to the Beta grid, and on
a "big deploy" to the main grid. (At which point, the changes are merged
down into release.)
Here's what happened specifically:
Several bug fixes were made, passed QA, and merged into "release".
Following post-merge testing, we released the 1.17.1 optional viewer
(and later on, picking up one more fix, 1.17.2). The SSE changes passed
QA (mistake #1 - we apparently didn't test as comprehensively as we
should have), and merged into "release". Subsequently, several more bug
fixes were made and merged into "release". Following post-merge testing
(of both those fixes and the SSE changes), we released 1.17.3 built out
of "release".
If you read between the lines, you'll note that at any point we
potentially have two release codelines up in the air - 1.18.0 and 1.17.3. In
our internal Jira we tag changes with what version they'll go out with.
Prior to deciding to release 1.17.3, all of the changes (including the
SSE changes) were marked as being included in 1.18.0. When we decided to
also release 1.17.3, I went back and marked the included changes
(viewer-side only) as being in 1.17.3 (and implicitly as 1.18.0,
although they shouldn't show in the 1.18.0 notes since they're not
"new"). Mistake #2 - I missed marking the SSE changes in Jira as being
in 1.17.3. So they weren't included in the release notes.
(In my limited defense, I was just back from a fun vacation so my head
wasn't screwed on straight. *sigh*)
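To illustrate how a missed tag drops a change from the notes, here is a
minimal Python sketch. It assumes release notes are produced by filtering
changes on their version tags; the Change class, the fix_versions field, and
the release_notes function are hypothetical names for illustration only and
do not reflect the actual Jira setup.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """Hypothetical stand-in for a tracked change and its version tags."""
    summary: str
    fix_versions: set = field(default_factory=set)


def release_notes(changes, version):
    """Return summaries of every change tagged with the given version."""
    return [c.summary for c in changes if version in c.fix_versions]


changes = [
    # Retagged when 1.17.3 was cut, so it appears in both sets of notes.
    Change("bug fix", {"1.17.3", "1.18.0"}),
    # Shipped in 1.17.3 but never retagged, so the 1.17.3 notes miss it.
    Change("SSE changes", {"1.18.0"}),
]

print(release_notes(changes, "1.17.3"))
print(release_notes(changes, "1.18.0"))
```

Under these assumptions the code ships regardless of the tag; only the
generated notes depend on it, which is why the omission went unnoticed.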
> Either nobody was aware that it might cause trouble, or it slipped
> into 1.17.3 accidentally or it was just performed as a silent
> experient ... and I'm not sure which of those alternatives is worse.
Nothing so nefarious as the latter two. Exclusion from the release notes
was an oversight - I simply missed tagging those changes as "also in
1.17.3" in our internal Jira. However, the belief was that the code was
of sufficient quality to release - it was re-tested at various stages.
Unfortunately, it appears that those tests were not as comprehensive as
they should have been.
As a total aside: a big part of why us Lindens are psyched up about the
"message liberation" project is that it moves us out of the realm of
monolithic, lock-step client/server upgrades being the norm (and hence,
the focus of most of our process), and will force us to revamp our
processes to focus more on asynchronous updates.
Joshua