[sldev] Complete viewer compilation benchmarks
James Cook
james at lindenlab.com
Sat Mar 31 21:28:48 PDT 2007
This is a great analysis. Thanks for sharing.
My own internal tests with building Second Life on Windows, Mac and
Linux show that build time is almost completely CPU bound. The best way
to get faster builds is to find more workstations for Incredibuild,
distcc, or your distributed build system of choice. :-)
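A rough way to check the CPU-bound claim against the full-build timings quoted further down: compare CPU time (user + sys) with wall-clock time multiplied by the number of parallel jobs. This is only a sketch; the names FULL_BUILDS and cpu_utilization are made up for illustration, and the figures are copied from the quoted "Full compilation time" results.

```python
# Timings copied from the quoted "Full compilation time" results,
# written as (minutes, seconds) pairs for real, user and sys.
FULL_BUILDS = {
    "Gentoo": ((16, 44.594), (29, 25.549), (2, 57.599)),
    "Ubuntu (after reboot)": ((19, 23.391), (35, 15.595), (2, 9.673)),
    "Ubuntu (on ram)": ((19, 5.572), (35, 24.639), (2, 13.387)),
}
JOBS = 2  # the quoted builds were run with -j2


def seconds(pair):
    """Convert a (minutes, seconds) pair into seconds."""
    minutes, secs = pair
    return minutes * 60 + secs


def cpu_utilization(real, user, sys_time, jobs=JOBS):
    """Fraction of the job slots kept busy: (user + sys) / (real * jobs)."""
    return (seconds(user) + seconds(sys_time)) / (seconds(real) * jobs)


for name, (real, user, sys_time) in FULL_BUILDS.items():
    busy = cpu_utilization(real, user, sys_time)
    print(f"{name}: {busy:.1%} of {JOBS} job slots busy")
```

A value close to 100% means the build spent almost no wall-clock time waiting on anything other than the CPU, which is what the quoted numbers show for all three full builds.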
Where a RAM disk or faster hard drive might matter is for linking. Link
times are primarily I/O bound. A 10K RPM disk will speed linking by 30%
or more. See below.
James
(A piece of mail I sent internally a while ago)
Takeaway:
I humbly recommend developer workstations include a secondary drive,
preferably at 10,000 RPM.
Details:
I decided to try to reduce my link times by throwing hardware at the
problem.
C: Western Digital 80 GB 7200 RPM 1.5 Gb/s SATA disk that came with my
box. It has the operating system and compiler installed on it.
D: Western Digital 120 GB 7200 RPM 1.5 Gb/s SATA disk I added.
E: Western Digital 36 GB 10,000 RPM 1.5 Gb/s SATA disk I added.
Time to link newview_noopt.exe:
C: 2'15"
D: 2'00"
E: 1'30"
Also, because I'm remote, checking out a branch takes a while. I usually
make a copy of a "release" checkout, then svn switch it to the target
branch.
Time to duplicate "release" checkout:
C: 13'42"
D: 8'45"
E: 6'35"
E: is this disk, which cost $120 including tax and shipping.
http://www.newegg.com/Product/Product.asp?Item=N82E16822136054
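To put the link and copy timings above in percentage terms, here is a small sketch. The helper names to_seconds and reduction are invented for this example; the times are the ones listed for drives C:, D: and E:.

```python
def to_seconds(stamp):
    """Convert a M'SS" stamp such as 2'15" into seconds."""
    minutes, secs = stamp.rstrip('"').split("'")
    return int(minutes) * 60 + int(secs)


def reduction(baseline, candidate):
    """Fraction of time saved by `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline


# Times as listed above for each drive.
LINK = {"C": "2'15\"", "D": "2'00\"", "E": "1'30\""}
COPY = {"C": "13'42\"", "D": "8'45\"", "E": "6'35\""}

for label, table in (("link", LINK), ("copy", COPY)):
    times = {drive: to_seconds(stamp) for drive, stamp in table.items()}
    for base in ("C", "D"):
        saved = reduction(times[base], times["E"])
        print(f"{label}: E: takes {saved:.0%} less time than {base}:")
```

With these numbers the 10,000 RPM drive cuts link time by about a third relative to the stock C: drive and by a quarter relative to the second 7200 RPM drive.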
Dale Glass wrote:
> I've been working on benchmarking the compilation speed. My idea was
> to try both extremes: Building after a reboot, when no data is
> cached, and building from an install located entirely in RAM, with
> absolutely no hard disk usage.
>
> The point of this was to determine how much difference a better hard
> disk could make. Results were quite interesting.
>
> Hardware:
> Athlon 64 X2 5200+ (1MB cache)
> 4GB ECC DDR2 666 RAM
> Root on RAID-1, SAMSUNG HD300LJ and Maxtor 6V300F0
>
> Settings:
> Compiling with -j2
> llmozlib enabled
> building 32 bit SL version
> Source from http://svn.daleglass.net/secondlife, rev 81.
>
> Software:
> Gentoo, x86_64, gcc 4.1.1
> Ubuntu Edgy, x86_64, gcc 4.1.2
>
> Gentoo tests were done with /tmp on tmpfs.
>
> Ubuntu tests were done with the whole install on /tmp on tmpfs, and
> the whole install on disk. This took about 1.5GB RAM:
>
> * 334MB for temporary files generated during compilation
> * 642MB for the source code
> * 462MB for the Ubuntu install (probably can be trimmed a bit)
>
>
> Ubuntu tests were done as follows: Distribution was installed with
> debootstrap. Sources were copied into it (same copy as used for
> Gentoo). The whole tree, OS and sources included, was copied to /tmp
> (on tmpfs), and chrooted into.
>
> The "(on disk)" tests were done afterwards, by copying the whole tree
> (1.5GB) back to disk (as some changes were required to get the build
> going).
>
> The "(after reboot)" tests were done as follows:
> 1. Do required preparations (e.g., binary removal)
> 2. Reboot
> 3. Login
> 4. Chroot
> 5. Build
>
> X wasn't running during the tests.
>
>
> Benchmarks:
>
> Full compilation time
> ---------------------
>
> Gentoo:
> real 16m44.594s
> user 29m25.549s
> sys 2m57.599s
>
> Ubuntu:
> (after reboot)
> real 19m23.391s
> user 35m15.595s
> sys 2m9.673s
>
> (on disk)
> real 19m12.447s
> user 35m23.223s
> sys 2m16.113s
>
> (on ram)
> real 19m5.572s
> user 35m24.639s
> sys 2m13.387s
>
> Build with no changes
> ---------------------
>
> Gentoo:
> First time:
> real 0m41.752s
> user 0m39.938s
> sys 0m1.722s
>
> Second time:
> real 0m42.537s
> user 0m39.220s
> sys 0m1.441s
>
> Ubuntu:
> (after reboot)
> real 0m27.656s
> user 0m14.858s
> sys 0m0.952s
>
> (on disk)
> real 0m15.457s
> user 0m14.963s
> sys 0m0.644s
>
> (on ram)
> real 0m15.444s
> user 0m14.771s
> sys 0m0.784s
>
> Remove binaries, rebuild
> ------------------------
>
> Gentoo:
>
> real 1m24.408s
> user 1m12.687s
> sys 0m4.599s
>
>
> Ubuntu:
> (after reboot)
> real 1m17.616s
> user 0m37.895s
> sys 0m4.360s
>
> (on disk)
> real 0m41.092s
> user 0m37.455s
> sys 0m3.546s
>
>
> (on ram)
> real 0m41.268s
> user 0m38.016s
> sys 0m3.258s
>
>
> Conclusions:
>
> Get RAM, and lots of it! Results seem to show that so long as enough
> RAM is present, the whole tree can be cached. This is why the "(on
> disk)" results differ so little from the ones done fully in RAM. Based
> on personal experience, 2GB is enough to get things done but far from
> perfect; 4GB is ideal.
>
> Gentoo appears to compile about 15% faster than Ubuntu, but Gentoo's
> scons is roughly 2.7 times slower (about 42 s versus 15 s for a build
> with no changes).
>
> The hard disk creates a very significant difference if data is not
> cached. If it is, it seems to be largely irrelevant. My guess as to
> what happens during a full build: Reading source files doesn't take
> long, and all the products of the build (.o files) get successfully
> cached, so linking takes less than could be expected. This is why,
> while a full compile from RAM on Ubuntu takes 18 seconds less than one
> after a reboot, a relink takes 36 seconds less (as the .o files need
> to be read).
>
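The 18-second and 36-second figures in the paragraph above can be re-derived from the "real" times in the Ubuntu benchmarks. The sketch below assumes durations in the format printed by `time` (for example 19m23.391s); parse_time and cold_cache_penalty are hypothetical helper names, and "relink" refers to the "Remove binaries, rebuild" test.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '19m23.391s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock ("real") times for the Ubuntu runs, copied from the benchmarks.
REAL = {
    "full build": {"after reboot": "19m23.391s", "on ram": "19m5.572s"},
    "relink": {"after reboot": "1m17.616s", "on ram": "0m41.268s"},
}


def cold_cache_penalty(runs):
    """Seconds lost by starting with an empty cache instead of running from RAM."""
    return parse_time(runs["after reboot"]) - parse_time(runs["on ram"])


for step, runs in REAL.items():
    print(f"{step}: {cold_cache_penalty(runs):.1f} s slower after a reboot")
```

The relink loses about twice as many seconds as the full build even though it does far less work, which fits the explanation that reading the .o files from a cold disk dominates it.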
> Compiling with -j3 (vs -j2 on a dual core system) may mean a
> performance decrease if there's too little RAM, as the compiler can
> easily take 200-300MB for some files, which could be enough to evict
> useful data out of the cache.
>
> A guess is that using tmpfs for /tmp may help in the case of too
> little RAM, by acting as a permanent cache that doesn't get evicted
> by other things going on. This may be ultimately counterproductive,
> however.
>
> A better hard disk should be the last option -- load the box with as
> much RAM as possible, and the hard disk's performance will make very
> little difference. A better disk should help quite a lot if it's not
> possible to get enough RAM for an effective cache, though.
>
>
>
>
> The worsening of the overall compile time on Ubuntu was unexpected. I
> can only guess that Gentoo DOES improve performance quite a bit by
> compiling with architecture specific optimizations. However, this is
> strange as 64 bit systems are new and all have SSE and similar
> things, so there shouldn't be such a difference as between a program
> built for 386 and Athlon MP.
>
> Perhaps the current Ubuntu compiler has bad performance for whatever
> reason? Version isn't exactly the same, so there might be something
> there.
>
> Gentoo was built with:
> CFLAGS="-march=athlon64 -O2 -pipe"
>
> Absolutely nothing especially weird there.
>
> Ubuntu compiler:
> Using built-in specs.
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,
> obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib
> --libexecdir=/usr/lib --without-included-gettext
> --enable-threads=posix --enable-nls --program-suffix=-4.1
> --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug
> --enable-mpfr --enable-checking=release x86_64-linux-gnu
>
> Thread model: posix
> gcc version 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
>
> Gentoo compiler:
> Using built-in specs.
> Target: x86_64-pc-linux-gnu
> Configured with: /var/tmp/portage/gcc-4.1.1-r3/work/gcc-4.1.1/configure
> --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.1
> --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/include
> --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1
> --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1/man
> --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1/info
> --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/include/g++-v4
> --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu
> --disable-altivec --enable-nls --without-included-gettext
> --with-system-zlib --disable-checking --disable-werror
> --enable-secureplt --disable-libunwind-exceptions
> --enable-multilib --disable-libmudflap --disable-libssp
> --disable-libgcj --enable-languages=c,c++,fortran
> --enable-shared --enable-threads=posix
> --enable-__cxa_atexit --enable-clocale=gnu
> Thread model: posix
> gcc version 4.1.1 (Gentoo 4.1.1-r3)
>
>
> On the other hand, scons is noticeably faster on Ubuntu, which is odd.
> I'm not sure what's going on here either. I ran ltrace on scons, and
> it suggests that something significantly different is going on between
> Gentoo and Ubuntu, even though the trees are actually the same. Traces
> are attached.
>
> With enough RAM present, scons will run without any disk access, and
> using very little kernel time. This seems to suggest that scons could
> use some optimization.
>
>
> ------------------------------------------------------------------------
>
> ltrace -f -c python /usr/bin/scons -j2 DISTCC=no BTARGET=client BUILD=release ARCH=i686 COLORGCC=yes
>
> % time seconds usecs/call calls function
> ------ ----------- ----------- --------- --------------------
> 21.91 5.147915 84 61026 strlen
> 16.56 3.889046 38 101578 memcpy
> 16.21 3.809040 38 98098 __ctype_b_loc
> 10.16 2.385957 114 20791 memset
> 10.08 2.367111 120 19631 free
> 9.22 2.165291 474 4564 fopen64
> 9.02 2.117852 331 6386 strcmp
> 1.50 0.353396 19 18497 malloc
> 1.41 0.330744 18 17851 strchr
> 0.85 0.199837 18 10633 realloc
> 0.62 0.145106 18 7852 strncpy
> 0.58 0.137361 18 7278 strcpy
> 0.29 0.069284 34 1988 __xstat64
> 0.18 0.042296 18 2287 memchr
> 0.17 0.040029 19 2100 vsnprintf
> 0.13 0.029530 26 1106 _IO_getc
> 0.12 0.027302 18 1442 pthread_self
> 0.09 0.022305 18 1214 __rawmemchr
> 0.09 0.021296 2129 10 popen
> 0.08 0.019905 4976 4 qsort
> 0.06 0.014357 20 707 __errno_location
> 0.06 0.014232 18 772 funlockfile
> 0.06 0.014192 97 145 fread
> 0.06 0.014129 18 772 flockfile
> 0.06 0.013733 980 14 __uflow
> 0.06 0.013402 18 711 strrchr
> 0.05 0.012512 42 293 fclose
> 0.05 0.011957 18 648 strstr
> 0.05 0.011486 32 351 __fxstat64
> 0.04 0.010384 20 517 strerror
> 0.03 0.007038 19 361 fileno
> 0.03 0.006391 18 351 memmove
> 0.03 0.006308 33 190 sem_post
> 0.01 0.003514 18 187 sem_trywait
> 0.01 0.003404 243 14 dlopen
> 0.01 0.002185 32 67 sigaction
> 0.01 0.002052 19 108 _setjmp
> 0.01 0.002027 20 100 __strtod_internal
> 0.01 0.001904 19 100 localeconv
> 0.00 0.001044 18 55 strcat
> 0.00 0.000637 79 8 pclose
> 0.00 0.000291 20 14 dlsym
> 0.00 0.000260 18 14 __ctype_tolower_loc
> 0.00 0.000250 27 9 readdir64
> 0.00 0.000236 39 6 lseek64
> 0.00 0.000216 43 5 fwrite
> 0.00 0.000210 21 10 clearerr
> 0.00 0.000200 66 3 ftell
> 0.00 0.000164 20 8 getenv
> 0.00 0.000145 29 5 fflush
> 0.00 0.000139 34 4 isatty
> 0.00 0.000135 19 7 feof
> 0.00 0.000134 67 2 opendir
> 0.00 0.000134 19 7 sem_init
> 0.00 0.000116 38 3 getcwd
> 0.00 0.000110 36 3 readlink
> 0.00 0.000080 20 4 sem_wait
> 0.00 0.000075 37 2 chdir
> 0.00 0.000070 70 1 realpath
> 0.00 0.000068 34 2 closedir
> 0.00 0.000066 22 3 setlocale
> 0.00 0.000063 21 3 sigemptyset
> 0.00 0.000055 27 2 sprintf
> 0.00 0.000050 50 1 read
> 0.00 0.000046 46 1 sysconf
> 0.00 0.000043 43 1 open64
> 0.00 0.000041 20 2 __strdup
> 0.00 0.000041 20 2 strncat
> 0.00 0.000040 20 2 ungetc
> 0.00 0.000035 35 1 close
> 0.00 0.000035 35 1 rewind
> 0.00 0.000029 29 1 pow
> 0.00 0.000025 25 1 getpid
> 0.00 0.000024 24 1 __libc_current_sigrtmax
> 0.00 0.000024 24 1 nl_langinfo
> 0.00 0.000024 24 1 __libc_current_sigrtmin
> ------ ----------- ----------- --------- --------------------
> 100.00 23.491165 390940 total
>
>
> ------------------------------------------------------------------------
>
> ltrace -f -c python /usr/bin/scons -j2 DISTCC=no BTARGET=client BUILD=release ARCH=i686 COLORGCC=yes
>
> % time seconds usecs/call calls function
> ------ ----------- ----------- --------- --------------------
> 27.85 2.305453 137 16739 strcmp
> 25.58 2.117985 330 6413 memcpy
> 24.67 2.042898 26191 78 read
> 9.37 0.776018 18 41706 malloc
> 3.45 0.285930 18 15525 free
> 3.03 0.250792 18 13794 strchr
> 2.09 0.173386 18 9502 memmove
> 1.20 0.099074 18 5417 strlen
> 0.96 0.079327 18 4350 memset
> 0.69 0.057382 18 3122 realloc
> 0.62 0.051619 18 2860 strncmp
> 0.22 0.018134 34 522 __xstat64
> 0.06 0.004769 18 262 strrchr
> 0.03 0.002211 18 122 __errno_location
> 0.02 0.001679 31 54 lseek64
> 0.02 0.001655 17 92 __sigsetjmp
> 0.01 0.001149 31 36 isatty
> 0.01 0.001133 37 30 open64
> 0.01 0.000984 32 30 close
> 0.01 0.000939 18 52 __ctype_toupper_loc
> 0.01 0.000939 18 52 __ctype_tolower_loc
> 0.01 0.000865 18 46 __strtol_internal
> 0.01 0.000619 18 33 getenv
> 0.01 0.000481 240 2 dlopen
> 0.00 0.000383 31 12 getgroups
> 0.00 0.000323 32 10 signal
> 0.00 0.000270 135 2 realpath
> 0.00 0.000242 24 10 setlocale
> 0.00 0.000231 21 11 snprintf
> 0.00 0.000226 37 6 __xstat
> 0.00 0.000168 56 3 _obstack_begin
> 0.00 0.000152 38 4 sprintf
> 0.00 0.000134 33 4 sigaction
> 0.00 0.000133 44 3 __strdup
> 0.00 0.000132 33 4 fcntl
> 0.00 0.000103 20 5 dcgettext
> 0.00 0.000099 49 2 readlink
> 0.00 0.000088 44 2 printf
> 0.00 0.000085 21 4 strxfrm
> 0.00 0.000082 82 1 bindtextdomain
> 0.00 0.000082 20 4 drand48
> 0.00 0.000080 40 2 __fxstat64
> 0.00 0.000076 38 2 time
> 0.00 0.000069 34 2 getuid
> 0.00 0.000066 33 2 getegid
> 0.00 0.000065 32 2 geteuid
> 0.00 0.000065 32 2 getgid
> 0.00 0.000062 31 2 frexp
> 0.00 0.000060 20 3 nl_langinfo
> 0.00 0.000056 18 3 __fsetlocking
> 0.00 0.000049 24 2 sigemptyset
> 0.00 0.000049 24 2 dlsym
> 0.00 0.000047 23 2 getpid
> 0.00 0.000044 22 2 sysconf
> 0.00 0.000043 21 2 memchr
> 0.00 0.000042 21 2 localeconv
> 0.00 0.000042 21 2 putenv
> 0.00 0.000041 20 2 __ctype_b_loc
> 0.00 0.000041 20 2 srand48
> 0.00 0.000038 19 2 strcasecmp
> 0.00 0.000033 33 1 sbrk
> 0.00 0.000024 24 1 textdomain
> 0.00 0.000021 21 1 __xpg_basename
> 0.00 0.000020 20 1 strncpy
> 0.00 0.000020 20 1 dirname
> 0.00 0.000019 19 1 strcpy
> 0.00 0.000019 19 1 calloc
> 0.00 0.000019 19 1 fputs_unlocked
> 0.00 0.000019 19 1 strstr
> ------ ----------- ----------- --------- --------------------
> 100.00 8.279583 120975 total
>
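One way to compare the two summaries above is to parse the `ltrace -c` rows and line up the functions that appear in both. This is only a sketch: parse_ltrace_summary is a made-up helper, only the top rows of each table are reproduced, and the traces are called "first" and "second" because the message does not say which distribution each one came from.

```python
def parse_ltrace_summary(text):
    """Parse `ltrace -c` summary rows into {function: (seconds, calls)}."""
    rows = {}
    for line in text.splitlines():
        fields = line.lstrip("> ").split()  # tolerate mail quoting
        # Data rows have five columns: % time, seconds, usecs/call, calls, function.
        if len(fields) != 5:
            continue
        try:
            rows[fields[4]] = (float(fields[1]), int(fields[3]))
        except ValueError:
            continue  # separator or other non-numeric line
    return rows


# Top rows copied from the first and second summaries above.
FIRST = """
> 21.91 5.147915 84 61026 strlen
> 16.56 3.889046 38 101578 memcpy
> 16.21 3.809040 38 98098 __ctype_b_loc
> 9.02 2.117852 331 6386 strcmp
"""
SECOND = """
> 27.85 2.305453 137 16739 strcmp
> 25.58 2.117985 330 6413 memcpy
> 24.67 2.042898 26191 78 read
> 1.20 0.099074 18 5417 strlen
"""

first = parse_ltrace_summary(FIRST)
second = parse_ltrace_summary(SECOND)

# Report functions present in both traces, side by side.
for name in sorted(first.keys() & second.keys()):
    (sec_a, calls_a), (sec_b, calls_b) = first[name], second[name]
    print(f"{name}: {calls_a} calls / {sec_a:.2f} s vs {calls_b} calls / {sec_b:.2f} s")
```

The absolute times include whatever overhead ltrace itself adds per traced call, so the call counts are probably the more reliable basis for comparison.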
>
> ------------------------------------------------------------------------
>