[sldev] OpenJPEG optimization (Patch 20070331)
Callum Lerwick
seg at haxxed.com
Sun Apr 1 17:17:46 PDT 2007
Alright, so I'm still obsessively plugging away at OpenJPEG. My latest
t1 patch is here:
http://www.haxxed.com/code/openjpeg-1.1.1-t1-optimize.20070331.patch
The main change: I noticed the lookup tables are dynamically allocated
and constantly recalculated, even though they all seem to be static, so
that seems pretty pointless. I moved the table calculation into a
separate program that generates a static header file to use instead.
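As a rough illustration of the generator idea (this is not the code from the patch), a build-time helper can print a table as a static const array into a header. The names demo_entry, demo_lut and emit_lut_header are hypothetical, and the bit-count function only stands in for whatever the real t1 tables compute:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical table function: number of set bits in an 8-bit index.
   A stand-in for the real t1 table contents, which are not shown here. */
static int demo_entry(unsigned i) {
    int n = 0;
    while (i) {
        n += (int)(i & 1u);
        i >>= 1;
    }
    return n;
}

/* Write a C header defining a static const table to `out`.
   Returns the number of entries written. */
static int emit_lut_header(FILE *out) {
    unsigned i;
    assert(out != NULL);
    fprintf(out, "/* Generated file, do not edit. */\n");
    fprintf(out, "static const signed char demo_lut[256] = {\n");
    for (i = 0; i < 256; i++) {
        fprintf(out, "%s%d%s",
                (i % 16 == 0) ? "    " : "",          /* indent each row */
                demo_entry(i),
                (i == 255) ? "\n" : (i % 16 == 15) ? ",\n" : ", ");
    }
    fprintf(out, "};\n");
    return 256;
}
```

A small main() that calls emit_lut_header(stdout) and redirects the output to a .h file at build time would complete the generator. The library then includes that header instead of allocating and filling the table at run time.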
I've also begun taking a look at the DWT. Using oprofile to track
DATA_CACHE_MISSES, I determined that accessing arrays with large strides
is an issue. I added FIXME comments with the results to the hotspots,
but I haven't wrapped my brain around the code enough to actually fix it
yet.
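To make the stride problem concrete, here is a minimal sketch that is not taken from the OpenJPEG DWT code. Both hypothetical functions compute the same per-column sums of a row-major w-by-h image. The first walks down each column and jumps w ints per step, which is the large-stride pattern that tends to miss the cache. The second visits memory in order:

```c
#include <assert.h>
#include <stddef.h>

/* Strided: walks down each column, jumping `w` ints per step. */
static void col_sums_strided(const int *a, size_t w, size_t h, long *sums) {
    size_t x, y;
    assert(a != NULL && sums != NULL);
    for (x = 0; x < w; x++) {
        sums[x] = 0;
        for (y = 0; y < h; y++)
            sums[x] += a[y * w + x];
    }
}

/* Contiguous: walks each row in memory order and accumulates
   into every column's running sum as it goes. */
static void col_sums_rowwise(const int *a, size_t w, size_t h, long *sums) {
    size_t x, y;
    assert(a != NULL && sums != NULL);
    for (x = 0; x < w; x++)
        sums[x] = 0;
    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++)
            sums[x] += a[y * w + x];
}
```

Whether the same loop interchange applies cleanly to the vertical DWT pass is an open question here; the sketch only shows the access pattern being measured.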
Another issue I noticed is OpenJPEG dynamically allocates and frees RAM
for all sorts of things, all over the place (except the t1?), often
inside loops. And it does this with an opj_malloc wrapper function that
looks like this:
    void* opj_malloc( size_t size ) {
        void *memblock = malloc(size);
        if(memblock) {
            memset(memblock, 0, size);
        }
        return memblock;
    }
It's zeroing every piece of RAM it allocates! That explains where all
those memset()s are coming from in oprofile. This is likely unnecessary
in most places, but unfortunately all the code has been written to
assume all RAM has been cleared before use. A lot of structures and
arrays of pointers are dynamically allocated, and it is then assumed
that the pointers are already cleared to 0, which means bad things
happen if you just take the memset out of opj_malloc. With the help of a
specially instrumented opj_malloc and gdb, I've been going through and
painstakingly determining which allocations need to be cleared and
which don't, starting with the largest allocations and working my way
down. Ones that don't are changed to a plain malloc; ones that do are
changed to calloc.
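The distinction can be sketched as follows. None of this is from the patch: poison_malloc is only one possible way to instrument an allocator (the actual instrumentation is not shown in this message), and alloc_row_table and alloc_filled are hypothetical examples of the two allocation kinds:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug allocator: fills new memory with a non-zero pattern
   so code that silently relies on zeroed memory misbehaves right away. */
static void *poison_malloc(size_t size) {
    void *p = malloc(size);
    if (p)
        memset(p, 0xA5, size);
    return p;
}

/* Pointer table whose empty slots are later tested against NULL:
   this one needs zeroing, so it uses calloc. (All-bits-zero reads as a
   null pointer on mainstream platforms, which the original code
   already assumes.) */
static int **alloc_row_table(size_t rows) {
    return calloc(rows, sizeof(int *));
}

/* Buffer that is completely written before it is ever read:
   clearing it first would be wasted work, so plain malloc is enough. */
static int *alloc_filled(size_t n, int value) {
    size_t i;
    int *buf;
    assert(n > 0);
    buf = malloc(n * sizeof *buf);
    if (!buf)
        return NULL;
    for (i = 0; i < n; i++)
        buf[i] = value;
    return buf;
}
```

Swapping a poisoning allocator in for one call site at a time and rerunning the decoder is one way to find out whether that site depends on cleared memory.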
My latest dwt (and other things) patch:
http://www.haxxed.com/code/openjpeg-1.1.1-dwt-optimize.20070331.patch
I also came up with something of a test suite to better quantify how
much improvement I'm getting, and make sure I'm not breaking anything.
Here's my torturej2k script:
    #!/bin/bash
    # Brace expansion below is a bash feature, so don't run this under plain sh.
    for FILE in ~/Pictures/Jpeg2000/*.{j2k,jp2}; do
        echo "Decoding $FILE"
        j2k_to_image -i "$FILE" -o "$FILE.bmp" >/dev/null
    done
My test suite consists of a bunch of images I grabbed from various web
sites:
1.jp2            file4.jp2      oaklandbest.jp2       p0_05.j2k  p0_16.j2k
Bretagne1.j2k    file5.2.jp2    oaklandlossless.jp2   p0_09.j2k  p1_01.j2k
Bretagne2.j2k    file5.jp2      Otoe_OrthoImage8.jp2  p0_10.j2k  p1_02.j2k
CB_TM432.jp2     file8.2.jp2    Otoe_Relief8.jp2      p0_11.j2k  p1_03.j2k
CB_TM_QQ432.jp2  file8.jp2      p0_01.j2k             p0_12.j2k  p1_06.j2k
file1.jp2        oakland03.jp2  p0_02.j2k             p0_13.j2k  potholes2.jp2
file3.jp2        oakland50.jp2  p0_04.j2k             p0_14.j2k
http://www.openjpeg.org/index.php?menu=samples
http://www.microimages.com/gallery/jp2/
http://www1.mplayerhq.hu/MPlayer/samples/jpeg2000/
For performance testing, I run "time torturej2k" three times and average
the "real" times. With an unmodified OpenJPEG 1.1.1, it runs in 41.243
seconds. With my t1 patch, it runs in 38.14 seconds, 7.5% faster. With
my dwt patch as well, it runs in 37.57 seconds, 9% faster than
unmodified. For comparison, KDU runs the test in 16.041 seconds, 57.3%
faster than my current optimizations.
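The percentages above are reductions in wall-clock time relative to a baseline. A small sketch of that arithmetic, using the hypothetical helper names mean3 and percent_faster:

```c
#include <assert.h>

/* Mean of three wall-clock ("real") timings, in seconds. */
static double mean3(double a, double b, double c) {
    return (a + b + c) / 3.0;
}

/* Percent reduction in run time relative to a baseline run time. */
static double percent_faster(double baseline, double t) {
    assert(baseline > 0.0);
    return 100.0 * (baseline - t) / baseline;
}
```

With the figures quoted, percent_faster(41.243, 38.14) is about 7.5, percent_faster(41.243, 37.57) is about 8.9, and percent_faster(37.57, 16.041) is about 57.3.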