[sldev] Jpeg2000 codec optimization

Stefan Westerfeld stefan at space.twc.de
Mon Jan 22 07:32:00 PST 2007


   Hi!

I've spent some time on profiling and debugging OpenJPEG now, and a
first optimized version is available at

http://space.twc.de/~stefan/quickOpenJPEG

On Thu, Jan 18, 2007 at 03:25:12AM -0500, bushing Spatula wrote:
> 
> On Thu, 18 Jan 2007 03:48:05 +0100, "Stefan Westerfeld"
> <stefan at space.twc.de> said:
> 
> > I wanted to have a look into the jpeg2000 thing. Now, my first thoughts
> > were something along the lines that I could identify one or two
> > functions in the image decoder that produce the main performance issues,
> > understand what they do, and replace them (possibly using special
> > multimedia instructions like SSE). It should have been a small fun hacking
> > thing, which could be completed in a reasonable time frame... but so far
> > I failed, because it is not that simple.
> 
> Unfortunately, you're right. :(
> 
> Analysis on OSX/PPC showed:
> 
> - 27.1% dwt_decode_1_real (OpenJPEG)
> - 22.9% tcd_decode_tile (OpenJPEG)
> - 5.6% t1_dec_sigpass_step (OpenJPEG)
> - 5.4% t1_decode_cblks (OpenJPEG)
> - 5.4% t1_dec_refpass_step (OpenJPEG)
> - 4.7% mqc_decode (OpenJPEG)

For statistics from my Linux/AMD64 machine, see the URL above.

> tcd_decode_tile is really just a call to dwt_decode_real, so half of the
> time is spent in dwt_decode_real and dwt_decode_1_real. *(but see below)

Well, I have spent some time on profiling and debugging now. When I
started, Tier-1 decoding (t1_decode_cblks and its children) was a bigger
cost than the DWT, so I worked on improving that over the last few days;
it takes less time now. However, I'll see whether it can be made even
faster, because otherwise I don't see a chance of performing as well as
the Kakadu libs. And then, of course, the DWT should get some more
optimization.

> Perhaps -- we have to make sure to maintain compatibility with all of
> the textures that have already been uploaded to SL.  

That's a point where testing would help. I have made my tree available,
so if anybody tests it, I'd like to know whether

1. my tree misdecodes something (this can be automated: right now the
decoded BMPs should be bit-wise identical, so md5sum comparisons can be
used; see the sketch below)

2. some files do not benefit much from the optimizations

These corner cases would be interesting. I'd also like to hear about the
(I think unlikely) case where quickOpenJPEG is slower than the normal
version.
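
As a sketch of how point 1 could be automated (the byte-wise compare
below is my own choice; running md5sum over both decoder outputs from
the shell works just as well):

/* Sketch: byte-wise comparison of a reference BMP and a freshly
 * decoded BMP; equivalent in spirit to comparing md5sums. */
#include <stdio.h>

static int files_identical(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int ca, cb, same = 1;

    if (!fa || !fb) {
        if (fa) fclose(fa);
        if (fb) fclose(fb);
        return -1; /* could not open one of the files */
    }

    do {
        ca = fgetc(fa);
        cb = fgetc(fb);
        if (ca != cb) {
            same = 0;
            break;
        }
    } while (ca != EOF);

    fclose(fa);
    fclose(fb);
    return same;
}

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s reference.bmp decoded.bmp\n", argv[0]);
        return 2;
    }
    return files_identical(argv[1], argv[2]) == 1 ? 0 : 1;
}

Exit status 0 means identical output, so this can be dropped into a
shell loop over a whole test set.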
 
> > What would be ideal is if there were two things:
> >  - one (or a few) sample image that are typical for what needs to be
> >    decoded
> 
> I can provide as many as you'd like, that part's easy! :)  I attached a
> small one, to get you started.

I think a representative set would help; different files may be slow
because of different parts of the decoder.

> >  - a standalone test app that decodes, measures performance and compares
> >    the expected result with the actual result (this would help to see
> >    if a certain optimization step messed up the decoder)
> 
> [...]
> 
> This is not the world's most scientific comparison -- one image at a
> time, and who knows if it's representative.  I ran each program over 129
> images, and got these numbers on my 2GHz PPC G5:
> 
> OpenJpeg: 1 min 25 sec
> GeoJasPer: 1 min 34 sec
> Kakadu: 19 sec (!)
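
On the standalone test app idea: a minimal timing harness could look
like the sketch below, where decode_file() is a hypothetical stand-in
for whichever decoder is being measured; it times a whole batch of
files the same way as the comparison above:

/* Sketch of a standalone benchmark: time a batch of decodes with a
 * monotonic clock. decode_file() is a hypothetical stand-in for the
 * codec call under test. */
#include <stdio.h>
#include <time.h>

static void decode_file(const char *path)
{
    /* Replace with the real decode, e.g. j2k file -> BMP. */
    (void)path;
}

int main(int argc, char **argv)
{
    struct timespec t0, t1;
    double elapsed;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 1; i < argc; i++)
        decode_file(argv[i]);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    elapsed = (t1.tv_sec - t0.tv_sec)
            + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("decoded %d file(s) in %.3f seconds\n", argc - 1, elapsed);
    return 0;
}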

Now you can try my quickOpenJPEG and see how it performs.

However, if your machine is 32-bit, the performance gain may not be as
large as on my AMD64 (at least my suspicion is that the fixed-point
arithmetic used by OpenJPEG is a problem on 32-bit machines; I'll look
into improving this later on).
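
To illustrate my suspicion (a minimal sketch, not OpenJPEG's actual
code; fix_mul and FIX_SHIFT are names and values I chose for the
example): a fixed-point multiply widens to a 64-bit intermediate
product. On AMD64 that is a single native multiply, while a 32-bit
target has to synthesize it from several 32x32 multiplies and adds:

/* Sketch only, not OpenJPEG's actual code: fix_mul and FIX_SHIFT are
 * names/values chosen for illustration. */
#include <stdint.h>
#include <stdio.h>

#define FIX_SHIFT 13 /* assumed number of fractional bits */

/* The 64-bit intermediate is one native multiply on AMD64; a 32-bit
 * target must build it from several 32x32 multiplies and adds, which
 * is where I suspect the extra cost comes from. */
static int32_t fix_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> FIX_SHIFT);
}

int main(void)
{
    int32_t half  = 1 << (FIX_SHIFT - 1); /* 0.5 in fixed point */
    int32_t three = 3 << FIX_SHIFT;       /* 3.0 in fixed point */

    printf("0.5 * 3.0 = %f\n",
           fix_mul(half, three) / (double)(1 << FIX_SHIFT));
    return 0;
}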

   Cu... Stefan
-- 
Stefan Westerfeld, Hamburg/Germany, http://space.twc.de/~stefan

