Progress – iPhone 4 Performance (Programming)

Merry Christmas, everyone!

I’ve been working on Progress, my first iOS app, on and off since February. I’ve done 99% of the coding so far, Edward Sanchez did all of the graphic design, our friend Samuel Iglesias helped with some UI tricks, and all 3 of us collaborated on the UX. Today, on my last full day of version 1.0 development, I implemented the last major piece of “polish” code that I’m particularly proud of, and it’s focused on performance.

When I first ran the app on an iPhone 4, I could barely scrub through 4 full-screen pictures per second. But today, check this out (Vimeo link):

iPhone 4 Scrubbing Performance

Yes, that is an iPhone 4 blazing through full-screen pictures without any trickery (no GPUImageView or anything yet, just your normal UIImageView). To get here, I had to do three things:

1. Store each photo in 3 different versions (see the sketch right after this list):

  • “Full-size” version at 1936×2592 (the iPhone 4 camera resolution, our lowest common denominator), compressed to 40% (as in, the iOS SDK’s 0.4 on a scale of 0 to 1 – this is really much better compression than JPEG’s 40%, as you’ll be hard-pressed to notice any artifacts whatsoever). Between 150-300KB/photo.
  • “Device-optimized”, i.e. whatever your device’s screen resolution is (so 640×960 for 3.5-inch screens, 640×1136 for 4-inch screens), compressed at 45% from the full-size photo above. Due to the double compression, 45% (0.45) was as low as I could go. About 75-100KB/photo.
  • “Low quality”, or the same resolution as device-optimized but at a 25% compression level. This is about 40-50KB/photo. I could probably compress this further, but right now 25% works.
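
For the curious, generating those three versions boils down to scaling and re-compressing with UIImageJPEGRepresentation. Here’s a minimal sketch assuming the sizes and quality levels above – the function names and the disk-writing step are mine, not the app’s actual code:

```objc
#import <UIKit/UIKit.h>

// Compression levels from the list above.
static CGFloat const kFullSizeQuality        = 0.40f;
static CGFloat const kDeviceOptimizedQuality = 0.45f;
static CGFloat const kLowQuality             = 0.25f;

// Scale an image to targetSize (in pixels) and JPEG-compress it.
static NSData *JPEGDataForImage(UIImage *image, CGSize targetSize, CGFloat quality)
{
    UIGraphicsBeginImageContextWithOptions(targetSize, YES, 1.0);
    [image drawInRect:CGRectMake(0, 0, targetSize.width, targetSize.height)];
    UIImage *scaled = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    return UIImageJPEGRepresentation(scaled, quality);
}

static void SavePhotoVersions(UIImage *photo, CGSize screenPixelSize)
{
    // The full-size version comes straight from the camera image.
    NSData *fullSize = JPEGDataForImage(photo, CGSizeMake(1936, 2592), kFullSizeQuality);

    // The two smaller versions are generated from the (already compressed)
    // full-size photo, hence the double compression mentioned above.
    UIImage *fullSizeImage  = [UIImage imageWithData:fullSize];
    NSData *deviceOptimized = JPEGDataForImage(fullSizeImage, screenPixelSize, kDeviceOptimizedQuality);
    NSData *lowQuality      = JPEGDataForImage(fullSizeImage, screenPixelSize, kLowQuality);

    // ...write the three NSData blobs to disk...
}
```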

2. Use NSOperationQueues when loading the photos (sketched below). I normally use GCD, but queues are perfect here. I start loading a device-optimized version for photo #1, but if photo #2 is requested while the device-optimized photo-loading queue still has an unfinished operation in it, I cancel that operation and start a low-quality photo-loading operation instead. At the same time, I throw a new NSOperation on the queue to load a device-optimized version (this time for photo #2), but with the low-quality photo operation as its dependency and with the same completion block. So as soon as the low-quality photo is loaded, the device-optimized version begins to load and then replaces the low-quality version on screen. Or, the operation loading the device-optimized version is cancelled once again if the user jumps to the next photo while all this is going on.

This is really just for devices like the iPhone 4 and, to a lesser degree, the iPod touch 5G and iPhone 4S, and even there it happens so fast that the longest a user sees the low-quality version is maybe 1/10th of a second – they don’t even get to notice the heavy compression on the low-quality photo. But they do get to notice the general shape/features, which is what the app is about, so for us it’s important to show _something_ instead of waiting for the device-optimized version.
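
Here’s roughly what that queue dance looks like. This is a simplified sketch – the queue, loader, and display names are placeholders, I fold the display step into each operation rather than a separate completion block to keep it short, and real code would also check each operation’s -isCancelled before displaying its result:

```objc
- (void)showPhotoAtIndex:(NSUInteger)index
{
    NSBlockOperation *deviceOp = [NSBlockOperation blockOperationWithBlock:^{
        UIImage *image = [self loadDeviceOptimizedPhotoAtIndex:index];
        [self performSelectorOnMainThread:@selector(displayImage:)
                               withObject:image
                            waitUntilDone:NO];
    }];

    if (self.deviceOptimizedQueue.operationCount > 0) {
        // The previous photo's device-optimized load hasn't finished yet:
        // cancel it and show a low-quality version of the new photo first.
        [self.deviceOptimizedQueue cancelAllOperations];

        NSBlockOperation *lowQualityOp = [NSBlockOperation blockOperationWithBlock:^{
            UIImage *image = [self loadLowQualityPhotoAtIndex:index];
            [self performSelectorOnMainThread:@selector(displayImage:)
                                   withObject:image
                                waitUntilDone:NO];
        }];

        // The device-optimized load only starts after the low-quality one
        // finishes, then replaces it on screen. (Dependencies work fine
        // across queues.)
        [deviceOp addDependency:lowQualityOp];
        [self.lowQualityQueue addOperation:lowQualityOp];
    }

    [self.deviceOptimizedQueue addOperation:deviceOp];
}
```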

3. The piece I added today: pre-caching. I remember reading @mattt’s article on NSCache some months ago, where he talked about the mythical totalCostLimit property of NSCache. Mythical it is indeed, but Apple does use the size of an image in bytes as an example of what the property can mean. In this app, I have 3 NSCache instances, one for each of the photo versions I listed above and each with a different totalCostLimit. The NSCache instance for device-optimized photos has a totalCostLimit of 5242880, for example, which is 5MB written out as a number of bytes. So when the app loads, I launch a background task to pre-cache as many device-optimized photos as I can before I hit that limit. With current average photo sizes, that’s about 50-60 pictures. I take note of all the photos I wasn’t able to cache, and then run another pass to cache low-quality versions of those photos (2MB limit for that cache), which is another 30-40 pictures or so. The iPhone 4 manages to read and cache at least 15-20 photos per second, so by the time an average user tries scrubbing through their photos, chances are most of those photos will already be cached in either device-optimized or low-quality form.
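
A sketch of that pre-caching pass, with placeholder names (Photo, allPhotos, and the file paths stand in for the app’s real model):

```objc
- (void)precacheDeviceOptimizedPhotos
{
    self.deviceOptimizedCache = [[NSCache alloc] init];
    self.deviceOptimizedCache.totalCostLimit = 5242880; // 5MB as a number of bytes

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0), ^{
        NSMutableArray *leftovers = [NSMutableArray array];
        NSUInteger bytesUsed = 0;

        for (Photo *photo in self.allPhotos) {
            NSData *data = [NSData dataWithContentsOfFile:photo.deviceOptimizedPath];
            if (bytesUsed + data.length > self.deviceOptimizedCache.totalCostLimit) {
                // Doesn't fit under the limit - the low-quality pass gets it.
                [leftovers addObject:photo];
                continue;
            }
            // Cost = image size in bytes, the example Apple gives for totalCostLimit.
            [self.deviceOptimizedCache setObject:[UIImage imageWithData:data]
                                          forKey:photo.identifier
                                            cost:data.length];
            bytesUsed += data.length;
        }
        // ...then the same loop runs over `leftovers` against the 2MB low-quality cache...
    });
}
```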

The power of the last item is what the video above really demonstrates – between the time the app loaded and the time I started scrubbing through the photos, all of the photos were already cached. That’s what the 76/76 number means in the top-right “debug” area – there were 76 total requests from the scrubber to show a photo, and for all 76 of them the app was able to show a device-optimized photo. If it hadn’t been able to keep up, the third column (the empty black area) would show the number of low-quality photos it had to fetch and show temporarily before catching up and replacing them with device-optimized versions (which was the case before today). Success!

Outside of the general performance tricks that I still need to work on, such as minimizing alpha-blending, there’s another caching-related task I want to do before I can move on with peace of mind – forward-caching (sketched below). For example, if we’re looking at photo #1 and the user starts scrubbing to the right, going to #2, etc., and photos 2 through 9 are already cached, I need to start caching #10, #11, and so forth. When they stop on a photo, I need to cache 5 photos or so on each side. Sure, they’ll still catch up with me eventually if they scrub fast enough, but it will be a smooth experience for them until then. And if they slow down for just a moment, I’ll again be X cached photos ahead of them, which at normal scrubbing speed means an “infinitely” smooth scrubbing experience.
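
I haven’t written this part yet, but the shape I have in mind is something like this (method names and bounds checking are placeholders, and kCacheWindow is the “5 photos on each side” from above):

```objc
static NSInteger const kCacheWindow = 5;

// When the user settles on a photo, cache a few photos on each side of it.
- (void)userStoppedOnPhotoAtIndex:(NSInteger)index
{
    for (NSInteger offset = 1; offset <= kCacheWindow; offset++) {
        [self precachePhotoAtIndex:index + offset];
        [self precachePhotoAtIndex:index - offset];
    }
}

// While the user scrubs, keep caching ahead in the direction of travel.
- (void)userScrubbedToPhotoAtIndex:(NSInteger)index movingRight:(BOOL)movingRight
{
    [self precachePhotoAtIndex:(movingRight ? index + kCacheWindow
                                            : index - kCacheWindow)];
}
```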

Forward-caching isn’t essential for version 1.0, though – I’m content to ship with what we have.