View on GitHub

# Memory efficiency when using Ruby

Written by on 14 January 2010

I have been spending some time creating a pure Ruby PNG library. For this library, I need to have some representation of the image, which is composed of RGB pixels, supporting an alpha channel. Because images can be composed of a lot of pixels, I want the implementation to be as memory efficient as possible. I also would like decent performance.

A very naive Ruby implementation for an image represents the red, green, blue and alpha channel using a floating point number between 0.0 and 1.0, and might look something like this:

class Pixel

def initialize(r, g, b, a = 1.0)
@r, @g, @b, @a = r, g, b, a
end
end

class Image

def initialize(width, height)
@width, @height = width, height
@pixels = Array.new(width * height)
end

def [](x,y)
@pixels[y * width + x]
end

def []=(x,y, pixel)
@pixels[y * width + x] = pixel
end
end

For a 10×10 image, this representation requires 4 times 100 floating point numbers, which require 8 bytes each. That’s already over 3kB for such a small image just for the floating point numbers! Ouch.

A simple improvement is to decide that 8-bit color depth is enough in the case, in which case each channel can be represented by an integer between 0 and 255. Storing such a number only costs one byte of memory. Ruby’s Fixnum class typically uses 4-byte integers. If only the 4 channels of one byte each could be combined into a single Fixnum instance… Behold!

class Pixel
alias :to_i :value

def initialize(value)
@value = value
end

def self.rgba(r, g, b, a = 255)
self.new(r << 24 | g << 16 | b << 8 | a)
end

def r; (@value & 0xff000000) >> 24; end
def g; (@value & 0x00ff0000) >> 16; end
def b; (@value & 0x0000ff00) >>  8; end
def a; (@value & 0x000000ff); end
end

Notice the bit operations, which are extremely fast. This only requires 100 times 4 bytes = 400 bytes for storing the RGBA values for a 10×10 image, an 8 times improvement!

This implementation wraps every pixel inside an object. This is nice, because I want to access the separate channels of every pixel easily using the r, g, b, and a methods, and every other method that is defined for every pixel. However, a Ruby object instance has an overhead of at least 20 bytes. That’s 20 times 100 is about 2kB for our 10×10 image!

To get rid of the object overhead, it is possible to simply store the Fixnum value for every pixel, and only wrapping it inside a Pixel object when it is accessed. This can be done by modifying the Image class:

class Image
# ...

def [](x,y)
Pixel.new(@pixels[y * width + x]) # wrap
end

def []=(x,y, pixel)
@pixels[y * width + x] = pixel.to_i # unwrap
end
end

As you can see, some simply changes in the representation can really make a difference in the memory usage. Can this representation be improved further?

## Integer math calculations

Because we are now using integers to represent a pixel, this can cause problems when the math requires you to use floating point numbers. For example, the formula for alpha composition of two pixels is as follows:

$C_o = C_a \alpha_a + C_b \alpha_b (1 - \alpha_a)$

in which $$C_a$$ is the color component of the foreground pixel, $$\alpha_a$$ the alpha channel of the foreground pixel, $$C_b$$ and $$\alpha_b$$ the same values for the background pixel, all of which should be values between 0 and 1.

A naive implementation could convert the integer numbers to their floating point equivalents:

def compose(fg, bg)
return bg if fg.a == 0
return fg if fg.a == 255

fg_alpha = fg.a / 255.0
bg_alpha = fg.a / 255.0
alpha_complement = (1.0 - fg_alpha) * bg_alpha

new_r = (fg_alpha * fg.r + alpha_complement * bg.r).round
new_g = (fg_alpha * fg.g + alpha_complement * bg.g).round
new_b = (fg_alpha * fg.b + alpha_complement * bg.b).round
new_a = ((fg_alpha + alpha_complement) * 255).round

Pixel.rgba(new_r, new_g, new_b, new_a)
end

This implementation is already a little bit optimized: no unnecessary conversions and calculations are being performed. However, this composition can be done a lot quicker after realizing that 255 is almost a power of two, in which computers excel because it can use bitwise operators and shifting for some calculations.

My new approach uses a quicker implementation of multiplication of 8-bit integers that represent floating numbers between 0 and 1:

def compose(fg, bg)
return bg if fg.a == 0
return fg if fg.a == 255

alpha_complement = multiply(255 - fg.a, bg.a)
new_r = multiply(fg.a, fg.r) + multiply(alpha_complement, bg.r)
new_g = multiply(fg.a, fg.g) + multiply(alpha_complement, bg.g)
new_b = multiply(fg.a, fg.b) + multiply(alpha_complement, bg.b)
new_a = fg.a + alpha_complement

Pixel.rgba(new_r, new_g, new_b, new_a)
end

# Quicker alternative for (a * b / 255.0).round
def multiply(a, b)
t = a * b + 0x80
((t >> 8) + t) >> 8
end

Note that the new implementation is less precise in theory, but this precision is lost anyway because we have to convert the values back to 8 bit RGBA values. Your thoughts?