This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.
[In this Intel-sponsored feature, part of Gamasutra's Visual Computing section, Muthyalampalli examines how you can use the fast Fourier transform (FFT) for image processing.]
The discrete Fourier transform (DFT) is a specific form of Fourier analysis to convert one function (often in the time or spatial domains) into another (frequency domain). DFT is widely employed in signal processing and related fields to analyze frequencies contained in a sample signal, solve partial differential equations, and perform other operations, such as convolutions.
The fast Fourier transform (FFT) is an efficient implementation of DFT and is used, apart from other fields, in digital image processing. FFT is applied to convert an image from the image (spatial) domain to the frequency domain. Applying filters to images in the frequency domain is computationally faster than to do the same in the image domain.
This article will not go into the theory of FFT but will address the implementation of the algorithm in converting a 2D image to the frequency domain and back to the image domain (inverse FFT). Once the image is transformed into the frequency domain, filters can be applied to the image by convolutions. FFT turns the complicated convolution operations into simple multiplications. An inverse transform is then applied in the frequency domain to get the result of the convolution.
The sample application was developed in DirectX 10 and demonstrates the forward and inverse transforms of the image to the frequency domain and back. Applying the filters is straightforward once the transform takes place and hence is not discussed here.
The application uses ping-pong textures, a common technique used in many GPGPU (general-purpose computing on graphics processing units) applications. Ping-pong textures involve a pair of texture surfaces that a shader uses both as input and output data. The shader program uses one texture as input to do some computation and writes the output to the second texture. Subsequent iterations will swap the input and output textures (thus the input from a previous iteration becomes the output in the current iteration and so on).
The Fourier transform decomposes an image into its real and imaginary components, which is a representation of the image in the frequency domain. If the input signal is an image, the number of frequencies in the frequency domain is equal to the number of pixels in the image or spatial domain. The inverse transform re-transforms the frequencies to the image in the spatial domain. The FFT and its inverse of a 2D image are given by the following equations.
where f(m,n) is the pixel at coordinates (m,n), F(x,y) is the value of the image in the frequency domain corresponding to the coordinates x and y, and M and N are the dimensions of the image.
As the equations show, a naive implementation of this algorithm is very expensive. But the beauty of FFT is that it is separable; namely, the 2D transform can be split into two 1D transforms-one in the horizontal direction and the other in the vertical direction. The equation below shows the 1D transform in the horizontal direction.2 The end result is equivalent to performing the 2D transform in the frequency space.
The FFT that is implemented in the application here requires that the dimensions of the image be a power of two. Another interesting property of FFT is that the transform of N points can be rewritten as the sum of two N/2 transforms (divide and conquer).3 This is important because some of the computations can be reused, which eliminates expensive operations.
The output of the Fourier transform is a complex number and has a much greater range than the image in the spatial domain. Therefore to accurately represent these values, they are stored as fl oats. Furthermore, the dynamic range of the Fourier coefficients are too large to be displayed on the screen, and hence these values are scaled [usually by dividing by the (Width × Height) of the image] to bring them within the range of values that can be displayed.4
The next section describes the implementation details of the FFT algorithm and its inverse in a GPGPU application.
2 Sumanaweera, Thilaka, and Donald Liu, "Medical Image Reconstruction with the FFT." Chapter 48 in GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, Matt Pharr, ed., Addison Wesley, 2005.
3 Cooley, J. W., and Tukey, O. W., "An Algorithm for the Machine Calculation of Complex Fourier Series." Math. Comput. 19 (1965): 297-301.
4 Mitchell, Jason L., Marwan Y. Ansari, and Evan Hart, "Advanced Image Processing with DirectX® 9 Pixel Shaders." Section 4 in ShaderX2: Shader Programming Tips and Tricks with DirectX 9, Wolfgang F. Engel, ed., Plano, TX: Wordware Publishing, 2003.