


The Art of Interface 
Article 10Fast Fourier transform — FFTCategory. Digital signal processing (DSP) software development. Abstract. The article is a practical tutorial for fast Fourier transform — FFT — understanding and implementation. Article contains theory, C++ source code and programming instructions. Popular CooleyTukey technique is considered. References. Case study: ECG processing — Rpeaks detection and 4D ultrasound volume image processing.
1. Introduction to fast Fourier transformFast Fourier transform — FFT — is speedup technique for calculating discrete Fourier transform — DFT, which in turn is discrete version of continuous Fourier transform, which indeed is origin for all its versions. So, historically continuous form of the transform was discovered, then discrete form was created for sampled signals and then algorithm for fast calculation of discrete version was invented. 2. Understanding FFTFirst of all let us have a look at what Fourier transform is. Fourier transform is an integral of the form:
The transform operates in complex domain. Recall, that imaginary exponent could be written as:
For sampled function continuous transform (1) turns into discrete one:
Expression (3) is discrete Fourier transform — DFT. Here {f_{0}, f_{1}, ... , f_{N−1}} is input discrete function and {F_{0}, F_{1}, ... , F_{N−1}} is result of Fourier transform. It is easily could be seen that to program DFT it is enough to write double loop and just calculate sums of products of input samples and imaginary exponents. The number of operations required is obviously of O(N^{2}) order. But due to transform properties it is possible to reduce the number of operations to the order of O(Nlog_{2}N). Now, let us explore tricks we can use to speedup calculations. Let us put N=8 and write down our DFT:
Easily could be seen we can split the sum into two similar sums separating odd and even terms and factoring out the latter sum:
Now we can split the sums in brackets again:
Thus, we have 3 — log_{2}8 — levels of summation. The deepest one in parenthesis, the middle one in brackets and the outer one. For every level exponent multiplier for the second term is the same.
And now the most important observation one can make to get speedup: periodicity of the exponent multipliers.
For the exponent multiplier in parenthesis period is n=2, which means sums in parenthesis are exactly the same for n=0, 2, 4, 6 and for n=1, 3, 5, 7. It means on deepest level in parenthesis we need 4×2=8 — number of sums times period — sums in total. And note, since n=1, 3, 5, 7 corresponds to the half of the period π, exponent multiplier is the same as for n=0, 2, 4, 6 but with the opposite sign.
Indeed they are 1 for n=0, 2, 4, 6 and −1 for n=1, 3, 5, 7. For the exponent multiplier in brackets period is n=4, which means we have the same sums for pairs n=0, 4; n=1, 5; n=2, 6 and n=3, 7. It means on the middle level in brackets we have 2×4=8 sums and the second half of them could be received again by changing sign in first half of them — due to the fact distance between n and n+2 is π. Thus, for n=0, 4 the factor is 1 and for n=2, 6 it is −1; for n=1, 5 it equals −i and for n=3, 7 it is i. On the outer level we have just one sum for every transform component, and the period of the exponent multiplier is 8. Which gives us 1×8=8 sums and second half of them could be received by changing sign in the first half. So, on every calculation level we have 8 sums. In terms of N it means we have log_{2}N levels and N sums on every level, which gives us O(Nlog_{2}N) order of number of operations. On the other hand having the constant number of sums on every level means we can process data inplace. In summary, we have got fast Fourier transform — FFT. Now it is time to develop stepbystep instruction list to be carved in code. 3. FFT algorithmHaving understanding of what features of DFT we are going to exploit to speedup calculation we can write down the following algorithm:
First step of reordering is putting input data into their natural order in (7): {f_{0}, f_{1}, f_{2}, f_{3}, f_{4}, f_{5}, f_{6}, f_{7}}→{f_{0}, f_{4}, f_{2}, f_{6}, f_{1}, f_{5}, f_{3}, f_{7}}. Since this new order was received as result of successive splitting terms into even and odd ones, in binary domain it looks like ordering based on bit greatness starting from the least significant bit — see table 1.
This leads to “mirrored arithmetics” — result binary column in the mirror looks like {0, 1, 2, 3, 4, 5, 6, 7} — our initial order indeed. Thus, to reorder input elements it is enough just to count in mirrored manner. Since double mirroring gives again the same number, reordering reduces to swaps. Summation levels include our parenthesis, brackets and outer level. In general case this leads to iterations on pairs, quadruples, octads and so on. Further, we iterate on components of halfperiod, second halfperiod we are getting as result of taking differences instead of sums for the first half. Thus, for the deepest level of parenthesis period is 2 and halfperiod is 1, which means this cycle will be executed only once. For the second level period is 4 and halfperiod is 2 and cycle will be executed 2 times. In general case we have succession 1, 2, 4, 8, ... for this cycle. Factor calculation is calculation of our imaginary exponent. To restrict the number of trigonometric function calls (2) we use the recurrence:
Which is indeed expression
written in a tricky way not to lose significance for small δ — for this case cosδ≈1 and sinδ≈δ, which tells us that for cosδ memory will be just packed with .999999(9) but for sinδ there will be much more useful information. Thus, (10) is just the way to eliminate cosδ from calculations. If you look back at (7) you will find, that for N=8 δ=π/2, π/4 — not a small numbers. But for transforms of much bigger N δ=π/2, π/4, π/8, ... up to 2π/N, for sure, could be very small. The innermost loop looks for sums, where calculated imaginary exponent present, calculates product and takes sum and difference, which is the sum for the second halfperiod at π distance, where our exponent changes its sign but not the absolute value according to (9). To perform inplace processing we utilize the following scheme: Fig. 1. FFT butterfly.This operation is elementary brick of inplace FFT calculation and usually is called butterfly. The bottom term is multiplied by imaginary exponent and then sum of the terms is stored in place of the upper term and difference is stored in place of the bottom term. General butterfly picture is depicted below — fig. 2. Fig. 2. All FFT butterflies for N=8.Elegant scheme. It is time to engrave this beauty in code. But before we delve into programming let us make a small digression: it is known thing that Fourier transform is a transfer to frequency domain, so, let us see how to be back from the realm of frequencies. 4. Inverse Fourier transformExpression for inverse Fourier transform is
and its discrete counterpart is
Thus, the difference between forward (3) and inverse (13) transforms is just a sign and not necessary scale factor — one does not need it if interested in ratio between components but not in absolute values. It means that the routine for forward transform with slight modification can perform inverse one as well. 5. FFT programmingHere is our FFT class declaration.
Class has four public methods for performing FFT: two functions for forward transform and two ones for inverse transform. Every couple consists of inplace version and version that preserves input data and outputs transform result into provided array. Protected section of the class has as well four functions: two functions for data preprocessing — putting them into convenient order, core function for transform performing and auxiliary function for scaling the result of inverse transform. Every of four public methods is very similar and is indeed wrapper that controls processing workflow. Let us see how one of them designed.
Inside wrapper you can find check of input parameters, then data preparation — rearrangement, — and transform itself.
While loop implements addition of 1 in mirrored manner to get target position for the element. Now there is turn of the core method, which performs our fast Fourier transform.
The code is exact reflection of our FFT algorithm and butterfly scheme in fig. 2. Difference between forward and inverse transforms is reflected in the first line of the method where the proper sign for exponent argument is set. Initializations inside outer loop are just preparations for successive calculation of factors via trigonometric recurrence. And the job done inside inner loop, which performs butterfly operations. Trigonometric recurrence in the last line is exactly our expression (10). Wrappers for inverse transform are designed the same way as for forward one:
The only difference is conditional scaling of the result at postprocessing stage. By default the scaling is performed according to (13) but if one interested only in relative values she can drop the corresponding flag to skip this operation. Scaling itself is a primitive function below.
Well, our FFT is ready. You can download full source code here:
Full file listings are available online as well:
Optimized for high performance source code of complex number class you can find here:
6. How to useTo utilize the FFT you should include header file fft.h and place in your code lines like:
Here inplace forward Fourier transform performed for signal of 1024 samples size. 7. Final remarksThe FFT algorithm implemented in literature is called CooleyTukey. There is also SandTukey algorithm that rearranges data after performing butterflies and in its case butterflies look like ours in fig. 2 but mirrored to the right so that big butterflies come first and small ones do last. From all our considerations follows that length of input data for our algorithm should be power of 2. In the case length of the input data is not power of 2 it is good idea to extend the data size to the nearest power of 2 and pad additional space with zeroes or input signal itself in a periodic manner — just copy necessary part of the signal from its beginning. Padding with zeroes usually works well. If you were attentive you could notice that butterflies in parenthesis and brackets in (7) do not really need multiplications but only additions and subtractions. So, optimizing two deepest levels of butterflies we can even improve FFT performance.


