Understanding it fast:

- very good slides (in German) about the DFT by Klaus Toennies, with examples mapping an image to its (amplitude/magnitude, phase) spectrum

Briefly: the MAGNITUDE tells you “how much” of a certain frequency component is present, and the PHASE tells you “where” in the image that frequency component is.

There is a `DFT` sample in the OpenCV samples folder (see `dft.cpp`), but unfortunately no `DCT` sample :(

The reference documentation for OpenCV's `dct()` function tells you how to call the method, but gives no details on how to prepare the input image matrix :( There is, however, an important note about the matrix type needed by the `dct()` function (`CV_64F`), which helped me a lot.

It took me an hour, but now I figured out how to do the DCT using OpenCV.

There are two important points:

- both image dimensions have to be even (a multiple of 2)
- the input matrix that goes into `dct()` has to be of a floating-point type (the code below uses `CV_64F`, i.e. `double`)

Here is the code:

```cpp
// Read image
Mat img = imread("stanzi_and_me_bahamas.png", CV_LOAD_IMAGE_GRAYSCALE);

// Make sure that both image dimensions are a multiple of 2
Mat img2;
int w = img.cols;
int h = img.rows;
int w2 = (w % 2 == 0) ? w : w + 1;
int h2 = (h % 2 == 0) ? h : h + 1;
copyMakeBorder(img, img2, 0, h2 - h, 0, w2 - w, IPL_BORDER_REPLICATE);

// Grayscale image is 8 bits per pixel,
// but dct() wants floating-point values!
Mat img3;
img2.convertTo(img3, CV_64F);
imwrite("W:\\tmp\\orig.png", img3);

// Let's do the DCT now: image => frequencies
Mat freq;
dct(img3, freq);

// Save a visualization of the DCT coefficients
imwrite("W:\\tmp\\dct.png", freq);

for (int Start = 100; Start > 0; Start -= 1) {
    // Keep only the top-left Start x Start block of low-frequency
    // coefficients; set all other frequencies to 0
    for (int y = 0; y < freq.rows; y++)
        for (int x = 0; x < freq.cols; x++)
            if (x >= Start || y >= Start)
                freq.at<double>(y, x) = 0.0;

    // Do inverse DCT: (some low) frequencies => image
    Mat dst;
    idct(freq, dst);

    // Show frame nr
    char txt[100];
    sprintf(txt, "%04d", Start);
    cv::putText(dst, txt, Point(10, 20),
                CV_FONT_HERSHEY_SIMPLEX, 0.5, CV_RGB(0, 0, 0));

    // Save visualization of the reconstructed image where
    // we have thrown away some of the high frequencies
    char fname[500];
    sprintf(fname, "W:\\tmp\\idct_%04d.png", Start);
    imwrite(fname, dst);
}
```

The original image will be transformed into a frequency matrix which, visualized using gray values, looks like this:

When we iteratively take more and more of the higher frequencies as input for the `idct()` function, we can see more and more details: the more high-frequency cosine (gray-value intensity) waves we superimpose, the more details of the original image are represented in the reconstructed (inverse-DCT'ed) image!

These images were generated with the code above, which should show you how to use OpenCV's `dct()` function.

Note that using only 100×100 = 10,000 frequency values instead of the original 614×462 = 283,668 pixel gray values, i.e. only 10000/283668 ≈ 0.035 ≈ 3.5% of the original data size, we get a quite good reconstructed image :)

So let's save only these 100×100 low frequencies. That's compression!

Starting with the original `dft.cpp` OpenCV example, I made some modifications (computing the magnitude spectrum for images coming from a webcam, showing a zoom into the center of the magnitude spectrum, …)

Here you can see the result:

You can see at least 3 things:

- as you rotate your object, the magnitude spectrum rotates as well
- shrinking an object → the magnitude spectrum seems to zoom (structures that are big when near the camera become small when far away, i.e. low frequencies are “shifted” to high frequencies). Ok, more or less
- the different geometric objects produce typical magnitude spectrum patterns

And that's the corresponding code of the demo:

```cpp
cv::VideoCapture videoCapture;
videoCapture.open(1);
int counter = 0;
while (true) {
    // Get image from camera
    Mat camimg;
    videoCapture >> camimg;

    // Convert that image to grayscale
    Mat img;
    cvtColor(camimg, img, CV_RGB2GRAY);

    // Prepare grayscale image <img> for DFT
    int M = getOptimalDFTSize(img.rows);
    int N = getOptimalDFTSize(img.cols);
    Mat padded;
    copyMakeBorder(img, padded, 0, M - img.rows, 0, N - img.cols,
                   BORDER_REPLICATE, Scalar::all(0));

    // Prepare matrix for DFT result: <complexImg>
    Mat planes[] = { Mat_<float>(padded), Mat::zeros(padded.size(), CV_32F) };
    Mat complexImg;
    merge(planes, 2, complexImg);

    // DFT
    dft(complexImg, complexImg);

    // Get magnitude spectrum from complex result matrix:
    // compute log(1 + sqrt(Re(DFT(img))^2 + Im(DFT(img))^2))
    split(complexImg, planes);
    magnitude(planes[0], planes[1], planes[0]);
    Mat mag = planes[0];
    mag += Scalar::all(1);
    log(mag, mag);

    // Crop the spectrum, if it has an odd number of rows or columns
    mag = mag(Rect(0, 0, mag.cols & -2, mag.rows & -2));

    // Rearrange the quadrants of the Fourier image
    // so that the origin is at the image center
    int cx = mag.cols / 2;
    int cy = mag.rows / 2;
    Mat tmp;
    Mat q0(mag, Rect(0, 0, cx, cy));
    Mat q1(mag, Rect(cx, 0, cx, cy));
    Mat q2(mag, Rect(0, cy, cx, cy));
    Mat q3(mag, Rect(cx, cy, cx, cy));
    q0.copyTo(tmp); q3.copyTo(q0); tmp.copyTo(q3);
    q1.copyTo(tmp); q2.copyTo(q1); tmp.copyTo(q2);

    // Get some region about the center of the magnitude spectrum
    int r = 10;
    Mat roi(mag, Rect(mag.cols/2 - r, mag.rows/2 - r, 2*r, 2*r));
    Mat MagPartly;
    roi.copyTo(MagPartly);

    // padded is an 8UC1 image with values in the range 0-255.
    // Make mag an 8UC1 image with values in that range as well,
    // so that we can copy both images into a bigger one
    normalize(mag, mag, 0, 255, CV_MINMAX);
    mag.convertTo(mag, CV_8UC1);

    // Separately normalize mag and MagPartly
    normalize(MagPartly, MagPartly, 0, 255, CV_MINMAX);
    MagPartly.convertTo(MagPartly, CV_8UC1);

    // Zoom MagPartly
    Mat MagZoomed;
    resize(MagPartly, MagZoomed, Size(padded.cols, padded.rows));

    // Prepare one big output image with:
    // | grayscale image | magnitude spectrum image        |
    // | empty           | zoomed magnitude spectrum image |
    int w = padded.cols * 2;
    int h = padded.rows * 2;
    Mat bigimg(h, w, CV_8UC1);
    Mat roi1(bigimg, Rect(0, 0, padded.cols, padded.rows));
    padded.copyTo(roi1);
    Mat roi2(bigimg, Rect(padded.cols, 0, padded.cols, padded.rows));
    mag.copyTo(roi2);
    Mat roi3(bigimg, Rect(padded.cols, padded.rows, padded.cols, padded.rows));
    MagZoomed.copyTo(roi3);

    // Show images
    imshow("big img", bigimg);

    // Save images
    char fname[500];
    sprintf_s(fname, "W:\\tmp\\dft%05d.png", counter);
    imwrite(fname, bigimg);
    counter++;

    waitKey(1);
}
```

And here is the corresponding code for a DCT demo:

```cpp
cv::VideoCapture videoCapture;
videoCapture.open(1);
int counter = 0;
while (true) {
    // Get image from camera
    Mat camimg;
    videoCapture >> camimg;

    // Convert that image to grayscale
    Mat img;
    cvtColor(camimg, img, CV_RGB2GRAY);

    // Make sure that both image dimensions are a multiple of 2
    Mat img2;
    int w = img.cols;
    int h = img.rows;
    int w2 = (w % 2 == 0) ? w : w + 1;
    int h2 = (h % 2 == 0) ? h : h + 1;
    copyMakeBorder(img, img2, 0, h2 - h, 0, w2 - w, IPL_BORDER_REPLICATE);

    // Grayscale image is 8 bits per pixel,
    // but dct() wants floating-point values!
    Mat img3;
    img2.convertTo(img3, CV_64F);

    // Let's do the DCT now: image => frequencies
    Mat freq;
    dct(img3, freq);

    // Visualize as log(1 + |coef|); unlike a magnitude spectrum,
    // DCT coefficients can be negative, so take the absolute value first
    freq = abs(freq);
    freq += Scalar::all(1);
    log(freq, freq);

    // Get some region starting at the origin of the DCT spectrum
    int r = 25;
    Mat roi(freq, Rect(0, 0, 2*r, 2*r));
    Mat FreqPartly;
    roi.copyTo(FreqPartly);

    normalize(freq, freq, 0, 255, CV_MINMAX);
    freq.convertTo(freq, CV_8UC1);
    normalize(FreqPartly, FreqPartly, 0, 255, CV_MINMAX);
    FreqPartly.convertTo(FreqPartly, CV_8UC1);

    // Zoom FreqPartly
    Mat FreqZoomed;
    resize(FreqPartly, FreqZoomed, Size(img2.cols, img2.rows));

    // Prepare one big output image
    int width_bigimg = img2.cols * 2;
    int height_bigimg = img2.rows * 2;
    Mat bigimg(height_bigimg, width_bigimg, CV_8UC1);
    Mat roi1(bigimg, Rect(0, 0, img2.cols, img2.rows));
    img2.copyTo(roi1);
    Mat roi2(bigimg, Rect(img2.cols, 0, img2.cols, img2.rows));
    freq.copyTo(roi2);
    Mat roi3(bigimg, Rect(img2.cols, img2.rows, img2.cols, img2.rows));
    FreqZoomed.copyTo(roi3);

    // Show images
    imshow("big img", bigimg);

    // Save images
    char fname[500];
    sprintf_s(fname, "W:\\tmp\\dct%05d.png", counter);
    imwrite(fname, bigimg);
    counter++;

    waitKey(1);
}
```

Here I found some good answers regarding this question:

The Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT) perform similar functions: they both decompose a finite-length discrete-time vector into a sum of scaled-and-shifted basis functions. The difference between the two is the type of basis function used by each transform; the DFT uses a set of harmonically-related complex exponential functions, while the DCT uses only (real-valued) cosine functions.

The DFT is widely used for general spectral analysis applications that find their way into a range of fields. It is also used as a building block for techniques that take advantage of properties of signals' frequency-domain representation, such as the overlap-save and overlap-add fast convolution algorithms.

The DCT is frequently used in lossy data compression applications, such as the JPEG image format. The property of the DCT that makes it quite suitable for compression is its high degree of “spectral compaction;” at a qualitative level, a signal's DCT representation tends to have more of its energy concentrated in a small number of coefficients when compared to other transforms like the DFT. This is desirable for a compression algorithm; if you can approximately represent the original (time- or spatial-domain) signal using a relatively small set of DCT coefficients, then you can reduce your data storage requirement by only storing the DCT outputs that contain significant amounts of energy.

In particular, it is well known that any discontinuities in a function reduce the rate of convergence of the Fourier series…the smoother the function is, the fewer terms in its DFT or DCT are required to represent it accurately, and the more it can be compressed… However, the implicit periodicity of the DFT means that discontinuities usually occur at the boundaries… In contrast, a DCT where both boundaries are even always yields a continuous extension at the boundaries. This is why DCTs…generally perform better for signal compression than DFTs and DSTs. In practice, a type-II DCT is usually preferred for such applications, in part for reasons of computational convenience.

Cosine transforms are nothing more than shortcuts for computing the Fourier transform of a sequence with special symmetry (e.g. if the sequence represents samples from an even function).

The difference between a Discrete Fourier Transform and a Discrete Cosine Transform is that the DCT uses only real numbers, while a Fourier transform can use complex numbers. The most common use of a DCT is compression. It is equivalent to an FFT of twice the length.

From Wikipedia: In particular, a DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers. DCTs are equivalent to DFTs of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even), where in some variants the input and/or output data are shifted by half a sample. There are eight standard DCT variants, of which four are common.

Differences:

- DFT: gives you complex numbers
- DCT: gives you only real numbers
- DCT: better for compression applications than DFT

public/dft_dct_dst.txt · Last modified: 2012/02/22 01:08 (external edit)