Multimedia 101. Feel the Source
2015-11-18
By David "DeMO" Martínez Oliveira

Multimedia is a tricky topic. If you have ever tried to do something in this field, you already know that. You have to manage a lot of information coming from different sources. Then you have to process all that data and send it out to something that allows us humans to perceive it. And all of this has to be done very quickly. Actually, in a sense, multimedia systems are real-time systems with strong timing constraints to fulfill.
So, let's start exploring the MultiMedia Universe. We will use gstreamer as our vehicle to get into the topic and to answer the questions that will arise during our exciting journey. But before we can start this journey, we need to know a little bit about the problem and the tool we are going to use.

A BASIC MULTIMEDIA SYSTEM

As previously outlined, a multimedia system is composed of three main elements:

  • The data source. This is some kind of transducer able to convert some physical magnitude into numbers that a computer can process. For the sake of simplicity, our data source element will be composed of the transducer plus the hardware required to get that data into the computer. Examples of data sources are USB cameras, or the system composed of a microphone and a sound card.
  • The processing. Usually we capture media data to do something with it. Let's call that little something we do with our data Processing. This is a very broad concept, ranging from simply encoding the data from our sources for storage or transmission, to actually extracting information from it for an artificial intelligence application.
  • The data sink. The processing element basically performs operations on numbers and produces other numbers. We humans cannot make much sense of a very long sequence of numbers, so we need some other element to convert those numbers back into a physical magnitude that our sensory system can make sense of.

The diagram below summarises graphically the concepts described above.

Figure 1. Basic Multimedia System Block Diagram

GSTREAMER

gstreamer is a powerful open-source framework for developing multimedia applications. It provides most of the tools you may ever need. That sounds like building a multimedia application using gstreamer would be easy. Actually, it is, as long as you do not have special requirements.

We will not be writing code yet. Instead, we will use the gst-launch-1.0 tool, which allows us to test gstreamer components from the command line in a very easy way.

Let's see a simple example to show in a window the image captured by an attached camera:

gst-launch-1.0 v4l2src ! video/x-raw,width=320,height=240 ! fpsdisplaysink sync=false

What does that command line do? Let's see:

The first thing you have to know is that the different elements you want to use are linked using the ! character. In gstreamer jargon, the concatenation of multiple elements using the ! character is called a pipeline.

The first element in our example pipeline is v4l2src. As you can imagine, this is the component that allows us to interface with v4l2 devices such as, for instance, a camera. This is called a source element, as it feeds data into the pipeline but does not get data from any other element.
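By the way, a camera is not the only possible source element. If you do not have one attached, gstreamer's videotestsrc element generates a synthetic test pattern you can use to try out the rest of the pipeline (just a suggestion for experimenting, not part of the original example):

gst-launch-1.0 videotestsrc ! video/x-raw,width=320,height=240 ! fpsdisplaysink sync=false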

The last element is a sink element, something that consumes data but does not output data to any other component. In this case, fpsdisplaysink is a component that can show video data in a window and overlay information about the framerate of the video stream being rendered.

The element in between is a so-called capsfilter. In the gstreamer world, a capsfilter is a component that forces the format of the data stream between two components. In the example above we are forcing the camera and the output video window to a resolution of 320x240.
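Writing the caps directly in the pipeline is actually shorthand for instantiating the capsfilter element explicitly and setting its caps property. If it helps to see it spelled out, the same pipeline can also be written like this:

gst-launch-1.0 v4l2src ! capsfilter caps="video/x-raw,width=320,height=240" ! fpsdisplaysink sync=false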

Finally, some components can accept parameters. In our example, the fpsdisplaysink receives the parameter sync=false. In this specific case, it tells the component not to try to synchronise the stream but to display the images as fast as possible. For benchmarking, or just measuring speed, this parameter gives us an upper bound. Another parameter that may be of interest to you is device=/dev/videoX for the v4l2src component, which allows you to use different video devices, for instance when you have more than one camera attached to the computer.
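For instance, a second camera showing up as /dev/video1 could be selected like this (the device node is just an example; use whatever /dev/videoX exists on your machine):

gst-launch-1.0 v4l2src device=/dev/video1 ! video/x-raw,width=320,height=240 ! fpsdisplaysink sync=false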

A SMALL EXPERIMENT

Now that we know the very basics of how to work with gstreamer, we are going to run a small experiment. We will start increasing the resolution of our video stream and see what happens. Depending on the type of camera you are using, you may need to use different resolution values, and depending on the interface your camera is attached to, you may get different fps.

So, let's start increasing the parameters:

gst-launch-1.0 v4l2src ! video/x-raw,width=320,height=240 ! fpsdisplaysink sync=false
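To move up the resolution ladder you only need to change the width and height values in the capsfilter; for instance (assuming your camera supports these common modes, which you can check as shown below):

gst-launch-1.0 v4l2src ! video/x-raw,width=640,height=480 ! fpsdisplaysink sync=false
gst-launch-1.0 v4l2src ! video/x-raw,width=1280,height=720 ! fpsdisplaysink sync=false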

If you are using a UVC camera (which is the most likely situation), you can find out the list of supported resolutions and frame rates with the command:
uvcdynctrl -f
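If you do not have uvcdynctrl installed, the v4l2-ctl tool from the v4l-utils package gives you similar information (just an alternative suggestion, not part of the original setup):

v4l2-ctl --list-formats-ext -d /dev/video0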

Resolution   Frames per second
320x240      30 fps
640x480      30 fps
1280x720     10 fps

So, what is going on?... My camera can even do Full HD; this program you are using does not work, man!

SIZE MATTERS

Let's do some quick calculations to see how much data is required by a frame for each of the different resolutions we've tried:

Resolution               Frame (bytes)     Frame (bits)      Frame (Mbits)
320x240x3 bytes/pixel    230400 bytes      1843200 bits      1.76 Mb
640x480x3 bytes/pixel    921600 bytes      7372800 bits      7 Mb
1280x720x3 bytes/pixel   2764800 bytes     22118400 bits     21 Mb
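In case you want to reproduce these numbers, the 1280x720 row comes out of the following arithmetic, assuming 3 bytes per pixel (raw RGB) and counting one megabit as 1024x1024 bits, which is the convention used in the table. A quick way to check it is with bc:

echo "1280*720*3" | bc                        # 2764800 bytes per frame
echo "1280*720*3*8" | bc                      # 22118400 bits per frame
echo "scale=2; 1280*720*3*8/1024/1024" | bc   # ~21.09 Mbits per frame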

According to Wikipedia, a USB 2.0 high-speed bus has a theoretical bandwidth of 480 Mbps but, in practice, the throughput of the bus is around 280 Mbps. With these figures at hand, let's see how many frames we can fit in a USB 2.0 bus for each of the tested resolutions:

Resolution   Bandwidth ratio   FPS
320x240      280 / 1.76        159 fps
640x480      280 / 7           40 fps
1280x720     280 / 21          13.3 fps
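The FPS column is simply the usable bandwidth divided by the per-frame size; for the 720p case, for example:

echo "scale=1; 280/21" | bc   # ~13.3 fps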

As can be seen, there is no problem in accommodating the lower resolutions to work with some of the standard frame rates (25, 30 fps). In those cases, it is the camera hardware that sets the limit: the camera itself is not physically capturing more than 30 frames per second, and it has no issue sending those 30 fps through a USB 2.0 bus.

However, for the HD version (720p) we see that the theoretical frame rate is 13 fps, which roughly matches our empirical measurement of 10 fps. In this case, it is the USB interface that limits the frame rate we can get from the camera.

WHAT'S THE POINT OF HAVING A FULL HD CAMERA?

You are maybe asking yourself this question right now. You may even be puzzled, since you have seen your Full HD camera delivering its 1080p video stream to some applications. You are probably thinking... hey, this guy is wrong, I have seen it working!

HD and Full HD USB cameras actually use a nifty trick to enable the capture of those big images. As the capacity of the bus is fixed and cannot be increased in any way, the only option left is to reduce the amount of data coming from the camera. And that is what actually happens.

Those cameras have special hardware to compress the captured frame into a JPEG image, and that is what you get from the camera when capturing HD or Full HD at 25 or 30 frames per second. This can be easily demonstrated using gstreamer by slightly changing our capsfilter to force the camera to deliver JPEG frames. The command we have to use looks like this:

gst-launch-1.0 v4l2src ! image/jpeg,width=1280,height=720 ! jpegdec ! fpsdisplaysink sync=false

Now you get a steady 30 frames per second HD/Full HD video stream from your camera. However, this has a price.

Did you notice the additional jpegdec element in the pipeline? Sure you did. Basically, we now have to decode the JPEG frame coming from the camera before being able to render it in a window. This has a computational cost. On a normal desktop or laptop it is so small that JPEG decoding seems to be for free, but when you try this on a small computer, an embedded platform or a smartphone, you will realise that decoding a JPEG is not such a light task.

Actually, when we need to process HD or Full HD video on modest hardware, we really need some hardware acceleration support to take this extra processing off the CPU. Of course you can decode the JPEG on the CPU and still have room to do more things, but capturing your images should use as little CPU as possible. After all, what you want is to do something with the images, not just show them on a display.
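For instance, on machines with VA-API support and the gstreamer-vaapi plugins installed, you might be able to offload the JPEG decoding by replacing jpegdec with a hardware element such as vaapijpegdec. Whether this element is available, and how well it performs, depends entirely on your platform, so take this as a sketch rather than a recipe:

gst-launch-1.0 v4l2src ! image/jpeg,width=1280,height=720 ! vaapijpegdec ! fpsdisplaysink sync=false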

CONCLUSIONS

Well, this is it for now. I hope you have got a first feeling for gstreamer and how it works, and that you now better understand the amount of data our devices manipulate nowadays and how this data may sometimes have trouble going through our devices' interfaces.

Header Image Credits: Skitterphoto

 