Working with COM and DirectShow

I’m doing some work with webcams just now, and in the pursuit of efficiency I am turning to DirectShow, the part of Microsoft’s DirectX which handles things like video playback and webcam capture. Surprisingly, I have hardly used DirectX at all before, except briefly in my first year as an undergrad, so it’s fairly new ground for me. However, I quickly found that it is well worth spending time getting familiar with the technology, as it can be amazingly useful.

Getting used to COM

The Component Object Model (COM) has a bit of a bad reputation for being complex, so having not really touched COM or DirectX significantly before, I was a little intimidated at first. However, after a couple of hours reading through the DX9 SDK documentation, it’s really not that bad! (Incidentally, I am using DX9 because I do not trust that my current computer would be up to DX10.)

OK, so the fact that you have pointers-to-pointers can be a little off-putting, as can the way in which COM seems to circumvent or heavily ‘wrap’ lots of basic C++ conventions like memory allocation, but once you understand some of the principles, you are well placed to tackle a wide range of DirectX tasks. COM is apparently also used elsewhere in Windows programming, although I have not encountered it anywhere else.

Objects and Interfaces

The basic principle of COM is that you have various kinds of objects, and each object has various interfaces to it. Each interface provides a well-defined set of functions for manipulating the object, and the same interface can be implemented by any number of different objects. For example, let’s imagine you have a “Person” object and a “Dog” object, and you want to control the way they walk. COM wouldn’t let you manipulate the objects directly, but they might provide a movement interface, such as “IMove”.

You would simply request an IMove interface for your Person, and use it to make the person walk, e.g. using “IMove::WalkForwards()”. You could do the same for the Dog. As far as your code is concerned, it is the same interface, but it has been implemented differently depending on whether you are using it on a Person or a Dog. For some programmers, the confusing thing is that you never hold the object itself in the way you might expect if you use Java: you only ever hold pointers to its interfaces, and you ask the object at run time for each interface you need.
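To make the Person/Dog idea concrete, here is a minimal sketch in plain C++ (deliberately not real COM). IMove, WalkForwards, Person and Dog come from the example above; the MoveAhead helper and the returned strings are made up purely for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface, written as a pure-abstract C++ class.
struct IMove {
    virtual std::string WalkForwards(int steps) = 0;
    virtual ~IMove() {}
};

// Two unrelated objects that each provide their own IMove behaviour.
class Person : public IMove {
public:
    std::string WalkForwards(int steps) override {
        assert(steps >= 0);
        return "Person strides " + std::to_string(steps) + " steps on two legs";
    }
};

class Dog : public IMove {
public:
    std::string WalkForwards(int steps) override {
        assert(steps >= 0);
        return "Dog trots " + std::to_string(steps) + " steps on four legs";
    }
};

// Calling code sees only the interface, never the concrete object type.
std::string MoveAhead(IMove& mover, int steps) {
    return mover.WalkForwards(steps);
}
```

Calling MoveAhead with a Person and then with a Dog runs the same line of calling code, but each object answers in its own way.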

If you are familiar with the Model-View-Controller design pattern, then think of the objects as the models, and the interfaces as the controllers.

It is also worth noting that there are two main ways of actually creating objects with COM. Either you can directly create an object, specifying what kind of interface you want for it initially, or you can use an indirect method, which usually means calling a helper function which does extra configuration, and gives you a pre-determined interface type. Either way, you can request further interface types after the object is created.
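The create-then-ask-for-more pattern can be sketched with a toy model in portable C++. This is not the Windows API: in real COM the equivalents are CoCreateInstance, IUnknown::QueryInterface, AddRef and Release, interfaces are identified by GUIDs, and results are HRESULT codes. Every name below (IUnknownLike, InterfaceId, CreatePerson, g_liveObjects and so on) is a hypothetical stand-in, and a plain bool replaces HRESULT to keep things short.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for COM interface identifiers (real COM uses GUIDs).
enum InterfaceId { IID_Move, IID_Speak, IID_Fly };

int g_liveObjects = 0;  // lets a caller check that objects really get destroyed

// Toy version of IUnknown, the root of every COM interface.
struct IUnknownLike {
    virtual bool QueryInterface(InterfaceId iid, void** out) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IUnknownLike() {}  // callers must use Release(), never delete
};

struct IMove : IUnknownLike { virtual int WalkForwards(int steps) = 0; };
struct ISpeak : IUnknownLike { virtual std::string Speak() = 0; };

class Person : public IMove, public ISpeak {
    unsigned long refs_ = 1;  // the creator holds the first reference
    int position_ = 0;
    ~Person() override { --g_liveObjects; }
public:
    Person() { ++g_liveObjects; }

    // Hands back the requested interface through a pointer-to-pointer.
    bool QueryInterface(InterfaceId iid, void** out) override {
        assert(out != nullptr);
        *out = nullptr;
        if (iid == IID_Move)       *out = static_cast<IMove*>(this);
        else if (iid == IID_Speak) *out = static_cast<ISpeak*>(this);
        else return false;         // interface not supported by this object
        AddRef();                  // the caller now owns one more reference
        return true;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        assert(refs_ > 0);
        unsigned long left = --refs_;
        if (left == 0) delete this;  // last reference gone: object cleans itself up
        return left;
    }
    int WalkForwards(int steps) override { position_ += steps; return position_; }
    std::string Speak() override { return "hello"; }
};

// Direct creation: the caller names the first interface it wants.
bool CreatePerson(InterfaceId iid, void** out) {
    Person* person = new Person();              // reference count starts at 1
    bool ok = person->QueryInterface(iid, out); // +1 if the interface exists
    person->Release();                          // drop the creator's reference
    return ok;
}
```

A caller would create a Person asking for IID_Move, later call QueryInterface on that pointer to get IID_Speak, and call Release once on each pointer it was handed; the object deletes itself when the last reference goes. In real COM code the out-parameter is usually written as (void**)&pointer, which is where the pointers-to-pointers come from.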

Filter Graphs in DirectShow

After getting up to speed on COM stuff, I looked through some of the DirectShow documentation. There is an awful lot to take in, but the basic principle breaks down into “filters”. A filter is something which operates on audio and/or video data in some way, and you connect a series of filters together (into a tree-like graph structure) to achieve your goal. Many of the complexities are handled by the filter graph manager, which DirectShow provides, and for common tasks it can even set up the graph entirely on its own.

Some simple examples of filters are given in the documentation. For example, you might start with a source filter to read data from an AVI (Audio Video Interleave) file. You then have a splitter filter to separate the audio and video streams into different branches of the graph. Each branch then needs a decoder (decompression) filter for its stream, followed by a renderer filter: one to output the audio and one to display the video. Everything is tied and synchronised together by the filter graph manager… you just need to sit back and let it roll.
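The shape of such a graph can be sketched with a toy model in portable C++. This is not the DirectShow API (real filters are COM objects joined through pins); Filter, Connect, AviPlaybackGraph and ListPaths are hypothetical names, and the filter labels are only descriptive. The audio renderer at the end of the audio branch is my assumption about how that branch finishes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A toy filter: just a name plus the filters it feeds data into.
struct Filter {
    std::string name;
    std::vector<Filter*> downstream;
    explicit Filter(const std::string& n) : name(n) {}
};

// Joins the output of one filter to the input of another.
void Connect(Filter& from, Filter& to) {
    assert(&from != &to);  // a filter cannot feed itself
    from.downstream.push_back(&to);
}

// The AVI playback layout described above, wired up in one place.
struct AviPlaybackGraph {
    Filter source{"AVI file source"};
    Filter splitter{"AVI splitter"};
    Filter audioDecoder{"audio decoder"};
    Filter audioRenderer{"audio renderer"};
    Filter videoDecoder{"video decoder"};
    Filter videoRenderer{"video renderer"};

    AviPlaybackGraph() {
        Connect(source, splitter);
        Connect(splitter, audioDecoder);   // audio branch
        Connect(audioDecoder, audioRenderer);
        Connect(splitter, videoDecoder);   // video branch
        Connect(videoDecoder, videoRenderer);
    }
    // Filters point at each other, so copying the graph would dangle.
    AviPlaybackGraph(const AviPlaybackGraph&) = delete;
    AviPlaybackGraph& operator=(const AviPlaybackGraph&) = delete;
};

// Depth-first walk that records every route from a filter to a renderer.
void CollectPaths(const Filter& f, std::string path, std::vector<std::string>& out) {
    if (!path.empty()) path += " -> ";
    path += f.name;
    if (f.downstream.empty()) {  // nothing downstream: this is a renderer
        out.push_back(path);
        return;
    }
    for (const Filter* next : f.downstream) CollectPaths(*next, path, out);
}

std::vector<std::string> ListPaths(const Filter& source) {
    std::vector<std::string> paths;
    CollectPaths(source, "", paths);
    return paths;
}
```

Calling ListPaths on the source of an AviPlaybackGraph gives two routes, one ending at the audio renderer and one at the video renderer, which is the tree-like structure the splitter creates.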

Playing a Video

I went through an example in the documentation of using DirectShow to play a video, and I found it remarkably straightforward after having spent some time familiarising myself with everything above. (I think a common problem, particularly among novices, is the tendency to jump straight to source code and get bogged down by not understanding why certain things work in certain ways.)

The results came together remarkably easily, in the form of a console application which sets up DirectShow, and then plays a video from a file in a small window. You can see the inset screenshot to the right, and you can view the source code. It was developed in MS Visual Studio 2003 (yes, I still use that… it’s ‘clean’ by comparison to the modern versions!).

Two things you need to note: in order to compile it, you will need to link with a couple of DirectShow libraries, “Strmiids.lib” and “Quartz.lib”. Also, remember to put the path/filename of a video file (preferably AVI or WMV) in at line 66.
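For reference, the core of a file player along the lines of the SDK documentation’s example looks roughly like the sketch below. It is not the source file linked above, so the line number mentioned there does not apply to it. The PlayFile wrapper is a hypothetical name of mine; the interfaces and calls inside it (IGraphBuilder, IMediaControl, IMediaEvent, CoCreateInstance, RenderFile, Run, WaitForCompletion) are the standard DirectShow ones. The `#ifdef _WIN32` guard is only there so the file still compiles on other platforms, where the function does nothing and returns false.

```cpp
#include <cassert>
#ifdef _WIN32
#include <dshow.h>
#pragma comment(lib, "strmiids.lib")  // DirectShow CLSIDs and IIDs
#pragma comment(lib, "ole32.lib")     // CoInitialize, CoCreateInstance
#endif

// Plays a media file to completion. Returns true only if playback ran.
bool PlayFile(const wchar_t* path)
{
    assert(path != nullptr);
#ifdef _WIN32
    if (FAILED(CoInitialize(nullptr))) return false;

    IGraphBuilder* graph = nullptr;    // builds and owns the filter graph
    IMediaControl* control = nullptr;  // run / pause / stop
    IMediaEvent* events = nullptr;     // notifications such as "finished"
    bool ok = false;

    // Direct creation: ask for the filter graph manager via IGraphBuilder.
    HRESULT hr = CoCreateInstance(CLSID_FilterGraph, nullptr, CLSCTX_INPROC_SERVER,
                                  IID_IGraphBuilder, (void**)&graph);
    // Then request further interfaces on the same object.
    if (SUCCEEDED(hr)) hr = graph->QueryInterface(IID_IMediaControl, (void**)&control);
    if (SUCCEEDED(hr)) hr = graph->QueryInterface(IID_IMediaEvent, (void**)&events);

    // Let the graph manager pick and connect the filters for this file.
    if (SUCCEEDED(hr)) hr = graph->RenderFile(path, nullptr);
    if (SUCCEEDED(hr)) hr = control->Run();
    if (SUCCEEDED(hr)) {
        long eventCode = 0;
        hr = events->WaitForCompletion(INFINITE, &eventCode);  // block until done
        ok = SUCCEEDED(hr);
    }

    // Release every interface we were handed, then shut COM down.
    if (events) events->Release();
    if (control) control->Release();
    if (graph) graph->Release();
    CoUninitialize();
    return ok;
#else
    return false;  // DirectShow is only available on Windows
#endif
}
```

A console program would call it with something like PlayFile(L"C:\\Example.avi"), where the path is only a placeholder. Waiting with INFINITE keeps the sketch short; a real application would normally avoid blocking like that.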
