Computer video concepts can be very confusing. Streaming adds more complexity, and live streaming possibly more still. This is largely because there are competing proprietary methods for encoding, storing and transporting video. I'll try to break it down for you as it stands in August 2011.
First, it should be noted that when we talk about video we usually mean video and audio together. Video and audio are different forms of information, but they have to be delivered together for the expected experience. The overall conceptual breakdown:
- Video source: Most likely a USB webcam for a small web show broadcast
- Audio source: Often integrated with the webcam, but could be a discrete microphone
- Encoder: For our small live web show this is a laptop or PC that the webcam and microphone plug into. On professional setups this could be a piece of professional hardware as part of a camera system or a standalone device. The encoder will encode the audio and video streams and multiplex them into a container format. More on these terms later, but the data stream that leaves the encoder will be readable by viewer clients.
- Media server: The encoder sends the video/audio data stream to the media server to be delivered to all the viewer clients. It's possible the encoder machine could also be the media server, but it's more flexible and scalable if the media server is a separate server box.
- Client: The viewers are looking at their PCs, Macs, tablets or phones which are pulling the video (and audio) from the media server.
As the audio and video move through the above progression, several things transform, store or transport them:
- Video codec: Codec is short for "coder/decoder". It is the algorithm or language, if you will, that turns what the camera sees into computer information. At its most basic, video is a series of still photographs played one after another, but storing and transmitting it that way would take up a prohibitive amount of disk space and network bandwidth, so there are various competing and evolving codecs to encode quality video in a small data space. The encoder you choose and configure will determine the video codec, and the media server you choose may only support certain codecs even though the media server itself doesn't need to encode or decode the video. (Examples: H.264, VP8, VC-1, Theora)
- Audio codec: Same concept as video, but it applies to audio instead. Your choices in components and software may dictate which audio codec you can use. (Examples: MP3, AAC, WMA)
- MUX or multiplex: When playing video (with audio), you are playing two different types of information at the same time: audio and video. But you are playing only one data file or one live data stream. How do both audio and video get delivered in one data stream? Multiplexing. In brief, each elemental data stream (video, audio, etc.) is split into data chunks, and chunks of each individual stream take turns being delivered through the actual one data stream. There are different standards for multiplexing, and again your choice of components may dictate a particular MUX. Note also that other data may be multiplexed in your streams, like subtitles, alternate video or audio streams or metadata.
- Container format: At its most basic, this is the file storage type, and is usually indicated by the file extension like .AVI, .MOV, .MP4 and .M4V. Conceptually this would seem to blend with MUX, but in ways not completely clear to me the two are different, and some container types work with different MUX types.
- Streaming protocol: This is the method by which the data stream is transported or delivered over a network. (Examples: HTTP, RTP, RTSP, RTMP, MMS)
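To make the multiplexing idea from the MUX item above a little more concrete, here is a minimal Python sketch. It is not any real MUX standard or container format. The function names `chunk`, `mux` and `demux`, the stream labels and the chunk size are all made up for illustration. Each elemental stream is cut into chunks, the chunks take turns in one combined stream, and the receiving end sorts them back out by label.

```python
from itertools import zip_longest


def chunk(data: bytes, size: int) -> list[bytes]:
    """Split one elemental stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def mux(streams: dict[str, bytes], size: int = 4) -> list[tuple[str, bytes]]:
    """Interleave chunks from every stream, round-robin, into one sequence.

    Each chunk is tagged with its stream label so the receiver can tell
    them apart. Real formats use binary headers; tuples are a stand-in.
    """
    chunked = {name: chunk(data, size) for name, data in streams.items()}
    combined = []
    for row in zip_longest(*chunked.values()):
        for name, piece in zip(chunked.keys(), row):
            if piece is not None:  # a shorter stream has already run out
                combined.append((name, piece))
    return combined


def demux(combined: list[tuple[str, bytes]]) -> dict[str, bytes]:
    """Reassemble each elemental stream from the tagged chunks."""
    out: dict[str, bytes] = {}
    for name, piece in combined:
        out[name] = out.get(name, b"") + piece
    return out


# Hypothetical payloads standing in for encoded video and audio data.
streams = {"video": b"VVVVVVVVVVVV", "audio": b"AAAAAA"}
combined = mux(streams)
labels = [name for name, _ in combined]
print(labels)  # ['video', 'audio', 'video', 'audio', 'video']
restored = demux(combined)
```

The same approach would carry a third stream, such as subtitles, by adding another entry to the dictionary.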
So, the encoder transforms the information from the video and audio sources using particular codecs and multiplexes them into a container format, then uses a streaming protocol to deliver the container stream to the media server. Clients then use a streaming protocol (not necessarily the same one used between the encoder and media server) to receive the container stream from the media server.
But you can't just pick whatever codecs, MUX, container and protocols you want. They don't all work together. The real pain of the situation is that your audience's clients may need particular codecs, MUX, container and protocols. The really real pain is that if your audiences' clients are different, they may be incompatible with your choices and incompatible with each other.
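Here is a rough Python sketch of why that hurts. The classes `StreamConfig` and `ClientSupport`, the function `can_play`, the client names and the container labels are all hypothetical, and the capability sets are placeholders rather than a verified compatibility matrix for any real player. The point is only that a client has to match on every layer at once, so two clients that share a codec can still be incompatible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamConfig:
    """The choices made for one outgoing stream."""
    video_codec: str
    audio_codec: str
    container: str
    protocol: str  # protocol between the media server and the client


@dataclass(frozen=True)
class ClientSupport:
    """What one kind of viewer client is assumed to be able to play."""
    video_codecs: frozenset
    audio_codecs: frozenset
    containers: frozenset
    protocols: frozenset


def can_play(client: ClientSupport, stream: StreamConfig) -> bool:
    """A client can play a stream only if every layer matches."""
    return (
        stream.video_codec in client.video_codecs
        and stream.audio_codec in client.audio_codecs
        and stream.container in client.containers
        and stream.protocol in client.protocols
    )


# Hypothetical clients with placeholder capabilities (assumptions, not facts).
clients = {
    "client_a": ClientSupport(
        frozenset({"H.264"}), frozenset({"AAC", "MP3"}),
        frozenset({"container_1"}), frozenset({"RTMP"}),
    ),
    "client_b": ClientSupport(
        frozenset({"H.264"}), frozenset({"AAC"}),
        frozenset({"container_2"}), frozenset({"HTTP"}),
    ),
}

stream = StreamConfig("H.264", "AAC", "container_1", "RTMP")
playable = {name: can_play(c, stream) for name, c in clients.items()}
print(playable)  # {'client_a': True, 'client_b': False}
```

Both hypothetical clients accept the same video and audio codecs, yet only one can play the stream, because the container and protocol differ.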
I'll remind the reader that the following are described from the point of view of a small, not-for-profit web show that wishes to grow from a viewership of 20-50 live viewers with no or minimal cost, assuming an existing 25 Mbps upstream connection and an existing Windows Server 2008 machine and Linux server available. The organized portion of our show is recorded and made available for free download later, so content protection is not a concern for us. The community is rooted in Microsoft-based applications, so its members are generally assumed to have Windows, IE and Windows Media Player available, but in practice many of them own iPhones, iPads or iPods and want to view the show on them. I also want to avoid clients having to install extra plugins or software to be able to view the live web show.
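As a back-of-the-envelope check on those numbers, here is a small Python sketch of the bandwidth budget. It assumes every viewer pulls a separate copy of the stream from the media server over that one upstream connection, and it reserves some headroom for other traffic. The function name `max_bitrate_per_viewer_kbps` and the 0.8 headroom factor are my own assumptions, not measured values.

```python
def max_bitrate_per_viewer_kbps(
    upstream_mbps: float, viewers: int, headroom: float = 0.8
) -> float:
    """Rough ceiling on the combined audio+video bitrate per viewer.

    Assumes one separate copy of the stream per viewer, all sharing the
    same upstream link, with only `headroom` of the link usable.
    """
    if viewers <= 0:
        raise ValueError("viewers must be positive")
    usable_kbps = upstream_mbps * 1000 * headroom
    return usable_kbps / viewers


# The 25 Mbps link and the 20-50 viewer range come from the text above.
for viewers in (20, 50):
    budget = max_bitrate_per_viewer_kbps(25, viewers)
    print(f"{viewers} viewers: about {budget:.0f} kbps each")
```

Growing the audience beyond that range shrinks the per-viewer budget proportionally, which is one reason the choice of codec and encoder settings matters.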
I'll use a team metaphor for the types of clients the audience may have. There are three big teams, one fading team and one emerging team:
- Team Flash video: The ubiquitous Adobe Flash browser plugin supports video in certain formats across multiple platforms and browsers. Flash is almost assumed for general web surfing, and our existing broadcasts over Ustream require Flash, making Flash a likely choice for us since our audience already has it. There is a free version of the encoder, limited but useful, and a free Flash-based player whose only restriction is an unobtrusive logo, but the Flash Media Server software costs at least $1000. I have yet to find a free and simple substitute for Flash Media Server, but I have managed with much difficulty to use VLC as encoder and media server. I will blog about that separately, and I think there are other possibilities, but the free options require a lot of tinkering, testing and time. However, the popular iPads, iPhones and iPods cannot use Flash and aren't otherwise natively compatible.
- Team Windows Media: If you're sure all your viewers will be using Windows and Internet Explorer to view your show, this may be the option for you. Microsoft offers Windows Media Services 2008 (media server) as a free download, and there is a free version of Microsoft Expression Encoder. I found them easy to set up and use, but most of the touted neat features like forward error correction, smooth streaming, adaptive streaming and multi-bitrate encoding aren't available with the free tools. Still, with the free tools I was able to easily set up a live webcast viewable by Windows/IE/Windows Media Player users. However, those using other browsers or other platforms couldn't view the stream. There may be a Java applet that will play the streams on other platforms, but I moved on to other teams before I really tried to make this work.
- Team Apple: Even though our web show is inseparably tied to Microsoft and Windows, a vocal part of the audience wants to use their iPads and iPhones to view the show. At first glance team Apple and team Flash seem to overlap on compatible codecs, but they use different streaming protocols and possibly different muxes and containers. Since our primary target is Wintel platforms, this does not seem to be the best server platform for us. However, the QuickTime encoder pricing did seem to be quite reasonable, and Darwin Streaming Server is freely available and compatible with Windows, although I found the download page excruciatingly hard to find.
- Team RealMedia: I consider this a fading team. That may or may not be fair, but the RealPlayer plugin or player has to be installed for clients to use this, and I don't believe most have this installed already, so for me it was a nonstarter. I didn't look into pricing.
- Team HTML5: The great news: an open standard, and a choice of codecs including an open free video codec (VP8 / WebM). The bad news: this is an emerging standard, and I don't think I can assume my viewers will be using an HTML5-compliant browser, although I could be wrong about that. And apparently the current VP8 encoder is not fast enough for live webcasts. HTML5 with its video tags brings hopes that the proprietary video format wars may be over in a few years and streaming video will be easier and cheaper, but it's not here yet.