Set up your own WebRTC system


If you are looking for a neat introduction to WebRTC and a tutorial on how to set up both the server and the client side, look no further: we will go through all of this here. There is a lot to unpack, so let's jump straight to the point.

Disclaimer: throughout this tutorial, I assume you are comfortable with Elixir and JavaScript.

Introduction to WebRTC

WebRTC is a standard available in all modern browsers that allows users to establish a direct connection with other peers and then directly share audio, video and generic data. Whenever possible, the connection between peers is direct and the media does not transit through any other system (when no direct route can be found, the traffic has to be relayed by a TURN server). This is the technology behind many of the real-time meeting systems that have emerged over the last couple of years (Google Meet, Slack calls, Jitsi, etc.). WebRTC also mandates encryption of all media and data (DTLS and SRTP), so the streams are secure by default.

A WebRTC stack is quite simple. It has two parts: a small web server (usually called the signaling server) that forwards messages between peers trying to connect to each other, and a small front-end app using the native WebRTC API. The only tricky part of WebRTC is establishing the connection. Once the peers are connected, everything is a piece of cake.

Establishing a new connection: ICE & STUN

We need to break down this part of the protocol since it is the most complicated. WebRTC needs to create a direct connection between peers. The problem is that a peer has no way to find out on its own where another peer can be reached: most devices sit behind routers (NATs) and firewalls and only know their private address. So each peer must send the other peers the information they need to reach it. This is the job of ICE (Interactive Connectivity Establishment), which gathers the network addresses (IP, port and transport) at which a peer might be reachable.

Each of these possible addresses is called an ICE candidate. A peer sends all of its candidates to the other peer, both sides test pairs of candidates, and they then agree on the best pair to use.
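The sketch below makes the notion of candidate concrete. It parses the string a browser puts in the `candidate` field of an RTCIceCandidate, which follows the SDP `candidate` attribute syntax: foundation, component, transport, priority, address, port, then `typ` and the candidate type. The helper name `parseCandidate` is hypothetical. Only the fields used here are extracted, and the example addresses are made up.

```javascript
// Hypothetical helper: parse an ICE candidate string such as
// "candidate:1 1 udp 2122252543 192.168.1.10 54400 typ host"
function parseCandidate(line) {
  const parts = line.replace(/^candidate:/, '').trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== 'typ') {
    throw new Error(`Not an ICE candidate: ${line}`);
  }
  // parts[6] is the literal "typ" keyword, so it is skipped
  const [foundation, component, protocol, priority, address, port, , type] = parts;
  const candidate = {
    foundation,
    component: Number(component),
    protocol: protocol.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // "host", "srflx" (found via STUN), "prflx" or "relay" (TURN)
  };
  // Optional "key value" pairs (raddr, rport, generation, ...) follow the type
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === 'raddr') candidate.relatedAddress = parts[i + 1];
    if (parts[i] === 'rport') candidate.relatedPort = Number(parts[i + 1]);
  }
  return candidate;
}
```

A "host" candidate is a local address of the machine. A "srflx" candidate is the public address seen from outside the network, and its `raddr`/`rport` point back to the local address it maps to.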

Finally, most of the time these peers are not on the same network, and by default a peer only knows its local addresses. Without extra help, WebRTC would therefore only work between peers on the same local network (for example the same Wi-Fi). This is where STUN (Session Traversal Utilities for NAT) comes into play. A STUN server tells a peer which public IP address and port it appears to have from outside its network, so that this public address can be shared as an additional candidate. You can run your own STUN server or use a public one (Google, for instance, provides public STUN servers). So if you want your WebRTC app to work over the Internet, you need to provide a STUN server when gathering ICE candidates.
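Providing a STUN server simply means putting its URL in the `iceServers` list of the configuration passed to `new RTCPeerConnection(...)`. Here is a minimal sketch; the helper names `makeRtcConfig` and `gotReflexiveCandidate` are hypothetical. The second helper is a debugging aid: if none of the gathered candidates has type "srflx", the STUN server was not reached.

```javascript
// Hypothetical helper: build an RTCConfiguration object from STUN server URLs
function makeRtcConfig(stunUrls) {
  for (const url of stunUrls) {
    // "stun:" and "stuns:" are the URI schemes used for STUN servers
    if (!/^stuns?:\S+$/.test(url)) {
      throw new Error(`Not a STUN URL: ${url}`);
    }
  }
  // RTCIceServer.urls accepts a single string or an array of strings
  return { iceServers: [{ urls: stunUrls }] };
}

// Hypothetical helper: server-reflexive ("srflx") candidates carry the
// public address a STUN server reported, so their presence means STUN worked
function gotReflexiveCandidate(candidateLines) {
  return candidateLines.some((line) => / typ srflx(\s|$)/.test(line));
}
```

In the browser, you would pass `makeRtcConfig(['stun:stun.l.google.com:19302'])` to `new RTCPeerConnection(...)`. You would then collect the `event.candidate.candidate` strings received in `onicecandidate` to feed `gotReflexiveCandidate`.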

Step 1 - WebRTC server

The job of our server is simply to maintain a WebSocket connection that keeps all the peers of a room synchronized. The server triggers and broadcasts events to the peers when something happens (a new peer is trying to connect, a peer has been disconnected, etc.). It is also through this server that peers exchange their ICE candidates and their session descriptions (offers and answers).
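To make the server's role concrete, here is a toy, in-memory relay written in JavaScript. It is not how membrane_webrtc_server is implemented. The class name `SignalingRoom`, the `authenticated` payload and the `from` stamping are assumptions for illustration. The `{to, event, data}` envelope mirrors the messages the client sends later in this tutorial.

```javascript
// Toy signaling relay (illustration only, NOT the membrane_webrtc_server code)
class SignalingRoom {
  constructor() {
    this.peers = new Map(); // peer ID -> send(messageString) callback
  }

  // Send an event to every peer in the room, optionally skipping one
  broadcast(event, data, exceptId) {
    for (const [id, send] of this.peers) {
      if (id !== exceptId) send(JSON.stringify({ event, data }));
    }
  }

  join(peerId, send) {
    this.peers.set(peerId, send);
    send(JSON.stringify({ event: 'authenticated', data: { peer_id: peerId } }));
    this.broadcast('joined', { peer_id: peerId }, peerId);
  }

  leave(peerId) {
    this.peers.delete(peerId);
    this.broadcast('left', { peer_id: peerId });
  }

  // Forward {to, event, data} to each recipient, stamping the sender's ID
  forward(fromId, raw) {
    const { to, event, data } = JSON.parse(raw);
    if (!Array.isArray(to)) throw new Error('"to" must be an array of peer IDs');
    for (const id of to) {
      const send = this.peers.get(id);
      if (send) send(JSON.stringify({ event, data, from: fromId }));
    }
  }
}
```

The relay never looks inside `data`. Offers, answers and candidates are opaque to it, which is why the data format is entirely up to the client code.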

The steps themselves (exchanging offers, answers and ICE candidates) are dictated by the protocol, but WebRTC does not standardize signaling, so the format of the messages is completely up to the developers. For example, FreeSWITCH uses its own protocol called Verto, so you can either follow an existing implementation or build your own.

We will be using the amazing webrtc-server library (published as membrane_webrtc_server) made by mat-hek, who has been working on some really good projects.

Create our Application

mix new webrtc --sup --module WR

This creates a new Elixir application (whose OTP application name is :webrtc) with a supervision tree, and names our main module WR.

Then add the following dependencies to the deps function of your mix.exs, and fetch them with mix deps.get:

defp deps do
  [
    {:jason, "1.2.2"},
    {:plug, "~> 1.7"},
    {:plug_cowboy, "2.5.0"},
    {:membrane_webrtc_server, "0.1.2"}
  ]
end

We will be using Plug and Cowboy as the web server. They are much lighter than Phoenix and perfect for such a small application.

I am not going through all the details of the server implementation since it is a very basic use of both plug_cowboy and membrane_webrtc_server. You can find all the resources you need here.

Create the Peer & Room handlers

The webrtc-server library expects you to implement a Room module and a Peer module. This is where you apply your own logic and store your state.

Create a `peer.ex`:

defmodule WR.Peer do
  require Logger
  use Membrane.WebRTC.Server.Peer

  @impl true
  def parse_request(request) do
    with {:ok, room_name} <- get_room_name(request),
         {:ok, credentials} <- get_credentials(request) do
      {:ok, credentials, %{}, room_name}
    end
  end

  defp get_credentials(_request) do
    {:ok, %{username: "guest", password: "public"}}
  end

  defp get_room_name(request) do
    room_name = :cowboy_req.binding(:room, request)

    if room_name == :undefined do
      {:error, :no_room_name_bound_in_url}
    else
      {:ok, room_name}
    end
  end

  @impl true
  def on_init(_context, _auth_data, _options) do
    {:ok, %{}}
  end
end

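parse_request allows you to parse any header or cookie to retrieve credentials, or to apply any logic you want, when a new peer tries to connect to the system. In this example, every peer is accepted as a guest, and the room name is read from the :room binding of the URL.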
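get_credentials above ignores the request and accepts everyone as a guest. To illustrate what it could read instead, here is a JavaScript sketch that extracts a `credentials` cookie from a raw Cookie header; in this project the same logic would be written in Elixir inside get_credentials. The helper name `parseCredentialsCookie` and the cookie format (URI-encoded JSON with `username` and `password`) are assumptions, chosen to match the cookie the client sets later in this tutorial.

```javascript
// Hypothetical helper: read {username, password} from a "credentials" cookie
// holding URI-encoded JSON; returns null when absent or malformed
function parseCredentialsCookie(cookieHeader) {
  for (const pair of (cookieHeader || '').split(';')) {
    const index = pair.indexOf('=');
    if (index === -1) continue;
    if (pair.slice(0, index).trim() !== 'credentials') continue;
    try {
      const value = JSON.parse(decodeURIComponent(pair.slice(index + 1).trim()));
      if (typeof value.username === 'string' && typeof value.password === 'string') {
        return { username: value.username, password: value.password };
      }
    } catch (e) {
      // bad URI encoding or bad JSON: treat it as missing credentials
    }
    return null;
  }
  return null;
}
```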

Create a `room.ex`:

defmodule WR.Room do
  require Logger
  use Membrane.WebRTC.Server.Room

  @impl true
  def on_init(options \\ %{}) do
    {:ok, options}
  end

  @impl true
  def on_join(_auth_data, state) do
    Logger.info("User joined")
    {:ok, state}
  end

  @impl true
  def on_leave(_peer_id, state) do
    Logger.info("USER LEFT")
    {:ok, state}
  end
end

The Room module lets you apply some business logic. I leave it almost empty here (it only logs joins and departures) since there is not much I want to do.

And that's all! With this, our WebRTC server is almost ready!

Run the web server!

Once you are ready, the first step is to create a router for your application.

Create a `router.ex`:

defmodule WR.Router do
  use Plug.Router

  # Serve the files in priv/static of our OTP application (:webrtc)
  plug(Plug.Static,
    at: "/",
    from: :webrtc
  )

  plug(:match)
  plug(:dispatch)

  match _ do
    send_resp(conn, 404, "404")
  end
end

It serves the static files of the application and returns a 404 for any other request.

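Finally, let's write our application module. It starts the web server and the room under the supervision tree, and routes /webrtc/[:room]/ to the Peer handler: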

defmodule WR.Application do
  use Application

  alias Membrane.WebRTC.Server.Room
  alias Membrane.WebRTC.Server.Peer

  @impl true
  def start(_type, _args) do
    children = [
      Plug.Cowboy.child_spec(
        scheme: :http,
        plug: WR.Router,
        options: [
          dispatch: dispatch(),
          port: 8042
        ]
      ),
      Supervisor.child_spec(
        {Room,
         %Room.Options{
           name: "room",
           module: WR.Room,
           custom_options: %{}
         }},
        id: :room
      )
    ]

    opts = [strategy: :one_for_one, name: WR.Supervisor]
    Supervisor.start_link(children, opts)
  end

  defp dispatch do
    options = %Peer.Options{module: WR.Peer}

    [
      {:_,
       [
         {"/webrtc/[:room]/", Peer, options},
         {:_, Plug.Cowboy.Handler, {WR.Router, []}}
       ]}
    ]
  end
end

You can now run your application (for example with mix run --no-halt or iex -S mix) and get ready to write our client application!
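Once the server is running, the client must connect to a URL that matches the "/webrtc/[:room]/" route declared in dispatch. For the room started above, that is ws://localhost:8042/webrtc/room/. Cowboy ignores the trailing slash when matching. The sketch below mirrors this matching in JavaScript so you can check a URL before using it. The helper names `roomFromPath` and `signalingUrl` are hypothetical.

```javascript
// Hypothetical helper mirroring the cowboy route "/webrtc/[:room]/":
// returns the room name, or null when the path does not bind a room
function roomFromPath(path) {
  const segments = path.split('/').filter((s) => s !== '');
  if (segments[0] !== 'webrtc' || segments.length > 2) return null;
  // "/webrtc/" matches the route but binds no room (parse_request then fails)
  return segments.length === 2 ? decodeURIComponent(segments[1]) : null;
}

// Hypothetical helper: build the WebSocket URL for a given room
function signalingUrl(host, room) {
  return `ws://${host}/webrtc/${encodeURIComponent(room)}/`;
}
```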

Step 2 - JavaScript client

This is where the real work starts.

Host HTML elements

You will need a template to host the media elements used to display the audio and video of each peer.

In your HTML file, add an empty element that you can easily retrieve, and a template that includes a video element:

<div id="webrtc">
</div>

<template id="webrtc-video-template">
    <video playsinline autoplay onclick="fullscreen(this)"></video>
</template>

We will then dynamically add and remove HTML elements when peers connect or disconnect. Let's write a small class that handles this:

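/** Clone the video template, attach a stream to it and add it to the DOM */
function setup(id, parent, templateId, stream) {
  const template = document.getElementById(templateId);
  const fragment = document.importNode(template.content, true);
  // The template content is a DocumentFragment: grab the <video> itself
  // before appending, because appending empties the fragment
  const videoEl = fragment.querySelector('video');
  videoEl.id = id;
  videoEl.srcObject = stream;
  parent.appendChild(fragment);
  return videoEl;
}

export default class DomManager {

  constructor(rootNode) {
    this.templateId = 'webrtc-video-template';
    this.rootNode = rootNode;
  }

  setupLocal(stream) {
    const el = setup('local', this.rootNode, this.templateId, stream);
    el.muted = true; // do not play our own microphone back to us
    return el;
  }

  setupPeer(peerId, stream) {
    // ontrack fires once per track (audio and video): reuse the element
    const existing = document.getElementById(peerId);
    if (existing) {
      existing.srcObject = stream;
      return existing;
    }
    return setup(peerId, this.rootNode, this.templateId, stream);
  }

  removePeer(peerId) {
    const el = document.getElementById(peerId);
    if (el) {
      el.remove();
    }
  }

}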

Retrieve user Media

The first step is to get access to the user's microphone and/or camera. To do so, we can use the navigator.mediaDevices.getUserMedia function:

const constraints = {
  audio: true, // start the microphone
  video: true  // start the camera
};
navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
  // we now have access to a MediaStream
}).catch((e) => console.error('Failed to retrieve user media', e));
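Each of `audio` and `video` in these constraints is either a boolean or a MediaTrackConstraints object, where keywords such as `ideal` express preferences rather than hard requirements. Also note that browsers only expose getUserMedia in secure contexts (HTTPS or localhost). The hypothetical `buildConstraints` helper below is a minimal sketch that requests a preferred resolution when one is given and falls back to plain `true` otherwise.

```javascript
// Hypothetical helper: build MediaStreamConstraints for getUserMedia
function buildConstraints({ audio = true, video = true, width, height } = {}) {
  if (!video) {
    return { audio, video: false };
  }
  const videoConstraints = {};
  // "ideal" asks the browser for the closest match instead of failing
  if (width) videoConstraints.width = { ideal: width };
  if (height) videoConstraints.height = { ideal: height };
  return {
    audio,
    // fall back to `true` when no resolution was requested
    video: Object.keys(videoConstraints).length > 0 ? videoConstraints : true,
  };
}
```

In the browser: `navigator.mediaDevices.getUserMedia(buildConstraints({ width: 1280, height: 720 }))`.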

Connect to the server

Now that we have access to the user's media, we can connect to our server. To do so, we open a WebSocket connection. We can create our own Socket class that wraps the native API:


export default class Socket {

  constructor(url) {
    this.url = url;
  }

  open() {
    // You can set up some credentials that will then be sent to the server
    // as a cookie during the WebSocket handshake
    document.cookie = 'credentials=' + encodeURIComponent(JSON.stringify({
      username: 'XXX',
      password: 'VVV'
    }));
    this.socket = new WebSocket(this.url);

    // We ping the server frequently so that it does not close an idle connection
    if (this.interval) {
      window.clearInterval(this.interval);
    }
    this.interval = window.setInterval(() => {
      if (this.socket.readyState === WebSocket.OPEN) {
        this.socket.send('ping');
      }
    }, 30000);

    this.socket.onmessage = (raw) => {
      try {
        const message = JSON.parse(raw.data);
        console.log('new message', message);
      } catch {
        // console.log('invalid message format (ignored) > ', raw);
      }
    };
  }
}

Let's merge all of this

OK, now that we have the basics, let's merge our code into something that actually does something:

// Must match the "/webrtc/[:room]/" route and the room named "room" on the server
const url = 'ws://localhost:8042/webrtc/room';
const constraints = {
  audio: true, // start the microphone
  video: true  // start the camera
};

const domManager = new DomManager(document.getElementById('webrtc'));

navigator.mediaDevices.getUserMedia(constraints).then((stream) => {
  // set up a media element with our own stream
  domManager.setupLocal(stream);
  // connect to our server
  const socket = new Socket(url);
  socket.open();
}).catch((e) => console.error('Failed to retrieve user media', e));

We should now start to receive events from the server.

Handle the WebRTC events

Now we need to take a break and learn a bit more about the WebRTC protocol.

You will receive many events, and your front end must handle each of them properly. Below is the list of all the events you will receive:

  • AUTHENTICATED: you don't have much to do here. The server simply lets you know that you are successfully logged in.
  • JOINED: a new peer has joined the room. When you log into a room, you will receive a JOINED event for each peer already connected. You need to start the connection flow (offer and ICE candidates) with that peer.
  • LEFT: a peer has left the room. You need to clean up all data related to this peer.
  • CANDIDATE: a peer is sending you one of its ICE candidates.
  • OFFER: you have received an offer to connect to a peer, and you need to send back your own description (an answer) to proceed with the connection.
  • ANSWER: a peer has accepted your offer, and you need to store its description.
  • ERROR: the server reports an error. It never happened to me :)
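Before writing the handlers, it helps to make dispatching on these events robust. Here is a minimal sketch of a decoder that calls one handler per event name. The helper name `dispatchSignal`, its `onInvalid` callback and the message fields (`event`, `data`, `from`, `from_metadata`) are assumptions for illustration. Unlike a single try/catch around everything, it only ignores malformed or unknown messages and lets errors thrown by a handler surface.

```javascript
// The events a client must handle, as listed above
const SIGNAL_EVENTS = new Set([
  'authenticated', 'joined', 'left', 'candidate', 'offer', 'answer', 'error',
]);

// Hypothetical helper: decode one raw message and call the matching handler.
// Returns the event name, or null if the message was ignored.
function dispatchSignal(raw, handlers, onInvalid = () => {}) {
  let message;
  try {
    message = JSON.parse(raw);
  } catch (e) {
    onInvalid(raw, e); // not JSON at all
    return null;
  }
  const event = message && message.event;
  if (!SIGNAL_EVENTS.has(event) || typeof handlers[event] !== 'function') {
    onInvalid(raw, new Error(`Unhandled event: ${event}`));
    return null;
  }
  // Errors thrown by the handler are not swallowed: they reach the caller
  handlers[event](message.data, message.from, message.from_metadata);
  return event;
}
```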

Now that we know what to do, let's write a new class that handles all these events:

const rtcConfig = {
  iceServers: [{urls: 'stun:stun.l.google.com:19302'}]
};

/** Send a signaling message to one peer through the WebSocket */
function sendTo(socket, peerId, event, data) {
  socket.send(JSON.stringify({to: [peerId], event: event, data: data}));
}

/** Set our own description (offer or answer), then forward it to the peer */
function getHandleDescription(socket, rtcConnections, peerId, event) {
  return (description) =>
    rtcConnections[peerId].setLocalDescription(description)
      .then(() => sendTo(socket, peerId, event, description));
}

/** Forward each ICE candidate we gather to the peer */
function getOnIceCandidate(socket, peerId) {
  return (event) => {
    // a null candidate only signals the end of gathering
    if (event.candidate) {
      sendTo(socket, peerId, 'candidate', event.candidate);
    }
  };
}

/** Display the remote stream when the peer's tracks arrive */
function getHandleTrack(domManager, peerId) {
  return (event) => domManager.setupPeer(peerId, event.streams[0]);
}

/** Create one RTCPeerConnection for a peer and attach our local tracks */
function startRTCConnection(socket, domManager, peerId, localStream, rtcConnections, config) {
  const connection = new RTCPeerConnection(config);
  localStream.getTracks().forEach((track) => connection.addTrack(track, localStream));
  connection.onicecandidate = getOnIceCandidate(socket, peerId);
  connection.ontrack = getHandleTrack(domManager, peerId);
  rtcConnections[peerId] = connection;
  return connection;
}

export default class Orchestrator {
  constructor(stream, domManager) {
    this.domManager = domManager;
    this.stream = stream;
    this.rtcConfig = rtcConfig;
    this.offerOptions = {
      offerToReceiveAudio: true,
      offerToReceiveVideo: true,
    };
    this.rtcConnections = {}; // peer ID -> RTCPeerConnection
  }

  onAuthenticated(socket, data, from) {
    this.logMessage('authenticated', data, from);
  }

  /** A peer joined: create a connection and send it an offer */
  onJoined(socket, data, from) {
    this.logMessage('joined', data, from);
    const peerId = data.peer_id;
    const connection = startRTCConnection(socket, this.domManager, peerId, this.stream, this.rtcConnections, this.rtcConfig);
    connection.createOffer(this.offerOptions)
      .then(getHandleDescription(socket, this.rtcConnections, peerId, 'offer'))
      .catch(console.error);
  }

  /** We received an offer: store it first, then create and send our answer */
  onOffer(socket, data, from) {
    const connection = startRTCConnection(socket, this.domManager, from, this.stream, this.rtcConnections, this.rtcConfig);
    connection.setRemoteDescription(data)
      .then(() => connection.createAnswer())
      .then(getHandleDescription(socket, this.rtcConnections, from, 'answer'))
      .catch(console.error);
  }

  /** Our offer was accepted: store the peer's description */
  onAnswer(socket, data, from) {
    this.rtcConnections[from].setRemoteDescription(data).catch(console.error);
  }

  onCandidate(socket, data, from) {
    const connection = this.rtcConnections[from];
    if (!connection) {
      console.warn('Candidate from unknown peer (ignored) > ', from);
      return;
    }
    // addIceCandidate returns a promise: errors are reported asynchronously
    connection.addIceCandidate(new RTCIceCandidate(data)).catch(console.error);
  }

  onLeft(socket, data, from) {
    this.logMessage('left', data, from);
    const connection = this.rtcConnections[data.peer_id];
    if (connection) {
      connection.close();
    }
    delete this.rtcConnections[data.peer_id];
    this.domManager.removePeer(data.peer_id);
  }

  onError(socket, data, from) {
    this.logMessage('error', data, from);
    console.warn('Server error > ', data);
  }

  logMessage(name, data, from) {
    console.log(`== New message == [${name}]`);
    console.log('DATA >', data);
    console.log('FROM >', from);
    console.log('==');
  }

}

The first thing to understand is that the native RTCPeerConnection object is the main API when working with WebRTC.

The idea is to create one RTCPeerConnection instance per remote peer, so we keep an object, rtcConnections, that stores every RTCPeerConnection with the peer ID as key.
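One consequence of this design is that every client in a room holds one connection per other peer, and uploads its own audio and video once per connection. The count grows quickly with the room size, as this small worked sketch shows (the helper name `meshConnections` is hypothetical).

```javascript
// Hypothetical helper: connections in a full mesh where every pair of peers is linked
function meshConnections(peers) {
  if (!Number.isInteger(peers) || peers < 0) {
    throw new RangeError('peers must be a non-negative integer');
  }
  return {
    perClient: Math.max(peers - 1, 0), // RTCPeerConnections each browser holds
    total: (peers * (peers - 1)) / 2,  // distinct peer-to-peer links in the room
  };
}
```

For 4 peers this gives 3 connections per client and 6 links in total. For 10 peers it gives 9 per client and 45 in total.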

When a peer joins (JOINED event), we create a new RTCPeerConnection and add our local tracks (audio and video) to it. As soon as its local description is set, the connection starts gathering ICE candidates, which are sent directly to the other peer through the signaling server.

In parallel with the ICE candidate exchange, the peers exchange their session descriptions. You will either receive an ANSWER (in case you are the one joining the room) or an OFFER (if a new peer is joining the room). A description is an SDP (Session Description Protocol) text listing the codecs, media tracks and security parameters of the connection, and you don't need to understand its content. However, you must set your own description with setLocalDescription and send it to the peer. You must also set the peer's description with setRemoteDescription on the matching RTCPeerConnection.
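The order of these calls matters, because each one moves the connection's `signalingState` through the offer/answer state machine. Here is a simplified model of it. It is a sketch covering only the transitions used in this tutorial, with no provisional answers, rollback, or renegotiation. The names `TRANSITIONS` and `applyDescription` are hypothetical. It shows why the answering side must finish setRemoteDescription(offer) before creating and setting its answer.

```javascript
// Simplified model of RTCPeerConnection.signalingState (offer/answer only)
const TRANSITIONS = {
  'stable': { 'local:offer': 'have-local-offer', 'remote:offer': 'have-remote-offer' },
  'have-local-offer': { 'remote:answer': 'stable' },
  'have-remote-offer': { 'local:answer': 'stable' },
};

// side is "local" (setLocalDescription) or "remote" (setRemoteDescription)
function applyDescription(state, side, type) {
  const next = (TRANSITIONS[state] || {})[`${side}:${type}`];
  if (!next) {
    // this simplified model rejects every transition it does not list
    throw new Error(`Cannot set ${side} ${type} while in state "${state}"`);
  }
  return next;
}
```

Usage, following one caller and one callee through a full exchange:

```javascript
let caller = 'stable';
let callee = 'stable';
caller = applyDescription(caller, 'local', 'offer');   // caller: onJoined
callee = applyDescription(callee, 'remote', 'offer');  // callee: onOffer
callee = applyDescription(callee, 'local', 'answer');  // callee: answer set
caller = applyDescription(caller, 'remote', 'answer'); // caller: onAnswer
```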

The final product

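We now need to bind our newly created orchestrator to the WebSocket handler. Create the orchestrator once the local stream is available and pass it to the socket, for example new Socket(url, new Orchestrator(stream, domManager)). Then edit the constructor and the open function of our Socket class: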

  constructor(url, orchestrator) {
    this.url = url;
    this.orchestrator = orchestrator; // an Orchestrator instance
  }

  open() {
    // ... (credentials cookie, new WebSocket(this.url) and ping interval as before)
    const messageEventListeners = {
      answer: (data, from, from_metadata) => this.orchestrator.onAnswer(this.socket, data, from),
      authenticated: (data, from, from_metadata) => this.orchestrator.onAuthenticated(this.socket, data, from),
      candidate: (data, from, from_metadata) => this.orchestrator.onCandidate(this.socket, data, from),
      joined: (data, from, from_metadata) => this.orchestrator.onJoined(this.socket, data, from, from_metadata),
      left: (data, from, from_metadata) => this.orchestrator.onLeft(this.socket, data, from),
      offer: (data, from, from_metadata) => this.orchestrator.onOffer(this.socket, data, from, from_metadata),
      error: (data, from, from_metadata) => this.orchestrator.onError(this.socket, data, from),
    };

    this.socket.onmessage = (raw) => {
      let message;
      try {
        message = JSON.parse(raw.data);
      } catch {
        return; // not JSON: ignore it
      }
      // only parsing errors are ignored; errors inside a listener still surface
      const listener = message && messageEventListeners[message.event];
      if (listener) {
        listener(message.data, message.from, message.from_metadata);
      } else {
        console.warn('Unknown event (ignored) > ', message);
      }
    };
  }

I hope you enjoyed this article. WebRTC is not easy to understand at first, but if you take the time to learn the flow and how the JavaScript API works, it is very easy to use!

If you have a problem and no one else can help, maybe you can hire the Kalvad-Team.