Percona XtraDB Cluster overview

Abstract

This document describes the concepts, the architecture, the strengths and the limitations of using Percona XtraDB Cluster.

Introductory concepts

Server-centric

Usually, in the application layer, if you need more performance you can simply add more resources, but what about the databases? A database cluster has to meet a few requirements:

1) all changes have to be distributed to all the servers in real time (the major point)

2) the database has to be available to all the applications

3) the applications have to be able to make changes

 

In the common server-centric topology, one server streams the data to another one, but this isn’t the best method to protect your data.

You can create a really complex topology, like in this image, but you will always have a single point of failure: the master node.

The replication and the communication between the master and the slave nodes are asynchronous, so the master does not care about the slave status (transactions, replication delay and so on). If the master crashes, transactions that it has committed might not have been transmitted to any slave, so a failover might cause data loss.

Data-centric

Another solution to manage a database cluster is the data-centric method.

Data-centric refers to an architecture where data is the primary and permanent asset, and applications come and go.

Data is synchronised between servers.

The data-centric method (virtually synchronous replication) guarantees that:

  • if a change happens on one node of the cluster, it happens "synchronously" on the other nodes
  • it is always HA (no data loss)
  • data replicas are always consistent
  • transactions can be executed on all nodes in parallel

How virtual synchrony works

On node 1 the user starts a transaction and runs some queries; when the user issues the COMMIT, the cluster sends the write-set (a network event notification) to the other nodes. The nodes answer "I got the write-set" and send back an acknowledgement. At this point, as you can see in the image, the certification process begins, and only when the certification succeeds does node 1 execute the physical COMMIT; when this process is finished, node 1 returns the "commit is done" feedback to the client.

Now pay attention to node 2: it has the write-set, it certifies it as well, and it starts to apply the transaction; so, as you can see, the nodes can be slightly out of sync for a short time.

As you can understand, the critical point here is the means of communication between the nodes: the network. You therefore need a very good quality, low-latency network.

Certification

In certification-based replication, each transaction has a global ordinal sequence number (the order in which transactions are executed), and the certification procedure determines whether or not the node can apply the write-set.

When the node receives the COMMIT from the client, it collects all the changes into a write-set and sends this set to all the nodes. The write-set undergoes a deterministic certification test; during this test, each node has to determine whether it can replay the write-set or whether something blocks it, such as another transaction in its queue that conflicts with that write-set. To do this, the certification test uses the primary keys (so it's very important to have them!).

This procedure is deterministic and all the nodes receive transactions in the same order, so all nodes reach the same decision about the outcome of the transaction. The node that started the transaction, as you can see in the previous picture, can then notify the client application whether or not it has committed the transaction.
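
To make the certification idea concrete, here is a minimal conceptual sketch in Python; it is not Galera's actual implementation, only an illustration of why every node reaches the same verdict: each node keeps an index of the primary keys touched by recently applied write-sets and rejects a write-set whose keys were modified by a transaction it had not yet seen when the write-set was created.

# Conceptual illustration only -- not Galera's real code.
# cert_index maps a primary key to the seqno of the last write-set that
# modified it; every node runs the same test on the same ordered input,
# so every node reaches the same decision.
def certify(write_set, cert_index):
    for key in write_set["keys"]:
        last_seqno = cert_index.get(key)
        if last_seqno is not None and last_seqno > write_set["depends_on"]:
            # another transaction changed this key after ours was created
            return False
    for key in write_set["keys"]:
        cert_index[key] = write_set["seqno"]
    return True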

During this procedure you can have two types of errors:

  • brute force abort: another node executes a conflicting transaction and the local active transaction needs to be killed;
  • local certification error: two nodes execute conflicting workloads and add them to the queue at the same time.

When you have only one server, you use the traditional locking model: if you try to write the same data in two transactions at the same time, transaction 2 must wait for transaction 1, and if transaction 1 doesn't commit, transaction 2 times out.

However, if you work in a cluster you get a different behaviour, called optimistic locking. As you can see in the bottom part of the picture, transaction 1 starts on server 1 and transaction 2 starts on server 2; on server 1 you can update a row and commit it while doing the same steps on server 2. At this point the certification process comes into play: it sees that there is another transaction coming from server 1, checks its queue and determines that the two transactions are in conflict. The result is that transaction 1 completes while transaction 2 does not.
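
From the application side, the losing transaction typically fails at COMMIT with a deadlock-type error instead of waiting on a lock, and it is up to the application to retry it. A minimal sketch (assuming two PXC nodes reachable as node1 and node2, the mysql-connector-python package, and a placeholder table t):

# Sketch of a certification conflict under optimistic locking.
# Host names, credentials and the table are placeholders.
import mysql.connector

c1 = mysql.connector.connect(host="node1", user="app", password="secret", database="test")
c2 = mysql.connector.connect(host="node2", user="app", password="secret", database="test")
c1.start_transaction()
c2.start_transaction()

c1.cursor().execute("UPDATE t SET val = val + 1 WHERE id = 1")
c2.cursor().execute("UPDATE t SET val = val + 1 WHERE id = 1")  # no waiting: different nodes

c1.commit()       # first committer wins the certification
try:
    c2.commit()   # second committer loses: its write-set fails certification
except mysql.connector.Error as err:
    print("certification conflict, retry the transaction:", err)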

Percona XtraDB Cluster

Percona XtraDB Cluster is an active/active high availability and high scalability open source solution for MySQL® clustering.

It integrates Percona Server and Percona XtraBackup with the Codership Galera library of MySQL high availability solutions in a single package that enables you to create a cost-effective MySQL high availability cluster.

Main features

Synchronous replication: Data is written to all nodes simultaneously, or not written at all if it fails even on a single node.

Multi-master replication: Any node can trigger a data update.

True parallel replication: Multiple threads on the slave perform replication at the row level.

Automatic node provisioning: You simply add a node and it automatically syncs.

Data consistency: No more unsynchronized nodes.

PXC Strict Mode: Avoids the use of experimental and unsupported features.

Configuration script for ProxySQL: Percona provides a ProxySQL package with the proxysql-admin tool that automatically configures Percona XtraDB Cluster nodes.

Automatic configuration of SSL encryption:  Percona XtraDB Cluster includes the pxc-encrypt-cluster-traffic variable that enables automatic configuration of SSL encryption.

Optimized Performance: Percona XtraDB Cluster performance is optimized to scale with a growing production workload.

Data compatibility: You can use data created by any MySQL variant.

Application compatibility: No or minimal application changes are required.

Percona XtraDB Cluster relies on:

  • the Galera replication plugin, which provides the write-set replication service functionality;
  • the group communication plugin (e.g. gcomm), which handles the communication between the nodes and maintains the order of transaction execution;
  • the wsrep API, which is the interface between the Galera replication plugin and the database server.

Percona XtraDB Cluster is based on Percona Server running with the XtraDB storage engine (which is a Percona version of InnoDB storage engine). It uses the Galera library, which is an implementation of the write-set replication API.

The group communication layer manages the transactions and their ordering; when node 1 commits a transaction, gcomm sends this information (the write-set) to all the nodes.

If there are concurrent transactions on several nodes, gcomm defines an order that guarantees that all messages are read in the same order on all the nodes.

The Flow control

This is a great replication feedback mechanism offered by Galera.

This feedback allows any node in the cluster to instruct the group when it needs replication to pause and when it is ready for replication to continue. This prevents any node in the synchronous replication group from getting too far behind the others in applying replication.
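
You can observe this mechanism through the wsrep status counters that Galera exposes; a minimal monitoring sketch in Python (mysql-connector-python, placeholder host and credentials):

# Read the Galera flow-control counters on one node.
import mysql.connector

conn = mysql.connector.connect(host="node1", user="monitor", password="secret")
cur = conn.cursor()
cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_flow_control%'")
for name, value in cur:
    # wsrep_flow_control_paused: fraction of time replication was paused
    # wsrep_flow_control_sent:   pause messages sent by this node
    print(name, value)
cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue'")
print(cur.fetchone())  # a queue that keeps growing points to a slow node
conn.close()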

Limitations

  • only InnoDB tables are supported
  • it uses optimistic locking instead of traditional locking, so if you are writing on multiple nodes at the same time you can have conflicts
  • the weakest node limits the write performance (a weak node that cannot replay transactions as fast as the others sends a lot of flow control messages)
  • all tables should have a primary key
  • large transactions are not recommended
  • LOCK TABLES and GET_LOCK() are not supported (locking a table only locks it locally, on your own node)

State Transfer

There are two kinds of state transfer:

  • the full SST: used for new nodes or for nodes that have been disconnected for a long time
  • the incremental IST: used for nodes that have been disconnected for a short time

The SST (State Snapshot Transfer) can be performed with:

  • mysqldump
  • rsync
  • xtrabackup (Percona XtraBackup)

The first two block the donor during the copy, while with xtrabackup the donor is always available (so the latter is the recommended method).

Load Balancing

In order to have load balancing between the nodes, you can use a proxy like HAProxy or ProxySQL. We use and recommend the latter.

ProxySQL (a Layer 7 proxy) has an advanced multi-core architecture. It’s built from the ground up to support hundreds of thousands of concurrent connections, multiplexed to potentially hundreds of backend servers.

The main features are:

  • Query caching
  • Query Routing
  • Advanced configuration with 0 downtime
  • Application layer proxy
  • Advanced topology support

Quorum

In a cluster solution, every node has a weight, a vote; these votes are very important when problems occur, to determine whether the cluster is still consistent and which nodes are broken.

In order to have split-brain protection you need at least three nodes: if one of the nodes has a problem, the other two nodes together have two votes (more than half of three), while the broken node has only one vote. The quorum system puts the broken node in a non-primary state, so that node does not accept any read or write.

It is preferable to have an odd number of nodes, in order to prevent a tied vote in which all the nodes go into a non-primary state.

Conclusions

Not recommended for:

  • write-scalable solutions
  • large transactions
  • working with foreign keys
  • sharding

Highly recommended for:

  • easy scaling of reads
  • data consistency
  • easy failover
  • no data loss if a server crashes
  • easy addition/removal of a node

Linkography

Thanks to the Percona community and to Percona's tutorial site

Kamailio KEMI Framework – route logic migration to python

Abstract

In this article I will describe the usage of the KEMI framework on our Kamailio nodes. We’ve migrated all our http async requests to our API from the kamailio configuration scripting language to python. I’ve already described our dynamic dispatchers in Kamailio with jsonrpc and graphql with external Orchestrator and API. Please take a look at that article before proceeding.

Architecture overview

Our highly scalable cloud SIP infrastructure uses docker and kubernetes, with microservices in kamailio, asterisk, rtpengine and cgrates. This kind of infrastructure can scale on premise or in the cloud, with geographical region request routing. It also allows RTP nodes to be installed on premise, near the client, which is surely a boost for quality of experience and simplicity of deployment.

We have two distinct layers of kamailio nodes: the proxy and the router layer. Only the proxy layer (besides the media layer) has a public IP and is connected to the outside world on the Internet. Under the router layer we have a TPS (Transcoding and Playback Service) Asterisk layer and, obviously, an external RTP nodes layer.

Dispatcher list reloading process

In the following schema you can observe the flow of the dynamic dispatcher reloading process, which starts with an HTTP RPC call from our orchestrator towards a kamailio node and triggers a GraphQL query on the API to get the list of dispatchers.

This kind of approach is just a step towards a stateless kamailio instance that gets configured on runtime by the needs of the infrastructure as a whole. Creating multiple layers and handling the mutability of the whole by an orchestrator gives the ability to have a highly scalable and cloud oriented architecture.

Kamailio KEMI Framework

Kamailio uses a scripting language for its kamailio.cfg file which was developed from scratch; it's a C-like language whose initial design goes back to 2001.

The native scripting language meets its limitations when dealing with external services like our API.

The solution to meet our demands was found in the Kamailio Embedded Interface (KEMI) framework for executing SIP routing scripts in other programming languages.

This framework was first added in Kamailio v5.0.0. It enables multiple languages to be used; the interpreters for these languages are embedded in Kamailio and initialized at startup, so that runtime execution is as fast as possible.

The kamailio.cfg still keeps the parts with:

  • Global parameters
  • Loading modules
  • Modules settings

These parts are evaluated once at startup and the majority of the parameters can be changed at runtime via RPC commands.

The languages supported by KEMI scripting are JavaScript, Lua, Python and Squirrel.

For more information about the KEMI Framework take a look at this article:

https://kamailio.org/docs/tutorials/devel/kamailio-kemi-framework/

Configuring Kamailio

It’s very simple to use python with KEMI: we only need two lines in our kamailio.cfg:

loadmodule "app_python.so"

modparam("app_python", "load", "/etc/kamailio/kamailio.py")

The modparam sets the initial python file, where a kamailio class defines the functions that will be called from kamailio via python_exec. We will get to that later.

Writing kamailio.py

So our main python file imports all the libraries we need and returns the kamailio class with all its functions used with python_exec in kamailio.cfg.

In the Python script we have to declare the global mod_init method, which instantiates an object of a class that implements the callback methods (functions) to be executed by Kamailio.

import KSR as KSR

import json
import re

# internal helper modules (apiapp is shown below)
import gnr
import apiapp
import cgrates


def mod_init():
    return kamailio()


class kamailio:

    def child_init(self, rank):
        return 0

    def updateDispatchers(self, msg):
        try:
            token = getToken()

            dl = apiapp.dispatcherList(token)

            # extract the kamailioConf value from the GraphQL response
            dispatcherList = getJsonPath(
                "data.kamailio.dispatcher.list.kamailioConf", dl)

            if dispatcherList:
                with open("/tmp/dispatcher.list", "w") as dlFile:
                    dlFile.write(dispatcherList)
        except:
            kamExit("updateDispatchers failed!")
        return 1

# -- END class kamailio -- #

As you can see, the updateDispatchers function will be called inside our kamailio.cfg, but we will get to that later. In the same file we have also defined the getToken function, which reads the token from the shared table; if there is none, it fetches a new one from the API and stores it in the shared table for future use.
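
One helper that is not shown in this article is getJsonPath, used by updateDispatchers above; a minimal sketch of what such a helper might look like (a hypothetical implementation, not our actual code) is the following:

# Hypothetical sketch of a getJsonPath helper: walk a dotted path
# ("data.kamailio...") through the nested JSON response and return None
# if any key along the way is missing.
def getJsonPath(path, data):
    node = data
    for key in path.split("."):
        if isinstance(node, dict) and key in node:
            node = node[key]
        else:
            return None
    return node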

The shared table entry for the token has an autoexpire of 50 minutes (3000 seconds), while the JSON Web Token expires after one hour.

modparam("htable", "htable", "api=>size=2;autoexpire=3000;")

So let’s take a look at the getToken function:

def getToken():
    """Checks the token availability in the shared table.

    If there is no token, gets a new one from the API and saves it to the sht.
    """
    token = KSR.pv.get("$sht(api=>token)")

    if token is None:
        KSR.xlog.xlog("L_INFO", "Refreshing api token\n")
        try:
            newToken = apiapp.getApiToken()
            if newToken["success"]:
                KSR.pv.sets("$sht(api=>token)", newToken["token"])
                token = newToken["token"]
            else:
                KSR.xlog.xerr("{}".format(json.dumps(newToken)))
        except:
            kamExit("getToken failed!")

    return token

Quite simple to understand. Now let's see the apiapp.py functions imported by kamailio.py:

import os
import json
import httplib  # Python 2 standard library HTTP client


apiCredentials = ""
with open("/tmp/apiCredentials", "r") as cred:
    apiCredentials = cred.read().replace("\n", "")


dispatcherListQuery = """
{ kamailio { dispatcher { list { kamailioConf } } } }
"""


def getApiToken():
    hdr = {"content-type": "application/x-www-form-urlencoded"}
    conn = httplib.HTTPConnection("api-app-service:9016")
    conn.request("POST", "/auth", apiCredentials, hdr)

    response = conn.getresponse()
    data = response.read()

    return json.loads(data)


def graphqlQuery(apiToken, query):
    payload = {}
    payload["query"] = query

    hdr = {
        "content-type": "application/json",
        "Authorization": "Bearer {token}".format(token=apiToken)
    }

    conn = httplib.HTTPConnection("api-app-service:9016")
    conn.request("POST", "/graphql", json.dumps(payload), hdr)

    response = conn.getresponse()
    data = response.read()

    return json.loads(data)


def dispatcherList(token):
    return graphqlQuery(token, dispatcherListQuery)

With the dispatcherList function we query the GraphQL endpoint from the updateDispatchers function in kamailio.py seen above.
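
As a quick sanity check, the same module can also be exercised outside Kamailio, for example from a shell on the node; a minimal usage sketch (it assumes the /tmp/apiCredentials file is in place, the api-app-service host is reachable, and reuses the success and token keys seen in getToken above):

# Standalone usage sketch of apiapp, handy for debugging.
import apiapp

auth = apiapp.getApiToken()
if auth.get("success"):
    resp = apiapp.dispatcherList(auth["token"])
    print(resp["data"]["kamailio"]["dispatcher"]["list"]["kamailioConf"])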

Using the python function in the xhttp route

At this point we’ve rewritten all the kamailio route logic seen in the previous article (DISPATCHER_LIST and DISPATCHER_SET) into simpler python code, with no need for the HTTP_ASYNC_CLIENT module.

Our xhttp request route will be the following:

event_route[xhttp:request] {

  if (!(dst_port == 80)) {

    xlog("L_NOTICE", "[XHTTP:REQUEST] $si FORBIDDEN! ***\n");

    exit;

  }

  if ($hu =~ "^/rpc") {

    xlog("L_NOTICE", "[XHTTP:REQUEST] $si ACCEPTED ***\n");

    jansson_get("method", "$rb", "$var(rpcMethod)");

    xlog("L_NOTICE", "[XHTTP:REQUEST] RPC METHOD: $var(rpcMethod) ***\n");

    if($var(rpcMethod) == "dispatcher.reload") {

      xlog("L_NOTICE", "Reloading dispatcher list\n");

      python_exec("updateDispatchers");

    }

  }

  jsonrpc_dispatch();

  exit;

}

The orchestrator http request towards our kamailio node is the following:

url: `http://${pod.ipAddress}/rpc`,

body: JSON.stringify({'jsonrpc': '2.0', 

                      'method': 'dispatcher.reload', 'id': '1'})

As you can see, jansson_get extracts the RPC method from the request body; when it is dispatcher.reload, python_exec("updateDispatchers") fetches the new dispatcher list and writes it to /tmp/dispatcher.list, and the subsequent jsonrpc_dispatch() then triggers the actual dispatcher.reload RPC. This is the same file used by the dispatcher module:

modparam("dispatcher", "list_file", "/tmp/dispatcher.list")

Conclusions

This example shows the possibility to use python (or any other KEMI-supported language) for functions that are easier to read and develop by programmers who know the basics of kamailio and, obviously, python.

For external services using modern technology solutions it’s surely simpler to develop in a high-level programming language than in the native kamailio scripting language. We’ve managed to use both, but find it easier to just use python for some parts of the routing logic.

 

Dynamic dispatchers in Kamailio with jsonrpc and graphql with external Orchestrator and API

Abstract

This document describes the usage of kamailio in a dynamic, multi layer and containerized environment with an external orchestrator that is able to force a custom dynamic list of dispatchers onto a running kamailio node.

Architecture overview

Our highly scalable cloud SIP infrastructure uses docker and kubernetes, with microservices in kamailio, asterisk, rtpengine and cgrates. This kind of infrastructure can scale on premise or in the cloud, with geographical region request routing. It also allows RTP nodes to be installed on premise, near the client, which is surely a boost for quality of experience and simplicity of deployment.

We have two distinct layers of kamailio nodes: the proxy and the router layer. Only the proxy layer (besides the media layer) has a public IP and is connected to the outside world on the Internet. Under the router layer we have a TPS (Transcoding and Playback Service) Asterisk layer and, obviously, an external RTP nodes layer.

The API

We’re using a JavaScript API running on Node.JS and using GraphQL as a query language. This API is tracking the creation of every node in our infrastructure and is aware of the infrastructure architecture, being able to serve only relevant data to a node. In our multi layer infrastructure the proxy layer should be able to get a list of the routers under itself. The router layer should be able to get a list of the proxies above itself and of the TPS nodes under itself.

A sample router GraphQL API request in our case will be structured like this:

{ 

  kamailio { 

    dispatcher {

      list { 

        kamailioConf

      } 

    }

  }

}

The response for the kamailioConf key will hold a value like this:

2 sip:172.22.2.6:5060

1 sip:172.22.2.95:5060

1 sip:172.22.2.94:5060

As you can see this router has a proxy on top with setid = 2 and two TPS nodes beneath with setid = 1.
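
Just to make the format explicit (kamailio's dispatcher module parses this file natively, so the following parser is only an illustration, not part of our stack):

# Illustration only: group the "<setid> sip:<ip>:<port>" lines by setid.
def parse_dispatcher_list(conf):
    sets = {}
    for line in conf.strip().splitlines():
        setid, uri = line.split(None, 1)
        sets.setdefault(int(setid), []).append(uri)
    return sets

conf = "2 sip:172.22.2.6:5060\n1 sip:172.22.2.95:5060\n1 sip:172.22.2.94:5060\n"
print(parse_dispatcher_list(conf))
# {2: ['sip:172.22.2.6:5060'], 1: ['sip:172.22.2.95:5060', 'sip:172.22.2.94:5060']}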

Kamailio configuration

In our pursuit of a stateless kamailio instance we define a file in our container that contains the dispatcher list and can then be updated. So our configuration looks like this:

loadmodule "dispatcher.so"

modparam("dispatcher", "list_file", "/tmp/dispatcher.list")

We’re then using the xhttp module to be able to trigger the dispatcher list reload route from the orchestrator.

Some kamailio pods in our infrastructure have two network interfaces, one local and one public.
For security reasons we listen on port 80 only on the local network interface, so filtering the requests by that port gives us the guarantee that the orchestrator is making the call.

event_route[xhttp:request] {

  # Check if the call is from the local network

  if (!(dst_port == 80)) {

    xlog("L_NOTICE", "[XHTTP:REQUEST] $si FORBIDDEN! ***\n");

    exit;

  }




  if ($hu =~ "^/rpc") {

    xlog("L_NOTICE", "[XHTTP:REQUEST] $si ACCEPTED ***\n");

    jansson_get("method", "$rb", "$var(rpcMethod)");

    xlog("L_NOTICE", "[XHTTP:REQUEST] RPC METHOD: $var(rpcMethod) ***\n");




    if($var(rpcMethod) == "dispatcher.reload") {

      xlog("L_NOTICE", "Reloading dispatcher list\n");

      route(DISPATCHER_LIST);

    }

    jsonrpc_dispatch();

    exit;

  }

}

Our DISPATCHER_LIST route will then handle the API query for the new list of dispatchers and reload the list via RPC.

route[DISPATCHER_LIST] {

  xlog("L_INFO", "route[DISPATCHER_LIST]: 

Fetching dispacther_list! \n");

  if($sht(api=>token)==$null){

    route(GET_API_TOKEN);

  }

  $http_req(all) = $null;

  $http_req(suspend) = 1;

  $http_req(timeout) = 500;

  $http_req(method) = "POST";

  $http_req(hdr) = "Content-Type: application/json";

  $http_req(hdr) = "Authorization: Bearer " + $sht(api=>token);

  $var(graphql_query) = 

"{\"query\": \"{kamailio {dispatcher {list {kamailioConf }}}}\"}";

  $http_req(body) = $var(graphql_query);

  http_async_query(API_QUERY_URL, "DISPATCHER_SET");

}

The response of the async http query is then handled by DISPATCHER_SET route:

route[DISPATCHER_SET] {

  if ($http_ok && $http_rs == 200) {

    xlog("L_INFO", "route[DISPATCHER_SET]: response $http_rb)\n");

    jansson_get("data.kamailio.dispatcher.list.kamailioConf", $http_rb, "$var(conf)");

    if($var(conf)!="0") {

      exec_msg("printf \"$var(conf)\" > /tmp/dispatcher.list");

      jsonrpc_exec('{"jsonrpc": "2.0", "method":

"dispatcher.reload", "id": "1"}');

      xlog("L_INFO", "route[DISPATCHER_SET]: 

Dispatchers reloaded! \n");

    }

  }

}

The API Call

So at this point we have Kamailio ready to handle the dispatcher reload RPC call, which first refreshes the dispatcher list fetched from the API.

Now let’s take a look at how we managed to send the request via our API written in JS.

We have a function called makeRPCSignal which handles different RPC calls (rpcMethod) besides dispatcher.reload. Here’s a simplified example (resolve and reject presumably come from an enclosing Promise, omitted here for brevity):

function makeRPCSignal (rpcMethod) {

  const options = {

    url: `http://${pod.ipAddress}/rpc`,

    body: JSON.stringify({'jsonrpc': '2.0', 'method': rpcMethod,

                          'id': '1'}),

  }

  request.post(options, (error, res, body) => {

    if (!error && res.statusCode === 200) {

      log.debug(`${JSON.stringify(res)}\n`, moduleInfo)

      resolve(new RPCSuccess(pod, res.statusCode, rpcMethod))

    } else {

      log.debug(`${JSON.stringify(error)}\n`, moduleInfo)

      if (res) {

        if (res.statusCode) {

          log.debug(`rpcMethod: Status Code: ${res.statusCode}\n`,

                     moduleInfo)

          reject(new RPCException(pod, res.statusCode, '', rpcMethod))

        } else {

           reject(new RPCException(pod, -1, '', rpcMethod))

        }

      } else {

        reject(new RPCException(pod, -1, '', rpcMethod))

      }

    }

  }).on('error', (e) => {

    reject(new RPCException(pod, -1, e.message, rpcMethod))

  })

}

As you can see, there is also a simple RPCException helper that formats the logs with additional information, ready to be sent to Elasticsearch.

Conclusions

So at this point we have port 80 exposed for the orchestrator to make an RPC call via HTTP that triggers a kamailio route which queries the API for the current dispatcher list. If there is any error in the process, we handle it on the orchestrator side (retry, deletion, creation of new instances, etc.).

This kind of approach is just a step towards a stateless kamailio instance that gets configured on runtime by the needs of the infrastructure as a whole. Creating multiple layers and handling the mutability of the whole by an orchestrator gives the ability to have a highly scalable and cloud oriented architecture.

Kamailio route testing

Abstract

This document describes the testing of a single route in kamailio using specific headers sent by sipp and custom testing routes in kamailio. We will cover an example route that handles multiple conditions and replies to our call with a positive (200 OK) or negative (500 Server Internal Error) response.

Architecture overview

We are developing a highly scalable cloud SIP infrastructure with docker and kubernetes, with microservices in kamailio, asterisk, rtpengine and cgrates. Our infrastructure can scale on premise or in the cloud, with geographical region request routing. It also allows RTP nodes to be installed on premise, near the client, which is surely a boost for quality of experience and simplicity of deployment.

We have two distinct layers of kamailio nodes: the proxy and the router layer. Only the proxy layer (besides the media layer) has a public IP and is connected to the outside world on the Internet.

The different layers are interconnected, so it’s impossible to test a single layer using sipp (http://sipp.sourceforge.net/), and even harder to test a single route in kamailio.

CD/CI

In our development pipeline we’re using Jenkins to run tests on every commit/merge in our testing branch with different tools. The behaviour tests on the whole infrastructure are run using sipp with different scenarios.

This is an example bash script for running a test:

#!/usr/bin/env bash
set -o pipefail
set -o nounset
NAME="ut_kama_proxy-should_dispatcher_reload"
DATE_FILE=$(date '+%Y-%m-%d_%H-%M')
NODE_OWNER=$(echo "${POD_NODE}" | cut -d '.' -f 2)
NODE_DOMAIN=$(echo "${POD_NODE}" | cut -d '.' -f3-)
TARGET="proxy.${NODE_OWNER}.${NODE_DOMAIN}"
# ---------------------------------------------------------------- #
echo "${NAME} :: starting script"
# ---------------------------------------------------------------- #
# ---------------------------------------------------------------- #
echo "${NAME} :: creating csv file with credentials"
# ---------------------------------------------------------------- #
mkdir -p /tmp/inf_files
cat << EOF > /tmp/inf_files/"${NAME}".csv
SEQUENTIAL
test;$TARGET;[authentication username=test password=xxx];0039040123123;
EOF
SIPP=/usr/local/bin/sipp
SCENARIOS=/sipp/scenarios/"${NAME}".xml
CREDENTIALS=/tmp/inf_files/"${NAME}".csv
"${SIPP}" "${TARGET}" -sf "${SCENARIOS}" -l 1 -m 1 -r 1 -max_non_invite_retrans 3 -rp 1000 -inf "${CREDENTIALS}"
EXIT_CODE="${?}"
# ---------------------------------------------------------------- #
echo "${NAME} :: remove file and folder with credentials"
# ---------------------------------------------------------------- #
rm -rf /tmp/inf_files
exit "${EXIT_CODE}";

As you can see, we’re using .xml scenarios; for the example above, the scenario file is very simple and sends a specific header that identifies the test to be run.

<?xml version="1.0" encoding="iso-8859-2" ?>

<!DOCTYPE scenario SYSTEM "sipp.dtd">


<scenario name="[router] [unit test] SHOULD_DISPATCHER_RELOAD - OK">


 <send retrans="10">

   <![CDATA[


     INVITE sip:[field3]@[remote_ip]:[remote_port] SIP/2.0

     Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]

     From: sipp <sip:[field0]@[field1]>;tag=[call_number]

     To: <sip:[field3]@[field1]:[remote_port]>

     Call-ID: [call_id]

     CSeq: [cseq] INVITE

     Contact: sip:[field0]@[local_ip]:[local_port]

     Max-Forwards: 70

     Content-Type: application/sdp

     Content-Length: [len]

     X-evosip-Test: TEST_SHOULD_DISPATCHER_RELOAD

     v=0

     o=user1 53655765 2353687637 IN IP[local_ip_type] [local_ip]

     s=-

     c=IN IP[media_ip_type] [media_ip]

     t=0 0

     m=audio [media_port] RTP/AVP 8

     a=rtpmap:8 PCMA/8000


   ]]>

 </send>


 <recv response="200" rrs="true" optional="false"></recv>


 <!-- definition of the response time repartition table (unit is ms)   -->

 <ResponseTimeRepartition value="10, 20, 30, 40, 50, 100, 150, 200"/>

</scenario>

So we can automate calls with different types of requests, headers and options and expect a certain response for a test to be defined as passed or failed.
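
If it is more convenient to drive the same check from a Python step in the pipeline, a hedged sketch could look like this (target and paths are placeholders; it relies only on sipp exiting with code 0 when all calls succeed):

# Run a sipp scenario and turn its exit code into a pass/fail result.
import subprocess

def run_scenario(target, scenario_xml, credentials_csv):
    cmd = ["sipp", target, "-sf", scenario_xml,
           "-l", "1", "-m", "1", "-r", "1", "-rp", "1000",
           "-inf", credentials_csv]
    return subprocess.call(cmd) == 0  # 0 = all calls succeeded

if __name__ == "__main__":
    ok = run_scenario("proxy.example.test",
                      "/sipp/scenarios/ut_kama_proxy-should_dispatcher_reload.xml",
                      "/tmp/inf_files/ut_kama_proxy-should_dispatcher_reload.csv")
    print("PASSED" if ok else "FAILED")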

Kamailio configuration structure

Our kamailio nodes have the standard kamailio.cfg file, which includes the testing routes (described later) only if the global variable TESTING is defined. We define that variable at container creation only during the testing process in Jenkins.

#!ifdef TESTING

include_file "kamailio-test.cfg"

#!endif

The kamailio-test.cfg file contains the routes, which are triggered by this if condition at the top of the request route:

request_route {

#!ifdef TESTING

 if($hdr(X-evosip-Test) =~ "^TEST_") {

   route($(hdr(X-evosip-Test){s.rm,"})); # "

   exit;

 }

#!endif

...

In the example above the route name is TEST_SHOULD_DISPATCHER_RELOAD, so that route is triggered when the request contains an X-evosip-Test header with that value.

Kamailio route to be tested

We use the xhttp module to accept RPC calls that reload the dispatchers, which are requested from our central API that orchestrates the SIP infrastructure.

This route uses a shared table named dispatcher, whose list entry is set to 1 once the dispatchers have been correctly updated after the xhttp request. Otherwise it checks whether the dispatchers have been reloading for longer than 60 seconds before returning true.

# ---------------------------------------------------------------------------------

# route SHOULD_DISPATCHER_RELOAD

# returns true / false depending on whether the dispatchers are currently being reloaded

# ---------------------------------------------------------------------------------

route[SHOULD_DISPATCHER_RELOAD] {

  if($sht(dispatcher=>list) == 1) {

    return 0;

  } else {

    $var(todate) = $(sht(dispatcher=>list){s.int}) + 60;

    if ($var(todate) < $TV(sn)) {

      return 1;

    } else {

      return 0;

    }

  }

}#end route[SHOULD_DISPATCHER_RELOAD]

Dispatcher list reloading via API

The xhttp:request route just calls the DISPATCHER_LIST route that handles the API calls and updates the dispatcher list.

event_route[xhttp:request] {

  …

  if ($hu =~ "^/rpc") {

    $var(command) = $(hu{s.select,2,/});

    if($var(command) == "reload.dispatchers") {

      route(DISPATCHER_LIST);

      xhttp_reply("200", "OK", "text/html", "Dispatchers Set for RELOAD");

      exit;

    }

...

So the first thing the route DISPATCHER_LIST does is set the shared table entry to the current timestamp, which stays there until the dispatcher list is correctly updated. This prevents the route from being called again and avoids concurrent calls towards the API. When the dispatcher list is correctly updated, the shared table entry dispatcher=>list is set to 1.

route[DISPATCHER_LIST] {

  $sht(dispatcher=>list) = $TV(sn);

  …

  jsonrpc_exec('{"jsonrpc": "2.0", "method": "dispatcher.reload", "id": "1"}');

  xlog("L_INFO", "route[DISPATCHER_SET]: Dispatchers reloaded! \n");

  $sht(dispatcher=>list) = 1;

  …

Kamailio testing routes

Our kamailio testing routes are auxiliary routes defined to call specific functions in our kamailio.cfg, functions that return a specific value or a boolean. We tend to write simple routes for specific functions, which are then called inside the routing logic.

The example route from the previous section is this one:

route[TEST_SHOULD_DISPATCHER_RELOAD] {

  $var(TestsPassed) = 0;

  $var(TestsNum) = 3;

  # Dispatcher list has been correctly reloaded

  # and should not be reloaded again

  $sht(dispatcher=>list) = 1;

  if(!route(SHOULD_DISPATCHER_RELOAD)) {

    $var(TestsPassed) = $var(TestsPassed) + 1;

  } 

  # The dispatcher list route has been just triggered

  # and should not be called again for 60 seconds

  $sht(dispatcher=>list) = $TV(sn);

  if(!route(SHOULD_DISPATCHER_RELOAD)) {

    $var(TestsPassed) = $var(TestsPassed) + 1;

  }

  # The dispatcher list route has been called

  # more than 60 seconds ago and should reload

  $sht(dispatcher=>list) = $(sht(dispatcher=>list){s.int}) - 65;

  if(route(SHOULD_DISPATCHER_RELOAD)) {

    $var(TestsPassed) = $var(TestsPassed) + 1;

  }

  # Tests concluded count test number and respond via sl_send_reply

  if($var(TestsPassed) >= $var(TestsNum)) {

    sl_send_reply("200", "DISPATCHER RELOAD TRIGGER WORKING");

  } else {

    sl_send_reply("500", "DISPATCHER RELOAD TRIGGER NOT WORKING");

  }

}

This test route consists of three tests that check that the route SHOULD_DISPATCHER_RELOAD works correctly. If all three cases return the expected boolean value, the route returns a 200 reply to sipp and the test passes; otherwise it returns a 500 reply and the test fails.

Conclusions

Using this methodology we’re able to test specific kamailio routes in our infrastructure, checking their correct functionality in a production-like environment.