Performing the upgrade

Typical upgrade steps

Many of the upgrade steps can be performed without impact on your users. The deployment or upgrade of a component is split into two actions:

  1. Configuration changes, such as added or changed configuration parameters, including the new component’s version

  2. Deployment of the upgraded component by (re)starting it

Most of the time, the configuration changes can be made in advance, which limits downtime for your end users.

In the following upgrade steps, platform-config refers to the location where your platform configuration is stored for that particular environment.
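Because the configuration changes can be prepared in advance, you can check the whole platform-config directory for leftover version overrides before the maintenance window. The following is a minimal Bash sketch. It assumes that overrides follow the `<NAME>_VERSION=` pattern used in the steps below. The function name `find_version_overrides` is hypothetical and not part of `axual.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of axual.sh): list active, i.e. uncommented,
# *_VERSION overrides in the .sh files under a platform-config directory.
# Assumes overrides look like BROKER_VERSION='...' as in the upgrade steps.
# Prints file:line:content for each match; returns 1 if none were found.
find_version_overrides() {
  local config_dir=$1
  grep -rnE --include='*.sh' \
    '^[[:space:]]*(export[[:space:]]+)?[A-Z_]+_VERSION=' "$config_dir"
}
```

For example, `find_version_overrides /path/to/platform-config` prints every active override. Each printed line is a setting to remove or comment out, unless you explicitly want a different version.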

Verifying every step of the way

When performing the upgrade, we strongly advise you to verify that everything works after every step. There is no point in continuing the upgrade if one of the services fails to start halfway through. The following tips apply to every service when performing (re)starts after an upgrade:

  • Check whether the new Docker image version has been pulled successfully

  • Check whether the container actually starts, stays up for more than 30 seconds, and is not in "Restarting" state

Some verification steps depend on the service being upgraded; those are described in the relevant upgrade step.
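The two generic checks can also be scripted. The sketch below is for illustration only. It assumes Bash, the Docker CLI and GNU `date` on the host. The function names are hypothetical, and the 30-second threshold comes from the tip above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the generic post-restart checks.
# Assumes the Docker CLI and GNU date are available on the host.

# Succeeds if the image (e.g. axual/broker:6.0.0) is present locally.
image_present() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Pure decision logic: state must be "running" and uptime must exceed
# min_uptime seconds (default 30). Times are epoch seconds.
container_ok() {
  local status=$1 started=$2 now=$3 min_uptime=${4:-30}
  [ "$status" = "running" ] && [ $((now - started)) -gt "$min_uptime" ]
}

# Reads the state from docker inspect and applies container_ok.
# A container in a restart loop reports "restarting" or a short uptime.
check_container() {
  local name=$1 status started_at started
  status=$(docker inspect --format '{{.State.Status}}' "$name") || return 1
  started_at=$(docker inspect --format '{{.State.StartedAt}}' "$name") || return 1
  started=$(date -d "$started_at" +%s) || return 1
  container_ok "$status" "$started" "$(date +%s)"
}
```

Usage: `image_present axual/broker:6.0.0 && check_container <container-name> && echo OK`. The container name is a placeholder for whatever name the restarted service uses on your host.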

Performing the upgrade

Step 1 - Upgrade broker to 6.0.0

Make sure the default version in the configuration is not overridden. If it is, please remove or comment out the setting as shown below.

platform-config/clusters/{cluster-name}/broker.sh

# Docker image version, only override if you explicitly want to use a different version
#BROKER_VERSION='[OLDER VERSION]'
This upgrade must be done broker by broker, in a rolling fashion.

Restart the broker; enter the following command:

axual.sh -v restart cluster broker
Wait for the broker to restart and for the under-replicated partition count on the Broker stats per node dashboard to return to zero before restarting the next broker.

The following output is expected (example for a setup with three clusters):

---
Axual Platform 2021.1
Loading cluster definition: [Cluster A]
PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster A]_
Loading cluster definition: [Cluster B]
PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster B]_
Loading cluster definition: [Cluster C]
PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster C]_
Analyzing cluster [Cluster B]-inter-broker-listener
Analyzing cluster [Cluster B]-mgmt-api-db
Analyzing cluster [Cluster B]
Analyzing cluster [Cluster A]-inter-broker-listener
Analyzing cluster [Cluster A]
Analyzing cluster [Cluster C]-inter-broker-listener
Analyzing cluster [Cluster C]
Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
Loading tenant definition: axual
Loading tenant definition: [Tenant]
Loading instance definition: [Tenant]-[Instance A]
Loading instance definition: [Tenant]-[Instance B]
Stopping cluster services for node [Worker Hostname] in cluster [Cluster C]
...
Stopped
Loading cluster definition: [Cluster A]
PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster A]_
Loading cluster definition: [Cluster B]
PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster B]_
Loading cluster definition: [Cluster C]
PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster C]_
Analyzing cluster [Cluster B]-inter-broker-listener
Analyzing cluster [Cluster B]-mgmt-api-db
Analyzing cluster [Cluster B]
Analyzing cluster [Cluster A]-inter-broker-listener
Analyzing cluster [Cluster A]
Analyzing cluster [Cluster C]-inter-broker-listener
Analyzing cluster [Cluster C]
Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
Loading tenant definition: axual
Loading tenant definition: [Tenant]
Loading instance definition: [Tenant]-[Instance A]
Loading instance definition: [Tenant]-[Instance B]
Configuring cluster services for node [Worker Hostname] in cluster [Cluster C]
...
Preparing broker: Done
Param 1: run
Param 2: broker
Param 3: axual/broker:6.0.0
Param 4: Starting broker
Param 5: Done
Param 6: -d --tty=false --restart=always -v /appl/kafka/config/broker:/config/broker ...
Param 7:
...
Done
---
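You can also check the under-replicated partition count from the command line before moving on to the next broker. Below is a minimal Bash sketch. It assumes the standard Apache Kafka `kafka-topics.sh` tool is available and can reach the brokers. The bootstrap address, port and client properties file in the usage comment are placeholders, and the function names are hypothetical. The Broker stats per node dashboard remains the reference.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: poll until no partitions are under-replicated.

# Counts partitions in `kafka-topics.sh --describe --under-replicated-partitions`
# output read from stdin (one "Partition:" line per under-replicated partition).
count_urp() {
  grep -c 'Partition:' || true
}

# Usage: wait_for_zero_urp <max_tries> <sleep_seconds> <describe command...>
# Returns 0 as soon as the count is zero, 1 if it never reaches zero.
# A failing describe command is retried, never treated as "zero".
# Example (placeholders, adapt to your environment):
#   wait_for_zero_urp 60 10 kafka-topics.sh --bootstrap-server <broker-host>:<port> \
#     --command-config <client.properties> --describe --under-replicated-partitions
wait_for_zero_urp() {
  local tries=$1 pause=$2 out n
  shift 2
  while [ "$tries" -gt 0 ]; do
    if out=$("$@"); then
      n=$(printf '%s\n' "$out" | count_urp)
      if [ "$n" -eq 0 ]; then
        echo "under-replicated partitions: 0"
        return 0
      fi
      echo "under-replicated partitions: $n" >&2
    else
      echo "describe command failed" >&2
    fi
    tries=$((tries - 1))
    sleep "$pause"
  done
  return 1
}
```

Capturing the command output before counting means that a connection failure is reported as a failure instead of an empty list, which would otherwise read as zero.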

Step 2 - Upgrade Instance API to 3.0.1

Make sure the default version in the configuration is not overridden. If it is, please remove or comment out the setting as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/instance-api.sh

# Docker image version, only override if you explicitly want to use a different version
#INSTANCEAPI_VERSION='[OLDER_VERSION]'

Restart the service

axual.sh restart instance <instance-name> instance-api

The following output is expected:

Axual Platform 2021.1
Stopping instance services for [INSTANCE NAME] in cluster [CLUSTER NAME]
Stopping [INSTANCE NAME]-instance-api: Stopped
Done, cluster-api is available
Deploying topic _[INSTANCE NAME]-schemas: Done
Deploying topic _[INSTANCE NAME]-consumer-timestamps: Done
Done, cluster-api is available
Done, cluster-api is available
Applying ACLs : {...}
Done
Done, cluster-api is available
Applying ACLs : {...}
Done
Done, cluster-api is available
Applying ACLs : {...}
Done
Configuring instance services for [INSTANCE NAME]-[CLUSTER NAME] in cluster [CLUSTER NAME]
Preparing [INSTANCE NAME]-instance-api: Done
Cluster servers are https://[CLUSTER ENDPOINT]:9080
Starting [INSTANCE NAME]-instance-api: Done

Step 3 - Upgrade Cluster Browse to 1.1.1

Make sure the default version in the configuration is not overridden. If it is, please remove or comment out the setting as shown below.

platform-config/clusters/{cluster-name}/cluster-browse.sh

# Docker image version, only override if you explicitly want to use a different version
#CLUSTER_BROWSE_VERSION='[OLDER_VERSION]'

Restart the service

axual.sh restart cluster cluster-browse

The following output is expected:

Axual Platform 2021.1
Stopping cluster services for node [NODE NAME] in cluster [CLUSTER NAME]
Stopping cluster-browse: Stopped
Configuring cluster services for node [NODE NAME] in cluster [CLUSTER NAME]
Preparing cluster-browse: Done
Preparing acls:
Done, cluster-api is available
Applying ACLs : {...}
Done
Starting cluster-browse: Done

Step 4 - Upgrading to Schema-Registry 5.0.4

In the following steps, Schema-Registry is upgraded to version 5.0.4.

Step 4a - Configuring Schema-Registry

Make sure the default version in the configuration is not overridden. If it is, please remove the following line or comment it out as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/schemaregistry.sh

# Docker image version, only override if you explicitly want to use a different version
#SCHEMAREGISTRY_VERSION="[OLDER VERSION]"

Step 4b - Restarting Schema-Registry slave

  1. Run the following command for each instance where the Schema-Registry slave is running:

    axual.sh restart instance <instance-name> sr-slave

    The following output is expected:

    Axual Platform 2021.1
    Stopping instance services for [Instance Name] in cluster [Cluster Name]
    Stopping [Instance Name]-sr-slave: Stopped
    Done, cluster-api is available
    Deploying topic _[Instance Name]-schemas: Done
    Deploying topic _[Instance Name]-consumer-timestamps: Done
    Done, cluster-api is available
    Done, cluster-api is available
    Applying ACLs : {...}
    Done
    ...
    Applying ACLs : {...}
    Done
    Configuring instance services for [Instance Name] in cluster [Cluster Name]
    Preparing [Instance Name]-sr-slave: Done
    Starting [Instance Name]-sr-slave: Done
  2. Verification after the Schema-Registry slave restart: do not continue until all of the following criteria are met:

    1. The general checks from Verifying every step of the way pass.

    2. The Docker logs show no errors and the service is up:

      docker logs -f <instance-name>-sr-slave
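`docker logs -f` keeps following the output until you press Ctrl+C, so a non-following variant can be handier in a checklist. This minimal Bash sketch assumes the Docker CLI is available. It also assumes that problems are logged with `ERROR` or as Java exceptions. The function names and the 10-minute window are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count error lines in recent container logs.

# Counts lines containing ERROR or Exception in log text read from stdin.
count_log_errors() {
  grep -cE 'ERROR|Exception' || true
}

# Usage: recent_errors <container-name> [since], e.g.
#   recent_errors <instance-name>-sr-slave 10m
# docker logs writes the container's stderr to stderr, hence 2>&1.
recent_errors() {
  docker logs --since "${2:-10m}" "$1" 2>&1 | count_log_errors
}
```

A count of 0 only means that no matching lines were found. You still need to confirm that the service is up.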

Step 4c - Restarting Schema-Registry master

  1. Run the following command for each instance where the Schema-Registry master is running:

    axual.sh restart instance <instance-name> sr-master

    The following output is expected:

    Axual Platform 2021.1
    Stopping instance services for [Instance Name] in cluster [Cluster Name]
    Stopping [Instance Name]-sr-master: Stopped
    Done, cluster-api is available
    Deploying topic _[Instance Name]-schemas: Done
    Deploying topic _[Instance Name]-consumer-timestamps: Done
    Done, cluster-api is available
    Done, cluster-api is available
    Applying ACLs :...
    Done
    ...
    Applying ACLs :...
    Done
    Configuring instance services for [Instance Name] in cluster [Cluster Name]
    Preparing [Instance Name]-sr-master: Done
    Starting [Instance Name]-sr-master: Done
  2. Verification after the Schema-Registry master restart: do not continue until all of the following criteria are met:

    1. The general checks from Verifying every step of the way pass.

    2. The Docker logs show no errors and the service is up:

      docker logs -f <instance-name>-sr-master

Step 5 - Upgrade Discovery API to 2.3.2

While the Discovery API is unavailable, clients will not be reconfigured.

Make sure the default version in the configuration is not overridden. If it is, please remove or comment out the setting as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/discovery-api.sh

# Docker image version, only override if you explicitly want to use a different version
#DISCOVERYAPI_VERSION='[OLDER_VERSION]'

Restart the service.

axual.sh restart instance <instance-name> discovery-api

The following output is expected:

Axual Platform 2021.1
Stopping instance services for [Instance Name] in cluster [Cluster Name]
Stopping [Instance Name]-discovery-api: Stopped
Done, cluster-api is available
Deploying topic _[Instance Name]-schemas: Done
Deploying topic _[Instance Name]-consumer-timestamps: Done
Done, cluster-api is available
Done, cluster-api is available
Applying ACLs : {...}
Done
...
Done, cluster-api is available
Applying ACLs : {...}
Done
Configuring instance services for [Instance Name] in cluster [Cluster Name]
Running copy-config-[Instance Name]-discovery-api: Done
Preparing [Instance Name]-discovery-api: Done
Starting [Instance Name]-discovery-api: Done

Step 6 - Upgrading to Distributor 4.0.1

Because of incompatibilities in the internal Connect version, this upgrade cannot be performed in a rolling fashion. Perform the following steps on each cluster you upgrade.

Make sure the default distributor version in the configuration is not overridden. If it is, please remove the following line or comment it out as shown below.

platform-config/clusters/{cluster-name}/distributor.sh

# Docker image version, only override if you explicitly want to use a different version
#DISTRIBUTOR_VERSION='[OLDER_VERSION]'

Step 6a - Take the cluster out of distribution

  1. Move applications to other clusters

    For each of your instances, issue the command:

    axual.sh instance <instance-name> set status app off

    The following output is expected:

    Instance <tenant>-<instance> on cluster <tenant>-<cluster> has received state change event DISABLE_APPLICATIONS

    Wait until the message rates on your clusters have stabilized and are roughly equal.

  2. Disable offset distribution

    For each of your instances, issue the command:

    axual.sh instance <instance-name> set status offset off

    The following output is expected:

    Instance <tenant>-<instance> on cluster <tenant>-<cluster> has received state change event DISABLE_OFFSETS

    Wait until the incoming load on topic _<tenant>-<instance>-consumer-timestamps has settled. See the Distributor Overview dashboard.

  3. Disable message distribution

    For each of your instances, issue the command:

    axual.sh instance <instance-name> set status data off

    The following output is expected:

    Instance <tenant>-<instance> on cluster <tenant>-<cluster> has received state change event DISABLE_DATA
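With several instances, you can issue the three commands above one phase at a time with a small loop. This is a hypothetical Bash helper, not part of `axual.sh`. It runs only one phase per call, so you can still wait for the dashboards between phases. The `AXUAL` variable is an assumption that makes a dry run possible.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of axual.sh): apply one status change to
# every instance given. Run it once per phase, in the order app, offset,
# data, and check the dashboards in between.
# AXUAL may point to another command for a dry run; defaults to axual.sh.
set_status_all() {
  local kind=$1 state=$2 inst
  shift 2
  case $kind in
    app|offset|data) ;;
    *) echo "unknown status kind: $kind" >&2; return 2 ;;
  esac
  case $state in
    on|off) ;;
    *) echo "state must be on or off: $state" >&2; return 2 ;;
  esac
  for inst in "$@"; do
    "${AXUAL:-axual.sh}" instance "$inst" set status "$kind" "$state" || return 1
  done
}
```

Usage: `set_status_all app off <instance-a> <instance-b>`. `AXUAL=echo set_status_all app off <instance-a>` only prints the commands it would run.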

Step 6b - Restart connectors

For each of your instances, issue the command:

axual.sh restart instance <instance-name> distribution

The following output is expected:

Stopping instance services for <instance-name> in cluster <cluster-name>
...
The above command must be run from the first distribution node.

Step 6c - Stop and start distributors

On each node in your distributor cluster, issue the command:

axual.sh stop cluster distributor

The following output is expected:

Axual Platform 2021.1
Stopping cluster services for node [NODE-NAME] in cluster [CLUSTER-NAME]
Stopping distributor: Stopped

After the distributors have stopped, start them again. On each node in your distributor cluster, issue the command:

axual.sh start cluster distributor

The following output is expected:

Axual Platform 2021.1
Configuring cluster services for node [NODE-NAME] in cluster [CLUSTER-NAME]
Done, cluster-api is available
Preparing distributor security: Done
Preparing distributor topics: Deploying topic _distributor-config: Done
Deploying topic _distributor-offset: Done
Deploying topic _distributor-status: Done
Done
Starting distributor: Done
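Because the distributors cannot be upgraded in a rolling fashion, every node must be stopped before any node is started again. The following hypothetical Bash sketch makes that ordering explicit. It assumes passwordless SSH to each distributor node and `axual.sh` on the remote PATH. The `REMOTE` variable is an assumption that makes a dry run possible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop the distributor on all nodes, then start it on
# all nodes. Assumes passwordless ssh and axual.sh on each node's PATH.
# REMOTE may point to another command for a dry run; defaults to ssh.
restart_distributors() {
  local node
  # Phase 1: stop everywhere; abort before any start if one stop fails.
  for node in "$@"; do
    "${REMOTE:-ssh}" "$node" axual.sh stop cluster distributor || return 1
  done
  # Phase 2: start everywhere, only after all nodes have stopped.
  for node in "$@"; do
    "${REMOTE:-ssh}" "$node" axual.sh start cluster distributor || return 1
  done
}
```

Usage: `restart_distributors <node-1> <node-2> <node-3>`. `REMOTE=echo restart_distributors <node-1> <node-2>` only prints the commands.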

Step 6d - Check the connector status

  • In Grafana, verify on the Distributor Overview dashboard that all connectors have the RUNNING status

  • Check whether tasks have been assigned to the distributor connector by visiting the connector's status endpoint

    The response should be a JSON payload looking like the following:

    {
       "name": "[Connector NAME]",
       "connector": {
          "state": "RUNNING",
          "worker_id": "[NODE]:8083"
       },
       "tasks": [
       {
          "id": 0,
          "state": "RUNNING",
          "worker_id": "[NODE]:8083"
       },
       {
          "id": 1,
          "state": "RUNNING",
          "worker_id": "[NODE]:8083"
       }
       ],
       "type": "sink"
    }
If the tasks are not distributed over all the workers, restart the worker that has no tasks assigned.
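The status payload above has the shape returned by the Kafka Connect REST API endpoint `GET /connectors/<connector-name>/status`, so this check can be scripted too. The sketch below is a rough Bash illustration. It avoids a `jq` dependency by removing whitespace from the JSON and counting fields with `grep`. The host, port and connector name in the usage line are placeholders. A JSON-aware tool such as `jq` is more robust where available.

```shell
#!/usr/bin/env bash
# Rough sketch: summarise a Kafka Connect connector status payload read
# from stdin, without jq. Only the part after "tasks": is inspected, so the
# connector's own state and worker_id are not counted.
summarise_connector_status() {
  local json tasks total running workers
  # Remove whitespace so the patterns below match pretty-printed JSON too.
  json=$(tr -d ' \n\t')
  tasks=${json#*'"tasks":'}
  total=$(grep -o '"id":[0-9]*' <<<"$tasks" | wc -l)
  running=$(grep -o '"state":"RUNNING"' <<<"$tasks" | wc -l)
  workers=$(grep -o '"worker_id":"[^"]*"' <<<"$tasks" | sort -u | wc -l)
  echo "tasks=$((total)) running=$((running)) workers=$((workers))"
}
```

Usage: `curl -s http://<distributor-node>:8083/connectors/<connector-name>/status | summarise_connector_status`. All tasks are healthy when `running` equals `tasks`. If `workers` is lower than the number of distributor nodes, the tasks are not spread over all workers.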

Step 6e - Allow connections to the upgraded cluster

The following steps will trigger the Discovery API to allow the upgraded cluster to be the active cluster for your instances.

  1. Enable message distribution

    Issue the following command:

    axual.sh cluster set status data on

    The following output is expected:

    Instance <tenant>-<instance> on cluster <tenant>-<cluster> has received state change event ENABLE_DATA
    ...

    Wait for the message rates to stabilize; they should be roughly equal across clusters.

  2. Enable offset distribution

    Issue the following command:

    axual.sh cluster set status offset on

    The following output is expected:

    Instance <tenant>-<instance> on cluster <tenant>-<cluster> has received state change event ENABLE_OFFSETS
    ...

    Wait until the incoming load on topic _<tenant>-<instance>-consumer-timestamps has settled; see the Distributor Overview dashboard.

  3. Switch on applications

    Enter the following command:

    axual.sh cluster set status app on

    The following output is expected:

    Instance <tenant>-<instance> on cluster <tenant>-<cluster> has received state change event ENABLE_APPLICATIONS
    ...
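The re-enable sequence is the reverse of Step 6a: data first, then offsets, then applications. This hypothetical Bash helper issues the three cluster commands in that order. It pauses for confirmation between them so you can check the dashboards. The `AXUAL` variable is an assumption that makes a dry run possible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-enable distribution in the order data, offset, app,
# waiting for Enter between steps so the dashboards can be checked.
# AXUAL may point to another command for a dry run; defaults to axual.sh.
enable_distribution() {
  local step reply
  for step in data offset app; do
    "${AXUAL:-axual.sh}" cluster set status "$step" on || return 1
    if [ "$step" != app ]; then
      # Stops (returns 1) if no confirmation can be read, e.g. on EOF.
      read -r -p "Press Enter once the '$step' step has settled... " reply || return 1
    fi
  done
}
```

The confirmation prompt is deliberate. The waits in the steps above depend on dashboards, so the script leaves that judgement to the operator.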

Step 7 - Upgrade Cluster API to 1.7.2

While the Cluster API is unavailable, topics cannot be applied.
Cluster API 1.7.2 is incompatible with Instance API 2.1.0; make sure all your instances are running Instance API 3.0.1 or later.
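Before upgrading, you can confirm which Instance API version each instance runs. This minimal Bash sketch makes three assumptions: the containers are named `<instance-name>-instance-api`, as in the restart output of Step 2; the image tag carries the version; and `sort -V` is available (GNU coreutils). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the minimum Instance API version.

# Succeeds if version $1 >= version $2, using version-aware sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the tag of a container's image, e.g. ".../instance-api:3.0.1" -> 3.0.1.
# Assumes the image reference ends in :<version>.
container_image_tag() {
  docker inspect --format '{{.Config.Image}}' "$1" | sed 's/.*://'
}
```

Usage: `version_ge "$(container_image_tag <instance-name>-instance-api)" 3.0.1 && echo OK`. Use `sort -V` rather than plain string comparison, because a string comparison would rank 3.0.10 below 3.0.2.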

Make sure the default version in the configuration is not overridden. If it is, please remove or comment out the setting as shown below.

platform-config/clusters/{cluster-name}/cluster-api.sh

# Docker image version, only override if you explicitly want to use a different version
#CLUSTERAPI_VERSION='[OLDER VERSION]'

Restart the service:

axual.sh restart cluster cluster-api

Step 8 - Upgrade Management API to 6.0.0

Management API 6.0.0 is incompatible with Instance API 2.1.0; make sure all your instances are running Instance API 3.0.1 or later.

Make sure the default version in the configuration is not overridden. If it is, please remove or comment out the setting as shown below.

platform-config/clusters/{cluster-name}/mgmt-api.sh

# Docker image version, only override if you explicitly want to use a different version
#MGMT_API_VERSION='[OLDER_VERSION]'

Restart the service

axual.sh restart mgmt mgmt-api

The following output is expected:

Axual Platform 2021.1
Stopping mgmt services for node [NODE NAME] in cluster [CLUSTER NAME]
Stopping mgmt-api: Stopped
Configuring mgmt services for node [NODE NAME] in cluster [CLUSTER NAME]
Testing DB connection
Connection successful
Preparing mgmt-api: Done
Starting mgmt-api: Done
After the restart, log in to the Management UI and verify that the upgrade was successful. By hovering over "Axual X" in the top right corner, you can see the versions of the Management API and Management UI. These should be 6.0.0 and 5.7.2 respectively.

Step 9 - Upgrade Management UI to 5.10.0

Make sure the default version in the configuration is not overridden. If it is, please remove or comment out the setting as shown below.

platform-config/clusters/{cluster-name}/mgmt-ui.sh

# Docker image version, only override if you explicitly want to use a different version
#MGMT_UI_VERSION='[OLDER_VERSION]'

Restart the service

axual.sh restart mgmt mgmt-ui

The following output is expected:

Axual Platform 2021.1
Stopping mgmt services for node [NODE NAME] in cluster [CLUSTER NAME]
Stopping mgmt-ui: Stopped
Configuring mgmt services for node [NODE NAME] in cluster [CLUSTER NAME]
Starting mgmt-ui: Done
After the restart, log in to the Management UI and verify that the upgrade was successful. By hovering over "Axual X" in the top right corner, you can see the versions of the Management API and Management UI. These should be 6.0.0 and 5.10.0 respectively.

Step 10 - Upgrade Connect to 2.2.4

Make sure the default version in the configuration is not overridden. If it is, please remove or comment out the setting as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/axual-connect.sh

# Docker image version, only override if you explicitly want to use a different version
#CONNECT_VERSION='[OLDER_VERSION]'

Restart the service

axual.sh restart client <instance-name> axual-connect

The following output is expected (for restarting all the client services):

Axual Platform 2021.1
Loading cluster definition: [Cluster A]
PREFIX for loading cluster definition is: CLUSTER_[Cluster A]_
Loading cluster definition: [Cluster B]
PREFIX for loading cluster definition is: CLUSTER_[Cluster B]_
Loading cluster definition: [Cluster C]
PREFIX for loading cluster definition is: CLUSTER_[Cluster C]_
Analyzing cluster [Cluster C]-inter-broker-listener
Analyzing cluster [Cluster C]-mgmt-api-db
Analyzing cluster [Cluster C]
Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
Loading tenant definition: [Tenant Name]
Loading tenant definition: axual
Loading instance definition: [Instance A]
Loading instance definition: [Instance B]
Loading cluster definition: [Cluster A]
PREFIX for loading cluster definition is: [Cluster A]
Loading cluster definition: [Cluster B]
PREFIX for loading cluster definition is: [Cluster B]
Loading cluster definition: [Cluster C]
PREFIX for loading cluster definition is: CLUSTER_[Cluster C]_
Analyzing cluster altair-[Cluster C]-inter-broker-listener
Analyzing cluster altair-[Cluster C]-mgmt-api-db
Analyzing cluster altair-[Cluster C]
Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
Loading tenant definition: [Tenant Name]
Loading tenant definition: axual
Loading instance definition: [Instance A]
Loading instance definition: [Instance B]

Step 11 - Updating Prometheus Targets and Alerts

Restarting Prometheus

Log in to the VM where Prometheus runs and use the following command to recreate targets.json and alerts.json:

axual.sh restart mgmt prometheus

Open the Prometheus UI and check that all targets and alerts are present.

To confirm that the new versions have been deployed, search for Management-API among the targets.
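You can also run the targets check against the Prometheus HTTP API. Its `/api/v1/targets` endpoint lists the active targets with a `health` field (`up`, `down` or `unknown`). This rough Bash sketch counts those values without a `jq` dependency. The host and port in the usage line are placeholders; 9090 is only the Prometheus default.

```shell
#!/usr/bin/env bash
# Rough sketch: count target health values in a Prometheus
# /api/v1/targets response read from stdin (no jq required).
summarise_target_health() {
  local json up down unknown
  # Remove whitespace so the patterns also match pretty-printed JSON.
  json=$(tr -d ' \n\t')
  up=$(grep -o '"health":"up"' <<<"$json" | wc -l)
  down=$(grep -o '"health":"down"' <<<"$json" | wc -l)
  unknown=$(grep -o '"health":"unknown"' <<<"$json" | wc -l)
  echo "up=$((up)) down=$((down)) unknown=$((unknown))"
}
```

Usage: `curl -s http://<prometheus-host>:9090/api/v1/targets | summarise_target_health`. Any `down` target is worth a closer look in the Prometheus UI.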