Performing the upgrade

Typical upgrade steps

Many of the upgrade steps can be performed without impact on your users. Essentially, the deployment or upgrade of a component is split into two actions:

  1. Configuration changes, such as added or changed configuration parameters, including the component’s new version

  2. Deployment of the upgraded component, by (re)starting it

The configuration changes can be done in advance most of the time, limiting downtime for your end users.

In the following upgrade steps, platform-config refers to the location where your platform configuration is stored for that particular environment.

Verifying every step of the way

When performing the upgrade, we strongly advise verifying that things are working every step of the way. It is pointless to continue the upgrade if, halfway through, one of the services fails to start. In general, the following tips apply to every service when performing (re)starts after an upgrade:

  • Check whether the new docker image version has been pulled successfully

  • Check whether the container actually starts, stays up for at least 30 seconds, and is not in "Restarting" mode (see the sketch below)
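
A minimal sketch of these generic checks from the command line, using mgmt-keycloak as an example container name; adjust the container name and image to the service you are (re)starting:

# Confirm the expected image version has been pulled
docker images | grep keycloak

# Confirm the container is up; STATUS should read "Up ...", not "Restarting ..."
docker ps --filter "name=mgmt-keycloak" --format "{{.Names}}: {{.Status}} ({{.Image}})"

# Follow the logs for the first ~30 seconds after the (re)start
docker logs -f --tail 100 mgmt-keycloak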

There are also verification steps that depend on the service being upgraded. Those steps can be found in the upgrade docs themselves.

Performing the upgrade

Step 1 - Setting up Keycloak:11

In the below steps, we are going to set up Keycloak to use the 11.0.2 version.

Step 1a - Configuring Keycloak:11

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/keycloak.sh) and add or edit them to match the following configuration.

# Version of keycloak to run
KEYCLOAK_VERSION=11.0.2

# Port to listen on
KEYCLOAK_PORT=[NO_CHANGE]
KEYCLOAK_ADMIN_PORT=8993

#Optional, default=false
#KEYCLOAK_IMPORT_REALM=

Step 1b - Stop Keycloak

  1. Log in to the node where Keycloak is running and stop Keycloak with the following command:

    ./axual.sh stop mgmt mgmt-keycloak
  2. Remove the old themes folder from your platform-config/clusters/{cluster-name}/configuration/keycloak

  3. Place the new axual-keycloak-theme under your platform-config/clusters/{cluster-name}/configuration/keycloak so that the themes folder is a child of the keycloak folder

In case you have your own Keycloak themes, please make sure you have migrated your existing theme(s) to Keycloak:11.

Step 1c - Start Keycloak

Log in to the node where Keycloak is running and restart Keycloak with the following command:

./axual.sh start mgmt mgmt-keycloak
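
Check logs to confirm the successful start; the mgmt-keycloak container name below is the same one used in the rollback section at the end of this document:

docker logs -f --tail 400 mgmt-keycloak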

Step 1d - Access Admin Console on the management port

The new version of Keycloak is now exposing the admin console on a separate port to improve security. You can access it as follows:

  1. To log in to the Keycloak Admin Console using the UI, go to https://KEYCLOAK_HOSTNAME:KEYCLOAK_ADMIN_PORT/auth. You will see the following welcome screen (a command-line check of the same URL is shown after this list).

    Keycloak Admin welcome screen
  2. Click Administration Console

  3. Enter the KEYCLOAK_USER and KEYCLOAK_PASSWORD stored in your platform-config, then press Login.

    Keycloak Admin login screen
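
If you prefer a quick command-line check before opening a browser, the sketch below requests the admin console landing page. KEYCLOAK_HOSTNAME and the 8993 admin port come from the configuration in Step 1a; the -k flag is only needed when the endpoint uses a self-signed certificate:

# Expect an HTTP 200 (or a redirect code) from the admin console landing page
curl -k -s -o /dev/null -w "%{http_code}\n" "https://KEYCLOAK_HOSTNAME:8993/auth/"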

Step 2 - Restart components to get metrics exposed

Step 2a - Configuring Cluster-Browse’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/cluster-browse.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#CLUSTER_BROWSE_VERSION=


#########
# PORTS #
#########

# Port at which the web-server is hosted on the host machine
CLUSTER_BROWSE_PORT=[NO_CHANGE]
# Port at which the management-server is hosted on the host machine
CLUSTER_BROWSE_MGMT_PORT=9086

# At least `/actuator/health` and `/actuator/prometheus` need to be exposed
CLUSTER_BROWSE_SECURITY_OPEN_ENDPOINTS="/actuator/health,/actuator/prometheus,/actuator/info"

Step 2b - Restarting Cluster-Browse

Log in to the node where Cluster-Browse is running and restart it with the following command:

./axual.sh restart cluster cluster-browse
There is one Cluster-Browse running per cluster. Each of them needs to be restarted.

Check logs to confirm the successful restart

docker logs -f --tail 400 cluster-browse
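
Optionally, verify that the new management port and open endpoints respond. This is a minimal sketch, assuming you run it on the node itself and that the management server answers plain HTTP on the CLUSTER_BROWSE_MGMT_PORT configured above; adjust the scheme, host and port to your setup:

# The health endpoint should report status UP
curl -s http://localhost:9086/actuator/health

# The prometheus endpoint should return metrics in plain text
curl -s http://localhost:9086/actuator/prometheus | head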

Step 2c - Configuring Stream-Browse’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/stream-browse.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#STREAM_BROWSE_VERSION=


#########
# PORTS #
#########

# Port at which the web-server is hosted on the host machine
STREAM_BROWSE_PORT=[NO_CHANGE]
# Port at which the management-server is hosted on the host machine
STREAM_BROWSE_MGMT_PORT=6980

# At least `/actuator/health` and `/actuator/prometheus` need to be exposed
STREAM_BROWSE_SECURITY_OPEN_ENDPOINTS="/actuator/health,/actuator/prometheus,/actuator/info"

Step 2d - Restarting Stream-Browse

Log in to the node where Stream-Browse is running and restart it with the following command:

./axual.sh restart mgmt stream-browse

Check logs to confirm the successful restart

docker logs -f --tail 400 stream-browse

Step 2e - Configuring Instance-API’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/tenants/{tenant-name}/instances/{instance-name}/instance-api.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#INSTANCEAPI_VERSION=



#########
# PORTS #
#########
# Instance-level ports are defined as comma-separated "cluster-name:port" pairs

# The port on which Instance-API hosts the spring boot actuator.
INSTANCE_API_MANAGEMENT_SERVER_PORT="<cluster-1>:<cluster-1-mgmt-port>,<cluster-2>:<cluster-2-mgmt-port>"

# Comma separated list (no spaces) which contains the endpoints which will never require
# authentication, even when 2-way TLS is enforced.
INSTANCE_API_OPEN_ENDPOINTS="/actuator/health,/actuator/prometheus,/actuator/info"
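
For illustration only, with two hypothetical clusters named cluster-a and cluster-b, the port mapping could look as follows; the actual cluster names and ports depend on your environment:

# Hypothetical example values; use your own cluster names and free ports
INSTANCE_API_MANAGEMENT_SERVER_PORT="cluster-a:9080,cluster-b:9080"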

Step 2f - Restarting Instance-API

Log in to the node where Instance-API is running and restart it with the following command:

./axual.sh restart instance [your-instance-name] instance-api
Replace [your-instance-name] with your instance name, for example demo-local

Check logs to confirm the successful restart

docker logs -f --tail 400 [your-instance-name]-instance-api

Step 2g - Configuring Operation-Manager’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/operation-manager.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#OPERATION_MANAGER_VERSION=


#########
# PORTS #
#########

# Port at which the web-server is hosted on the host machine
OPERATION_MANAGER_PORT=[NO_CHANGE]
# Port at which the management-server is hosted on the host machine
OPERATION_MANAGER_MGMT_PORT=37779

# At least `/actuator/health` and `/actuator/prometheus` need to be exposed
OPERATION_MANAGER_OPEN_ENDPOINTS="/actuator/info,/actuator/health,/actuator/prometheus"

Step 2h - Restarting Operation-Manager

Log in to the node where Operation-Manager is running and restart it with the following command:

./axual.sh restart mgmt operation-manager

Check logs to confirm the successful restart

docker logs -f --tail 400 operation-manager

Step 2i - Configuring Management-API’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/mgmt-api.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#MGMT_API_VERSION=


#########
# PORTS #
#########

# Port at which the web-server is hosted on the host machine
MGMT_API_PORT=[NO_CHANGE]
# Port at which the management-server is hosted on the host machine
MGMT_API_MGMT_PORT=8096

# At least `/actuator/health` and `/actuator/prometheus` need to be exposed
MGMT_API_OPEN_ENDPOINTS="/actuator/info,/actuator/health,/actuator/prometheus"

Step 2j - Restarting Management-API

Log in to the node where Management-API is running and restart it with the following command:

./axual.sh restart mgmt mgmt-api

Check logs to confirm the successful restart

docker logs -f --tail 400 mgmt-api

Step 2k - Restarting Management-UI

Log in to the node where Management-UI is running and restart it with the following command:

./axual.sh restart mgmt mgmt-ui

Check logs to confirm the successful restart

docker logs -f --tail 100 mgmt-ui

Step 3 - Upgrading to Broker 5.4.0

In this step we are going to set up Broker to use the 5.4.0 version (Apache Kafka 2.6.0).

Step 3a - Configuring Broker

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/clusters/{cluster-name}/versions.sh
# Docker image version, only override if you explicitly want to use a different version
#BROKER_VERSION=[OLDER VERSION]

Step 3b - Restarting Brokers

To limit the impact on users, restart brokers in a rolling fashion (one node at a time).

Restart the Active Controller Broker last, to minimize the number of re-elections.
Use the Broker stats per node Dashboard to find which broker is the Active Controller Broker (Controllers = 1)
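
If you prefer the command line over the dashboard, the active controller can also be read from ZooKeeper. This is a sketch, assuming the Kafka CLI tools are available on the node and ZooKeeper is reachable on localhost:2181; adjust the connection string to your environment:

# Prints the controller znode; the "brokerid" field is the current Active Controller
zookeeper-shell.sh localhost:2181 get /controller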

  1. Restarting Broker:

    platform-deploy
    ./axual.sh restart cluster broker

    The following output is expected (for a 3-cluster setup):

    Axual Platform 2020.3
    Loading cluster definition: [Cluster A]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster A]_
    Loading cluster definition: [Cluster B]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster B]_
    Loading cluster definition: [Cluster C]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster C]_
    Analyzing cluster [Cluster B]-inter-broker-listener
    Analyzing cluster [Cluster B]-mgmt-api-db
    Analyzing cluster [Cluster B]
    Analyzing cluster [Cluster A]-inter-broker-listener
    Analyzing cluster [Cluster A]
    Analyzing cluster [Cluster C]-inter-broker-listener
    Analyzing cluster [Cluster C]
    Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
    Loading tenant definition: axual
    Loading tenant definition: [Tenant]
    Loading instance definition: [Tenant]-[Instance A]
    Loading instance definition: [Tenant]-[Instance B]
    Stopping cluster services for node [Worker Hostname] in cluster [Cluster C]
    ...
    Stopped
    Loading cluster definition: [Cluster A]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster A]_
    Loading cluster definition: [Cluster B]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster B]_
    Loading cluster definition: [Cluster C]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster C]_
    Analyzing cluster [Cluster B]-inter-broker-listener
    Analyzing cluster [Cluster B]-mgmt-api-db
    Analyzing cluster [Cluster B]
    Analyzing cluster [Cluster A]-inter-broker-listener
    Analyzing cluster [Cluster A]
    Analyzing cluster [Cluster C]-inter-broker-listener
    Analyzing cluster [Cluster C]
    Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
    Loading tenant definition: axual
    Loading tenant definition: [Tenant]
    Loading instance definition: [Tenant]-[Instance A]
    Loading instance definition: [Tenant]-[Instance B]
    Configuring cluster services for node [Worker Hostname] in cluster [Cluster C]
    ...
    Preparing broker: Done
    Param 1: run
    Param 2: broker
    Param 3: axual/broker:5.4.0
    Param 4: Starting broker
    Param 5: Done
    Param 6: -d --tty=false --restart=always -v /appl/kafka/config/broker:/config/broker ...
    Param 7:
    ...
    Done
  2. Verification After Broker Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Cluster Overview Dashboard - Verify that Under Replicated Partitions has reached 0.

    3. On the Broker stats per node Dashboard - Verify that all metrics are pulled and visible.

A partition is under-replicated when one or more of its replicas are not in sync with the leader, for example because a broker is down or still catching up after a restart.
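
Besides the dashboard, under-replicated partitions can also be listed with the standard Kafka tooling. This is a sketch, assuming the Kafka CLI tools are available on a broker node; substitute your own bootstrap server and supply a client configuration file if the listener requires authentication:

# Lists every partition whose in-sync replica set is smaller than its full replica set;
# an empty result means the cluster has fully caught up after the restarts
kafka-topics.sh --bootstrap-server <broker-host>:<port> --describe --under-replicated-partitions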

Step 4 - Upgrading to Distributor 3.6.3

In the below steps, we are going to set up Distributor to use the 3.6.3 version.

Step 4a - Configuring Distributor

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/clusters/{cluster-name}/versions.sh
# Docker image version, only override if you explicitly want to use a different version
#DISTRIBUTOR_VERSION=[OLDER VERSION]

Step 4b - Restarting Distributor

To limit the impact on users, restart Distributor in a rolling fashion (one node at a time).

  1. Restarting Distributor:

    platform-deploy
    ./axual.sh restart cluster distributor

    The following output is expected:

    Axual Platform 2020.3
    Stopping cluster services for node [Worker Hostname] in cluster [Cluster A]
    Stopping distributor: Stopped
    Configuring cluster services for node [Worker Hostname] in cluster [Cluster A]
    Done, cluster-api is available
    Preparing distributor security: Done
    Preparing distributor topics: Deploying topic _distributor-config: Done
    Deploying topic _distributor-offset: Done
    Deploying topic _distributor-status: Done
    Done
    Starting distributor: Done
  2. Verification After Distributor Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Distributor Overview Dashboard - Verify that all connectors are in RUNNING status.

Step 5 - Upgrading to Connect 2.2.3

Connect is part of the new client services: applications that can connect to any cluster in the instance.

Step 5a - Configuring Connect

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/versions.sh
# Uncomment and set a value to load a different version of Axual Connect
#CONNECT_VERSION=[OLDER VERSION]

Step 5b - Restarting Connect

  1. Run the following command for each instance:

     ./axual.sh restart client <instance-name> axual-connect

    or simply use the following command to restart all client services for all instances on the machine.

     ./axual.sh restart client

    The following output is expected (for restarting all the client services):

    Axual Platform 2020.3
    Loading cluster definition: [Cluster A]
    PREFIX for loading cluster definition is: CLUSTER_[Cluster A]_
    Loading cluster definition: [Cluster B]
    PREFIX for loading cluster definition is: CLUSTER_[Cluster B]_
    Loading cluster definition: [Cluster C]
    PREFIX for loading cluster definition is: CLUSTER_[Cluster C]_
    Analyzing cluster [Cluster C]-inter-broker-listener
    Analyzing cluster [Cluster C]-mgmt-api-db
    Analyzing cluster [Cluster C]
    Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
    Loading tenant definition: [Tenant Name]
    Loading tenant definition: axual
    Loading instance definition: [Instance A]
    Loading instance definition: [Instance B]
    Loading cluster definition: [Cluster A]
    PREFIX for loading cluster definition is: [Cluster A]
    Loading cluster definition: [Cluster B]
    PREFIX for loading cluster definition is: [Cluster B]
    Loading cluster definition: [Cluster C]
    PREFIX for loading cluster definition is: CLUSTER_[Cluster C]_
    Analyzing cluster [Tenant Name]-[Cluster C]-inter-broker-listener
    Analyzing cluster [Tenant Name]-[Cluster C]-mgmt-api-db
    Analyzing cluster [Tenant Name]-[Cluster C]
    Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
    Loading tenant definition: [Tenant Name]
    Loading tenant definition: axual
    Loading instance definition: [Instance A]
    Loading instance definition: [Instance B]
  2. Verification After Connect Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Axual Connect Dashboard - Verify the connector status per Instance/Connector.

Step 6 - Upgrading to REST Proxy 1.2.2

In the below steps, we are going to set up REST Proxy to use the 1.2.2 version.

Step 6a - Configuring REST Proxy

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/rest-proxy.sh
# Docker image version, only override if you explicitly want to use a different version
#RESTPROXY_VERSION="[OLDER VERSION]"

Step 6b - Restarting REST Proxy

  1. Run the following command for each instance:

    ./axual.sh restart instance <instance-name> rest-proxy

    The following output is expected:

    Axual Platform 2020.3
    Stopping instance services for [Instance Name] in cluster [Cluster Name]
    Stopping [INSTANCE NAME]-rest-proxy: Stopped
    Done, cluster-api is available
    Deploying topic _[Instance Name]-schemas: Done
    Deploying topic _[Instance Name]-consumer-timestamps: Done
    Done, cluster-api is available
    Done, cluster-api is available
    Applying ACLs :...
    Done
    ...
    Applying ACLs :...
    Done
    Configuring instance services for [Instance Name] in cluster [Cluster Name]
    Preparing [INSTANCE NAME]-rest-proxy: Done
    Warning: 'ADVERTISED_DEBUG_PORT_REST_PROXY' is not an instance variable and a default value is not configured as 'DEFAULT_INSTANCE_ADVERTISED_DEBUG_PORT_REST_PROXY'
    Starting [Instance Name]-rest-proxy: Done
  2. Verification After Rest-Proxy Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Rest-Proxy Detailed Overview - Verify per instance that Uptime and Start time are correct.

Step 7 - Upgrading to Discovery API 2.1.0

In the below steps, we are going to set up Discovery API to use the 2.1.0 version.

Step 7a - Configuring Discovery API

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/versions.sh
# Docker image version, only override if you explicitly want to use a different version
#DISCOVERYAPI_VERSION="[OLDER VERSION]"

Step 7b - Restarting Discovery API

  1. Run the following command on every node on which Discovery API is running:

    ./axual.sh restart instance <instance-name> discovery-api

    The following output is expected:

    Axual Platform 2020.3
    Stopping instance services for [Instance Name] in cluster [Cluster Name]
    Stopping [Instance Name]-discovery-api: Stopped
    Done, cluster-api is available
    Deploying topic _[Instance Name]-schemas: Done
    Deploying topic _[Instance Name]-consumer-timestamps: Done
    Done, cluster-api is available
    Done, cluster-api is available
    Applying ACLs : {...}
    Done
    ...
    Done, cluster-api is available
    Applying ACLs : {...}
    Done
    Configuring instance services for [Instance Name] in cluster [Cluster Name]
    Running copy-config-[Instance Name]-discovery-api: Done
    Preparing [Instance Name]-discovery-api: Done
    Starting [Instance Name]-discovery-api: Done
  2. Verification After Discovery API Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Discovery status dashboard - Verify that the Status per node is Active.

Step 8 - Updating Prometheus Targets and Alerts

Restarting Prometheus

Log in to the VM where Prometheus runs and use the following command to recreate targets.json and alerts.json:

./axual.sh restart mgmt prometheus

Open the Prometheus UI to check that targets and alerts are all there.

You can search for Management-API as a target to confirm that the new versions have been deployed.
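
If you want to script this check instead of using the UI, the Prometheus HTTP API exposes the same information. A sketch, assuming Prometheus answers on its default port 9090 on the node you are logged in to:

# Full list of scrape targets and their state (JSON)
curl -s http://localhost:9090/api/v1/targets

# Quick summary: count targets per health state ("up" is what you want to see)
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[^"]*"' | sort | uniq -c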

Rollback of Keycloak

In case something went wrong with the upgrade to Keycloak:11, follow these steps to roll back to Keycloak:6.

Before you start the rollback procedure, please make sure:

  • you have stopped your mgmt-keycloak container that was failing. Use docker ps to check the currently running containers.

Run ./axual.sh stop mgmt mgmt-keycloak to stop the container
  • you have transferred the Keycloak DB backup back to the node where mgmt-db runs; it will be used for importing the data and structure of the DB (see the backup sketch below).

If you are using a remote_db, be sure you are performing the following procedure from a node that has connectivity to the remote DB.
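
For reference, a Keycloak:6 database backup such as the one referred to above could have been created with mysqldump before the upgrade. This is a sketch, using the same KEYCLOAK_DB_* placeholders as the import command later in this section, a hypothetical output file name, and assuming the mysqldump client is available inside the mgmt-db container:

# Dump data and structure of the Keycloak database to a local SQL file
docker exec mgmt-db mysqldump -uKEYCLOAK_DB_USER -pKEYCLOAK_DB_PASSWORD KEYCLOAK_DB_DATABASE > keycloak6-backup.sql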

Rollback story: Keycloak:11 is not working

Clean up the Keycloak DB

  1. Access the Keycloak DB with a phpMyAdmin instance

  2. Truncate all of Keycloak’s tables; the privileges for the Keycloak DB must be maintained

  3. Import the Keycloak:6 DB backup

  4. Change your platform-config/clusters/{cluster-name}/keycloak.sh to the following version

    # Version of keycloak to run
    KEYCLOAK_VERSION=6.0.1
  5. Remove the new themes folder from your platform-config/clusters/{cluster-name}/configuration/keycloak

  6. Place the old axual-keycloak-theme under your platform-config/clusters/{cluster-name}/configuration/keycloak so that the themes folder is a child of the keycloak folder

  7. Start the Keycloak 6.0.1 container

    ./axual.sh start mgmt mgmt-keycloak
  8. Check logs to confirm the successful rollback of Keycloak

    docker logs -f --tail 400 mgmt-keycloak

Rollback story: Keycloak:11 is not working and your DB is corrupted

Your keycloak-db has become corrupted, so you have to recreate it from scratch.

  1. Revert your KEYCLOAK_VERSION in platform-config/clusters/{cluster-name}/keycloak.sh

    # Version of keycloak to run
    KEYCLOAK_VERSION=6.0.1
  2. Remove the new themes folder from your platform-config/clusters/{cluster-name}/configuration/keycloak

  3. Place the old axual-keycloak-theme under your platform-config/clusters/{cluster-name}/configuration/keycloak so that the themes folder is a child of the keycloak folder

  4. Edit your platform-config/clusters/{cluster-name}/nodes.sh by adding the service keycloak-populate-db to your node’s MGMT_SERVICES

    Like this:

    NODE1_MGMT_SERVICES=localhost:mgmt-db,keycloak-populate-db

    This will re-create a clean Keycloak DB into which the exported Keycloak:6 DB can be imported

  5. Execute the axual.sh command to recreate the DB

    ./axual.sh start mgmt keycloak-populate-db
  6. Import the Keycloak:6 DB with all data and structure via mysql

    docker exec -i mgmt-db mysql -uKEYCLOAK_DB_USER -pKEYCLOAK_DB_PASSWORD KEYCLOAK_DB_DATABASE < [path/to/sql/backup]

    If there were no errors, you can now complete the rollback by starting Keycloak:

    ./axual.sh start mgmt mgmt-keycloak
  7. Check logs to confirm the successful rollback of Keycloak

    docker logs -f --tail 400 mgmt-keycloak