Performing the upgrade

Typical upgrade steps

Many of the upgrade steps can be performed without impact on your users. Essentially, the deployment or upgrade of a component is split into two actions:

  1. Configuration changes, such as added or changed configuration parameters, including the component’s new version

  2. Deployment of the upgraded component, by (re)starting it

The configuration changes can be done in advance most of the time, limiting downtime for your end users.

In the following upgrade steps, platform-config refers to the location where your platform configuration is stored for that particular environment.

Verifying every step of the way

When performing the upgrade, we strongly advise verifying that things are working every step of the way. It is pointless to continue the upgrade if, halfway through, one of the services fails to start. In general, the following tips apply to every service when performing (re)starts after an upgrade:

  • Check whether the new docker image version has been pulled successfully

  • Check whether the container actually starts, stays up for at least 30 seconds, and is not in "Restarting" mode (see the sketch below)
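
A minimal sketch of these generic checks from the command line, using mgmt-keycloak as an example container name; adjust the container name and image to the service you are (re)starting:

# Confirm the expected image version has been pulled
docker images | grep keycloak

# Confirm the container is up; STATUS should read "Up ...", not "Restarting ..."
docker ps --filter "name=mgmt-keycloak" --format "{{.Names}}: {{.Status}} ({{.Image}})"

# Follow the logs for the first ~30 seconds after the (re)start
docker logs -f --tail 100 mgmt-keycloak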

There are also verification steps that depend on the service being upgraded. Those steps can be found in the upgrade docs themselves.

Performing the upgrade

Step 1 - Setting up Keycloak:11

In the below steps, we are going to set up Keycloak to use the 11.0.2 version.

Step 1a - Configuring Keycloak:11

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/keycloak.sh) and add or edit them to match the following configuration.

# Version of keycloak to run
KEYCLOAK_VERSION=11.0.2

# Port to listen on
KEYCLOAK_PORT=[NO_CHANGE]
KEYCLOAK_ADMIN_PORT=8993

#Optional, default=false
#KEYCLOAK_IMPORT_REALM=

Step 1b - Stop Keycloak

  1. Log in to the node where Keycloak is running and stop Keycloak with the following command:

    ./axual.sh stop mgmt mgmt-keycloak
  2. Remove the old themes folder from your platform-config/clusters/{cluster-name}/configuration/keycloak

  3. Place the new axual-keycloak-theme under your platform-config/clusters/{cluster-name}/configuration/keycloak so that the themes folder is a child of the keycloak folder

In case you have your own Keycloak themes, please make sure you have migrated your existing theme(s) to Keycloak:11.

Step 1c - Start Keycloak

Log in to the node where Keycloak is running and restart Keycloak with the following command:

./axual.sh start mgmt mgmt-keycloak
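
Check logs to confirm the successful start; the mgmt-keycloak container name below is the same one used in the rollback section at the end of this document:

docker logs -f --tail 400 mgmt-keycloak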

Step 1d - Access Admin Console on the management port

The new version of Keycloak is now exposing the admin console on a separate port to improve security. You can access it as follows:

  1. To log in to the Keycloak Admin Console using the UI, go to https://KEYCLOAK_HOSTNAME:KEYCLOAK_ADMIN_PORT/auth. You will see the following welcome screen (a command-line check of the same URL is shown after this list).

    Keycloak Admin welcome screen
  2. Click Administration Console

  3. Enter the KEYCLOAK_USER and KEYCLOAK_PASSWORD stored in your platform-config, then press Login.

    Keycloak Admin login screen
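
If you prefer a quick command-line check before opening a browser, the sketch below requests the admin console landing page. KEYCLOAK_HOSTNAME and the 8993 admin port come from the configuration in Step 1a; the -k flag is only needed when the endpoint uses a self-signed certificate:

# Expect an HTTP 200 (or a redirect code) from the admin console landing page
curl -k -s -o /dev/null -w "%{http_code}\n" "https://KEYCLOAK_HOSTNAME:8993/auth/"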

Step 2 - Restart components to get metrics exposed

Step 2a - Configuring Cluster-Browse’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/cluster-browse.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#CLUSTER_BROWSE_VERSION=


#########
# PORTS #
#########

# Port at which the web-server is hosted on the host machine
CLUSTER_BROWSE_PORT=[NO_CHANGE]
# Port at which the management-server is hosted on the host machine
CLUSTER_BROWSE_MGMT_PORT=9086

# At least `/actuator/health` and `/actuator/prometheus` need to be exposed
CLUSTER_BROWSE_SECURITY_OPEN_ENDPOINTS="/actuator/health,/actuator/prometheus,/actuator/info"

Step 2b - Restarting Cluster-Browse

Log in to the node where Cluster-Browse is running and restart it with the following command:

./axual.sh restart cluster cluster-browse
There is one Cluster-Browse running per cluster. Each of them needs to be restarted.

Check logs to confirm the successful restart

docker logs -f --tail 400 cluster-browse
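
Optionally, verify that the new management port and open endpoints respond. This is a minimal sketch, assuming you run it on the node itself and that the management server answers plain HTTP on the CLUSTER_BROWSE_MGMT_PORT configured above; adjust the scheme, host and port to your setup:

# The health endpoint should report status UP
curl -s http://localhost:9086/actuator/health

# The prometheus endpoint should return metrics in plain text
curl -s http://localhost:9086/actuator/prometheus | head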

Step 2c - Configuring Stream-Browse’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/stream-browse.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#STREAM_BROWSE_VERSION=


#########
# PORTS #
#########

# Port at which the web-server is hosted on the host machine
STREAM_BROWSE_PORT=[NO_CHANGE]
# Port at which the management-server is hosted on the host machine
STREAM_BROWSE_MGMT_PORT=6980

# At least `/actuator/health` and `/actuator/prometheus` need to be exposed
STREAM_BROWSE_SECURITY_OPEN_ENDPOINTS="/actuator/health,/actuator/prometheus,/actuator/info"

Step 2d - Restarting Stream-Browse

Log in to the node where Stream-Browse is running and restart it with the following command:

./axual.sh restart mgmt stream-browse

Check logs to confirm the successful restart

docker logs -f --tail 400 stream-browse

Step 2e - Configuring Instance-API’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/tenants/{tenant-name}/instances/{instance-name}/instance-api.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#INSTANCEAPI_VERSION=



#########
# PORTS #
#########
# Instance-level ports are defined as comma-separated "cluster-name:port" pairs

# The port on which Instance-API hosts the spring boot actuator.
INSTANCE_API_MANAGEMENT_SERVER_PORT="<cluster-1>:<cluster-1-mgmt-port>,<cluster-2>:<cluster-2-mgmt-port>"

# Comma separated list (no spaces) which contains the endpoints which will never require
# authentication, even when 2-way TLS is enforced.
INSTANCE_API_OPEN_ENDPOINTS="/actuator/health,/actuator/prometheus,/actuator/info"
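
For illustration only, with two hypothetical clusters named cluster-a and cluster-b, the port mapping could look as follows; the actual cluster names and ports depend on your environment:

# Hypothetical example values; use your own cluster names and free ports
INSTANCE_API_MANAGEMENT_SERVER_PORT="cluster-a:9080,cluster-b:9080"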

Step 2f - Restarting Instance-API

Log in to the node where Instance-API is running and restart it with the following command:

./axual.sh restart instance [your-instance-name] instance-api
Replace [your-instance-name] with your instance name, for example demo-local

Check logs to confirm the successful restart

docker logs -f --tail 400 [your-instance-name]-instance-api

Step 2g - Configuring Operation-Manager’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/operation-manager.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#OPERATION_MANAGER_VERSION=


#########
# PORTS #
#########

# Port at which the web-server is hosted on the host machine
OPERATION_MANAGER_PORT=[NO_CHANGE]
# Port at which the management-server is hosted on the host machine
OPERATION_MANAGER_MGMT_PORT=37779

# At least `/actuator/health` and `/actuator/prometheus` need to be exposed
OPERATION_MANAGER_OPEN_ENDPOINTS="/actuator/info,/actuator/health,/actuator/prometheus"

Step 2h - Restarting Operation-Manager

Log in to the node where Operation-Manager is running and restart it with the following command:

./axual.sh restart mgmt operation-manager

Check logs to confirm the successful restart

docker logs -f --tail 400 operation-manager

Step 2i - Configuring Management-API’s Management Port and Open Endpoints

In platform-config, find the following settings in the mgmt cluster (such as platform-config/clusters/{cluster-name}/mgmt-api.sh) and add or edit them to match the following configuration.

# Docker image version, only override if you explicitly want to use a different version
#MGMT_API_VERSION=


#########
# PORTS #
#########

# Port at which the web-server is hosted on the host machine
MGMT_API_PORT=[NO_CHANGE]
# Port at which the management-server is hosted on the host machine
MGMT_API_MGMT_PORT=8096

# At least `/actuator/health` and `/actuator/prometheus` need to be exposed
MGMT_API_OPEN_ENDPOINTS="/actuator/info,/actuator/health,/actuator/prometheus"

Step 2j - Restarting Management-API

Log in to the node where Management-API is running and restart it with the following command:

./axual.sh restart mgmt mgmt-api

Check logs to confirm the successful restart

docker logs -f --tail 400 mgmt-api

Step 2k - Restarting Management-UI

Log in to the node where Management-UI is running and restart it with the following command:

./axual.sh restart mgmt mgmt-ui

Check logs to confirm the successful restart

docker logs -f --tail 100 mgmt-ui

Step 3 - Upgrading to Broker 5.4.0

In this step we are going to set up Broker to use the 5.4.0 version (Apache Kafka 2.6.0).

Step 3a - Configuring Broker

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/clusters/{cluster-name}/versions.sh
# Docker image version, only override if you explicitly want to use a different version
#BROKER_VERSION=[OLDER VERSION]

Step 3b - Restarting Brokers

To limit the impact on users, restart brokers in a rolling fashion (one node at a time).

Restart the Active Controller Broker last, to minimize the number of re-elections.
Use the Broker stats per node Dashboard to find which broker is the Active Controller Broker (Controllers = 1)
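
If you prefer the command line over the dashboard, the active controller can also be read from ZooKeeper. This is a sketch, assuming the Kafka CLI tools are available on the node and ZooKeeper is reachable on localhost:2181; adjust the connection string to your environment:

# Prints the controller znode; the "brokerid" field is the current Active Controller
zookeeper-shell.sh localhost:2181 get /controller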

  1. Restarting Broker:

    platform-deploy
    ./axual.sh restart cluster broker

    The following output is expected (for a 3-cluster setup):

    Axual Platform 2020.3
    Loading cluster definition: [Cluster A]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster A]_
    Loading cluster definition: [Cluster B]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster B]_
    Loading cluster definition: [Cluster C]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster C]_
    Analyzing cluster [Cluster B]-inter-broker-listener
    Analyzing cluster [Cluster B]-mgmt-api-db
    Analyzing cluster [Cluster B]
    Analyzing cluster [Cluster A]-inter-broker-listener
    Analyzing cluster [Cluster A]
    Analyzing cluster [Cluster C]-inter-broker-listener
    Analyzing cluster [Cluster C]
    Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
    Loading tenant definition: axual
    Loading tenant definition: [Tenant]
    Loading instance definition: [Tenant]-[Instance A]
    Loading instance definition: [Tenant]-[Instance B]
    Stopping cluster services for node [Worker Hostname] in cluster [Cluster C]
    ...
    Stopped
    Loading cluster definition: [Cluster A]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster A]_
    Loading cluster definition: [Cluster B]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster B]_
    Loading cluster definition: [Cluster C]
    PREFIX for loading cluster definition is: CLUSTER_[Tenant]_[Cluster C]_
    Analyzing cluster [Cluster B]-inter-broker-listener
    Analyzing cluster [Cluster B]-mgmt-api-db
    Analyzing cluster [Cluster B]
    Analyzing cluster [Cluster A]-inter-broker-listener
    Analyzing cluster [Cluster A]
    Analyzing cluster [Cluster C]-inter-broker-listener
    Analyzing cluster [Cluster C]
    Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
    Loading tenant definition: axual
    Loading tenant definition: [Tenant]
    Loading instance definition: [Tenant]-[Instance A]
    Loading instance definition: [Tenant]-[Instance B]
    Configuring cluster services for node [Worker Hostname] in cluster [Cluster C]
    ...
    Preparing broker: Done
    Param 1: run
    Param 2: broker
    Param 3: axual/broker:5.4.0
    Param 4: Starting broker
    Param 5: Done
    Param 6: -d --tty=false --restart=always -v /appl/kafka/config/broker:/config/broker ...
    Param 7:
    ...
    Done
  2. Verification After Broker Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Cluster Overview Dashboard - Verify that Under Replicated Partitions has reached 0.

    3. On the Broker stats per node Dashboard - Verify that all metrics are pulled and visible.

A partition is under-replicated when one or more of its replicas are not in sync with the leader, for example because a broker is down or still catching up after a restart.
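
Besides the dashboard, under-replicated partitions can also be listed with the standard Kafka tooling. This is a sketch, assuming the Kafka CLI tools are available on a broker node; substitute your own bootstrap server and supply a client configuration file if the listener requires authentication:

# Lists every partition whose in-sync replica set is smaller than its full replica set;
# an empty result means the cluster has fully caught up after the restarts
kafka-topics.sh --bootstrap-server <broker-host>:<port> --describe --under-replicated-partitions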

Step 4 - Upgrading to Distributor 3.6.3

In the below steps, we are going to set up Distributor to use the 3.6.3 version.

Step 4a - Configuring Distributor

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/clusters/{cluster-name}/versions.sh
# Docker image version, only override if you explicitly want to use a different version
#DISTRIBUTOR_VERSION=[OLDER VERSION]

Step 4b - Restarting Distributor

To limit the impact on users, restart Distributor in a rolling fashion (one node at a time).

  1. Restarting Distributor:

    platform-deploy
    ./axual.sh restart cluster distributor

    The following output is expected:

    Axual Platform 2020.3
    Stopping cluster services for node [Worker Hostname] in cluster [Cluster A]
    Stopping distributor: Stopped
    Configuring cluster services for node [Worker Hostname] in cluster [Cluster A]
    Done, cluster-api is available
    Preparing distributor security: Done
    Preparing distributor topics: Deploying topic _distributor-config: Done
    Deploying topic _distributor-offset: Done
    Deploying topic _distributor-status: Done
    Done
    Starting distributor: Done
  2. Verification After Distributor Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Distributor Overview Dashboard - Verify that all connectors are in RUNNING status.

Step 5 - Upgrading to Connect 2.2.3

Connect is part of the new client services: applications that can connect to any cluster in the instance.

Step 5a - Configuring Connect

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/versions.sh
# Uncomment and set a value to load a different version of Axual Connect
#CONNECT_VERSION=[OLDER VERSION]

Step 5b - Restarting Connect

  1. Run the following command for each instance:

     ./axual.sh restart client <instance-name> axual-connect

    or simply use the following command to restart all client services for all instances on the machine.

     ./axual.sh restart client

    The following output is expected (for restarting all the client services):

    Axual Platform 2020.3
    Loading cluster definition: [Cluster A]
    PREFIX for loading cluster definition is: CLUSTER_[Cluster A]_
    Loading cluster definition: [Cluster B]
    PREFIX for loading cluster definition is: CLUSTER_[Cluster B]_
    Loading cluster definition: [Cluster C]
    PREFIX for loading cluster definition is: CLUSTER_[Cluster C]_
    Analyzing cluster [Cluster C]-inter-broker-listener
    Analyzing cluster [Cluster C]-mgmt-api-db
    Analyzing cluster [Cluster C]
    Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
    Loading tenant definition: [Tenant Name]
    Loading tenant definition: axual
    Loading instance definition: [Instance A]
    Loading instance definition: [Instance B]
    Loading cluster definition: [Cluster A]
    PREFIX for loading cluster definition is: [Cluster A]
    Loading cluster definition: [Cluster B]
    PREFIX for loading cluster definition is: [Cluster B]
    Loading cluster definition: [Cluster C]
    PREFIX for loading cluster definition is: CLUSTER_[Cluster C]_
    Analyzing cluster [Tenant Name]-[Cluster C]-inter-broker-listener
    Analyzing cluster [Tenant Name]-[Cluster C]-mgmt-api-db
    Analyzing cluster [Tenant Name]-[Cluster C]
    Hostname: '[Worker Hostname]'. Using configuration for cluster '[Cluster C]', node ID: '1'
    Loading tenant definition: [Tenant Name]
    Loading tenant definition: axual
    Loading instance definition: [Instance A]
    Loading instance definition: [Instance B]
  2. Verification After Connect Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Axual Connect Dashboard - Verify the connector status per Instance/Connector.

Step 6 - Upgrading to REST Proxy 1.2.2

In the below steps, we are going to set up REST Proxy to use the 1.2.2 version.

Step 6a - Configuring REST Proxy

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/rest-proxy.sh
# Docker image version, only override if you explicitly want to use a different version
#RESTPROXY_VERSION="[OLDER VERSION]"

Step 6b - Restarting REST Proxy

  1. Run the following command for each instance:

    ./axual.sh restart instance <instance-name> rest-proxy

    The following output is expected:

    Axual Platform 2020.3
    Stopping instance services for [Instance Name] in cluster [Cluster Name]
    Stopping [INSTANCE NAME]-rest-proxy: Stopped
    Done, cluster-api is available
    Deploying topic _[Instance Name]-schemas: Done
    Deploying topic _[Instance Name]-consumer-timestamps: Done
    Done, cluster-api is available
    Done, cluster-api is available
    Applying ACLs :...
    Done
    ...
    Applying ACLs :...
    Done
    Configuring instance services for [Instance Name] in cluster [Cluster Name]
    Preparing [INSTANCE NAME]-rest-proxy: Done
    Warning: 'ADVERTISED_DEBUG_PORT_REST_PROXY' is not an instance variable and a default value is not configured as 'DEFAULT_INSTANCE_ADVERTISED_DEBUG_PORT_REST_PROXY'
    Starting [Instance Name]-rest-proxy: Done
  2. Verification After Rest-Proxy Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Rest-Proxy Detailed Overview - Verify per instance that Uptime and Start time are correct.

Step 7 - Upgrading to Discovery API 2.1.0

In the below steps, we are going to set up Discovery API to use the 2.1.0 version.

Step 7a - Configuring Discovery API

Make sure you didn’t override the default value before. If you did, please remove the following line or comment it out as shown below.

platform-config/tenants/{tenant-name}/instances/{instance-name}/versions.sh
# Docker image version, only override if you explicitly want to use a different version
#DISCOVERYAPI_VERSION="[OLDER VERSION]"

Step 7b - Restarting Discovery API

  1. Run the following command on every node on which Discovery API is running:

    ./axual.sh restart instance <instance-name> discovery-api

    The following output is expected:

    Axual Platform 2020.3
    Stopping instance services for [Instance Name] in cluster [Cluster Name]
    Stopping [Instance Name]-discovery-api: Stopped
    Done, cluster-api is available
    Deploying topic _[Instance Name]-schemas: Done
    Deploying topic _[Instance Name]-consumer-timestamps: Done
    Done, cluster-api is available
    Done, cluster-api is available
    Applying ACLs : {...}
    Done
    ...
    Done, cluster-api is available
    Applying ACLs : {...}
    Done
    Configuring instance services for [Instance Name] in cluster [Cluster Name]
    Running copy-config-[Instance Name]-discovery-api: Done
    Preparing [Instance Name]-discovery-api: Done
    Starting [Instance Name]-discovery-api: Done
  2. Verification After Discovery API Restart - use the Grafana Dashboards; don’t continue before all the following criteria are met:

    1. Verifying every step of the way

    2. On the Discovery status dashboard - Verify that the Status per node is Active.

Step 8 - Updating Prometheus Targets and Alerts

Restarting Prometheus

Log in to the VM where Prometheus runs and use the following command to recreate targets.json and alerts.json:

./axual.sh restart mgmt prometheus

Open the Prometheus UI to check that targets and alerts are all there.

You can search for Management-API as a target to confirm that the new versions have been deployed.
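
If you want to script this check instead of using the UI, the Prometheus HTTP API exposes the same information. A sketch, assuming Prometheus answers on its default port 9090 on the node you are logged in to:

# Full list of scrape targets and their state (JSON)
curl -s http://localhost:9090/api/v1/targets

# Quick summary: count targets per health state ("up" is what you want to see)
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[^"]*"' | sort | uniq -c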

Rollback of Keycloak

In case something went wrong with the upgrade to Keycloak:11, follow these steps to roll back to Keycloak:6.

Before you start the rollback procedure, please make sure:

  • you have stopped your mgmt-keycloak container that was failing. Use docker ps to check the currently running containers.

Run ./axual.sh stop mgmt mgmt-keycloak to stop the container
  • you have transferred the Keycloak DB backup back to the node where mgmt-db runs; it will be used for importing the data and structure of the DB (see the backup sketch below).

If you are using a remote_db, be sure you are performing the following procedure from a node that has connectivity to the remote DB.
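
For reference, a Keycloak:6 database backup such as the one referred to above could have been created with mysqldump before the upgrade. This is a sketch, using the same KEYCLOAK_DB_* placeholders as the import command later in this section, a hypothetical output file name, and assuming the mysqldump client is available inside the mgmt-db container:

# Dump data and structure of the Keycloak database to a local SQL file
docker exec mgmt-db mysqldump -uKEYCLOAK_DB_USER -pKEYCLOAK_DB_PASSWORD KEYCLOAK_DB_DATABASE > keycloak6-backup.sql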

Rollback story: Keycloak:11 is not working

Clean up the Keycloak DB

  1. Access the Keycloak DB with a phpMyAdmin instance

  2. Truncate all of Keycloak’s tables; the privileges for the Keycloak DB must be maintained

  3. Import the Keycloak:6 DB backup

  4. Change your platform-config/clusters/{cluster-name}/keycloak.sh to the following version

    # Version of keycloak to run
    KEYCLOAK_VERSION=6.0.1
  5. Remove the new themes folder from your platform-config/clusters/{cluster-name}/configuration/keycloak

  6. Place the old axual-keycloak-theme under your platform-config/clusters/{cluster-name}/configuration/keycloak so that the themes folder is a child of the keycloak folder

  7. Start the Keycloak 6.0.1 container

    ./axual.sh start mgmt mgmt-keycloak
  8. Check logs to confirm the successful rollback of Keycloak

    docker logs -f --tail 400 mgmt-keycloak

Rollback story: Keycloak:11 is not working and your DB is corrupted

Your keycloak-db has become corrupted, so you have to recreate it from scratch.

  1. Revert your KEYCLOAK_VERSION in platform-config/clusters/{cluster-name}/keycloak.sh

    # Version of keycloak to run
    KEYCLOAK_VERSION=6.0.1
  2. Remove the new themes folder from your platform-config/clusters/{cluster-name}/configuration/keycloak

  3. Place the old axual-keycloak-theme under your platform-config/clusters/{cluster-name}/configuration/keycloak so that the themes folder is a child of the keycloak folder

  4. Edit your platform-config/clusters/{cluster-name}/nodes.sh by adding the service keycloak-populate-db to your node’s MGMT_SERVICES

    Like this:

    NODE1_MGMT_SERVICES=localhost:mgmt-db,keycloak-populate-db

    This will re-create a clean Keycloak DB into which the exported Keycloak:6 DB can be imported

  5. Execute the axual.sh command to recreate the DB

    ./axual.sh start mgmt keycloak-populate-db
  6. Import the Keycloak:6 DB with all data and structure via mysql

    docker exec -i mgmt-db mysql -uKEYCLOAK_DB_USER -pKEYCLOAK_DB_PASSWORD KEYCLOAK_DB_DATABASE < [path/to/sql/backup]

    If there were no errors, you can now complete the rollback by starting Keycloak:

    ./axual.sh start mgmt mgmt-keycloak
  7. Check logs to confirm the successful rollback of Keycloak

    docker logs -f --tail 400 mgmt-keycloak