Infrastructure Requirements

Kubernetes

The Axual Platform started out on Virtual Machines, but moved to Kubernetes/OpenShift for the many operational benefits that platform offers when running mission-critical applications and services.

For more information, see the official Kubernetes documentation and the Axual Resilience section.

OpenShift

The Axual Kafka platform is fully compatible and runs mission-critical workloads on OpenShift. The Axual Platform is available on the Red Hat Ecosystem Catalog.

Most examples in the Axual documentation are written for Kubernetes, but they work the same way on OpenShift: replace the kubectl command in the examples with oc.

Requirements

These requirements should be considered when designing the infrastructure for a new installation, and refined further in collaboration with the responsible infrastructure team.

Kubernetes infrastructure requirements

Kubernetes Version

Version > 1.24 is required. EKS, AKS, GKE, Rancher and OpenShift are supported.

Kubernetes Nodes

For a Production environment:

  • 3 dedicated Nodes for Kafka brokers and ZooKeeper

    • 8 CPU, 32GB Nodes are a common choice

    • Scale horizontally by adding groups of 3 brokers

    • 3 Nodes in different Availability Zones to allow for highly available deployments

    • Tainted, so that other platform components avoid them and the Nodes stay dedicated to Kafka (see the toleration sketch after this list)

  • 2 or 3 Misc Nodes for all other Axual Platform components

    • 4 CPU, 16GB, non-dedicated Nodes

  • 3 Nodes for Connect and/or Distributor

    • These can be combined with the Axual Platform components, replacing the Misc Nodes above

    • 3 Nodes in different Availability Zones to allow for highly available deployments

For POC, 4 nodes of 4 CPU, 16GB should be enough. Further reading on Kafka cluster setups: Kafka Architecture DC Setups & Availability
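The dedicated broker nodes are typically realized with a taint plus a matching toleration and node affinity on the Kafka pods. A minimal sketch, assuming a hypothetical dedicated=kafka taint and label (the actual keys are free to choose):

```yaml
# Sketch: dedicating nodes to Kafka with a hypothetical "dedicated=kafka"
# taint and label. Taint and label the broker nodes first, e.g.:
#   kubectl taint nodes <node> dedicated=kafka:NoSchedule
#   kubectl label nodes <node> dedicated=kafka
# Then the Kafka pod spec (fragment below) tolerates the taint and
# requires the label, so brokers land only on the dedicated nodes:
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: kafka
      effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: dedicated
                operator: In
                values: ["kafka"]
```

With the Strimzi Kafka Operator, such tolerations and affinity are typically configured through the pod template of the Kafka custom resource rather than on raw pods.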

Persistent Volumes

  • Kafka brokers perform best with dedicated SSDs or spinning disks for each broker; the faster and more dedicated the disks, the faster the Kafka cluster.

  • Block devices are preferred over NFS because of their lower latency (see the Apache Kafka documentation).

  • Size the volumes based on requirements such as the expected retention period and the number and size of records.

  • Dedicating Volumes to Availability Zones or racks is essential for disaster recovery (see the StorageClass sketch after this list).

  • No backup of these volumes is required, because Kafka ensures internal replication of data. Due to the real-time nature of data, restoring backups is not feasible.
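Zone-pinned, fast volumes are usually arranged with a topology-aware StorageClass. A minimal sketch, assuming AWS EBS via the ebs.csi.aws.com CSI driver; the class name and parameters are illustrative, not part of the Axual charts:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kafka-fast                  # hypothetical name
provisioner: ebs.csi.aws.com        # assumption: AWS EBS CSI driver
parameters:
  type: gp3                         # SSD-backed block storage
reclaimPolicy: Retain               # keep broker data if a claim is deleted
allowVolumeExpansion: true
# WaitForFirstConsumer provisions each volume in the Availability Zone of
# the broker pod that claims it, keeping volumes pinned to their zone.
volumeBindingMode: WaitForFirstConsumer
```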

Kubernetes permissions

  • Strimzi Kafka Operator installation requires Kubernetes cluster administrator privileges to allow for the installation of CRDs, Cluster Roles and Cluster Role Bindings, so either this access is provided or the Kubernetes team installs the Strimzi Kafka Operator.

  • Two to three namespaces are required (see the manifest sketch after this list):

    • 1 for the installation of the Axual Platform,

    • 1 for the installation of the Strimzi Operator,

    • a 3rd to host a HashiCorp Vault cluster, if this is not already available in the infrastructure.
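A minimal sketch of the required namespaces; the names are illustrative and can be chosen freely:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: axual          # Axual Platform components (hypothetical name)
---
apiVersion: v1
kind: Namespace
metadata:
  name: strimzi        # Strimzi Kafka Operator (hypothetical name)
---
apiVersion: v1
kind: Namespace
metadata:
  name: vault          # optional: HashiCorp Vault, if not already available
```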

Network Connectivity

  • Kafka requires low latency connections between brokers.

  • The Kubernetes cluster must have a NetworkPolicy implementation running (e.g. Calico); this means you cannot use, for example, the AWS VPC CNI from EKS.

  • ZooKeeper is locked within the cluster by NetworkPolicies (see the sketch after this list).

  • API Gateway, Kafka Brokers, Schema Registry and Rest Proxy should be reachable from anywhere within the company that needs access to the event streaming platform. Generally, all DevOps team environments should be able to connect to these.

  • Service Meshes are supported
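As an illustration of how ZooKeeper is kept cluster-internal, a minimal NetworkPolicy sketch with hypothetical names and labels (the real policies are managed by the platform and Strimzi):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: zookeeper-internal-only     # hypothetical name
  namespace: kafka                  # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: zookeeper                # hypothetical ZooKeeper pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: kafka            # only broker pods may connect (hypothetical label)
      ports:
        - protocol: TCP
          port: 2181                # ZooKeeper client port
```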

DNS

  • DNS names should be resolvable from within the Kubernetes cluster and from anywhere within the company (both on-premises and cloud, if any); a quick in-cluster check is sketched after this list.

  • The Self Service UI DNS name should resolve on employee desktops and laptops, as it is accessed from the browser.

  • The Kafka broker, Discovery and Schema Registry DNS names should be resolvable at least from any servers where producer and consumer applications will run (both on-premises and cloud, if any).

  • See the list of DNS names below.
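Resolution from within the cluster can be verified with a throwaway pod; a minimal sketch, using one of the indicative names from the DNS Names table below:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-check                   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: dns-check
      image: busybox:1.36
      # Should return the load balancer / ingress IP when DNS is set up correctly.
      command: ["nslookup", "esp-broker-bootstrap.company.org"]
```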

Load Balancers or Ingress

  • Kafka brokers, Kafka bootstrap and Rest Proxy should preferably be exposed using Load Balancers or another highly available technology, since Ingress controllers can be a single point of failure. Non-critical components can be exposed through Ingresses.

  • The API Gateway and Schema Registry Ingresses listen on ports 80/443; SSL passthrough is required because authentication happens via mTLS (see the Ingress sketch below).

Nginx Ingress Controller and OpenShift Routes are supported.
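A minimal sketch of an SSL-passthrough Ingress for the Schema Registry, assuming the Nginx Ingress Controller is started with --enable-ssl-passthrough; the resource and Service names are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: schema-registry             # hypothetical name
  annotations:
    # Pass the TLS stream through to the backend untouched,
    # so clients can authenticate via mTLS.
    nginx.ingress.kubernetes.io/ssl-passthrough: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: esp-schemas.company.org
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: schema-registry   # hypothetical Service name
                port:
                  number: 443
```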

Certificates

  • A Root CA should be provided for registration with the platform. This CA will be added to the truststore. All client applications (producers and consumers) must be able to obtain a certificate signed by this CA for mutual TLS as it is the only one that will be accepted.

  • Self Service is accessed in the browser, so it should use a certificate that is trusted by the browser.

  • Cert Manager is supported and preferred.

  • Certificate Reloader (https://github.com/stakater/Reloader) allows for automated component restarts after certificate renewal.

  • Internal certificates for communication between Axual components should ideally be fully automated using the above-mentioned tools (see the sketch after this list).

  • See list of certificates below.
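A minimal sketch of what that automation can look like: a cert-manager Certificate for an internal component, combined with Stakater's Reloader to restart the consuming workload on renewal. All names are illustrative, and the Issuer is assumed to be set up for the internal PKI:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: platform-manager-tls        # hypothetical name
spec:
  secretName: platform-manager-tls  # Secret that the component mounts
  duration: 2160h                   # 90 days
  renewBefore: 360h                 # renew 15 days before expiry
  dnsNames:
    - axual-platform-manager        # illustrative internal service name
  issuerRef:
    name: esp-internal-ca           # hypothetical Issuer for the internal PKI
    kind: Issuer
---
# On the Deployment that mounts the Secret, Reloader triggers a rolling
# restart whenever the renewed Secret changes:
#   metadata:
#     annotations:
#       reloader.stakater.com/auto: "true"
```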

MySQL Database Service

  • MySQL 8 (preferred), MariaDB 10.3 or higher

  • The preferred option is to use a Managed DB service.

  • Character Set: UTF8MB4

  • Collation: utf8mb4_0900_ai_ci (MySQL) or utf8mb4_unicode_ci (MariaDB)

  • Storage: 50GB

  • Backup should be enabled.

In total, two databases (schemas) are used per cluster: the Self Service DB and the Keycloak DB. A third is required when using the Apicurio schema registry.

For POC, MariaDB charts can be deployed automatically as part of the platform charts deployment.

(Optional) Hashicorp Vault

If a HashiCorp Vault is already present in the infrastructure, it can be used. Alternatively, one can be deployed as part of the Axual Platform installation.

Two logical Vaults are used: 1) Kafka streaming layer credentials and 2) connector credentials for Kafka Connect. At least one physical Vault is required.

If no Vault is available, HashiCorp Vault should be deployed on the cluster, separately from the Axual Platform charts.

Identity Provider

An Identity Provider (for example Azure Active Directory, depending on the available infra/cloud solution) should be integrated with the Axual Platform.

Helm Chart Repository

Axual Platform is distributed via Helm Charts. The Helm Chart repository should be reachable from the Kubernetes Cluster or a deployment tool.

  • Axual Helm Charts are available publicly (with authentication) at registry.axual.io

  • An internal Helm repository can proxy Axual’s public Helm repository.

  • If using a Helm repository is not an option, then the charts can be pushed into any Git repository.

  • Helm Charts need to be reachable from the CI/CD tooling (ArgoCD, GitLab, etc.); see the sketch after this list.
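As a sketch of that reachability, an Argo CD Application pointing at the chart repository; the repoURL usage, chart name and version below are assumptions, not confirmed values:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: axual-platform              # hypothetical name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: registry.axual.io      # Axual's public registry (authenticated); exact repo path may differ
    chart: axual-platform           # hypothetical chart name
    targetRevision: 1.0.0           # hypothetical chart version
  destination:
    server: https://kubernetes.default.svc
    namespace: axual                # hypothetical target namespace
  syncPolicy:
    automated: {}
```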

Image Registry

  • Axual Platform components are available as images only, based on Red Hat Universal Base Images. You must be able to pull these images from within the Kubernetes cluster.

  • Axual Platform images are available publicly (with authentication) at registry.axual.io

  • If the above is not possible, an internal image registry should be provided that proxies Axual’s public image registry to pull images.

  • If proxying the image registry is not possible, then all relevant Axual Platform images can be manually pulled and pushed to an internal image registry. This step must be repeated during future upgrades to make the latest Axual images available locally.

(Optional) GitOps facilities

Axual prefers to work in a GitOps way, where all infrastructure and configuration is stored in git and applied using tools like ArgoCD or Terraform.

Git repositories for the installation configuration are a minimum requirement; deployment tooling is nice to have.

(Optional) Sensitive Configuration Storage

The platform configuration contains many sensitive values (private keys, DB passwords, keystore passwords, etc.). Since all configuration is stored in a Git repository, these values need to be encrypted or stored in another location.

Helm Secrets with Mozilla SOPS is supported, as are Sealed Secrets, 1Password Secrets and similar tools (see the sketch below).
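A minimal .sops.yaml sketch for the SOPS option, encrypting only the Secret payload fields in the configuration repository; the age recipient is a placeholder:

```yaml
creation_rules:
  - path_regex: .*secrets.*\.yaml$        # encrypt files whose name contains "secrets"
    encrypted_regex: ^(data|stringData)$  # only encrypt Secret payload fields
    age: age1examplepublickey...          # placeholder: your age recipient public key
```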

Can be skipped for POC.

(Optional) Monitoring & Alerting

Axual Platform provides metrics that are ready for Prometheus to scrape. Integrating these metrics into the operations team's central Prometheus, Grafana and Alertmanager stack is preferred for all parties involved.

There are ServiceMonitors, PodMonitors and PrometheusRules (for alerting) readily available.
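A minimal sketch of what such a ServiceMonitor looks like, with illustrative names (the ready-made ones ship with the platform):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: platform-manager            # hypothetical name
spec:
  selector:
    matchLabels:
      app: platform-manager         # hypothetical Service label
  endpoints:
    - port: metrics                 # named Service port exposing Prometheus metrics
      interval: 30s
```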

If no Prometheus stack is available, integration with the alerting solution of the customer will be required.

(Optional) Centralized Logging & Tracing

Integration with any centralized logging or distributed tracing solution greatly benefits the observability of the platform. All components are OTEL-compliant and can write logs in JSON format.

DNS Names

The DNS names are indicative only. Please change according to your requirements.

| Component | Layer | DNS Name | IP Address (TBA) | Port | Protocol | Exposed By |
|---|---|---|---|---|---|---|
| Broker 1 | Streaming | esp-broker-0.company.org | <Custom LB IP> | 9094, 9095 | mTLS+TCP, SASL | Loadbalancer |
| Broker 2 | Streaming | esp-broker-1.company.org | <Custom LB IP> | 9094, 9095 | mTLS+TCP, SASL | Loadbalancer |
| Broker 3 | Streaming | esp-broker-2.company.org | <Custom LB IP> | 9094, 9095 | mTLS+TCP, SASL | Loadbalancer |
| Broker bootstrap | Streaming | esp-broker-bootstrap.company.org | <Custom LB IP> | 9094, 9095 | mTLS+TCP, SASL | Loadbalancer |
| API Gateway | Governance | esp-gateway.company.org | <Ingress LB IP> | 443 | HTTPS | Ingress |
| Schema Registry | Streaming | esp-schemas.company.org | <Ingress LB IP> | 443 | HTTPS | Ingress |
| Rest Proxy | Streaming | esp-restproxy.company.org | <Custom LB IP> | 443 | mTLS+HTTPS | Loadbalancer |
| (Optional) Broker 1 Internal | Streaming | esp-broker-0-internal.company.org | <Custom LB IP> | 9096, 9095 | mTLS+TCP, SASL | Loadbalancer |

Certificates and Private Key Infrastructure

When using a Service Mesh, internal PKI and internal certificates are no longer required, as components can work without their own mTLS solutions and still be secure.

| PKI | Purpose |
|---|---|
| Enterprise PKI | Trusted by the platform for external interface components. Used to sign server and client certificates (see below) for external-facing components, and to sign application certificates which will connect to ESP. |
| PKIESP (Internal) | Custom CA for internal certificates, preferably facilitated via cert-manager. Used to sign the private listener certificates of the Kafka brokers (9091, 9093). The private key of the CA is also installed in the Operator. Not exposed to external connecting applications. Included in the platform package. |

Certificates

Certificates are required for service-to-service mTLS communication within the Kubernetes cluster and for external communication from clients (producers and consumers) to externally exposed components like Kafka, Schema Registry and Rest Proxy.

Use cert-manager to automate internal certificates; any company-wide cert-manager should be integrated if possible.

In the table below, replace <prefix> with either:

  • <tenant-short-name>-<instance-name> for example customer-dta-platform-manager

    • Replace <tenant-short-name> with the value configured in .Values.global.tenant.shortName

    • Replace <instance-name> with the value configured in .Values.global.instance.name

  • the name of the Chart, for example “axual” becomes axual-platform-manager

Subject: CN=internal-server-only
Issuer: PKIESP
Certificate type: Server
Components and their Subject Alternative Names:

  • API Gateway: <prefix>-api-gateway

  • Platform Manager: <prefix>-platform-manager

  • Platform UI: <prefix>-platform-ui

  • Organization Manager: <prefix>-organization-mgmt

  • Keycloak: <prefix>-keycloak-http

  • Topic Browse: <prefix>-topic-browse

  • Schema Registry: <prefix>-schema-registry

  • Rest Proxy: <prefix>-rest-proxy

  • Axual Connect: <prefix>-connect

  • Distributor: <prefix>-distributor

  • Vault: <prefix>-vault

Subject: CN=esp-company-server
Issuer: Enterprise PKI
Certificate type: Server
Components:

  • Kafka brokers and bootstrap

  • API Gateway Ingress

  • Schema Registry Ingress

  • Rest Proxy Ingress

Subject Alternative Names:

  • esp-broker-0.company.org

  • esp-broker-1.company.org

  • esp-broker-2.company.org

  • esp-broker-bootstrap.company.org

  • esp-selfservice.company.org

  • esp-schemas.company.org

  • esp-restproxy.company.org

Truststore

A truststore file (Secret or JKS) should be created that contains both the Enterprise PKI and the internal ESP PKI. A sample truststore.jks containing the internal PKI is included in the package; add the Enterprise PKI to the same truststore.
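A minimal sketch of providing the combined truststore as a Kubernetes Secret; the Secret name is illustrative, and the data value stands for the base64-encoded JKS:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: esp-truststore              # hypothetical name
type: Opaque
data:
  # base64 of truststore.jks containing the Enterprise PKI and internal ESP PKI
  truststore.jks: <base64-encoded truststore.jks>
```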