
Setup Podlily


0. How to use the podlily repository

This README guides you through the process of setting up a leap-backend k3s cluster and a leap-gateway k3s cluster and installing monitoring on them. You will set up at least 4 virtual machines (3 for the backend cluster, 1 for the gateway cluster) on a cloud provider using Terraform and install all necessary services on the k3s clusters using Helm.

It directly relies on these 4 repositories in the leap space:

  • terraform-hetzner-k3s-vpn
  • terraform-ovh-k3s-vpn
  • leap-backend
  • leap-gateway

Please check out their documentation for additional information.

To get started clone the podlily repository:

git clone https://0xacab.org/leap/container-platform/podlily.git

The repository contains 4 directories:

  • clusters: A structure to contain all the files you need to provision and organize the backend and gateway k3s clusters. You will move around in this directory to install, deploy and manage your clusters.
  • scripts: Helper scripts, e.g. scripts for certificate and secret creation used for TLS communication between backend and gateways.
  • templates: A copy of the structure in clusters containing template files and helper scripts to help you get started with the deployment.
  • docs: Documentation on podlily: architecture, TLS certificates, scripts. If you run into problems following this README, check out the file on Troubleshooting.

0.1. Prerequisites

You need to have the following installed (all of them are used throughout this guide):

  • git
  • Terraform
  • kubectl
  • Helm
  • git-crypt and gpg (only if you use the secrets management described in section 3)

1. Setting up the backend cluster

1.1. Initializing backend cluster files

You can initialize the backend cluster with the default configuration files and directory layout by running the initialization script from the repository root:

bash scripts/init-cluster.sh backend

You will be prompted to choose the cloud provider you want to use for provisioning your backend cluster machines so that the right template file is copied. At the moment there are Terraform template files for ovh and hetzner.
The script will create a new directory at clusters/backend with the necessary helm, scripts, and terraform subdirectories populated from templates/backend.

If you want to force overwriting any existing files, add --force:

bash scripts/init-cluster.sh --force backend

The script also supports interactive mode: run bash scripts/init-cluster.sh with no arguments and follow the prompts.

Once the layout exists under clusters/backend, continue with Terraform in the next section.

1.2. Provisioning the backend cluster using Terraform

Change into clusters/backend/terraform. This directory contains the Terraform file that defines your backend cluster. You need to edit this file to define where to provision which type of machines, provide valid API keys, and add SSH keys to grant access to the cluster. Depending on the cloud provider you chose, expand the sections below and follow the instructions for the next steps. If you need more information on the Terraform module, have a look in the README of either terraform-hetzner-k3s-vpn or terraform-ovh-k3s-vpn.

Hetzner

1.2.1. Creating a Public Cloud project on Hetzner

Follow these steps to set up your project:

  1. Create a Hetzner Account on https://www.hetzner.com/ and log in.
  2. Create a new project: on the dashboard, click New Project, enter a name and click Add project.

1.2.2. Configuring the backend cluster

Configure your backend cluster by changing the defaults in clusters/backend/terraform/backend-hetzner.tf. Make sure to check/adapt all lines with ⚠️ comments.

In your template file you can choose how many and which servers you want to provision in which datacenter. It is important to first check which server types are available in which region. It is easiest to achieve this by navigating to your cloud project in the Hetzner console and pretending to want to create a server by clicking through the interface. Go to Servers on the top of the left navigation bar and click “Add Server”. There you can see all current datacenter locations with their codes and server types available in them. Choose only servers with ‘x86’ architecture. You need to fill in the server names in the template file.

Below is a list of the variables you need to check/adapt that are marked with a ⚠️ in the template file. For more information find descriptions of all variables in the terraform-hetzner-k3s-vpn README.

  • k3s_controller_server_type (string): The Hetzner server code for controller nodes. Choose one from https://www.hetzner.com/cloud/#pricing; only choose Intel/AMD machines.
  • datacenter (string): Hetzner datacenter name (e.g., hel1-dc2 for Helsinki). Choose one from https://docs.hetzner.com/cloud/general/locations but make sure the chosen server type is available at the chosen location.
  • network_zone (string): Name of the network zone. Must match the datacenter location. See https://docs.hetzner.com/cloud/general/locations/
  • admins (list(object({ name = string, public_key = string }))): List of admin SSH key objects containing a freely selectable name and a corresponding public SSH key.
  • k3s_worker_nodes (list(object({ name = string, count = number, server_type = optional(string), image = optional(string), labels = optional(map(string)) }))): A list of groups of worker nodes, each sharing a common operating system and server flavor. The count determines how many nodes of this kind you want to spin up. The image defaults to the value of k3s_base_os. In a single-node cluster like a gateway this variable should be [] as there is only one controller node and no worker nodes. See vars.tf for more configuration options.
  • other_labels (map(string)): Labels to tag the virtual machines in your cloud project for easy filtering in the Hetzner console. Both key and value must be 63 characters or less, beginning and ending with an alphanumeric character, with alphanumerics in between.
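
To make the shape of these variables concrete, here is a minimal, hypothetical module configuration; the source path and all values are illustrative placeholders, so rely on the template file and the terraform-hetzner-k3s-vpn README for the authoritative layout:

module "backend_cluster" {
  source = "<module-source-from-template>"   # placeholder: the template sets the real source

  k3s_controller_server_type = "cx22"        # an x86 Hetzner server type
  datacenter                 = "hel1-dc2"    # Helsinki, as in the example above
  network_zone               = "eu-central"  # must match the datacenter location

  admins = [
    { name = "alice", public_key = "ssh-ed25519 AAAA... alice@example.org" },
  ]

  # two identical worker nodes; use [] for a single-node cluster
  k3s_worker_nodes = [
    { name = "worker", count = 2, server_type = "cx22" },
  ]

  other_labels = { cluster = "backend" }
}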

1.2.3. Creating and providing a Hetzner API token for your project

First you need to generate a Read-Write token for your project in order to create and delete resources with Terraform. Enter your new project in the Hetzner console and navigate to Security settings (left sidebar, bottom). Go to the API tokens tab and click Generate API token. Make sure to activate Read & Write permissions. Store the generated token securely; it is only shown once in the web interface.

In order to provide this API token to Terraform you can use an environment variable. Open a shell, fill in your newly created token and run:

export TF_VAR_hcloud_token=<your-API-token>

OVH

1.2.1. Creating a Public Cloud project on OVH

Register on OVH and create a Public Cloud project. Please note: by default OVH enforces quite strict quota and often you are only allowed to provision resources in the region you chose for your cloud project. You can check the quotas and the regional codes of your public cloud project for each region in the OVH cloud dashboard under Public Cloud / Settings / Quota & Regions.

1.2.2. Configuring the backend cluster

Configure your backend cluster by changing the defaults in clusters/backend/terraform/backend-ovh.tf. Make sure to check/adapt all lines with ⚠️ comments.

When choosing a region and a server type it is important to first check whether the server type is available in that region and whether you have enough quota to provision the number of servers you plan to. It is easiest to check this by navigating to your public cloud project in the OVH console and clicking through the instance creation interface: go to Instances at the top of the left navigation bar and click “Create an instance”. There you can see all current datacenter locations with their regional codes and the server types available in them. You need to fill in the regional codes and instance names in the template file. Also take care that you don’t exceed a quota; as mentioned above, you can check quotas and regional codes under Public Cloud / Settings / Quota & Regions.

Here is a tabular overview of the configuration variables marked with a ⚠️ in the template file. For more information find descriptions of all variables in the terraform-ovh-k3s-vpn README.

  • ovh_service_name (string): The ID of your public cloud project. You find it in the OVH console under your project name.
  • ovh_region (string): The OVH regional code for the location you want to provision resources in. Please mind your quota.
  • k3s_controller_server_type (string): The OVH flavor name for controller nodes. Choose one from https://www.ovhcloud.com/de/public-cloud/prices/#552. Defaults to “b2-7”.
  • k3s_worker_nodes (list(object({ name = string, count = number, server_type = string, image_name = optional(string) }))): A list of groups of worker nodes, each sharing a common operating system and server flavor. The count determines how many nodes of this kind you want to spin up. The image_name defaults to the value of k3s_base_os. In a single-node cluster like a gateway this variable should be [] as there is only one controller node and no worker nodes. Defaults to [].
  • admin_ssh_key (object({ name = string, public_key = string })): An object containing any chosen name as name and your public SSH key as public_key. An ssh_key resource will be created and linked to all of your created instances.
  • additional_ssh_keys (list(string)): A list of additional public SSH keys as strings that you want to grant access to your resources. Defaults to [].
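
The OVH template follows the same pattern; again a minimal, hypothetical sketch with placeholder values, for which the template file and the terraform-ovh-k3s-vpn README remain authoritative:

module "backend_cluster" {
  source = "<module-source-from-template>"          # placeholder: the template sets the real source

  ovh_service_name           = "<your-project-id>"  # from the OVH console
  ovh_region                 = "GRA11"              # an OVH regional code; mind your quota
  k3s_controller_server_type = "b2-7"               # the documented default

  k3s_worker_nodes = [
    { name = "worker", count = 2, server_type = "b2-7" },
  ]

  admin_ssh_key = { name = "alice", public_key = "ssh-ed25519 AAAA... alice@example.org" }

  additional_ssh_keys = []  # the default
}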

1.2.3. Creating OVH secrets and providing them to Terraform

The next step is to create the OVH Application Key, Application Secret and Consumer Key. All of these are necessary so that your Terraform code is allowed to make requests to the OVH API. You can create them here:

https://www.ovh.com/auth/api/createToken

For ‘Application name’ and ‘Application description’ you can use whatever you like. For the sake of this tutorial it is easiest to manually add one line for each right (GET, PUT, PATCH, POST, DELETE) and put * in the field to the right, thus granting your application universal rights. If you wish to have more control, you can play around here.

Next we have to give these secrets to Terraform so it can use them when making API calls to OVH. There are multiple ways to handle secret variables like these, but for now we will do it via environment variables following the scheme export OVH_<variable_name>=<value>.

Open a shell and set all of the created secrets by typing the following commands:

export OVH_CONSUMER_KEY=<your_consumer_key>
export OVH_APPLICATION_SECRET=<your_application_secret>
export OVH_APPLICATION_KEY=<your_application_key>

Additionally you should set your endpoint to your specific region. For the EU you set:

export OVH_ENDPOINT=ovh-eu

1.2.4. Provisioning the resources using Terraform

1.2.4.1. Initializing Terraform

In the same shell and in the folder with your Terraform project file, run

terraform init

This initializes a working directory containing Terraform configuration files.

1.2.4.2. Planning

When everything works out, run

terraform plan

This creates an execution plan, which lets you preview the changes that Terraform plans to make to your infrastructure. Read the plan and make sure things are getting created as expected.

1.2.4.3. Applying

Lastly, run

terraform apply

This executes the actions proposed in the Terraform plan to create, update, or destroy infrastructure. Your k3s cluster is now being provisioned. 🎊

You can check on the Hetzner Cloud Console or OVH Dashboard if all of your resources are created as expected.

If you followed the steps successfully, you should have a k3s cluster with 3 nodes up and running: 1 controller node and 2 worker nodes. You can check in the cloud console that Terraform indeed created the resources. K3s has already been installed on these machines during provisioning.

1.2.5. Accessing the cluster using port forwarding

In order to have easy access to the cluster from your machine you can use the script clusters/backend/scripts/access_cluster.sh. It will create an SSH tunnel to the controller node running on your chosen cloud provider. This enables you to create and manage resources of the remote cluster from your local machine. You can find the documentation for this script in the docs folder.

To use it, change into clusters/backend and run

eval $(./scripts/access_cluster.sh --start)

Check if your cluster is healthy and you have access to it by running

kubectl get nodes -o wide

If you chose defaults, you should see the 3 nodes you provisioned.

1.3. Installing leap-backend

The next step is to install the leap platform components running on the backend cluster using the leap-backend Helm chart.

1.3.1. Creating a subdomain for your backend

Create an A-record subdomain and point it to the public IP of the controller node of your backend cluster. You can find that IP in the cloud provider console, in the output of your terraform apply command, or by searching for ‘k3s_controller_ip’ in clusters/backend/terraform/terraform.tfstate.
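
If you prefer the command line, here are two hedged ways to look the IP up, assuming the template exposes an output with this name, as the state-file search above suggests:

cd clusters/backend/terraform
terraform output k3s_controller_ip                # works if the template declares this root-level output
grep -A 2 '"k3s_controller_ip"' terraform.tfstate # falls back to searching the (pretty-printed) state file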

Several services on the internet let you create subdomains for free.

1.3.2. Configuring backend.values.yaml

clusters/backend/helm/backend.values.yaml is the configuration file for the leap-backend Helm chart and lets you configure your provider and the services running on the backend cluster.

Configure your provider in the ‘Provider configuration’ section.
Also check all lines with ⚠️:

  • Fill in your newly created subdomain in domain.
  • Provide a valid email address in acme/email and traefik/additionalArguments so that you receive reminders about TLS certificate renewal (see the sketch below).
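
For orientation, here is a minimal, hypothetical excerpt of backend.values.yaml; the key names are the ones mentioned above, but the exact nesting and the Traefik resolver name may differ in your chart version, so treat this as a sketch rather than the authoritative layout:

domain: api.yourdomain.com          # the subdomain created in the previous step
acme:
  email: admin@yourdomain.com       # TLS certificate renewal reminders
traefik:
  additionalArguments:
    # resolver name is illustrative; keep whatever the template already uses
    - "--certificatesresolvers.default.acme.email=admin@yourdomain.com"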

Find more information about what is deployed and configuration options in the README.

1.3.3. Creating provider certificates

The script scripts/gen-provider-certs.sh will generate 2 certificate chains: one for the Provider CA and one for the Menshen CA.

In the base directory run the script:

bash scripts/gen-provider-certs.sh

It will prompt you for 3 things:

  • Certificate output directory: you can optionally choose a different output directory for the secrets created by the script
  • Provider CA subject: optionally you can set a subject for the CA generation
  • Menshen domain: set this to the domain created in this step

You will see the created certificates in the output directory you chose or in a newly created top-level directory called secrets.

1.3.4. Creating backend secrets

The next script will create secrets on the backend cluster containing the newly created certs. Change into the top-level directory and run:

bash scripts/create-backend-secrets.sh

You will be prompted two times:

  • Certificates directory: the one created in the last step, secrets by default
  • Kubernetes namespace: the namespace you want to create the secrets in. It has to match the namespace you will deploy menshen in, but ‘default’ is fine.

Additionally you need to create a Kubernetes Secret containing a key used for communication between menshen running on the backend cluster and menshen-agent running on the gateway cluster. You can use the provided helper script to create this secret in your backend cluster:

bash scripts/create-menshen-shared-secret.sh

It asks for a secrets directory (where the shared key file lives or is created) and the Kubernetes namespace. You can let the script generate a secure key for you, or enter your own. The namespace should be the same namespace where menshen will run (often default).

This creates the secret menshen-agent-shared-key, which the backend chart references through menshenAgentSharedSecret.name in backend.values.yaml.
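
To confirm that the secret landed in the right namespace, a quick check (the namespace is assumed to be default here):

kubectl get secret menshen-agent-shared-key -n default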

1.3.5. Installing the leap-backend Helm chart

Now we can finally deploy all backend services by changing into clusters/backend/helm and running

helm upgrade --install -f ./backend.values.yaml leap-backend https://0xacab.org/api/v4/projects/6134/packages/generic/helm-chart/v0.1.2/provider-backend-v0.1.2.tgz

You can check if everything runs smoothly by checking the pods:

kubectl get pods

You should see 2 pods running: provider-backend-menshen and traefik.

You can also check if the services are running as expected by playing around with the Menshen API. You can find the documentation here.
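
For a quick smoke test from the shell, you can query the service endpoint that is also used later in this guide (replace the domain with yours):

curl -s https://your.domain.com/api/5/service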

1.4. Installing backend monitoring with kube-prometheus-stack

In order to monitor the health of the k3s cluster and the pods running on it, we use the kube-prometheus-stack Helm chart.

1.4.1. Adding a subdomain for Prometheus

Create another A-record subdomain pointing to the same IP address of your controller node. You can name it whatever you like, but you need to configure it in clusters/backend/helm/monitoring.values.yaml. For now we’ll call it <prom.yourdomain.com>.

1.4.2. Configuring monitoring.values.yaml

This file lets you change some of the many configurable variables of the kube-prometheus-stack chart.

Fill in your newly created Prometheus subdomain and all other lines marked with ⚠️.

Find additional information in the official kube-prometheus-stack README.

1.4.3. Installing kube-prometheus-stack using the install script

You need to set up a secret in order to secure the Prometheus endpoint on your backend that Grafana Alloy on the gateways will write to.

For now you can use an environment variable. Make sure to use a sufficiently long random string, e.g. by using pwgen -s 30 1. Then replace <your-password> with the secure password and run:

export REMOTE_WRITE_PASSWORD=<your-password>

Make sure to store that password somewhere safe as you will need it later.

The script will create the secret to secure your Prometheus endpoint, install kube-prometheus-stack and create a custom dashboard in Grafana for gateway metrics.

In order for it to work you need access to your backend cluster, so make sure to run clusters/backend/scripts/access_cluster.sh again if necessary. When kubectl commands show your cluster resources, change into clusters/backend and run:

./scripts/deploy_monitoring.sh

After deployment you can check the newly running pods:

kubectl get pods -n monitoring

Since the kube-prometheus-stack chart and all of the monitoring resources have been installed in a new namespace monitoring, make sure to always add -n monitoring to kubectl commands intended for monitoring resources.

1.4.4. Looking at Grafana dashboards

You can access Grafana by using the provided script. It will output your Grafana password and start port-forwarding the Grafana instance that is running on your cluster to localhost. In clusters/backend run:

bash scripts/port_forward_grafana.sh

In order to look at the dashboards simply open this port in your browser: http://localhost:3000/

When visiting for the first time, log in with ‘admin’ as the username and the password output by the script.

Under dashboards you will find all the default dashboards of kube-prometheus-stack, which give you tons of information on the health of your cluster, plus one for monitoring gateway metrics. The latter will stay empty until you have deployed a gateway, which is the next step.

2. Setting up the gateway cluster

2.1. Initializing gateway cluster files

You can initialize the gateway cluster with the default configuration files and directory layout by running the initialization script from the repository root:

bash scripts/init-cluster.sh gateway [optional-suffix] [--provider ovh|hetzner]

The script also supports interactive mode: run bash scripts/init-cluster.sh with no arguments and follow the prompts.

2.2. Provisioning the gateway cluster using Terraform

Change into clusters/gateway/terraform. This directory contains the Terraform file that defines your gateway cluster. The template file defines a single-node cluster. You need to edit the file to define where to provision the machine, choose the server type and more. Again, depending on the cloud provider you chose, expand the sections below and follow the instructions for the next steps. If you need more information on the Terraform module, have a look in the README of either terraform-hetzner-k3s-vpn or terraform-ovh-k3s-vpn.

Hetzner

The steps are the same as done for the backend cluster. In order for the relative links to work, expand the instructions for Hetzner in this section.

  • If you have not already, create a Public Cloud Project on Hetzner following this step.
  • If you have not already, create a valid API token for your project and provide it to Terraform using an environment variable following this step.
  • Configure your gateway cluster by changing the defaults in clusters/gateway/terraform/gateway-hetzner.tf. Make sure to check/adapt all lines with ⚠️ comments. Find explanations for the configuration variables in this step. The only variables not covered there are k3s_cluster_name and k3s_network_name. In order to keep a good overview when provisioning gateways in several different locations, we propose to include the gateway location in the k3s resource names. But of course you can use any other naming convention you prefer.
  • Provision the resources using Terraform following the instructions in this step.

OVH

The steps are the same as done for the backend cluster. In order for the relative links to work, expand the instructions for OVH in this section.

  • If you have not already, create a Public Cloud Project on OVH following this step.
  • If you have not already, create OVH secrets for your project and provide them to Terraform using environment variables following this step.
  • Configure your gateway cluster by changing the defaults in clusters/gateway/terraform/gateway-ovh.tf. Make sure to check/adapt all lines with ⚠️ comments. Find explanations for the configuration variables in this step. The only variables not covered there are k3s_cluster_name and k3s_network_name. In order to keep a good overview when provisioning gateways in several different locations, we propose to include the gateway location in the k3s resource names. But of course you can use any other naming convention you prefer.
  • Provision the resources using Terraform following the instructions in this step.

If you followed the steps successfully, you should have a k3s cluster with 1 node up and running. You can check in the Cloud Console that Terraform indeed created the resources.

Before accessing the gateway cluster you have to stop the port-forwarding to your backend cluster if it is still running. Change to clusters/backend and run

./scripts/access_cluster.sh --stop

Then return to clusters/gateway and access your newly provisioned cluster:

eval $(./scripts/access_cluster.sh --start)

Check that your one-node cluster is up and running with

kubectl get nodes -o wide

2.3. Installing leap-gateway

The next step consists of installing the leap platform components that run on a gateway cluster by installing the leap-gateway Helm chart.

2.3.1. Configuring gw.values.yaml

You can change the configuration of all gateway deployments in this file: openvpn, obfsvpn, ovpn-addons and menshenAgent. For further help on configuring these services please refer to the leap-gateway documentation.

In order for the gateway deployment to work, it is necessary to fill in all lines marked with ⚠️ (a sketch of the relevant keys follows the list):

  • obfsvpn.OBFSVPN_LOCATION and menshenAgent.location: The location of your provisioned gateway server in lowercase, e.g. london
  • obfsvpn.MENSHEN_URL or menshenAgent.menshenUrl: The URL of the domain name you registered for your backend controller node, e.g. ‘https://thisis.yourdomain.com’
  • obfsvpn.OBFSVPN_HOSTNAME: Choose a unique identifier for the gateway. It will be handed over to the client, so the name should not reveal too much about the underlying infrastructure.
  • obfsvpn.OBFS4_PUBLIC_HOST and menshenAgent.externalIp: The floating IP assigned to your gateway server. Look it up in the console of your cloud provider. If you don’t have a floating IP assigned, just take the public IP of your gateway controller node.
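
Here is a minimal, hypothetical excerpt of gw.values.yaml with the keys from the list above; all values are examples and the surrounding structure may differ in your chart version:

obfsvpn:
  OBFSVPN_LOCATION: london
  OBFSVPN_HOSTNAME: gw-example-1        # opaque identifier handed to clients
  OBFS4_PUBLIC_HOST: "203.0.113.10"     # floating or public IP of the gateway
  MENSHEN_URL: "https://thisis.yourdomain.com"
menshenAgent:
  location: london
  menshenUrl: "https://thisis.yourdomain.com"
  externalIp: "203.0.113.10"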

2.3.2. Generating gateway certificates

The script scripts/gen-gateway-certs.sh creates openvpn gateway certificates signed by the Provider CA; clients use them to verify the authenticity of gateways.

It is necessary that provider certificates have already been created. We’ll assume they are available in the top-level secrets directory. If they are missing, run scripts/gen-provider-certs.sh first. Find the instructions for this script here.

If the provider certificates are in place, change into the top-level podlily folder and run:

bash scripts/gen-gateway-certs.sh

You will be prompted:

  • Provider CA certificate path: The path where the certs are stored. The default should be right if you haven’t changed the path in the gen-provider-certs script
  • Menshen CA certificate path: same
  • Provider CA key path: same
  • Gateway domain or a unique name for the gateway: Choose a unique name for your gateway
  • Certificates output directory: The default is good but you can choose a different place if you want

If everything works as expected you should see a new folder under secrets named after your gateway name.

2.3.3. Creating gateway secrets

The next script will create secrets from the certificates that were just created so the deployments on the k3s cluster can consume them.

Make sure you have access to your gateway cluster (kubectl get nodes works) and run

bash scripts/create-gateway-secrets.sh

You will be prompted:

  • Certificates directory: If you haven’t changed it in the previous scripts the default is correct
  • Kubernetes namespace: The namespace the secrets will be created in. Must match the namespace leap-gateway will be deployed in. ‘default’ is fine
  • Gateway domain: Select the gateway name or domain for which you generated the certs in the previous step

For the communication between menshen on the backend cluster and menshen-agent on the gateway cluster to work you need to create a Kubernetes Secret on the gateway cluster containing the same key as in this step. Again, you can simply use the provided helper script to create this secret in your gateway cluster from the file that was created during backend secrets creation:

bash scripts/create-menshen-shared-secret.sh

It asks for a secrets directory (where the shared key file lives or is created) and the Kubernetes namespace. If the key file already exists from the backend secrets creation, the script will simply reuse it and make the secret available to menshen-agent running on the gateway cluster. The leap-gateway chart references that secret through menshenAgentSharedSecret.name in gw.values.yaml.

2.3.4. Deploying the leap-gateway Helm chart

With all of these secrets created we can move on to installing the leap platform components on the cluster. Change into clusters/gateway/helm, fill in your gateway name or domain in the following command and run it:

helm upgrade --install -f ./gw.values.yaml <your-gateway-name> https://0xacab.org/api/v4/projects/6095/packages/generic/helm-chart/v0.1.4/leap-gateway-v0.1.4.tgz

You can check if everything runs smoothly by checking the pods:

kubectl get pods

You should see 4 pods running: menshen-agent, obfsvpn, openvpn-tcp and openvpn-udp.

2.3.5. Updating menshen with the gateway location

In order for the bitmask client to suggest the closest gateway for each user and show the correct gateway location, the menshen instance running on the backend cluster needs some general information about the gateway location. This step is about providing menshen with this information.

Stop the port-forwarding to the gateway cluster by changing into clusters/gateway and running:

./scripts/access_cluster.sh --stop

Then return to clusters/backend and access this cluster:

eval $(./scripts/access_cluster.sh --start)

General location information is configured via the backend Helm chart values. Edit the locations section under eip: in your backend.values.yaml (usually found at clusters/backend/helm/backend.values.yaml or similar). Add or edit your locations as in the following example:

eip:
  locations:
    seattle:
      country_code: "US"
      display_name: "Seattle"
      hemisphere: "N"
      timezone: "-7"
    amsterdam:
      country_code: "NL"
      display_name: "Amsterdam"
      hemisphere: "N"
      timezone: "+1"

Make sure the location key is all lowercase and matches the variables obfsvpn.OBFSVPN_LOCATION and menshenAgent.location set in your gateway’s Helm values (gw.values.yaml).
The display_name field will appear in the Bitmask app as the city name, and the timezone should reflect the UTC offset.
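
To eyeball that the keys actually line up, a quick, hedged check with grep (paths as used in this guide):

grep -n "location" clusters/gateway/helm/gw.values.yaml
grep -n -A 1 "locations:" clusters/backend/helm/backend.values.yaml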

After updating your backend.values.yaml, change into clusters/backend/helm and apply the changes by upgrading your leap-backend Helm release:

helm upgrade --install -f ./backend.values.yaml leap-backend https://0xacab.org/api/v4/projects/6134/packages/generic/helm-chart/v0.1.2/provider-backend-v0.1.2.tgz

Next you need to restart the menshen pod for the changes to take effect. First find out the pod-name by running:

kubectl get pods

Copy the name of the pod that starts with ‘leap-backend-provider-backend-menshen’ and delete it with:

kubectl delete pod <pod-name>

K3s will immediately spin up a new menshen pod containing the changes. You can check if everything worked out smoothly by opening https://your.domain.com/api/5/service in your browser and seeing if your location shows up correctly in the “locations” part.
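
The same check works from a shell, assuming the response contains a locations field as described above and jq is installed:

curl -s https://your.domain.com/api/5/service | jq .locations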

2.3.6. Connect to your gateway with Bitmask

Hooray, everything is ready to connect to your gateway with the Bitmask app. In the provider choice menu add a new provider by typing <your.domain.com> in the URL field. If everything worked out, Bitmask should connect you to your newly provisioned gateway showing the correct location. You can check that your IP actually changed by opening https://myip.wtf in your browser 🎉.

2.4. Installing gateway monitoring with Grafana Alloy

In order to monitor the health of the gateway and get statistics like number of connected users, the next steps guide you through installing Grafana Alloy on the gateway cluster. It will collect metrics from the gateway node and report them to Prometheus running on the backend cluster. This setup allows you to monitor gateway metrics in the same Grafana instance as backend metrics.

2.4.1. Configuring alloy.values.yaml

This file lets you configure the Grafana Alloy deployment. Fill in the Prometheus subdomain created in this step during backend monitoring installation and all other lines marked with ⚠️.

For further Alloy configuration options see the official documentation.

2.4.2. Installing Grafana Alloy using the install script

In order to write to the secured Prometheus endpoint on your backend, Grafana Alloy uses basic authentication with a username and password. The username is given in alloy.values.yaml; the password will be made available as a secret. It is important to use the same password you created in this step during the backend monitoring installation preparations.

Again, replace <your-password> with the secure password and run:

export REMOTE_WRITE_PASSWORD=<your-password>

Next, the installation script will create the secret containing the authentication password for the communication with backend Prometheus and install Grafana Alloy.

In order for it to work you need access to your gateway cluster, so make sure to stop the port-forwarding to the backend cluster. Change to clusters/backend and run

./scripts/access_cluster.sh --stop

Then return to clusters/gateway and access the cluster:

eval $(./scripts/access_cluster.sh --start)

Finally run the installation script:

./scripts/deploy_alloy.sh

After deployment you can check the newly running pod:

kubectl get pods

You should see the alloy pod in addition to the 4 leap-gateway pods.

2.4.3. Monitoring your gateways using the Grafana dashboard

Since Grafana is running on the backend cluster, in order to access it, you first need to stop the port-forwarding to your gateway cluster. Change to clusters/gateway and run

./scripts/access_cluster.sh --stop

Then follow the instructions in this step to port-forward the Grafana instance running on your backend cluster.

Open the ‘Gateway metrics’ dashboard to see data of your gateway(s). Alloy is scraping metrics from the deployments and nodes running on your gateway cluster and writing them to the Prometheus endpoint on your backend cluster, making them visible in Grafana. If everything went well you should be able to see the number of connected clients and information on traffic and load, making this a vital dashboard for monitoring the health of your gateways.

3. Secrets Management with git-crypt

3.1. Setting up git-crypt

3.1.1. Initializing git-crypt (for repository maintainers)

If you’re setting up git-crypt for the first time in this repository:

Start by initializing git-crypt in the repository root:

git-crypt init

Next, configure which files should be encrypted. Add encryption rules to a .gitattributes file. For this repository, you should include these patterns:

  • secrets/** - All files in the secrets directory
  • *.tfstate - Terraform state files

Your .gitattributes file should look like:

secrets/** filter=git-crypt diff=git-crypt
*.tfstate filter=git-crypt diff=git-crypt

This will ensure all files in secrets/ and all *.tfstate files are encrypted.
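
You can verify that a path matches these rules with git check-attr (the file path here is only an example):

git check-attr filter secrets/example.key
# expected output: secrets/example.key: filter: git-crypt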

Add team members’ GPG keys to allow them to decrypt secrets

To give repository access to a new person, simply add their GPG public key:

git-crypt add-gpg-user <gpg-key-id>

This command grants the user access to encrypted files. Before running it, make sure the team member’s public GPG key is available in your local keyring.

You can import their public key directly with:

gpg --import /path/to/team-member-public-key.asc

Alternatively, if you know their GPG key fingerprint, you can fetch it from a keyserver (e.g., keys.openpgp.org) with:

gpg --recv-keys <fingerprint>

To check that the key is present in your keyring:

gpg --list-keys <user-email-or-keyid-or-fingerprint>

Removing a Team Member’s Access

To remove a user’s access to encrypted files, first remove their GPG key from the repository with:

git-crypt remove-gpg-user <gpg-key-id>

Important: Removing a user only prevents them from decrypting new changes that are committed and pushed after their removal. They will still have access to previously encrypted secrets and to any data they already decrypted while they had access. To fully revoke access, rotate the repository secrets so only authorized users have access to the new encrypted files.

For further security, you may wish to re-encrypt secrets or rotate sensitive credentials if a user leaves the team.

Unlocking the Repository

After cloning, you’ll need to unlock the repository to access encrypted files:

# If you have GPG access (preferred method)
git-crypt unlock

# Or if you received a keyfile from a team member
git-crypt unlock /path/to/keyfile

Checking Repository Status

# Check if repository is unlocked
git-crypt status

# See which files are encrypted
git-crypt status -e

Working with Encrypted Files

Once unlocked, encrypted files work transparently: edit, commit, and push normally. The files are automatically encrypted when committed and decrypted when checked out.
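
To convince yourself that a committed file really is stored encrypted, inspect the blob in git history rather than the (decrypted) working copy; git-crypt blobs start with a NUL byte followed by “GITCRYPT” (the path is illustrative):

git show HEAD:secrets/example.key | head -c 16 | xxd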

Important: Always ensure the repository is unlocked before working with files in the secrets/ directory or any Terraform state files.