Set up Podlily
To get started, clone the podlily repository:
git clone https://0xacab.org/leap/container-platform/podlily.git
The repository contains 4 directories:
- clusters: A structure to contain all the files you need to provision and organize the backend and gateway k3s clusters. You will move around in this directory to install, deploy and manage your clusters.
- scripts: Helper scripts, e.g. scripts for certificate and secret creation used for TLS communication between backend and gateways.
- templates: A copy of the structure in clusters containing template files and helper scripts to help you get started with the deployment.
- docs: Documentation on podlily: architecture, TLS certificates, scripts. If you run into problems following this README, check out the file on Troubleshooting.
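Based on the directory descriptions above, the layout looks roughly like this (illustrative sketch; exact contents may differ):

```text
podlily/
├── clusters/    # provision and organize the backend and gateway k3s clusters
├── scripts/     # helper scripts, e.g. for certs and secrets creation
├── templates/   # template files mirroring the clusters structure
└── docs/        # architecture, TLS certificates, scripts, troubleshooting
```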
This README guides you through the process of setting up a leap-backend k3s cluster and a leap-gateway k3s cluster and installing monitoring on them.
It directly relies on these 4 repositories in the leap space:
- terraform-hetzner-k3s-vpn: Terraform module with resources to provision backend and gateway k3s clusters on Hetzner
- terraform-ovh-k3s-vpn: Terraform module with resources to provision backend and gateway k3s clusters on OVH
- leap-backend: Helm chart for leap-backend
- leap-gateway : Helm chart for leap-gateway
Please check out their documentation for additional information.
You need to have the following installed:
- helm3 (https://helm.sh/de/docs/intro/install/)
- terraform (https://developer.hashicorp.com/terraform/install)
- kubectl (https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/)
- git-crypt (for accessing encrypted files) - Installation guide
- gnupg (for GPG key management) - Installation guide
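To verify the prerequisites are on your PATH before you start, you can run a quick check like the following (a minimal sketch; adjust the tool list to your setup):

```shell
#!/bin/sh
# Report whether each required tool is installed and on the PATH.
for tool in helm terraform kubectl git-crypt gpg; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok:      $tool"
  else
    echo "missing: $tool"
  fi
done
```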
You can initialise the backend cluster with the default configuration files and directory layout by running the initialization script from the repository root:
bash scripts/init-cluster.sh backend
This will create a new directory at clusters/backend with the necessary helm, scripts, and terraform subdirectories populated from templates/backend.
If you want to force overwrite any existing files, add --force:
bash scripts/init-cluster.sh --force backend
The script also supports interactive mode: run bash scripts/init-cluster.sh with no arguments and follow the prompts.
Once the layout exists under clusters/backend, continue with Terraform in the next section.
Change into clusters/backend/terraform. This directory contains the Terraform file that defines your backend cluster. Depending on the cloud provider you chose, follow the steps in the README of either terraform-hetzner-k3s-vpn or terraform-ovh-k3s-vpn to provision a multi-node backend cluster. At the end you should have a k3s cluster up and running.
In order to have easy access to the cluster from your machine you can use the script clusters/backend/scripts/access_cluster.sh. It will create an SSH tunnel to the controller node, enabling you to create and manage cluster resources from your local machine. You can find the documentation for this script in the docs folder.
To use it, change into clusters/backend and run
eval $(./scripts/access_cluster.sh --start)
Check if your cluster is healthy and you have access to it by running
kubectl get nodes -o wide
If you chose defaults, you should see the 3 nodes you provisioned.
Create an A-record subdomain and point it to the public IP of the controller node of your backend cluster. You can find that IP in the cloud provider console, in the output of your terraform apply command, or by searching for ‘k3s_controller_ip’ in clusters/backend/terraform/terraform.tfstate.
Several services on the internet let you create subdomains for free.
Configure your provider in the ‘Provider configuration’ section.
Also check all lines with ⚠️:
Fill in your newly created subdomain in domain.
Give a valid email in acme/email for reminders for TLS certificate renewal.
Find more information about what is deployed and configuration options in the README.
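Taken together, the ⚠️ lines in your backend values file end up looking something like this (hypothetical fragment; key names and nesting may differ in your chart version, so check the chart README):

```yaml
# Hypothetical excerpt of clusters/backend/helm/backend.values.yaml
domain: "thisis.yourdomain.com"   # the A-record subdomain you just created
acme:
  email: "admin@yourdomain.com"   # receives TLS certificate renewal reminders
```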
This script will generate 2 certificate chains: one for the Provider CA and one for the Menshen CA.
In the base directory run the script:
bash scripts/gen-provider-certs.sh
It will prompt you for 3 things:
- Certificate output directory: optionally choose a different output directory for the secrets created by the script
- Provider CA subject: optionally you can set a subject for the CA generation
- Menshen domain: set this to the domain created in this step
You will see the created certificates in the output directory you chose or in a newly created top-level directory called secrets.
The next script will create secrets on the backend cluster containing the newly created certs. Change into the top-level directory and run:
bash scripts/create-backend-secrets.sh
You will be prompted two times:
- Certificates directory: the one created in the last step, secrets by default
- Kubernetes namespace: the namespace you want to create the secrets in. It has to match the namespace you will deploy menshen in but ‘default’ is fine.
You need to create a Kubernetes Secret containing a key used for communication between menshen running on backend cluster and menshen-agent running on gateway cluster. You can use the provided helper script to create this secret in your backend cluster:
bash scripts/create-menshen-shared-secret.sh
It asks for a secrets directory (where the shared key file lives or is created) and the Kubernetes namespace. The script can generate a secure key for you, or you can enter your own. The namespace should be the same namespace where menshen will run (often default).
This creates the secret menshen-agent-shared-key, which the backend chart references through menshenAgentSharedSecret.name in Helm values.
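In the backend Helm values, that reference looks roughly like the following (hypothetical fragment based on the secret name above; verify the exact key path against the chart):

```yaml
menshenAgentSharedSecret:
  name: menshen-agent-shared-key
```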
Now we can finally deploy all backend services by changing into clusters/backend/helm and running
helm upgrade --install -f ./backend.values.yaml leap-backend https://0xacab.org/api/v4/projects/6134/packages/generic/helm-chart/v0.1.2/provider-backend-v0.1.2.tgz
You can check if everything runs smoothly by checking the pods:
kubectl get pods
You should see 2 pods running: provider-backend-menshen and traefik.
You can also check if the services are running as expected by playing around with the Menshen API. You can find the documentation here.
Create another A-record subdomain pointing to the same IP address of your controller node. You can name it however you like, but you need to configure it in clusters/backend/helm/monitoring.values.yaml. For now we’ll call it <prom.yourdomain.com>.
Fill in your newly created prometheus subdomain and all other lines marked with ⚠️.
Find additional information in the official kube-prometheus-stack README.
You need to set up a secret in order to secure the prometheus endpoint on your backend that grafana alloy on the gateways will write to.
For now you can use an environment variable. Make sure to use a sufficiently long random string, e.g. by using pwgen -s 30 1. Then replace <your-password> with the secure password and run:
export REMOTE_WRITE_PASSWORD=<your-password>
Make sure to store that password somewhere safe as you will need it later.
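If pwgen is not installed, you can generate a comparable random password straight from /dev/urandom (a minimal sketch using only standard shell tools):

```shell
#!/bin/sh
# Generate a 30-character alphanumeric password and export it for the
# monitoring deployment script.
REMOTE_WRITE_PASSWORD=$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 30)
export REMOTE_WRITE_PASSWORD
echo "$REMOTE_WRITE_PASSWORD"   # remember to store this somewhere safe
```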
The script will create the secret to secure your prometheus endpoint, install kube-prometheus-stack and create a custom dashboard in grafana for gateway metrics.
In order for it to work you need access to your backend cluster, so make sure to run clusters/backend/scripts/access_cluster.sh again if necessary. When kubectl commands show your cluster resources, change into clusters/backend and run:
./scripts/deploy_monitoring.sh
After deployment you can check the newly running pods:
kubectl get pods -n monitoring
Since the kube-prometheus-stack chart and all of the monitoring resources have been installed in a new namespace monitoring, make sure to always add -n monitoring to kubectl commands intended for monitoring resources.
You can access grafana by using the provided script. It will output your grafana password and start port-forwarding the grafana instance that is running on your cluster to localhost. In clusters/backend run:
bash scripts/port_forward_grafana.sh
In order to look at the dashboards simply open this port in your browser: http://localhost:3000/
On your first visit, log in with the username ‘admin’ and the password output by the script.
Under dashboards you will find all the default dashboards of kube-prometheus-stack, which give you tons of information on the health of your cluster, plus one for monitoring gateway metrics. The latter will be empty until you have deployed a gateway, which is the next step.
You can initialise the gateway cluster with the default configuration files and directory layout by running the initialization script from the repository root:
bash scripts/init-cluster.sh gateway [optional-suffix] [--provider ovh|hetzner]
The script also supports interactive mode: run bash scripts/init-cluster.sh with no arguments and follow the prompts.
Change into clusters/gateway/terraform. Here is the terraform file that defines your gateway cluster. Depending on the cloud provider you chose, follow the steps in the README of either terraform-hetzner-k3s-vpn or terraform-ovh-k3s-vpn to create a single-node gateway cluster. At the end you should have a k3s cluster with one node up and running.
Before accessing the gateway cluster you have to stop the port-forwarding to your backend cluster if it is still running. Change to clusters/backend and run
./scripts/access_cluster.sh --stop
Then return to clusters/gateway and access your newly provisioned cluster:
eval $(./scripts/access_cluster.sh --start)
Again, check that your one-node cluster is up and running with
kubectl get nodes -o wide
You can change the configuration of all gateway deployments in gw.values.yaml: openvpn, obfsvpn, ovpn-addons and menshenAgent. For further help on configuring these services please refer to the leap-gateway documentation.
In order for the gateway deployment to work, it is necessary to fill in all lines marked with ⚠️:
- obfsvpn.OBFSVPN_LOCATION and menshenAgent.location: The location of your provisioned gateway server in lowercase, e.g. london
- obfsvpn.MENSHEN_URL or menshenAgent.menshenUrl: The URL of the domain name you registered for your backend controller node, e.g. ‘https://thisis.yourdomain.com’
- obfsvpn.OBFSVPN_HOSTNAME: Choose a unique identifier for the gateway. It will be handed over to the client, so the name should not reveal too much about the underlying infrastructure.
- obfsvpn.OBFS4_PUBLIC_HOST and menshenAgent.externalIp: The floating IP assigned to your gateway server. Look it up in the console of your cloud provider. If you don’t have a floating IP assigned, just take the public IP of your gateway controller node.
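Putting those ⚠️ values together, the relevant part of gw.values.yaml might look like this (hypothetical fragment with placeholder values; check the leap-gateway chart documentation for the exact key names and structure):

```yaml
# Hypothetical excerpt of clusters/gateway/helm/gw.values.yaml
obfsvpn:
  OBFSVPN_LOCATION: "london"
  MENSHEN_URL: "https://thisis.yourdomain.com"
  OBFSVPN_HOSTNAME: "gw-example-1"       # unique, non-revealing identifier
  OBFS4_PUBLIC_HOST: "203.0.113.10"      # floating or public IP of the gateway
menshenAgent:
  location: "london"
  menshenUrl: "https://thisis.yourdomain.com"
  externalIp: "203.0.113.10"
```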
This script creates openvpn gateway certificates, signed by the Provider CA, which clients use to verify the authenticity of gateways.
It is necessary that the provider certificates have already been created. We’ll assume they are available in the top-level secrets directory. If they are missing, run scripts/gen-provider-certs.sh first. Find the instructions for this script here.
In the top-level podlily folder run
bash scripts/gen-gateway-certs.sh
You will be prompted:
- Provider CA certificate path: The path where the certs are stored. The default should be right if you haven’t changed the path in the gen-provider-certs script
- Menshen CA certificate path: same
- Provider CA key path: same
- Gateway domain or a unique name for the gateway: Choose a unique name for your gateway
- Certificates output directory: The default is good but you can choose a different place if you want
If everything works as expected you should see a new folder under secrets named after your gateway name.
The next script will create secrets from the certificates that were just created so the deployments on the k3s cluster can consume them.
Make sure you have access to your gateway cluster (kubectl get nodes works) and run
bash scripts/create-gateway-secrets.sh
You will be prompted:
- Certificates directory: If you haven’t changed it in the previous scripts the default is correct
- Kubernetes namespace: The namespace the secrets will be created in. Must match the namespace leap-gateway will be deployed in. ‘default’ is fine
- Gateway domain: Select the gateway name or domain for which you generated the certs in the previous step
You need to create a Kubernetes Secret on the gateway cluster containing the same key as in this step. Again, you can simply use the provided helper script to create this secret in your gateway cluster from the file that was created during backend secrets creation:
bash scripts/create-menshen-shared-secret.sh
It asks for a secrets directory (where the shared key file lives or is created) and the Kubernetes namespace. If the secret already exists from backend secret creation, the script will simply take that secret and make it available to menshen-agent running on the gateway cluster. The leap-gateway chart references that secret through menshenAgentSharedSecret.name in gw.values.yaml.
Change into clusters/gateway/helm, fill in your gateway name or domain in the following command and run it:
helm upgrade --install -f ./gw.values.yaml <your-gateway-name> https://0xacab.org/api/v4/projects/6095/packages/generic/helm-chart/v0.1.4/leap-gateway-v0.1.4.tgz
You can check if everything runs smoothly by checking the pods:
kubectl get pods
You should see 4 pods running: menshen-agent, obfsvpn, openvpn-tcp and openvpn-udp.
In order for the bitmask client to suggest the closest gateway for each user and show the correct gateway location, the menshen instance running on the backend cluster needs some general information about the gateway location. This step is about providing menshen with this information.
Stop the port-forwarding to the gateway cluster by changing into clusters/gateway and running:
./scripts/access_cluster.sh --stop
Then return to clusters/backend and access this cluster:
eval $(./scripts/access_cluster.sh --start)
General location information is configured via the backend Helm chart values. Edit the locations section under eip: in your backend.values.yaml (usually found at clusters/backend/helm/backend.values.yaml or similar). Add or edit your locations as in the following example:
eip:
  locations:
    seattle:
      country_code: "US"
      display_name: "Seattle"
      hemisphere: "N"
      timezone: "-7"
    amsterdam:
      country_code: "NL"
      display_name: "Amsterdam"
      hemisphere: "N"
      timezone: "+1"
Make sure the location key is all lowercase and matches the variables obfsvpn.OBFSVPN_LOCATION and menshenAgent.location set in your gateway’s Helm values (gw.values.yaml).
The display_name field will appear in the Bitmask app as the city name, and the timezone should reflect the UTC offset.
After updating your backend.values.yaml, change into clusters/backend/helm and apply the changes by upgrading your leap-backend Helm release:
helm upgrade --install -f ./backend.values.yaml leap-backend https://0xacab.org/api/v4/projects/6134/packages/generic/helm-chart/v0.1.2/provider-backend-v0.1.2.tgz
Next you need to restart the menshen pod for the changes to take effect. First find out the pod-name by running:
kubectl get pods
Copy the name of the pod that starts with ‘leap-backend-provider-backend-menshen’ and delete it with:
kubectl delete pod <pod-name>
You can check if everything worked out smoothly by opening https://your.domain.com/api/5/service in your browser and verifying that your location shows up correctly in the “locations” section.
Hooray, everything is ready to connect to your gateway with the Bitmask app! In the provider choice menu, add a new provider by typing <your.domain.com> in the URL field. If everything worked out, Bitmask should connect you to your newly provisioned gateway and show the correct location. You can check that your IP actually changed by opening https://myip.wtf in your browser 🎉.
Fill in the prometheus subdomain created in this step during backend monitoring installation and all other lines marked with ⚠️.
For further alloy configuration options see the official documentation.
In order to write to the secured prometheus endpoint on your backend, grafana alloy uses basic authentication with a username and password. The username is given in alloy.values.yaml; the password will be made available as a secret. It is important to use the same password you created in this step during the backend monitoring installation preparations.
Again, replace <your-password> with the secure password and run:
export REMOTE_WRITE_PASSWORD=<your-password>
The script will create the secret containing the authentication password for the communication with backend prometheus and install grafana alloy.
In order for it to work you need access to your gateway cluster, so make sure to stop the port-forwarding to the backend cluster and run clusters/gateway/scripts/access_cluster.sh again. Then change into clusters/gateway and run:
./scripts/deploy_alloy.sh
After deployment you can check the newly running pod:
kubectl get pods
You should see the alloy pod in addition to the 4 leap-gateway pods.
Since grafana is running on the backend cluster, you first need to stop the port-forwarding to your gateway cluster before you can access it. Change to clusters/gateway and run
./scripts/access_cluster.sh --stop
Then follow the instructions in this step to port-forward the grafana instance running on your backend cluster.
Open the ‘Gateway metrics’ dashboard to see data of your gateway(s). Alloy is scraping metrics from the deployments and nodes running on your gateway cluster and writing them to the prometheus endpoint on your backend cluster, making them visible in grafana. If everything went well you should be able to see the number of connected clients and information on traffic and load, making this a vital dashboard for monitoring the health of your gateways.
If you’re setting up git-crypt for the first time in this repository:
Start by initializing git-crypt in the repository root:
git-crypt init
Next, configure which files should be encrypted. Add encryption rules to a .gitattributes file. For this repository, you should include these patterns:
- secrets/**: All files in the secrets directory
- *.tfstate: Terraform state files
Your .gitattributes file should look like:
secrets/** filter=git-crypt diff=git-crypt
*.tfstate filter=git-crypt diff=git-crypt
This will ensure all files in secrets/ and all *.tfstate files are encrypted.
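The two rules above can also be added with a small script like the following (an illustrative sketch; run it from the repository root):

```shell
#!/bin/sh
# Append git-crypt encryption rules for the secrets directory and
# Terraform state files to .gitattributes, skipping rules that
# already exist so the script can be re-run safely.
for rule in \
  'secrets/** filter=git-crypt diff=git-crypt' \
  '*.tfstate filter=git-crypt diff=git-crypt'
do
  grep -qxF "$rule" .gitattributes 2>/dev/null || echo "$rule" >> .gitattributes
done
```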
To give repository access to a new person, simply add their GPG public key:
git-crypt add-gpg-user <gpg-key-id>
This command grants the user access to encrypted files. Before running it, make sure the team member’s public GPG key is available in your local keyring.
You can import their public key directly with:
gpg --import /path/to/team-member-public-key.asc
Alternatively, if you know their GPG key fingerprint, you can fetch it from a keyserver (e.g., keys.openpgp.org) with:
gpg --recv-keys <fingerprint>
To check that the key is present in your keyring:
gpg --list-keys <user-email-or-keyid-or-fingerprint>
To remove a user’s access to encrypted files, first remove their GPG key from the repository with:
git-crypt remove-gpg-user <gpg-key-id>
Important: Removing a user only prevents them from decrypting new changes that are committed and pushed after their removal. They will still have access to any secrets or other data they already decrypted while they had access. To fully revoke access, rotate the repository secrets so only authorized users have access to the new encrypted files.
For further security, you may wish to re-encrypt secrets or rotate sensitive credentials if a user leaves the team.
After cloning, you’ll need to unlock the repository to access encrypted files:
# If you have GPG access (preferred method)
git-crypt unlock
# Or if you received a keyfile from a team member
git-crypt unlock /path/to/keyfile
# Check if repository is unlocked
git-crypt status
# See which files are encrypted
git-crypt status -e
Once unlocked, encrypted files work transparently: edit, commit, and push normally. The files are automatically encrypted when committed and decrypted when checked out.
Important: Always ensure the repository is unlocked before working with files in the secrets/ directory or any terraform state files.