johnwfinigan.github.io

Using SSH over IAP TCP Forwarding to Build Compute Images with Packer in Google Cloud Build

Previously I wrote about using Google Identity-Aware Proxy (IAP) based SSH to run Ansible within Google Cloud Build without needing a firewall opening or network peering between Cloud Build and the target VMs, and without the target VMs needing public IPs. In this post I’ll show how to use IAP TCP forwarding to build compute images using HashiCorp Packer running in Cloud Build, again without direct networking between Cloud Build and the ephemeral VM that Packer uses for the image build.

That previous post documents the IAM and firewall rules needed for full IAP SSH to work. Here, however, we are using only IAP TCP forwarding, and SSH authentication is handled by Packer itself instead of OS Login, so only a subset of those requirements applies: the GCE firewall must allow IAP to reach the VM, and the Cloud Build service account must have IAM rights to open a tunnel to the VM (roles/iap.tunnelResourceAccessor).

Example cloudbuild.yaml:

steps:
  - name: 'hashicorp/packer'
    entrypoint: sh
    args:
      - '-c'
      - |
          cp $(which packer) /workspace/
          chmod 555 /workspace/packer

  - name: 'gcr.io/google.com/cloudsdktool/google-cloud-cli:slim'
    env:
      - 'PACKER_NO_COLOR=true'
    entrypoint: bash
    args:
      - '-c'
      - |
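          set -euo pipefail
          # gcloud suggests installing NumPy into its own Python for better IAP tunnel throughput
          $(gcloud info --format="value(basic.python_location)") -m pip install numpy
          python3 -m pip install ansible
          touch ./log
          # In the background: wait until Packer logs that the VM exists, give it
          # time to boot, then forward its port 22 to 127.0.0.1:22222
          ( while ! grep -Fq "Instance has been created" ./log ; do
              echo "waiting to start tunnel" ;
              sleep 5 ;
            done ;
            sleep 60 ;
            gcloud compute start-iap-tunnel packer-${BUILD_ID} 22 --local-host-port=127.0.0.1:22222 --zone=${_BUILD_ZONE} ) &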
          /workspace/packer build \
            -var zone=${_BUILD_ZONE} \
            -var instance_name=packer-${BUILD_ID} \
            my_packerfile.pkr.hcl |& tee ./log

options:
  logging: CLOUD_LOGGING_ONLY
timeout: 3600s

Essentially, IAP TCP forwarding makes port 22 on the target VM appear at port 22222 inside the Cloud Build runtime, and directives in the packerfile tie this together, as shown below. In Cloud Build, $BUILD_ID is a built-in substitution, but $_BUILD_ZONE is a user-supplied one. I pass both the zone and the instance name into Packer because the IAP tunnel command and the Packer build have to agree on the zone and on the build VM’s name. Your packerfile will contain something like this:

source "googlecompute" "my_build" {
  ...
  ...
  zone                    = "${var.zone}"
  disable_default_service_account = true
  instance_name           = "${var.instance_name}"
  ssh_host                = "127.0.0.1"
  ssh_port                = "22222"
  pause_before_connecting = "60s"
  metadata = { 
    enable-oslogin = "FALSE"
  }
  ...
  ...
}
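
Because $_BUILD_ZONE is user-supplied, it has to be given a value when the build is submitted (or configured on the trigger). As a minimal sketch, assuming the config file is named cloudbuild.yaml and sits in the current directory, a small wrapper might look like this. The function name submit_image_build is hypothetical and the zone is only an example value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit the image build with the zone passed in
# as the user-defined substitution _BUILD_ZONE.
submit_image_build() {
  if [[ -z "${1:-}" ]]; then
    echo "usage: submit_image_build ZONE" >&2
    return 2
  fi
  local zone=$1
  # The trailing "." uploads the current directory as the build source.
  gcloud builds submit \
    --config=cloudbuild.yaml \
    --substitutions="_BUILD_ZONE=${zone}" \
    .
}
```

Calling `submit_image_build us-central1-a` would then start the build with both the tunnel and the Packer VM pinned to that zone.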

Notably, this is not the prettiest shell scripting. There are probably race conditions in it, and some of the inserted waits may not actually be needed to avoid them. However, I’ve run a few dozen Linux image builds successfully using this code, and have not experienced a failure to connect yet.
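
If you want to tighten this up, one option is to replace the open-ended loop and the fixed sleep with a bounded wait and a retry. The following is a minimal sketch, not the code used for the builds described above. The function names wait_for_log_line and retry are hypothetical, and it assumes that gcloud compute start-iap-tunnel exits with a non-zero status when it cannot yet reach the VM. It is written as a standalone script file; if inlined into cloudbuild.yaml, bash variables may need escaping as $$ to avoid Cloud Build's substitution syntax.

```shell
#!/usr/bin/env bash
# Sketch only: bounded replacements for the polling loop and fixed sleep.

# Wait until a fixed string appears in a file, or give up after
# a timeout in seconds (default 600).
wait_for_log_line() {
  local pattern=$1 file=$2 timeout=${3:-600}
  local deadline=$(( SECONDS + timeout ))
  until grep -Fq -- "$pattern" "$file" 2>/dev/null; do
    if (( SECONDS >= deadline )); then
      echo "timed out waiting for '$pattern' in $file" >&2
      return 1
    fi
    sleep 1
  done
}

# Run a command up to N times, sleeping DELAY seconds between failed
# attempts. Returns 0 as soon as the command succeeds, 1 otherwise.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    echo "attempt ${i}/${attempts} failed: $*" >&2
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

With these in place, the background subshell could become `( wait_for_log_line "Instance has been created" ./log 900 && retry 10 15 gcloud compute start-iap-tunnel packer-${BUILD_ID} 22 --local-host-port=127.0.0.1:22222 --zone=${_BUILD_ZONE} ) &`. The tunnel command blocks for as long as the tunnel is up, so retry only loops if it exits early with an error.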

Unlike my Ansible example, here I chose not to rely on any custom containers, and instead assemble everything needed from well-known images.

As a bonus, here is some Terraform you may be able to adapt to set up your firewall and IAM to allow IAP tunneling to your VMs. The source range 35.235.240.0/20 is the range IAP uses for TCP forwarding:

resource "google_compute_firewall" "allow-iap-ssh" {
  name    = "allow-iap-ssh"
  network = google_compute_network.FIXME.name
  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
  source_ranges = ["35.235.240.0/20"]
  priority      = "1000"
}

module "image-cloudbuild" {
  source       = "terraform-google-modules/service-accounts/google"
  names        = ["image-cloudbuild"]
  display_name = "image-cloudbuild"
  project_roles = [ 
    "FIXME_PROJECT=>roles/cloudbuild.builds.builder",
    "FIXME_PROJECT=>roles/compute.instanceAdmin.v1",
    "FIXME_PROJECT=>roles/compute.networkUser",
    "FIXME_PROJECT=>roles/iap.tunnelResourceAccessor",
  ]
}