DevOps Engineer (AI-First)
Infrastructure engineer position. Design CI/CD pipelines, automate deployments, and build enforcement tooling with AI-assisted workflows.
To apply, fill out the form: https://docs.google.com/forms/d/1wcebtAFzZTGu76Co6W-HyzL8aHbhwItH2kfFLa3G17s/viewform
The form includes a section for your test task submission. Your answer should be a link to a public GitHub repository. Read the task description below before filling out the form.
About the Position
We are looking for a DevOps Engineer who works AI-first: Claude, Cursor, Windsurf, or similar tools are your primary development instrument, not a fallback. You will design CI/CD pipelines, automate infrastructure, and build enforcement tooling — first in sandbox projects under the guidance of engineers and AI capability leads, then on paid commercial projects with full responsibility for deadlines, quality, and deliverables.
Why Helm and Ansible. Helm charts are declarative — templates describe the desired Kubernetes state, not the steps to reach it. Ansible orchestrates deployment order and validates the result. Both are Infrastructure as Code that maps directly to specifications, making them natural for SDD traceability and easier for LLMs to work with.
An engineer with AI knows why the code works — and can prove it through specifications, tests, and traceability. A vibe coder knows that it works, until it doesn’t. We hire engineers.
For details about how our internship works, check out our Internship Overview.
Learn more about our team at https://foreachpartners.com/.
How We Work
Our engineering philosophy is Parsimony-Driven Development (PDD) — every artifact, instruction, and line of code must justify its existence. If removing it introduces no ambiguity and loses no meaning, it should not be there.
We implement PDD through Specification-Driven Development (SDD) — specification precedes implementation. Before writing code, you formulate requirements, derive contracts, and only then implement. AI tools accelerate every stage but replace none.
SDD rests on four pillars: Traceability, DRY, Deterministic Enforcement, and Parsimony. You MUST read and understand these principles before starting the test task:
Required reading: Specification-Driven Development: Four Pillars
AI generates code. You are responsible for it. The competence we evaluate is not how fast you produce output — it is how well you verify correctness, enforce constraints, and catch what the model gets wrong. The operator owns the result, not the agent.
What You Will Do
- Design and maintain CI/CD pipelines that enforce specification compliance
- Build and manage Kubernetes deployments using Helm charts
- Orchestrate infrastructure provisioning and deployment with Ansible
- Build deterministic validation scripts that run on every commit
- Configure monitoring and alerting (Grafana, Prometheus)
- Review and correct AI-generated infrastructure code for security and reliability
What We’re Looking For
- AI Proficiency:
- Confident user of at least one AI IDE (Cursor, Windsurf, or Claude Code)
- Understanding of prompt engineering: how to structure instructions, provide context, and iterate on output
- Understanding of context engineering: what to include in the model’s context and what to leave out
- Ability to decompose tasks for AI and critically evaluate output
- DevOps Foundations:
- Comfortable with Linux CLI and shell scripting
- Basic understanding of Docker and container orchestration
- Familiarity with at least one CI/CD platform (GitHub Actions, GitLab CI)
- Engineering Mindset:
- Understanding of why deterministic enforcement matters
- Comfort with Git workflows and branching strategies
- Nice-to-Have:
- Experience with Kubernetes and Helm
- Experience with Ansible for infrastructure automation
- Familiarity with monitoring stacks (Prometheus, Grafana)
- Understanding of networking fundamentals and security basics
Why Apply?
- AI-First Culture: Work in a team where AI tools are the norm, not an experiment
- Structured Growth: Start in sandbox projects, prove your quality control skills, then move to paid commercial work
- Career Path: Outstanding interns transition to permanent roles with full engineering responsibility
Test Task
To apply, complete the test task below. This is how we evaluate your ability to work with AI tools on a real engineering problem.
You MUST use AI (Claude, Cursor, Windsurf, or similar) as your primary development tool. Manual coding without AI assistance is not what we’re evaluating.
Time budget: 4–6 hours with AI tools.
The Big Picture
You are building one piece of a larger SDD toolchain. Multiple interns across different specializations work on the same product:
- Rust service — scans codebases, computes traceability metrics, serves data via REST API
- Next.js dashboard — web interface consuming the API
- Flutter app — mobile interface consuming the API
- DevOps infrastructure (your task) — deploys and monitors the whole stack
All parts share a common API specification: SDD Navigator API · download YAML
SDD Navigator — Kubernetes Deployment
Deploy the full SDD Navigator stack to Kubernetes using Helm charts orchestrated by Ansible, with CI validation for all infrastructure code. The stack consists of a Rust API service, PostgreSQL database, and Next.js frontend served by nginx.
You are given a requirements.yaml below — you do NOT write your own. Your job is to implement infrastructure that satisfies these requirements and annotate every artifact with # @req references back to them.
Provided: requirements.yaml
Copy this file into your repository root. This is the specification your infrastructure MUST satisfy.
```yaml
requirements:
  - id: SCI-HELM-001
    type: FR
    title: Helm chart for API service
    description: >-
      Chart MUST define a Deployment for the sdd-coverage Rust API
      container with configurable image, replicas, resource limits,
      liveness probe on /healthcheck (initialDelaySeconds: 10,
      periodSeconds: 15), and readiness probe on /healthcheck
      (initialDelaySeconds: 5, periodSeconds: 5).
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-HELM-002
    type: FR
    title: PostgreSQL deployment
    description: >-
      Chart MUST deploy PostgreSQL via Bitnami subchart or custom
      StatefulSet with PVC for data persistence. Database name, user,
      and password MUST be configurable via values.yaml. Credentials
      MUST be stored in a Kubernetes Secret, not in plaintext defaults.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-HELM-003
    type: FR
    title: Frontend deployment
    description: >-
      Chart MUST deploy an nginx container serving the pre-built
      Next.js static export. Image, replicas, and resource limits
      MUST be configurable via values.yaml.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-HELM-004
    type: FR
    title: Ingress routing
    description: >-
      Chart MUST define an Ingress resource routing /api/* to the
      API service and / to the frontend. TLS MUST be configurable
      (enabled/disabled with secretName). Host MUST be configurable
      via values.yaml.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-HELM-005
    type: FR
    title: Secrets and ConfigMaps
    description: >-
      Chart MUST use a Secret for database credentials and a ConfigMap
      for application configuration (API port, database host, log level).
      No credentials in values.yaml defaults — defaults MUST use
      placeholder values that fail visibly if not overridden.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-HELM-006
    type: FR
    title: DRY configuration
    description: >-
      Every configurable value (image tags, ports, resource limits,
      replica counts, hostnames) MUST be defined in values.yaml and
      referenced via templates. No hardcoded values in template files.
      Shared labels MUST use a _helpers.tpl partial.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-ANS-001
    type: FR
    title: Ansible orchestration
    description: >-
      Playbook MUST deploy the Helm chart to a Kubernetes cluster
      in correct order: namespace creation, secrets, database, API
      service, frontend, ingress. Playbook MUST wait for each
      component to reach ready state before proceeding to the next.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-ANS-002
    type: FR
    title: Post-deploy validation
    description: >-
      Ansible MUST include a validation role that runs after
      deployment and verifies: /healthcheck returns HTTP 200 with
      status "healthy", /stats returns HTTP 200, all pods are
      Running, PostgreSQL accepts connections via pg_isready.
      Results MUST be reported as Ansible debug output with
      pass/fail per check.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-ANS-003
    type: FR
    title: Idempotency
    description: >-
      Running the playbook twice MUST produce zero changes on
      the second run. All Ansible tasks MUST report ok (not changed)
      when the desired state already exists.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-CI-001
    type: FR
    title: CI pipeline
    description: >-
      GitHub Actions workflow MUST run helm lint, helm template
      (render and validate), ansible-lint, and yamllint on every
      push. Jobs MUST run in parallel where possible.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-CI-002
    type: FR
    title: Manifest validation
    description: >-
      CI MUST render Helm templates and validate the output against
      Kubernetes schemas using kubeconform or kubeval. Validation
      MUST fail on unknown fields and missing required fields.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-TRACE-001
    type: FR
    title: Traceability annotations
    description: >-
      Every Helm template file, every Ansible task file, and every
      CI job MUST contain a @req annotation comment referencing the
      requirement it implements. A traceability check script MUST
      scan all .yaml, .yml, and .tpl files and report any file
      missing @req annotations.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
  - id: SCI-SEC-001
    type: AR
    title: Security baseline
    description: >-
      API and frontend containers MUST run as non-root user.
      PostgreSQL data volume MUST have restrictive permissions.
      No container SHOULD use the latest tag — all images MUST
      specify explicit version tags.
    createdAt: "2026-03-01T00:00:00Z"
    updatedAt: "2026-03-01T00:00:00Z"
```
The /healthcheck endpoint returns (per the API spec):
```json
{
  "status": "healthy",
  "version": "3.0.0",
  "timestamp": "2026-02-28T12:00:00Z"
}
```
Step 1: Set up the project structure
Study the SDD Navigator API spec to understand the service you are deploying. Copy the provided requirements.yaml into your repository root.
Create the project skeleton:
```
requirements.yaml              # provided above - copy as-is
charts/
  sdd-navigator/               # umbrella Helm chart
    Chart.yaml
    values.yaml
    templates/
      _helpers.tpl             # shared labels, selectors, names
      ingress.yaml             # @req SCI-HELM-004
    charts/
      api/                     # subchart: Rust API service
        Chart.yaml
        values.yaml
        templates/
          deployment.yaml      # @req SCI-HELM-001
          service.yaml         # @req SCI-HELM-001
          configmap.yaml       # @req SCI-HELM-005
          secret.yaml          # @req SCI-HELM-005
      frontend/                # subchart: nginx + static files
        Chart.yaml
        values.yaml
        templates/
          deployment.yaml      # @req SCI-HELM-003
          service.yaml         # @req SCI-HELM-003
ansible/
  playbook.yml                 # @req SCI-ANS-001
  inventory/
    local.yml
  group_vars/
    all.yml
  roles/
    deploy/
      tasks/main.yml           # @req SCI-ANS-001
    validate/
      tasks/main.yml           # @req SCI-ANS-002
scripts/
  check-traceability.sh        # @req SCI-TRACE-001
.github/
  workflows/
    infra-ci.yml               # @req SCI-CI-001, SCI-CI-002
README.md
```
PostgreSQL: use the Bitnami PostgreSQL Helm subchart as a dependency in Chart.yaml, or implement a custom StatefulSet subchart. Both are acceptable — document your choice in README.md.
Step 2: Build the Helm charts
Umbrella chart (charts/sdd-navigator/):
- `Chart.yaml` with dependencies: `api` (local subchart), `frontend` (local subchart), `postgresql` (Bitnami or local subchart)
- `values.yaml` with global values: domain, namespace, image registry prefix, TLS settings
- `templates/_helpers.tpl`: common labels (`app.kubernetes.io/name`, `app.kubernetes.io/instance`, `app.kubernetes.io/version`), selector template, fullname template
- `templates/ingress.yaml`: routes `/api/*` to the API service, `/` to the frontend service
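A minimal `_helpers.tpl` sketch is shown below. The partial names (`sdd-navigator.fullname`, `sdd-navigator.labels`, `sdd-navigator.selectorLabels`) are illustrative conventions, not mandated by the requirements:

```yaml
{{/* @req SCI-HELM-006: shared name and label partials */}}
{{- define "sdd-navigator.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{- define "sdd-navigator.labels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end -}}

{{- define "sdd-navigator.selectorLabels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end -}}
```

Templates then pull labels in with `{{ include "sdd-navigator.labels" . | nindent 4 }}` instead of repeating them per resource.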
API subchart (charts/sdd-navigator/charts/api/):
- Deployment: configurable image, replicas, resource requests/limits, liveness probe (`GET /healthcheck`, `initialDelaySeconds: 10`, `periodSeconds: 15`), readiness probe (`GET /healthcheck`, `initialDelaySeconds: 5`, `periodSeconds: 5`)
- Service: ClusterIP on configurable port
- ConfigMap: `API_PORT`, `DATABASE_URL`, `LOG_LEVEL`
- Secret: database credentials injected as environment variables
- Environment variables referencing ConfigMap and Secret
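One way the container spec in the API `deployment.yaml` might look. Value paths such as `.Values.image.repository` and the ConfigMap/Secret names are assumptions for illustration; only the probe paths and timings are fixed by SCI-HELM-001:

```yaml
# @req SCI-HELM-001, SCI-HELM-005
containers:
  - name: api
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
    envFrom:
      - configMapRef:
          name: "{{ .Release.Name }}-api-config"   # assumed naming scheme
      - secretRef:
          name: "{{ .Values.existingSecret }}"     # assumed value key
    livenessProbe:
      httpGet:
        path: /healthcheck
        port: {{ .Values.service.port }}
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:
      httpGet:
        path: /healthcheck
        port: {{ .Values.service.port }}
      initialDelaySeconds: 5
      periodSeconds: 5
```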
Frontend subchart (charts/sdd-navigator/charts/frontend/):
- Deployment: nginx container serving pre-built static files (use `nginx:1.27-alpine` as default image with configurable tag)
- Service: ClusterIP on port 80
PostgreSQL (via Bitnami dependency or custom subchart):
- Bitnami approach: add `bitnami/postgresql` as a dependency with `auth.database`, `auth.username`, `auth.existingSecret` overrides in the parent `values.yaml`
- Custom approach: StatefulSet with PVC (1Gi default, configurable), readiness probe via `pg_isready`
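If you take the Bitnami route, the dependency declaration in the umbrella `Chart.yaml` might look like this. The version here is a placeholder; pin the exact chart version you actually test against:

```yaml
# @req SCI-HELM-002
dependencies:
  - name: postgresql
    version: "16.0.0"            # placeholder: pin a real, tested chart version
    repository: https://charts.bitnami.com/bitnami
    condition: postgresql.enabled
```

Run `helm dependency update charts/sdd-navigator/` after adding it so the subchart is vendored into `charts/`.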
All charts MUST:
- Have no hardcoded values in templates; everything comes from `values.yaml`
- Include a `# @req SCI-XXX-NNN` comment at the top of each template file
- Pass `helm lint` and `helm template` without errors
- Use explicit image version tags, not `latest`
Step 3: Build the Ansible orchestration
Playbook (ansible/playbook.yml):
- Target: localhost with `connection: local` (uses local kubeconfig)
- Roles: `deploy`, `validate`
Deploy role (ansible/roles/deploy/tasks/main.yml):
- Create namespace (if not exists)
- Create Kubernetes Secret for database credentials
- Run `helm upgrade --install` with `--wait --timeout 5m`
- Wait for all pods to reach Ready state using `kubectl wait`
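A sketch of the deploy role using the `kubernetes.core` collection, which is idempotent out of the box and so helps with SCI-ANS-003. Variable names match the group vars listed below; everything else is an assumption:

```yaml
# @req SCI-ANS-001
- name: Create namespace if it does not exist
  kubernetes.core.k8s:
    api_version: v1
    kind: Namespace
    name: "{{ namespace }}"
    state: present

- name: Deploy umbrella chart and wait for readiness
  kubernetes.core.helm:
    name: "{{ release_name }}"
    chart_ref: "{{ chart_path }}"
    release_namespace: "{{ namespace }}"
    wait: true
    timeout: 5m0s
```

Using modules instead of `ansible.builtin.command` with raw `helm`/`kubectl` calls makes the zero-changes-on-second-run requirement much easier to satisfy.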
Validate role (ansible/roles/validate/tasks/main.yml):
- Check API healthcheck: verify HTTP 200 and `"status": "healthy"`
- Check stats endpoint: verify HTTP 200 with a valid JSON body
- Check all pods Running: `kubectl get pods` with a field selector, expect no non-Running pods
- Check PostgreSQL readiness: `pg_isready` inside the database pod
- Report results as Ansible `debug` output with pass/fail per check
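The healthcheck check in the validate role could be sketched like this. The URL assumes the ingress routes `/api/*` to the API service; adjust the path to your routing, and treat the variable names as placeholders:

```yaml
# @req SCI-ANS-002
- name: Check API healthcheck
  ansible.builtin.uri:
    url: "http://{{ domain }}/api/healthcheck"
    return_content: true
    status_code: 200
  register: health
  ignore_errors: true

- name: Report healthcheck result
  ansible.builtin.debug:
    msg: >-
      healthcheck: {{ 'PASS' if health is succeeded and
      health.json.status == 'healthy' else 'FAIL' }}
```

`ignore_errors` lets the role run every check and report all pass/fail results, rather than aborting on the first failure.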
Group vars (ansible/group_vars/all.yml):
- `namespace`, `release_name`, `chart_path`
- `api_image`, `api_tag`, `api_replicas`
- `frontend_image`, `frontend_tag`
- `db_name`, `db_user` (no passwords in files; use env or vault)
- `domain`, `tls_enabled`
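An illustrative `group_vars/all.yml`; every value here is a placeholder you would replace:

```yaml
# @req SCI-ANS-001
namespace: sdd-navigator
release_name: sdd-navigator
chart_path: "{{ playbook_dir }}/../charts/sdd-navigator"
api_image: ghcr.io/example/sdd-coverage    # placeholder registry/image
api_tag: "3.0.0"
api_replicas: 2
frontend_image: nginx
frontend_tag: 1.27-alpine
db_name: sdd
db_user: sdd
# db password comes from an environment variable or ansible-vault, never from this file
domain: sdd.example.local
tls_enabled: false
```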
All Ansible tasks MUST:
- Have `# @req SCI-ANS-XXX` annotations
- Be idempotent (use `changed_when`/`when` guards where needed)
- Use `ansible.builtin` fully qualified module names
Step 4: Build the CI pipeline
.github/workflows/infra-ci.yml:
Jobs (running in parallel where possible):
- helm-lint: `helm lint charts/sdd-navigator/` with strict mode
- helm-validate: `helm template charts/sdd-navigator/` piped to `kubeconform --strict --summary`
- ansible-lint: `ansible-lint ansible/`
- yamllint: `yamllint -d relaxed .`
- traceability: `bash scripts/check-traceability.sh` (scans all infra files for `@req` annotations)
- summary: runs after all other jobs and produces a consolidated pass/fail report as the step summary
Each job MUST have a # @req SCI-CI-001 or # @req SCI-CI-002 comment.
Traceability check script (scripts/check-traceability.sh):
- Scan all `.yaml`, `.yml`, and `.tpl` files in `charts/` and `ansible/`
- Check each file for at least one `# @req SCI-` annotation
- Extract referenced requirement IDs and validate they exist in `requirements.yaml`
- Report orphan annotations (referencing non-existent requirements)
- Report unannotated files
- Exit code 1 on any violation
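The steps above can be sketched as a bash function. This assumes `requirements.yaml` sits in the repo root passed as the first argument; it is a starting point, not a complete implementation (it does not, for example, parse YAML properly, only greps for ID patterns):

```shell
#!/usr/bin/env bash
# Sketch of scripts/check-traceability.sh. @req SCI-TRACE-001

check_traceability() {
  local root="${1:-.}" fail=0 valid_ids ids id file
  # Valid IDs come from requirements.yaml entries like "- id: SCI-HELM-001".
  valid_ids=$(grep -ohE 'SCI-[A-Z]+-[0-9]+' "$root/requirements.yaml" 2>/dev/null | sort -u)

  while IFS= read -r file; do
    ids=$(grep -ohE '@req +SCI-[A-Z]+-[0-9]+' "$file" | grep -oE 'SCI-[A-Z]+-[0-9]+' | sort -u)
    if [ -z "$ids" ]; then
      echo "MISSING: $file has no @req annotation"
      fail=1
      continue
    fi
    for id in $ids; do
      # Orphan annotation: references an ID absent from requirements.yaml.
      if ! grep -qx "$id" <<<"$valid_ids"; then
        echo "ORPHAN: $file references unknown requirement $id"
        fail=1
      fi
    done
  done < <(find "$root/charts" "$root/ansible" -type f \
             \( -name '*.yaml' -o -name '*.yml' -o -name '*.tpl' \) 2>/dev/null)

  return "$fail"
}

# When executed directly (not sourced), run against the given repo root.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  check_traceability "$@"
fi
```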
Step 5: Demonstrate enforcement
On the main branch: all CI checks pass, all infra code is annotated, helm lint and ansible-lint produce no errors.
Create a branch demo/violation with intentional SDD violations:
- A Helm template file with no `@req` annotation (violates SCI-TRACE-001)
- A hardcoded port number in a template, not from `values.yaml` (violates SCI-HELM-006)
- A missing liveness probe on the API deployment (violates SCI-HELM-001)
- A plaintext database password in `values.yaml` defaults (violates SCI-HELM-005)
- A CI job comment with an orphan `@req` ID (references a requirement that does not exist)
CI on demo/violation MUST fail, with clear output showing each violation.
Link both the passing and failing CI runs in README.md.
Deliverables
Provide a link to a public GitHub repository containing:
- Helm umbrella chart with subcharts for API, frontend, and PostgreSQL (`charts/`)
- Ansible playbook with deploy and validate roles (`ansible/`)
- GitHub Actions workflow with a passing CI run on `main`
- Traceability check script (`scripts/check-traceability.sh`)
- `demo/violation` branch with a failing CI run
- `requirements.yaml` (the provided specification, copied as-is)
- `README.md`: what each component deploys, how to run `helm template` locally, how to run the Ansible playbook, architecture decisions (Bitnami vs custom PostgreSQL), links to CI runs
- `PROCESS.md`: your AI development process artifact (see below)
How We Evaluate
We are fully transparent about evaluation. Below are the exact prompts we use.
Step 1: You generate PROCESS.md
After completing the task, run the following prompt against your full AI conversation history. Commit the output as PROCESS.md in the repository root.
Analyze all AI conversations used during development of this project.
For each conversation, extract timestamps (start time, end time) from the chat metadata.
Produce a markdown document PROCESS.md with the following sections:
1. **Tools Used** — which AI tools (IDE, model, plugins) the developer used and for what.
2. **Conversation Log** — for each AI session: start/end timestamps, topic, what the
developer asked for, what was accepted, what was rejected or corrected.
3. **Timeline** — chronological list of major steps with timestamps and duration.
4. **Key Decisions** — what architectural and implementation choices the developer made,
and why. What alternatives were considered?
5. **What the Developer Controlled** — which parts of the output the developer reviewed,
tested, or rewrote. Be specific: list files, functions, and config sections.
What verification steps did the developer take before accepting AI output?
6. **Course Corrections** — moments where the developer identified incorrect, incomplete,
or suboptimal AI output and changed direction. What was the issue, how was it caught,
and what did the developer do instead?
7. **Self-Assessment** — which SDD pillars (Traceability, DRY, Deterministic Enforcement,
Parsimony) are well-covered in the submission and which need improvement.
Step 2: We evaluate your repository
We run the following prompt against your submission. You can run it yourself before submitting:
Evaluate this repository against the SDD (Specification-Driven Development) four pillars:
1. **Traceability**: Does every Helm template, Ansible task, and CI job have @req
annotations? Does the traceability check script enforce this? Are there orphan
annotations or unannotated files? Does the provided requirements.yaml cover all
infrastructure artifacts?
2. **DRY**: Are shared labels defined once in _helpers.tpl? Are configurable values
in values.yaml, not hardcoded in templates? Are Ansible variables centralized in
group_vars, not scattered across role files? Is there duplication between the
umbrella chart values and subchart values?
3. **Deterministic Enforcement**: Does CI run helm lint, kubeconform, ansible-lint,
yamllint, and traceability checks on every push? Can any check be bypassed? Is
the Ansible validate role automated, not manual? Are there manual verification
steps that could be scripted?
4. **Parsimony**: Are Helm charts minimal — no unnecessary resources or templates?
Does each Ansible role do one thing? Is the CI pipeline concise — no redundant
jobs? Is values.yaml organized without unused parameters? Is the README factual,
not narrative?
For each pillar: rate as PASS / PARTIAL / FAIL with specific file references and line
numbers. Produce a summary table and a list of concrete violations.
A good submission is honest, not polished. We value a candidate who catches AI mistakes over one who ships fast without checking.
If you feel overwhelmed by the volume of new concepts here, that is normal. What we describe is the cutting edge of AI-assisted engineering; these are not yet widely known practices. Open Cursor or Claude and study this material together with your AI tool. Just remember: it is you who is learning, not your agent. Move to practice as quickly as possible; only hands-on work turns information into applicable skill.
We look forward to seeing how you build with AI — and how you think about what AI builds for you.