CozyHawk Platform
Sample deliverable. ExampleCo is a fictional AWS/EKS SaaS company. Every score, finding, and document below was generated by the CozyHawk Platform assessment engine from ExampleCo's questionnaire answers: exactly what your engagement produces, with your data.

ExampleCo — Platform Assessment & Blueprint

B2B SaaS · 120 engineers · multi-account AWS · EKS · Terraform/Terragrunt · organically grown GitOps

Prepared by CozyHawk Platform — founder-led by a principal platform engineer.

Overall Platform Maturity

2.5/5 (Defined but inconsistent)

Standards are defined but applied unevenly across teams.

AWS Account Strategy: 2.0 (Partially defined)
Environment Strategy: 2.8 (Defined but inconsistent)
Networking & DNS: 2.2 (Partially defined)
EKS & Kubernetes: 2.8 (Defined but inconsistent)
Infrastructure as Code: 2.8 (Defined but inconsistent)
GitOps: 2.7 (Defined but inconsistent)
CI/CD: 3.0 (Defined but inconsistent)
Observability: 2.7 (Defined but inconsistent)
Secrets & Security: 2.4 (Partially defined)
Cost Allocation & Tagging: 2.4 (Partially defined)
Ownership & Operating Model: 2.0 (Partially defined)
Documentation: 2.0 (Partially defined)

Top Risks

  1. Without a deliberate account structure, blast radius, billing boundaries, and security isolation are weaker than they should be, and every new workload increases entropy.
  2. Unclear ownership turns every incident, upgrade, and audit into detective work, and orphaned resources accumulate cost and risk.
  3. Stale documentation actively misleads: engineers trust it, act on it, and lose time; new hires onboard slowly and tribal knowledge concentrates risk.
  4. Untracked CIDR ranges and inconsistent DNS naming create collision risk, slow incident response, and make future VPC peering or consolidation expensive.
  5. Scattered secrets handling and uneven guardrails mean a single leaked credential or misconfigured workload can become a serious incident.

Top Quick Wins

  1. Inventory all existing accounts, their purpose, and their owner in a single registry.
  2. Run an ownership sweep on production resources; tag or list everything with no clear owner.
  3. Produce a current-state architecture overview and environment matrix; even a first draft removes the most common questions.
  4. Export all VPC CIDRs and hosted zones into one document and flag overlaps and naming inconsistencies.
  5. Audit for secrets in repos and CI variables; rotate and migrate the riskiest ones to the central store.

Executive Summary

Status: Generated assessment · ExampleCo

Purpose

This document summarizes the current platform engineering maturity of ExampleCo, the most material risks, and the recommended sequence of improvements. It is written for engineering leadership and is backed by the detailed category assessment.

Overall Maturity

2.5 / 5 — Defined but inconsistent

Standards are defined but applied unevenly across teams.

ExampleCo is not starting from zero. The platform has working pieces across CI/CD and Environment Strategy, but they are not yet organized into one consistent operating model. The weakest areas are AWS Account Strategy (2.0/5), Ownership & Operating Model (2.0/5), and Documentation (2.0/5).

What this means in practice

"Defined but inconsistent" is the expensive middle. The organization has already paid for standards — someone wrote the tagging policy, someone set up GitOps — but because adoption is uneven, it still pays the ad-hoc tax on top: each team solves the same problems its own way, cost questions from finance take detective work instead of a query, and every audit or incident starts with "first, figure out what exists and who owns it." This does not hold still. Each new account, cluster, and service is created against the inconsistent pattern, so the cost of converging grows every quarter. The encouraging part: closing the gap from 2.5 toward 4 is mostly sequencing and follow-through, not new technology — the components are already in place.

Maturity by Category

Area | Signal
AWS Account Strategy | 2.0/5 (Partially defined)
Environment Strategy | 2.8/5 (Defined but inconsistent)
Networking & DNS | 2.2/5 (Partially defined)
EKS & Kubernetes | 2.8/5 (Defined but inconsistent)
Infrastructure as Code | 2.8/5 (Defined but inconsistent)
GitOps | 2.7/5 (Defined but inconsistent)
CI/CD | 3.0/5 (Defined but inconsistent)
Observability | 2.7/5 (Defined but inconsistent)
Secrets & Security | 2.4/5 (Partially defined)
Cost Allocation & Tagging | 2.4/5 (Partially defined)
Ownership & Operating Model | 2.0/5 (Partially defined)
Documentation | 2.0/5 (Partially defined)

The three weakest categories — account strategy, ownership, and documentation — are visibility problems, not tooling problems. That is why the first 30 days of the roadmap focus there: they are the cheapest categories to improve and they unblock every other one.
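
The category scores above can be rolled up in a few lines. This is a minimal sketch, assuming the overall figure is an unweighted mean rounded to one decimal; the assessment engine's actual aggregation method is not stated here, and the function names are illustrative.

```python
from statistics import mean

# Category scores copied from the maturity table.
SCORES = {
    "AWS Account Strategy": 2.0,
    "Environment Strategy": 2.8,
    "Networking & DNS": 2.2,
    "EKS & Kubernetes": 2.8,
    "Infrastructure as Code": 2.8,
    "GitOps": 2.7,
    "CI/CD": 3.0,
    "Observability": 2.7,
    "Secrets & Security": 2.4,
    "Cost Allocation & Tagging": 2.4,
    "Ownership & Operating Model": 2.0,
    "Documentation": 2.0,
}

def overall(scores):
    """Unweighted mean rounded to one decimal (assumed aggregation)."""
    return round(mean(scores.values()), 1)

def weakest(scores, n=3):
    """The n lowest-scoring categories; ties keep table order."""
    return sorted(scores, key=scores.get)[:n]

print(overall(SCORES))  # 2.5
print(weakest(SCORES))
```

Under that assumption the mean reproduces the published 2.5, and the three lowest scores are the same three categories named as weakest. Re-running the same calculation each quarter gives the trend line the roadmap asks leadership to watch.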

Reported Pain Points

The team reports five recurring pains: account and environment sprawl, weak cost attribution against growing cloud spend, EKS clusters that have drifted apart, stale or missing documentation, and unclear ownership of services and resources.

Stated Priorities (next 6 months)

Leadership's stated priorities are to standardize environments, establish cost visibility and attribution that finance can trust, converge on one consistent GitOps model, and produce executive-ready documentation that matches reality.

Top Risks

  1. AWS Account Strategy — Without a deliberate account structure, blast radius, billing boundaries, and security isolation are weaker than they should be, and every new workload increases entropy.
  2. Ownership & Operating Model — Unclear ownership turns every incident, upgrade, and audit into detective work, and orphaned resources accumulate cost and risk.
  3. Documentation — Stale documentation actively misleads: engineers trust it, act on it, and lose time; new hires onboard slowly and tribal knowledge concentrates risk.
  4. Networking & DNS — Untracked CIDR ranges and inconsistent DNS naming create collision risk, slow incident response, and make future VPC peering or consolidation expensive.
  5. Secrets & Security — Scattered secrets handling and uneven guardrails mean a single leaked credential or misconfigured workload can become a serious incident.

Recommended Quick Wins

  1. Inventory all existing accounts, their purpose, and their owner in a single registry.
  2. Run an ownership sweep on production resources; tag or list everything with no clear owner.
  3. Produce a current-state architecture overview and environment matrix — even a first draft removes the most common questions.
  4. Export all VPC CIDRs and hosted zones into one document and flag overlaps and naming inconsistencies.
  5. Audit for secrets in repos and CI variables; rotate and migrate the riskiest ones to the central store.

The Path Forward

The goal is not to slow teams down. The goal is a supported path that helps teams move faster with less rework. The attached roadmap sequences the work into 30/60/90 day phases: visibility first (inventory, ownership, tagging), then standardization (environments, IaC structure, GitOps promotion), then governance (enforcement, dashboards, continuous assessment).

The first two weeks are deliberately unglamorous and deliberately cheap: stand up the account registry, run the production ownership sweep, and publish the first-draft environment matrix. None of it requires new tooling, and each item removes a class of questions that currently lands on the platform team. Re-score the assessment quarterly — the maturity trend, not any single score, is the number leadership should watch.

Current State Assessment

Purpose

This document records the current state of the ExampleCo platform as reported in the assessment questionnaire. It is a factual baseline, not a recommendation page. Strategy and target state are covered in the Platform Blueprint.

Executive Summary

ExampleCo operates 6–20 AWS accounts with 3–10 EKS clusters across several regions. Overall platform maturity is 2.5/5 (Defined but inconsistent): standards exist in most areas, but adoption is uneven and drift is accumulating.

Category Findings

AWS Account Strategy — 2.0/5 (Partially defined)

██░░░

Accounts have some organizing logic, but exceptions are common and there is no landing-zone or vending process. That combination is how sprawl compounds: every new account is a one-off decision, starts from a different baseline, and inherits guardrails only if someone remembers. At 6–20 accounts this is still cheap to fix — and gets meaningfully more expensive with each account added.

Evidence from questionnaire:

  • How many AWS accounts do you operate?: 6–20 accounts
  • How are AWS accounts organized?: Some structure, but exceptions are common
  • Do you have a landing zone or account vending process?: No defined process

Environment Strategy — 2.8/5 (Defined but inconsistent)

███░░

A standard environment set exists, which puts the organization ahead of many estates this size. The erosion is in the exceptions: teams diverge from the set, non-production drifts from production in configuration, and regions were added organically rather than deliberately. This is the drift pattern that eventually makes "it worked in staging" stop meaning anything.

Evidence from questionnaire:

  • How many AWS regions are actively used?: Several regions, grown organically
  • How are environments defined?: Standard set, but some teams diverge
  • How similar are non-production environments to production?: Similar shape but configuration drifts

Networking & DNS — 2.2/5 (Partially defined)

██░░░

CIDR allocation lives in a spreadsheet and connectivity is point-to-point peering — workable at today's size, but each is one growth spurt away from an address collision or a peering mesh nobody can reason about. DNS naming that varies "by team and era" is the tell that there was never a moment of deliberate design; it slows incident response and keeps certificate handling partly manual.

Evidence from questionnaire:

  • Are VPC CIDR ranges centrally planned and tracked?: Tracked informally (spreadsheet/wiki)
  • How are VPCs connected?: Point-to-point peering as needed
  • Is there a consistent DNS naming pattern across environments?: Naming varies by team and era
  • How are TLS certificates managed?: Mix of automated and manual
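
Flagging overlaps in an exported CIDR list is straightforward with Python's standard `ipaddress` module. The sketch below is illustrative only: the VPC names and ranges are hypothetical stand-ins for whatever the spreadsheet export contains.

```python
from ipaddress import ip_network
from itertools import combinations

def find_overlaps(vpcs):
    """Return pairs of VPC names whose CIDR ranges overlap."""
    nets = {name: ip_network(cidr) for name, cidr in vpcs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]

# Hypothetical export; real values would come from the spreadsheet or the AWS API.
vpcs = {
    "prod-use1": "10.0.0.0/16",
    "stg-use1": "10.1.0.0/16",
    "legacy-tools": "10.0.128.0/20",
}
print(find_overlaps(vpcs))  # [('prod-use1', 'legacy-tools')]
```

The pairwise comparison is quadratic, which is fine for a few dozen VPCs. A larger estate would more likely lean on a managed IPAM than on a script.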

EKS & Kubernetes — 2.8/5 (Defined but inconsistent)

███░░

The clusters started from a common shape and have been drifting since: versions and add-ons differ, and upgrades happen when end-of-life forces them. Reactive upgrades are the riskiest kind — they pair a forced timeline with an inconsistent fleet. Namespace conventions exist but are unenforced, which is fine right up until tenancy boundaries or per-team cost attribution start depending on them.

Evidence from questionnaire:

  • How many EKS clusters do you run?: 3–10
  • How consistent are clusters with each other?: Similar but drifting (versions, add-ons)
  • How are Kubernetes version upgrades handled?: Reactive, when forced by EOL
  • Is there a defined namespace/tenancy model?: Conventions exist but are not enforced
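
Drift of this kind becomes easier to discuss once it is listed per cluster against a chosen baseline. A minimal sketch follows, assuming the inventory has already been collected into plain dictionaries; the cluster names, component names, and version numbers are hypothetical.

```python
def drift(baseline, cluster):
    """Return {component: (expected, actual)} for every value that differs."""
    keys = sorted(baseline.keys() | cluster.keys())
    return {
        k: (baseline.get(k), cluster.get(k))
        for k in keys
        if baseline.get(k) != cluster.get(k)
    }

# Hypothetical baseline and fleet inventory.
baseline = {"kubernetes": "1.30", "vpc-cni": "1.18", "coredns": "1.11"}
fleet = {
    "prod-a": {"kubernetes": "1.30", "vpc-cni": "1.18", "coredns": "1.11"},
    "stg-a": {"kubernetes": "1.29", "vpc-cni": "1.18", "coredns": "1.10"},
}

report = {name: drift(baseline, cluster) for name, cluster in fleet.items()}
for name, deltas in report.items():
    print(name, deltas or "matches baseline")
```

A component missing from either side shows up with `None` in its place, so add-ons installed on only one cluster are reported as well.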

Infrastructure as Code — 2.8/5 (Defined but inconsistent)

███░░

Most infrastructure is in code, which is a real foundation. But manual changes "under pressure" mean the code cannot be fully trusted as the source of truth — and the divergence happens precisely when reliability matters most, during incidents. Repo conventions that are consistent but undocumented are a quieter version of the same risk: the structure lives in a few engineers' heads.

Evidence from questionnaire:

  • How much infrastructure is managed by IaC?: Majority, with manual exceptions
  • How are IaC repositories structured?: Mostly consistent but undocumented conventions
  • How common are manual (out-of-band) changes?: Happens under pressure

GitOps — 2.7/5 (Defined but inconsistent)

███░░

GitOps is in place — the hard adoption step is done — but the structure grew organically and promotion is copy/paste between environment folders. Copy/paste promotion is the single most common source of "staging and production were supposed to be identical" surprises: every promotion is a hand edit, and every hand edit is a chance to diverge.

Evidence from questionnaire:

  • How are Kubernetes workloads deployed?: GitOps, but the structure grew organically
  • How do changes promote across environments in GitOps?: Copy/paste between env folders

CI/CD — 3.0/5 (Defined but inconsistent)

███░░

A defined promotion path exists, which is the hard part. The friction is in the manual steps and the copy-pasted pipelines: each service's pipeline is effectively a fork, so improvements and security fixes don't propagate, and every manual step adds queue time plus one more chance for variance between services.

Evidence from questionnaire:

  • Is there a defined promotion path between environments?: Yes, but manual steps involved
  • How consistent are pipelines across services?: Similar patterns, copy-pasted

Observability — 2.7/5 (Defined but inconsistent)

███░░

The stack exists; coverage is the problem. Because it varies by team, "can we see this service?" has a different answer per service — which is the question that matters during an incident. Noisy alerting is the more urgent half of the finding: noise trains on-call to ignore alerts, which is operationally worse than having fewer alerts.

Evidence from questionnaire:

  • What does the observability stack look like?: Metrics/logs exist, coverage varies by team
  • How mature is alerting and on-call?: Alerts exist but are noisy

Secrets & Security — 2.4/5 (Partially defined)

██░░░

Long-lived IAM users coexisting with SSO is the finding to close first: they are the credentials that leak, and they bypass the identity controls already paid for. Secrets split across stores with manual handling means rotation is aspirational rather than routine, and a security baseline that is "some controls, not unified" is hard to state to an auditor with confidence.

Evidence from questionnaire:

  • How is human access to AWS accounts managed?: Mix of SSO and long-lived IAM users
  • Are workload guardrails in place?: Some, inconsistently applied
  • How are application secrets managed?: Mix of stores and manual handling
  • Is there a defined cloud security baseline?: Some controls, not unified
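
One narrow slice of a secrets audit can be sketched directly: access key IDs for long-lived IAM users begin with `AKIA` followed by 16 uppercase letters or digits, so they can be found with a regular expression. The file names below are hypothetical, the key is AWS's published documentation example, and a real audit would more likely use a dedicated scanner that also covers git history and other credential types.

```python
import re

# Long-lived IAM user access key IDs: "AKIA" plus 16 uppercase letters or digits.
ACCESS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def find_access_keys(files):
    """Return (path, line_number, key_id) for each match; files maps path -> text."""
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in ACCESS_KEY_RE.finditer(line):
                hits.append((path, lineno, match.group()))
    return hits

# Hypothetical repository contents.
sample = {
    "ci/deploy.env": "REGION=us-east-1\nAWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n",
    "README.md": "No credentials here.\n",
}
print(find_access_keys(sample))
```

Every hit is a rotation candidate, not just a deletion candidate: a key that has appeared in a repository should be treated as exposed.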

Cost Allocation & Tagging — 2.4/5 (Partially defined)

██░░░

A tagging standard exists on paper; the spend data says otherwise. Attribution is partial with a large unattributed remainder, and Kubernetes cost is visible only as rough estimates, so every cost conversation with finance starts from a number nobody fully trusts. Until coverage is measured and enforced, showback and chargeback stay out of reach regardless of tooling.

Evidence from questionnaire:

  • How consistent is resource tagging?: Standard documented but inconsistently applied
  • Can you attribute cost to products/teams today?: Partially; large unattributed remainder
  • Is Kubernetes cost visible per namespace/team?: Rough estimates only
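
The "large unattributed remainder" can be turned into a single number. The sketch below assumes billing line items have been exported as dictionaries with a cost and a tag map; the field names, services, and figures are hypothetical, not a real billing schema.

```python
def unattributed_share(line_items, key="Product"):
    """Fraction of total cost on line items with no value for the given tag."""
    total = sum(item["cost"] for item in line_items)
    if total == 0:
        return 0.0
    untagged = sum(
        item["cost"] for item in line_items if not item["tags"].get(key)
    )
    return untagged / total

# Hypothetical monthly line items.
line_items = [
    {"service": "EC2", "cost": 600, "tags": {"Product": "billing"}},
    {"service": "RDS", "cost": 300, "tags": {}},
    {"service": "S3", "cost": 100, "tags": {"Product": "search"}},
]
print(f"{unattributed_share(line_items):.0%} of spend is unattributed")
```

Publishing this figure per account, even before any cleanup, gives finance and engineering the same starting point.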

Ownership & Operating Model — 2.0/5 (Partially defined)

██░░░

Ownership lives in people's heads, which works until the people are on vacation, reorganized, or gone. Combined with a ticket-driven engagement model, the platform team spends its capacity doing work for teams instead of building the paved road that would remove the tickets. This is the category that quietly taxes every other one.

Evidence from questionnaire:

  • Is there a reliable record of who owns each service/resource?: Mostly tribal knowledge
  • How does the platform team engage with product teams?: Ticket-driven; platform does work for teams

Documentation — 2.0/5 (Partially defined)

██░░░

Documentation exists but is stale in places, and the architecture diagrams describe a previous version of the platform. Stale documentation is worse than missing documentation, because engineers trust it and act on it. The durable fix is not a documentation sprint; it is maintaining a small living set of documents tied to the operating model — which this assessment seeds.

Evidence from questionnaire:

  • What is the state of platform documentation?: Exists but stale in places
  • Do accurate architecture diagrams exist?: Yes, but outdated

Items Still Needing Human Confirmation

The following are not fully answerable from questionnaire data alone and should be confirmed during the walkthrough:

  • The authoritative account-to-environment mapping.
  • The live resource inventory, tag coverage percentages, and orphaned-resource candidates.
  • The actual drift between IaC state and deployed reality on critical stacks.
  • Cost Explorer rollups by account, region, and service for the trailing 6 months.
  • Any regulated/segregated delivery requirements (e.g. GovCloud paths).

Target-State Platform Blueprint

Purpose

This document defines the recommended target state for the ExampleCo platform: account model, environment lanes, networking/DNS, EKS baseline, IaC structure, GitOps model, and cost governance. Each section is a direction, not a mandate — exceptions are allowed but should be documented and visible.

Guiding Principles

  1. Standardize the common path. Work every team repeats should not be reinvented per team.
  2. Separate application code from environment configuration. Code moves; configuration is per-lane.
  3. Build once, promote with confidence. Artifacts are traceable from commit to production.
  4. Infrastructure is reproducible. Resources are understandable from source control, not only the console.
  5. Ownership is visible. Every meaningful resource answers: what product, what environment, who supports it.
  6. Exceptions are visible. An undocumented exception is hidden platform drift.

AWS Account & Environment Strategy

Target a deliberate AWS Organizations structure with an account vending process, so new accounts start compliant instead of being retrofitted later. The recommended shape is a small set of accounts (development, staging, production, shared tooling) with logical environment lanes inside them: staging-account lanes (e.g. stg, dem, uat) share the account but get distinct namespaces, DNS prefixes, secrets, and tags. Maintain a short code registry (environment, region, account codes) reused across IaC paths, state keys, tags, namespaces, and DNS — one vocabulary everywhere.

Networking & DNS Strategy

Move CIDR allocation and DNS naming from informal tracking to a central registry before further growth makes retrofitting expensive. The recommended shape: centrally governed CIDR allocation (a registry now, IPAM as you grow), a deliberate VPC connectivity topology instead of ad-hoc peering, environment-prefixed DNS (stg., uat. …), and automated certificates everywhere.

EKS & Kubernetes Strategy

Converge the fleet toward a single cluster baseline — versions, add-ons, node strategy, and baseline tooling — so differences between clusters are intentional and documented, not accumulated. Namespaces follow <product>-<envcode>; workload guardrails (quotas, pod security, network policies) are enforced by policy rather than convention; upgrades run on a regular cadence instead of being forced by end-of-life.
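
The `<product>-<envcode>` convention is simple enough to check mechanically. The sketch below is one possible validator, not the enforcement mechanism itself: the environment codes are hypothetical (the real list would come from the code registry), and in a cluster the same rule would typically be expressed as an admission policy.

```python
import re

# Hypothetical environment codes; the real list would live in the code registry.
ENV_CODES = {"dev", "stg", "dem", "uat", "prd"}
PRODUCT_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def parse_namespace(name):
    """Split '<product>-<envcode>', or return None if the name does not conform."""
    product, sep, env = name.rpartition("-")
    if not sep or env not in ENV_CODES:
        return None
    # Kubernetes namespace names are DNS labels: lowercase, at most 63 characters.
    if len(name) > 63 or not PRODUCT_RE.fullmatch(product):
        return None
    return product, env

for ns in ["billing-stg", "billing-api-uat", "kube-system", "Billing-stg"]:
    print(ns, parse_namespace(ns))
```

Splitting on the last hyphen lets product names contain hyphens of their own. System namespaces such as `kube-system` fail the check by design and would need an explicit allow-list.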

IaC Strategy

Make source control the only path to change. The recommended shape: reusable modules plus a live tree organized <account>/<region>/<product>/<unit>; one remote-state pattern; plan/apply through CI with review; required tags injected at the provider level; scheduled drift detection; and a "no console changes" norm with a documented exception path for genuine emergencies.
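
One benefit of a fixed `<account>/<region>/<product>/<unit>` tree is that other identifiers can be derived from the path instead of being typed by hand. A minimal sketch, assuming a root directory named `live` and a state key that mirrors the path; both are illustrative choices, not a prescribed layout.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

class Unit(NamedTuple):
    account: str
    region: str
    product: str
    unit: str

def parse_unit(path, root="live"):
    """Parse '<root>/<account>/<region>/<product>/<unit>' into its parts."""
    parts = PurePosixPath(path).relative_to(root).parts
    if len(parts) != 4:
        raise ValueError(f"expected 4 path segments under {root!r}, got {len(parts)}")
    return Unit(*parts)

def state_key(unit):
    """Remote-state key derived from the unit's position in the tree (assumed naming)."""
    return f"{unit.account}/{unit.region}/{unit.product}/{unit.unit}/terraform.tfstate"

u = parse_unit("live/prod/us-east-1/billing/eks")
print(u)
print(state_key(u))
```

A CI check built on the same parser could reject any directory that does not fit the four-level shape, which keeps the tree from drifting as new units are added.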

GitOps Strategy

Replace the organically grown layout with a structure split by change frequency: platform wiring (cluster entrypoints, shared components), a reusable component catalog, and fast-moving app rollout config. Promotion happens via environment overlays and image automation — not copy/paste between folders and not long-lived branches. Secrets are encrypted in git (SOPS) or referenced from a central store.
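
The difference between overlay-based promotion and copy/paste can be shown with plain dictionaries. This is a simplified stand-in for what an overlay tool does, not its actual merge semantics; the lane names, registry URL, and configuration keys are hypothetical.

```python
from copy import deepcopy

def apply_overlay(base, overlay):
    """Recursively merge overlay onto base without mutating either input."""
    out = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = apply_overlay(out[key], value)
        else:
            out[key] = deepcopy(value)
    return out

def promote(overlays, src, dst):
    """Copy only the image tag from one lane's overlay to another."""
    out = deepcopy(overlays)
    out[dst].setdefault("image", {})["tag"] = overlays[src]["image"]["tag"]
    return out

# Hypothetical base and per-lane overlays.
base = {"image": {"repo": "registry.example.com/billing", "tag": "1.4.0"}, "replicas": 1}
overlays = {"stg": {"image": {"tag": "1.5.0"}}, "prd": {"replicas": 3}}

promoted = promote(overlays, "stg", "prd")
print(apply_overlay(base, promoted["prd"]))
```

Promotion touches one field, so the production replica count and every other lane-specific setting are left alone. That is the property copy/paste between folders cannot guarantee.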

CI/CD Strategy

Extract shared pipeline templates for the common path (build → scan → publish → deploy), versioned and adopted service by service, so improvements propagate instead of being re-forked. Pair with consistent branch protection and a "build once, promote the same artifact" rule.

Observability Strategy

Standardize the metrics/logs/traces stack and define a minimum bar every service must meet: logs, golden-signal metrics, health checks, a dashboard, actionable alerts, an owner, and a runbook. Coverage stops varying by team because the baseline is the default, not an aspiration.

Secrets & Security Strategy

Centralize secrets in one managed store with automated delivery to workloads. Use SSO for humans and workload identity for machines, and retire long-lived credentials. Apply a cloud security baseline to every account via a phased roadmap: near-term hygiene and visibility, mid-term enforcement, longer-term continuous evidence.

Cost Allocation & Tagging Strategy

Adopt a minimum mandatory tag set — Name, Product, Environment, Owner, CostCenter, DataClassification — first as a measured standard, then enforced via provider default tags and CI validation. Pair it with a resource-cleanup workflow: inventory → classify (active/orphaned/POC/historical) → confirm owner → decommission safely.
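
The CI validation step could be as small as the check below. It is a sketch under stated assumptions: the tag keys come from the list above, while the resource shape, names, and values are hypothetical and would in practice be read from a plan file or a resource inventory.

```python
REQUIRED_TAGS = (
    "Name", "Product", "Environment", "Owner", "CostCenter", "DataClassification",
)

def missing_tags(tags):
    """Required tag keys that are absent or blank on a resource."""
    return [key for key in REQUIRED_TAGS if not tags.get(key, "").strip()]

# Hypothetical resources.
resources = [
    {"id": "bucket-a", "tags": {
        "Name": "bucket-a", "Product": "billing", "Environment": "prd",
        "Owner": "team-billing", "CostCenter": "cc-100",
        "DataClassification": "internal",
    }},
    {"id": "queue-b", "tags": {"Name": "queue-b", "Owner": " "}},
]

failures = {r["id"]: missing_tags(r["tags"]) for r in resources if missing_tags(r["tags"])}
print(failures)
```

Run first in report-only mode, the same check yields the measured standard; failing the build on a non-empty result turns it into enforcement.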

Ownership & Operating Model

Stand up a lightweight ownership registry (service → team → contact) and make ownership a required field for anything new. Shift platform engagement from ticket-driven work-for-teams to platform-as-product: paved roads, a visible decision log, and adoption tracked like a product metric.

Documentation Strategy

Maintain a numbered, Confluence-ready living doc set: executive overview, current state, principles, per-domain strategies, adoption playbook, decision log, and open questions. Documentation should match reality — including the manual steps — because docs that describe the intended platform instead of the actual one are how trust erodes.

30/60/90 Day Roadmap

Purpose

A practical, sequenced improvement plan for the ExampleCo platform. Each phase builds on the previous: first visibility, then standardization, then governance. Quick wins are deliberately front-loaded.

Days 0–30 — Visibility & Quick Wins

Focus: know what exists, who owns it, and stop the bleeding in the weakest areas.

  • AWS Account Strategy: Inventory all existing accounts, their purpose, and their owner in a single registry.
  • Ownership & Operating Model: Run an ownership sweep on production resources; tag or list everything with no clear owner.
  • Documentation: Produce a current-state architecture overview and environment matrix — even a first draft removes the most common questions.
  • Networking & DNS: Export all VPC CIDRs and hosted zones into one document and flag overlaps and naming inconsistencies.

Exit criteria (checkable):

  • Account registry published; every account has a stated purpose and a named owner.
  • Production ownership sweep complete; unowned resources are listed, not unknown.
  • First-draft architecture overview and environment matrix published where the team works (wiki/Confluence).
  • CIDR and hosted-zone export exists; overlaps and naming conflicts are flagged with owners assigned.

Days 31–60 — Standardization

Focus: define and adopt the standard path in the areas that hurt most.

  • Secrets & Security: Centralize secrets in one managed store with automated delivery to workloads, and define a cloud security baseline applied to every account.
  • Cost Allocation & Tagging: Define a minimal mandatory tag set (owner, product, environment, cost-center) and enforce it through IaC and tag policies.
  • GitOps: Define the target GitOps repo layout (bases + environment overlays) and a promotion flow, then migrate incrementally.
  • Observability: Standardize the metrics/logs/traces stack and define a minimum observability baseline every service must meet.

Exit criteria (checkable):

  • Secrets baseline, tag standard, GitOps layout, and observability baseline are written and reviewed.
  • The riskiest secrets found in the audit are rotated and migrated to the central store.
  • Tag coverage measured on the top-spend accounts; the unattributed-spend number is published.
  • One pilot product runs the paved road end-to-end — build, promote, observe — with zero copy/paste promotion steps.

Days 61–90 — Governance & Scale

Focus: make the standard durable — enforcement, dashboards, and adoption beyond the pilot.

  • Environment Strategy: Drive non-prod/prod parity through shared IaC so environments differ only by declared configuration.
  • EKS & Kubernetes: Template cluster creation through IaC so new clusters are born consistent, and put upgrades on a regular cadence.
  • Infrastructure as Code: Move toward full coverage with plan/apply automation in CI and policy checks on changes.
  • CI/CD: Treat pipelines as a platform product: versioned templates, deprecation policy, and adoption tracking.

Exit criteria (checkable):

  • Required tags and workload guardrails are enforced automatically on at least the production path.
  • New clusters and environments can only be created from templates; an upgrade cadence is scheduled, not reactive.
  • The assessment is re-scored; the trend is published alongside a prioritized next-quarter backlog.

Beyond Day 90

Re-run the assessment quarterly. Maturity scoring over time becomes the platform health metric leadership actually reads.

Implementation Backlog

Prioritized from weakest category to strongest. High marks the three weakest categories (account strategy, ownership, documentation) plus credential and CIDR exposure — these anchor days 0–30 and the start of standardization. Low marks real work that sits in the strongest categories and can ride later phases without added risk. Sizing and sequencing should be confirmed with the platform team.

# | Category | Item | Type | Priority
1 | AWS Account Strategy | Inventory all existing accounts, their purpose, and their owner in a single registry. | Quick win | High
2 | AWS Account Strategy | Define a target AWS Organizations structure (workloads split by environment at minimum) and an account vending process so new accounts start compliant. | Standardization | High
3 | Ownership & Operating Model | Run an ownership sweep on production resources; tag or list everything with no clear owner. | Quick win | High
4 | Ownership & Operating Model | Stand up a lightweight ownership registry (service → team → contact) and make ownership a required field for anything new. | Standardization | High
5 | Documentation | Produce a current-state architecture overview and environment matrix; even a first draft removes the most common questions. | Quick win | High
6 | Documentation | Generate and maintain a core set of living documents: architecture overview, environment matrix, runbooks for critical paths. | Standardization | Medium
7 | Networking & DNS | Export all VPC CIDRs and hosted zones into one document and flag overlaps and naming inconsistencies. | Quick win | High
8 | Networking & DNS | Establish a CIDR allocation plan and a DNS naming convention; record both in a central registry before further growth. | Standardization | Medium
9 | Secrets & Security | Audit for secrets in repos and CI variables; rotate and migrate the riskiest ones to the central store. | Quick win | High
10 | Secrets & Security | Centralize secrets in one managed store with automated delivery to workloads, and define a cloud security baseline applied to every account. | Standardization | High
11 | Cost Allocation & Tagging | Measure current tag coverage on the top spend accounts and publish the unattributed-spend number; it motivates the cleanup. | Quick win | Medium
12 | Cost Allocation & Tagging | Define a minimal mandatory tag set (owner, product, environment, cost-center) and enforce it through IaC and tag policies. | Standardization | Medium
13 | GitOps | Document the current deployment path for one critical service end-to-end and identify the steps that vary by team. | Quick win | Medium
14 | GitOps | Define the target GitOps repo layout (bases + environment overlays) and a promotion flow, then migrate incrementally. | Standardization | Medium
15 | Observability | Define golden-signal dashboards and alerts for the top five services by traffic or revenue impact. | Quick win | Medium
16 | Observability | Standardize the metrics/logs/traces stack and define a minimum observability baseline every service must meet. | Standardization | Medium
17 | Environment Strategy | Publish a one-page environment matrix: name, purpose, account, region, and owner for every environment. | Quick win | Medium
18 | Environment Strategy | Standardize a named set of environments with documented purpose, parity expectations, and promotion order. | Standardization | Medium
19 | EKS & Kubernetes | Create a cluster inventory: version, add-ons, node groups, and owner per cluster, with deltas from the newest cluster highlighted. | Quick win | Medium
20 | EKS & Kubernetes | Define a single cluster baseline (versions, add-ons, node strategy, baseline tooling) and converge existing clusters toward it. | Standardization | Medium
21 | Infrastructure as Code | Run drift detection on the most critical stacks and reconcile or import the worst offenders. | Quick win | Medium
22 | Infrastructure as Code | Define the IaC repo structure (modules + per-environment live config), document it, and route all changes through it. | Standardization | Medium
23 | CI/CD | Pick the three most-changed services and move them onto one shared pipeline template. | Quick win | Low
24 | CI/CD | Extract shared pipeline templates for the common path (build, scan, publish, deploy) and adopt them service by service. | Standardization | Low
