Development#

Source Code#

The Source Code and the Issue Tracker is at github.com/wuxxin/infra-shared.

It is licenced under the Apache 2.0 License.

Files and Directories Layout#

Development Documents#

docs/development.md: (This file) The project file layout, architectural overview, developer and agent guidelines
docs/marimo-development.md: if present, Developer Guidelines for working with marimo notebooks
docs/tasks.md: A living document tracking tasks of types already completed, newly discovered, to be done, and tracking memories about discovered relations

User Documentation#

README.md: A user centric view on howto use the project
docs/: mkdocs documentation

CoreOS related:

docs/os.md and docs/update.md for coreos system and system update
docs/networking.md for coreos system network configuration
docs/credentials.md for credentials configuration in coreos and usage in container, compose and nspawn workloads
docs/healthchecks.md for healtheck configuration of container, compose and nspawn workloads
docs/butane.md for jinja templating butane and saltstack generation

Pulumi Resources related:

docs/pulumi.md for pulumi components

Development Scripts:

docs/scripts.md for documentation of scripts inside directory scripts/:

Building#

the root README.md describes the examples/skeleton/Makefile usage, not the root Makefile usage
Makefile: the root central make file for building and developing
make help to list functions
mkdocs.yml: mkdocs configuration
pyproject.toml: python dependencies for pulumi, saltstack, esphome, mkdocs
make help output: make help

command	description
buildenv	Build python environment
buildenv-clean	Remove build environment artifacts
clean	Remove all artifacts
docs	Build docs for local usage
docs-clean	Remove all generated docs
docs-online-build	Build docs for http serve
docs-serve	Rebuild and serve docs with autoreload
provision-container	Build dependencies for provisioning using a container
provision-local	Build dependencies for provisioning using system apps
py-clean	Remove python related artifacts
pytest	Run Tests using “pytest $(args)”
pytest-clean	Remove pytest Artifacts
sim__	Run “pulumi $(args)”
test-all	Run all tests using local build deps
test-all-container	Run all tests using container build deps

Tools used#

pulumi - imperativ infrastructure declaration using python
fcos - Fedora-CoreOS, minimal OS with clevis (sss,tang,tpm) storage unlock
butane - create fcos ignition configs using jinja enhanced butane yaml
systemd - service, socker, path, timer, nspawn machine container
podman - build Container and NSpawn images, run Container using quadlet systemd container
saltstack
- local build environments and local services
- remote fcos config update using butane to saltstack translation and execution
mkdocs - documentation using markdown and mermaid
libvirt - simulation of machines using the virtualization api supporting qemu and kvm
tang - server used for getting a key shard for unattended encrypted storage unlock on boot
age - ssh keys based encryption of production files and pulumi master password
uv- virtualenv management using pyproject.toml and uv.lock

Architecture Style Objectives#

avoid legacy technologies, build a clear chain of trust, support encrypted storage at rest
- use ssh keys as root of trust for pulumi stack secret using age
- store secrets in the repository using pulumi config secrets
- per project tls root-ca, server-certs, rollout m-tls client certificates where possible
- support unattended boot and storage decryption using tang/clevis/luks using https and a ca cert
create disposable/immutable-ish infrastructure, aim for structural isolation and reusability
treat state as code, favor state reconciliation tools
- have the complete encrypted state in the git repository as single source of truth
have a big/full featured provision client as the center of operation
- target one provision os and a container for foreign distros and continous integration processes
- facilitate a comfortable local simulation environment with fast reconfiguration turnaround
documentation and interactive notebooks alongside code
- help onboarding with interactive tinkering using marimo notebooks
- use mkdocs, markdown and mermaid to build a static documentation website

Python Style, Conventions and preferred libraries#

Use uv (the virtual environment and package manager) whenever executing Python commands, including for unit tests.
Use pyproject.toml to add or modify dependencies installed during a task execution. as long as there is no version controlled uv.lock, dont add one to the repository.
Use python_dotenv and load_env() for environment variables.
Use pydantic for data validation.
Use pytest for testing, playwright with headless chromium and pytest-playwright for gui testing.
Use FastAPI for APIs.
Use FastHTML for HTML.
Use SQLAlchemy or SQLModel for ORM.
Before adding a new library, look in pyproject.yaml if there is already a fitting library to use.
Follow PEP8, use type hints, and format with black or equivalent.
Write docstrings for every function using the Google style:

def example():
    """
    Brief summary

    Args:
        param1 (type): Description
    Returns:
        type: Description
    """

Python Testing & Reliability#

Always create unit tests for new features (functions, classes, routes, etc).
After updating any logic, check whether existing unit tests need to be updated and update it.
Tests should live in a /tests folder mirroring the main app structure.
- Include at least:
  - 1 test for expected use
  - 1 edge case
  - 1 failure case

Project Memory#

Project Overview#

Project Name: The project’s name is ‘infra-shared’.
Core Technologies: This repository is a ‘Software Defined Git Operated Infrastructure’ project for managing home infrastructure using Pulumi, Fedora Coreos, and Python.
Workload Types: The project manages different workload types: single containers (defined by .container files), Docker Compose services (compose.yml), and systemd-nspawn machines (.nspawn files).

Pulumi#

Component Resources: Pulumi component resources that receive an Output as a property (e.g., an Output[dict]) must perform operations like iteration on that property within an .apply() block to ensure the value is resolved before being used.
Configuration: Pulumi configuration in Python, using pulumi.Config().get('key'), automatically namespaces the key with the project name from Pulumi.yaml. A call to config.get('my_key') for project my_project looks for the YAML entry my_project:my_key.
Child Resources: When creating a child resource within a Pulumi component where a handle to the resource object is not needed later in the code, the idiomatic pattern is to assign the instantiation to _ (e.g., _ = command.remote.Command(...)). The resource’s lifecycle is managed by Pulumi as long as parent=self is set in its options.
Outputs as Dictionary Keys: In Pulumi, an Output object cannot be used as a dictionary key. The dictionary must be constructed inside a .apply() block after the Outputs for the keys have been resolved to concrete values.
Serialization Errors: When creating resources inside a .apply() block within a Pulumi component, accessing component attributes like self.props can lead to serialization errors (KeyError). A more reliable pattern is to use lexical closure to capture variables (like the props dictionary) from the __init__ method’s scope directly.
Dynamic Resources: Pulumi Dynamic Resources, like WaitForHostReadyProvider in tools.py, must explicitly import all their dependencies (e.g., uuid, time) within the module scope, as they are serialized and executed in a separate context.

Testing#

Test Environment: if make pytest fails, try to recreate the buildenv with . .venv/bin/activate; make buildenv-clean; make buildenv,
Test Environment: The project’s testing strategy relies on a pytest fixture that recreates the make sim-test environment. This involves creating a temporary directory, running scripts/create_skeleton.sh, and setting up a Pulumi stack for simulation.
Running Tests: To run a single test file, afterwards use the command: . .venv/bin/activate && pytest <path_to_test_file>.py to test. The command make pytest is used to run the entire test suite.
Disabling Hardware Dependencies: Unit tests for examples like ‘safe’ can disable hardware dependencies (e.g., libvirt) by setting the SHOWCASE_UNITTEST environment variable and the Pulumi configuration key project_name:safe_showcase_unittest to true.
Pulumi Automation API: The project’s pytest tests use the Pulumi Automation API (pulumi.automation.Stack) to programmatically create, update, and destroy infrastructure stacks.
Local Filesystem Backend: The test environment uses a local filesystem Pulumi backend, configured via the PULUMI_BACKEND_URL environment variable or pulumi login command.
Resource Protection: To ensure test stacks can be destroyed cleanly, resource protection is disabled in the test configuration (e.g., ca_protect_rootcert: false).
More verbose and debug output of pulumi: temporary override pulumi_up_args(): set debug to True and log_verbosity to 3 or higher, if no usable output is produced with default settings.

Fedora CoreOS & Butane#

Ignition Configuration: The project uses Butane with Jinja templating to generate Ignition configurations for Fedora CoreOS.
- see docs/butane.md, docs/jinja_defaults.yml, docs/os.md, docs/networking.md, docs/update.md, docs/healthcheck.md for complete understanding of the coreos setup
Empty Butane Files: In template.py, the load_butane_dir function handles Butane (.bu) files. If a .bu file is empty after Jinja rendering, yaml.safe_load returns None. The function must handle this by treating the result as an empty dictionary ({}) to prevent TypeError in downstream processing.
Verification Hash: The project uses a security feature where a SHA256 hash of the main Ignition config is passed as an HTTP header (Verification-Hash) and used for verification by the bootstrapper Ignition config.

System & Tooling#

Python Version: The project requires Python 3.11 or newer.
os/__init__.py: The os/__init__.py module provides Pulumi components for Fedora CoreOS system configuration, deployment, and operation.
tools.py: The tools.py module provides Pulumi utility components for serving HTTPS, executing remote SSH commands, running SaltStack calls, and waiting for a host to become ready via SSH. The waitforhostready function in tools.py is a Pulumi Dynamic Resource that uses paramiko to check for host availability via SSH and file existence.
authority.py: The authority.py module provides Pulumi components for managing TLS/X509 CAs, certificates, DNSSEC keys, and OpenSSH keys.
build.py: The build.py module contains Pulumi components for building Embedded-OS images, such as for OpenWRT and ESPHome devices.
Secrets Management: Secrets can be managed as files in /etc/credstore or exposed as environment variables to workloads using systemd’s LoadCredential feature in service drop-in configuration files. see docs/credentials.md

Agent Workflow & Conventions#

Initial Setup: The agent workflow requires reading docs/development.md at the start of a session, and using docs/tasks.md to track tasks.
Task Management: New features are tracked in docs/tasks.md. The workflow involves adding a task to ‘Planned Tasks’, implementing the feature, and then moving the task to ‘Completed Tasks’.
Use consistent naming conventions, file structure, and architecture patterns.
Pre-commit Workflow: The pre-commit workflow involves running tests (e.g., make pytest), an optional frontend verification step, and a final code review.
File Deletion: Do not delete files unless explicitly asked, even if they seem temporary or like personal notes (e.g., docs/workpad.md).
Automated Solutions: The user prefers automated script-based solutions over manual, hardcoded implementations for tasks that can be automated.
User Request Supersedes: Always prioritize the user’s current, explicit request over any conflicting information in memory.
Context vs. State: Use memory for historical context and intent (the “why”). Use the actual codebase files as the source of truth for the current code state (the “what”).
Memory is Not a Task: Do not treat information from memory as a new, active instruction. Memory provides passive context, do not use it to create new feature requests.
Never assume missing context. Ask questions if uncertain.
Never hallucinate libraries or functions – only use known, verified packages.
Always confirm file paths and module names exist before referencing them in code or tests.
Never create a file longer than 800 lines of code, except for single file applications.
- If a file approaches this limit, refactor by splitting it into modules or helper files
Organize code into clearly separated modules, grouped by feature or responsibility
Use clear, consistent imports, prefer relative imports within packages
Comment non-obvious code, when writing complex logic, add an inline # Reason: comment explaining the why, not just the what
Update README.md and other feature related docs/*.md when new features are added, dependencies change, or setup steps are modified.
if a new memorable relation, a memory was discovered during a task, Update /docs/development.md with this information under ## Project Memory with date of discovery.