# Development

## Source Code

The Source Code and the Issue Tracker are at github.com/wuxxin/infra-shared.
It is licensed under the Apache 2.0 License.

## Files and Directories Layout

### Development Documents
- `docs/development.md`: (This file) The project file layout, architectural overview, developer and agent guidelines
- `docs/marimo-development.md`: if present, Developer Guidelines for working with marimo notebooks
- `docs/tasks.md`: A living document tracking tasks of types already completed, newly discovered, to be done, and tracking memories about discovered relations
### User Documentation

- `README.md`: A user-centric view on how to use the project
- `docs/`: mkdocs documentation

CoreOS related:

- `docs/os.md` and `docs/update.md` for coreos system and system update
- `docs/networking.md` for coreos system network configuration
- `docs/credentials.md` for credentials configuration in coreos and usage in container, compose and nspawn workloads
- `docs/healthchecks.md` for healthcheck configuration of container, compose and nspawn workloads
- `docs/butane.md` for jinja templating butane and saltstack generation

Pulumi Resources related:

- `docs/pulumi.md` for pulumi components

Development Scripts:

- `docs/scripts.md` for documentation of scripts inside directory `scripts/`
## Building

- the root `README.md` describes the `examples/skeleton/Makefile` usage, not the root `Makefile` usage
- `Makefile`: the central root make file for building and developing; `make help` to list functions
- `mkdocs.yml`: mkdocs configuration
- `pyproject.toml`: python dependencies for pulumi, saltstack, esphome, mkdocs
- `make help` output:
| command | description |
|---|---|
| buildenv | Build python environment |
| buildenv-clean | Remove build environment artifacts |
| clean | Remove all artifacts |
| docs | Build docs for local usage |
| docs-clean | Remove all generated docs |
| docs-online-build | Build docs for http serve |
| docs-serve | Rebuild and serve docs with autoreload |
| provision-container | Build dependencies for provisioning using a container |
| provision-local | Build dependencies for provisioning using system apps |
| py-clean | Remove python related artifacts |
| pytest | Run Tests using `pytest $(args)` |
| pytest-clean | Remove pytest Artifacts |
| sim__ | Run `pulumi $(args)` |
| test-all | Run all tests using local build deps |
| test-all-container | Run all tests using container build deps |
## Tools used

- `pulumi` - imperative infrastructure declaration using python
- `fcos` - Fedora-CoreOS, minimal OS with `clevis` (sss, tang, tpm) storage unlock
- `butane` - create fcos `ignition` configs using `jinja` enhanced butane yaml
- `systemd` - service, socket, path, timer, nspawn machine container
- `podman` - build Container and NSpawn images, run Container using quadlet systemd container
- `saltstack` - local build environments and local services
    - remote fcos config update using butane to saltstack translation and execution
- `mkdocs` - documentation using markdown and mermaid
- `libvirt` - simulation of machines using the virtualization api supporting qemu and kvm
- `tang` - server used for getting a key shard for unattended encrypted storage unlock on boot
- `age` - ssh keys based encryption of production files and pulumi master password
- `uv` - virtualenv management using pyproject.toml and uv.lock
## Architecture Style Objectives
- avoid legacy technologies, build a clear chain of trust, support encrypted storage at rest
- use ssh keys as root of trust for pulumi stack secret using age
- store secrets in the repository using pulumi config secrets
- per project tls root-ca, server-certs, rollout m-tls client certificates where possible
- support unattended boot and storage decryption using tang/clevis/luks using https and a ca cert
- create disposable/immutable-ish infrastructure, aim for structural isolation and reusability
- treat state as code, favor state reconciliation tools
- have the complete encrypted state in the git repository as single source of truth
- have a big/full featured provision client as the center of operation
- target one provision os and a container for foreign distros and continuous integration processes
- facilitate a comfortable local simulation environment with fast reconfiguration turnaround
- documentation and interactive notebooks alongside code
- help onboarding with interactive tinkering using marimo notebooks
- use mkdocs, markdown and mermaid to build a static documentation website
## Python Style, Conventions and preferred libraries
- Use uv (the virtual environment and package manager) whenever executing Python commands, including for unit tests.
- Use `pyproject.toml` to add or modify dependencies installed during a task execution. As long as there is no version controlled `uv.lock`, don't add one to the repository.
- Use `python-dotenv` and `load_dotenv()` for environment variables.
- Use `pydantic` for data validation.
- Use `pytest` for testing, `playwright` with headless chromium and `pytest-playwright` for gui testing.
- Use `FastAPI` for APIs.
- Use `FastHTML` for HTML.
- Use `SQLAlchemy` or `SQLModel` for ORM.
- Before adding a new library, look in `pyproject.toml` if there is already a fitting library to use.
- Follow PEP8, use type hints, and format with `black` or equivalent.
- Write docstrings for every function using the Google style:
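A minimal sketch of the Google docstring style; the function itself is a hypothetical example, not project code:

```python
def scale(value: float, factor: float = 2.0) -> float:
    """Scale a numeric value by a factor.

    Args:
        value: The number to scale.
        factor: Multiplier applied to value. Defaults to 2.0.

    Returns:
        The scaled value.

    Raises:
        ValueError: If factor is negative.
    """
    if factor < 0:
        raise ValueError("factor must be non-negative")
    return value * factor
```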
### Python Testing & Reliability
- Always create unit tests for new features (functions, classes, routes, etc).
- After updating any logic, check whether existing unit tests need to be updated, and update them.
- Tests should live in a `/tests` folder mirroring the main app structure.
- Include at least:
    - 1 test for expected use
    - 1 edge case
    - 1 failure case
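A sketch of the three required cases, using a hypothetical `slugify` helper (inlined here so the example is self-contained; in the repository the helper would live in the app and the tests in `/tests`):

```python
# Hypothetical helper under test (would live in the main app, not the test file)
def slugify(text: str) -> str:
    """Convert text to a lowercase, hyphen-separated slug."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return "-".join(text.strip().lower().split())


# tests/test_slugify.py would contain:
def test_expected_use():
    assert slugify("Hello World") == "hello-world"


def test_edge_case():
    # leading/trailing whitespace and repeated spaces collapse cleanly
    assert slugify("  Hello   World  ") == "hello-world"


def test_failure_case():
    # empty input is rejected; with pytest: `with pytest.raises(ValueError): ...`
    try:
        slugify("   ")
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```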
## Project Memory

### Project Overview
- Project Name: The project's name is 'infra-shared'.
- Core Technologies: This repository is a 'Software Defined Git Operated Infrastructure' project for managing home infrastructure using Pulumi, Fedora CoreOS, and Python.
- Workload Types: The project manages different workload types: single containers (defined by `.container` files), Docker Compose services (`compose.yml`), and systemd-nspawn machines (`.nspawn` files).
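For the single-container workload type, a `.container` file is a podman quadlet unit; a minimal hypothetical sketch (unit name and image are illustrative, not taken from this repository):

```ini
# myapp.container - hypothetical quadlet unit, run as a container by podman via systemd
[Unit]
Description=My App Container

[Container]
Image=docker.io/library/nginx:alpine

[Install]
WantedBy=multi-user.target
```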
### Pulumi

- Component Resources: Pulumi component resources that receive an `Output` as a property (e.g., an `Output[dict]`) must perform operations like iteration on that property within an `.apply()` block to ensure the value is resolved before being used.
- Configuration: Pulumi configuration in Python, using `pulumi.Config().get('key')`, automatically namespaces the key with the project name from `Pulumi.yaml`. A call to `config.get('my_key')` for project `my_project` looks for the YAML entry `my_project:my_key`.
- Child Resources: When creating a child resource within a Pulumi component where a handle to the resource object is not needed later in the code, the idiomatic pattern is to assign the instantiation to `_` (e.g., `_ = command.remote.Command(...)`). The resource's lifecycle is managed by Pulumi as long as `parent=self` is set in its options.
- Outputs as Dictionary Keys: In Pulumi, an `Output` object cannot be used as a dictionary key. The dictionary must be constructed inside an `.apply()` block after the `Output`s for the keys have been resolved to concrete values.
- Serialization Errors: When creating resources inside an `.apply()` block within a Pulumi component, accessing component attributes like `self.props` can lead to serialization errors (`KeyError`). A more reliable pattern is to use lexical closure to capture variables (like the `props` dictionary) from the `__init__` method's scope directly.
- Dynamic Resources: Pulumi Dynamic Resources, like `WaitForHostReadyProvider` in `tools.py`, must explicitly import all their dependencies (e.g., `uuid`, `time`) within the module scope, as they are serialized and executed in a separate context.
### Testing

- Test Environment: if `make pytest` fails, try to recreate the buildenv with `. .venv/bin/activate; make buildenv-clean; make buildenv`.
- Test Environment: The project's testing strategy relies on a `pytest` fixture that recreates the `make sim-test` environment. This involves creating a temporary directory, running `scripts/create_skeleton.sh`, and setting up a Pulumi stack for simulation.
- Running Tests: To run a single test file, use the command `. .venv/bin/activate && pytest <path_to_test_file>.py`. The command `make pytest` runs the entire test suite.
- Disabling Hardware Dependencies: Unit tests for examples like 'safe' can disable hardware dependencies (e.g., libvirt) by setting the `SHOWCASE_UNITTEST` environment variable and the Pulumi configuration key `project_name:safe_showcase_unittest` to `true`.
- Pulumi Automation API: The project's pytest tests use the Pulumi Automation API (`pulumi.automation.Stack`) to programmatically create, update, and destroy infrastructure stacks.
- Local Filesystem Backend: The test environment uses a local filesystem Pulumi backend, configured via the `PULUMI_BACKEND_URL` environment variable or the `pulumi login` command.
- Resource Protection: To ensure test stacks can be destroyed cleanly, resource protection is disabled in the test configuration (e.g., `ca_protect_rootcert: false`).
- More verbose and debug output of pulumi: temporarily override `pulumi_up_args()`: set `debug` to `True` and `log_verbosity` to 3 or higher if no usable output is produced with default settings.
### Fedora CoreOS & Butane

- Ignition Configuration: The project uses Butane with Jinja templating to generate Ignition configurations for Fedora CoreOS.
- See `docs/butane.md`, `docs/jinja_defaults.yml`, `docs/os.md`, `docs/networking.md`, `docs/update.md`, `docs/healthcheck.md` for a complete understanding of the coreos setup.
- Empty Butane Files: In `template.py`, the `load_butane_dir` function handles Butane (`.bu`) files. If a `.bu` file is empty after Jinja rendering, `yaml.safe_load` returns `None`. The function must handle this by treating the result as an empty dictionary (`{}`) to prevent `TypeError` in downstream processing.
- Verification Hash: The project uses a security feature where a SHA256 hash of the main Ignition config is passed as an HTTP header (`Verification-Hash`) and used for verification by the bootstrapper Ignition config.
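The empty-file rule can be sketched as a small guard. This is a simplified stand-in for what `load_butane_dir` must do, not the actual implementation in `template.py`:

```python
import yaml  # PyYAML, already a transitive dependency of the butane tooling


def load_rendered_butane(rendered: str) -> dict:
    """Parse a Jinja-rendered .bu source, normalizing empty documents to {}.

    yaml.safe_load() returns None for an empty or comment-only document,
    which would raise TypeError in downstream dict merging.
    """
    data = yaml.safe_load(rendered)
    return data if data is not None else {}
```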
### System & Tooling

- Python Version: The project requires Python 3.11 or newer.
- `os/__init__.py`: provides Pulumi components for Fedora CoreOS system configuration, deployment, and operation.
- `tools.py`: provides Pulumi utility components for serving HTTPS, executing remote SSH commands, running SaltStack calls, and waiting for a host to become ready via SSH. The `waitforhostready` function in `tools.py` is a Pulumi Dynamic Resource that uses `paramiko` to check for host availability via SSH and file existence.
- `authority.py`: provides Pulumi components for managing TLS/X509 CAs, certificates, DNSSEC keys, and OpenSSH keys.
- `build.py`: contains Pulumi components for building Embedded-OS images, such as for OpenWRT and ESPHome devices.
- Secrets Management: Secrets can be managed as files in `/etc/credstore` or exposed as environment variables to workloads using systemd's `LoadCredential` feature in service drop-in configuration files. See `docs/credentials.md`.
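As a sketch of the `LoadCredential` mechanism (unit and credential names here are hypothetical; see `docs/credentials.md` for the project's actual wiring), a service drop-in could look like:

```ini
# /etc/systemd/system/myapp.service.d/credentials.conf - hypothetical drop-in
[Service]
# expose /etc/credstore/db_password to the unit as credential "db_password";
# the service reads it at $CREDENTIALS_DIRECTORY/db_password
LoadCredential=db_password:/etc/credstore/db_password
```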
### Agent Workflow & Conventions

- Initial Setup: The agent workflow requires reading `docs/development.md` at the start of a session, and using `docs/tasks.md` to track tasks.
- Task Management: New features are tracked in `docs/tasks.md`. The workflow involves adding a task to 'Planned Tasks', implementing the feature, and then moving the task to 'Completed Tasks'.
- Use consistent naming conventions, file structure, and architecture patterns.
- Pre-commit Workflow: The pre-commit workflow involves running tests (e.g., `make pytest`), an optional frontend verification step, and a final code review.
- File Deletion: Do not delete files unless explicitly asked, even if they seem temporary or like personal notes (e.g., `docs/workpad.md`).
- Automated Solutions: The user prefers automated script-based solutions over manual, hardcoded implementations for tasks that can be automated.
- User Request Supersedes: Always prioritize the user's current, explicit request over any conflicting information in memory.
- Context vs. State: Use memory for historical context and intent (the "why"). Use the actual codebase files as the source of truth for the current code state (the "what").
- Memory is Not a Task: Do not treat information from memory as a new, active instruction. Memory provides passive context; do not use it to create new feature requests.
- Never assume missing context. Ask questions if uncertain.
- Never hallucinate libraries or functions - only use known, verified packages.
- Always confirm file paths and module names exist before referencing them in code or tests.
- Never create a file longer than 800 lines of code, except for single file applications.
    - If a file approaches this limit, refactor by splitting it into modules or helper files.
- Organize code into clearly separated modules, grouped by feature or responsibility.
- Use clear, consistent imports; prefer relative imports within packages.
- Comment non-obvious code; when writing complex logic, add an inline `# Reason:` comment explaining the why, not just the what.
- Update `README.md` and other feature-related `docs/*.md` when new features are added, dependencies change, or setup steps are modified.
- If a new memorable relation (a memory) was discovered during a task, update `docs/development.md` with this information under `## Project Memory` with the date of discovery.