Training a Model

The Model Builder generates a training workflow from a form and commits it to your repository. When the push lands, Gitea Actions triggers the workflow on your chosen runner and executes training. Artifacts are uploaded to the Actions run for you to download when training completes.

Use the Model Builder when you want to:

  • Train a modulation recognition, signal classification, or other RF ML model on a curated dataset
  • Fine-tune a pre-trained wireless foundation model (WavesFM) on your own data
  • Run hyperparameter optimisation with Optuna across multiple training trials

What the workflow produces:

  • best.pt: the best PyTorch checkpoint by validation metric
  • best.onnx: ONNX export of the best model (if ONNX export is enabled)
  • log.txt: JSONL training log with per-epoch metrics
  • Optionally: confusion_matrix.png, parameter sweep plots

Heads-up: Model Builder is currently flagged as a beta feature. You will see a development notice at the top of the page during runs.

Before you start, make sure you have:

  • A curated radio_dataset in your repository Library: see Curating a Dataset
  • At least one runner registered under Workflows → Management → Runners
  • RIAHUB_BASE_URL set as a repository variable or secret: set it to https://riahub.ai (or your instance URL) under Settings → Variables → Actions
  • Write access to the target repository where the workflow file and artifacts will be committed

Navigate to your repository, then click Model Builder in the left sidebar. The overview page shows the two sub-tools:

  • Model Builder: configure & launch training (Train mode or HPO beta).
  • Compression: pruning and quantisation for edge deployment.

Click Open Builder on the Model Builder card, then select Model Trainer from the top navigation.

Model Builder overview page


Step 2: Get oriented on the Model Trainer page

The Model Trainer page loads with a pipeline header at the top, configuration cards in the middle, and a live YAML preview on the right.

Empty Model Trainer page

The page header is your control bar:

| Field | What it controls |
| --- | --- |
| Progress | Configuration completeness (e.g. 8 / 9 sections done). |
| Active | Number of currently running jobs. |
| Runner | The Actions runner that will execute the training (e.g. dawson-gpu). |
| Mode | Train (default) or Hyperparameter Optimization (beta). |
| Repo | Target repository where the workflow YAML is committed. |
| Hide YAML / Submit Run | Toggles the live YAML preview and submits the configured run. |

Header bar with runner, mode and repo selectors

Each configuration card shows a status pill on the right (DONE, READY, ACTIVE) so you always know which parts of the run are configured.


Click the Repo dropdown in the header and choose the repo that owns this training job. Only repositories you can push to are listed.

Repository dropdown

The selected repo determines:

  • Where the .riahub/workflows/train.yaml file will be committed.
  • Which runners are available in the Runner dropdown.

The Task section is the first card under CONFIGURATION. It controls what you are training a model to do.

Task section

  1. Toggle between Built-in and Custom tasks. Built-in covers common RF tasks (classification, detection, etc.); Custom lets you point at a task.yaml manifest in a connected repo.
  2. Task: pick the task type, e.g. Classification.
  3. Selection Metric: the metric used to choose the best checkpoint (e.g. Accuracy).
  4. Selection Mode: Maximize or Minimize the selection metric.
  5. Save Artifacts: turn on to keep model checkpoints from the run.
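
Selection Metric and Selection Mode together define what "best" means for the saved checkpoint. The comparison can be sketched in a few lines of Python. This is only an illustration: the function name and the list-of-dictionaries layout are hypothetical, not the builder's actual implementation.

```python
def pick_best_epoch(history, metric="accuracy", mode="maximize"):
    """Return (epoch_number, value) of the best epoch under the given mode.

    `history` is a list of per-epoch metric dicts; this layout is an
    assumption made for the illustration.
    """
    if mode not in ("maximize", "minimize"):
        raise ValueError(f"unknown selection mode: {mode!r}")
    better = (lambda a, b: a > b) if mode == "maximize" else (lambda a, b: a < b)

    best_epoch, best_value = None, None
    for epoch, metrics in enumerate(history, start=1):
        value = metrics[metric]
        # Strict comparison keeps the earliest epoch when values tie.
        if best_value is None or better(value, best_value):
            best_epoch, best_value = epoch, value
    return best_epoch, best_value


history = [
    {"accuracy": 0.64, "loss": 0.95},
    {"accuracy": 0.78, "loss": 0.69},
    {"accuracy": 0.77, "loss": 0.66},
]
print(pick_best_epoch(history, "accuracy", "maximize"))  # (2, 0.78)
print(pick_best_epoch(history, "loss", "minimize"))      # (3, 0.66)
```

Note that the two calls pick different epochs from the same history, which is why the mode has to agree with the metric: Maximize for accuracy-style metrics, Minimize for loss-style metrics.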

Expand the Models card and pick a template from the model picker. The right choice depends on your hardware and the complexity of the problem:

| Template | Typical use |
| --- | --- |
| MobileNetV3 | Good starting point: runs on a cpu runner, trains in 10–30 minutes on a small dataset |
| ResNet18 | Slightly higher capacity than MobileNetV3; use when accuracy matters more than speed |
| WavesFM Linear Probe | Fast WavesFM adaptation: train only the classification head; GPU runner recommended |
| WavesFM LoRA | Deeper WavesFM fine-tuning with low-rank weight matrices; GPU runner required |

For the modulation recognition tutorial, MobileNetV3 on a cpu runner is the right choice.

For each model you can override per-model hyperparameters surfaced by the manifest:

  • MODEL: the model entry file (e.g. iq_tiny_cnn).
  • HIDDEN CHANNELS: the hidden channel count (e.g. 16), plus any other model-specific knobs the manifest exposes.

When a WavesFM template is selected, additional fields appear:

| Parameter | Default | Notes |
| --- | --- | --- |
| Task | rml | Must match a WavesFM-supported task name |
| LoRA rank | 32 | Lower values train faster; higher values adapt more |
| LoRA alpha | 64 | Scaling factor (alpha / rank); leave at 2 × rank |
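
To make the rank and alpha trade-off concrete, the sketch below computes the adapter size and update scale for a single weight matrix. It assumes the standard LoRA convention, in which a frozen d_out × d_in weight gains two trainable factors (d_out × rank and rank × d_in) and the low-rank update is multiplied by alpha / rank. The 768 × 768 layer size is an arbitrary value chosen for illustration, not a WavesFM dimension, and the function name is hypothetical.

```python
def lora_stats(d_in, d_out, rank=32, alpha=64):
    """Trainable-parameter count and update scale for one LoRA-adapted matrix."""
    full_params = d_in * d_out            # size of the frozen base weight
    lora_params = rank * (d_in + d_out)   # factor A (rank x d_in) plus factor B (d_out x rank)
    return {
        "lora_params": lora_params,
        "fraction_of_full": lora_params / full_params,
        "scale": alpha / rank,            # multiplier applied to the low-rank update
    }


# Illustrative layer size only; real layer shapes depend on the model.
for rank in (8, 32, 64):
    stats = lora_stats(768, 768, rank=rank, alpha=2 * rank)
    print(rank, stats["lora_params"], round(stats["fraction_of_full"], 4), stats["scale"])
```

Keeping alpha at 2 × rank holds the scale at 2.0 for every rank, so changing the rank alters adapter capacity without also changing the magnitude of the update.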

Expand the Datasets card. This is where you wire the model to the data on disk.

You have two ways to attach a dataset:

  • Browse Library: click and select a curated radio_dataset from your repository Library. The builder reads the HDF5 attributes to detect the number of classes and the input shape. The OID (object identifier) of the selected dataset is written into the generated workflow’s download step so the runner can fetch it from MinIO.
  • Per-path file pickers: for Train Path, Validation Path, and Test Path, click the dropdown to browse .h5 dataset files in your repos. Paths are shown in the form Datasets/<file>.h5 (owner/repo@branch).

Dataset path selectors with dropdown

Below the paths are the loader-level controls:

Dataset advanced settings

| Field | Purpose |
| --- | --- |
| Batch Size | Samples per training step (default 32 in the UI; 256 for the modulation recognition tutorial defaults). |
| Num Workers | Parallel data-loader workers. |
| Drop Last / Persistent Workers / Pin Memory | Standard PyTorch DataLoader switches. |
| Validation Split / Test Split / Split Seed | Use these if you want the loader to create the splits instead of supplying separate files (defaults to 80 / 20 train / val). |
| Label Key / IQ Key / Metadata Key / SNR Key | Keys inside the .h5 file to read for labels, IQ samples, metadata and SNR. |
| Classes File | Optional file mapping class indices to names. |
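
The split controls can be pictured as a seeded shuffle of sample indices followed by a cut. The sketch below shows that idea only, assuming fractional splits and a plain index shuffle. The function name is hypothetical, and the loader's real strategy (for example, whether it stratifies by class) is not specified here.

```python
import random


def split_indices(n_samples, val_split=0.2, test_split=0.0, seed=42):
    """Shuffle indices with a fixed seed and cut them into train/val/test lists."""
    if not 0.0 <= val_split + test_split < 1.0:
        raise ValueError("val_split + test_split must be in [0, 1)")
    indices = list(range(n_samples))
    # A local RNG keeps the split reproducible without touching global random state.
    random.Random(seed).shuffle(indices)

    n_val = int(n_samples * val_split)
    n_test = int(n_samples * test_split)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test


train, val, test = split_indices(1000, val_split=0.2, seed=42)
print(len(train), len(val), len(test))  # 800 200 0
```

Reusing the same seed reproduces the same partition, which is what makes results comparable across runs that share a Split Seed.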

Sensible defaults are pre-filled across the remaining cards. Adjust only what you need.

| Parameter | Default | Notes |
| --- | --- | --- |
| Epochs | 20 | Increase to 30–50 for small datasets |
| Batch size | 256 | Reduce if the runner runs out of memory |
| Learning rate | 0.001 | Adam/AdamW default |
| Optimiser | AdamW | SGD, Adam, AdamW, RMSprop available |
| LR scheduler | CosineAnnealingLR | Smooth decay; suits short runs |
| Train / val split | 80 / 20 | |
| Evaluation metrics | accuracy, f1 | Add precision, recall, auroc as needed |
| Export ONNX | On | Recommended: required for edge deployment |
| Upload confusion matrix | Off | Enable to get a confusion matrix PNG in the artifacts |
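
To see what the CosineAnnealingLR default does to the learning rate over a 20-epoch run, the closed-form cosine annealing schedule can be evaluated directly. The sketch assumes the schedule period (T_max in PyTorch's scheduler) equals the epoch count and that the minimum learning rate is 0. Both are assumptions about how the builder configures the scheduler, not confirmed settings.

```python
import math


def cosine_annealing_lr(epoch, base_lr=0.001, t_max=20, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


for epoch in (0, 5, 10, 15, 20):
    print(epoch, f"{cosine_annealing_lr(epoch):.6f}")
```

The rate falls slowly at first, fastest around the midpoint (where it is half the starting value), and flattens again near the end of the run.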

The Trainer card sets runtime and training-loop behaviour.

Trainer settings

  • Device: Auto, CPU, or a specific GPU.
  • Seed: random seed for reproducibility (e.g. 42).
  • Epochs: number of training epochs.
  • AMP Enabled: automatic mixed precision (on by default for GPU runs).
  • Autocast Dtype: float32, float16, or bfloat16.
  • Progress Bar, Checkpoint Every N Epochs, Early Stopping Patience, Gradient Clip Norm, Component Modules: optional fine-tuning controls.
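
Early Stopping Patience is the number of consecutive epochs without improvement that the trainer tolerates before ending the run. A minimal sketch of that rule follows. The class and its min_delta argument are hypothetical and written for illustration; they are not the trainer's own code.

```python
class EarlyStopping:
    """Signal a stop once the monitored value fails to improve for `patience` epochs."""

    def __init__(self, patience=5, mode="maximize", min_delta=0.0):
        self.patience = patience
        # Flip the sign for "minimize" so that larger scores are always better.
        self.sign = 1.0 if mode == "maximize" else -1.0
        self.min_delta = min_delta  # hypothetical: smallest change that counts as progress
        self.best = None
        self.bad_epochs = 0

    def step(self, value):
        """Record one epoch's value; return True if training should stop."""
        score = self.sign * value
        if self.best is None or score > self.best + self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2, mode="maximize")
for epoch, acc in enumerate([0.60, 0.72, 0.71, 0.72, 0.80], start=1):
    if stopper.step(acc):
        print(f"stopping after epoch {epoch}")  # stopping after epoch 4
        break
```

In this example the run stops before the 0.80 epoch is ever reached, which shows the cost of a patience value that is too small for a noisy validation metric.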

Criterion, Optimizer, LR Scheduler, Evaluation, Export

Each of the remaining cards controls one piece of the training recipe:

Criterion, optimizer and LR scheduler cards

| Card | What to set |
| --- | --- |
| Criterion | Loss function, e.g. cross_entropy. |
| Optimizer | Optimizer name (e.g. adam), learning_rate, weight_decay, epsilon. |
| LR Scheduler | Optional: click the card to configure a schedule (warmup, cosine, etc.). |
| Evaluation | Metrics captured at evaluation time (e.g. capture_predictions, save_confusion). |
| Export | Output format(s): typically ONNX export with a chosen opset and dynamic-batch flag. |

On the right side of the page, the YAML Training Configuration panel renders the train.yaml that will be submitted. As you change fields on the left, this updates in real time.

Use Edit to hand-tweak any value the UI does not expose, or Reset to go back to the canonical UI-generated config.

Full configuration with all sections marked DONE

When every card shows DONE (or READY), the Progress counter in the header reaches 9 / 9 and the run is ready to submit.


Click the Runner dropdown in the header (or View Available Runners) to see registered runners. Select a runner whose label matches the compute you need.

| Runner label | Hardware | Appropriate for |
| --- | --- | --- |
| cpu | CPU-only | Tutorial runs, small datasets (< 100 k slices) |
| gpu-t4 | NVIDIA T4 | Medium datasets, WavesFM Linear Probe |
| gpu-a100 | NVIDIA A100 | Large datasets, WavesFM LoRA, HPO sweeps |

If no runner is online, the workflow will queue and wait. Check runner status at Workflows → Management → Runners.


Click Submit Run (also labelled Train) in the top-right of the header. The Model Builder:

  1. Posts to the backend to render the workflow and training config YAML
  2. Commits .riahub/workflows/train.yaml and .riahub/train_configs/train.yaml to your repository
  3. Redirects you to the repository’s Actions tab

A Submit Model Trainer dialog appears first:

Submit dialog with commit options

  1. Confirm the Target Repository and the Training Machine (runner).
  2. The dialog shows the upload path: .riahub/workflows/train.yaml.
  3. Optionally add an extended commit description.
  4. Choose Commit directly to main or Create a new branch for this commit.
  5. Click Commit Changes.

The workflow triggers automatically on the branch push.


You are redirected to the Workflows tab of your repo and the new Training run appears at the top. The right pane shows the live job steps:

Training workflow running

The Actions run shows one job with these steps:

| Step | What it does |
| --- | --- |
| Runner info | Prints OS, CPU, and GPU info |
| Download dataset | Fetches the HDF5 file from MinIO using the dataset OID |
| Checkout configs | Sparse-checks out .riahub/train_configs/ |
| QMB Training | Runs qmb train --config-path .riahub/train_configs/train.yaml |
| Collect training artifacts | Gathers best.pt, *.onnx, and optional PNG outputs |
| Upload training artifacts | Uploads a training-artifacts zip to Actions artifact storage |

Click any step to expand its live log.

Live training log output

Training logs appear in the QMB Training step in real time. Expect output like:

Epoch 1/20: train_loss: 0.9842 val_loss: 0.9511 val_accuracy: 0.6421
Epoch 2/20: train_loss: 0.7213 val_loss: 0.6901 val_accuracy: 0.7834
Epoch 20/20: train_loss: 0.1021 val_loss: 0.0988 val_accuracy: 0.9876
Best val accuracy: 0.9901 (epoch 19)
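
If you copy the step log to a file, the epoch lines can be turned into records for plotting or comparison. The parser below is a sketch that assumes the log keeps exactly the line format shown above. The function name is hypothetical, and lines that do not start with "Epoch" are skipped.

```python
import re

# Matches lines such as "Epoch 1/20: train_loss: 0.9842 val_loss: 0.9511 ..."
EPOCH_LINE = re.compile(r"Epoch (?P<epoch>\d+)/(?P<total>\d+):\s*(?P<metrics>.*)")
# Matches each "name: value" pair that follows the epoch prefix.
METRIC = re.compile(r"(\w+):\s*([0-9.eE+-]+)")


def parse_epoch_line(line):
    """Return {'epoch': int, 'total': int, <metric>: float, ...}, or None for other lines."""
    match = EPOCH_LINE.match(line.strip())
    if match is None:
        return None
    record = {"epoch": int(match["epoch"]), "total": int(match["total"])}
    for name, value in METRIC.findall(match["metrics"]):
        record[name] = float(value)
    return record


log = """\
Epoch 1/20: train_loss: 0.9842 val_loss: 0.9511 val_accuracy: 0.6421
Epoch 2/20: train_loss: 0.7213 val_loss: 0.6901 val_accuracy: 0.7834
Best val accuracy: 0.9901 (epoch 19)
"""
records = [r for r in map(parse_epoch_line, log.splitlines()) if r is not None]
print(records[-1]["val_accuracy"])  # 0.7834
```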

When the workflow finishes, the header turns green and the Artifacts panel lists the produced bundle (e.g. training-artifacts).

Completed training run with artifacts

From here you can also Re-run all jobs: useful for re-running on a different runner without changing config.


Training artifacts are stored in Gitea Actions artifact storage, not pushed back to the Library automatically.

When the run finishes:

  1. On the Actions run page, click training-artifacts under Artifacts.
  2. Extract the zip: you will find best.pt, best.onnx, and optionally confusion_matrix.png.
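
If you prefer to script the extraction, a small helper can unpack the bundle and confirm that the expected files are present. This is a sketch using only the Python standard library. The function name is hypothetical, and the layout of files inside the zip is not assumed: files are located by name wherever they sit in the archive.

```python
import zipfile
from pathlib import Path


def extract_artifacts(zip_path, dest_dir, required=("best.pt",)):
    """Extract the artifact zip and return {file name: path}, checking required files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Index extracted files by base name, regardless of sub-directory.
    found = {p.name: p for p in dest.rglob("*") if p.is_file()}
    missing = [name for name in required if name not in found]
    if missing:
        raise FileNotFoundError(f"missing from artifact bundle: {missing}")
    return found
```

For example, assuming the download was saved as training-artifacts.zip, calling extract_artifacts("training-artifacts.zip", "artifacts", required=("best.pt", "best.onnx")) raises FileNotFoundError if ONNX export was disabled for the run.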

To register the model in your repository’s Library, you can use either the Git LFS CLI or the RIA Hub web UI.

Using the Git LFS CLI:
git lfs track "*.pt" "*.onnx"
git add .gitattributes
mkdir -p models/
cp /path/to/best.onnx models/modrec-tutorial-v1.onnx
cp /path/to/best.pt models/modrec-tutorial-v1.pt
git add models/
git commit -m "add trained modrec model v1"
git push

Using the RIA Hub web UI:

  1. In your repository, click Add a file → Upload file.

  2. Drop the artifacts produced by the run (confusion_matrix.png, model.onnx, model.ckpt).

  3. RIA Hub recognises large binary types and offers Track with Git LFS for .png, .onnx, and .ckpt. Tick each box.

    Upload trained artifacts with LFS tracking

  4. A fourth tile, .gitattributes, is added automatically to record the LFS patterns.

    Artifacts tracked with LFS plus .gitattributes

  5. Commit the changes: directly to main or via a pull request.

The files now show up in the repo file listing alongside your code and dataset:

Repository with model artifacts

RIA Hub picks up the new files on push and registers them in the Library:

  • modrec-tutorial-v1.pt: pytorch_checkpoint asset
  • modrec-tutorial-v1.onnx: onnx_graph asset

Open confusion_matrix.png in the repo to confirm the model trained successfully:

Confusion matrix at 97.76% accuracy

If accuracy looks good, the .onnx artifact is ready to flow into:

  • Model Builder → Compression to prune/quantise for edge deployment.
  • Application Packager to bundle into a deployable application.
  • RIA Testbed Conductor to run on real radio hardware.

The workflow does not trigger after submit

The workflow file has an on.push.branches trigger for the branch the Model Builder targeted. If you push to a different branch, the workflow will not fire. Check that the branch name in .riahub/workflows/train.yaml matches the branch you are on.

If the Download dataset step fails, check RIAHUB_BASE_URL. The variable must be set so the runner can build the MinIO download URL. Set it under Settings → Variables → Actions in your repository, with the value https://riahub.ai (or your instance URL).

If the runner runs out of memory, reduce the batch size in the Model Builder form and re-submit, or switch to a larger runner.

Model accuracy is poor (< 60 % on modulation recognition)

Check these in order:

1. **Class balance**: use the [Inspector](/guides/dataset-manager/inspector/) Balance view; unequal class counts hurt accuracy
2. **Label consistency**: use the Sample view to confirm slices visually match their labels
3. **Epochs**: try 30–50 epochs; 20 may not converge on small datasets
4. **SNR**: if synthetic recordings use very high noise power, signals become indistinguishable; re-generate with lower `--noise-power`
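
For a quick class-balance check outside the Inspector, counting labels is enough. The sketch below assumes you already have the labels as a Python list. The function name and the modulation names are hypothetical examples.

```python
from collections import Counter


def imbalance_report(labels):
    """Return per-class counts and the max/min count ratio (1.0 means perfectly balanced)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels supplied")
    ratio = max(counts.values()) / min(counts.values())
    return dict(counts), ratio


# Hypothetical label list: two well-represented classes and one sparse class.
labels = ["bpsk"] * 500 + ["qpsk"] * 480 + ["qam16"] * 120
counts, ratio = imbalance_report(labels)
print(counts, round(ratio, 2))
```

A ratio well above 1.0, as in this example, suggests rebalancing the dataset before spending more epochs on training.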

If a card does not reach DONE, open the card; one required field is empty or invalid. The YAML panel on the right also highlights the missing key.

If the Repo dropdown is empty, you do not have push access to any repos under the current project. Switch projects or request access.

If the Runner dropdown is empty, no runner is registered for the selected repo. Register one via Settings → Actions → Runners on the repo or organisation.

To run a hyperparameter sweep, change Mode in the header to Hyperparameter Optimization. Extra columns (sweep ranges, search space) appear on each relevant section. HPO is currently in beta.

Use the Edit button on the YAML panel for anything not exposed in the UI. The UI will not overwrite custom keys it does not recognise.


  • Hyperparameter optimisation: open Model Builder → HPO to run an Optuna sweep across learning rate, batch size, and architecture variants
  • Edge deployment: take the best.onnx to the Application Packager to build a Holoscan inference application and deploy it to a registered Screens agent
  • Example files: the RIA_Example repository includes a pre-curated Datasets/example_radio_dataset.h5, example .ckpt/.onnx model files, and a ready-to-adapt train.yaml workflow.