Data Package
The ria_toolkit_oss.data package provides abstract data types and interfaces for radio machine learning, including signal recording management and dataset frameworks for RF signal processing.
Annotation
Represents a labeled segment in signal data with time and frequency boundaries.
`class Annotation(sample_start: int, sample_count: int, freq_lower_edge: float, freq_upper_edge: float, label: str = None, comment: str = None, detail: dict = None)`
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `sample_start` | int | Starting sample index |
| `sample_count` | int | Number of samples covered by the annotation |
| `freq_lower_edge` | float | Lower frequency boundary |
| `freq_upper_edge` | float | Upper frequency boundary |
| `label` | str | Display label |
| `comment` | str | Human-readable description |
| `detail` | dict | Custom metadata |
Methods:
| Method | Description |
|---|---|
| `is_valid()` | Validates the sample count and frequency bounds |
| `overlap(other)` | Returns the area of overlap with another annotation |
| `area()` | Returns the bounding-box area (samples × frequency) |
| `to_sigmf_format()` | Exports the annotation in SigMF JSON format |
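The bounding-box methods can be pictured with a small stand-alone sketch. `AnnotationBox` is a hypothetical stand-in that mirrors the constructor fields, not the library class. Two details are assumptions about what the real methods compute: `is_valid()` is taken to mean a positive sample count and a lower edge below the upper edge, and `overlap()` is taken to be the area of the intersection rectangle in samples × frequency. The dictionary keys are the standard `core:` annotation fields from the SigMF specification; that `to_sigmf_format()` emits exactly these keys is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnnotationBox:
    """Hypothetical stand-in mirroring the Annotation constructor fields."""
    sample_start: int
    sample_count: int
    freq_lower_edge: float
    freq_upper_edge: float
    label: Optional[str] = None
    comment: Optional[str] = None
    detail: dict = field(default_factory=dict)  # kept for parity; not exported below

    def is_valid(self) -> bool:
        # Assumed rule: non-empty sample span and a non-inverted frequency band.
        return self.sample_count > 0 and self.freq_lower_edge < self.freq_upper_edge

    def area(self) -> float:
        # Bounding-box area in samples x frequency units.
        return self.sample_count * (self.freq_upper_edge - self.freq_lower_edge)

    def overlap(self, other: "AnnotationBox") -> float:
        # Area of the intersection rectangle; 0.0 when the boxes are disjoint.
        t0 = max(self.sample_start, other.sample_start)
        t1 = min(self.sample_start + self.sample_count,
                 other.sample_start + other.sample_count)
        f0 = max(self.freq_lower_edge, other.freq_lower_edge)
        f1 = min(self.freq_upper_edge, other.freq_upper_edge)
        return float(max(0, t1 - t0) * max(0.0, f1 - f0))

    def to_sigmf_format(self) -> dict:
        # Standard SigMF core annotation keys; optional ones only when set.
        out = {
            "core:sample_start": self.sample_start,
            "core:sample_count": self.sample_count,
            "core:freq_lower_edge": self.freq_lower_edge,
            "core:freq_upper_edge": self.freq_upper_edge,
        }
        if self.label is not None:
            out["core:label"] = self.label
        if self.comment is not None:
            out["core:comment"] = self.comment
        return out


a = AnnotationBox(0, 1000, 1.0e6, 2.0e6, label="burst")
b = AnnotationBox(500, 1000, 1.5e6, 2.5e6)
print(a.area())      # → 1000000000.0
print(a.overlap(b))  # → 250000000.0 (500 samples x 0.5e6)
```

Dividing `overlap()` by the smaller of the two `area()` values would give a unitless overlap ratio, which is often more convenient when comparing annotations of very different sizes.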
Recording
Encapsulates complex IQ samples with metadata and annotations in a C × N structure (C channels, N samples).
`class Recording(data, metadata: dict = None, annotations: list = None, dtype=None, timestamp=None)`
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `data` | array-like | Complex array of IQ samples |
| `metadata` | dict | Key-value recording information |
| `annotations` | list | List of `Annotation` objects |
| `dtype` | complex type | NumPy data type of the samples |
| `timestamp` | float or int | Epoch time |
Properties:
| Property | Description |
|---|---|
| `data` | Complex IQ sample array; read-only when the recording has more than 1024 samples |
| `metadata` | Dictionary of recording attributes |
| `annotations` | List of `Annotation` objects |
| `shape` | Dimensions of the data array, (C, N) |
| `n_chan` | Number of channels (C) |
| `rec_id` | Unique 64-character recording identifier |
| `dtype` | Element data type |
| `timestamp` | Recording epoch timestamp |
| `sample_rate` | Sample rate, read from `metadata` |
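To make the C × N convention concrete, here is a dependency-free sketch of how `shape`, `n_chan` and `sample_rate` relate to the input. `as_channels` and `describe` are hypothetical helpers, not library functions, and plain nested lists stand in for the NumPy array a `Recording` holds. Two behaviors are assumptions: a flat 1-D input is promoted to a single channel, and the rate is stored under a `"sample_rate"` metadata key.

```python
def as_channels(data):
    """Hypothetical helper: coerce IQ samples into a C x N list of channels.

    Assumption: a flat 1-D sequence is treated as one channel (shape 1 x N).
    """
    if len(data) > 0 and not isinstance(data[0], (list, tuple)):
        return [[complex(x) for x in data]]
    rows = [[complex(x) for x in ch] for ch in data]
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all channels must have the same number of samples")
    return rows


def describe(data, metadata):
    """Derive shape, n_chan and sample_rate as the properties describe them."""
    chans = as_channels(data)
    n_samples = len(chans[0]) if chans else 0
    return {
        "shape": (len(chans), n_samples),  # (C, N)
        "n_chan": len(chans),
        # Assumption: the rate lives under a "sample_rate" metadata key.
        "sample_rate": metadata.get("sample_rate"),
    }


info = describe([[1 + 1j, 0, -1j], [0.5, 0.5j, 1]], {"sample_rate": 1e6})
print(info)  # → {'shape': (2, 3), 'n_chan': 2, 'sample_rate': 1000000.0}
```

Requiring equal-length channels is what makes the C × N layout a proper 2-D array rather than a ragged list.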
Metadata methods:
| Method | Description |
|---|---|
| `add_to_metadata(key, value)` | Add a new key-value pair |
| `update_metadata(key, value)` | Change the value of an existing key |
| `remove_from_metadata(key)` | Delete a metadata entry |
Data manipulation:
| Method | Description |
|---|---|
| `astype(dtype)` | Return a copy converted to the specified dtype |
| `trim(num_samples, start_sample)` | Extract a segment of `num_samples` samples beginning at `start_sample` |
| `normalize()` | Scale so the maximum amplitude is 1.0 |
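Both manipulation methods reduce to simple array operations. The sketch below uses plain nested lists in place of the NumPy array a `Recording` holds; the function names mirror the table, but these are hypothetical stand-ins, not the library's code. Two details are assumptions: `start_sample` defaults to 0, and normalization applies one scale factor across all channels (using complex magnitude) rather than normalizing each channel separately.

```python
def trim(channels, num_samples, start_sample=0):
    """Keep samples [start_sample, start_sample + num_samples) of every channel."""
    end = start_sample + num_samples
    if start_sample < 0 or num_samples < 0 or end > len(channels[0]):
        raise ValueError("trim window falls outside the recording")
    return [ch[start_sample:end] for ch in channels]


def normalize(channels):
    """Scale all channels by one factor so the peak magnitude becomes 1.0."""
    peak = max(abs(x) for ch in channels for x in ch)
    if peak == 0:
        return [list(ch) for ch in channels]  # all-zero signal: nothing to scale
    return [[x / peak for x in ch] for ch in channels]


rec = [[3 + 4j, 1, 0, 2j], [0, 1j, 1, 0]]  # C = 2 channels, N = 4 samples
print(trim(rec, 2, start_sample=1))  # → [[1, 0], [1j, 1]]
```

Using a single scale factor preserves the relative power between channels, which matters for multi-antenna recordings; per-channel normalization would discard it.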
Export methods:
| Method | Description |
|---|---|
| `to_sigmf(filename, path, overwrite)` | Save in SigMF format |
| `to_npy(filename, path, overwrite)` | Save in binary NumPy (`.npy`) format |
| `to_wav(filename, path, target_sample_rate, bits_per_sample, overwrite)` | Save as WAV with embedded YAML metadata |
| `to_blue(filename, path, data_format, overwrite)` | Save in the legacy MIDAS Blue format |
Visualization:
| Method | Description |
|---|---|
| `view(output_path, **kwargs)` | Plot the signal and save it as a PNG at `output_path` |
| `simple_view(**kwargs)` | Simplified plot of the signal, as PNG or SVG |
RadioDataset
Abstract base class providing an interface for iterable machine learning datasets containing radio signal examples.
Properties:
| Property | Description |
|---|---|
| `source` | Path to the dataset source file |
| `shape` | Tuple of dataset dimensions |
| `data` | NumPy array of all examples |
| `metadata` | pandas DataFrame of per-example metadata |
| `labels` | List of metadata column headers |
Dataset manipulation:
| Method | Description |
|---|---|
| `augment(class_key, augmentations, level, target_size, classes_to_augment, inplace)` | Add examples generated by transforming existing ones |
| `subsample(class_key, percentage, inplace)` | Randomly reduce the number of examples by a percentage |
| `resample(quantity_target, class_key, inplace)` | Adjust the number of examples per class to `quantity_target` |
| `homogenize(class_key, example_limit, inplace)` | Equalize class sizes |
| `drop_class(class_key, class_value, inplace)` | Remove every example of one class |
| `get_class_sizes(class_key)` | Return a dictionary mapping each class to its example count |
| `delete_example(idx, inplace)` | Remove a single example |
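Class-balancing operations start from per-class counts. The sketch below uses a list of dicts where the library uses a pandas DataFrame, and returns the indices to keep instead of modifying anything in place. `get_class_sizes` mirrors the table entry; the policy inside `homogenize` (downsample every class to the smallest class size, further capped by `example_limit`) is an assumption about what "equalize class sizes" means, and the `seed` parameter is hypothetical.

```python
import random
from collections import Counter


def get_class_sizes(metadata, class_key):
    """Count examples per value of class_key (metadata: one dict per example)."""
    return dict(Counter(row[class_key] for row in metadata))


def homogenize(metadata, class_key, example_limit=None, seed=0):
    """Return sorted indices of a subset where every class has the same size.

    Assumed policy: downsample each class to the smallest class size,
    further capped at example_limit when one is given.
    """
    sizes = get_class_sizes(metadata, class_key)
    target = min(sizes.values())
    if example_limit is not None:
        target = min(target, example_limit)
    rng = random.Random(seed)  # seeded so the subset is reproducible
    keep = []
    for cls in sizes:
        idx = [i for i, row in enumerate(metadata) if row[class_key] == cls]
        keep.extend(rng.sample(idx, target))
    return sorted(keep)


meta = [{"mod": "bpsk"}] * 5 + [{"mod": "qpsk"}] * 2 + [{"mod": "fm"}] * 3
print(get_class_sizes(meta, "mod"))  # → {'bpsk': 5, 'qpsk': 2, 'fm': 3}
```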
Abstract methods:
| Method | Description |
|---|---|
| `inspect()` | Generate a dataset visualization |
| `default_augmentations()` | List the default transformations |
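Because `inspect()` and `default_augmentations()` are abstract, every concrete dataset must implement them. A minimal sketch of that contract, assuming the base class is declared with Python's `abc` module: `RadioDatasetSketch` and `ToyIQDataset` are hypothetical stand-ins (the real base class also carries the properties and manipulation methods listed above), and the augmentation names are placeholders, not names defined by the library.

```python
from abc import ABC, abstractmethod


class RadioDatasetSketch(ABC):
    """Hypothetical stand-in for the abstract part of RadioDataset."""

    @abstractmethod
    def inspect(self):
        """Generate a dataset visualization or summary."""

    @abstractmethod
    def default_augmentations(self) -> list:
        """List the default transformations."""


class ToyIQDataset(RadioDatasetSketch):
    """Concrete subclass: only instantiable because both methods are defined."""

    def __init__(self, examples):
        self.examples = examples

    def inspect(self):
        # Text summary instead of a plot, to keep the sketch dependency-free.
        return f"{len(self.examples)} examples"

    def default_augmentations(self) -> list:
        # Placeholder names for illustration only.
        return ["phase_rotation", "time_reversal"]


ds = ToyIQDataset([[0j], [1j]])
print(ds.inspect())  # → 2 examples
```

Calling `RadioDatasetSketch()` directly raises `TypeError`, which is how `abc` enforces that subclasses fill in the abstract methods.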
IQDataset
Abstract subclass of RadioDataset specialized for IQ sample processing (M × C × N shape: examples × channels × samples).
Methods:
| Method | Description |
|---|---|
| `trim_examples(trim_length, keep, inplace)` | Shorten each example to `trim_length`; `keep` is one of "start", "end", "middle", "random" |
| `split_examples(split_factor, example_length, inplace)` | Split examples into shorter chunks |
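The `keep` options differ only in where the retained window starts. The sketch below works on one C × N example held as nested lists; `trim_example` and `split_example` are hypothetical helpers, not the library's methods. The same window is applied to every channel so the channels stay time-aligned. `split_example` covers only the `example_length` form of `split_examples` (a `split_factor` of k would correspond to an `example_length` of N // k), and dropping a trailing partial chunk is an assumption.

```python
import random


def trim_example(example, trim_length, keep="start", rng=None):
    """Shorten a C x N example to trim_length samples per channel."""
    n = len(example[0])
    if not 0 < trim_length <= n:
        raise ValueError("trim_length must be between 1 and the example length")
    if keep == "start":
        start = 0
    elif keep == "end":
        start = n - trim_length
    elif keep == "middle":
        start = (n - trim_length) // 2
    elif keep == "random":
        # randint is inclusive, so the window always fits inside the example.
        start = (rng or random.Random()).randint(0, n - trim_length)
    else:
        raise ValueError(f"unknown keep option: {keep!r}")
    # One window for all channels keeps them time-aligned.
    return [ch[start:start + trim_length] for ch in example]


def split_example(example, example_length):
    """Cut a C x N example into consecutive chunks of example_length samples.

    Assumption: a trailing chunk shorter than example_length is dropped.
    """
    n = len(example[0])
    usable = n - n % example_length
    return [[ch[i:i + example_length] for ch in example]
            for i in range(0, usable, example_length)]


ex = [list(range(10)), list(range(10, 20))]  # C = 2, N = 10
print(trim_example(ex, 4, keep="middle"))  # → [[3, 4, 5, 6], [13, 14, 15, 16]]
print(len(split_example(ex, 3)))           # → 3 (samples 9 and 19 are dropped)
```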
SpectDataset
Abstract subclass of RadioDataset optimized for spectrogram processing (M × C × H × W: examples × channels × height × width).
DatasetBuilder
Abstract base class providing a factory interface for creating radio datasets from common sources.
Properties:
| Property | Description |
|---|---|
| `name` | Dataset identifier |
| `author` | Creator information |
| `url` | Source URL |
| `sha256` / `md5` | Integrity checksums |
| `version` | Current version identifier |
| `latest_version` | Most recent version available |
| `license` | Usage rights information |
Methods:
| Method | Description |
|---|---|
| `download_and_prepare()` | Download the data and build the HDF5 source file |
| `as_dataset(backend)` | Create a `RadioDataset` instance; `backend` is "pytorch" or "tensorflow" |
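Publishing `sha256` and `md5` checksums alongside a dataset lets a download be verified before it is used. A minimal sketch of such a check with Python's standard `hashlib`; `digest_of_file` and `verify_download` are hypothetical helpers, and whether `download_and_prepare()` performs exactly this check is an assumption. Hashing in fixed-size chunks keeps memory use flat even for large HDF5 source files.

```python
import hashlib


def digest_of_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, sha256=None, md5=None):
    """Raise ValueError if the file does not match any checksum provided."""
    if sha256 is not None and digest_of_file(path, "sha256") != sha256.lower():
        raise ValueError(f"SHA-256 mismatch for {path}")
    if md5 is not None and digest_of_file(path, "md5") != md5.lower():
        raise ValueError(f"MD5 mismatch for {path}")
```

When both checksums are available, SHA-256 is the one to rely on; MD5 is adequate against accidental corruption but not against deliberate tampering.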
Dataset Utilities
split(dataset, lengths)
Deterministically partitions a dataset by recording ID to prevent leakage between subsets. A greedy algorithm assigns the recordings with the most slices first.
random_split(dataset, lengths, generator)
Randomly partitions a dataset while respecting recording ID boundaries, so all examples from one recording land in the same subset.
Both functions accept `lengths` either as absolute counts or as fractions summing to 1.0. When fractions do not divide the dataset evenly, the leftover examples are distributed round-robin across the subsets.
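The description implies two stages: turn `lengths` into integer targets, then assign whole recordings to subsets. The sketch below is one way to do that and is not the library's implementation. `resolve_lengths` and `split_by_recording` are hypothetical names; the "give the recording to the subset furthest below its target" rule, tie-breaking by recording ID, and the seeded `random.Random` standing in for the `generator` argument are all assumptions. Because recordings are indivisible, subset sizes can miss their targets when a single recording is large.

```python
import math
import random
from collections import defaultdict


def resolve_lengths(lengths, total):
    """Turn counts, or fractions summing to 1.0, into integer subset sizes."""
    if all(isinstance(x, int) for x in lengths):
        if sum(lengths) != total:
            raise ValueError("counts must sum to the dataset size")
        return list(lengths)
    if not math.isclose(sum(lengths), 1.0):
        raise ValueError("fractions must sum to 1.0")
    counts = [math.floor(total * f) for f in lengths]
    # Hand leftover examples out one at a time, round-robin from subset 0.
    for i in range(total - sum(counts)):
        counts[i % len(counts)] += 1
    return counts


def split_by_recording(rec_ids, lengths, seed=None):
    """Partition example indices so no recording ID appears in two subsets.

    seed=None mimics the deterministic split(); an int mimics random_split().
    """
    targets = resolve_lengths(lengths, len(rec_ids))
    groups = defaultdict(list)
    for i, rid in enumerate(rec_ids):
        groups[rid].append(i)
    items = list(groups.items())
    if seed is None:
        # Greedy order: recordings with the most slices first, ties by ID.
        items.sort(key=lambda kv: (-len(kv[1]), kv[0]))
    else:
        random.Random(seed).shuffle(items)
    subsets = [[] for _ in targets]
    for _, idx in items:
        # The whole recording goes to the subset furthest below its target.
        j = max(range(len(targets)), key=lambda k: targets[k] - len(subsets[k]))
        subsets[j].extend(idx)
    return [sorted(s) for s in subsets]


rec_ids = ["a"] * 4 + ["b"] * 3 + ["c"] * 2 + ["d"]
train, test = split_by_recording(rec_ids, [0.7, 0.3])
print(len(train), len(test))  # → 7 3
```

Placing the largest recordings first is what makes the greedy pass work well: the small recordings processed at the end fill the remaining gaps, so the final sizes tend to land close to the targets.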
Dataset Source Files
Datasets are stored as HDF5 files containing:
- Example data arrays
- Example metadata as a pandas DataFrame
- Recording ID associations for preventing train/test leakage