Data Package

The ria_toolkit_oss.data package provides abstract data types and interfaces for radio machine learning, including signal recording management and dataset frameworks for RF signal processing.

Annotation represents a labeled segment of signal data, bounded in time (samples) and frequency.

class Annotation(
    sample_start: int,
    sample_count: int,
    freq_lower_edge: float,
    freq_upper_edge: float,
    label: str = None,
    comment: str = None,
    detail: dict = None,
)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| sample_start | int | Starting sample index |
| sample_count | int | Number of samples in annotation |
| freq_lower_edge | float | Lower frequency boundary |
| freq_upper_edge | float | Upper frequency boundary |
| label | str | Display label |
| comment | str | Human-readable description |
| detail | dict | Custom metadata |

Methods:

| Method | Description |
| --- | --- |
| is_valid() | Validates sample count and frequency bounds |
| overlap(other) | Quantifies overlap area with another annotation |
| area() | Returns bounding box area (samples × frequency) |
| to_sigmf_format() | Exports to SigMF JSON format |
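
The bounding-box arithmetic behind area() and overlap(other) can be illustrated with a small standalone sketch. The Box class below is a hypothetical stand-in, not the library's Annotation class, and the validity rule and the overlap definition (intersection area of two rectangles) are assumptions, since the reference above does not spell them out.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for an annotation's time-frequency bounding box."""
    sample_start: int
    sample_count: int
    freq_lower_edge: float
    freq_upper_edge: float

    def is_valid(self) -> bool:
        # Assumed rule: at least one sample and a positive bandwidth.
        return self.sample_count > 0 and self.freq_upper_edge > self.freq_lower_edge

    def area(self) -> float:
        # Number of samples multiplied by the bandwidth.
        return self.sample_count * (self.freq_upper_edge - self.freq_lower_edge)

    def overlap(self, other: "Box") -> float:
        # Extent of the shared sample range (zero if the boxes are disjoint in time).
        start = max(self.sample_start, other.sample_start)
        stop = min(self.sample_start + self.sample_count,
                   other.sample_start + other.sample_count)
        # Extent of the shared frequency range (zero if disjoint in frequency).
        low = max(self.freq_lower_edge, other.freq_lower_edge)
        high = min(self.freq_upper_edge, other.freq_upper_edge)
        return max(0, stop - start) * max(0.0, high - low)


a = Box(0, 1000, 100e3, 200e3)
b = Box(500, 1000, 150e3, 250e3)
print(a.area())      # 1000 samples times a 100 kHz bandwidth
print(a.overlap(b))  # 500 shared samples times a 50 kHz shared band
```

Two boxes that are disjoint in either time or frequency yield an overlap of zero under this definition.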

Recording encapsulates complex IQ samples, together with metadata and annotations, in a C × N array (C channels, N samples).

class Recording(
    data,
    metadata: dict = None,
    annotations: list = None,
    dtype = None,
    timestamp = None,
)

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| data | array-like | Complex array of IQ samples |
| metadata | dict | Key-value recording information |
| annotations | list | List of Annotation objects |
| dtype | complex type | NumPy data type |
| timestamp | float or int | Epoch time |

Properties:

| Property | Description |
| --- | --- |
| data | Complex array of IQ samples (read-only for recordings with more than 1024 samples) |
| metadata | Dictionary of recording attributes |
| annotations | List of Annotation objects |
| shape | Data array dimensions |
| n_chan | Number of channels |
| rec_id | Unique 64-character recording identifier |
| dtype | Element data type |
| timestamp | Recording epoch timestamp |
| sample_rate | Sample rate from metadata |

Metadata methods:

| Method | Description |
| --- | --- |
| add_to_metadata(key, value) | Append new key-value pair |
| update_metadata(key, value) | Modify existing metadata |
| remove_from_metadata(key) | Delete metadata entry |

Data manipulation:

| Method | Description |
| --- | --- |
| astype(dtype) | Create copy with specified dtype |
| trim(num_samples, start_sample) | Extract signal segment |
| normalize() | Scale maximum amplitude to 1.0 |
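
To make the trim and normalize operations concrete, here is a minimal standalone sketch on a C × N list of complex samples. It does not call the library: the helper functions are hypothetical, the peak is assumed to be taken across all channels, and the default start_sample of 0 is an assumption.

```python
def normalize(channels):
    """Scale a C x N list of complex samples so the largest magnitude is 1.0."""
    peak = max(abs(x) for row in channels for x in row)
    if peak == 0:
        return [list(row) for row in channels]  # all-zero signal: nothing to scale
    return [[x / peak for x in row] for row in channels]


def trim(channels, num_samples, start_sample=0):
    """Keep num_samples samples per channel, beginning at start_sample."""
    stop = start_sample + num_samples
    if start_sample < 0 or stop > len(channels[0]):
        raise ValueError("requested segment falls outside the recording")
    return [row[start_sample:stop] for row in channels]


iq = [[3 + 4j, 1 + 0j, 0 + 2j, -1 - 1j]]  # one channel, four samples
scaled = normalize(iq)                     # peak magnitude was 5.0
segment = trim(scaled, num_samples=2, start_sample=1)
print(max(abs(x) for x in scaled[0]))
print(segment)
```

In practice the samples would live in a NumPy array; plain lists are used here only to keep the sketch dependency-free.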

Export methods:

| Method | Description |
| --- | --- |
| to_sigmf(filename, path, overwrite) | Save as SigMF format |
| to_npy(filename, path, overwrite) | Binary NumPy format |
| to_wav(filename, path, target_sample_rate, bits_per_sample, overwrite) | WAV with embedded YAML metadata |
| to_blue(filename, path, data_format, overwrite) | MIDAS Blue legacy format |

Visualization:

| Method | Description |
| --- | --- |
| view(output_path, **kwargs) | PNG plot of signal |
| simple_view(**kwargs) | Simplified PNG/SVG visualization |

RadioDataset is an abstract base class that defines the interface for iterable machine learning datasets of radio signal examples.

Properties:

| Property | Description |
| --- | --- |
| source | Path to dataset source file |
| shape | Tuple of dataset dimensions |
| data | NumPy array of all examples |
| metadata | Pandas DataFrame of example metadata |
| labels | List of metadata column headers |

Dataset manipulation:

| Method | Description |
| --- | --- |
| augment(class_key, augmentations, level, target_size, classes_to_augment, inplace) | Supplement data through transformations |
| subsample(class_key, percentage, inplace) | Randomly reduce examples by percentage |
| resample(quantity_target, class_key, inplace) | Adjust examples per class to target count |
| homogenize(class_key, example_limit, inplace) | Equalize class sizes |
| drop_class(class_key, class_value, inplace) | Remove entire class |
| get_class_sizes(class_key) | Return class size dictionary |
| delete_example(idx, inplace) | Remove single example |
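
As a rough illustration of class balancing, the sketch below counts examples per class and then equalizes the classes by randomly keeping the same number of examples from each. The helper names are hypothetical and nothing here calls the library; in particular, treating example_limit as an optional cap on the per-class size and balancing down to the smallest class are assumptions about how such an operation could behave.

```python
import random
from collections import Counter


def class_sizes(labels):
    """Count examples per class label."""
    return dict(Counter(labels))


def homogenize_indices(labels, example_limit=None, seed=0):
    """Pick example indices so that every class keeps the same number of examples."""
    sizes = class_sizes(labels)
    target = min(sizes.values())            # balance down to the smallest class
    if example_limit is not None:
        target = min(target, example_limit)  # assumed meaning: optional per-class cap
    rng = random.Random(seed)                # seeded for repeatable selection
    keep = []
    for cls in sizes:
        members = [i for i, lab in enumerate(labels) if lab == cls]
        keep.extend(rng.sample(members, target))
    return sorted(keep)


labels = ["bpsk"] * 5 + ["qpsk"] * 3 + ["fsk"] * 8
kept = homogenize_indices(labels)
print(class_sizes(labels))
print(class_sizes([labels[i] for i in kept]))
```

Returning indices rather than data keeps the sketch independent of how examples and metadata are stored.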

Abstract methods:

| Method | Description |
| --- | --- |
| inspect() | Generate dataset visualization |
| default_augmentations() | List default transformations |

Abstract subclass of RadioDataset specialized for IQ sample processing (M × C × N shape: examples × channels × samples).

| Method | Description |
| --- | --- |
| trim_examples(trim_length, keep, inplace) | Reduce example length; keep options: "start", "end", "middle", "random" |
| split_examples(split_factor, example_length, inplace) | Fragment examples into shorter chunks |
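
The four keep options and the idea of fragmenting an example can be sketched on a single one-dimensional example. These helpers are hypothetical and standalone; the centering rule for "middle" and the choice to drop a short trailing chunk when splitting are assumptions, not documented library behavior.

```python
import random


def trim_example(samples, trim_length, keep="start", rng=None):
    """Return trim_length consecutive samples chosen according to keep."""
    n = len(samples)
    if trim_length > n:
        raise ValueError("trim_length exceeds example length")
    if keep == "start":
        offset = 0
    elif keep == "end":
        offset = n - trim_length
    elif keep == "middle":
        offset = (n - trim_length) // 2
    elif keep == "random":
        # randint includes both endpoints, so every valid window is reachable.
        offset = (rng or random).randint(0, n - trim_length)
    else:
        raise ValueError(f"unknown keep option: {keep!r}")
    return samples[offset:offset + trim_length]


def split_example(samples, example_length):
    """Cut one example into consecutive chunks, dropping any short tail."""
    last_start = len(samples) - example_length
    return [samples[i:i + example_length]
            for i in range(0, last_start + 1, example_length)]


example = list(range(10))
print(trim_example(example, 4, "start"))   # [0, 1, 2, 3]
print(trim_example(example, 4, "end"))     # [6, 7, 8, 9]
print(trim_example(example, 4, "middle"))  # [3, 4, 5, 6]
print(split_example(example, 4))           # [[0, 1, 2, 3], [4, 5, 6, 7]]
```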

Abstract subclass of RadioDataset optimized for spectrogram processing (M × C × H × W: examples × channels × height × width).


Abstract base class providing a factory interface for creating radio datasets from common sources.

Properties:

| Property | Description |
| --- | --- |
| name | Dataset identifier |
| author | Creator information |
| url | Source URL |
| sha256 / md5 | Integrity checksums |
| version | Current version identifier |
| latest_version | Available update version |
| license | Usage rights information |

Methods:

| Method | Description |
| --- | --- |
| download_and_prepare() | Fetch data and create HDF5 source |
| as_dataset(backend) | Create RadioDataset instance; backend: "pytorch" or "tensorflow" |

Deterministically partitions a dataset by recording ID to prevent train/test leakage, using a greedy algorithm that assigns the recordings with the most slices first.

Stochastically partitions a dataset while respecting recording ID boundaries.

Both functions accept a list of lengths, given either as absolute counts or as fractions summing to 1.0; any remainder is distributed round-robin.
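
The following standalone sketch shows one way a leakage-free greedy split could work. It is an illustration under stated assumptions, not the library's code: resolve_lengths and split_by_recording are hypothetical names, and the placement rule (put each recording in the subset furthest below its target) is only one plausible reading of "greedy".

```python
from collections import Counter


def resolve_lengths(lengths, total):
    """Turn fractions summing to 1.0 into counts; leftovers go round-robin."""
    if all(isinstance(x, int) for x in lengths):
        return list(lengths)  # already absolute counts
    counts = [int(total * frac) for frac in lengths]
    for i in range(total - sum(counts)):
        counts[i % len(counts)] += 1
    return counts


def split_by_recording(rec_ids, lengths):
    """Assign whole recordings to subsets, largest recordings first."""
    targets = resolve_lengths(lengths, len(rec_ids))
    subsets = [[] for _ in targets]
    per_recording = Counter(rec_ids)
    # Sort by slice count (descending), then by ID, so the result is deterministic.
    ordered = sorted(per_recording.items(), key=lambda kv: (-kv[1], kv[0]))
    for rec, _ in ordered:
        # Place the recording in the subset that is furthest below its target.
        best = max(range(len(targets)), key=lambda i: targets[i] - len(subsets[i]))
        subsets[best].extend(i for i, r in enumerate(rec_ids) if r == rec)
    return subsets


rec_ids = ["a"] * 5 + ["b"] * 3 + ["c"] * 2  # recording ID of each example
train, test = split_by_recording(rec_ids, [0.8, 0.2])
print(len(train), len(test))
print({rec_ids[i] for i in train} & {rec_ids[i] for i in test})  # shared IDs, if any
```

Because a recording is never divided between subsets, the achieved sizes can deviate from the requested ones when recordings are large relative to the targets.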


Datasets are stored as HDF5 files containing:

  • Example data arrays
  • Pandas DataFrame metadata
  • Recording ID associations for preventing train/test leakage