Fleet Status
| Drone ID | Lat | Lon | Alt (m) | Battery | State |
|---|---|---|---|---|---|
| Login to view fleet status | | | | | |
Telemetry — select a drone row
Click a drone row to see live telemetry.
A real-time autonomous drone fleet management platform that coordinates multi-drone operations over a 60 GHz millimetre-wave mesh network. Nine microservices collaborate via NATS JetStream messaging, Redis state caching, and RaimaDB persistent storage to deliver adaptive beamforming, autonomous mesh re-routing, and live operator control.
Generates a JWT bearer token required by all protected endpoints. Enter any username — the platform uses role-based tokens (operator role) signed with HS256. The token is stored in the browser session and automatically attached to every subsequent request. Tokens expire after 60 minutes.
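The HS256 flow described above can be sketched in plain Python. This is an illustrative stand-in (a real deployment would normally use a JWT library such as PyJWT); the secret, claim names, and function names are assumptions, not the platform's actual code:

```python
import base64, hashlib, hmac, json, time

SECRET = b"demo-secret"  # placeholder; the real signing key lives server-side

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_token(username: str, ttl_s: int = 3600) -> str:
    """Build an HS256 JWT carrying the operator role with a 60-minute expiry."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({
        "sub": username,
        "role": "operator",
        "exp": int(time.time()) + ttl_s,
    }).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str) -> dict:
    """Check the signature and expiry; raise ValueError on failure."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

Every protected endpoint would run `verify_token` on the `Authorization: Bearer` value before handling the request.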
Registers drones with the Auth/IAM service, generating NKey credentials for each drone's NATS connection. Pre-filled with three drones (drone-001, drone-002, drone-003). Real drone firmware would use these credentials to authenticate against the NATS server before publishing telemetry.
Sends a mission command to the Fleet Orchestration service via the gateway. Each mission specifies a drone ID, mission ID, and a set of GPS waypoints. The Fleet Orchestration service records the mission in RaimaDB and would relay commands to the target drone over its NATS subject.
Broadcasts an immediate halt command to the entire fleet. All in-progress missions are aborted. This is an authenticated, logged action — the operator identity is recorded. In a live deployment this triggers a fleet-wide safe-landing sequence.
Publishes synthetic GPS and battery telemetry to NATS on behalf of the three drones, bypassing physical hardware. The Telemetry Ingest service picks up these messages, validates them, and writes position and battery state to Redis with a 30-second TTL. Fleet Status auto-refreshes to show the results.
Simulates two drones repositioning and publishing RSSI measurements to each other. The Beamforming Control service monitors RSSI values and — when signal strength drops below −75 dBm — computes new antenna steering angles (azimuth/elevation) using Haversine geometry and publishes a beamform command back to the drone. The Beamform Commands panel shows the resulting steering instructions.
Injects link-up or link-down events into the mesh. The Mesh Routing service maintains a live adjacency graph and recomputes BFS shortest paths whenever a link changes state. Link Up builds a connection between two drones; Link Down removes it; Build Chain connects all three drones in sequence. The SVG graph and BFS route table update within seconds of each event.
Polls /api/v1/fleet/status every 5 seconds. The gateway scans all drone:*:pos keys in Redis and returns current GPS coordinates and battery level for each active drone. Drones vanish from the table if their Redis keys expire (30 s TTL), simulating loss of telemetry link.
Auto-polls /api/v1/mesh/topology every 3 seconds and renders the current adjacency state as an SVG graph. Nodes are arranged in a circle; edges are colour-coded by RSSI (green > −65 dBm, yellow −65 to −75, red < −75). The BFS route table alongside shows the shortest inter-drone path for every pair.
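The RSSI colour coding above reduces to a small threshold function. A minimal sketch (the function name is illustrative; the thresholds are the ones stated above):

```python
def edge_colour(rssi_dbm: float) -> str:
    """Map a link's RSSI to the SVG edge colour used by the topology view:
    green above -65 dBm, yellow from -65 down to -75, red below -75."""
    if rssi_dbm > -65:
        return "green"
    if rssi_dbm >= -75:
        return "yellow"
    return "red"
```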
Visualises the microservice call chain for each action you trigger. When you click a button, the relevant nodes light up in sequence — showing which services are involved, in what order, and what transport (HTTP or NATS) connects them. Useful for understanding the internal event flow without reading logs.
Five durable streams partition message traffic: TELEMETRY (GPS, battery, RSSI), COMMANDS (mission & beamform), TOPOLOGY (link events & adjacency), SIM (simulation ticks), ALERTS (threshold breaches). Services subscribe to specific subjects; JetStream guarantees at-least-once delivery with configurable retention.
All live drone state (position, battery, PHY layer metrics, beamform parameters) is stored in Redis with a short TTL. This creates a self-healing presence model: a drone that stops transmitting automatically disappears from Fleet Status without any explicit deregistration step.
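The TTL-based presence model can be sketched with two functions against a redis-py-style client. The `drone:<id>:pos` key pattern and 30-second TTL come from the description above; the payload fields and function names are assumptions for illustration:

```python
import json, time

POS_TTL_S = 30  # matches the 30-second telemetry TTL

def write_position(r, drone_id, lat, lon, alt, battery):
    """Store the latest fix under drone:<id>:pos with a short TTL."""
    key = f"drone:{drone_id}:pos"
    r.setex(key, POS_TTL_S, json.dumps(
        {"lat": lat, "lon": lon, "alt": alt,
         "battery": battery, "ts": time.time()}))

def fleet_status(r):
    """Scan drone:*:pos; drones whose keys have expired simply do not appear."""
    status = {}
    for key in r.scan_iter("drone:*:pos"):
        raw = r.get(key)
        if raw is None:  # key expired between scan and get
            continue
        kid = key.decode() if isinstance(key, bytes) else key
        status[kid.split(":")[1]] = json.loads(raw)
    return status
```

No deregistration path is needed: a silent drone's key expires and the next `fleet_status` call no longer returns it.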
The Beamforming Control service runs a continuous loop (10 Hz). When a drone's RSSI to a peer falls below −75 dBm, it calculates the azimuth and elevation angles between the two GPS positions using the Haversine formula and publishes a steering command. This keeps the directional 60 GHz antenna locked on the strongest signal path.
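The geometry behind the steering computation can be sketched as follows: the azimuth is the initial great-circle bearing between the two fixes, the ground range comes from the Haversine formula, and elevation follows from the altitude difference over that range. A spherical-Earth sketch (function name and exact formulation are assumptions, not the service's actual code):

```python
import math

EARTH_R = 6371000.0  # mean Earth radius, metres

def steering_angles(lat1, lon1, alt1, lat2, lon2, alt2):
    """Azimuth (degrees from true north) and elevation (degrees above the
    horizon) from drone 1 toward drone 2, using spherical-Earth geometry."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Initial great-circle bearing
    x = math.sin(dlon) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    azimuth = (math.degrees(math.atan2(x, y)) + 360.0) % 360.0
    # Haversine ground distance
    dlat = p2 - p1
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    ground = 2 * EARTH_R * math.asin(math.sqrt(a))
    if ground > 0:
        elevation = math.degrees(math.atan2(alt2 - alt1, ground))
    else:
        elevation = 90.0 if alt2 > alt1 else 0.0
    return azimuth, elevation
```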
Mesh Routing maintains an in-memory adjacency graph updated by LinkEvent messages. On each change it runs BFS from every node to compute shortest-hop paths across the fleet, then publishes the new topology to mesh.topology.updated. The gateway and UI consume this to keep the graph current.
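The BFS recomputation is straightforward on a small fleet. A minimal sketch of shortest-hop routing over the adjacency graph (data shapes are assumptions for illustration):

```python
from collections import deque

def bfs_routes(adjacency, source):
    """Shortest-hop path from source to every reachable drone.
    adjacency maps node -> iterable of neighbouring nodes."""
    prev = {source: None}
    q = deque([source])
    while q:
        node = q.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr not in prev:
                prev[nbr] = node
                q.append(nbr)
    routes = {}
    for dest in prev:
        if dest == source:
            continue
        path, cur = [], dest
        while cur is not None:      # walk predecessors back to the source
            path.append(cur)
            cur = prev[cur]
        routes[dest] = path[::-1]
    return routes
```

Running this from every node after each LinkEvent yields the full BFS route table shown in the UI.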
The platform is at MVP status (v0.1.0). All nine microservices are running and the core telemetry, beamforming, and mesh-routing pipelines are functional. The items below are specified in the system requirements but not yet implemented — they represent the gap between the current prototype and a production-ready deployment.
Auth/IAM currently generates UUID-based placeholder credentials instead of real Ed25519 keypairs. Production drones require genuine NKey keypairs generated via the NATS nkeys library so that the NATS server can cryptographically verify drone identity.
The NATS server starts with only JetStream enabled — no authentication configuration. The subject-level permission policies defined in Auth/IAM are generated but never loaded into the NATS server. Any client can currently connect and publish to any subject without credentials.
Session keys need to be written to Redis on token issuance (1 hr TTL) so that tokens can be explicitly revoked — on logout, credential rotation, or operator suspension. Currently the Auth/IAM service validates JWTs statelessly via signature verification only, so there is no way to invalidate a token before it expires naturally.
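The intended revocation pattern can be sketched against a redis-py-style client. The `session:<jti>` key name is an assumption; the 1-hour TTL mirrors the token lifetime described above:

```python
SESSION_TTL_S = 3600  # 1 hour, matching the JWT expiry

def store_session(r, jti):
    """Record a session key at token issuance; Redis expiry mirrors the
    token's exp claim, so the key self-destructs when the token would."""
    r.setex(f"session:{jti}", SESSION_TTL_S, "active")

def revoke_session(r, jti):
    """Explicit revocation: logout, credential rotation, operator suspension."""
    r.delete(f"session:{jti}")

def is_session_active(r, jti):
    """Called after stateless signature verification; a missing key means
    the token was revoked or has expired."""
    return r.get(f"session:{jti}") is not None
```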
Credential rotation logic is implemented and persists to RaimaDB, but there is no scheduled rotation loop, no operator endpoint to trigger rotation, and no mechanism to notify affected drones to re-authenticate. The requirement calls for periodic automated rotation with drone notification.
TLS needs to be enabled across the full stack — NATS, Redis, and all HTTP service-to-service communication. Currently all traffic is plaintext. The gateway is intended to perform TLS termination, but no certificates or TLS configuration exist anywhere in the stack.
Redis-backed rate limiting needs to be added to the gateway to cap request rates per operator JWT and return HTTP 429 on breach. Currently any authenticated client can make unlimited requests with no throttling in place.
The mission dispatch endpoint returns immediately without contacting Fleet Orchestration. Missions are never published to the drone command JetStream subject and never stored in RaimaDB. Fleet Orchestration runs but receives no mission commands from the gateway.
Pending acknowledgements are tracked when a mission is dispatched and cleared when an ACK arrives, but no background task checks for timed-out entries to trigger a re-send. The requirement specifies that unacknowledged commands must be automatically retried — this retry loop is never started.
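The missing retry loop amounts to a periodic sweep over the pending-ACK table. A minimal sketch (the class, timeout value, and re-send semantics are assumptions about how the requirement could be met):

```python
import time

ACK_TIMEOUT_S = 5.0  # illustrative; the real timeout is a requirement parameter

class PendingAcks:
    """Track dispatched commands awaiting acknowledgement and surface the
    ones whose timeout has elapsed so a background task can re-send them."""
    def __init__(self):
        self._pending = {}  # command_id -> (command, dispatch_time)

    def dispatched(self, command_id, command, now=None):
        self._pending[command_id] = (command, time.time() if now is None else now)

    def acked(self, command_id):
        self._pending.pop(command_id, None)

    def due_for_retry(self, now=None):
        now = time.time() if now is None else now
        due = [(cid, cmd) for cid, (cmd, t) in self._pending.items()
               if now - t >= ACK_TIMEOUT_S]
        for cid, cmd in due:
            self._pending[cid] = (cmd, now)  # reset the timer on re-send
        return due
```

A background task would call `due_for_retry()` on an interval and republish each returned command to its drone's NATS subject.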
The Analytics service needs a durable JetStream consumer on the TELEMETRY and TOPOLOGY streams to drive post-run KPI computation from recorded data. Currently KPIs are computed from in-memory data passed directly to service methods — no stream-replay consumer exists and the scenario comparison endpoint is never fed from live stream history.
Two Memcached keys have yet to be implemented: a drone registry (active drone ID list, invalidated on registration and deregistration) and per-service config blobs (invalidated on config push). Only the route table written by Mesh Routing is currently stored in Memcached.
The Simulation Bridge is planned as a four-stage pipeline — Gazebo/AirSim for flight dynamics, NYUSIM for mmWave channel modelling, srsRAN for PHY/MAC layer emulation, and ns-3 for mesh routing simulation. Currently all four stages are replaced by a single Python physics model with linear RSSI decay and random drift, which is sufficient for integration testing but does not accurately model 60 GHz propagation or real flight behaviour.
Drone-side edge NATS servers (leaf nodes) are planned to connect to the central cluster, enabling local-first message routing, lower latency, and continued operation during uplink loss. Currently a single central NATS instance handles all traffic — drones have no local broker and lose all messaging capability if the uplink drops.
This paper presents a framework for integrating machine learning and artificial intelligence capabilities into a millimeter-wave (mmWave) mesh networking platform for autonomous drone operations. We examine five principal domains of AI integration: onboard perception and navigation, predictive beamforming control, multivariate anomaly detection, adaptive mission planning, and large-model operator interfaces. For each domain we characterise the problem formulation, survey applicable methods, describe the integration architecture, and identify open research questions. A central contribution of this work is the demonstration that a subject-oriented message bus architecture (NATS JetStream) provides a uniform integration surface for heterogeneous AI components without requiring modification to core platform services. We further describe how the platform's virtual testbench — comprising coupled flight dynamics, channel emulation, PHY/MAC, and mesh routing simulators — constitutes a complete training data pipeline for learned components across all five domains.
Unmanned aerial vehicle (UAV) swarms operating over millimeter-wave mesh networks present a class of engineering problems that classical control and signal processing approaches address incompletely. The coupling between physical drone dynamics, radio propagation characteristics, network topology, and mission objectives creates a high-dimensional operational state space that admits learned representations more naturally than hand-crafted models. This paper examines where and how machine learning methods can be introduced into such a platform, with emphasis on architectural compatibility and the practical path from simulation-trained models to deployed systems.
The reference platform consists of a mmWave mesh network operating at 60 GHz with phased-array beamforming at each node, a microservice architecture communicating via NATS JetStream, a layered virtual testbench for hardware-absent development, and Redis/RaimaDB for state management and persistence. The platform is described in full in the companion technical specification. This paper concerns only the AI augmentation surface.
We restrict our treatment to AI capabilities that (a) can be integrated without modifying core platform services, (b) can be trained or validated using the existing simulation testbench, and (c) address documented limitations of classical approaches in the target operational environment. Section 2 addresses onboard navigation. Section 3 addresses beamforming control. Section 4 addresses anomaly detection. Section 5 addresses mission planning and fleet coordination. Section 6 addresses large-model operator interfaces. Section 7 addresses the simulation testbench as a training data pipeline. Section 8 discusses the uniform integration architecture. Section 9 identifies open problems.
GPS-denied autonomous navigation in unstructured environments requires estimation of ego-motion and environmental structure from onboard sensors. Classical approaches — geometric visual odometry, iterative closest point (ICP) terrain matching — exhibit well-characterised failure modes in low-texture environments, under adverse weather, and in the presence of sensor noise distributions not anticipated during design. The research question is whether learned methods improve robustness in precisely these failure conditions.
We review the class of end-to-end learned odometry models including DROID-SLAM and TartanVO, which replace hand-crafted feature detectors with learned representations trained on large corpora of camera motion sequences. We characterise the generalisation properties of these models to novel environments and the inference cost at relevant frame rates on Jetson Orin-class hardware.
mmWave radar returns are sparse relative to LiDAR and exhibit multipath and clutter artefacts that degrade classical ICP performance. We survey learned denoising and classification approaches applicable to IWR-series mmWave radar outputs, and evaluate their contribution to terrain matching accuracy across surface types represented in the simulation environment model.
In weight-constrained single-camera configurations, dense depth estimation from monocular imagery — using models in the Depth Anything family — supplements sparse radar returns. We examine the fusion of monocular depth with radar point clouds in an extended Kalman filter formulation and characterise the accuracy improvement over either modality alone.
We describe a deep reinforcement learning formulation for reactive obstacle avoidance using the fused sensor state as input. The simulation testbench provides the training environment; domain randomisation over obstacle geometry and density provides the policy robustness necessary for sim-to-real transfer.
Drift accumulation without external reference; performance degradation in completely featureless environments; compute budget constraints at small airframe scales.
Classical beamforming controllers are reactive — they adjust beam pointing in response to observed RSSI degradation. At high drone velocities and during rapid manoeuvres, the latency between signal degradation and corrective beam adjustment produces link interruptions that impact throughput and mesh stability. The hypothesis is that a predictive model — conditioning beam angle commands on anticipated future positions — can reduce link interruption frequency and duration.
We formulate beam angle prediction as a sequence-to-sequence problem: given a window of historical drone trajectories and RSSI measurements, predict the optimal beam angle at time t+k. We survey LSTM and transformer-based architectures for this task and characterise the prediction horizon over which learned models outperform classical gradient ascent controllers.
The DeepMIMO framework provides a structured methodology for generating beam-channel correspondence datasets from ray-traced channel models. We describe the pipeline from Gazebo trajectory simulation through NYUSIM channel computation to DeepMIMO dataset construction, and the resulting training corpus characteristics.
NVIDIA Sionna enables gradient computation through the channel simulation model, permitting end-to-end training of beamforming policies by backpropagation through the channel. We examine whether this approach produces policies with superior generalisation relative to those trained on pre-generated datasets.
The learned predictor replaces the PID/gradient ascent control law within the existing Beamforming Control microservice. The NATS interface — subscribing to drone.telemetry.rssi and publishing to drone.cmd.beamform — is unchanged. We characterise the performance improvement in terms of link interruption frequency and duration across a set of standard manoeuvre profiles executed in the simulation testbench.
Generalisation across antenna configurations not represented in training; beam prediction under simultaneous multi-drone topology change; calibration of simulated channel models to real 60 GHz hardware.
Threshold-based anomaly detection — the current baseline — monitors individual telemetry channels independently. Failures that manifest as correlated degradation across multiple channels below individual thresholds are not detected until a single channel crosses its limit, at which point the failure may be advanced. We examine whether multivariate learned detectors provide earlier and more accurate fault identification.
An autoencoder trained on nominal flight telemetry learns a compact representation of healthy multivariate sensor state. Reconstruction error at inference time provides an anomaly score sensitive to deviations not captured by any single variable. We describe the training procedure using JetStream-replayed nominal flight data and characterise detection latency and false positive rate on a set of injected fault scenarios.
Isolation forest provides a complementary detection approach with lower inference cost and more interpretable feature importance scores than deep models. We compare detection performance across fault types and discuss the conditions under which each method is preferable.
We define a fault taxonomy derived from the simulation testbench's fault injection capability: gradual RF degradation, sudden link failure, mechanical vibration anomaly, GPS spoofing (where GPS is present), beamforming misalignment, and multi-link simultaneous failure indicative of environmental interference. We characterise the detection performance of each method across this taxonomy.
Anomaly scores are published to the existing alerts.* subject hierarchy. The Alert Service is extended to consume model-generated scores alongside threshold-based events. No changes are required to downstream subscribers.
Distribution shift between simulation-generated training data and real hardware telemetry; anomaly detection under non-stationary operational conditions; root cause attribution from anomaly scores.
Classical path planners optimise a single objective (typically distance or time) subject to geometric constraints. Operational objectives for drone fleets include link quality maintenance, battery consumption, terrain avoidance, and task completion — a multi-objective problem in a dynamically changing environment. Additionally, optimal assignment of sub-tasks to individual drones in a fleet is a combinatorial optimisation problem whose complexity scales with fleet size.
We describe the augmentation of classical planners (A*, RRT*) with learned cost functions trained on historical flight data incorporating link quality observations, battery consumption profiles, and terrain traversal difficulty. The learned cost function transforms the geometric planning problem into one that reflects real operational constraints.
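The augmentation is architecturally simple: the planner's edge-cost evaluation becomes a pluggable function, so a learned model can replace pure geometric distance without touching the search itself. A minimal A* sketch under that assumption (function names and the grid example are illustrative):

```python
import heapq, itertools, math

def a_star(neighbours, cost_fn, heuristic, start, goal):
    """A* where the edge cost is a pluggable function, so a learned cost
    model (link quality, battery, terrain) can replace geometric distance."""
    counter = itertools.count()  # tiebreaker so the heap never compares nodes
    open_heap = [(heuristic(start, goal), 0.0, next(counter), start, None)]
    prev = {}
    g_best = {start: 0.0}
    while open_heap:
        _, g, _, node, parent = heapq.heappop(open_heap)
        if node in prev:
            continue  # stale duplicate entry
        prev[node] = parent
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nbr in neighbours(node):
            ng = g + cost_fn(node, nbr)  # the learned model plugs in here
            if ng < g_best.get(nbr, math.inf):
                g_best[nbr] = ng
                heapq.heappush(open_heap,
                               (ng + heuristic(nbr, goal), ng, next(counter), nbr, node))
    return None
```

The heuristic must remain admissible with respect to the learned cost for optimality to hold, which is one of the constraints on the cost-model design.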
When a link failure, drone fault, or environmental change invalidates the current mission plan, replanning must occur rapidly. We formulate replanning as a Markov decision process and describe a DRL policy trained in the simulation testbench across a library of failure scenarios. We characterise replanning latency and mission completion rate relative to scripted fallback behaviour.
The variable-size fleet and dynamic task set make fixed-architecture neural networks unsuitable for task allocation. Graph neural network architectures process fleet and task state as a graph and produce allocation policies that generalise across fleet sizes not seen during training. We describe the formulation and evaluate performance on benchmark allocation problems.
Safe exploration during online policy adaptation; formal verification of learned planners against operational constraints; handling of adversarial environments not represented in the training distribution.
The operator interface to a drone platform requires translation between natural human intent and structured machine commands (mission compilation) and between structured machine state and human-interpretable explanations (post-flight debrief). These translation problems are well-suited to large language models but require grounding in platform-specific structured data to avoid hallucination.
We describe a system in which an operator specifies mission objectives in natural language and an LLM with access to platform schema and constraint specifications produces a structured mission plan in the format consumed by the Fleet Orchestration service. We examine prompt engineering approaches, structured output enforcement, and constraint validation as complementary techniques for ensuring plan correctness.
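The constraint-validation stage is deterministic and sits after LLM generation. A sketch of such a validator — the field names mirror the mission dispatch payload described earlier (drone ID, mission ID, waypoints), but the exact schema and limits here are assumptions:

```python
def validate_mission_plan(plan, known_drones, max_waypoints=20):
    """Return a list of constraint violations for an LLM-generated plan;
    an empty list means the plan may be forwarded to Fleet Orchestration."""
    errors = []
    if plan.get("drone_id") not in known_drones:
        errors.append("unknown drone_id")
    if not plan.get("mission_id"):
        errors.append("missing mission_id")
    waypoints = plan.get("waypoints", [])
    if not waypoints:
        errors.append("empty waypoint list")
    if len(waypoints) > max_waypoints:
        errors.append("too many waypoints")
    for i, wp in enumerate(waypoints):
        lat, lon = wp.get("lat"), wp.get("lon")
        if lat is None or not -90 <= lat <= 90:
            errors.append(f"waypoint {i}: bad latitude")
        if lon is None or not -180 <= lon <= 180:
            errors.append(f"waypoint {i}: bad longitude")
    return errors
```

On any violation the plan is rejected or returned to the model for repair, rather than dispatched.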
Post-flight debriefing requires synthesising information across large volumes of telemetry data stored in RaimaDB. We describe a retrieval-augmented generation (RAG) architecture in which an LLM answers operator queries by retrieving relevant telemetry segments and generating grounded natural language explanations. We characterise the accuracy and latency of this approach on a set of representative post-flight queries.
When the anomaly detection system (Section 4) generates an alert, an LLM can synthesise a contextual explanation from the surrounding telemetry. We examine the relationship between explanation quality and the structured context provided to the model, and define an evaluation framework for explanation accuracy.
Hallucination in safety-critical mission planning contexts; evaluation methodology for natural language explanations of technical events; latency of LLM inference relative to operator response time requirements.
The platform's virtual testbench — comprising Gazebo flight dynamics, NYUSIM channel emulation, srsRAN PHY/MAC simulation, and ns-3 mesh routing simulation — constitutes an end-to-end training data pipeline for learned components across all five domains described above. This section characterises the testbench as a data generation system rather than a validation system.
We describe the structure and volume of training data producible by the testbench for each learned component: trajectory-channel correspondence for beamforming (Section 3); labelled nominal and anomalous telemetry for fault detection (Section 4); environment-cost correspondence for path planning (Section 5); and sensor-state sequences for navigation policy training (Section 2).
The testbench's fault injection and scenario parameterisation capabilities support systematic domain randomisation — variation of environmental parameters across training episodes to improve model robustness. We describe a randomisation schedule covering weather conditions, terrain types, drone configurations, and failure modes, and characterise its effect on sim-to-real transfer performance.
The simulation clock can be advanced at up to 100× real time, enabling training data generation at rates that would be impractical with real hardware. We characterise the fidelity tradeoffs introduced by clock acceleration and identify the minimum fidelity requirements for each model class.
Full NYUSIM + srsRAN channel simulation at 100× clock rate is computationally intensive. We examine learned surrogate models that approximate the channel and PHY pipeline at lower cost, enabling higher training throughput. NVIDIA Sionna's differentiable channel model is evaluated as one such surrogate.
A central observation of this work is that the platform's message bus architecture provides a uniform integration surface for AI components that requires no modification to existing services. Any AI component can be introduced as a microservice that subscribes to existing NATS subjects, applies a model, and publishes results on new or existing subjects. We formalise this interface and demonstrate its application across all five domains.
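The subscribe-apply-publish pattern can be sketched as follows. The pure model-application step is separated from the bus plumbing; the wiring uses the nats-py client (assumed API, requires a running JetStream-enabled server), and the subject names and payload shape are illustrative:

```python
import json

def score(model, telemetry: dict) -> dict:
    """Pure model-application step — the only part specific to a given AI
    component; everything around it is generic bus plumbing."""
    return {"drone_id": telemetry["drone_id"],
            "anomaly_score": model(telemetry)}

async def run_component(model, in_subject, out_subject, durable):
    """Wiring sketch: subscribe to an existing subject, apply the model,
    publish the result. No core service needs to change."""
    import nats  # third-party client; assumed API
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()

    async def handler(msg):
        result = score(model, json.loads(msg.data))
        await js.publish(out_subject, json.dumps(result).encode())
        await msg.ack()

    await js.subscribe(in_subject, durable=durable, cb=handler)
```

Swapping the model, or the subject pair, instantiates the same pattern for any of the five domains.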
We describe the end-to-end pipeline from simulation data generation through model training to deployment as a NATS-connected microservice, and identify the tooling required at each stage.
Because AI components are isolated services behind a NATS interface, model updates can be deployed and rolled back without coordination with other services. We describe a versioning scheme compatible with the JetStream durable consumer model.
References to be completed.