Communication System

How nodes communicate with each other

This page explains how OctoMY™ nodes communicate with each other, the design decisions behind the protocol, and how the various components work together.


Comms carrier

In node-to-node communications, we need a carrier mechanism that allows transfer of data between two endpoints.

The carrier mechanism is implemented as a class with the following mandate:

  • Handle a single protocol, such as UDP
  • Connect to/disconnect from the other side
  • Transfer data to/from the other side
  • Adhere to desired opportunity interval (see Comms channel below)
  • Provide statistics about the latest events such as transfer rates, error rates etc.
  • Give a good guess on whether we are connected or not based on the latest events
  • Provide robust error handling and error reporting

The CommsCarrier abstract base class defines this interface.
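The mandate above could be sketched roughly as follows. This is a hypothetical Python rendering for illustration only, not the real class: the method names, the `timeout_ms` silence threshold, and the `LoopbackCarrier` test double are all assumptions that do not come from the OctoMY™ source.

```python
from abc import ABC, abstractmethod
from collections import deque


class CommsCarrier(ABC):
    """Illustrative carrier interface (assumed names, not the real API)."""

    def __init__(self, opportunity_interval_ms=100, timeout_ms=3000):
        self.opportunity_interval_ms = opportunity_interval_ms
        self.timeout_ms = timeout_ms      # assumed: silence longer than this means "disconnected"
        self.last_receive_ms = None       # time of the latest inbound datagram
        self.bytes_sent = 0               # simple transfer statistics
        self.bytes_received = 0
        self.errors = deque(maxlen=32)    # most recent error messages

    @abstractmethod
    def connect(self, address): ...

    @abstractmethod
    def disconnect(self): ...

    @abstractmethod
    def _write(self, datagram: bytes) -> int:
        """Protocol-specific send; returns the number of bytes written."""

    def send(self, datagram: bytes) -> int:
        # Record errors instead of letting them escape to the caller.
        try:
            written = self._write(datagram)
        except OSError as error:
            self.errors.append(str(error))
            return 0
        self.bytes_sent += written
        return written

    def note_received(self, datagram: bytes, now_ms: int) -> None:
        self.bytes_received += len(datagram)
        self.last_receive_ms = now_ms

    def is_connected(self, now_ms: int) -> bool:
        # A guess based on the latest events: have we heard anything recently?
        if self.last_receive_ms is None:
            return False
        return now_ms - self.last_receive_ms <= self.timeout_ms


class LoopbackCarrier(CommsCarrier):
    """Test double that stores datagrams in memory instead of sending them."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.address = None
        self.outbox = []

    def connect(self, address):
        self.address = address

    def disconnect(self):
        self.address = None

    def _write(self, datagram: bytes) -> int:
        self.outbox.append(datagram)
        return len(datagram)
```

A concrete carrier only supplies `connect`, `disconnect` and `_write`; statistics, error capture and the connected-or-not guess live in the base class in this sketch.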

Supported carriers

Carrier    Status          Description
UDP        Supported       User Datagram Protocol - low-latency, connectionless internet protocol
Local      In Development  QLocalServer/QLocalSocket for non-network communication. Useful for testing and virtual endpoints
Bluetooth  Planned         Short range low-power wireless protocol
NFC        Planned         Near Field Communication - short range wireless that guarantees proximity

Why UDP over TCP?

Before selecting UDP as the main workhorse protocol, we considered TCP. We ultimately decided against TCP because:

  • TCP pretends that network traffic is an assured linear unbroken stream of bytes. This is easy to understand but far from how networks actually work.
  • This abstraction makes TCP hard to implement effectively. It has taken decades of evolution for TCP to become as good as it is today, but it remains limited by this fallacy.
  • Some needs are better met by not thinking about the network as a linear stream of bytes.

Design Decision: Why UDP over TCP?

We chose UDP because TCP's stream abstraction hides the packet-based reality of networks. For real-time robotics, we need explicit control over reliability vs. latency tradeoffs.

Alternatives considered: TCP, QUIC, SCTP
Tradeoff: More complex reliability implementation on our side
Benefit: Lower latency, better NAT traversal, explicit packet boundaries


Comms channel

CommsChannel is the main node-to-node communications API. It depends on CommsCarrier as a transport.

It exploits the benefits of low-latency, connectionless carriers such as UDP by modeling them closely while hiding their inherent complexities from the application. The API embraces the relatively short payload size of ~512 bytes that is common for such carriers: this is the largest contiguous block of data that may be transferred at one time.
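As a minimal sketch of what the payload limit means for larger data, the helper below splits a byte string into pieces that each fit in one payload. The name `split_payload` and the exact constant of 512 are assumptions for illustration; the real limit is only approximately 512 bytes and the real splitting logic may differ.

```python
MAX_PAYLOAD = 512  # bytes; assumed exact value for the "~512 bytes" quoted above


def split_payload(data: bytes, max_payload: int = MAX_PAYLOAD) -> list:
    """Split data into consecutive chunks of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]
```

For example, a 1300-byte blob would travel as three payloads of 512, 512 and 276 bytes.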

Two-layer architecture

Communications have two layers:

  1. Intrinsic Layer - Reserved for internal affairs of CommsChannel
  2. Courier Layer - Reserved for the application layer that wishes to conduct communication

These layers should not depend directly on one another and their implementations should not be mixed.

Figure: Two-layer architecture

The stateless vs. low-bandwidth conflict

We have conflicting requirements:

  • The protocol should be as stateless as possible for robustness
  • The protocol should be as low-bandwidth as possible

Where's the conflict? Every piece of state that the other side "remembers" saves bytes going over the wire, but every piece of stored state is also an opportunity for failure if a state-carrying packet goes missing. Sending all state in every packet means no state is ever lost, but it adds up to a lot of data.

Solution: The Sync Mechanism

To alleviate this conflict, we:

  1. Add a mechanism to verify state so mismatches can be detected and state re-sent
  2. For data that is small and/or important, re-send more often
  3. For data that is larger and/or less important, re-send less often

We call this mechanism "sync," and every packet may request sync.
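The three points above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the use of CRC32 as the state check, the `importance` scale of 1 and up, and the interval formula are not taken from the actual protocol.

```python
import zlib


def state_checksum(state: dict) -> int:
    """CRC32 over a canonical encoding of the state (assumed verification method)."""
    canonical = repr(sorted(state.items())).encode("utf-8")
    return zlib.crc32(canonical)


def needs_sync(local_state: dict, remote_checksum: int) -> bool:
    """True when the other side reports a checksum that does not match ours."""
    return state_checksum(local_state) != remote_checksum


def resend_interval_ms(size_bytes: int, importance: int, base_ms: int = 100) -> int:
    """Small and important data is re-sent often, large and unimportant data rarely.

    importance is a hypothetical scale where 1 is least important.
    """
    if importance < 1:
        raise ValueError("importance must be at least 1")
    size_factor = max(1, -(-size_bytes // 64))  # ceiling division: one unit per 64 bytes
    return max(1, base_ms * size_factor // importance)
```

With these assumed numbers, a 16-byte item of importance 10 is re-sent every 10 ms, while a 4096-byte item of importance 1 is re-sent every 6400 ms.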

Design Decision: The sync mechanism

The sync mechanism resolves the tension between stateless robustness and bandwidth efficiency. Instead of choosing one extreme, we allow variable re-send frequencies based on data importance.

Alternatives considered: Pure stateless (wasteful), pure stateful (fragile), TCP-style reliability (too slow)
Tradeoff: More complex protocol implementation
Benefit: Optimal bandwidth use while maintaining robustness for critical data

How Comms channel works

  1. Users register Couriers that each maintain their own state and keep data fresh for sending opportunities. Each courier handles a certain type of packet with a certain priority and desired sending frequency.

  2. CommsChannel decides how often packets are sent (the time between sending opportunities is the "opportunity interval") and which couriers get their packets sent on each opportunity.

  3. CommsChannel may send non-payload data or special-purpose network-only packets to sustain its operation. If there's no data from couriers, CommsChannel sends no-op packets to facilitate calculation of network characteristics like round trip time.

  4. CommsChannel binds to a local address and port but does not restrict where inbound traffic may arrive from, since every UDP packet carries its source address and port.

  5. Communication between Remote and Agent is initiated when the user presses "connect" in the UI, at which point Agent attempts to contact all trusted Remotes at their last known addresses until responses are received.


Couriers

The CommsChannel holds a list of registered Couriers and keeps them happy. Couriers may be active or inactive - only active couriers receive data or get sending opportunities.

Important couriers

Courier            Purpose
AgentStateCourier  The main courier. Exposes agent state and facilitates multiple Remotes managing state together with the Agent. Each parameter has metadata so parties know expected values if communications fail. Maintains a stream of idle packets to assure clients that data is current.
BlobCourier        General-purpose assured transfer for arbitrary-size data blobs. Handles retransmission of bad packets. Has async API for progress events, abort/fail, and completion. Used for binary files, etc.
SensorsCourier     Transfers sensor data from Agent to Remote
DiscoveryCourier   Handles node discovery protocol

Courier mandate

The Courier interface exposes a CourierMandate object to CommsChannel revealing:

  • Urgency - How soon does this courier need to send? (milliseconds until next send time)
  • Priority - How important is this courier? (0-255, lower = lower priority)
  • Accept Reads - Does this courier accept incoming data?
  • Want to Send - Does this courier have data to send?
  • Payload Size - Maximum bytes expected when sending
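A mandate with these five fields, and one possible way for a channel to pick a courier on a sending opportunity, might look like the sketch below. The field names, the `pick_courier` function, and the selection rule (highest priority first, then most overdue) are assumptions for illustration and are not confirmed by the source.

```python
from dataclasses import dataclass


@dataclass
class CourierMandate:
    payload_size: int          # maximum bytes expected when sending
    priority: int              # 0-255, lower value means lower priority
    urgency_ms: int            # milliseconds until next send time; <= 0 means due now
    accept_reads: bool = True  # does this courier accept incoming data?
    want_to_send: bool = True  # does this courier have data to send?


def pick_courier(mandates: dict, budget_bytes: int = 512):
    """Return the name of the courier to serve on this opportunity, or None.

    Assumed rule: only couriers that want to send, are due, and fit in the
    payload budget are considered. None means the channel would send a no-op.
    """
    candidates = [
        (name, m) for name, m in mandates.items()
        if m.want_to_send and m.urgency_ms <= 0 and m.payload_size <= budget_bytes
    ]
    if not candidates:
        return None
    # Highest priority wins; ties go to the most overdue courier.
    name, _ = max(candidates, key=lambda item: (item[1].priority, -item[1].urgency_ms))
    return name
```

Returning `None` corresponds to the case where no courier has data, in which case the channel is free to send a no-op packet instead.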

Sessions

Sessions represent active communication channels between two nodes.

Session establishment

Sessions are established through a handshake similar to the TCP three-way handshake. Before the handshake, both parties need each other's valid RSA public key and network address (exchanged during pairing).

Security Consideration

Sessions require both parties to possess each other's RSA public keys before handshake initiation. These keys must be exchanged through a secure pairing process. Never bypass this requirement - it's the foundation of OctoMY™'s security model.

Handshake protocol

In the handshake, party A is the initiator:

1. A sends SYN to B:
   - Desired SESSION-ID
   - NONCE
   - Encoded with B's public key

2. B answers SYN-ACK to A:
   - Full ID
   - Desired SESSION-ID
   - Return NONCE
   - New NONCE
   - Encoded with A's public key

3. A answers ACK to B:
   - Full ID
   - Return NONCE
   - Well received

At this point, the session is established.

Note: At every step, a party that is waiting for data that doesn't arrive resends its last message at regular intervals until the data arrives (or an error occurs).
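The nonce round-trip in the three steps above can be sketched as follows. This is a simplified model under stated assumptions: messages are plain dictionaries, the public-key encoding and the periodic resend are left out entirely, and all names (`Party`, `make_syn`, and so on) are hypothetical.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    full_id: str
    sent_nonce: Optional[int] = None  # the nonce we expect to get back
    established: bool = False


def make_syn(a: Party, session_id: int) -> dict:
    """Step 1: A sends its desired session ID and a fresh nonce."""
    a.sent_nonce = secrets.randbits(32)
    return {"type": "SYN", "session_id": session_id, "nonce": a.sent_nonce}


def make_syn_ack(b: Party, syn: dict, session_id: int) -> dict:
    """Step 2: B returns A's nonce and adds a new nonce of its own."""
    b.sent_nonce = secrets.randbits(32)
    return {
        "type": "SYN-ACK",
        "full_id": b.full_id,
        "session_id": session_id,
        "return_nonce": syn["nonce"],
        "nonce": b.sent_nonce,
    }


def make_ack(a: Party, syn_ack: dict) -> dict:
    """Step 3: A checks its nonce came back, then returns B's nonce."""
    if syn_ack["return_nonce"] != a.sent_nonce:
        raise ValueError("SYN-ACK returned the wrong nonce")
    a.established = True
    return {"type": "ACK", "full_id": a.full_id, "return_nonce": syn_ack["nonce"]}


def accept_ack(b: Party, ack: dict) -> None:
    """B checks its nonce came back; the session is then established."""
    if ack["return_nonce"] != b.sent_nonce:
        raise ValueError("ACK returned the wrong nonce")
    b.established = True
```

Each side proves it could read the other's message by echoing the nonce it received, which is why a mismatched return nonce aborts the step in this sketch.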

Figure: Session handshake flow

Resolving race conditions

If A and B both send simultaneously, both become initiators and chaos ensues. This is resolved by:

  1. Detecting when both parties hold the same role (both initiator or both adherent)
  2. Settling the conflict with an ID duel - the winner keeps its current role, the loser must change roles
  3. Dropping the packet with a log entry and resuming handshake with correct roles
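One way to picture the resolution is the sketch below. The duel rule used here (the node with the larger ID wins) is purely an assumption chosen so that both sides reach the same verdict independently; the source does not say how the ID duel is actually decided.

```python
def resolve_role(my_id: str, my_role: str, their_id: str, their_role: str) -> str:
    """Return the role this node should hold after conflict resolution.

    Roles are "initiator" or "adherent". Assumed duel rule: larger ID wins.
    """
    if my_role != their_role:
        return my_role  # no conflict, nothing to resolve
    if my_id == their_id:
        raise ValueError("cannot duel against an identical ID")
    if my_id > their_id:
        return my_role  # winner keeps its current role
    # Loser switches to the opposite role.
    return "adherent" if my_role == "initiator" else "initiator"
```

Because both nodes evaluate the same comparison on the same pair of IDs, they end up with complementary roles without exchanging any further messages.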

Discovery and pairing

Discovery is the mechanism by which nodes seek out and establish secure connections. Pairing is the mechanism by which users assign trust levels to discovered nodes.

Key principles

  • Discovery is separate from pairing - discovered signatures remain regardless of trust settings
  • Users can trust/distrust/re-trust discovered nodes freely
  • Removing discovery records is an advanced debugging operation

Discovery process

  1. Physical proximity is established (in order of security preference):

    • NFC: Physical proximity implied by NFC range
    • Camera/QR: Physical proximity implied by scanning QR code
    • Bluetooth: Physical proximity implied by Bluetooth range
    • Zoo: Posting expiring GPS coordinates with pairing signature
    • LAN: Identifying common gateway

  2. Exchange signatures and public keys

  3. Exchange challenge/responses to verify legitimacy

  4. Show identicons in list on each node for user verification

  5. Optional: Conduct multi-factor authentication for improved security

  6. Update node list to show security level of each node

Figure: Discovery process flow

Pairing process

Once a node is discovered (secure communication established), the user verifies that signatures match expectations by inspecting identity information.


Multiplexing in controls

The Agent lives the easy life - connecting is simply an on/off switch in its UI. Remotes must juggle connections for multiple Agents.

Remote holds one ClientWidget instance per Agent it is communicating with. Each instance is responsible for adding and removing the appropriate couriers in CommsChannel dynamically as needed.

Session management

  • On application start, all entries in AddressBook are added to CommsSessionDirectory with stored state, ensuring sessions persist across restarts.

  • CommsChannel can be in normal or honeymoon mode:

    • Honeymoon mode: Enabled just after communications are enabled. Pings all inactive nodes continuously at short intervals.
    • Normal mode: Pings inactive nodes at exponentially decaying frequency based on time since last activity.

  • Any node that replies to a ping is upgraded to active

  • Active nodes without valid sessions start a handshake

  • Celibacy: Explicitly completed connections skip pings for a grace period

  • The most recently active node is always treated as if it was last active no more than 1 hour ago
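The ping behaviour of the two modes can be sketched as a single function. All numbers here are assumptions for illustration (a 250 ms honeymoon interval, a 1 s base interval that doubles for every 10 minutes of silence, and a 1 hour ceiling); the source only states that the frequency decays exponentially and that the last active node counts as no older than 1 hour.

```python
HOUR_MS = 3_600_000


def ping_interval_ms(idle_ms: int, honeymoon: bool = False, is_last_active: bool = False,
                     base_ms: int = 1000, honeymoon_ms: int = 250,
                     doubling_period_ms: int = 600_000, max_ms: int = HOUR_MS) -> int:
    """Return how long to wait before pinging an inactive node again."""
    if honeymoon:
        return honeymoon_ms  # ping everyone at short, fixed intervals
    if is_last_active:
        idle_ms = min(idle_ms, HOUR_MS)  # never treated as older than 1 hour
    # Interval doubles per period of silence, so ping frequency decays exponentially.
    doublings = min(idle_ms // doubling_period_ms, 32)
    return min(max_ms, base_ms * 2 ** doublings)
```

Under these assumed constants, a node silent for two hours is pinged once an hour, while the most recently active node with the same silence is still pinged every 64 seconds.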

Pro Tip

When debugging connection issues, check if nodes are stuck in "honeymoon mode." This can happen if the initial connection attempt fails but isn't properly cleared. Use the connection status widget to see current mode.


Intrinsic features

Intrinsic parts of CommsChannel include:

Session management

  • Session initiation/handshake
  • Exchange of transmission control data
  • Session tear-down

Bandwidth management

  • Detection of available bandwidth
  • Throttling to avoid excessive use
  • Continuous monitoring and adjustment to optimize packet flow
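The source does not describe how throttling is implemented. As a stand-in, the sketch below uses a token bucket, a common general-purpose rate limiting technique, to show what "throttling to avoid excessive use" could look like. The class name and all parameters are hypothetical.

```python
class TokenBucket:
    """Token-bucket limiter (a swapped-in technique, not the confirmed OctoMY method)."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: int):
        self.rate = rate_bytes_per_s        # long-term allowed throughput
        self.capacity = burst_bytes         # largest burst that may be sent at once
        self.tokens = float(burst_bytes)    # bucket starts full
        self.last_s = 0.0

    def allow(self, size_bytes: int, now_s: float) -> bool:
        """Return True and spend tokens if a packet of size_bytes may be sent now."""
        elapsed = max(0.0, now_s - self.last_s)
        self.last_s = now_s
        # Refill according to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

Adjusting `rate_bytes_per_s` as measurements of available bandwidth change would be one way to connect this to the monitoring item in the list above.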

Encryption management

  • Signing and verification using RSA key pairs
  • Generation and exchange of encryption keys
  • Generation of security primitives (nonces)
  • Encryption and decryption of protocol text

Reliability management

  • Maintaining UDP connection over unreliable network components using STUN services or antler packets
  • Detection and removal of duplicate/corrupt packets
  • Detection and re-sending of missing packets
  • Reordering of ordered packet sequences

Note: Expensive features like reliability and encryption are invoked only when needed. When extensive intrinsic data must be exchanged, separate "intrinsic packets" are sent; smaller amounts of intrinsic data accompany regular packets.
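Three of the reliability items above (dropping duplicates, spotting missing packets, and reordering) can be sketched with a sequence-numbered buffer. This assumes each packet carries a monotonically increasing integer sequence number and ignores wrap-around; the class and method names are hypothetical.

```python
class ReorderBuffer:
    """Delivers payloads in sequence order and discards duplicates."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq  # next sequence number to deliver
        self.pending = {}          # out-of-order payloads waiting for a gap to fill

    def receive(self, seq: int, payload: bytes) -> list:
        """Accept one packet and return the payloads that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self) -> list:
        """Sequence numbers that block delivery and could be requested again."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending)) if s not in self.pending]
```

The list returned by `missing()` is what a receiver would use to ask the sender to re-send, which ties the detection and re-sending items together.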


Client architecture

The final piece of the communication puzzle is the client architecture, which orchestrates couriers and their data flow.

How clients work

When two nodes discover each other and complete pairing, each node creates an Associate entry in its AddressBook identifying the other.

When the comms service activates as part of the "online" service level, all Associates are instantiated into Clients in the node's client list.

Client structure

Clients are running communication envoys. Each entry represents a remote node and maintains the best known state of that node.

Component         Description
Update Timer      Facilitates an update interval
Node              The node instance using this client
Associate         The associate this client represents
CourierSet        The set of couriers for this connection (node-type specific)
ConnectionStatus  Simplified connection status using heuristics
ISyncParameters   Parsing of data from couriers

Figure: Client architecture

Client types

There are specialized clients for each node type:

Client        Purpose
AgentClient   Communication with Agent nodes
RemoteClient  Communication with Remote nodes
HubClient     Communication with Hub nodes

Each inherits from a base NodeClient containing common client mechanisms.
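The relationship between Associates, the base NodeClient and the specialized clients might be sketched like this. It is a hypothetical Python model: the `node_type` strings, the `instantiate_clients` helper, and in particular which couriers belong to which client type are assumptions not stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Associate:
    node_id: str
    node_type: str  # assumed values: "agent", "remote" or "hub"


class NodeClient:
    """Common client mechanisms shared by all node types."""
    courier_names = ()  # overridden per node type

    def __init__(self, associate: Associate):
        self.associate = associate                   # the associate this client represents
        self.courier_set = list(self.courier_names)  # node-type specific couriers
        self.connected = False                       # simplified connection status


class AgentClient(NodeClient):
    courier_names = ("AgentStateCourier", "SensorsCourier", "BlobCourier")  # assumed


class RemoteClient(NodeClient):
    courier_names = ("AgentStateCourier", "BlobCourier")  # assumed


class HubClient(NodeClient):
    courier_names = ("DiscoveryCourier", "BlobCourier")  # assumed


CLIENT_TYPES = {"agent": AgentClient, "remote": RemoteClient, "hub": HubClient}


def instantiate_clients(address_book) -> list:
    """Turn every Associate in the address book into a client of the matching type."""
    return [CLIENT_TYPES[associate.node_type](associate) for associate in address_book]
```

Keeping the per-type differences in a class attribute means the base class can hold all shared behaviour, which mirrors the inheritance described above.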


Component reference

Component              QDoc API
Comms                  API
CommsCarrier           API
CommsCarrierUDP        API
CommsChannel           API
CommsSession           API
CommsSessionDirectory  API
Courier                API
CourierMandate         API
AgentStateCourier      API
BlobCourier            API
SensorsCourier         API
DiscoveryCourier       API
Associate              API
AddressBook            API
AgentClient            API
RemoteClient           API
HubClient              API

Glossary

Term            Definition
Discovery       Automatic process where nodes find each other and exchange details for future identity verification. May be aided by manual processes like camera discovery.
Pairing         Manual process where the operator assigns trust to a previously discovered node.
Node            Any participant in an OctoMY™ network (Agent, Hub, or Remote)
Client          All data pertaining to another node that this node knows about
Birth/Delivery  The transition of a node from uninitialized to initialized. Before initialization, nodes lack necessary identification and security primitives to participate in discovery and pairing.