Communications
How communications are carried out
Topics
- communications
- architecture
Communications
The main mode of node-to-node communication in OctoMY™ is carried over UDP. While other carriers such as Bluetooth® are supported for authentication and pairing purposes, UDP is the day-to-day workhorse.
Before selecting UDP, we considered many alternatives. For example, TCP was a strong contender. We ultimately decided to drop TCP because:
- TCP tries to pretend that network traffic is an assured, linear, unbroken stream of bytes. This has the benefit of being easy to understand and use, but:
  - This is far from how a network actually works, making TCP hard to implement effectively. It has taken decades of evolution for TCP to become as good as it is today, but it is still limited by this fallacy.
  - Some needs may actually be better met by not thinking about the network as a linear stream of bytes, so going to great lengths to pretend that it is can get in the way. For example, when a segment is lost, TCP holds back all later data until the lost segment has been retransmitted (head-of-line blocking), even if a fresher update has already arrived and the lost data is no longer useful.
CommsChannel
CommsChannel is the main node-to-node communications API in OctoMY™, and it is a wrapper around the UDP code found in Qt5. It exploits the benefits of communicating over UDP by modeling UDP closely, while hiding its inherent complexities. Among other things, this means that the API treats a conservative UDP payload size of ~512 bytes as the largest contiguous block of data that may be sent at one time. CommsChannel may work over other carriers in the future, but this packet-centric view will not change.
- Communications have two "layers": the intrinsic layer and the courier layer.
- The courier part is reserved for the application layer that wishes to conduct communication using CommsChannel.
- The intrinsic part is reserved for the internal affairs of CommsChannel.
- Intrinsic and courier parts of CommsChannel should not depend directly on one another, and their respective implementations should not be mixed.
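To make the layer split concrete, here is a minimal sketch of how a single packet could keep the intrinsic section and the courier section apart within the ~512-byte budget. The field names (`sessionId`, `sequence`, `courierId`), the little-endian layout, and the 4+2-byte per-courier header are assumptions made for illustration, not OctoMY™'s actual wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed packet budget, matching the ~512 bytes mentioned above.
constexpr std::size_t kMaxPacketSize = 512;

// Intrinsic section: owned by CommsChannel alone (hypothetical fields).
struct IntrinsicHeader {
    std::uint64_t sessionId = 0; // 0 would mark an "initial" packet
    std::uint32_t sequence = 0;  // counter for reliability and ordering
};

// Courier section: opaque bytes owned by one courier (hypothetical layout).
struct CourierPart {
    std::uint32_t courierId = 0;
    std::vector<std::uint8_t> payload;
};

// Appends an unsigned integer in little-endian byte order.
template <typename T>
void putLE(std::vector<std::uint8_t> &out, T value) {
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    }
}

// Writes the intrinsic header first, then courier parts in order until the
// next one would not fit; the rest must wait for a later sending opportunity.
// Returns the number of courier parts packed.
std::size_t buildPacket(const IntrinsicHeader &hdr,
                        const std::vector<CourierPart> &parts,
                        std::vector<std::uint8_t> &out) {
    out.clear();
    putLE(out, hdr.sessionId);
    putLE(out, hdr.sequence);
    std::size_t packed = 0;
    for (const CourierPart &part : parts) {
        // Cost per part: 4-byte courier ID + 2-byte length + payload.
        const std::size_t cost = 4 + 2 + part.payload.size();
        if (out.size() + cost > kMaxPacketSize) {
            break;
        }
        putLE(out, part.courierId);
        putLE(out, static_cast<std::uint16_t>(part.payload.size()));
        out.insert(out.end(), part.payload.begin(), part.payload.end());
        ++packed;
    }
    assert(out.size() <= kMaxPacketSize);
    return packed;
}
```

Keeping the two sections in separate types means courier code never has to touch `IntrinsicHeader`, which mirrors the rule that the two layers should not depend on one another.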
NOTE: We have some conflicting requirements for the protocol:
- The protocol should be as stateless as possible for robustness.
- The protocol should be as low-bandwidth as possible.
Where is the conflict? For every mode/state you add, you save bytes going over the wire (because the other side will "remember" the state), but every mode/state the other side must remember adds an opportunity for failure if a state-carrying packet goes missing. Sending all state in every packet means no state is ever lost, but it adds up to a lot of data.
To alleviate this conflict, we do the following:
- Add a mechanism to verify the state so that mismatches can be detected and state can be re-sent.
- For data that is small and/or important, we re-send more often.
- For data that is larger and/or less important, we re-send less often.
We call this mechanism "sync," and every packet may request sync.
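A minimal sketch of the sync idea follows, assuming each piece of synced state is summarized by a fingerprint that the receiver can echo back, and that the re-send interval grows with size and shrinks with importance. The `SyncedState` type, the use of 64-bit FNV-1a as the fingerprint, the 1 to 10 importance scale, and the interval formula with its 100 ms to 10 s bounds are all illustrative assumptions, not the protocol's actual rules.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

using std::chrono::milliseconds;

// 64-bit FNV-1a, used here as a cheap fingerprint of the synced state.
std::uint64_t fingerprint(const std::string &bytes) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : bytes) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

struct SyncedState {
    std::string value;  // serialized state
    int importance = 1; // 1 (low) to 10 (high), hypothetical scale
};

// Small and/or important data is re-sent often; large and/or less
// important data is re-sent less often.
milliseconds resendInterval(const SyncedState &s) {
    assert(s.importance >= 1 && s.importance <= 10);
    const long long lo = 100, hi = 10000; // assumed bounds in ms
    const long long ms = (100 + 10 * static_cast<long long>(s.value.size())) / s.importance;
    return milliseconds(std::clamp(ms, lo, hi));
}

// The receiver echoes the fingerprint of the state it holds. A mismatch
// means the state must be re-sent now instead of waiting for the interval.
bool needsResync(const SyncedState &local, std::uint64_t remoteFingerprint) {
    return fingerprint(local.value) != remoteFingerprint;
}
```

The fingerprint check is what makes the scheme robust: a dropped state-carrying packet is detected at the next echo rather than silently leaving both sides out of step.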
The CommsChannel API works like this:
- Users of the API register Couriers, each of which is responsible for keeping the latest data fresh and ready for sending should an opportunity present itself. Each courier tends to a certain type of packet with a certain priority and desired sending frequency. It is up to each courier to maintain its own state.
- CommsChannel is in charge and decides the speed at which packets are sent and which couriers get their packets sent at each opportunity.
- CommsChannel may at any time send non-payload data in each packet, or even special-purpose network-only packets, to sustain its operation. If there is no data to be sent by couriers, CommsChannel may send no-op packets that facilitate the calculation of network characteristics such as round-trip time. This is done transparently to the couriers (so you could say couriers are on a higher layer in the OSI model).
- CommsChannel binds to a local address and port but does not discriminate based on where inbound traffic comes from. All packets are treated equally, as every UDP packet inherently identifies its source address and port.
- Communication between control and agent is initiated when the user presses the "connect" buttons in the respective user interfaces, at which point the agent will attempt to contact all trusted controls at their last known address until one or more answers are received. From then on, all connections that did not result in valid responses will be closed and only retried periodically.
- All agent-initiated communication will be broadcast to all active controls in parallel. Remote-initiated transfers will remain private.
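The division of labor described above can be sketched as follows, assuming a hypothetical `Courier` base class with `wantsToSend()` and `sendingOpportunity()` hooks and a simplified `Channel` standing in for CommsChannel. The channel owns the pacing; when no courier has anything to send, it emits a no-op packet whose timestamp the peer could echo back for round-trip measurement. All names are illustrative, and packet size budgeting is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical courier interface: each courier keeps its own state and only
// produces bytes when the channel hands it a sending opportunity.
class Courier {
public:
    virtual ~Courier() = default;
    virtual bool wantsToSend() const = 0;
    virtual std::string sendingOpportunity() = 0; // returns payload bytes
};

// Example courier that keeps the latest counter value ready for sending.
class CounterCourier : public Courier {
public:
    bool wantsToSend() const override { return pending_; }
    std::string sendingOpportunity() override {
        pending_ = false;
        return "count=" + std::to_string(count_);
    }
    void increment() {
        ++count_;
        pending_ = true;
    }

private:
    int count_ = 0;
    bool pending_ = false;
};

struct OutPacket {
    bool noOp = false;          // no courier data, only intrinsic data
    std::uint64_t sentAtMs = 0; // could be echoed by the peer to measure RTT
    std::vector<std::string> payloads;
};

class Channel {
public:
    void registerCourier(std::shared_ptr<Courier> courier) {
        assert(courier != nullptr);
        couriers_.push_back(std::move(courier));
    }

    // Driven by the channel's own timer: couriers never decide the pacing.
    OutPacket onOpportunity(std::uint64_t nowMs) {
        OutPacket packet;
        packet.sentAtMs = nowMs;
        for (const auto &courier : couriers_) {
            if (courier->wantsToSend()) {
                packet.payloads.push_back(courier->sendingOpportunity());
            }
        }
        packet.noOp = packet.payloads.empty();
        return packet;
    }

private:
    std::vector<std::shared_ptr<Courier>> couriers_;
};
```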
Courier
The CommsChannel holds a list of registered Couriers and will try to keep them happy. Couriers may be active or inactive; only active couriers receive data or get sending opportunities from the CommsChannel. There are many types of Couriers, each looking after a different set of data. Here is a list of important Couriers:
- AgentStateCourier - The main courier. It exposes the state of the agent and provides a way for multiple controls to manage the state together with the agent. Each parameter has metadata assigned to it so that each party knows what value to expect for that parameter should communications fail, and the API provides mechanisms for the clients to ask such questions as "Is this data verified?" and "How fresh is this value?" The AgentStateCourier maintains a stream of idle packets at all times to assure each client that the latest data is correct.
- BlobCourier - General-purpose assured transfer mechanism to move blobs of data that are of arbitrary size. Takes care of retransmission of bad packets. Has an asynchronous API that allows clients to subscribe to events such as "progress changed," "transmission aborted/failed," and "transmission complete." Used for sending big objects such as binary files, etc.
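The chunking side of a BlobCourier-like transfer could look roughly like this. It is a minimal sketch assuming a hypothetical fixed chunk size of 400 bytes and an index per chunk, so that the receiver can accept chunks in any order, report progress, and list the chunks it still needs re-sent; the real BlobCourier API and event names may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

constexpr std::size_t kChunkSize = 400; // assumed room left after headers

// Sender side: split a blob into chunks addressed by their index.
std::vector<std::string> splitBlob(const std::string &blob) {
    std::vector<std::string> chunks;
    for (std::size_t offset = 0; offset < blob.size(); offset += kChunkSize) {
        chunks.push_back(blob.substr(offset, kChunkSize));
    }
    return chunks;
}

// Receiver side: accepts chunks in any order, ignores duplicates,
// and reports progress and missing chunks.
class BlobReceiver {
public:
    explicit BlobReceiver(std::size_t chunkCount) : chunks_(chunkCount) {}

    void onChunk(std::size_t index, std::string data) {
        assert(index < chunks_.size());
        if (!chunks_[index]) {
            chunks_[index] = std::move(data);
            ++received_;
        }
    }

    // Fraction in [0, 1], e.g. for a "progress changed" event.
    double progress() const {
        return chunks_.empty() ? 1.0 : static_cast<double>(received_) / chunks_.size();
    }

    // Indices to ask the sender to retransmit.
    std::vector<std::size_t> missing() const {
        std::vector<std::size_t> result;
        for (std::size_t i = 0; i < chunks_.size(); ++i) {
            if (!chunks_[i]) {
                result.push_back(i);
            }
        }
        return result;
    }

    // The reassembled blob once every chunk has arrived (e.g. for a
    // "transmission complete" event); nullopt until then.
    std::optional<std::string> complete() const {
        if (received_ != chunks_.size()) {
            return std::nullopt;
        }
        std::string blob;
        for (const auto &chunk : chunks_) {
            blob += *chunk;
        }
        return blob;
    }

private:
    std::vector<std::optional<std::string>> chunks_;
    std::size_t received_ = 0;
};
```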
CourierMandate
The Courier interface allows couriers to expose a CourierMandate object to CommsChannel. This object reveals the following:
- How urgent does this courier feel her need to send data? Expressed as a number of milliseconds until the next sending time.
- How high priority does this courier feel that she is? Expressed as an integer 0-255 where a lower number is lower priority.
- Do we accept reads? Couriers that do not wish to accept data will have any data sent to them discarded.
- Do we want to send? Couriers that do not wish to send will not receive sending opportunities.
- Payload size. When sending, the courier will write at most the number of bytes given by this integer.
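One way CommsChannel might act on these mandates at a single sending opportunity is sketched below. The field names, the "due when the urgency reaches zero" rule, the priority-then-overdue ordering, and the byte-budget check are all assumptions for illustration; the text above only defines what the mandate exposes, not how it is weighed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Mirrors the five properties listed above; field names are assumptions.
struct CourierMandate {
    std::int64_t msUntilNextSend = 0; // urgency; <= 0 means "due now"
    std::uint8_t priority = 0;        // 0-255, higher number wins
    bool acceptReads = true;          // false: inbound data is discarded
    bool wantToSend = false;          // false: no sending opportunities
    std::uint16_t payloadSize = 0;    // max bytes written when sending
};

struct CourierEntry {
    std::string name;
    bool active = true;
    CourierMandate mandate;
};

// Chooses the couriers that get a slot in one packet: active couriers that
// want to send and are due, highest priority first (most overdue first on
// ties), as long as their declared payload fits the remaining byte budget.
std::vector<std::string> pickCouriers(const std::vector<CourierEntry> &couriers,
                                      std::size_t budgetBytes) {
    std::vector<const CourierEntry *> due;
    for (const auto &c : couriers) {
        if (c.active && c.mandate.wantToSend && c.mandate.msUntilNextSend <= 0) {
            due.push_back(&c);
        }
    }
    std::stable_sort(due.begin(), due.end(), [](const CourierEntry *a, const CourierEntry *b) {
        if (a->mandate.priority != b->mandate.priority) {
            return a->mandate.priority > b->mandate.priority;
        }
        return a->mandate.msUntilNextSend < b->mandate.msUntilNextSend;
    });
    std::vector<std::string> picked;
    for (const CourierEntry *c : due) {
        if (c->mandate.payloadSize <= budgetBytes) {
            budgetBytes -= c->mandate.payloadSize;
            picked.push_back(c->name);
        }
    }
    return picked;
}
```

A courier that does not fit is skipped rather than ending the loop, so a smaller, lower-priority courier can still use the leftover bytes.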
Multiplexing in Controls
The agent lives the easy life. The decision to be connected is simply an on-off switch in its user interface. A control, on the other hand, has to juggle connections to every agent it controls. How is this managed?
We will disregard hub in this section and focus on remote. In remote, there exists one ClientWidget instance per agent that is communicating. That instance is responsible for dynamically adding the appropriate couriers to CommsChannel, and removing them, as needed. The CommsChannel itself simply looks at the currently registered couriers and works with those:
- Upon application start, all entries in NodeAssociateStore are added to CommsSessionDirectory with their stored state. This ensures that sessions persist across application restarts.
- CommsChannel may be in normal or honeymoon mode. Honeymoon mode is typically enabled for a period just after communications are enabled, or at the user's discretion.
- CommsChannel in honeymoon mode will repeatedly ping all nodes that are not active at a short interval (seconds) until the honeymoon mode wears off.
- CommsChannel in normal mode will ping all nodes that are not active at an exponentially decaying frequency, as a function of the time since their last recorded ping/activity response.
- Any node that replies to a ping is immediately upgraded to active.
- Any node that is active but has no valid session gets a handshake started, and communication with it is maintained through a newly created session until completion or until the connection times out.
- If communication with an active node is explicitly completed, the node will not receive pings, even during honeymoon, for a specified grace period. This is called celibacy.
- If communication with an active node simply times out, the node will instead continue to partake in the pinging as normal.
- The node that was last active is always treated as if it is no older than 1 hour, and will thus, in inactive periods, be pinged at least as often as a node last heard from 1 hour ago.
- A packet is defined as initial when it contains a session ID of 0.
- A packet is defined as broken when it does not adhere to the protocol, i.e. it has too little or too much data, or data in the wrong format.
- A packet is defined as whole when it is not broken.
- A packet is defined as hacked when it is whole according to comms protocol but broken according to tamper protocol.
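The ping rules above can be sketched as a single scheduling function. This assumes hypothetical tuning constants (2 s honeymoon interval, 300 s celibacy period, a 5 s base interval that doubles for every hour of silence, capped at 1 hour); the text fixes only the shape of the rules, not these numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <optional>

using std::chrono::hours;
using std::chrono::seconds;

struct NodeInfo {
    bool active = false;
    seconds sinceLastResponse{0};
    std::optional<seconds> sinceExplicitClose; // set when comms were explicitly completed
    bool lastActiveNode = false;
};

// Hypothetical tuning constants.
constexpr seconds kHoneymoonInterval{2};
constexpr seconds kCelibacyPeriod{300};
constexpr double kBaseIntervalS = 5.0;   // interval for a node heard from just now
constexpr double kDoublingS = 3600.0;    // interval doubles per hour of silence
constexpr double kMaxIntervalS = 3600.0; // never wait longer than this

// Time until the next ping, or nullopt for "do not ping this node".
std::optional<seconds> nextPingIn(const NodeInfo &n, bool honeymoon) {
    if (n.active) {
        return std::nullopt; // active nodes are talked to, not pinged
    }
    if (n.sinceExplicitClose && *n.sinceExplicitClose < kCelibacyPeriod) {
        return std::nullopt; // celibacy overrides even honeymoon
    }
    if (honeymoon) {
        return kHoneymoonInterval;
    }
    seconds age = n.sinceLastResponse;
    if (n.lastActiveNode) {
        age = std::min(age, seconds(hours(1))); // never treated as older than 1 hour
    }
    // Exponentially decaying frequency, i.e. exponentially growing interval,
    // rounded to whole seconds.
    const double interval = kBaseIntervalS * std::exp2(static_cast<double>(age.count()) / kDoublingS);
    return seconds(static_cast<seconds::rep>(std::llround(std::min(interval, kMaxIntervalS))));
}
```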
Protocols
The comms protocol is the language of OctoMY™ in network communication.
The tamper protocol is an extra layer of protection beside the comms protocol to detect attempts to tamper with communications. It has validation checks that are not required by the comms protocol but add evidence of the data's authenticity.
NOTE: The implementation of the comms protocol has the highest priority; implementation of the tamper protocol will start as soon as we have a working MVP.
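The packet definitions from the previous section fit on top of these two protocols as independent flags rather than one exclusive category: an initial packet can also be hacked, for example. A minimal sketch follows, assuming a made-up wire format (8-byte little-endian session ID, 1-byte length, payload) and a caller-supplied function standing in for the tamper protocol; neither reflects the real OctoMY™ format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wire format for this sketch only:
// 8-byte little-endian session ID, 1-byte payload length, payload.
struct ParsedPacket {
    std::uint64_t sessionId = 0;
    std::vector<std::uint8_t> payload;
};

struct PacketVerdict {
    bool whole = false;   // adheres to the comms protocol (otherwise: broken)
    bool initial = false; // whole and carrying session ID 0
    bool hacked = false;  // whole, but rejected by the tamper protocol
};

// Comms-protocol check: any lack or excess of data makes the packet broken.
bool parse(const std::vector<std::uint8_t> &raw, ParsedPacket &out) {
    if (raw.size() < 9) {
        return false;
    }
    const std::size_t len = raw[8];
    if (raw.size() != 9 + len) {
        return false;
    }
    out.sessionId = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        out.sessionId |= static_cast<std::uint64_t>(raw[i]) << (8 * i);
    }
    out.payload.assign(raw.begin() + 9, raw.end());
    return true;
}

// Stand-in for the tamper protocol (signature checks and the like).
using TamperCheck = bool (*)(const ParsedPacket &);

PacketVerdict classify(const std::vector<std::uint8_t> &raw, TamperCheck tamperOk) {
    PacketVerdict v;
    ParsedPacket p;
    if (!parse(raw, p)) {
        return v; // broken: the other flags are not meaningful
    }
    v.whole = true;
    v.initial = (p.sessionId == 0);
    v.hacked = !tamperOk(p);
    return v;
}
```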
Session
- If A sends first, A becomes initiator and B becomes adherent. All is dandy.
- If B sends first, B becomes initiator and A becomes adherent. All is dandy.
- If A & B send at exactly the same time, both A & B become initiator and chaos ensues.
This is resolved in the following manner:
- Detect if both A & B are initiator:
  - If we receive a packet indicating that the other party thinks it holds the same role as us (both initiator or both adherent), look up the ID-duel. The winner gets to keep its current role, while the loser must change.
  - Drop the packet with a log entry, and let the handshake resume, now with both parties holding the correct role.
- Detect duplicate connection pairs in sessions, and remove the one that is inferior in the ID-duel.
- Time since first ever successful connection
- Time since first ever connection attempt, successful or not
- Time since last successful connection
- Time since last unsuccessful connection attempt
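The text does not define how the ID-duel is decided. A minimal sketch, assuming it is a deterministic comparison that both nodes can evaluate locally (here, simply the lexicographically greater full ID wins, with IDs assumed to be distinct), could look like this:

```cpp
#include <cassert>
#include <string>

enum class Role { Initiator, Adherent };

// Hypothetical ID-duel: the lexicographically greater full ID wins.
// The two IDs are assumed to be distinct.
bool winsIdDuel(const std::string &myId, const std::string &theirId) {
    assert(myId != theirId);
    return myId > theirId;
}

// Called when an incoming packet shows the peer claims a role.
// Returns the role this node should hold from now on.
Role resolveRoleConflict(Role myRole, Role theirClaimedRole,
                         const std::string &myId, const std::string &theirId) {
    if (myRole != theirClaimedRole) {
        return myRole; // no conflict
    }
    if (winsIdDuel(myId, theirId)) {
        return myRole; // the winner keeps its current role
    }
    // The loser switches to the opposite role.
    return myRole == Role::Initiator ? Role::Adherent : Role::Initiator;
}
```

Because both sides run the same comparison on the same pair of IDs, they reach opposite verdicts, so the conflict resolves without an extra round trip.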
// Did the handshake complete already?
//if(session->established()) {
// TODO: Handle this case:
/*
When an initial packet is received after the session was already established, the following logic applies:
+ If the packet timestamp is before the session-established-and-confirmed timestamp, the packet is ignored.
+ The packet is logged, and the stream is examined for packets indicating that the original session is still in effect.
+ If there is no sign of the original session after a timeout of X seconds, the session is torn down, and the last initial packet received is answered to start a new handshake.
+ If one or more packets indicate that the session is still going, the initial packet and its recipient are flagged as hacked, and the packet is ignored. This will trigger a warning to the user in the UI, and rules may be set up in plan to automatically shut down communication upon such an event.
+ Regardless of whether bandwidth management is in effect, valid initial packets should be processed at most once every 3 seconds.
*/
//} else {
//}
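The TODO above could be approached along these lines. This is a minimal sketch assuming millisecond timestamps, a hypothetical `decideLateInitial()` helper whose caller tracks whether packets of the original session have been seen since watching began, and a separate rate limiter for the 3-second rule. The X-second timeout stays a parameter because the text leaves it unspecified.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class LateInitialAction { Ignore, KeepWatching, TearDownAndAnswer, FlagHacked };

// Decides what to do with an initial packet that arrives after the session
// was established. All times are in ms; names are hypothetical.
LateInitialAction decideLateInitial(std::int64_t packetTs, std::int64_t establishedTs,
                                    std::int64_t watchStarted, std::int64_t now,
                                    std::int64_t watchTimeout, bool originalSessionSeen) {
    assert(watchTimeout > 0);
    if (packetTs < establishedTs) {
        return LateInitialAction::Ignore; // predates the confirmed session
    }
    if (originalSessionSeen) {
        return LateInitialAction::FlagHacked; // session still alive: suspicious
    }
    if (now - watchStarted >= watchTimeout) {
        return LateInitialAction::TearDownAndAnswer; // no sign of the session
    }
    return LateInitialAction::KeepWatching;
}

// Admits at most one valid initial packet per 3 seconds, regardless of
// whether bandwidth management is in effect.
class InitialRateLimiter {
public:
    bool admit(std::int64_t now) {
        if (last_ && now - *last_ < kMinGapMs) {
            return false;
        }
        last_ = now;
        return true;
    }

private:
    static constexpr std::int64_t kMinGapMs = 3000;
    std::optional<std::int64_t> last_;
};
```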
Handshake
Sessions are established through a handshake that is similar in concept to the TCP 3-way handshake. Both parties to the handshake (A & B) need the other party's valid RSA public key and network address before the handshake may take place. These are exchanged through the pairing process that happens beforehand.
In the handshake, party A is the initiator:
- A sends SYN to B:
  - Hi B. Here is my
    - DESIRED SESSION-ID
    - NONCE
  - ENCODED WITH YOUR (B's) PUBKEY.
- B answers SYN-ACK to A:
  - Hi A. Here is my
    - FULL-ID
    - DESIRED SESSION-ID
    - RETURN NONCE
    - NEW NONCE
  - ENCODED WITH YOUR (A's) PUBKEY back at you.
- A answers ACK to B:
  - Hi again B. Here is my
    - FULL-ID
    - RETURN NONCE
  - WELL RECEIVED.
At this point, the session is established.
NOTE: For every step in this protocol, if any part is waiting for data from the other that does not arrive, it will attempt to resend its last message on a regular interval until it does (or some error condition or other state change occurs).
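To make the nonce round trip concrete, here is a minimal sketch of the three messages and the checks each party could perform, using hypothetical `Initiator` and `Adherent` classes. It leaves out the RSA encryption entirely and uses a non-cryptographic random generator as a stand-in, so it illustrates the message flow only and is not a secure implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <random>
#include <string>
#include <utility>

// Message contents from the three steps above. Encryption with the
// recipient's RSA public key is deliberately left out of this sketch.
struct Syn {
    std::uint64_t desiredSessionId;
    std::uint64_t nonce;
};
struct SynAck {
    std::string fullId;
    std::uint64_t desiredSessionId;
    std::uint64_t returnNonce;
    std::uint64_t newNonce;
};
struct Ack {
    std::string fullId;
    std::uint64_t returnNonce;
};

// Placeholder only: std::mt19937_64 is NOT a cryptographically secure generator.
std::uint64_t makeNonce() {
    static std::mt19937_64 rng{std::random_device{}()};
    return rng();
}

// Party A. The SYN from start() is what A re-sends until a SYN-ACK arrives.
class Initiator {
public:
    explicit Initiator(std::string fullId) : fullId_(std::move(fullId)) {}

    Syn start(std::uint64_t desiredSessionId) {
        assert(!established_);
        syn_ = Syn{desiredSessionId, makeNonce()};
        return *syn_;
    }

    // Answers with ACK only if B returned our nonce; anything else is dropped.
    std::optional<Ack> onSynAck(const SynAck &msg) {
        if (!syn_ || msg.returnNonce != syn_->nonce) {
            return std::nullopt;
        }
        established_ = true;
        return Ack{fullId_, msg.newNonce};
    }

    bool established() const { return established_; }

private:
    std::string fullId_;
    std::optional<Syn> syn_;
    bool established_ = false;
};

// Party B. The SYN-ACK from onSyn() is what B re-sends until a matching ACK arrives.
class Adherent {
public:
    explicit Adherent(std::string fullId) : fullId_(std::move(fullId)) {}

    SynAck onSyn(const Syn &msg) {
        synAck_ = SynAck{fullId_, msg.desiredSessionId, msg.nonce, makeNonce()};
        return *synAck_;
    }

    // The session is established once A returns the nonce B just sent.
    void onAck(const Ack &msg) {
        if (synAck_ && msg.returnNonce == synAck_->newNonce) {
            established_ = true;
        }
    }

    bool established() const { return established_; }

private:
    std::string fullId_;
    std::optional<SynAck> synAck_;
    bool established_ = false;
};
```

In the real protocol, returning the nonce is what proves the peer could decrypt the previous message; here only the comparison itself is shown, which is why a message carrying a wrong nonce is simply dropped.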
Node
- Agent -> AgentControls -> AgentCourierSet
- Remote -> ClientWidget -> RemoteCourierSet
- CommsChannel - A mode of communication over UDP, utilizing Couriers to transfer different kinds of data.
- CommsSession - A session of communication over CommsChannel. Starts as rogue, then proceeds with a handshake to exchange full IDs and 64-bit nonce IDs, and to agree on security protocols and bandwidth limits.
- CommsSessionDirectory - The list of sessions in use by the CommsChannel.
- CommsSignature - [DEPRECATED, use full key->id() string instead] This used to be a special-purpose identification mapping between full ID and a shorthand 64-bit integer ID, used only by CommsChannel and friends. It has largely been replaced by session IDs.
- NodeAssociate - Address book entry for one node. Stored in NodeAssociateStore. Meant to be persistent between invocations.
- NodeAssociateStore - Place to keep NodeAssociates.
- ReliabilitySystem - Separate system to maintain reliability in the communications. Enabled when needed.
- FlowControl - Separate system to maintain flow control (apply throttling and idling to avoid constipation and stalls) in the communications. Enabled when needed.
- Courier - A class responsible for a certain part of communications over CommsChannel.
- CourierSet - A collection of couriers to be handled as one.
- AgentCourierSet - Specialization of CourierSet as convenience for Agent.
- RemoteCourierSet - Specialization of CourierSet as convenience for Remote.
Intrinsic parts of comms include the following:
- Session management
  - Session initiation/handshake
  - Exchange of transmission control data
  - Session tear-down
- Bandwidth management
  - Detection of available bandwidth
  - Throttling to avoid excessive bandwidth use
  - Continuous monitoring and adjustment of protocol parameters to optimize the flow of packets (including the priority and timing of couriers)
- Encryption management
  - Signing and sign verification of un-encrypted protocol text based on client RSA key pairs
  - Generation and exchange of encryption keys based on client RSA key pairs
  - Generation of security primitives such as nonces
  - Encryption and decryption of protocol text
- Reliability management
  - Maintaining a continued UDP connection over unreliable network components such as consumer-grade routers and wireless radios with poor coverage by dispatching necessary communication with STUN services or sending antler packets.
  - Detection and removal of duplicate and corrupt packets
  - Detection and re-sending of missing packets
  - Reordering of ordered packet sequences
Please note that the expensive and complex intrinsic features of CommsChannel, such as reliability and encryption, are invoked only when needed.
When the amount of data needed for intrinsic features is large, separate "intrinsic packets" will be sent, while smaller intrinsic data such as counters will instead accompany each packet. The protocol dictates when such dedicated packets are needed, and changes in this part of the protocol should not affect the higher-level courier interface.
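The decision between piggybacking intrinsic data and sending dedicated intrinsic packets can be sketched as a simple planning step. The 512-byte packet budget and the 32-byte piggyback threshold are assumptions for illustration; the real protocol may decide on other grounds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical split: small intrinsic data (counters, acks) rides along in
// the regular packet; bulky intrinsic data gets dedicated intrinsic packets.
constexpr std::size_t kPacketBudget = 512;
constexpr std::size_t kPiggybackLimit = 32; // assumed threshold

struct PacketPlan {
    std::size_t piggybackBytes = 0;            // intrinsic bytes in the regular packet
    std::size_t courierBytes = 0;              // room left for couriers
    std::vector<std::size_t> intrinsicPackets; // sizes of dedicated intrinsic packets
};

PacketPlan planIntrinsic(std::size_t intrinsicBytes) {
    PacketPlan plan;
    if (intrinsicBytes <= kPiggybackLimit) {
        plan.piggybackBytes = intrinsicBytes;
    } else {
        for (std::size_t left = intrinsicBytes; left > 0;) {
            const std::size_t chunk = left < kPacketBudget ? left : kPacketBudget;
            plan.intrinsicPackets.push_back(chunk);
            left -= chunk;
        }
    }
    plan.courierBytes = kPacketBudget - plan.piggybackBytes;
    assert(plan.courierBytes > 0);
    return plan;
}
```

Either way, couriers only ever see a byte budget, so this part of the protocol can change without touching the courier interface.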