- 1 Introduction
- 2 Architecture Overview
- 3 Components
- 4 Performance Testing
- 5 Work in Progress
OpenNMS Drift is an OpenNMS-sponsored project. Its goal is to support Streaming Telemetry and Forensics via Flows.
The Flow Collector listens on a UDP or TCP port for Flow Packages, parses the incoming data, enriches it with OpenNMS knowledge (e.g. location, exporter address, categories, tags), and then persists it to the Flow Persistence Storage.
- Listen for NetFlow v5, NetFlow v9, IPFIX and sFlow packages and parse them accordingly
- Enrich the protocol data with useful OpenNMS knowledge, e.g.
  - The location the NetFlow package is coming from
  - The address of the exporter
  - Node ID
- Save the data in the Flow Persistence Storage
- Initially the collector should be able to collect and persist ~3000 flow packages or ~90000 flows per second. To achieve this, the Flow Collector should be deployable and scalable independently of OpenNMS.
The detailed architecture of the Flow Collector is as follows:
The Flow Exporter exports Flow Packages via UDP or TCP in various formats (e.g. NetFlow v5, IPFIX). This component does not need to be implemented.
The Flow Parser is responsible for listening for and parsing incoming UDP/TCP Flow Packages, and for making the parsed data available to the other components of the Flow Collector.
The Flow Parser should be able to parse the following protocols:
- NetFlow v5 (NetFlow v5 Format), NetFlow v9 (RFC 3954)
- IPFIX (aka NetFlow v10, RFC 3917, RFC 5102)
- sFlow (RFC 3176)
- J-Flow (Juniper's flow export, compatible with NetFlow)
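As an illustration of what parsing involves for the simplest of these protocols, the following is a minimal sketch of a NetFlow v5 decoder. NetFlow v5 uses a fixed layout: a 24-byte header followed by 48-byte flow records. The function name and the dictionary keys are assumptions made for this example and are not taken from the Drift code base.

```python
import socket
import struct

# NetFlow v5 layout: 24-byte header followed by `count` 48-byte records.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")


def parse_netflow_v5(payload):
    """Parse one NetFlow v5 datagram into a list of flow dictionaries."""
    if len(payload) < HEADER.size:
        raise ValueError("datagram shorter than a NetFlow v5 header")
    version, count, *_ = HEADER.unpack_from(payload, 0)
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version={version})")
    if len(payload) < HEADER.size + count * RECORD.size:
        raise ValueError("datagram is truncated")

    flows = []
    for i in range(count):
        (src, dst, _next_hop, _in_if, _out_if, packets, octets, _first, _last,
         src_port, dst_port, _tcp_flags, protocol, _tos, _src_as, _dst_as,
         _src_mask, _dst_mask) = RECORD.unpack_from(
            payload, HEADER.size + i * RECORD.size)
        flows.append({
            # Key names are illustrative, not the Drift document schema.
            "src_addr": socket.inet_ntoa(src),
            "dst_addr": socket.inet_ntoa(dst),
            "src_port": src_port,
            "dst_port": dst_port,
            "protocol": protocol,
            "packets": packets,
            "bytes": octets,
        })
    return flows


# Build a synthetic datagram with one record to exercise the parser.
sample = HEADER.pack(5, 1, 0, 0, 0, 0, 0, 0, 0) + RECORD.pack(
    socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"),
    socket.inet_aton("0.0.0.0"), 1, 2, 10, 1500, 0, 0,
    12345, 443, 0x18, 6, 0, 0, 0, 24, 24)
flows = parse_netflow_v5(sample)
```

NetFlow v9 and IPFIX cannot be decoded this way, because their record layout is described by templates sent by the exporter.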
Some Flow Packages define Flow Templates, which need to be cached. In some cases it may also be necessary to cache Flow Packages before parsing them, if the corresponding Flow Template has not yet been received. See the specifications listed above for details.
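The caching behaviour described above could look roughly like the sketch below. It assumes templates are keyed by exporter address, source (observation domain) ID and template ID, which is how NetFlow v9 and IPFIX scope templates. The class, the `decode_fixed` helper and the template representation (a list of field name and length pairs) are hypothetical simplifications.

```python
from collections import defaultdict


def decode_fixed(template, raw):
    """Decode `raw` using a template given as [(field_name, length_in_bytes)]."""
    record, offset = {}, 0
    for name, length in template:
        record[name] = int.from_bytes(raw[offset:offset + length], "big")
        offset += length
    return record


class TemplateCache:
    """Caches templates and buffers data that arrives before its template."""

    def __init__(self, decode=decode_fixed):
        self._decode = decode
        self._templates = {}
        self._pending = defaultdict(list)

    def on_template(self, key, template):
        """Store a template and decode any data that was waiting for it."""
        self._templates[key] = template
        waiting = self._pending.pop(key, [])
        return [self._decode(template, raw) for raw in waiting]

    def on_data(self, key, raw):
        """Decode data if its template is known, otherwise buffer it."""
        template = self._templates.get(key)
        if template is None:
            self._pending[key].append(raw)
            return []
        return [self._decode(template, raw)]


cache = TemplateCache()
key = ("192.0.2.1", 0, 256)  # (exporter address, source ID, template ID)
early = cache.on_data(key, bytes([0, 80, 0, 0, 5, 220]))  # template unknown
late = cache.on_template(key, [("dst_port", 2), ("bytes", 4)])
```

A real implementation would also need to bound the buffer and expire stale templates; both are left out here to keep the sketch short.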
The Flow Parser should also add the IP address of the Flow Exporter, as this may not be part of the Flow Package itself.
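For UDP transport, the exporter address is available from the socket layer rather than from the payload. A minimal sketch, assuming an IPv4 UDP socket and a hypothetical `receive_package` helper:

```python
import socket


def receive_package(sock):
    """Read one datagram and tag it with the address it was sent from."""
    # For an AF_INET socket, recvfrom() returns (payload, (ip, port)).
    payload, (exporter_ip, _exporter_port) = sock.recvfrom(65535)
    return {"exporter_address": exporter_ip, "payload": payload}
```

Note that the observed source address may differ from the exporter's own address if the traffic passes through NAT or a relay.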
It may be suitable to run the Flow Parser on a Minion.
The Flow Enricher is responsible for adding relevant metadata to the received Flow Package. To achieve this, the Flow Enricher may need access to a running OpenNMS.
Enriched data may be:
- The location the Flow Package was exported from
- The node the Flows may be associated with
- A service a Flow may be associated with (e.g. to allow port to service mappings)
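The enrichment step can be pictured as a set of lookups applied to each flow. The sketch below uses two in-memory tables as stand-ins for data that would come from OpenNMS; the tables, the `NodeInfo` type and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    node_id: int
    location: str


# Hypothetical lookup tables standing in for data provided by OpenNMS.
NODES_BY_ADDRESS = {"192.0.2.1": NodeInfo(node_id=42, location="Default")}
SERVICES_BY_PORT = {(6, 443): "https", (17, 53): "dns"}  # (IP protocol, port)


def enrich(flow, exporter_address):
    """Return a copy of `flow` with exporter, node, location and service added."""
    node = NODES_BY_ADDRESS.get(exporter_address)
    enriched = dict(flow)
    enriched["exporter_address"] = exporter_address
    enriched["node_id"] = node.node_id if node else None
    enriched["location"] = node.location if node else None
    enriched["service"] = SERVICES_BY_PORT.get(
        (flow["protocol"], flow["dst_port"]))
    return enriched


flow = {"src_addr": "10.0.0.1", "dst_addr": "10.0.0.2",
        "protocol": 6, "dst_port": 443, "bytes": 1500}
known = enrich(flow, "192.0.2.1")
unknown = enrich(flow, "198.51.100.7")
```

Flows from an unknown exporter are kept and simply carry empty node and location fields, so no data is dropped when a lookup fails.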
The Flow Writer is responsible for converting the enriched Flow Package into Flow Documents. Each Flow Document represents one concrete Flow, so one Flow Package may result in multiple Flow Documents. A Flow Document may be enriched with further information, such as the protocol (e.g. NetFlow, sFlow) and the vendor (Cisco, Juniper, etc.). It is then persisted to the Flow Persistence Storage.
With IPFIX and NetFlow v9, there are many fields that could be persisted. It may be useful to have some kind of configuration to determine which fields are persisted and which are ignored.
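A minimal sketch of the conversion and field filtering described above, assuming a Flow Package is represented as a dictionary with package-level metadata and a list of flows. The `PERSISTED_FIELDS` set plays the role of the proposed configuration; its contents and all field names are illustrative.

```python
# Hypothetical configuration: flow fields that should be persisted.
PERSISTED_FIELDS = frozenset({
    "src_addr", "dst_addr", "src_port", "dst_port",
    "protocol", "bytes", "packets",
})


def to_documents(package, persisted_fields=PERSISTED_FIELDS):
    """Turn one enriched Flow Package into one Flow Document per flow."""
    # Package-level metadata is copied onto every document.
    common = {
        "exporter_address": package["exporter_address"],
        "flow_protocol": package["flow_protocol"],
    }
    documents = []
    for flow in package["flows"]:
        doc = {k: v for k, v in flow.items() if k in persisted_fields}
        doc.update(common)
        documents.append(doc)
    return documents


package = {
    "exporter_address": "192.0.2.1",
    "flow_protocol": "NetFlow v9",
    "flows": [
        {"src_addr": "10.0.0.1", "dst_addr": "10.0.0.2", "bytes": 1500,
         "mpls_label_1": 17},
        {"src_addr": "10.0.0.3", "dst_addr": "10.0.0.2", "bytes": 64},
    ],
}
documents = to_documents(package)
```

Here the `mpls_label_1` field is dropped because it is not listed in the configuration, while both flows still become separate documents.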
Flow Persistence Storage
The Flow Persistence Storage is responsible for storing Flow Documents received from the Flow Writer.
It is proposed to use Elasticsearch as the Flow Persistence Storage.
- Be able to persist ~3000 flow packages or ~90000 flows per second (with room to grow, should be scalable)
- Allow running complex queries to analyze already persisted data
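At these rates, documents would normally be sent to Elasticsearch in batches through its bulk API, which takes newline-delimited JSON: one action line followed by one source line per document. The sketch below only builds such a request body. The hourly index naming is an assumption inferred from the `flow-2017-11-21-14` index name that appears in the performance test output, and the `timestamp` field is hypothetical. Depending on the Elasticsearch version, the action line may need additional metadata.

```python
import json
from datetime import datetime


def index_name(iso_timestamp):
    """Map an ISO 8601 timestamp to an hourly index name (assumed scheme)."""
    return datetime.fromisoformat(iso_timestamp).strftime("flow-%Y-%m-%d-%H")


def bulk_body(documents):
    """Build a newline-delimited JSON body: action line, then source line."""
    lines = []
    for doc in documents:
        action = {"index": {"_index": index_name(doc["timestamp"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


docs = [
    {"timestamp": "2017-11-21T14:30:00+00:00", "bytes": 1500},
    {"timestamp": "2017-11-21T15:01:00+00:00", "bytes": 64},
]
body = bulk_body(docs)
```

Time-based indices like this make retention simple, since expiring old data amounts to deleting whole indices.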
The Flow API is a REST API provided by OpenNMS. It allows querying the data from the Flow Persistence Storage and may prepare the data for easier usage, e.g. in the Flow UI.
- May work as a proxy to the Flow Persistence Storage
- May allow querying Flow Documents and preparing them for easier usage in the Flow UI (e.g. Sankey diagram)
- Provide additional functionality for Forensic Analysis, etc.
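To illustrate what "preparing" data for a Sankey diagram might mean, the sketch below aggregates Flow Documents into weighted source-to-target links. It works on in-memory dictionaries. In practice such an aggregation would more likely be pushed down to the Flow Persistence Storage. The function and field names are hypothetical.

```python
from collections import Counter


def sankey_links(documents, top_n=10):
    """Sum bytes per (source, destination) pair and return the largest links."""
    totals = Counter()
    for doc in documents:
        totals[(doc["src_addr"], doc["dst_addr"])] += doc["bytes"]
    return [
        {"source": src, "target": dst, "value": value}
        for (src, dst), value in totals.most_common(top_n)
    ]


documents = [
    {"src_addr": "10.0.0.1", "dst_addr": "10.0.0.9", "bytes": 1000},
    {"src_addr": "10.0.0.2", "dst_addr": "10.0.0.9", "bytes": 300},
    {"src_addr": "10.0.0.1", "dst_addr": "10.0.0.9", "bytes": 500},
]
links = sankey_links(documents)
```

Each resulting link carries the total number of bytes exchanged between the pair, which maps directly onto the width of a band in the diagram.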
The Flow UI makes use of the Flow API to show the persisted Flow Documents in an aggregated fashion to the user.
The Flow UI is to be interpreted as an abstract term. There probably will be no concrete "Flow UI" but multiple UIs making use of the Flow API.
- Present data from the Flow API in a Sankey Diagram
3rd Party Tools
3rd Party Tools such as Kibana or Grafana may be used to interact with the Flow Persistence Storage or the Flow API to further work with the collected data.
We've built a full-stack solution on top of Kubernetes that can be used to deploy a test environment: https://github.com/j-white/drift-e2e
Running on top of GCP, an Elasticsearch cluster with the following specifications was able to handle indexing of 100k flows/second:
- 3 * Master Nodes
  - 2GB Heap
  - 1 vCPU
- 4 * Client Nodes
  - 8GB Heap
  - 2 vCPU
- 9 * Data Nodes
  - 16GB Heap
  - 4 vCPU
  - 80GB SSD
    - Where IOPS = Min(IOPS Per GB * Number of GBs, 30000)
      = Min(30 * 80, 30000)
      = Min(2400, 30000)
      = 2400 IOPS
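The IOPS rule above can be written as a small function, which makes it easy to check other disk sizes. The default values mirror the figures quoted in the list (30 IOPS per GB, capped at 30000); the function name is an assumption.

```python
def ssd_iops(size_gb, iops_per_gb=30, iops_cap=30000):
    """IOPS available to a disk: proportional to its size, up to a cap."""
    return min(iops_per_gb * size_gb, iops_cap)


data_node_iops = ssd_iops(80)       # the 80GB SSD used per data node
cluster_iops = 9 * data_node_iops   # nine data nodes in the test cluster
```

Under this rule a disk only reaches the cap at 1000GB, so the 80GB disks used here are limited by their size rather than by the cap.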
    [jesse@noise ~]$ kubectl -n $(gizmo-ns) get pods
    NAME                         READY   STATUS    RESTARTS   AGE
    es-client-3042550706-2fdfx   1/1     Running   0          42m
    es-client-3042550706-k5p74   1/1     Running   0          42m
    es-client-3042550706-v0wmd   1/1     Running   0          42m
    es-client-3042550706-vlxm7   1/1     Running   0          42m
    es-data-0                    1/1     Running   0          42m
    es-data-1                    1/1     Running   0          41m
    es-data-2                    1/1     Running   0          41m
    es-data-3                    1/1     Running   0          40m
    es-data-4                    1/1     Running   0          40m
    es-data-5                    1/1     Running   0          40m
    es-data-6                    1/1     Running   0          39m
    es-data-7                    1/1     Running   0          39m
    es-data-8                    1/1     Running   0          38m
    es-master-3104414070-02t0n   1/1     Running   0          42m
    es-master-3104414070-0xpv1   1/1     Running   0          42m
    es-master-3104414070-p63c0   1/1     Running   0          42m
    curl http://elasticsearch:9200/_cat/shards
    flow-2017-11-21-14 13 p STARTED 6879780 525.5mb 10.8.0.7 es-data-7
    flow-2017-11-21-14 13 r STARTED 6879780 525.2mb 10.8.2.7 es-data-2
    flow-2017-11-21-14 15 r STARTED 6881308 521.5mb 10.8.3.8 es-data-3
    flow-2017-11-21-14 15 p STARTED 6881308 526.3mb 10.8.1.8 es-data-6
    flow-2017-11-21-14 4  r STARTED 6880905 522.8mb 10.8.3.8 es-data-3
    flow-2017-11-21-14 4  p STARTED 6880905 524.9mb 10.8.0.7 es-data-7
    flow-2017-11-21-14 10 p STARTED 6880981 530.3mb 10.8.5.9 es-data-8
    flow-2017-11-21-14 10 r STARTED 6880981 526.7mb 10.8.1.8 es-data-6
    flow-2017-11-21-14 12 p STARTED 6881542 524.6mb 10.8.0.6 es-data-1
    flow-2017-11-21-14 12 r STARTED 6881542 523.3mb 10.8.0.7 es-data-7
    flow-2017-11-21-14 5  r STARTED 6879653 528.1mb 10.8.5.9 es-data-8
    flow-2017-11-21-14 5  p STARTED 6879653 527.3mb 10.8.2.7 es-data-2
    flow-2017-11-21-14 1  p STARTED 6880033 527.3mb 10.8.5.9 es-data-8
    flow-2017-11-21-14 1  r STARTED 6880033 526.8mb 10.8.5.8 es-data-4
    flow-2017-11-21-14 7  r STARTED 6878374 524.2mb 10.8.0.6 es-data-1
    flow-2017-11-21-14 7  p STARTED 6878374 529.1mb 10.8.1.7 es-data-0
    flow-2017-11-21-14 6  r STARTED 6883309 521.8mb 10.8.4.8 es-data-5
    flow-2017-11-21-14 6  p STARTED 6883309 526.7mb 10.8.1.8 es-data-6
    flow-2017-11-21-14 2  p STARTED 6880868 527mb   10.8.4.8 es-data-5
    flow-2017-11-21-14 2  r STARTED 6880868 528.5mb 10.8.1.7 es-data-0
    flow-2017-11-21-14 3  p STARTED 6882940 526.6mb 10.8.0.6 es-data-1
    flow-2017-11-21-14 3  r STARTED 6882940 521.4mb 10.8.5.8 es-data-4
    flow-2017-11-21-14 8  r STARTED 6880665 526mb   10.8.0.7 es-data-7
    flow-2017-11-21-14 8  p STARTED 6880665 529mb   10.8.5.8 es-data-4
    flow-2017-11-21-14 11 p STARTED 6883804 526.9mb 10.8.4.8 es-data-5
    flow-2017-11-21-14 11 r STARTED 6883804 526.7mb 10.8.0.6 es-data-1
    flow-2017-11-21-14 9  p STARTED 6885797 528.5mb 10.8.3.8 es-data-3
    flow-2017-11-21-14 9  r STARTED 6885797 529.2mb 10.8.2.7 es-data-2
    flow-2017-11-21-14 14 p STARTED 6883259 526.6mb 10.8.2.7 es-data-2
    flow-2017-11-21-14 14 r STARTED 6883259 528.8mb 10.8.1.8 es-data-6
    flow-2017-11-21-14 0  p STARTED 6875882 527.2mb 10.8.3.8 es-data-3
    flow-2017-11-21-14 0  r STARTED 6875882 528.9mb 10.8.1.7 es-data-0
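Using these numbers to estimate the storage requirements, we end up with roughly 80 bytes per flow (for example, 525.5mb / 6879780 documents ≈ 80 bytes).
With a rate of 100k flows per second and 1 replica, this requires 100000 * 80 * 2 = 16,000,000 bytes/s, or about 15.26 MiB/s.
For a 30 day retention period, this is roughly equivalent to 40TB (16,000,000 bytes/s * 86,400 s/day * 30 days ≈ 41.5 TB).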
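The estimate can be reproduced with a few lines of arithmetic, which also makes it easy to plug in a different flow rate or retention period. The sketch assumes that the `mb` unit in the shard listing denotes 1024 * 1024 bytes; the function names are illustrative.

```python
MIB = 1024 ** 2  # assumption: "mb" in the shard listing means 1024 * 1024 bytes


def bytes_per_flow(store_mib, docs):
    """Average on-disk size of one flow document in a shard."""
    return store_mib * MIB / docs


def ingest_bytes_per_second(flows_per_second, flow_bytes, replicas):
    """Write rate across the cluster: one primary copy plus each replica."""
    return flows_per_second * flow_bytes * (1 + replicas)


def retention_bytes(bytes_per_second, days):
    """Total storage needed to keep `days` of data at a constant write rate."""
    return bytes_per_second * 86_400 * days


per_flow = bytes_per_flow(525.5, 6_879_780)  # primary of shard 13 above
rate = ingest_bytes_per_second(100_000, 80, replicas=1)
total = retention_bytes(rate, days=30)

print(f"{per_flow:.1f} bytes/flow")
print(f"{rate / MIB:.2f} MiB/s")
print(f"{total / 1e12:.1f} TB over 30 days")
```

The 30 day figure comes out at about 41.5 TB in decimal units, or about 37.7 TiB, both consistent with the rough 40TB quoted above.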
Work in Progress
- Prototype: https://github.com/opennms/drift-prototype
- Working branch: https://github.com/OpenNMS/opennms/tree/features/drift
- Latest Docker snapshot build is tagged as drift: https://hub.docker.com/r/opennms/horizon-core-web/tags