Scalable SNMP Polling
One of the major challenges when building a modern network monitoring
system is the need to constantly collect vast amounts of SNMP data for
statistical and alerting purposes. A scalable SNMP poller is one of the
key requirements for a monitoring system to be successful.
SNMP is a fairly simple protocol: the monitoring application sends a
request to a remote device for a specific piece of data (e.g. interface
transmit octets) and waits for a response. Network latency and CPU load
on the remote device mean the response may take some time to arrive, so
getting through the poller workload in a timely fashion can be a massive
challenge in large network infrastructures.
Monolithic Poller Architecture
A monolithic poller typically sends its requests synchronously.
That is, it sends a request to device A, waits for a response, then
sends a request to device B. The major flaw with this architecture
is that network latency significantly restricts the number
of requests the poller can perform each minute.
For example, a network latency of 100 milliseconds means the poller
can perform only 600 requests per minute. At 30 MIB objects per
request, that is only 18,000 objects, or data for only 1,800
interfaces (assuming a minimum of 10 MIB objects per interface).
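The arithmetic in this example can be reproduced with a short Python sketch. The function name and its parameters are illustrative only and are not part of any poller.

```python
def synchronous_capacity(latency_ms, objects_per_request=30,
                         objects_per_interface=10):
    """Return (requests, objects, interfaces) per minute for one
    synchronous poller that waits for each response before sending
    the next request."""
    # Each request occupies the poller for one full round trip, so the
    # latency alone caps the request rate.
    requests_per_minute = 60_000 // latency_ms
    objects_per_minute = requests_per_minute * objects_per_request
    interfaces = objects_per_minute // objects_per_interface
    return requests_per_minute, objects_per_minute, interfaces


print(synchronous_capacity(100))  # (600, 18000, 1800)
```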
Multi-Process Poller Architecture
A multi-process poller architecture is a crude method of increasing
scalability. This is typically done by dividing the workload into
N lumps, where N is the number of poller processes fired off. 10
poller processes would typically increase the scale of the above
example to 180,000 objects, or 18,000 interfaces.
- Many more processes to manage
- Complex poller configuration
- Increase in system memory usage
- Each poller process is still affected by network latency
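As a minimal sketch of dividing the workload into N lumps, the following Python splits a device list round-robin across N poller processes. The function name and the device names are hypothetical. Real products may partition by subnet, site or device type instead.

```python
def split_workload(devices, n_pollers):
    """Divide a device list into n_pollers roughly equal lumps."""
    # Slice i takes every n_pollers-th device starting at index i.
    return [devices[i::n_pollers] for i in range(n_pollers)]


devices = [f"device-{i}" for i in range(25)]
lumps = split_workload(devices, 10)
print([len(lump) for lump in lumps])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Each lump would then be handed to its own poller process, which is where the extra configuration and process management comes from.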
Multi-Threaded Poller Architecture
A multi-threaded poller is another crude method of increasing
scalability. This is done by firing off many threads, each thread
handling a single request / response transaction.
Threads consume less memory than processes because they share the
same virtual memory address space, file descriptors and process
state information. They are faster to start up and context switch.
Refer to the Wikipedia article on threads versus processes.
- Linear increase in scale
- Consumes less memory than a multi-process poller
- Complex poller configuration
- Added complexity of managing many threads
- Threads have to perform locking so they don't clobber each other
- Each poller thread is still affected by network latency
- Significantly more complicated to implement, test and debug
- One rogue thread can crash all other threads because they all share
the same memory and process state information.
- Has no scale advantage over a multi-process poller
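The threaded approach can be sketched in Python as follows. This is an illustration under stated assumptions, not any vendor's implementation: `poll` is a hypothetical stand-in that sleeps instead of sending a real SNMP request, and a thread pool is used rather than one new thread per transaction.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

results = {}
results_lock = threading.Lock()


def poll(device, latency=0.05):
    """Hypothetical stand-in for one blocking request/response transaction."""
    time.sleep(latency)          # simulated network round trip
    with results_lock:           # shared state is guarded by a lock
        results[device] = "ok"


def poll_all(devices, n_threads=20):
    """Poll every device using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(poll, devices))   # wait for every request to finish
    return results


print(len(poll_all([f"device-{i}" for i in range(40)])))  # 40
```

Each worker thread still spends nearly all of its time blocked on the round trip, so throughput only grows with the number of threads.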
Distributed Poller Architecture
A distributed poller architecture is another way of increasing
scale. Deploying multiple pollers closer to the end devices
significantly reduces network latency, so the pollers can perform
many more requests each minute.
- Faster to get through its polling workload due to lower network latency
- Significantly more expensive to deploy and maintain
- Requires deploying many remote physical or virtual servers
In Summary ...
None of the above architectures comes anywhere close to providing a
cost-effective, scalable solution.
- Single monolithic synchronous pollers do not scale.
- Multi-threaded pollers are grossly inefficient due to all the thread
management, locking and context switching overhead.
- Distributed pollers are plainly too expensive to deploy and maintain.
The AKIPS Poller Architecture
- Monolithic architecture - one process, no threads
- Asynchronous polling algorithm - polls thousands of devices at the same time.
- Synchronous device requests algorithm - N outstanding requests
to each device at any one time (defaults to 1). This significantly
reduces the chance of overrunning the CPU in the remote devices.
- In-flight window tuning on a per device basis. This feature completely
negates the issue of polling large remote devices with high network latency.
- Direct database insertion - achieved by tightly coupling the configuration,
time-series and event databases into a single process.
- MIB database integrated directly into the SNMP engine for high speed
packet encoding and decoding.
- 60 second polling interval for every MIB object
- Scales to over 10 million MIB objects per minute
- 40 second polling window each minute
- Integrated Ping Poller (RTT measured in microseconds)
- 15 second Ping interval for every device
- Interleaving of Ping and SNMP requests for good packet mix
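The combination of asynchronous polling across devices and a bounded number of outstanding requests per device can be sketched with Python's asyncio. This is only an illustration of the general technique and not the AKIPS implementation. `snmp_get` is a hypothetical stand-in that sleeps instead of sending a packet, and the `window` parameter models the per-device in-flight limit.

```python
import asyncio


async def snmp_get(device, oid, latency=0.05):
    """Hypothetical stand-in for a non-blocking SNMP request."""
    await asyncio.sleep(latency)    # simulated round trip; nothing blocks
    return device, oid, 0


async def poll_device(device, oids, window=1):
    """Poll one device with at most `window` requests in flight."""
    in_flight = asyncio.Semaphore(window)

    async def one(oid):
        async with in_flight:       # wait for a free slot in the window
            return await snmp_get(device, oid)

    return await asyncio.gather(*(one(oid) for oid in oids))


async def poll_all(devices, oids, window=1):
    """Poll every device concurrently from one process and one thread."""
    per_device = await asyncio.gather(
        *(poll_device(d, oids, window) for d in devices)
    )
    return [row for rows in per_device for row in rows]


devices = [f"device-{i}" for i in range(1000)]
rows = asyncio.run(poll_all(devices, ["ifInOctets", "ifOutOctets"]))
print(len(rows))  # 2000
```

With `window=1` each device sees one request at a time, while all 1000 devices are polled in parallel. Raising `window` for a high-latency device lets more of its requests overlap.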
Test Server Hardware:
- ASUS P8H67-M PRO motherboard
- Intel® Core™ i5-2500K CPU @ 3.30GHz
(4 cores, no hyperthreading)
- 16 gigabytes of 1333 MHz DDR3 RAM (STREAM benchmark of 16.4 gigabytes/sec)
- 3 x 1 terabyte Western Digital WD10EZRX
disks (set up in a single ZFS pool).
- Operating system - FreeBSD ® 10.2 RELEASE
- Total cost - less than $1000
Note: This is a four-year-old test server from our test lab. The
system uses a stock standard FreeBSD 10.2 installation. No special
modifications are performed except for a few sysctl values bumped up.
[Figure: Poller CPU usage (1 second samples)]
Network Topology Monitored:
- Total of 7540 devices
- 2998 devices using SNMPv2
- 4542 devices using SNMPv3 (SHA Authentication, AES 256 encryption)
- Total of 407,160 interfaces (13 objects per interface)
- 5,360,940 polled MIB objects per minute (average of 134,000 per second
over the 40 second polling window), which includes things like CPU, Memory,
Temperature, PSU and Fan States
- 30,160 pings per minute (about 503 per second)
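The workload figures above can be cross-checked with a few lines of Python. All inputs come from this document (device and interface counts, 13 objects per interface, the 40 second polling window and the 15 second ping interval). The variable names are illustrative.

```python
devices = 2998 + 4542               # SNMPv2 + SNMPv3 devices
interfaces = 407_160
objects_per_interface = 13
total_objects_per_minute = 5_360_940
polling_window_seconds = 40
ping_interval_seconds = 15

interface_objects = interfaces * objects_per_interface
other_objects = total_objects_per_minute - interface_objects
objects_per_second = total_objects_per_minute // polling_window_seconds
pings_per_minute = devices * (60 // ping_interval_seconds)

print(devices)                          # 7540
print(interface_objects)                # 5293080
print(other_objects)                    # 67860
print(objects_per_second)               # 134023
print(pings_per_minute)                 # 30160
print(round(pings_per_minute / 60, 1))  # 502.7
```

The roughly 68,000 objects that are not interface counters correspond to the CPU, memory, temperature, PSU and fan objects.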
The poller has a 40 second SNMP polling window, starting at second
5 and ending at second 45. This leaves the system mostly idle at
the start and end of each minute for other processes to do work
with little CPU and I/O contention.
As the Poller CPU graph shows, there is still enough headroom on
this system to at least double the number of polled MIB objects.
AES encryption to 4500 SNMPv3 devices has very little impact
because the encryption is offloaded to hardware on the Core i5. The
poller constantly tracks the Engine ID, BootTime and BootCount for
each SNMPv3 device, and therefore rarely needs to perform engine
discovery.
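One way to picture the engine tracking described above is a small per-device cache. The sketch below is an assumption about the general idea, not the AKIPS code. `EngineCache`, `EngineState` and the `discover` callable are hypothetical names, and the BootCount and BootTime values are mapped onto the standard snmpEngineBoots and snmpEngineTime fields.

```python
from dataclasses import dataclass


@dataclass
class EngineState:
    engine_id: bytes
    boots: int   # snmpEngineBoots (BootCount)
    time: int    # snmpEngineTime (BootTime), seconds since last reboot


class EngineCache:
    """Remember per-device engine state so discovery is rarely repeated."""

    def __init__(self, discover):
        self._discover = discover   # hypothetical callable: device -> EngineState
        self._state = {}
        self.discoveries = 0

    def get(self, device):
        # Only an unknown device costs a discovery round trip.
        if device not in self._state:
            self._state[device] = self._discover(device)
            self.discoveries += 1
        return self._state[device]

    def update(self, device, boots, time):
        """Record the values seen in the latest response from the device."""
        state = self.get(device)
        state.boots, state.time = boots, time


cache = EngineCache(lambda device: EngineState(b"\x80\x00", 1, 0))
for _ in range(60):
    cache.get("router-1")
print(cache.discoveries)  # 1
```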
[Figure: Poller Context Switches (1 second samples)]
Involuntary context switching
can significantly impact performance, which is one of the major drawbacks
of multi-threaded pollers. Since the AKIPS Ping/SNMP poller is a single
monolithic process, it does not experience high levels of involuntary context
switching when there are ample idle CPU cores available.
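On Unix-like systems, a process can read its own context switch counters. The following Python sketch uses the standard `resource` module and is meant only as a way to observe the effect described above, not as the source of the graph. The helper name is illustrative.

```python
import resource


def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


before = context_switches()
sum(i * i for i in range(1_000_000))    # some CPU-bound work
after = context_switches()
print("voluntary:", after[0] - before[0],
      "involuntary:", after[1] - before[1])
```

A rising involuntary count means the scheduler is preempting the process, which is more likely when runnable threads outnumber idle CPU cores.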
Some Observations ...
- A single AKIPS poller process easily scales to 10 million MIB objects per
minute using commodity grade hardware. There will always be an upper limit
but we have the luxury of firing off a second and third poller process.
- A well engineered asynchronous poller absolutely blows away a poller running
hundreds of processes or threads. It's like turning up to a drag race: one car
with a well tuned supercharged V8, the other with hundreds of weed
whacker motors, each with its own fuel tank, spark plug and starter cord.
- How well does AKIPS scale against other commercial or open source products?
That's very difficult to determine because every vendor uses their own
"gobbledygook" terminology, so you are not comparing apples with
apples. AKIPS collects 13 MIB objects for every interface every minute,
while most other vendors collect a smaller subset at five minute intervals.
- With network security such an important issue these days, we highly recommend
moving entirely to authenticated and encrypted SNMPv3. There is very little
measurable impact on data collection.
- AKIPS runs solely on FreeBSD®,
whilst most other vendors use one of the many Linux® distributions.
- The AKIPS SNMP implementation has been engineered from scratch, whilst
most other commercial and open source software is based on legacy
implementations such as Net-SNMP.