$USER@bit-of-a-byte ~ $ cat $(find /var/log -name '*.log' -print | awk 'NR>10&&NR<=15')

An Actual Update

So I recently purchased a new MacBook Pro, and in the time it took me to get the device, add in all my new stuff, migrate everything over, establish a new backup system, and contend with school work, this blog ended up becoming very neglected. It didn’t help matters much when I logged on from Safari and realized none of my fonts were working properly.

I believe I have now solved the font issue (WOFF files versus WOFF2 files), and I have finished setting up this system, including transitioning my Ruby workflow. So, in theory, we will be able to get back to the good stuff here shortly.

I have recently joined a small game development startup with some very talented people, and the way this particular industry works pushed me into using C++ for the game, so you can expect some future posts about my frustrations with developing a cross-platform game engine in C++.

The goal, for now, is to do the following:

  • Write a post on evaluating backup software, and why I chose the software I did.
  • Finish up the Icinga2 Tutorials that have driven quite a lot of traffic to this page.
  • Write some stuff on game design.

So we will see how that works out. As always, thanks for reading.

P.S. I have decided that I don’t really like having analytics on this page, so over the next few days I am going to root through the code and remove them. Thank you for your patience.

Blog Cleanup

I just spent the past thirty minutes going back and removing a number of the links that were in my posts up to this point. Nothing important, just links to Wikipedia or man pages, but after speaking with a few friends, I realized the links weren’t as useful as I thought they were. In addition, my build time on Travis-CI had hit about four minutes when all I have to build is about eight pages, which is absurd and not really scalable. All the important links have been left in, along with any interesting or amusing ones.

If you are looking for the links, you can go into the GitHub repository for this site and look for the commits prior to “Automatic Commit: cleaning up links”.

This also helps to maintain my sanity, so there’s that. Fifty links in one article was causing me to worry quite a lot about dead links.

Icinga2 Tutorial Part 4 - Expanding Checks to SNMP

EDIT (2018/12/09): These guides haven’t been updated since 2015. It is possible that there are dead links, or that the configuration syntax has changed dramatically. These posts are also some of the most popular on my blog. I plan to do a new guide eventually, but for right now please take the following entries with a grain of salt.

Introduction

Well, I have finally persuaded myself to continue writing these posts by completely deleting all the configuration I had already set up. It is worth noting that I have switched over to Debian Jessie, for no other reason than to cause myself more frustration and suffering. Anyways, let’s get started.

SNMP is considered an agent-based check, and is actually quite flexible. You can even go as far as to code in custom return options to check things you normally wouldn’t be able to check over SNMP, such as apt status.

It is worth noting that because I am on a very small LAN, I will not be fiddling around with SNMPv3. I will be going with straight SNMPv1, just with a modified community string. We will get started with my core router, Djehuti. It is outside the scope of this tutorial to discuss how to enable SNMP on your device, but if you use a Ubiquiti device, a guide for that might come soon.

Starting from this post forward, I will be embedding code here instead of referring to an external link, as embedding will encourage me to be a bit more complete in my explanations. So, with all of that said, let’s get started.

Initial Setup

To monitor SNMP we will be using the Manubulon SNMP Plugins. So we first need to install them.

zyradyl@captor:~$ sudo apt-get install nagios-snmp-plugins

Now we need to open up the main Icinga2 Configuration file and add in the proper include to allow us to use these plugins. You may notice while poking around this file that there are many things you either don’t need or would like to change. I do plan to come back to this file at a later time, but feel free to edit this file before that happens. Once you have made the proper changes, restart Icinga2 so the new settings take effect.

zyradyl@captor:~$ sudo vim /etc/icinga2/icinga2.conf

    include <manubulon>

zyradyl@captor:~$ sudo service icinga2 restart

With that, we can move on to creating configuration files!

Djehuti

We will be starting with my core router, which is running SNMPv1. The first thing we will want to do is to add some essential variables to our host directive so that we don’t have to redefine them with every service.

//
// Host Declaration Block
//
object Host "djehuti.zyradyl.org" {
    // Define the host IPv4 Address
    address             = "10.0.0.1"
    // Define a basic functionality test
    // Hostalive does a basic ICMP ECHO to the target
    // specified in the address directive.
    check_command       = "hostalive"
    // Define SNMP Variables
    vars.snmp_address   = "10.0.0.1"
    vars.snmp_community = "zyradyl"
    // These are not strictly needed. I add them
    // so I know at a glance what version of snmp
    // I am using.
    vars.snmp_v2        = "false"
    vars.snmp_v3        = "false"
}

The new additions are the vars.snmp_* variables located under the check_command line. With our host variables set up, we can now move on to defining a service. The first service defined in the Icinga2 Manubulon Documentation is the snmp-load check. Seems like a good starting place to me!

SNMP-Load

//
// Service Declaration Block
// Service:     SNMP-Load
// Description: Uses SNMP commands to check the load averages
//              on the device.
//
object Service "snmp-load" {
    host_name           = "djehuti.zyradyl.org"
    // Set the type of load check to use.
    vars.snmp_load_type = "netsl"
    // Set the Load Average warning threshold.
    vars.snmp_warn      = "5,3,2"
    // Set the Load Average critical threshold.
    vars.snmp_crit      = "6,5,3"
    check_command       = "snmp-load"
}

I feel I should take a minute to explain the warning and critical variables, because the Icinga2 documentation doesn’t do a very good job. When checking load averages on *nix systems, there are three parameters:

  • Average load over one minute
  • Average load over five minutes
  • Average load over fifteen minutes

The thresholds are given in that same order, so "5,3,2" means 5 for one minute, 3 for five minutes, and 2 for fifteen minutes. Since my router is a dual-core device, I have set it up so that if the system is at full load for fifteen minutes, I get a warning. If it has one process over full load for five minutes, I get a warning. If it has three processes over full load for one minute, I get a warning. The same logic applies to the critical thresholds, just with higher values. If you are trying to figure out what to set your levels at, I tend to use the following formulas:

  • Warning:
    • 1min: 2*(Number of Cores)+1
    • 5min: (Number of Cores)+1
    • 15min: (Number of Cores)
  • Critical:
    • 1min: 3*(Number of Cores)
    • 5min: 2*(Number of Cores)+1
    • 15min: (Number of Cores)+1
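As a quick sanity check on those formulas, here is a minimal Python sketch that turns a core count into the two threshold strings. The function name `load_thresholds` is my own invention for illustration and is not part of Icinga2 or the Manubulon plugins. It only reproduces the arithmetic from the list above.

```python
def load_thresholds(cores: int) -> dict:
    """Build snmp_warn / snmp_crit strings in 1min,5min,15min order.

    Hypothetical helper: it only applies the rule-of-thumb formulas
    from this post and does not talk to Icinga2 or SNMP.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Warning: 2n+1 for one minute, n+1 for five minutes, n for fifteen minutes.
    warn = (2 * cores + 1, cores + 1, cores)
    # Critical: 3n for one minute, 2n+1 for five minutes, n+1 for fifteen minutes.
    crit = (3 * cores, 2 * cores + 1, cores + 1)
    return {
        "snmp_warn": ",".join(str(value) for value in warn),
        "snmp_crit": ",".join(str(value) for value in crit),
    }


# A dual-core device gives the same values used in the service block above.
print(load_thresholds(2))
```

For two cores this produces "5,3,2" and "6,5,3", which matches the snmp-load service definition, so the formulas and the configuration agree.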

Once you have your file saved, restart Icinga2 and check the web interface. Your new check will likely have an Unknown status in purple. Just click on the check and run it manually by clicking “Check Now” in the rightmost panel.

With that, we can move on to the next check!

SNMP-Memory

//
// Service Declaration Block
// Service:     SNMP-Memory
// Description: Uses SNMP commands to check status of RAM
//              and swap on the device.
//
object Service "snmp-memory" {
    host_name      = "djehuti.zyradyl.org"
    // Set the memory thresholds for RAM and swap, respectively.
    // Uses percents.
    vars.snmp_warn = "50,0"
    vars.snmp_crit = "80,0"
    check_command  = "snmp-memory"
}

The warning and critical values are expressed as percentages of the total available. The first value applies to RAM and the second to swap. Restart Icinga2 and log on to the web interface to check that the new service works.

SNMP-Storage

//
// Service Declaration Block
// Service:     SNMP-Storage
// Description: Uses SNMP commands to check the status of disk
//              storage space.
//
object Service "snmp-storage" {
    host_name              = "djehuti.zyradyl.org"
    // Uses percents.
    vars.snmp_warn         = "50"
    vars.snmp_crit         = "80"
    // Specify which partition to monitor
    vars.snmp_storage_name = "/root.dev"
    check_command          = "snmp-storage"
}

The snmp_storage_name variable is used to specify which device you want to check the status of. If you aren’t sure which device you need to check, set it to blank, then let it run. It will return a list of partitions that you can check. Simply enter the name into that variable and you are good to go.

Just like snmp-memory, snmp-storage uses percent values in the warning and critical threshold variables.

SNMP-Interfaces

I personally like to specify a separate service block for each interface that I am monitoring. I have not tested whether several interfaces can be combined into one check, but I don’t see any reason it wouldn’t be possible. I’m going to list the interface configurations below, and explain any variables that need it after the code.

//
// Service Declaration Block
// Service:     SNMP-Interface
// Description: Uses SNMP commands to check the status of
//              various network interfaces on device.
//
object Service "snmp-int-lan" {
    host_name                      = "djehuti.zyradyl.org"
    // Define interface variables.
    vars.snmp_interface            = "eth0"
    vars.snmp_interface_label      = "LAN"
    vars.snmp_interface_perf       = "true"
    vars.snmp_interface_bits_bytes = "true"
    vars.snmp_interface_megabytes  = "true"
    vars.snmp_interface_noregexp   = "true"
    vars.snmp_warncrit_percent     = "true"
    // Set warning and crits to 100 to disable.
    vars.snmp_warn                 = "100,100"
    vars.snmp_crit                 = "100,100"
    check_command                  = "snmp-interface"
}

//
// Service Declaration Block
// Service:     SNMP-Interface
// Description: Uses SNMP commands to check the status of
//              various network interfaces on device.
//
object Service "snmp-int-wan" {
    host_name                      = "djehuti.zyradyl.org"
    // Define interface variables.
    vars.snmp_interface            = "eth1"
    vars.snmp_interface_label      = "WAN"
    vars.snmp_interface_perf       = "true"
    vars.snmp_interface_bits_bytes = "true"
    vars.snmp_interface_megabytes  = "true"
    vars.snmp_interface_noregexp   = "true"
    vars.snmp_warncrit_percent     = "true"
    // Set warning and crits to 100 to disable.
    vars.snmp_warn                 = "100,100"
    vars.snmp_crit                 = "100,100"
    check_command                  = "snmp-interface"
}

//
// Service Declaration Block
// Service:     SNMP-Interface
// Description: Uses SNMP commands to check the status of
//              various network interfaces on device.
//
object Service "snmp-int-dmz" {
    host_name                      = "djehuti.zyradyl.org"
    // Define interface variables.
    vars.snmp_interface            = "eth2"
    vars.snmp_interface_label      = "DMZ"
    vars.snmp_interface_perf       = "true"
    vars.snmp_interface_bits_bytes = "true"
    vars.snmp_interface_megabytes  = "true"
    vars.snmp_interface_noregexp   = "true"
    vars.snmp_warncrit_percent     = "true"
    // Set warning and crits to 100 to disable.
    vars.snmp_warn                 = "100,100"
    vars.snmp_crit                 = "100,100"
    check_command                  = "snmp-interface"
}

So a few things in here need some explanation. The variable vars.snmp_interface specifies which interface we will be checking. vars.snmp_interface_noregexp is related to this in that it tells Icinga2 not to use regex matching on the interface name. vars.snmp_interface_label configures a label that will be shown in the console. vars.snmp_interface_megabytes and vars.snmp_interface_bits_bytes together tell Icinga2 that we want to see bandwidth measured in megabits. These variables can be adjusted accordingly. Finally, vars.snmp_interface_perf tells Icinga2 that we want to monitor bandwidth usage.

As for warning and critical values, while I like to monitor my bandwidth, I don’t actually care how high it goes, at least not at the moment. More relevant than that is the fact that my bandwidth is much less than a gigabit, but let’s move on from that. vars.snmp_warncrit_percent says that we are going to specify our warning and critical thresholds as a percentage of the total available bandwidth on that port. I then set vars.snmp_warn and vars.snmp_crit to 100 so that alerting is effectively disabled.

After activating these services, you should restart Icinga2. It is worth noting that you will first get a pending status, and then an unknown status for about five minutes, depending on your check interval. The check compares the newest reading to a previous reading that is sufficiently old, usually about five minutes, to calculate what has changed. Until a previous reading of the proper age exists, you will get a big Unknown status. Nothing to worry about, just check back in half an hour.
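To make the "compare against an older reading" idea concrete, here is a rough Python sketch of that kind of calculation. It is not the plugin's actual code. The names `pick_baseline` and `megabits_per_second` are made up, the 300 second minimum age is just the "about five minutes" from above, and the 32-bit wrap handling assumes a standard 32-bit octet counter.

```python
from typing import List, Optional, Tuple

# One stored sample: (unix timestamp in seconds, interface octet counter).
Reading = Tuple[float, int]


def pick_baseline(history: List[Reading], now: float,
                  min_age: float = 300.0) -> Optional[Reading]:
    """Return the newest stored reading that is at least min_age seconds old.

    None means nothing is old enough yet, which corresponds to the
    Unknown status you see right after enabling the check.
    """
    old_enough = [reading for reading in history if now - reading[0] >= min_age]
    if not old_enough:
        return None
    return max(old_enough, key=lambda reading: reading[0])


def megabits_per_second(old: Reading, new: Reading,
                        counter_bits: int = 32) -> float:
    """Average throughput between two readings, in megabits per second."""
    elapsed = new[0] - old[0]
    if elapsed <= 0:
        raise ValueError("new reading must be later than the old one")
    # The modulo keeps the delta correct if the counter wrapped once.
    delta_octets = (new[1] - old[1]) % (2 ** counter_bits)
    return delta_octets * 8 / elapsed / 1_000_000


history = [(0.0, 1_000)]
print(pick_baseline(history, now=100.0))  # no reading is old enough yet
baseline = pick_baseline(history, now=300.0)
print(megabits_per_second(baseline, (300.0, 37_501_000)))
```

The first call finds no usable baseline, which is the situation right after the service is created. Once a sample is five minutes old, the rate can be worked out from the difference between the two counters.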

Conclusion

There is one more SNMP check available, and that is the process check. While I previously used this setup on my core router, it ended up causing some rather wonky effects, so I have elected not to use it. This check would be useful to monitor the status of a mission-critical process, such as a web server or even a database server. It works by searching the process list for the number of times a string appears, and then going from there. I may cover this in the future, but I won’t be at the moment.
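The counting idea itself is simple enough to sketch in a few lines of Python. This is only an illustration of "count how many times a string appears in the process list", not the plugin's real logic. The function names, the sample process names, and the rule that fewer than `minimum` matches is CRITICAL are all assumptions of mine.

```python
def count_matches(process_names, needle):
    """Count the process names that contain the search string."""
    return sum(1 for name in process_names if needle in name)


def process_state(count, minimum=1):
    """Assumed rule: fewer matches than the minimum means CRITICAL."""
    return "OK" if count >= minimum else "CRITICAL"


# Made-up process list, standing in for what SNMP would return.
processes = ["nginx: master process", "nginx: worker process", "sshd"]

print(count_matches(processes, "nginx"), process_state(count_matches(processes, "nginx")))
print(count_matches(processes, "mysqld"), process_state(count_matches(processes, "mysqld")))
```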

Thank you for reading, and I hope to have part five up with less of a lag time. I am also planning to do Icinga2 integration with Slack soon, so stay tuned for that!

HughesNet Gen4 and IPv6 PMTUD: A Tragedy

So before I went to bed last night I started experiencing some very odd issues with my connection. I could connect to Skype, but I couldn’t visit Twitter. I could talk to Google, but not GitHub. I was able to ping my HT1100 gateway, but my Icinga2 monitoring system reported a socket timeout of longer than 10 seconds on HTTP.

I spent about three hours on it before I finally went to bed. I even went as far as to spend five more dollars on a 500MB token, to see if maybe I was being penalized for using too much data in my throttled state. I have been using aria2 to manage large downloads that would otherwise suffer from a mysterious decryption failure when downloaded via HTTPS. It didn’t help.

This morning I bit the bullet and changed my LAN’s MTU from 1280 to 1500. You may be wondering why I refer to this as biting the bullet, since an MTU of 1500 is standard. Well, it turns out that PMTUD is broken on HughesNet. Something on the HughesNet side causes the packets to become too large. Now, IPv6 is supposed to handle this by sending ICMPv6 Type 2 “Packet Too Big” messages back to the sender. While debugging PMTUD about a week ago, I set up my firewall so that ALL ICMPv6 is allowed, rather than having to itemize the different types. Still no good. I was getting silent failures that I had to use test-ipv6.com to diagnose, and it still indicated packets were becoming too large for my connection. In a fit of irritation, I set my entire LAN to an MTU of 1280. Eureka, I had IPv6 functionality.
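For anyone wondering why 1280 in particular, it is the minimum MTU that every IPv6 link is required to support, so packets of that size should never need PMTUD in the first place. Here is a small Python sketch of the arithmetic. The header sizes are fixed by the protocols (40 bytes for the basic IPv6 header, 20 bytes for a TCP header without options), but the path MTU of 1400 is a number I made up for illustration, since I don’t know what the real limit on the HughesNet side is.

```python
IPV6_HEADER_BYTES = 40   # fixed IPv6 header, no extension headers
TCP_HEADER_BYTES = 20    # TCP header without options
IPV6_MINIMUM_MTU = 1280  # every IPv6 link must carry packets this large


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv6 packet of the given MTU."""
    return mtu - IPV6_HEADER_BYTES - TCP_HEADER_BYTES


def needs_pmtud(lan_mtu: int, path_mtu: int) -> bool:
    """True when full-size packets from the LAN exceed what the path carries."""
    return lan_mtu > path_mtu


# Assumption: 1400 stands in for the unknown limit somewhere on the path.
hypothetical_path_mtu = 1400

for lan_mtu in (1500, IPV6_MINIMUM_MTU):
    print(lan_mtu, tcp_mss(lan_mtu), needs_pmtud(lan_mtu, hypothetical_path_mtu))
```

With a LAN MTU of 1500, full-size packets only get through if the “Packet Too Big” messages make it back to the sender. With 1280 they fit regardless, which is why that setting hid the problem.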

Obviously, after last night, I can’t exactly use that solution anymore, for a reason I have yet to understand. So my MTU is set back to 1500, and without using HughesNet’s Squid proxy (what they call web acceleration), IPv6 fails. Oh, I didn’t mention that, did I? Yeah, their web acceleration lets IPv6 work. However, I run my own Squid proxy locally, which is even faster than theirs. It also saves me bandwidth. So I keep web acceleration off.

  • Web Acceleration On = PMTUD works and IPv6 works, BUT I have double caching. Not good.
  • Web Acceleration Off = IPv6 Fails. Not acceptable.

Just to see if I could get something reset in the modem, I even called tech support. The response of “It is a problem with your LAN” was predictable, but still frustrating given the issues described above. However, I am planning to reach out to their support today via social media, so I am pushing this article live, without links, just so I have something to refer to. I will come back and link the major terms a little later. Wish me luck.

Just as an example..

of what I mean when I say I spend my time meddling in software or hardware that I really don’t have any business meddling in, I’ve got this website split into two branches now: Master and Development. I do all my work in Development, then push to remote. Travis-CI then pulls the changes and builds them, runs HTML-Proofer over the output, and sends me an e-mail if something is broken. If nothing is broken, I rebase my Development branch on Master, and then merge it into Master to make sure my changes are preserved.

I guess it should read “I meddle in software that I don’t need in the slightest.”