Blog O' Matty


Kubernetes Conntrack Requirement

This article was posted by Matty on 2018-01-17 15:22:27 -0500 EST

This evening while building out a new cluster I came across another fun error, this time from kube-proxy:

Jan 17 20:15:53 kubworker5.prefetch.net kube-proxy.v1.9.0[23071]: E0117 20:15:53.807410   23071 proxier.go:1701] Failed to delete stale service IP 10.2.0.10 connections, error: error deleting connection tracking state for UDP service IP: 10.2.0.10, error: error looking for path of conntrack: exec: "conntrack": executable file not found in $PATH

The message was relatively straightforward. My kube-proxy daemon couldn’t find the conntrack executable it needed while removing a service. If you aren’t familiar with conntrack(8), the manual page has a solid description:

conntrack provides a full featured userspace interface to the netfilter connection tracking system that is intended to replace the old /proc/net/ip_conntrack interface. This tool can be used to search, list, inspect and maintain the connection tracking subsystem of the Linux kernel. Using conntrack, you can dump a list of all (or a filtered selection of) currently tracked connections, delete connections from the state table, and even add new ones.

What perplexed me about this was the use of exec() to interface with conntrack. I had been under the assumption that Kubernetes used the native APIs exposed to userland through the netfilter conntrack shared library. After 20 minutes of reading code I came across the ClearUDPConntrackForIP function in conntrack.go, which cleared that up:

err := ExecConntrackTool(execer, parameters...)

Installing the conntrack executable on my workers cleared up the issue and my service was removed. I’m learning the only way to truly learn Kubernetes is by reading code. And there’s a LOT of code. :)
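
A small preflight check can catch this before kube-proxy ever starts. The following is only a sketch: require_tool is a hypothetical helper name I made up, and the package that ships conntrack varies by distribution (it is commonly called conntrack-tools or conntrack), so treat the hint in the message as an assumption:

```shell
#!/bin/sh
# require_tool is a hypothetical helper, not part of Kubernetes.
# It returns 0 if the named executable can be found in $PATH and
# 1 (with a message on stderr) if it cannot.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing executable: $1" >&2
    return 1
}

# Example preflight check for a worker node.
if require_tool conntrack; then
    echo "conntrack found at $(command -v conntrack)"
else
    # The package name is an assumption and depends on the distribution.
    echo "install the package that ships conntrack (often conntrack-tools or conntrack)"
fi
```

Running something like this from a provisioning script makes the missing dependency show up at build time instead of in the journal.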

Using the output of a command to control ansible playbook flow

This article was posted by Matty on 2018-01-06 11:37:34 -0500 EST

I’ve been spending a good amount of my spare time trying to learn the ins and outs of Kubernetes and terraform. To really get the gist of how Kubernetes works under the covers I’ve been automating Kubernetes the hard way with terraform and ansible. There are a couple of dependencies in the Kubernetes world. One dependency is the control plane’s reliance on etcd. After configuring and starting my etcd cluster I wanted to check the cluster health before moving forward. You can retrieve the health status of an etcd node with the endpoint health option:

$ etcdctl endpoint health

http://127.0.0.1:2379 is healthy: successfully committed proposal: took = 651.381µs

Ansible provides a really cool feature to assist with these situations: the do-until loop. The do-until loop allows you to retry a command up to a fixed number of times (the retries parameter sets the limit), pausing for delay seconds between attempts, and to continue as soon as the until condition is met. In my case I had ansible check for ‘is healthy’ in the stdout:

---
- hosts: kubcontrollers
  tasks:
  - shell: etcdctl --endpoints=[http://127.0.0.1:2379] endpoint health
    register: result
    until: result.stdout.find("is healthy") != -1
    retries: 5
    delay: 10

I’ve read through a few playbooks that use this to accomplish rolling restarts and upgrades. Nifty feature!
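
To make the loop semantics concrete, here is roughly the same idea sketched in plain shell. This is not how ansible implements it; retry_until is a hypothetical helper, and its second argument is simply a maximum number of attempts, which may not line up exactly with how ansible counts retries:

```shell
#!/bin/sh
# retry_until is a hypothetical helper that mimics the shape of the
# playbook above: run a command, look for a string in its stdout, and
# try again after a delay until the string shows up or the attempts
# run out.
# usage: retry_until PATTERN MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    pattern=$1
    max_attempts=$2
    delay=$3
    shift 3
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Capture stdout only, similar to result.stdout in the playbook.
        output=$("$@" 2>/dev/null)
        case $output in
            *"$pattern"*) return 0 ;;
        esac
        attempt=$((attempt + 1))
        # Sleep only if another attempt remains.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

With that in place, something like retry_until "is healthy" 5 10 etcdctl endpoint health would block until the local etcd node reports healthy or the attempts are exhausted.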

Getting your kubernetes node names right

This article was posted by Matty on 2017-12-30 10:28:23 -0500 EST

This past weekend while bootstrapping a new Kubernetes cluster my kubelets started logging the following error to the systemd journal:

Dec 30 10:26:10 kubworker1.prefetch.net kubelet[1202]: E1230 10:26:10.862904    1202 kubelet_node_status.go:106] Unable to register node "kubworker1.prefetch.net" with API server: nodes "kubworker1.prefetch.net" is forbidden: node "kubworker1" cannot modify node "kubworker1.prefetch.net"

Secure kubernetes configurations use client certificates along with the nodename to register with the control plane. My kubeconfig configuration file contained a short name:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: STUFF
    server: https://apivip:443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: system:node:kubworker1
  name: default
current-context: default
kind: Config
preferences: {}
users:
- name: system:node:kubworker1
  user:
    as-user-extra: {}
    client-certificate-data: STUFF
    client-key-data: STUFF

But the hostname assigned to the machine was fully qualified:

$ uname -n

kubworker1.prefetch.net

After re-reading the documentation I found two ways to address this. You can re-generate your certificates with the FQDN of your hosts, or override the name with the kubelet ‘--hostname-override=NAME’ command line option. Passing the short name to the kubelet ‘--hostname-override’ option provided a quick fix and allowed my host to register:

$ kubectl get nodes

NAME         STATUS    ROLES     AGE       VERSION
kubworker1   Ready     <none>    13m       v1.9.0
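
If you go the override route, the short name can be derived from the FQDN instead of being hard coded. A minimal sketch, assuming the certificate was issued for the part of the hostname before the first dot (short_name is a hypothetical helper, and the echo just shows the flag rather than starting a kubelet):

```shell
#!/bin/sh
# short_name is a hypothetical helper: it strips everything from the
# first dot onward, turning an FQDN into the short host name.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# Derive the node name from the name the kernel reports for this host.
node_name=$(short_name "$(uname -n)")
echo "kubelet would be started with --hostname-override=${node_name}"
```

The important part is that whatever name the kubelet registers with matches the system:node:NAME user in the client certificate.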

I need to do some additional digging to see what the best practices are for kubernetes node naming. That will go on my growing list of kubernetes questions to get answered.

Debugging a silly node application bug with the inspect interface

This article was posted by Matty on 2017-12-23 11:51:02 -0500 EST

Last night while working on one of my many side projects I came across a really weird JavaScript issue. Here is an extremely simplified version of the code I was debugging:

$ cat app.js

const myLib = require('./lib.js');
var i = myLib.populateObject("Ollie", "Awesome");
console.log(i);

$ cat lib.js

var populateObject = (name, breed) => {
    var obj = { name: name, breed: breed }
    return 
    {
        obj
    }
}

module.exports = {
    populateObject
}

When the code ran it would return undefined for the object called obj even though console.log() showed it as a valid object inside the function:

$ node app.js

undefined

A seasoned JavaScript developer would look at the code above and immediately see the flaw. Being new to JavaScript, I didn’t find it immediately clear why this wasn’t working. So I figured this would be as good a time as any to learn how to use the inspect debugger to troubleshoot my issue. The inspect debugger can be accessed by invoking node with the inspect option and the code to run:

$ node inspect app.js

< Debugger listening on ws://127.0.0.1:9229/8d8caf5a-07d1-4043-9418-1acc6935c973
< For help see https://nodejs.org/en/docs/inspector
< Debugger attached.
Break on start in app.js:1
> 1 (function (exports, require, module, __filename, __dirname) { const myLib = require('./lib.js');
  2 var i = myLib.populateObject("Ollie", "Awesome");
  3 console.log(i);

Once you are inside the debug shell you can use c to continue execution, n to step to the next line, s to step into functions, o to step out of functions and bt to get a backtrace. You can also use setBreakpoint and clearBreakpoint to set and clear breakpoints on specific lines of code and repl to interrogate objects. To see what was going on I stepped into the populateObject function, set a breakpoint on the return line and hit n to see what was run:

debug> s
break in lib.js:2
  1 (function (exports, require, module, __filename, __dirname) { var populateObject = (name, breed) => {
> 2     var obj = { name: name, breed: breed }
  3     return 
  4     {

debug> setBreakpoint(3)
  1 (function (exports, require, module, __filename, __dirname) { var populateObject = (name, breed) => {
  2     var obj = { name: name, breed: breed }
> 3     return 
  4     {
  5         obj
  6     }
  7 }
  8 

debug> c
break in lib.js:3
  1 (function (exports, require, module, __filename, __dirname) { var populateObject = (name, breed) => {
  2     var obj = { name: name, breed: breed }
> 3     return 
  4     {
  5         obj

debug> n
break in lib.js:3
  1 (function (exports, require, module, __filename, __dirname) { var populateObject = (name, breed) => {
  2     var obj = { name: name, breed: breed }
> 3     return 
  4     {
  5         obj

debug> n
break in app.js:3
  1 (function (exports, require, module, __filename, __dirname) { const myLib = require('./lib.js');
  2 var i = myLib.populateObject("Ollie", "Awesome");
> 3 console.log(i);
  4 
  5 });

The return completed, but for some reason line 5 wasn’t being evaluated. Then it dawned on me! One of my JavaScript instructors mentioned that semicolons are inserted automatically when they aren’t explicitly specified. So my hypothesis became: was an implicit semicolon being added right after the return, causing it to return without a value? To see if that was the case I moved the opening brace onto the same line as the return statement:

var populateObject = (name, breed) => {
    var obj = { name: name, breed: breed }
    return {
        obj
    }
}

module.exports = {
    populateObject
}

Then I ran my program again. Lo and behold, it worked! To truly verify this was the issue I re-ran the program under inspect and compared the new output with the previous output I collected:

debug> s
break in lib.js:2
  1 (function (exports, require, module, __filename, __dirname) { var populateObject = (name, breed) => {
> 2     var obj = { name: name, breed: breed }
  3     return {
  4         obj

debug> setBreakpoint(3)
  1 (function (exports, require, module, __filename, __dirname) { var populateObject = (name, breed) => {
  2     var obj = { name: name, breed: breed }
> 3     return {
  4         obj
  5     }
  6 }
  7 
  8 module.exports = {

debug> n
break in lib.js:3
  1 (function (exports, require, module, __filename, __dirname) { var populateObject = (name, breed) => {
  2     var obj = { name: name, breed: breed }
> 3     return {
  4         obj
  5     }

debug> n
break in lib.js:5
* 3     return {
  4         obj
> 5     }
  6 }
  7 

debug> n
break in app.js:3
  1 (function (exports, require, module, __filename, __dirname) { const myLib = require('./lib.js');
  2 var i = myLib.populateObject("Ollie", "Awesome");
> 3 console.log(i);
  4 
  5 });

debug> repl
Press Ctrl + C to leave debug repl
> i
{ obj: Object }
> console.log(i)
< { obj: { name: 'Ollie', breed: 'Awesome' } }

In the output above you can see that we are now reaching line 5 in the program which wasn’t occurring before. This was a silly bug (the result of staying up WAY too late to code) but I learned a TON about the inspect interface in the process. Hopefully as I write more code and study the correct way to do things silly bugs like these will become less prevalent. Viva inspect!

Disabling LLMNR on hosts that use the systemd stub resolver

This article was posted by Matty on 2017-12-22 07:19:26 -0500 EST

While performing a routine audit of my desktop this morning I noticed that the systemd stub resolver was listening on TCP port 5355:

$ netstat -pant | grep 5355

tcp        0      0 0.0.0.0:5355            0.0.0.0:*               LISTEN      2236/systemd-resolv 

TCP port 5355 is used for Link-Local Multicast Name Resolution (LLMNR), which is completely unnecessary for my setup at home. So I ventured off to /etc/systemd and came across the resolved.conf file. While perusing resolved.conf(5) I came across the following two configuration directives:

LLMNR=
Takes a boolean argument or “resolve”. Controls Link-Local Multicast Name Resolution support (RFC 4795[1]) on the local host. If true, enables full LLMNR responder and resolver support. If false, disables both. If set to “resolve”, only resolution support is enabled, but responding is disabled. Note that systemd-networkd.service(8) also maintains per-link LLMNR settings. LLMNR will be enabled on a link only if the per-link and the global setting is on.

MulticastDNS=
Takes a boolean argument or “resolve”. Controls Multicast DNS support (RFC 6762[2]) on the local host. If true, enables full Multicast DNS responder and resolver support. If false, disables both. If set to “resolve”, only resolution support is enabled, but responding is disabled. Note that systemd-networkd.service(8) also maintains per-link Multicast DNS settings. Multicast DNS will be enabled on a link only if the per-link and the global setting is on.

Setting both values to no and restarting systemd-resolved.service stopped systemd-resolved from listening on *:5355. If you are interested in learning more about LLMNR you can check out RFC 4795. It helped me clarify a number of questions I had about the protocol and when to use it.
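
For reference, the change can also be expressed as a drop-in file instead of an edit to resolved.conf itself. This is a sketch under a couple of assumptions: write_resolved_dropin and the no-llmnr.conf file name are hypothetical, and it relies on your systemd version reading drop-ins from a resolved.conf.d directory:

```shell
#!/bin/sh
# write_resolved_dropin is a hypothetical helper. It writes a drop-in
# file into the directory given as $1 that turns off both LLMNR and
# Multicast DNS in systemd-resolved.
write_resolved_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/no-llmnr.conf" <<'EOF'
[Resolve]
LLMNR=no
MulticastDNS=no
EOF
}
```

On a real host you would call it as root with /etc/systemd/resolved.conf.d as the directory, restart systemd-resolved.service, and then re-run the netstat check from earlier to confirm nothing is listening on port 5355.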