
[PowerShell] Joining a domain

Recently my world has been centered more and more around Windows. Lately this is not a Bad Thing™. In fact, Windows Server Core and PowerShell have both come a LONG way. In the not so recent past, I wrote about how to set up Active Directory with PowerShell. In this post, I show you how to use PowerShell to join said domain.

Getting Started

The process that follows assumes you have:

  • An Active Directory domain.
  • A server to join to the domain.
  • Optional: Said server is Windows Server Core
  • A user account with access to said domain.
  • A local user account on the server to be joined.

How to do it

To join the server to the domain, we will:

  1. Set DNS to use the Domain Controller
# List the adapters to find the InterfaceIndex of the NIC to update
Get-NetAdapter
Set-DnsClientServerAddress -InterfaceIndex 2 `
    -ServerAddresses ("10.127.16.100")
  2. Optional: Rename the computer

This can be done in two ways: either rename and reboot, or rename as part of the join. I have found the two-reboot process to work more consistently.

Rename and reboot

# If you have more work to do, remove the -Restart
# from the Rename-Computer command and reboot later with:
# Restart-Computer -Force

Rename-Computer -NewName "app-01" -Restart

Rename at Join

Add-Computer -DomainName "codybunch.local" `
    -NewName "app-01" `
    -LocalCredential 'Administrator' `
    -DomainCredential 'codybunch\Administrator' `
    -Restart -Force
  3. Join the domain

Last, we join the domain.

Note: Only the -DomainName parameter is required. If the others are left unspecified, you will be prompted.

Add-Computer -DomainName "codybunch.local" `
    -LocalCredential 'Administrator' `
    -DomainCredential 'codybunch\Administrator' `
    -Restart -Force
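
If you would rather build both credential objects up front, for example to avoid the interactive prompts mid-script, something along these lines works (a sketch; the prompt messages are illustrative):

# Collect the credentials ahead of time, then join non-interactively.
# -Credential supplies the domain account performing the join.
$localCred  = Get-Credential -Message 'Local administrator on the server being joined'
$domainCred = Get-Credential -Message 'Domain account with rights to join machines'

Add-Computer -DomainName "codybunch.local" `
    -LocalCredential $localCred `
    -Credential $domainCred `
    -Restart -Force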

My cloudbase-init.conf file

Because finding a complete sample was harder than I want to admit, here is a copy of my current cloudbase-init file.
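
For orientation, a minimal cloudbase-init.conf generally looks something like the sketch below. This is an illustration rather than an exact copy of my file; the username, log paths, and the plugin and metadata service lists are assumptions you will want to adjust.

# Illustrative cloudbase-init.conf - trim or extend to match your deployment
[DEFAULT]
username=Admin
groups=Administrators
inject_user_password=true
first_logon_behaviour=no
metadata_services=cloudbaseinit.metadata.services.configdrive.ConfigDriveService,
                  cloudbaseinit.metadata.services.httpservice.HttpService
plugins=cloudbaseinit.plugins.common.sethostname.SetHostNamePlugin,
        cloudbaseinit.plugins.common.createuser.CreateUserPlugin,
        cloudbaseinit.plugins.common.setuserpassword.SetUserPasswordPlugin,
        cloudbaseinit.plugins.common.networkconfig.NetworkConfigPlugin,
        cloudbaseinit.plugins.windows.extendvolumes.ExtendVolumesPlugin
verbose=true
debug=true
logdir=C:\Program Files\Cloudbase Solutions\Cloudbase-Init\log\
logfile=cloudbase-init.log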

Original source here.

NSXpert in 72 hours? VMware NSX Resources

Asking for NSX Help

Community is a wonderful thing. Just today I needed to learn all I could about NSX. The goal was to become an NSX “Expert” by Monday. NSX is a bit too complex for that, but, when asked, the community responded with plenty of links and suggestions. What follows here is a pile of links and my rough plan of attack.

Video Resources

While I’m sure there are more out there, I plan to start with the #vBrownBag NSX series by Tim Davis (@ALDTD). There are 3 pretty intense videos in the series:

Also worth mentioning, the official VMware NSX YouTube Channel.

UPDATE 2018-06-09 @ 09:40

Written Resources

There is a huge number of guides here, but they are comprehensive.

You will need to search for NSX-specific labs. The recommendation was to start with the getting-started lab and then work through as much as you can alongside the study guides linked above.

In depth guides for getting up and going on VMware NSX

Validated Reference Designs for NSX

When all else fails, check the docs.

When the docs fail, check the community forums.

UPDATE 2018-06-09

A few more links came up overnight that should be shared.

A PDF Guide to architecting NSX solutions for service providers. This will help you wrap your head around some of the considerations for how to deploy NSX within the context of the VMware SDDC.

The ultimate hands-on lab (HOL) is, of course, setting it up yourself. This is a subset of the docs that walks you through an installation.

UPDATE 2018-06-09 @ 09:40 More Links!

UPDATE 2018-06-09 @ 12:21 Additional links from @williamwilloby

Classes

While I don’t think I’ll have time over the next few days to attend a class, VMware has quite a few available: https://mylearn.vmware.com/mgrReg/plan.cfm?plan=48389&ui=www_edu

Cheat Sheet - ipmitool

Some quick commands I’ve found handy for operating remote systems via ipmitool:

Check status:

root@lab-c:~# ipmitool -I lan -H 10.127.20.10 -U root chassis status
Password:
System Power         : off
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : unknown
Last Power Event     :
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : false
Sleep Button Disable : allowed
Diag Button Disable  : allowed
Reset Button Disable : allowed
Power Button Disable : allowed
Sleep Button Disabled: false
Diag Button Disabled : false
Reset Button Disabled: false
Power Button Disabled: false

Power Operations:

Useful here are on, off, soft, cycle, reset:

root@lab-c:~# ipmitool -I lan -H 10.127.20.10 -U root power on
Chassis Power Control: Up/On

root@lab-c:~# ipmitool -I lan -H 10.127.20.10 -U root power off
Chassis Power Control: Down/Off

root@lab-c:~# ipmitool -I lan -H 10.127.20.10 -U root power soft

root@lab-c:~# ipmitool -I lan -H 10.127.20.10 -U root power reset

Change to / from pxe boot:

root@lab-c:~# ipmitool -I lan -H 10.127.20.10 -U root chassis bootdev pxe
Set Boot Device to pxe

root@lab-c:~# ipmitool -I lan -H 10.127.20.10 -U root chassis bootdev bios
Set Boot Device to bios

Reset the ipmi controller:

Note: This may need to be sent more than once to actually do the thing.

root@lab-c:~# ipmitool -I lan -H 10.127.20.10 -U root  mc reset [ warm | cold ]
Sent cold reset command to MC

Set a bunch of hosts to pxe & reboot:

Here I’ll supply two of these. The first staggers each host with a random delay of up to MAXWAIT seconds and runs them in parallel. This is useful for any number of reasons, the primary one being to be nice to the power infrastructure where you are performing the resets. It is also a useful snippet for chaos style resets.

export MAXWAIT=30   # maximum random delay per host, in seconds; adjust to taste
seq -f "10.127.20.%g" 1 100 | xargs -P 10 -I {} bash -c 'sleep $((RANDOM % MAXWAIT)); ipmitool -I lan -H {} -U root chassis bootdev pxe && ipmitool -I lan -H {} -U root power reset'

This second option is a bit more yolo, and fires off all the resets at once.

seq -f "10.127.20.%g" 1 100 | xargs -P 0 -I {} bash -c 'ipmitool -I lan -H {} -U root chassis bootdev pxe && ipmitool -I lan -H {} -U root power reset'

Building ADFS with PowerShell

I have had a need recently to have a number of open source projects authenticate against Microsoft Active Directory. While there are many ways to do this, ADFS, or Active Directory Federation Services, allows us to use SAML, which in turn can be tied into 3rd party Single Sign-On tools (Okta, Facebook, etc.).

Getting started

In order to use this script, you will need:

  • A Windows server, either 2012R2 or 2016
  • Active Directory
    • Schema level of at least 2012R2
  • User account with Domain Admin permission
  • PowerShell 5.x
    • Older versions may work, but are untested

Installing ADFS with PowerShell

To install ADFS with PowerShell, log into the Windows server where ADFS is to be deployed, and:

  1. Open PowerShell
  2. Download the script (Full script also included below)
  3. Review & run the script

How it works

Now that you’ve installed ADFS, let’s examine what we actually ran.

The script first installs NuGet. This is used to install 3rd party modules.

Get-PackageProvider -Name NuGet -ForceBootstrap
Install-PackageProvider nuget -Force

Next, the PSPKI module is installed and loaded into the current shell. We use this module to create the self-signed SSL certificate needed to install ADFS:

Install-Module -Name PSPKI -Force
Import-Module -Name PSPKI

With the PSPKI module loaded, we can now create a self-signed SSL certificate, and install it into the Windows certificate store:

Note: Replace $fqdn with the FQDN for the ADFS host.

$selfSignedCert = New-SelfSignedCertificateEx -Subject "CN=$fqdn" `
    -ProviderName "Microsoft Enhanced RSA and AES Cryptographic Provider" `
    -KeyLength 2048 -FriendlyName 'OAFED SelfSigned' `
    -SignatureAlgorithm sha256 `
    -EKU "Server Authentication", "Client authentication" `
    -KeyUsage "KeyEncipherment, DigitalSignature" -Exportable `
    -StoreLocation "LocalMachine"

$certThumbprint = $selfSignedCert.Thumbprint

When creating the SSL certificate, we stored the thumbprint for the certificate in a variable so we can use it again when configuring ADFS.

The next several commands are responsible for installing and configuring the ADFS role:

# $securePassword is a SecureString holding the service account password; the full
# script builds it earlier (for example, with Read-Host -AsSecureString).
$user  = "$env:USERDOMAIN\$env:USERNAME"
$credential = New-Object `
    -TypeName System.Management.Automation.PSCredential `
    -ArgumentList $user, $securePassword

Install-WindowsFeature -IncludeManagementTools -Name ADFS-Federation

Import-Module ADFS
Install-AdfsFarm -CertificateThumbprint $certThumbprint `
    -FederationServiceName $fqdn `
    -ServiceAccountCredential $credential

This chunk of the script grabs the username of the current user, and then creates a credential object for the service account that ADFS will use.

Next, it installs the ADFS role with Install-WindowsFeature.

The final bit imports the ADFS PowerShell module and configures ADFS to:

  • Use the SSL certificate created earlier
  • Assign a service name. (All the ADFS URLs use this)
  • Assign the service account

Test it out

You can validate that the ADFS role was installed and is running by browsing to https://<FQDN OF HOST>/adfs/fs/federationserverservice.asmx. After clicking through the certificate warning, you should get a bunch of XML.
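
If you would rather check that endpoint from the shell, a rough PowerShell equivalent follows. Because the certificate is self-signed, validation is relaxed for the session; this is for lab use only.

# Skip certificate validation for this session; the lab certificate is self-signed.
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }

# $fqdn is the federation service name used during Install-AdfsFarm
Invoke-WebRequest -Uri "https://$fqdn/adfs/fs/federationserverservice.asmx" -UseBasicParsing |
    Select-Object -ExpandProperty Content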

You can also validate ADFS with the following PowerShell commands:

Install-Module ADFSDiagnostics -Force
Import-Module ADFSDiagnostics -Force

Test-AdfsServerHealth | ft Name,Result -AutoSize

If ADFS is working, you’ll see something like this:

ADFS is working

There is more!

The script provided creates a self-signed SSL certificate. While that will get you up and running in the lab, it is not how you should deploy this in production. If you have a different certificate, say from an internal CA or otherwise trusted CA, you can use it with this script. First ensure it is in your Windows certificate store, then substitute your certificate’s thumbprint in the following line and continue to use the script:

$certThumbprint = "Your SSL Cert Thumbprint here"

Summary

In this post we used PowerShell to install, configure, and validate Active Directory Federation Services (ADFS). This in turn enables you to use Active Directory as an identity provider with all manner of 3rd party SSO tools.

Resources

Slack Night Mode with Rambox

This is a quick post to remind me how I got around the eye-razors that are the default, brightly colored Slack client. First, the end result:

NightMode Screenshot

So, it’s not quite perfect, but it’s workable. The theme itself is CSS, and there are a few ways to get Slack to use said CSS, depending on how you consume Slack. The links in the resources section below discuss how to do it via the browser or desktop client. What follows here is how to apply said theme using Rambox.

Note: I feel like I’m a bit late to the party both theme wise and to Rambox. Rambox is everything Adium / Pidgin wanted to be when it grew up, and lets me pull in Slack, Tweetdeck, and others into one spot.

Night Mode for Slack in Rambox

To “enable” night mode, open Rambox, and then select “Configure” for the Slack service you want to change:

configure slack

In the resulting window, expand the “Advanced” section at the bottom:

Advanced settings

In the “Advanced” text field, copy and paste the code from here.

Resources

The theme itself, along with how to force Rambox to load it, came from here:

Metrics: Part 2 - InfluxDB

The first stop in our metrics adventure was to install and configure Netdata to collect system-level statistics. We then configured all of the remote Netdata agents to send their data to a central proxy node. This is a great start; however, it is not without some challenges. That is, the data supplied by Netdata, while extensive, needs to be stored elsewhere for anything more than basic reporting, analysis, and trending.

What then should we do with said data? In addition to streaming data between Netdata instances, as we did in our prior post, Netdata can also stream to various databases, both standard and specialized.

As we are exploring TICK stack, we will stream our metrics data into InfluxDB, a specialized time-series database.

InfluxDB

InfluxDB is the “I” in TICK Stack. InfluxDB is a time series database, designed specifically for metrics and event data. This is a good thing, as we have quite an extensive set of system metrics provided by Netdata that we will want to retain so we can observe trends over time or search for anomalies.

In this post we will configure InfluxDB to receive data from Netdata. Additionally, we will reconfigure our Netdata proxy node to ship metric data to InfluxDB.

The following sections rely heavily on the Ansible playbooks from Larry Smith Jr. A basic understanding of Ansible is assumed.

Netdata - Configure Netdata to export metrics

Take a moment to review the configuration and metrics collection architecture from our first post.

Reviewed? Good. While Netdata will allow us to ship metrics data from each installed instance, this can be quite noisy, or may not provide the control you would like. Fortunately, the configuration to send metrics data is the same in either case.

One other consideration when shipping data from Netdata to InfluxDB is how best to take the data in. Netdata supports different data export types: graphite, opentsdb, json, and prometheus. Our environment will be configured to send data using the opentsdb telnet interface.

Note: As none of these are native to InfluxDB, they are exceedingly difficult to use with InfluxDB-Relay.
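
For a sense of what crosses that opentsdb telnet interface, each sample is a plain put line (metric, timestamp, value, tags). Once InfluxDB is listening you can even hand-craft one with netcat; the hostname below is a placeholder.

# Send a single test metric to the opentsdb listener (port 4242, configured below)
echo "put netdata.test.metric $(date +%s) 42 host=netdata-01" | nc -w 1 influxdb-host 4242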

To reconfigure your Netdata proxy using the ansible-netdata role, the following playbook can be used:

---
- hosts: netdata-proxies
  vars:
    netdata_configure_archive: true
    netdata_archive_enabled: 'yes'
    netdata_archive_type: 'opentsdb'
    netdata_archive_destination: ":4242"
    netdata_archive_prefix: 'netdata'
    netdata_archive_data_source: 'average'
    netdata_archive_update: 1
    netdata_archive_buffer_on_failures: 30
    netdata_archive_timeout: 20000
    netdata_archive_send_names: true
  roles:
    - role: ansible-netdata

The variables above tell netdata to:

  • Archive data to a backend
  • Enable said backend (as netdata only supports one at a time)
  • Configure the opentsdb protocol to send data
  • Configure the host and port to send to

It also configures a few additional behaviors:

  • Send data once a second
  • Keep 30 seconds of data in case of connection issues
  • Send field names instead of UUID
  • Specify connection timeout in milliseconds
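
Under the hood, these variables land in Netdata's [backend] section of netdata.conf. A rough sketch of the rendered configuration follows; the destination host is a placeholder, and option names can shift between Netdata versions.

[backend]
    enabled = yes
    type = opentsdb
    destination = influxdb-host:4242
    prefix = netdata
    data source = average
    update every = 1
    buffer on failures = 30
    timeout ms = 20000
    send names instead of ids = yes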

Once this playbook has run, your netdata instance will start shipping data. Or trying to, anyway; we haven’t yet installed and configured InfluxDB to capture it. Let’s do that now.

InfluxDB - Install and configure InfluxDB

As discussed above, InfluxDB is a stable, reliable time-series database, tuned for storing our metrics for long-term trending. For this environment we are going to install a single small node; optimizing and scaling are a topic in and of themselves. To ease installation and maintenance, InfluxDB will be installed using the ansible-influxdb role.

The following Ansible playbook configures the ansible-influxdb role to listen for opentsdb messages from our Netdata instance.

---
- hosts: influxdb
  vars:
    influxdb_config: true
    influxdb_version: 1.5.1
    influxdb_admin:
      enabled: true
      bind_address: "0.0.0.0"
    influxdb_opentsdb:
      database: netdata
      enabled: true
    influxdb_databases:
      - host: localhost
        name: netdata
        state: present
  roles:
    - role: ansible-influxdb

A quick breakdown of the settings supplied:

  • Configure influxdb rather than use the default config
  • Use influxdb 1.5.1
  • Enable the admin interface
  • Have InfluxDB listen for opentsdb messages and store them in the netdata database
  • Create the netdata database

After this playbook run is successful, you will have an instance of InfluxDB collecting stats from your Netdata proxy!
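
For reference, the opentsdb listener that the role enables ends up in /etc/influxdb/influxdb.conf looking roughly like this (a sketch, with defaults omitted):

[[opentsdb]]
  enabled = true
  bind-address = ":4242"
  database = "netdata"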

Did it work?

If both playbooks ran successfully, system metrics will be flowing something like this:

nodes ==> netdata-proxy ==> influxdb

You can confirm this by logging into your InfluxDB node and running the following commands:

Check that InfluxDB is running:

# systemctl status influxdb
● influxdb.service - InfluxDB is an open-source, distributed, time series database
   Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2018-04-14 18:29:03 UTC; 9min ago
     Docs: https://docs.influxdata.com/influxdb/
 Main PID: 11770 (influxd)
   CGroup: /system.slice/influxdb.service
           └─11770 /usr/bin/influxd -config /etc/influxdb/influxdb.conf

Check that the netdata database was created:

# influx
Connected to http://localhost:8086 version 1.5.1
InfluxDB shell version: 1.5.1
> SHOW DATABASES;
name: databases
name
----
netdata
_internal

Check that the netdata database is receiving data:

# influx
Connected to http://localhost:8086 version 1.5.1
InfluxDB shell version: 1.5.1
> use netdata;
Using database netdata
> show series;
key
---
netdata.apps.cpu.apps.plugin,host=netdata-01
netdata.apps.cpu.build,host=netdata-01
netdata.apps.cpu.charts.d.plugin,host=netdata-01
netdata.apps.cpu.cron,host=netdata-01
netdata.apps.cpu.dhcp,host=netdata-01
netdata.apps.cpu.kernel,host=netdata-01
netdata.apps.cpu.ksmd,host=netdata-01
netdata.apps.cpu.logs,host=netdata-01
netdata.apps.cpu.netdata,host=netdata-01
netdata.apps.cpu.nfs,host=netdata-01
netdata.apps.cpu.other,host=netdata-01
netdata.apps.cpu.puma,host=netdata-01
netdata.apps.cpu.python.d.plugin,host=netdata-01
netdata.apps.cpu.ssh,host=netdata-01
netdata.apps.cpu.system,host=netdata-01
netdata.apps.cpu.tc_qos_helper,host=netdata-01
netdata.apps.cpu.time,host=netdata-01
netdata.apps.cpu_system.apps.plugin,host=netdata-01

Summary

With that, you now have high resolution system metrics being collected and sent to InfluxDB for longer term storage, analysis, and more.

Metrics: Adventures in Netdata, TICK stack, and ELK Stack

Yay metrics! I have sort of a love-hate relationship with metrics. That is, I hate them, but they like to come pester me. That said, having metrics is a useful way of knowing what is going on in your various systems and if the services you are responsible for are actually doing things they are supposed to be doing.

Generically metrics collection breaks down into 3 smaller categories:

  • System stats collection and storage
  • Log collection and storage
  • Using data to answer questions

In order to keep things manageable, however, this post will cover how to get metrics data into one place. In later posts, we will handle storing metrics data in TICK stack, log collection with ELK stack, and using the collected data to answer some questions.

Today’s metrics collection

When I was getting my footing in IT, the state of the art was syslog-ng, with logwatch & grep, or if you could keep your logs under 500MB/day, Splunk. Metrics collection was done with Cacti, and service status was watched by Nagios. All of these tools still exist, but in the world of 2004, they were… well, they had not evolved yet.

Today, we have new tools, techniques, and methods of handling data that can be more effective. ELK Stack (Elasticsearch, Logstash, Kibana), along with rsyslog for log shipping, provides centralized log collection and storage, along with an interface that makes things easy to query.

TICK Stack (Telegraf, InfluxDB, Chronograf, Kapacitor), like ELK, provides a set of tools to collect time series data and do interesting things with it: alerting, anomaly detection, dashboards, and so on. To get data into TICK stack, there are a number of tools to collect system level statistics. My tool of choice is Netdata. It allows for creative architectures, store and forward logging, and integration with a large number of backends. The default dashboard is also pretty slick.

netdata dashboard

The following sections rely heavily on the Ansible playbooks from Larry Smith Jr.

If you have vagrant, virtualbox or libvirt and would like to play along, the lab that accompanies this post can be found here. To start the lab, run vagrant up and go fetch a coffee.

Netdata - System stats collection

Netdata is a sort of all-in-one system stats package. It can pull metrics from about everything. Netdata’s architecture lets you mix and match components as needed: storing data in its local statsd server, proxying data between instances, keeping a local cache, and so forth.

For this exercise, we will be configuring Netdata to operate in Proxy mode. That is, each netdata agent will collect system metrics and ship them upstream to the proxy node.

The end result will look like this: netdata architecture

Image and additional documentation can be found here.

Netdata - Install and configure proxy node

The installation of netdata is handled by the ansible-netdata role. To install the proxy-node we first need to generate a UUID to serve as the API key.

$ uuidgen
154dabe0-1d91-11e8-9f06-eb85cbb006ef

Next we add our configuration variables to group_vars/all/all.yml (If you’re not using the lab, these values can be placed with the rest of your variables).

---
# Vars for the netdataproxy

# Defines info about enabling/scheduling auto updates for Netdata version
# https://github.com/firehol/netdata/wiki/Installation#auto-update
netdata_auto_updates:
  enabled: false

# Defines if Netdata should store data in a backend
netdata_configure_archive: false

# Defines if Netdata streaming should be configured
# https://github.com/firehol/netdata/wiki/Monitoring-ephemeral-nodes
netdata_stream_enabled: true
netdata_stream_master_node: ''
# Defines location of Netdata stream configuration file
netdata_stream_config_file: '/etc/netdata/stream.conf'

# Defines Netdata API Key (must be generated with command uuidgen)
netdata_stream_api_key: '154dabe0-1d91-11e8-9f06-eb85cbb006ef'

Highlights from these variables:

  • Tells our netdata master to not configure an archive datastore
  • Stream server is enabled
  • Defines the API key the agents need in order to send data
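
On the proxy, these variables land in stream.conf as a section keyed by the API key. A rough sketch of the rendered file follows; defaults may differ between Netdata versions.

# /etc/netdata/stream.conf on the proxy (receiving) node
[154dabe0-1d91-11e8-9f06-eb85cbb006ef]
    enabled = yes
    default history = 3600
    default memory mode = save
    health enabled by default = auto
    allow from = *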

Next we create a playbook to install netdata on our nodes:

---
- hosts: netdata-proxies
  roles:
    - role: ansible-netdata

We’re not going to run this playbook just yet.

Netdata - Install and configure collection agents

Next up, we provide a different set of configuration values for the nodes that will run the agent. These variables follow:

# Defines info about enabling/scheduling auto updates for Netdata version
# https://github.com/firehol/netdata/wiki/Installation#auto-update
netdata_auto_updates:
  enabled: false

# Defines if Netdata streaming should be configured
# https://github.com/firehol/netdata/wiki/Monitoring-ephemeral-nodes
netdata_stream_enabled: true

# Defines location of Netdata stream configuration file
netdata_stream_config_file: '/etc/netdata/stream.conf'

# Defines Netdata API Key (must be generated with command uuidgen)
netdata_stream_api_key: '154dabe0-1d91-11e8-9f06-eb85cbb006ef'

# Defines Netdata master node
netdata_stream_master_node: 'stats-01'

netdata_configure_archive: false

Highlights:

  • Auto updates disabled
  • Streaming data to a master node is enabled
  • Configures the hostname of the master node
  • Tells our agents not to configure an archive datastore
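
On each agent, the same role renders the sending side of stream.conf. Roughly (19999 is Netdata's default port):

# /etc/netdata/stream.conf on an agent (sending) node
[stream]
    enabled = yes
    destination = stats-01:19999
    api key = 154dabe0-1d91-11e8-9f06-eb85cbb006ef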

To complete the installation, we add a section to our playbook to install the netdata agents:

---
- hosts: netdata-proxies
  vars_files:
    - vars/proxies.yml
  roles:
    - role: ansible-netdata

- hosts: netdata-agents
  vars_files:
    - vars/agents.yml
  roles:
    - role: ansible-netdata

Netdata - Perform the installation

The installation is then performed much the same as running any other Ansible playbook:

ansible-playbook -i inventory.yml playbooks/install_netdata.yml

The output should look a bit like this: ansible-netdata output

Netdata - Viewing stats

Now that the install has finished, browse to the dashboard of the master node (http://1.2.3.4:19999). You will now have additional nodes that can be selected to view data. Something like this:

netdata dashboard with multi hosts

Summary

Well, that’s it for getting data from our nodes into a central location. Our next step will be to put this data somewhere, say InfluxDB.

ASIC Mining on Raspberry Pi

I’ve had some pretty terribad ideas in the past, not the least of which are OpenStack Swift on USB keys and the pre-chaos-engineering random VM snapshot deleter. In that vein, I bring you ASIC Bitcoin mining on Raspberry Pi.

Raspberry Pi Mining Cluster

As you read, keep in mind that the goal here, as in the aforementioned posts, is not to be practical. Rather, this is a “because I can” project, the conclusion of which will be to run the miners inside containers backed by Kubernetes. But that is for another time.

The Gear

For this project, I reused my Kubernetes / OpenFaaS cluster, and added some ASICs. Here’s a reminder of the parts:

Note: Those are Amazon links. I’m not sure if my affiliate account is still active, but if so, this is full disclosure that they may indeed be affiliate links.

The setup

These are still configured as they were in my OpenFaaS post.

Installing cgminer

For these particular ASICs, one needs to first compile cgminer with the appropriate support. To ensure I can do this again at some point, I wrote an Ansible playbook to do the heavy lifting for me:

For those not familiar with Ansible, here’s what it is doing:

  • 12-17: Use apt to install prerequisite packages (build-essential, and so on)
  • 19-29: Create, and then ensure directories exist for the source and build
  • 31-39: Downloads the patched cgminer source with 2PAC support
  • 41-59: Runs both the prebuild setup and then compiles cgminer
  • 61-65: Configures cgminer
  • 67-95: Sets up cgminer to start on boot

This is then installed, sort of like this:

ansible-playbook -i inventory.yml playbooks/install_cgminer.yml

After a long while (these are Raspberry Pis, after all), the service is installed, and you are mining:

Service status:

pi@node-02:~ $ sudo systemctl status cgminer
● cgminer.service - cgminer
   Loaded: loaded (/etc/systemd/system/cgminer.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-02-16 06:49:23 UTC; 3 weeks 3 days ago
 Main PID: 433 (screen)
   Memory: 7.6M
      CPU: 1.471s
   CGroup: /system.slice/cgminer.service
           ├─433 /usr/bin/SCREEN -S cgminer -L -Dm /home/pi/cgminer.sh
           ├─452 /bin/bash /home/pi/cgminer.sh
           └─466 ./cgminer --config /home/pi/cgminer.conf

Check in on cgminer itself:

# screen -r cgminer

cgminer version 4.10.0 - Started: [2018-03-12 21:10:08.545]
-----------------------------------------------------------------------------------------
 (5s):9.456G (1m):11.10G (5m):8.409G (15m):4.248G (avg):10.46Gh/s
 A:960  R:4096  HW:1  WU:138.4/m | ST: 1  SS: 0  NB: 2  LW: 1987  GF: 0  RF: 0
 Connected to mint.bitminter.com diff 64 with stratum as user evad
 Block: 3fa89e3b...  Diff:3.29T  Started: [21:11:25.670]  Best share: 1.77K
-----------------------------------------------------------------------------------------
 [U]SB management [P]ool management [S]ettings [D]isplay options [Q]uit
 0: GSD 10019882: COMPAC-2 100.00MHz (16/236/390/1) | 10.59G / 10.47Gh/s WU:138.4/m A:960
 R:0 HW:1--------------------------------------------------------------------------------

Summary

This was a fun one. To get this a bit more stable, I likely need to relocate the “cluster” to my server cabinet for better cooling; the little USB keys get painfully hot. Another thing on the todo list is to have cgminer run inside a container, and then on K8S.

Linux Filesystem Performance for Virt Workloads

As I spend quite a bit of time (an understatement, I assure you) standing up and tearing down different virtualized lab environments, I wanted to spend a little less time on it overall. Thus, in addition to tuning some parameters at runtime, I spent some time benchmarking the differences between virtualization engines, filesystems, and IO schedulers.

Before we begin, let’s get a few things out of the way:

  • TL;DR - The winner was libvirt/kvm with ext4 in guest, xfs on host, with noop
  • Yes, my hardware is old.
  • No, this is not exhaustive, nor super scientific

Test Hardware

  • CPU: 2x Quad-Core AMD Opteron(tm) Processor 2374 HE
  • RAM: 64GB
  • Disk: 4x 1TB 5400 RPM disks, RAID 1
  • Controller: LSI MegaRAID 8708EM2 Rev: 1.40
  • OS: Ubuntu 16.04 LTS

Old hardware is old. But hey, it is a workhorse.

Test Workload

Building an openstack-ansible All-In-One, for the Pike release of OpenStack.

Rationale: To be honest, this is the workload I spend the most time with, be it standing one up to replicate a customer issue, test an integration, build a solution, and so on. Any time I save provisioning is time I can spend doing work.

Further, the build process for an all-in-one is quite extensive and encompasses a wide variety of sub workloads: haproxy, rabbitmq, galera, lxc containers, and so on.

Test Matrix

The test matrix worked out to 8 tests in all:

Host FS   Guest FS   IO Sched   Virt Engine
xfs       xfs        noop       KVM
xfs       xfs        noop       vbox
xfs       ext4       noop       KVM
xfs       ext4       noop       vbox
xfs       xfs        deadline   KVM
xfs       xfs        deadline   vbox
xfs       ext4       deadline   KVM
xfs       ext4       deadline   vbox
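
Switching the host IO scheduler between the noop and deadline runs can be done on the fly through sysfs; something like the following works, with the device name being an assumption for this host:

# Show the current scheduler (the active one is in brackets), then switch it
cat /sys/block/sda/queue/scheduler
echo noop | sudo tee /sys/block/sda/queue/scheduler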

Test Process

Prepwork:

  • Create four different boxes with Packer (2 filesystems * 2 virt engines).
  • Create a Vagrantfile that corresponds to each scenario.
  • Create a bash script to loop through the scenarios (sketched below).

Test:

As the goal was to reduce the time spent waiting on environments, each environment was tested with:

$ time (vagrant up --provider=$PROVIDER_NAME)
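
The loop script from the prep work ties this together. A rough sketch follows; the per-scenario directory names and provider strings are assumptions.

#!/usr/bin/env bash
# Time a full 'vagrant up' for each scenario directory, then tear it down.
for scenario in kvm-xfs kvm-ext4 vbox-xfs vbox-ext4; do
    case "$scenario" in
        kvm-*)  PROVIDER_NAME=libvirt ;;
        vbox-*) PROVIDER_NAME=virtualbox ;;
    esac
    pushd "$scenario" >/dev/null
    { time vagrant up --provider="$PROVIDER_NAME" ; } 2>&1 | tee "../${scenario}.log"
    vagrant destroy -f
    popd >/dev/null
done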

Results

Here are the results of each test. Surprisingly, ext4 on xfs was faster in all cases. Who’d have thought.

Host FS   Guest FS   IO Sched   Virt Engine   Time
xfs       xfs        noop       KVM           174m48.193s
xfs       xfs        noop       vbox          213m35.169s
xfs       ext4       noop       KVM           172m5.682s
xfs       ext4       noop       vbox          207m53.895s
xfs       xfs        deadline   KVM           172m44.424s
xfs       xfs        deadline   vbox          235m34.411s
xfs       ext4       deadline   KVM           172m31.418s
xfs       ext4       deadline   vbox          209m43.955s

Test 1:

  • Host FS: xfs
  • Guest FS: xfs
  • Virt Engine: libvirt/kvm
  • Host IO Scheduler: noop
  • Total Time: 174m48.193s

Test 2:

  • Host FS: xfs
  • Guest FS: xfs
  • Virt Engine: vbox
  • Host IO Scheduler: noop
  • Total Time: 213m35.169s

Test 3:

  • Host FS: xfs
  • Guest FS: ext4
  • Virt Engine: libvirt/kvm
  • Host IO Scheduler: noop
  • Total Time: 172m5.682s

Test 4:

  • Host FS: xfs
  • Guest FS: ext4
  • Virt Engine: vbox
  • Host IO Scheduler: noop
  • Total Time: 207m53.895s

Test 5:

  • Host FS: xfs
  • Guest FS: xfs
  • Virt Engine: libvirt/kvm
  • Host IO Scheduler: deadline
  • Total Time: 172m44.424s

Test 6:

  • Host FS: xfs
  • Guest FS: xfs
  • Virt Engine: vbox
  • Host IO Scheduler: deadline
  • Total Time: 235m34.411s

Test 7:

  • Host FS: xfs
  • Guest FS: ext4
  • Virt Engine: libvirt/kvm
  • Host IO Scheduler: deadline
  • Total Time: 172m31.418s

Test 8:

  • Host FS: xfs
  • Guest FS: ext4
  • Virt Engine: vbox
  • Host IO Scheduler: deadline
  • Total Time: 209m43.955s

Conclusions

The combination that won overall was an ext4 guest filesystem, with an xfs host filesystem, on libvirt/kvm with the noop IO scheduler.

While I expected VirtualBox to be slower than KVM, an entire hour’s difference was pretty startling. Another surprise was that ext4 on xfs outperformed xfs on xfs in all cases.