Tuesday, November 27, 2018

TIL: Ansible engine raw module "needs" gather_facts: no

Okay, the title says it all, but let's unpack that.

Ansible Playbooks are instructions for running ansible modules against a target host. They can be very very simple. Here's one of the simplest:
---
- name: A very simple playbook
  hosts: all
  tasks:
    - name: pingo
      ping:

This merely runs the ansible "ping" module on "all" hosts (i.e., whatever hosts are in the inventory passed on the command line when this playbook is run).

A note about the ping module: it is not the normal networking definition of "ping". Network folk are accustomed to using "ping" to send an ICMP echo request to a node (which typically answers with an ICMP echo reply). The ansible "ping" module, rather, checks that the node is up and that the basic needs of ansible are met on the node, i.e., that python is installed.
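For a quick check without writing a playbook at all, the same module can be run ad hoc from the control node; the inventory path here is just a placeholder for your own:

ansible all -i ./inventory -m ping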

So... the inquiring mind asks: what do you do if python is NOT installed? Can you still take advantage of some of Ansible? But of course.

The ansible "raw" module allows you to basically do something like the following:
# raw: uptime
# ssh targetnode and_execute_this_command
ssh target.example.net uptime

So here we'd get the uptime of the target node (assuming it was running ssh, we had login authority, and that uptime was installed and in the default path of the effective user.)
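You can run raw the same way, ad hoc, from the control node. Ad-hoc runs only execute the module you name (no fact gathering), so this works even on a target without python; the inventory path is again a placeholder:

ansible all -i ./inventory -m raw -a "uptime"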

So, it seems like it would be straightforward to create an ansible playbook that takes advantage of the raw module.

---
- name: raw uptime
  hosts: all
  tasks:
  - name: Run a raw command
    raw: uptime

and here we run into issues. This playbook won't work on a node that doesn't have python installed. (It will work on one that does.) Why is that? Because of the "secret sauce" called fact gathering. Every playbook, as it runs, invokes the ansible "setup" module to gather facts about the node before running any of the explicit tasks. The setup module is an implicit task, and the module reference notes that "[t]his module is automatically called by playbooks."
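If you're curious what fact gathering actually collects, you can call the setup module directly against a python-equipped node. The filter argument just trims the (very long) output:

ansible all -i ./inventory -m setup -a "filter=ansible_distribution*"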

NOTE: I've scattered some handy links within this document so that you can learn more about these. I'd recommend following them and then coming back here after you have familiarized yourself with ansible, modules, ping, raw, setup, and gather_facts.

So, how do we make this work? If you read the gather_facts link, you already know that you can bypass fact gathering very simply: set "gather_facts: no" in your playbook. That gives you the right playbook for getting the uptime of a node without python:

---
- name: raw uptime
  hosts: all
  gather_facts: no
  tasks:
  - name: Run a raw command
    raw: uptime

So, a simple one-line addition.
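As a usage sketch, running this against a single python-less host might look like the following; the IP address, the remote user, the playbook filename, and the trailing-comma inline inventory are all placeholders here, and -k simply prompts for the ssh password:

ansible-playbook -i "192.0.2.10," -u someuser -k raw-uptime.yml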



And how did I get into this situation? One of the most common cloud operating systems (aka cloud images) is one called "cirros". Cirros is a very minimal linux and, as such, it does not include python. Moreover, there really isn't an effective way to "add" python to it (though it could possibly be done with a statically built binary--I'll leave that as an exercise for the reader.)

Cirros is frequently used in a cloud environment (e.g., OpenStack) to validate that the cloud itself is working well. From within cirros you can log in (it provides well-known credentials) and check on the networking, etc. Basically it's a quick and dirty way to make sure your cloud is operating as intended.

I regularly spin up one or more cirros instances as soon as I build an openstack--whether that be an all-in-one devstack or an entire production cloud. In both cases, cirros is my "go to" tool to validate the cloud. (Thanks Scott.)



... and one more thing: you would normally just run uptime via the command module to get the uptime. But doing so requires the python infrastructure ansible relies on. Here's that "normal" or typical way:

---
- name: A very simple command
  hosts: all
  tasks:
    - name: uptime
      command: uptime

and even if you add "gather_facts: no" to it, the command module itself still requires python, so you really do need both the raw module and the "gather_facts: no" setting.

Friday, August 10, 2018

Life of a #Remotee

I work remotely for Red Hat, primarily at home but also in a coffee shop with several co-workers. And, oh yeah, forgot to mention, I travel heavily.
So I need to be able to work remotely. But I don't want to take all my gear with me, so I leave some of it at home and plugged in, and mosh lets me connect, disconnect, and reconnect to it.

I (re-)discovered mosh last weekend when prepping for a trip. I didn't want to carry my bulky laptop to the mountains that day, so I set up remote access through my cable modem. Of course, it's trivial to set up a port forward from my new Google Wifi AP and routers to my home machine. But that gives you connectivity, not persistence. So I pulled down the "mobile shell" mosh and set it up quickly.

IT JUST WORKS.®™

I decided to do this blog post after typing internally:
So, I started a mosh session from home to home last Sunday. I've been to the Denver airport, on board a Southwest flight, a Saint Louis hotel, a Saint Louis enterprise customer, and back, and that session just keeps running. I had heard of mosh before, but using it is even easier than I expected. I used to "work around" this with screen sessions, but mosh is even simpler than that.

So, setup is easy peasy. Install mosh. Find a UDP port you can forward back to your persistent (home) node. You probably also want to forward back a TCP port for ssh.

mosh --ssh="ssh -p$SOMEPORT" -p $SOMEUDP  $HOMEIP

You can find your home ip (from home) with this:

export HOMEIP=$(curl ifconfig.co)
# but I save this to a file as well, so maybe:
export HOMEIP=$(curl ifconfig.co | tee ~/bin/myhomeip)

You can port forward the default ssh port (22) or something slightly more obscure. The default UDP port range for mosh is 60000 through 61000; I picked a port in that range.

Both SOMEPORT and SOMEUDP need to be port forwarded (using your router setup) to the actual node you want to reach.
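To avoid retyping all of that, a tiny wrapper script does the job; the port numbers and the cached-IP file below are just the placeholder values from the examples above:

#!/bin/bash
SOMEPORT=2222                   # TCP port forwarded to sshd on the home node
SOMEUDP=60001                   # UDP port forwarded for mosh
HOMEIP=$(cat ~/bin/myhomeip)    # cached earlier via: curl ifconfig.co | tee ~/bin/myhomeip
mosh --ssh="ssh -p $SOMEPORT" -p "$SOMEUDP" "$HOMEIP"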

One other thing you will want to check out as a #remotee is WireGuard. I'll write it up once I've switched my vpn-ness over to it. WireGuard currently ships as packages that install a kernel module built out of tree (via dkms). See wireguard, hat tip to Jess Frazelle for Dockerfiles mentioning WireGuard, and oh yeah, this guy.

Saturday, July 21, 2018

Fedora Kernel (well 4.17 kernel) issue resolvable

I've been using a Thinkpad p50 for work since I joined Red Hat. And I'm running Fedora on it instead of Red Hat Enterprise Linux workstation so that I can be more current.

However, that bit me recently when I upgraded to Fedora 28. The built-in Intel dual-band AC 8260 wireless (rev 3a) failed to work on the newer 4.17 kernel. That led me down some dark roads as I switched to a 4.18 rc kernel, which had even more problems (though that kernel did have working iwlwifi).

A productive Saturday morning led me to patch and test my 4.17 Fedora 28 kernel successfully. The short version: you need to revert one change in the kernel, as detailed here:
https://lkml.org/lkml/2018/7/1/104

and then build Fedora kernel rpms (which I hadn't done in more than a decade, and a lot has changed in that decade). See:
https://fedoraproject.org/wiki/Building_a_custom_kernel
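For what it's worth, the revert itself is the easy part. A rough sketch, assuming a kernel source tree prepared per the wiki page above (the path and commit hash are placeholders; the lkml thread identifies the actual change):

cd ~/src/kernel                               # your prepared kernel source tree
git revert --no-edit <offending-commit-hash>  # the change from the lkml link
# then continue with the Fedora rpm build steps from the wiki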

and it has one thing out of date as well. The command:
# sudo /usr/libexec/pesign/pesign-authorize-user

should be just:
# sudo /usr/libexec/pesign/pesign-authorize
A Fedora rpm kernel build gives you time to mow the yard, bike to Starbucks, etc. So don't expect it to finish in minutes. Use the scale of hours.

PS. In linkifying this post, I see that 4.17 COULD HAVE BEEN A CONTENDER for Linux kernel 5.0. That would have scared me enough to hold off... oh well. YMMV. Enjoy.

Wednesday, June 27, 2018

TIL: boto3 can't parse env variables properly

So in today's saga of fighting the Internet, I bring this salient point to your attention:

boto3 can't parse environment variables for AWS credentials properly

and it will error kind of like this:

 ValueError: a Credential=WHACKADOODLEGGET5Q\r/20180627/us-east-1/sts/aws4_request

or possibly like this:

caught_exception\nValueError: Invalid header value '
    AWS4-HMAC-SHA256 Credential=WHACKADOODLET5Q\\r/20180627/us-east-1/ec2/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=8f84f8d811f4dcb45da5f8fadf90ae8390d5d358b4024bf0d964090032dea1c3'\n", 
    "module_stdout": "", 
    "msg": "MODULE FAILURE", 

This appears to be an error in parsing the \r (carriage return) that ends up in the credential portion of the request. (Based on a few google queries.)

Using ~/.aws/credentials seems to do the trick (with the exact same key values):

[default]
aws_access_key_id=WHACKADOODLEGGGET5Q
aws_secret_access_key=bn+++krandometyperyforwhackadoodle1YjKMWqzv


The environment variables I was using were of course:

export AWS_ACCESS_KEY_ID=WHACKADOODLEGGGET5Q
export AWS_SECRET_ACCESS_KEY=bn+++krandometyperyforwhackadoodle1YjKMWqzv
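If you want to see whether a stray carriage return has crept into the values themselves (which is what that \r above suggests), dump the environment with non-printing characters made visible; a ^M at the end of the line is the culprit:

env | grep '^AWS_' | cat -A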


Thursday, June 7, 2018

Find your libvirt instances...

I'm at Red Hat these days, working on cloud manageability, which you will see me write as m11y more often than not.

I recently needed Ansible for a demo and created a slug of VMs to use in my ansible inventory. However, there is no way (no obvious way?) I could find with virsh or virt-manager to determine what IP address got assigned to a VM. A bit of digging shows that the cloud images acquire an IP address via dhcp, and that dnsmasq manages those addresses.

In my case, these VMs were attached to virbr0 and it is that instance of dnsmasq that I needed to inspect.

Like most of my very short blog posts, I'm just writing this down so I can find it myself the next time I need it. In this case the file that "issued" the IP addresses is:

/var/lib/libvirt/dnsmasq/virbr0.status

That shows the MAC and IP address associations (but no indication of "which" libvirt domain they are actually associated with.) A further exercise for me and the reader is to tie the domain back to the IP address (if possible.)
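One hedged way to do that, assuming the status file is a JSON array with "mac-address" and "ip-address" keys and that jq is installed, is to match on the MAC address that virsh reports for each domain:

for dom in $(sudo virsh list --name); do
  mac=$(sudo virsh domiflist "$dom" | awk '/virbr0/ { print $5 }')
  ip=$(sudo jq -r --arg mac "$mac" \
    '.[] | select(."mac-address" == $mac) | ."ip-address"' \
    /var/lib/libvirt/dnsmasq/virbr0.status)
  echo "$dom $mac $ip"
done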

One other related note: If you use "virt-clone" to clone a cloud-image that has been previously booted, it will typically boot fine but WITHOUT an IP address. This is due to cloud-init not properly re-running. Blowing away some of the first run cloud-init stuff takes care of that.

sudo rm -rf /var/lib/cloud
sudo rm -rf /var/log/cloud-init*

(This may be overkill to "purify" a cloud instance, but it certainly does the trick.)

You need to re-run cloud-init at this point (either directly or via a quick reboot), and you should find that your instance has an IP address.
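(Newer cloud-init releases also carry a built-in "clean" subcommand that does roughly the same cleanup; whether your image has a recent enough cloud-init is an assumption you should verify before relying on it:)

sudo cloud-init clean --logs
sudo reboot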

It's a pretty quick and easy step to convert virbr0.status into an inventory...

sudo cat /var/lib/libvirt/dnsmasq/virbr0.status |grep ip-address |awk '{ print $NF }' |sed  -e 's/"//' -e 's/",//'

or you could turn this into a dynamic inventory based on that file, an exercise also left to the reader (but be sure to make the output a valid JSON dict.)
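If you do go the dynamic inventory route, a minimal sketch might look like this. It assumes virbr0.status is a JSON array of leases with an "ip-address" key, that jq is installed, and that "libvirt_guests" is whatever group name you care to use:

#!/bin/bash
# libvirt_inventory.sh -- minimal dynamic inventory sketch
LEASES=/var/lib/libvirt/dnsmasq/virbr0.status
if [ "$1" = "--list" ]; then
  sudo jq '{ libvirt_guests: { hosts: [ .[]."ip-address" ] }, _meta: { hostvars: {} } }' "$LEASES"
else
  echo '{}'   # called with --host <name>: no per-host vars in this sketch
fi

Make it executable and point ansible at it with -i ./libvirt_inventory.sh.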