Tuesday, December 21, 2021

github ssh keys are finicky

Okay, GitHub uses some hash magic (the key's fingerprint) to figure out which account is affiliated with a particular ssh key. If you have two accounts using the same key, it has to guess (and does so poorly) which user is trying to write.

Moreover, it tries keys somewhat sequentially. If the first key in your ssh-agent (via ssh-add -L) is associated with a user, that's the only one it tries. If not, I believe it will go on to the next one (but again, if that second key is associated with multiple GitHub users, it's going to hash to one specific user.) This is all because you can't tell GitHub which user to use. De facto, the key hashes to one and only one user, and GitHub ass/u/mes that's the user you want.

But, the sad sad sad part is that it will not give you any more information.

You can pin a specific identity in your ~/.ssh/config file like this:

Host github.com
    IdentityFile /home/USERNAME/.ssh/id_githubonly_rsa
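When you genuinely need two GitHub accounts from one machine, a common pattern (sketched here with hypothetical account names and key paths) is one host alias per account, with IdentitiesOnly so the agent never offers the wrong key first:

```
Host github-work
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_work_ed25519
    IdentitiesOnly yes

Host github-personal
    HostName github.com
    User git
    IdentityFile ~/.ssh/id_personal_ed25519
    IdentitiesOnly yes
```

Then you clone with git@github-work:ORG/REPO.git instead of git@github.com:ORG/REPO.git. You can see which account a given key resolves to with "ssh -T git@github.com".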



Wednesday, October 27, 2021

elevator=none and elevator=noop are both overcome by events

You can no longer just pass "elevator=none" on the kernel command line. If you do on ~5.3 or newer, you'll see:

[    0.105899] Kernel parameter elevator= does not have any effect anymore.

               Please use sysfs to set IO scheduler for individual devices.
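For the record, the sysfs route looks roughly like this ('sda' is a placeholder device name; the active scheduler shows in brackets):

```
# Show the available schedulers for a device; the bracketed one is active
cat /sys/block/sda/queue/scheduler
# Select one at runtime
echo none | sudo tee /sys/block/sda/queue/scheduler
```

To persist across boots, a udev rule setting ATTR{queue/scheduler} on the matching devices is the usual replacement for the old kernel parameter.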

Monday, April 5, 2021

xorriso is the bomb

So, I wanted to remaster the SystemRescue ISO. Only a trivial change (a 20-character addition) was needed, but I could not for the life of me figure out how to do it.

The answer (after 3-4 different multi-hour sessions trying to figure this out) was to use a tool called xorriso, and to bear in mind that the ISO itself needs to be bootable as a burnt CD/DVD as well as via UEFI and possibly even via USB. This means it needs to be an everything bagel (so to speak). I found the "howto" for another distro and used that technique with slight differences for the SystemRescue ISO (which buries efiboot.img under archiso, since it is Arch Linux based.)

There was also this handy little guide re: xorriso. That was part of figuring this out. (I've always used genisoimage and friends prior to needing to remaster this.)
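The core trick, as a hedged sketch (the ISO names here are placeholders): have xorriso report the original image's boot recipe as mkisofs options, then reuse those options when rebuilding from your edited tree:

```
# Dump the El Torito / EFI boot setup of the original image as -as mkisofs options
xorriso -indev systemrescue.iso -report_el_torito as_mkisofs

# Rebuild from the edited tree, pasting in the options reported above
xorriso -as mkisofs -o remastered.iso [options reported above] edited_tree/
```

The reported option list is what keeps the result bootable in all the ways the original was.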

Thursday, April 1, 2021

ugly code that works

I had the thought, "ugly code that works", as I finished a PR today. The project isn't technically open sourced (YET!) so I can't say much more, but man, I worked sooooo long and hard on this trivial problem that I needed to vent.

The solution to the problem was to do something in cloud-init, then do the same thing again in a systemd unit, all of it just to work around a sysctl issue. The result is ugly code that works (in particular, it has a sleep 180 embedded in a script). There is no reason in the world that I should be writing a systemd unit that calls a two-line script where one line is sleep 180. But there was no way to get the sysctl working with a systemd.path unit. (I tried, repeatedly.)
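For flavor, the shape of that ugly-but-working unit (the unit and script names here are hypothetical, not the real project's):

```
# /etc/systemd/system/late-sysctl.service
[Unit]
Description=Re-apply sysctl settings well after boot
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/late-sysctl.sh

[Install]
WantedBy=multi-user.target
```

where late-sysctl.sh is just the embarrassing two-liner: "sleep 180" followed by "sysctl --system".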

And the code works. I also then (after getting a merge) googled, "ugly code that works", and saw someone has already written this blog (though thankfully it wasn't me this time). See: https://dev.to/tonetheman/ugly-code-that-works-4i7l

"Do not be afraid to write ugly code that works." Also, don't be too surprised if it does break. Ugly code can be fragile also.

Wednesday, March 31, 2021

terraform is logical but not natural

The subject kind of says it all. If you can use environment variables (and you can and probably should) in terraform with the invocation/usage of TF_VAR_environment_variable_name, why in the ever loving world can you not do the same in modules (or sub-modules as I think of them)? Apparently environment variables don't propagate into the submodules.

So, you basically "re-declare" when you define/invoke the submodule. See: https://stackoverflow.com/questions/53853790/terraform-how-to-pass-environment-variables-to-sub-modules-in-terraform

In my case, I'm building "N" VMs in submodules, and there is a vms.tf that has the module "NAME" {} invocations (and of course there are "N" of these), so I had to do something like:

module "FIRST" { variableone = var.variableone }
module "NTH" { variableone = var.variableone }

"N" times, and then at the top level (main.tf or variables.tf) something like:

variable "variableone" {
  description = "Cascade environment variable in terraform"
  default     = ""
}

and then

export TF_VAR_variableone="myvaluegoeshere"

in my environment. (Note the TF_VAR_ prefix: a bare export of variableone does nothing as far as terraform is concerned.)

Monday, March 15, 2021

It's been a minute and a few years....

 So, I just noticed it's been a minute since I last posted. I typically only post things that I need to find again in the future--and apparently that's been less often lately.

Today, I needed to figure out WHY IN THE WORLD my LG 4K HDMI monitor was popping throughout the day. And, indeed I did, but first, how did I get here?

I am a longtime Ubuntu user. This machine was built with Ubuntu in 2017-11 and upgraded to LTS release in 2018. I do daily apt updates but recently bit the bullet and brought it up to 2020 LTS release. That went very well and I see a number of improvements. However, it also started making a LOUD popping noise that I couldn't tie to any particular user activity.

A bit of googling later and I found that it was likely related to snd_hda_intel module and its power_save settings. power_save defaults to 1 (on this machine and similar) as it is a laptop and power_save is a good thing for a laptop. However, 98% of the time, it is now running primarily as a desktop. You know, COVID-19 and nowhere to go....

I confirmed the issue by further googling and found my friend Major Hayden's post when I ran into this same thing: https://major.io/2019/03/04/stop-audio-pops-on-intel-hd-audio/ it has a better writeup, more detail etc. But I post here so that I can easily find this myself for future me. I also thanked past Major here: https://twitter.com/davidmedberry/status/1371588276176363521
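For future me, the gist of the fix (the modprobe.d file name is my choice, not canonical):

```
# See the current setting: 1 (or higher) means power saving is on
cat /sys/module/snd_hda_intel/parameters/power_save
# Turn it off for this boot
echo 0 | sudo tee /sys/module/snd_hda_intel/parameters/power_save
# Persist across reboots
echo "options snd_hda_intel power_save=0" | sudo tee /etc/modprobe.d/disable-audio-powersave.conf
```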

Friday, October 25, 2019

Deja Vu All Over Again

I recently joined Cray Research, the supercomputer folk, after 1.5 years doing pre-sales engineering for a technology I love. However, I didn't actually love (nor even really like) sales itself. So I'm back to doing engineering.

Two days after Cray hired me, their merger with Hewlett Packard Enterprise (HPE) completed, so I'm back at HP (but now HPE). Most of the impact of that change occurs on January 1, 2020.

So not only the yogiberra-ism of "Deja Vu All Over Again" but also the godfather-ism of "Just when I thought I was out, they pull me back in."

Monday, July 1, 2019

ansible tower token authentication

Reminder to self mostly, when refreshing your memory about tokens, start with this page:
ansible authentication methods and tokens

I'll come back here when I have something more substantive to say about this. The PAT token is dead easy and straightforward, and has naught to do with Point After Try.

Friday, April 12, 2019

letsencrypt with certbot

Well, the title is the task I was trying to accomplish but I kept getting an error. Turns out, the awscli in Ubuntu is seriously out of date. It gives an error like:
'AWSHTTPSConnection' object has no attribute 'server_hostname'
when using certbot (more on that below). The simple and easily googleable fix was to remove the ubuntu awscli package and pip install a newer version:
sudo apt-get remove awscli
pip install --upgrade awscli
I'd recommend doing that pip install in a venv (python virtual environment), especially if you have other "cloud tools" installed that way.
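A minimal sketch of that venv approach (the paths are just my habit, not gospel):

```
python3 -m venv ~/venvs/cloud
. ~/venvs/cloud/bin/activate
pip install --upgrade awscli
```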

Now, why was I doing this and what does the title really mean? Most websites these days need an "SSL cert" signed by a certificate authority. Really folks, you need to be doing this these days. Many businesses will not let you browse to a site with a self-signed cert and won't let you browse to a non-https site at all. But this is super easy, as Let's Encrypt and certbot do all the work. I merely followed the steps here:

(Make sure you have certbot installed first. Your OS may have it packaged or "brew install certbot" on a Mac.)

And as with all of my recent posts, this is just mostly so I won't spend another 1/2 day trying to remember or recreate this.

And in all fairness, there are also a number of Ansible playbooks and/or roles for doing this. Here's some info on that:
(Ansible letsencrypt module was renamed more generically as "ACME" as it actually uses ACME and Let's Encrypt adheres to that web standard.)

Saturday, February 23, 2019

More fun with Ansible

I've been in an Ansible Solutions Architect (SA) role at Red Hat for about a year, but I still learn new things about Ansible every day.

When I explored ARA I first became familiar with Ansible callbacks (which get called at the conclusion of tasks, plays, etc.), and I've been needing to make some modifications (filters, etc.) to the PLAY RECAP at the end of an Ansible play. Note that there are numerous pre-written callbacks listed here, but occasionally you need to write a custom one. In this case, I just wanted a better understanding of what those pre-written ones can do. And, lo and behold, there's a nicely documented page that shows you that.

Thank you Random Hero.

Saturday, February 9, 2019

Weird tab behavior in Google Chrome

I run Chrome as my primary browser (so far) and it has never failed me. Yesterday however, I began to see a very strange behavior. As soon as I would click on any tab (other than the first tab), Chrome would start cycling down through the tabs. I.e., if I clicked on the 4th one, it would switch to that, then to the 3rd, then the 2nd, and finally the first tab (where it would remain.)

Switching to a new window (with only one tab) would work fine but as soon as another tab was opened, the same behavior.

Survived through reboots, chrome upgrades etc.

I think I have isolated this to either a funky (dirty?) keyboard or a flaky mouse. Once I disconnected both the external keyboard and mouse, things returned to normal. Now doing the bisect to see if it's the mouse or the keyboard. One note: my kitchen has been under renovation for the last month. Consequently, I've done a lot more "eating over the keyboard" than normal, so maybe I just dropped some weird crumb that effectively does control-pageup repeatedly (or some other previous-tab command over and over.) I didn't notice this behavior in other "tab oriented" programs (such as gnome terminal or Firefox.)

Updates here if I further resolve this.

Oh and some search terms in case anyone else runs into this:
(occurred in both)
tab switching
autotab switching
tab bug
google chrome tab bug
google chrome tab autoswitching bug

(Oh and for those playing along at home: restarted Chrome numerous times, disabled all extensions, rebooted, upgraded Chrome, upgraded all Ubuntu packages--basically did all the "best practices" I could think of to work around this. The only workaround seems to be disconnecting the mouse and keyboard, which were plugged into a USB-C dongle providing legacy USB connections. System is an HP Spectre x360 15" touch with 8th gen i7 running Ubuntu 18.04.2.)

Mouse seems to be working fine.

Blew some dust/gunk/ick out of my keyboard and now everything seems to be working again. (The peripherals are attached in the same order, same location.) So LIKELY the keyboard? The world may never know (and I'm sure the world will never care.)

Tuesday, November 27, 2018

TIL: Ansible engine raw module "needs" gather_facts: no

Okay, the title says it all, but let's unpack that.

Ansible Playbooks are instructions for running ansible modules against a target host. They can be very very simple. Here's one of the simplest:
- name: A very simple playbook
  hosts: all
  tasks:
    - name: pingo
      ping:

This merely runs the ansible "ping" module on "all" hosts. (I.e., whatever hosts are passed in on the command line when this playbook is called.)

A note about the ping module. It is not the normal networking definition of "ping". Network folk will be accustomed to using "ping" to send an ICMP packet to a node (at which point the node would typically send an ICMP ack.) Rather, the ansible module "ping" is a check that the node is up and that the basic needs of ansible are supported on the node, i.e., python is installed.

So... the inquiring mind asks what do you do in a situation if python is NOT installed? Can you still take advantage of some of Ansible? But of course.

The ansible "raw" module allows you to basically do something like the following:
# raw: uptime
# ssh targetnode and_execute_this_command
ssh target.example.net uptime

So here we'd get the uptime of the target node (assuming it was running ssh, we had login authority, and that uptime was installed and in the default path of the effective user.)

So, it seems like it would be straightforward to create an ansible playbook that takes advantage of the raw module.

- name: raw uptime
  hosts: all
  tasks:
    - name: Run a raw command
      raw: uptime

and here we run into issues. This playbook won't work on a node that doesn't have python installed. (It will work on one that does.) Why is that? Because of the "secret sauce" called fact gathering. Every playbook, as it runs, will run the ansible "setup" module to gather facts on the node before running any of the explicit tasks. The setup module is an implicit task and is noted in the module reference: "[t]his module is automatically called by playbooks"

NOTE: I've scattered some handy links within this document so that you can learn more about these. I'd recommend following them and then coming back here after you have familiarized yourself with ansible, modules, ping, raw, setup, and gather_facts.

So, how do we make this work then? If you read the gather_facts link, you probably know that you can bypass it very simply. You set a "gather_facts" to no in your playbook. Consequently you end up with this as the right playbook for a node without python where you want to know the uptime.

- name: raw uptime
  hosts: all
  gather_facts: no
  tasks:
    - name: Run a raw command
      raw: uptime

So a simple one line addition.
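The same thing works ad hoc too, no playbook needed (the inventory file name here is illustrative):

```
ansible all -i hosts.ini -m raw -a "uptime"
```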

And how did I get in this situation? One of the most common cloud operating systems (aka cloud images) is one called "cirros". Cirros is a very minimal linux and as such, it does not include python. Moreover, there really isn't an effective way to "add" python to it (though it possibly could be done with a statically built binary--I'll leave that as an exercise for the reader.)

Cirros is frequently used in a cloud environment (i.e., OpenStack) to validate that the cloud itself is working well. From within cirros you can log in (as it provides transparent credentials) and check on the networking, etc. Basically it's a quick and dirty way to make sure your cloud is operating as intended.

I regularly spin up one or more cirros instances as soon as I build an openstack--whether that be an all-in-one devstack or an entire production cloud. In both cases, cirros is my "go to" tool to validate the cloud. (Thanks Scott.)

... and one more thing, you would normally just run the command uptime using the command module to get the uptime. But doing so requires the python infrastructure ansible relies on. Here's that "normal" or typical way.

- name: A very simple command
  hosts: all
  tasks:
    - name: uptime
      command: uptime

and even if you add "gather_facts: no" to it, the command module itself still requires python, so you really really need the raw module and the "gather_facts: no" setting.

Friday, August 10, 2018

Life of a #Remotee

I work remotely for Red Hat. Primarily at home but also in a coffee shop with several co-workers. And, oh yeah, forgot to mention, I travel heavily.
So I need to be able to work remote. But I don't want to take all my gear with me, so I leave some of it at home and plugged in. And mosh allows me to connect/reconnect etc.

I (re-)discovered mosh last weekend when prepping for a trip. I didn't want to carry my bulky laptop to the mountains that day, so I set up remote access through my cable modem. Of course, it's trivial to set up a port forward from my new Google Wifi AP and routers to my home machine. But that gives you connectivity, not persistence. So I pulled down the "mobile shell" mosh and set it up quickly.


I decided to do this blog post after typing internally:
So, I started a mosh session from home to home last Sunday. I've been to Denver Airport, on board a Southwest flight, a Saint Louis hotel, a Saint Louis enterprise customer, and back, and that session just keeps running. I had heard of mosh before, but using it is even easier than I expected. I used to "work around" this with screen sessions, but mosh is even simpler than that.

So, setup is easy peasy. Install mosh. Find a UDP port you can forward back to your persistent (home) node. You probably also want to forward back a TCP port for ssh.

mosh --ssh="ssh -p$SOMEPORT" -p $SOMEUDP  $HOMEIP

You can find your home ip (from home) with this:

export HOMEIP=$(curl ifconfig.co)
# but I save this to a file as well, so maybe:
export HOMEIP=$(curl ifconfig.co | tee ~/bin/myhomeip)

You can port forward the default ssh port (22) or something slightly more obscure. The default UDP port range for mosh starts at 60000 through 61000. I picked a port in that range.

Both SOMEPORT and SOMEUDP need to be port-forwarded (using your router setup) to the actual node you want to use.

One other thing you will want to check out as a #remotee is WireGuard. I'll write it up once I've switched my vpn-ness over to it. WireGuard currently uses some packages to install a kernel module that gets built out of tree (via DKMS). See wireguard, hat tip to Jess Frazelle for Dockerfiles mentioning WireGuard and oh yeah, this guy.

Saturday, July 21, 2018

Fedora Kernel (well 4.17 kernel) issue resolvable

I've been using a Thinkpad p50 for work since I joined Red Hat. And I'm running Fedora on it instead of Red Hat Enterprise Linux workstation so that I can be more current.

However, that bit me recently when I upgraded to Fedora 28. The built-in Intel dual-band AC 8260 (rev 3a) failed to work on the newer 4.17 kernel. This led me down some dark roads as I switched to an rc 4.18 kernel which had even more problems (though that kernel did have working iwlwifi.)

A productive Saturday morning, led me to patch and test my 4.17 Fedora 28 kernel successfully. The short version is you need to revert one change in the kernel as detailed here:

and then build Fedora kernel rpms (which I haven't done in more than a decade and a lot has changed in that decade). See:

and it has one thing out of date as well. The command:
# sudo /usr/libexec/pesign/pesign-authorize-user

should be just:
# sudo /usr/libexec/pesign/pesign-authorize
A Fedora rpm kernel build gives you time to mow the yard, bike to Starbucks, etc. So don't expect it to finish in minutes. Use the scale of hours.

PS. In linkifying this post, I see that 4.17 COULD HAVE BEEN A CONTENDER for Linux kernel 5.0. That would have scared me enough to hold off... oh well. YMMV. Enjoy.

Wednesday, June 27, 2018

TIL: boto3 can't parse env variables properly

So in today's saga of fighting the Internet, I bring this salient point to your attention:

boto3 can't parse environment variables for AWS credentials properly

and it will error kind of like this:

 ValueError: a Credential=WHACKADOODLEGGET5Q\r/20180627/us-east-1/sts/aws4_request

or possibly like this:

caught_exception\nValueError: Invalid header value '
    AWS4-HMAC-SHA256 Credential=WHACKADOODLET5Q\\r/20180627/us-east-1/ec2/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=8f84f8d811f4dcb45da5f8fadf90ae8390d5d358b4024bf0d964090032dea1c3'\n", 
    "module_stdout": "", 
    "msg": "MODULE FAILURE", 

This appears to be an error in parsing the  \r portion of the URI. (Based on a few google queries.)

Using ~/.aws/credentials seems to do the trick (with the exact same key values):
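That file is the standard INI layout; the values below are placeholders, not my keys:

```
# ~/.aws/credentials
[default]
aws_access_key_id = AKIAEXAMPLEEXAMPLE
aws_secret_access_key = exampleSecretKeyWithNoStrayCarriageReturn
```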


The environment variables I was using were of course:

export AWS_SECRET_ACCESS_KEY=bn+++krandometyperyforwhackadoodle1YjKMWqzv

Thursday, June 7, 2018

Find your libvirt instances...

I'm at Red Hat these days and working on cloud manageability which you will see me write as m11y more often than not.

I recently had a need to utilize Ansible for a demo and created a slug of VMs to use in my ansible inventory. However, there is no way (no obvious way?) I can find with virsh or virt-manager to determine what IP got assigned to a VM. A bit of digging shows that when using the cloud-images, dhcp is used to acquire an IP address and that dnsmasq manages those IP addresses.

In my case, these VMs were attached to virbr0 and it is that instance of dnsmasq that I needed to inspect.

Like most of my very short blog posts, I'm just writing this down so I can find it myself the next time I need it. In this case, the file that "issued" the IP addresses is:

/var/lib/libvirt/dnsmasq/virbr0.status
That shows the MAC and IP address associations (but no indication of "which" libvirt domain they were actually associated.) A further exercise for me and the reader will be to tie the domain back to the IP address (if possible.)

One other related note: If you use "virt-clone" to clone a cloud-image that has been previously booted, it will typically boot fine but WITHOUT an IP address. This is due to cloud-init not properly re-running. Blowing away some of the first run cloud-init stuff takes care of that.

sudo rm -rf /var/lib/cloud
sudo rm -rf /var/log/cloud-init*

(This may be overkill to "purify" a cloud instance but it certainly does the trick.)

You need to re-run cloud-init at this point (either directly or a quick reboot) and you should find that your instance has an IP address now.

It's a pretty quick and easy step to convert virbr0.status into an inventory...

sudo cat /var/lib/libvirt/dnsmasq/virbr0.status |grep ip-address |awk '{ print $NF }' |sed  -e 's/"//' -e 's/",//'

or you could turn your inventory into a dynamic inventory based on that file, exercise also left to the reader (but be sure to make the output valid JSON dict.)
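A minimal python sketch of that exercise, assuming the JSON layout dnsmasq writes to virbr0.status (a list of lease objects, each carrying an "ip-address" key); the group name is my choice:

```python
import json

def leases_to_inventory(leases_json, group="libvirt"):
    """Turn dnsmasq lease JSON (the contents of virbr0.status) into a
    minimal Ansible dynamic-inventory dict, keyed by a single group."""
    leases = json.loads(leases_json)
    hosts = [entry["ip-address"] for entry in leases if "ip-address" in entry]
    return {group: {"hosts": hosts}, "_meta": {"hostvars": {}}}

# Demo against a synthetic lease entry shaped like virbr0.status contents;
# in real use you would read /var/lib/libvirt/dnsmasq/virbr0.status instead.
sample = '[{"ip-address": "192.168.122.34", "mac-address": "52:54:00:aa:bb:cc", "hostname": "vm1"}]'
inventory = leases_to_inventory(sample)
```

Printing that dict with json.dumps gives you something a `--list`-style dynamic inventory script could emit directly.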

Thursday, August 3, 2017

Replacing Android device with a new one

So.... TILLW, if you have a new android phone, the very first thing you want to do is tell it you want to copy an existing phone you have in your possession. There appears to be no way (other than resetting the phone back to factory defaults) after the first screen to have one phone be setup like another. This is inane.

TIL it ain't so easy to enable root login in ubuntu

So... TIL today. You can enable root login in sshd_config. You can set a password for root as well, of course. However, you can't actually ssh into the box if you have any keys (or at least "some" keys) in your ssh-agent. By running ssh-agent bash (a fresh agent with no keys) and then logging in as root to the target, I get asked for a password. If I instead have an ssh key loaded, I get the message:
Please login as the user "ubuntu" rather than the user "root".

Sunday, June 5, 2016

Android Tap & Go... Almost as Easy as It Sounds

I had one teensy-weensy little problem with Android's Tap & Go feature for new-phone setup. Although the NFC portion works fine, it is also VERY IMPORTANT to enable Bluetooth PRIOR to running the sync. The sync code will temporarily turn on Bluetooth, but it won't actually work. Just enable Bluetooth on the OLD phone prior to trying the Tap & Go and then it truly will be Tap & Go.

I had no luck googling for this issue or solution, so noting it here for others to find. The phrase "Sorry, Something Went Wrong" will appear on your new phone shortly after making the sync chime.

Wednesday, May 11, 2016

TIL From Users Today

I learn something about OpenStack each and every day and one of my broadest sources of OpenStack knowledge comes from my customers, the users of Time Warner Cable's OpenStack implementation.

Today, users taught me, passively, by using it, about Shelving an instance.

Suspending an instance puts it to sleep (via CPU suspension states) but the instance (virtual machine) is still using all of the memory, disk, and cpu resources it was before (though the actual host CPU usage in this case is very, very minimal.) Subtly different yet similar is Pausing an instance. (I've never used this myself but am now checking to see if my users are using it.)

Stopping an instance still has the instance assigned to a hypervisor (essentially a compute host) and still "sort of" consumes resources on that host, most notably the ephemeral disk.

Shelving an instance makes a glance image snapshot of the instance and then removes all reference from the original hypervisor and compute host. It is consuming glance snapshot space in so doing but nothing on the hypervisor or compute host. So it is using the most minimal amount of resources possible.

Now to make this a real blog, I need to go add some links so that this is more shiny.

Monday, July 27, 2015

Time Warner Cable (TWC) OpenStack Summit Talks in Tokyo -- Vote Now

(Using my soapbox to pitch some Time Warner Cable proposed talks for Tokyo.)

Many of these are our experience running an OpenStack Cloud in production at Time Warner Cable. Others are particular areas of interest. And one is just for fun....

Click the shiny link to vote up or vote down, however you feel.

Abstract, author(s), and track; each linked (on the original post) to the abstract on openstack.org for voting:

- Duct Tape, Bubble Gum, and Bailing Wire (Eric Peterson, TWC, GoDaddy and others)
- Fernet Tokens: Why They're Better and How to Switch (Matt Fischer and the Fernet team; Ops)
- Moving a Running OpenStack Cloud (Matt Fischer, Eric Peterson)
- Customer Pain Points Revealed in TWC's OpenStack (Kevin Kirkpatrick, David Medberry)
- Building the right underlay, lessons learned (Sean Lynn, Darren Kara)
- Monitoring OpenStack with Monasca -- Totally Worth the Effort (Brad Klein, Ryan Bak)
- Overcoming the Challenges: LBaaS in practice at TWC (Jason Rouault, Sean Lynn)
- Upgrading OpenStack Without Breaking Everything (Including Neutron!) (Clayton O'Neill, Sean Lynn)
- Integration & Deployment Testing of an OpenStack Cloud (Clayton O'Neill, Matt Fischer)
- OpenStack Trivia (David Medberry, Kevin Kirkpatrick)
- Owls, Cats, Toads, Rats: Managing Magical Pets - VM persistence (David Medberry, Craig Delatte; Enterprise IT Strategies)
- Other Ways to Contribute (David Medberry, Eric Peterson; How To Contribute)
- LibVirt: Where are we today? (David Medberry, Sean Lynn; Related OSS Projects)
- OpenVSwitch: Where are we today? (David Medberry, Sean Lynn; Related OSS Projects)
- Monitoring: How to get the outcomes you want and want the outcomes you get! (Steve Travis, Ryan Bak, Brad Klein; Monitoring / Operations)
- An all SSD Ceph cluster: Tips, Tricks, and lessons (Bryan Stillwell, Craig Delatte)
- The evolution of Openstack Storage Services at TWC (Adam Vinsh, Craig Delatte; Enterprise IT strategies)
- Building a self-repairing cloud with AI (Ryan Bak; Monitoring / Operations)
- Configuring Swift Using Puppet-swift: considerations and examples (Adam Vinsh; Ops)

Thursday, May 28, 2015

Request for Reviews from Online Stores

Dear googlemonopriceamazon, please don't ask me to do reviews for items delivered via slow delivery. If I request slow delivery, there is a REALLY GOOD CHANCE I'm not going to use the item right away. So, don't ask me to review it when I haven't even taken it out of the package. You should only ask for reviews when things are shipped overnight. (Though arguably, if you are in that kind of hurry you don't have time to do reviews.) So I guess, I'm just saying, stop asking for reviews altogether.

And if I could have expressed this in a brief way, it would have been a tweet.

And, no, don't file a patent for this. I freely grant this info to the world....

Sunday, May 17, 2015

OpenStack, a really big deal

Okay, most folks know I've spent the last 4 years involved with OpenStack and have been attending summits since Boston. This is the first time however I've been overwhelmed with the number of other folks attending the summit and we aren't even there yet.

I'm at 36,000 feet over NW USA headed to Vancouver from Denver on United UA 323. By sight alone, I've identified 10 different companies sending folks to OpenStack (and I'm sure there are many more companies represented on the flight that I don't know.) About 30 folks that I know (or at least know of) are on this very flight for a week of design, operations notes exchange, and marketing. WOW. I'm expecting more than 6,000 folks in YVR this week--maybe 7-8K.

Friday, March 27, 2015

Cinder Multi-Backend GOTCHA

When making puppet changes to handle OpenStack Cinder with multiple backends, we created a really painful problem. The multi-backend setup basically moves our Cinder service (with a Ceph backend) from the default host name of our cluster (like cluster01) to a backend-specific name (cluster01@ceph) so that we can add additional backends (like cephssd and solidfire and 3par).

Unfortunately, this had the really bad side-effect of dropping the "host" that was providing just "cluster01" service. All attached volumes continued to work fine. However, it became impossible to launch new instances with these volumes, terminate (properly) instances with these volumes, or delete these older volumes.

The fix (once you understand the problem) is very straightforward:
cinder-manage volume update-host --currenthost cluster01 --newhost cluster01@ceph

NOTE: Don't do this unless these are really the same services. Presumably safe if you have a single cinder backend and are prepping for multi-backend.

Wednesday, March 11, 2015

OpenStack Clients on Mac OS Yosemite

Clearly I haven't written a blog post recently enough. That must explain the negative karma behind running into a tricky problem.

As we all know, "pipin' ain't easy". I held off upgrading to Yosemite for a long time so as not to break my functioning Mac with OpenStack clients. However, I found time and reason to upgrade last weekend and had helped several others through the Mac client solutions a week prior.

Alas, it was not to be a sweet journey for me. I ended up with the dreaded xmlrpc_client issue. Some coworkers determined that you could resolve this by uninstalling "six" and then reinstalling "six". Unfortunately that doesn't really work. A "sudo -H pip uninstall six" did do the right thing, but "easy_install six" never succeeded. And I should note the reason the pip six doesn't work in the first place is that Mac OS Yosemite itself ships a downrev version of six.

The trick, at least in my case, was to "type -a easy_install" and note that there are two versions of easy_install. Pick the second one and you are off to the races. Here are the steps if you are still reading:

dmbp:~ dmedberry$ sudo easy_install six #fails
Traceback (most recent call last):
  File "/usr/bin/easy_install-2.7", line 7, in <module>
    from pkg_resources import load_entry_point
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 3020, in <module>
    working_set = WorkingSet._build_master()
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 616, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 629, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 807, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: setuptools==1.1.6
dmbp:~ dmedberry$ type -a easy_install
easy_install is /usr/bin/easy_install
easy_install is /usr/local/bin/easy_install
dmbp:~ dmedberry$ sudo /usr/local/bin/easy_install six

Friday, December 12, 2014

Ubuntu Sound Gotcha - Missing Devices

I was laid up at home for a day so instead of using my laptop as a laptop, I went ahead and "docked" manually to my dual screen gear (that has largely been attached to my "work" Macbook Pro.) I used the thunderbolt and hdmi connectors on the side of my Spectre XT (and it was the first time I've ever gotten video to work out of the t-bolt. WIN.) The monitors are Acers with DVI connectors so I'm using some monoprice cabling to convert to T-bolt and HDMI.

That setup will probably explain to some of you what went wrong... but sadly it did not initially to me. I was using the setup and listening to music/youtube etc. After some point in time (and crucially, I'm not sure when exactly), I lost my audio output. The sound notify/menubar/whatever device was still present and still indicated I had sound output, but nothing was being produced.

I entered into the sound settings and lo and behold, there was no device listed in the "Play Sound Through" panel of the output setting. (And I didn't even notice that right away as my eyes were laser focused on the completely zeroed out volume indicator at the top that wouldn't move. Of course it wouldn't move as there was no output device.)

I gradually pieced together what may have happened--I suspect at some point AFTER connecting the monitors, I either suspended/resumed or actually rebooted. It appears that something in the audio detection algorithms determined I should be using HDMI audio (however, these Acer monitors have no speakers.) And by gradually pieced together, I mean I futzed around with this for a couple of hours.

Once it dawned on me that the only thing that I had really changed was the monitors, I went ahead and removed them (which I had done several times during the futzing around) and shutdown entirely and rebooted. VOILA. "Built-in Audio" device reappears and I'm another happy camper.
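In hindsight, a full shutdown may not have been strictly necessary: PulseAudio can list and switch output devices from the command line. A sketch (the sink name below is just an example from a typical Intel HDA machine; yours will differ):

```shell
# List the available output devices (sinks)
pactl list short sinks
# Point the default output back at the built-in card; take the exact
# sink name from the listing above
pactl set-default-sink alsa_output.pci-0000_00_1b.0.analog-stereo
```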

Sunday, November 10, 2013

Missing USB devices in KVM or QEMU with Ubuntu Saucy?

I've been a long-time user of Fitbits. These are small devices you wear on your person to track activity. For instance, the Fitbit Ultra can track steps taken and stairs climbed, among other things. The device integrates with an application and has a nice webapp to display your history.

Setup on the Ultra was done with a Windows machine, as there is (well, was) no Linux client. I borrowed my wife's machine (she already had the software installed, since she too had an Ultra). Once the device was set up, some guys at Canonical created a library for Ubuntu that would allow you to sync the device properly from Ubuntu (without running Windows). Windows was only needed at setup time.

Fitbit continued to develop new monitors and I recently ordered the Fitbit Force. The Force is essentially a smartwatch. It has a superset of features of the Ultra. It also requires setup via Windows (or Mac) and I still don't own one of those. I figured I'd just use KVM and a Windows guest image.

I'm running Ubuntu Saucy with these versions of libvirt and qemu:

libvirt-bin 1.1.1-0ubuntu8
qemu-system 1.5.0+dfsg-3ubuntu5

Fitbit Force comes with a small dongle you plug into your usb port. (Essentially, this is a bluetooth 4 device but it is single purposed, pairing only with the Fitbit Force.) 

So I added the USB device info to my virtual machine description (using the virt-manager GUI). This allows for specific USB hub and device passthrough. However, the device never showed up in Windows (as near as I could tell). I spent hours debugging this, changing perms on the USB device tree, running as root, etc., to no avail.
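For what it's worth, the same passthrough can be set up without the GUI via virsh. The guest name and the USB vendor/product IDs below are hypothetical; take the real values from lsusb:

```shell
# Find the dongle's vendor:product ID
lsusb
# Describe the device for libvirt (the IDs here are placeholders)
cat > dongle.xml <<'EOF'
<hostdev mode='subsystem' type='usb' managed='yes'>
  <source>
    <vendor id='0x2687'/>
    <product id='0xfb01'/>
  </source>
</hostdev>
EOF
# Attach it to the running guest ("windows-guest" is a placeholder name)
virsh attach-device windows-guest dongle.xml
```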

To troubleshoot, I used a USB device lister, USBDeview, from Nirsoft. It's freeware (free as in beer). It did a nice job of listing my past and present USB devices by walking the USB tree and the registry entries of past devices. Nope, no Fitbit Force.

A bit of googling and debugging led me to this gem: "Apparmor blocks usb devices in libvirt in Saucy," bug #1245251. By using the workaround in comment #1, my Fitbit Force (and any other USB device I requested) was now available in the guest.

(I had previously tried the apparmor settings mentioned in the Managing KVM page, to no avail. Those instructions pre-date Saucy.)

Hopefully by the time anyone stumbles on this particular post, this bug will already be fixed in Saucy and Trusty, but I like to publish these lessons learned Just In Case.

Oh, and one other note, once the initial setup is done, you can pair your smartwatch to your android phone... once they add the type of android phone you have. The Nexus 5 is apparently slightly too new..... :( so I'll be using that Windows image a bit more than I planned.

Monday, August 5, 2013

Using kdump on Ubuntu in Azure

This is another of my occasional posts that may help the next guy. I call them YAHTNG: Yet Another "Help The Next Guy" blog entry...


kdump is a tool that allows you to capture (in a file) the Linux kernel state when it crashes (oops). It uses the kexec functionality that has long been part of the Linux kernel (since 2004, if memory serves). In order to use this on Ubuntu, you install the linux-crashdump metapackage, which in turn depends on the right bits and pieces.

apt-get install linux-crashdump

On different versions of Ubuntu, different bits and pieces get installed: prior to Raring (13.04) you get one set of packages; on Raring and newer, a different set. In either case, on Microsoft's Azure cloud and elsewhere under the Hyper-V hypervisor, you will get a hang if you just install the linux-crashdump package and then experience a crash. This is due to some Azure-specific kernel modules that get loaded in the kexec/kdump kernel. You need to exclude these modules, i.e., blacklist them. Here's how.

Older Ubuntu Releases including Precise

In 12.04 (Precise) and 12.10 (Quantal), you want to edit /etc/init.d/kdump (this is the script that runs at boot time to configure the kdump kernel; the kdump kernel gets loaded into memory and configured via this script):

--- /etc/init.d/kdump 2013-06-28 00:09:22.400504335 +0000
+++ kdump.nohyperv 2013-06-28 00:16:48.903733116 +0000
@@ -48,6 +48,7 @@ do_start () {
  # Append kdump_needed for initramfs to know what to do, and add
  # maxcpus=1 to keep things sane.
  APPEND="$APPEND kdump_needed maxcpus=1 irqpoll reset_devices"
+ APPEND="$APPEND ata_piix.prefer_ms_hyperv=0 modprobe.blacklist=hv_vmbus,hv_storvsc,hv_utils,hv_netvsc,hid_hyperv"

  # --elf32-core-headers is needed for 32-bit systems (ok
  # for 64-bit ones too).

As you can see, we are simply prohibiting the Azure kernel modules hv_vmbus, hv_storvsc, hv_utils, hv_netvsc, and hid_hyperv from loading in the kdump kernel. They still get loaded in the regular Azure kernel (and you will want to keep them there for performance and behavior reasons). However, if they load in the kdump kernel, they won't actually work and will "hang" the kdump kernel while they try to connect to the Azure (or Hyper-V) services. Additionally, we tell ata_piix NOT to prefer the Hyper-V path by setting ata_piix.prefer_ms_hyperv=0.

After you modify this init script, you will want to reboot. (But take note and read the last section on the crashkernel setting, as you will likely want to make that change as well prior to rebooting.)

Newer Ubuntu Releases (Raring and the upcoming Saucy)

The newest releases of Ubuntu include an additional package that handles kdump configuration called kdump-tools. This package manages the kernel modules in a simple config file /etc/default/kdump-tools. You can edit that file to blacklist the appropriate modules:

#KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb"
KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb ata_piix.prefer_ms_hyperv=0 modprobe.blacklist=hv_vmbus,hv_storvsc,hv_utils,hv_netvsc,hid_hyperv"

In addition to telling ata_piix NOT to prefer the Hyper-V path, this blacklists the same kernel modules as previously mentioned.

Smaller Images

Low-memory (ExtraSmall, Small) Azure instances (really, any low-memory machines, including small physical ones) unfortunately run into bug #1206691, "default crashkernel setting rarely works on a system with little memory." You will need to modify /etc/grub.d/10_linux so that these instances reserve a 128M crashkernel. Do this by simply lowering the 2G boundary in the range here:

Before:

# add crashkernel option if we have the required tools
if [ -x "/usr/bin/makedumpfile" ] && [ -x "/sbin/kexec" ]; then
    GRUB_CMDLINE_EXTRA="$GRUB_CMDLINE_EXTRA crashkernel=384M-2G:64M,2G-:128M"

After:

# add crashkernel option if we have the required tools
if [ -x "/usr/bin/makedumpfile" ] && [ -x "/sbin/kexec" ]; then
    GRUB_CMDLINE_EXTRA="$GRUB_CMDLINE_EXTRA crashkernel=384M-700M:64M,700M-:128M"

Once you have made this change, be sure to update grub:

sudo update-grub

so that the change will take effect. You will also want to reboot. Then you can validate the change by inspecting the boot command line:

cat /proc/cmdline

and see that the new value is now shown.

ubuntu@bug1195328-1210:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-3.5.0-36-generic root=UUID=39eb48d3-958a-48e0-896e-b6b03cc2342a ro crashkernel=128M console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300
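Another way to confirm the reservation actually took effect (a sketch):

```shell
# After reboot, the reserved region shows up in /proc/iomem
grep -i "crash kernel" /proc/iomem
# And the kernel log records the reservation at boot
dmesg | grep -i crashkernel
```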

Reference Material

The official references for configuring Ubuntu for kdump are here:

and you should refer to them for procedures for testing and verifying your crashdump setup.
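As a quick sketch of the standard smoke test (do this ONLY on a disposable test instance; it deliberately panics the machine):

```shell
# 1 here means a crash kernel is loaded and ready
cat /sys/kernel/kexec_crash_loaded
# Enable the magic sysrq key, then trigger a panic on purpose
echo 1 | sudo tee /proc/sys/kernel/sysrq
echo c | sudo tee /proc/sysrq-trigger
# After the box reboots, the dump should be under /var/crash/
```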

Microsoft Azure has some notes on the kernel modules here:

Sunday, July 14, 2013

Fun with QR Codes

In honor of xkcd's cartoon today, I thought I'd make myself a QR code: a self-portrait.

Be sure to follow the QR code in xkcd a couple times and then come back and try this self-portrait. Oh, and you can do your own QR code art at http://www.qrpixel.com/

Tuesday, July 9, 2013

OpenStack Programs Core Developers

It seems to be something I look up fairly regularly: A listing of OpenStack core developers--either so I can get a +2 or just because I need to know if someone is on or not-on a given list.

I found the canonical list only after initially proposing the wrong list in this blog--and apparently that's a fairly common mistake. So I'll link to the real lists and explain how you might be referring to the wrong ones as well.

Official OpenStack Programs
OpenStack Technical Committee
Lastly, I ran across OpenStack Planet Core Developers when I was creating this list. They may be an aggregator team (i.e., bloggers who get aggregated). And hopefully, this blog will show up in the planet soon.

Now for the wrong list. Not so long ago, much of OpenStack was managed in Launchpad. Consequently, there is also a somewhat correlated set of -core projects in Launchpad. However, I'm not going to reproduce them herein, in order to avoid perpetuating them. I will mention the bug that has been posted to help clean them up though: https://bugs.launchpad.net/openstack-ci/+bug/1160277 and it is listed as in progress and did see some activity last month. If you happen to stumble onto this blog post and have some ownership over those dangling -core teams or other defunct/obsolete Launchpad teams, go ahead and clean them up (pretty please).

Oh and one other editorial footnote: OpenStack now refers to the individual areas of development as programs, not projects as it used to. You might want to update your mental model to that terminology. Many thanks to ttx for the review of this document (though all errors and faux pas are mine.)

Friday, July 5, 2013

VirtualBox Host-Only Networking

VirtualBox allows one to configure a VM with host-only networking. This can be useful if you are connecting a number of VMs together and need to put them on the same switch/bridge.

However, it's darn frustrating to figure out how to enable it as all of the googling and manuals indicate you just enable it by selecting settings within the VM.

What they fail to mention though (but is covered in the built-in help in VirtualBox) is that you must first create a device for this host-only networking to use. From the main VirtualBox window, choose Preferences (under the File menu). That will bring up the global preferences for the VirtualBox hypervisor. Click "Network" and then the "+" sign to add a host-only network device (typically vboxnet0).

It's possible that some versions of VirtualBox create one of these at install time, but on Ubuntu, such is not the case. Now you can create a number of virtual machines and put them on the same network. Your host OS will also now show a NIC for it. Here's an example from my laptop's OS:

medberry@handsofblue:~$ ip a show dev vboxnet0
6: vboxnet0: mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 0a:00:27:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet brd scope global vboxnet0
    inet6 fe80::800:27ff:fe00:0/64 scope link 
       valid_lft forever preferred_lft forever

and that same vboxnet0 is now an option when you select host-only networking in the VM.
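For the command-line inclined, the same setup can be scripted with VBoxManage; the VM name "myvm" below is a placeholder:

```shell
# Create the host-only interface (vboxnet0 the first time) and list it
VBoxManage hostonlyif create
VBoxManage list hostonlyifs
# Give the host side an address (this subnet is VirtualBox's usual default)
VBoxManage hostonlyif ipconfig vboxnet0 --ip 192.168.56.1
# Attach a VM's first NIC to the host-only network
VBoxManage modifyvm "myvm" --nic1 hostonly --hostonlyadapter1 vboxnet0
```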