Tuesday, April 23, 2024

Endorphins from Good Questions

Does anyone else get a serotonin blast or endorphins when they generate a good question (to the web, to an expert, to a rando)?

Recently, I've had two unrelated experiences where I had finally generated enough context on a subject to produce good, actionable, valid questions that subsequently furthered my work and exploration of a new topic. It's such a jolt to my system, and I love it.

I used to get the same kind of feelings (and neurochemical feedback) from completing a hard problem, or getting a program to run, etc. But these days, it's mostly just around generating a sufficiently good question that I'm no longer blocked in my work. Totally splendiferous.

I have nothing to link here sadly. It's just a blog post about feelings.

NOTE: I've had some of these sensations when working with Google AI and Microsoft's GitHub Copilot. Asking great questions, i.e., prompting, is key to a useful, productive interaction with these AI tools.

Tuesday, November 14, 2023

Restart the Mac menubar Dock etc to recover Mission Control

I switched to a Mac full-time a few months ago. One weird thing keeps occurring--I'm unable to make menubar selections (which I actually first notice via the Mission Control keyboard shortcuts: Ctrl-Up Arrow or Ctrl-Down Arrow). It's unclear why this happens, but it's typically after connecting my Mac to an external monitor and then waking it up (although the problem is not instantaneous at that point--it occurs shortly thereafter).

I've tried the normal tips to get the menubar to respond, such as killing all Dock processes and similar.
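
For reference, the usual incantations look like this (both processes respawn automatically--and as noted, they didn't cure it for me):

killall Dock             # the Dock owns Mission Control
killall SystemUIServer   # menu bar extras (on older macOS, at least)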

What does appear to work is simply lifting the lid of my Mac. That triggers a "hard" redraw of all the graphics and restores the menubar functionality, and voila! the Mission Control features mentioned above resume proper functionality. (I immediately close the lid, which triggers a second redraw, but you get the gist.)


Tuesday, April 4, 2023

Slack autohides channels now by default

 Slack also eliminated the ability to search for channel names (no idea why).

So, you need to ENABLE the "Show and sort: All" button, and/or enable "Show... All your conversations" in the preferences.

Wednesday, January 4, 2023

A little K8s, Gitea, Helm, Postgres, stuff

Had a discussion at work about what data Gitea puts into Postgres. Kind of a scaling/sizing discussion. Mainly: the stateful info needs to be stored, and it's not especially dynamic. (That's my short take on the findings.)
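
If you want to eyeball that yourself, a quick table-size query does the trick (a sketch--assuming the stock gitea database name; adjust user/db to taste):

sudo -u postgres psql -d gitea -c "SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS size FROM pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;"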

I found via google some useful information on seblab's blog: https://blog.seblab.be/

and I'm sure there could be a lot more. I like the way seblab sets up a problem/situation and works it out.



Thursday, October 13, 2022

Stop finding semicolons and start finding plus signs.

I frequently need to use the GNU find command to recursively look through a directory. Less often, I also need to execute a command on the results. There are two well-documented ways to do that.

The first is to pipe the output of find into xargs and run the command there. Here's an example.

find . -iname \*yaml -print0 | xargs -0 ls -alFhtr 

The second (and a bit handier, if you can remember the proper invocation) is to just add an exec to the find command. I've historically used the find ... -exec ... {} \; method. I've just studied up on that semicolon, and along the way was exposed to the find ... -exec ... {} + method. The + doesn't require an escape character, as bash (and presumably other shells) has no special meaning for +. And + actually means "clomp all the find results together and pass them into one invocation of the command," which is particularly handy for date comparisons (and for other reasons). So try this:

find . -iname \*yaml -exec ls -alFhtr {} +
And consequently, the flags to ls now sort the whole group of files by timestamp in reverse order, leaving the most recent yaml file at the bottom of the output (which is generally what I want).
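
For contrast, the semicolon form runs ls once per file, so you get per-file output and lose that cross-file timestamp sort:

find . -iname \*yaml -exec ls -alFhtr {} \;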

Tuesday, August 30, 2022

helm knows what your kubernetes ConfigMap started with

Today I asked my teammates how I might retrieve or recreate a missing ConfigMap. Our systems use helm to deploy ... deployments, and one of them suggested this little gem:

helm get all -n namespace deploymentname  |tee deploymentname.yaml

Therein you will likely find the initial contents of the CM and can recreate it. (I just yanked out the rest of the yaml and then applied that file.)  
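
A related trick: helm get manifest returns just the rendered templates (without the values and notes that helm get all includes), which skips the yank-out-the-rest step:

helm get manifest -n namespace deploymentname > deploymentname.yaml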

Tuesday, December 21, 2021

github ssh keys are finicky

Okay, github uses some hash magic to figure out which account is affiliated with a particular ssh key. If you have two accounts using the same key, it has to guess (and does so poorly) which user is trying to write.

Moreover, it tries keys somewhat sequentially. If the first key in your ssh-agent (via ssh-add -L) is associated with a user, that's the only one it tries. If not, I believe it will go on to the next one (but again, if that second key is associated with multiple github users, it's going to hash to a specific user). This is all because you can't tell github which user to use. De facto, the key hashes to one and only one user, and github ass/u/mes that is the user it will end up using.

But, the sad sad sad part is that it will not give you any more information.

You can set a specific "id" to use in your .ssh/config file like this:

Host github.com
  IdentityFile /home/USERNAME/.ssh/id_githubonly_rsa
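
One more line worth adding (standard OpenSSH, not github-specific): IdentitiesOnly keeps ssh from offering every key in your agent before the one you configured:

Host github.com
  IdentityFile /home/USERNAME/.ssh/id_githubonly_rsa
  IdentitiesOnly yes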


Wednesday, October 27, 2021

elevator=none elevator=noop are both overcome by events

You can't just pass "elevator=none" on the kernel command line anymore. If you do on a ~5.3 or newer kernel, you'll see:


[    0.105899] Kernel parameter elevator= does not have any effect anymore.
               Please use sysfs to set IO scheduler for individual devices.
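
The sysfs route it points you at looks like this (sda and none here are assumptions--check what your device actually offers first):

cat /sys/block/sda/queue/scheduler                    # active scheduler shown in brackets
echo none | sudo tee /sys/block/sda/queue/scheduler   # switch it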


Monday, April 5, 2021

xorriso is the bomb

So, I wanted to remaster the SystemRescue iso. Only a trivial change (a 20 character addition) was needed. I could not for the life of me figure out how to do this.

The answer (after 3-4 different multi-hour sessions trying to figure this out) was to use a tool called xorriso. And bear in mind that the iso itself needs to be bootable as an ISO or burnt CD/DVD as well as via UEFI and possibly even via USB. This means it needs to be an everything bagel (so to speak). I found the "howto" for another distro and used that technique with the slight differences for the SystemRescue iso (which buries the efiboot.img under archiso, since it is Arch Linux based).
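
For future me, the shape of the invocation (a sketch only--the file paths are made up, but -boot_image any replay is what preserves the existing BIOS/UEFI boot equipment while -map injects the changed file):

xorriso -indev systemrescue.iso \
        -outdev systemrescue-custom.iso \
        -boot_image any replay \
        -map ./my-edited-file /path/in/iso/my-edited-file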

There was also a handy little guide re: xorriso that was part of figuring this out. (I've always used genisoimage and friends prior to needing to remaster this.)

Thursday, April 1, 2021

ugly code that works

I had the thought, "ugly code that works," as I finished a PR today. The project isn't technically open sourced (YET!) so I can't say much more, but man, I worked sooooo long and hard on this trivial problem that I needed to vent.

The solution to the problem was to do something in cloud-init, then do the same thing again in a systemd unit. And all of that was just to work around a sysctl issue. The solution is ugly code that works (in particular as it has a sleep 180 embedded in a script). There is no reason in the world that I should be writing a systemd unit that calls a two-line script where one line is sleep 180. But there was no way to get the sysctl working with a systemd.path unit. (I tried, repeatedly.)
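
Since the project isn't public yet, here's just the shape of the hack (the unit name, script name, and final sysctl call are all stand-ins):

# /etc/systemd/system/sysctl-workaround.service
[Unit]
Description=Re-apply a sysctl that will not stick at boot
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/sysctl-workaround.sh

[Install]
WantedBy=multi-user.target

...and the embarrassing two-line script:

#!/bin/bash
sleep 180          # yes, really
sysctl --system    # stand-in for the actual sysctl re-apply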

And the code works. I also then (after getting a merge) googled, "ugly code that works", and saw someone has already written this blog (though thankfully it wasn't me this time). See: https://dev.to/tonetheman/ugly-code-that-works-4i7l

"Do not be afraid to write ugly code that works." Also, don't be too surprised if it does break. Ugly code can be fragile also.

Wednesday, March 31, 2021

terraform is logical but not natural

The subject kind of says it all. If you can use environment variables in terraform (and you can, and probably should) via the TF_VAR_environment_variable_name convention, why in the ever loving world can you not do the same in modules (or sub-modules, as I think of them)? Apparently environment variables don't propagate into the submodules.

So, you basically "re-declare" when you define/invoke the submodule. See: https://stackoverflow.com/questions/53853790/terraform-how-to-pass-environment-variables-to-sub-modules-in-terraform

In my case, I'm building "N" VMs in submodules and there is a "vms.tf" that has the module "NAME" {} invocations (and of course there are "N" of these), so I had to do something like:
module "FIRST" {variableone = var.variableone}
.
.
.
module "NTH" {variableone = var.variableone}

"N" times and then at the top level (main.tf or variables.tf) something like:

variable "variableone" {
  description = "Cascade environment variable in terraform"
  default = ""

}

and then
export TF_VAR_variableone="myvaluegoeshere"

in my environment.

Monday, March 15, 2021

It's been a minute and a few years....

So, I just noticed it's been a minute since I last posted. I typically only post things that I need to find again in the future--and apparently that's been less often lately.

Today, I needed to figure out WHY IN THE WORLD my LG 4K HDMI monitor was popping throughout the day. And, indeed I did, but first, how did I get here?

I am a longtime Ubuntu user. This machine was built with Ubuntu in 2017-11 and upgraded to LTS release in 2018. I do daily apt updates but recently bit the bullet and brought it up to 2020 LTS release. That went very well and I see a number of improvements. However, it also started making a LOUD popping noise that I couldn't tie to any particular user activity.

A bit of googling later, I found that it was likely related to the snd_hda_intel module and its power_save setting. power_save defaults to 1 (on this machine and similar) as it is a laptop, and power_save is a good thing for a laptop. However, 98% of the time it is now running primarily as a desktop. You know, COVID-19 and nowhere to go....

I confirmed the issue with further googling and found my friend Major Hayden's post about this same thing: https://major.io/2019/03/04/stop-audio-pops-on-intel-hd-audio/ It has a better writeup, more detail, etc. But I post here so that future me can easily find this. I also thanked past Major here: https://twitter.com/davidmedberry/status/1371588276176363521
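
The gist of the fix, as I recall it (stock paths for the snd_hda_intel module; the modprobe.d filename is arbitrary):

cat /sys/module/snd_hda_intel/parameters/power_save     # 1 = power save on
echo 0 | sudo tee /sys/module/snd_hda_intel/parameters/power_save
echo "options snd_hda_intel power_save=0" | sudo tee /etc/modprobe.d/disable-audio-powersave.conf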

Friday, October 25, 2019

Deja Vu All Over Again

I recently joined Cray Research, the supercomputer folk, after 1.5 years doing pre-sales engineering for a technology I love. However, I didn't actually love (nor even really like) sales itself. So I'm back to doing engineering.

Two days after Cray hired me, their merger with Hewlett Packard Enterprise (HPE) completed, so I'm back at HP (but now HPE). Most of the impact of that change occurs on January 1, 2020.

So not only the yogiberra-ism of "Deja Vu All Over Again" but also the godfather-ism of "Just when I thought I was out, they pull me back in."

Monday, July 1, 2019

ansible tower token authentication

Reminder to self mostly, when refreshing your memory about tokens, start with this page:
ansible authentication methods and tokens

I'll come back here when I have something more substantive to say about this. The PAT token is dead easy and straightforward, and has naught to do with Point After Try.
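
For next time, the shape of actually using a PAT against the API (host and token are placeholders):

curl -s -H "Authorization: Bearer $TOWER_TOKEN" https://tower.example.com/api/v2/me/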

Friday, April 12, 2019

letsencrypt with certbot

Well, the title is the task I was trying to accomplish, but I kept getting an error. Turns out the awscli in Ubuntu is seriously out of date. It gives an error like:
'AWSHTTPSConnection' object has no attribute 'server_hostname'
when using certbot (more on that below). The simple and easily googleable fix was to remove the ubuntu awscli package and pip install a newer version:
sudo apt-get remove awscli
pip install --upgrade awscli
I'd recommend doing that pip install in a venv (python virtual environment), especially if you have other "cloud tools" installed that way.
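
If you go the venv route, it's the usual dance (path is arbitrary):

python3 -m venv ~/venvs/aws
. ~/venvs/aws/bin/activate
pip install --upgrade awscli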

Now, why was I doing this and what does the title really mean? Most websites these days need an "SSL cert" signed by a certificate authority. Really folks, you need to be doing this these days. Many businesses will not let you browse to a site that has a self-signed cert and won't let you browse to a non-https site at all. But this is super easy, as Let's Encrypt and certbot do all the work. I merely followed the steps here:
https://hackernoon.com/easy-lets-encrypt-certificates-on-aws-79387767830b

(Make sure you have certbot installed first. Your OS may have it packaged or "brew install certbot" on a Mac.)

And as with all of my recent posts, this is just mostly so I won't spend another 1/2 day trying to remember or recreate this.

And in all fairness, there are also a number of Ansible playbooks and/or roles for doing this. Here's some info on that:
https://github.com/geerlingguy/ansible-role-certbot
https://docs.ansible.com/ansible/2.5/modules/letsencrypt_module.html 
https://docs.ansible.com/ansible/latest/modules/acme_account_module.html 
(The Ansible letsencrypt module was renamed more generically to "ACME," as it actually speaks the ACME protocol, and Let's Encrypt adheres to that web standard.)

Saturday, February 23, 2019

More fun with Ansible

I've been in an Ansible Solutions Architect (SA) role at Red Hat for about a year, but I still learn new things about Ansible every day.

When I explored ARA, I first became familiar with Ansible callbacks (which get called at the conclusion of tasks, plays, etc.), and I've been needing to make some modifications (filters, etc.) to the PLAY RECAP at the end of an Ansible play. Note that there are numerous pre-written callbacks listed here, but occasionally you need to write a custom one. In this case, I just wanted a better understanding of what those pre-written ones can do. And, lo and behold, there's a nicely documented page that shows you that.
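
As a refresher on how the shipped callbacks get switched on (ansible.cfg knobs, using the 2.x-era setting names):

# ansible.cfg
[defaults]
stdout_callback = yaml                       # change the output/recap formatter
callback_whitelist = timer, profile_tasks    # enable extra shipped callbacks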

Thank you Random Hero.


Saturday, February 9, 2019

Weird tab behavior in Google Chrome

I run Chrome as my primary browser (so far) and it has never failed me. Yesterday, however, I began to see a very strange behavior. As soon as I would click on any tab (other than the first tab), Chrome would start cycling down through the tabs. I.e., if I clicked on the 4th one, it would switch to that, then to the 3rd, then the 2nd, and finally the first tab (where it would remain).

Switching to a new window (with only one tab) would work fine, but as soon as another tab was opened, the same behavior returned.

It survived reboots, chrome upgrades, etc.

I think I have isolated this to either a funky (dirty?) keyboard or a flaky mouse. Once I disconnected both the external keyboard and mouse, things returned to normal. Now doing the bisect to see whether it's the mouse or the keyboard. One note: my kitchen has been under renovation for the last month. Consequently, I've done a lot more "eating over the keyboard" than normal, so maybe I just dropped some weird crumb that effectively does Ctrl-PageUp repeatedly (or some other previous-tab command over and over). I didn't notice this behavior in other "tab oriented" programs (such as gnome-terminal or Firefox).

Updates here if I further resolve this.

Oh and some search terms in case anyone else runs into this:
ubuntu
chrome
chrome-beta
(occurred in both)
tab switching
autotab switching
tab bug
google chrome tab bug
google chrome tab autoswitching bug

(Oh and for those playing along at home: restarted chrome numerous times, disabled all extensions, rebooted, upgraded Chrome, upgraded all Ubuntu packages--basically did all the "best practices" I could think of to work around this. The only workaround seems to be disconnecting the mouse and keyboard, which were plugged into a USB-C dongle providing legacy USB connections. System is an HP Spectre x360 15" touch with an 8th-gen i7 running Ubuntu 18.04.2.)

Mouse seems to be working fine.

Blew some dust/gunk/ick out of my keyboard and now everything seems to be working again. (The peripherals are attached in the same order, same location.) So LIKELY the keyboard? The world may never know (and I'm sure the world will never care.)

Tuesday, November 27, 2018

TIL: Ansible engine raw module "needs" gather_facts: no

Okay, the title says it all, but let's unpack that.

Ansible Playbooks are instructions for running ansible modules against a target host. They can be very very simple. Here's one of the simplest:
---
- name: A very simple playbook
  hosts: all
  tasks:
    - name: pingo
      ping:

This merely runs the ansible "ping" module on "all" hosts (i.e., whatever hosts are passed in on the command line when this playbook is called).

A note about the ping module: it is not the normal networking definition of "ping". Network folk will be accustomed to using "ping" to send an ICMP echo request to a node (at which point the node would typically send an ICMP echo reply). Rather, the ansible module "ping" is a check that the node is up and that the basic needs of ansible are supported on the node, i.e., that python is installed.

So... the inquiring mind asks what do you do in a situation if python is NOT installed? Can you still take advantage of some of Ansible? But of course.

The ansible "raw" module allows you to basically do something like the following:
# raw: uptime
# ssh targetnode and_execute_this_command
ssh target.example.net uptime

So here we'd get the uptime of the target node (assuming it was running ssh, we had login authority, and that uptime was installed and in the default path of the effective user.)

So, it seems like it would be straightforward to create an ansible playbook that takes advantage of the raw module.

---
- name: raw uptime
  hosts: all
  tasks:
  - name: Run a raw command
    raw: uptime

and here we run into issues. This playbook won't work on a node that doesn't have python installed. (It will work on one that does.) Why is that? Because of the "secret sauce" called fact gathering. Every playbook, as it runs, will run the ansible "setup" module to gather facts on the node before running any of the explicit tasks. The setup module is an implicit task and is noted in the module reference: "[t]his module is automatically called by playbooks."

NOTE: I've scattered some handy links within this document so that you can learn more about these. I'd recommend following them and then coming back here after you have familiarized yourself with ansible, modules, ping, raw, setup, and gather_facts.

So, how do we make this work then? If you read the gather_facts link, you probably know that you can bypass it very simply: you set "gather_facts" to no in your playbook. Consequently, you end up with this as the right playbook for a node without python where you want to know the uptime.

---
- name: raw uptime
  hosts: all
  gather_facts: no
  tasks:
  - name: Run a raw command
    raw: uptime

So a simple one line addition.



And how did I get in this situation? One of the most common cloud operating systems (aka cloud images) is one called "cirros". Cirros is a very minimal linux and, as such, does not include python. Moreover, there really isn't an effective way to "add" python to it (though it possibly could be done with a statically built file--I'll leave that as an exercise for the reader).

Cirros is frequently used in a cloud environment (i.e., OpenStack) to validate that the cloud itself is working well. From within cirros you can log in (as it provides transparent credentials) and check on the networking, etc. Basically it's a quick and dirty way to make sure your cloud is operating as intended.

I regularly spin up one or more cirros instances as soon as I build an openstack--whether that be an all-in-one devstack or an entire production cloud. In both cases, cirros is my "go to" tool to validate the cloud. (Thanks Scott.)



... and one more thing: you would normally just run uptime via the command module. But doing so requires the python infrastructure ansible relies on. Here's that "normal" or typical way:

---
- name: A very simple command
  hosts: all
  tasks:
    - name: uptime
      command: uptime

and even if you add "gather_facts: no" to it, the command module itself still requires python, so you really really need the raw module and the "gather_facts: no" setting.
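
(And for a quick one-off, ad-hoc mode works the same way--ad-hoc runs don't gather facts, so raw alone suffices; the inventory name is whatever yours is:)

ansible all -i myinventory -m raw -a uptime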

Friday, August 10, 2018

Life of a #Remotee

I work remotely for Red Hat. Primarily at home but also in a coffee shop with several co-workers. And, oh yeah, forgot to mention, I travel heavily.
So I need to be able to work remotely. But I don't want to take all my gear with me, so I leave some of it at home and plugged in. And mosh allows me to connect/reconnect etc.

I (re-)discovered mosh last weekend when prepping for a trip. I didn't want to carry my bulky laptop to the mountains that day, so I set up remote access through my cable modem. Of course, it's trivial to set up a port forward from my new Google Wifi AP and routers to my home machine. But that gives you connectivity, not persistence. So I pulled down the "mobile shell" mosh and set it up quickly.

IT JUST WORKS.®™

I decided to do this blog post after typing internally:
So, I started a mosh session from home to home last Sunday. I've been to Denver Airport, on board a Southwest flight, a Saint Louis hotel, a Saint Louis enterprise customer, and back, and that session just keeps running. I had heard of mosh before, but using it is even easier than I expected. I used to "work around" this with screen sessions, but mosh is even simpler than that.

So, setup is easy peasy. Install mosh. Find a UDP port you can forward back to your persistent (home) node. You probably also want to forward back a TCP port for ssh.

mosh --ssh="ssh -p$SOMEPORT" -p $SOMEUDP  $HOMEIP

You can find your home ip (from home) with this:

export HOMEIP=$(curl ifconfig.co) # but I save this to a file as well, so maybe: export HOMEIP=$(curl ifconfig.co | tee ~/bin/myhomeip)

You can port forward the default ssh port (22) or something slightly more obscure. The default UDP port range for mosh runs from 60000 through 61000. I picked a port in that range.

Both SOMEPORT and SOMEUDP need to be port forwarded (using your router setup) to the actual node you want to use.

One other thing you will want to check out as a #remotee is wireguard. I'll write it up once I've switched my vpn-ness over to it. Wireguard currently ships as packages that install a kernel module built out of tree via DKMS. See wireguard, hat tip to Jess Frazelle for Dockerfiles mentioning Wireguard and oh yeah, this guy.

Saturday, July 21, 2018

Fedora Kernel (well 4.17 kernel) issue resolvable

I've been using a Thinkpad p50 for work since I joined Red Hat. And I'm running Fedora on it instead of Red Hat Enterprise Linux workstation so that I can be more current.

However, that bit me recently when I upgraded to Fedora 28. The built-in Intel dual-band AC 8260 (rev 3a) wifi failed to work on the newer 4.17 kernel. This led me down some dark roads as I switched to an rc 4.18 kernel, which had even more problems (though that kernel did have working iwlwifi).

A productive Saturday morning led me to patch and test my 4.17 Fedora 28 kernel successfully. The short version is that you need to revert one change in the kernel, as detailed here:
https://lkml.org/lkml/2018/7/1/104

and then build Fedora kernel rpms (which I haven't done in more than a decade and a lot has changed in that decade). See:
https://fedoraproject.org/wiki/Building_a_custom_kernel

and it has one thing out of date as well. The command:
# sudo /usr/libexec/pesign/pesign-authorize-user

should be just:
# sudo /usr/libexec/pesign/pesign-authorize
A Fedora rpm kernel build gives you time to mow the yard, bike to Starbucks, etc. So don't expect it to finish in minutes. Use the scale of hours.

PS. In linkifying this post, I see that 4.17 COULD HAVE BEEN A CONTENDER for Linux kernel 5.0. That would have scared me enough to hold off... oh well. YMMV. Enjoy.

Wednesday, June 27, 2018

TIL: boto3 can't parse env variables properly

So in today's saga of fighting the Internet, I bring this salient point to your attention:

boto3 can't parse environment variables for AWS credentials properly

and it will error kind of like this:

 ValueError: a Credential=WHACKADOODLEGGET5Q\r/20180627/us-east-1/sts/aws4_request

or possibly like this:

caught_exception\nValueError: Invalid header value '
    AWS4-HMAC-SHA256 Credential=WHACKADOODLET5Q\\r/20180627/us-east-1/ec2/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=8f84f8d811f4dcb45da5f8fadf90ae8390d5d358b4024bf0d964090032dea1c3'\n", 
    "module_stdout": "", 
    "msg": "MODULE FAILURE", 

This appears to be an error in parsing the  \r portion of the URI. (Based on a few google queries.)

Using ~/.aws/credentials seems to do the trick (with the exact same key values):

[default]
aws_access_key_id=WHACKADOODLEGGGET5Q
aws_secret_access_key=bn+++krandometyperyforwhackadoodle1YjKMWqzv


The environment variables I was using were of course:

export AWS_ACCESS_KEY_ID=WHACKADOODLEGGGET5Q
export AWS_SECRET_ACCESS_KEY=bn+++krandometyperyforwhackadoodle1YjKMWqzv
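
(Given the \r showing up in the signature error, the values had likely picked up a carriage return somewhere along the way. If you'd rather keep the environment variables, stripping it ought to work too--an untested sketch:)

export AWS_ACCESS_KEY_ID=$(printf %s "$AWS_ACCESS_KEY_ID" | tr -d '\r')
export AWS_SECRET_ACCESS_KEY=$(printf %s "$AWS_SECRET_ACCESS_KEY" | tr -d '\r')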


Thursday, June 7, 2018

Find your libvirt instances...

I'm at Red Hat these days and working on cloud manageability which you will see me write as m11y more often than not.

I recently had a need to utilize Ansible for a demo and created a slug of VMs to use in my ansible inventory. However, there is no way (no obvious way?) I can find with virsh or virt-manager to determine what IP got assigned to a VM. A bit of digging shows that when using the cloud-images, dhcp is used to acquire an IP address and that dnsmasq manages those IP addresses.

In my case, these VMs were attached to virbr0 and it is that instance of dnsmasq that I needed to inspect.

Like most of my very short blog posts, I'm just writing this down so I can find it myself the next time I need it. In this case the file that "issued" the IP addresses is:

/var/lib/libvirt/dnsmasq/virbr0.status

That shows the MAC and IP address associations (but no indication of "which" libvirt domain they were actually associated with). A further exercise for me and the reader will be to tie the domain back to the IP address (if possible).
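
(Postscript for future me: newer libvirt can do this mapping directly, at least for domains on the default NAT network with dnsmasq-managed leases:)

virsh domifaddr mydomain          # IP(s) for a single domain
virsh net-dhcp-leases default     # every lease on the "default" network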

One other related note: If you use "virt-clone" to clone a cloud-image that has been previously booted, it will typically boot fine but WITHOUT an IP address. This is due to cloud-init not properly re-running. Blowing away some of the first run cloud-init stuff takes care of that.

sudo rm -rf /var/lib/cloud
sudo rm -rf /var/log/cloud-init*

(This may be overkill to "purify" a cloud instance but it certainly does the trick.)

You need to re-run cloud-init at this point (either directly or via a quick reboot) and you should find that your instance has an IP address now.
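
(Newer cloud-init versions wrap that rm -rf dance in a subcommand, if your image's cloud-init is recent enough:)

sudo cloud-init clean --logs    # wipe first-boot state and logs, then reboot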

It's a pretty quick and easy step to convert virbr0.status into an inventory...

sudo cat /var/lib/libvirt/dnsmasq/virbr0.status |grep ip-address |awk '{ print $NF }' |sed  -e 's/"//' -e 's/",//'
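
With jq on hand, the same extraction is a bit less sed-y (the status file is a JSON array of lease objects):

sudo jq -r '.[]."ip-address"' /var/lib/libvirt/dnsmasq/virbr0.status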

or you could turn your inventory into a dynamic inventory based on that file, exercise also left to the reader (but be sure to make the output valid JSON dict.)

Thursday, August 3, 2017

Replacing Android device with a new one

So.... TILLW, if you have a new android phone, the very first thing you want to do is tell it you want to copy an existing phone you have in your possession. There appears to be no way (other than resetting the phone back to factory defaults) after the first screen to have one phone be set up like another. This is inane.

TIL it ain't so easy to enable root login in ubuntu

So... TIL today. You can enable root login in sshd_config. You can set a password for root as well, of course. However, you can't actually ssh in as root if you have any keys (or at least "some" keys) in your ssh-agent. So, by running ssh-agent bash and then logging in as root to the target, I get asked for a password. If I instead have an ssh key loaded, I get the message:
Please login as the user "ubuntu" rather than the user "root".
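
(For the record, that message comes from a forced command prepended to root's entry in /root/.ssh/authorized_keys on Ubuntu cloud images--cloud-init's disable_root behavior writes it. Removing that command="..." prefix, along with the usual sshd_config knob, is what actually allows root logins:)

# /etc/ssh/sshd_config
PermitRootLogin yes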

Sunday, June 5, 2016

Android Tap & Go... Almost as Easy as It Sounds

I had one teensy-weensy little problem with Android's Tap & Go feature for new-phone setup. Although the NFC portion works fine, it is also VERY IMPORTANT to enable Bluetooth PRIOR to running the sync. The sync code will temporarily turn on bluetooth, but it won't actually work. Just enable bluetooth on the OLD phone prior to trying the Tap & Go and then it truly will be Tap and Go.

I had no luck googling for this issue or solution, so I'm noting it here for others to find. The phrase "Sorry, Something Went Wrong" will appear on your new phone shortly after making the sync chime.

Wednesday, May 11, 2016

TIL From Users Today

I learn something about OpenStack each and every day and one of my broadest sources of OpenStack knowledge comes from my customers, the users of Time Warner Cable's OpenStack implementation.

Today, users taught me, passively, by using it, about Shelving an instance.

Suspending an instance puts it to sleep (via CPU suspension states), but the instance (virtual machine) is still using all of the memory and disk and cpu resources it was before (though the actual host CPU usage in this case is very, very minimal). Subtly different yet similar seems to be Pausing an instance. (I've never used this myself but am now checking to see if my users are using this.)

Stopping an instance still has the instance assigned to a hypervisor (essentially a compute host) and still "sort of" consumes resources on that host, most notably the ephemeral disk.

Shelving an instance makes a glance image snapshot of the instance and then removes all reference from the original hypervisor and compute host. It is consuming glance snapshot space in so doing but nothing on the hypervisor or compute host. So it is using the most minimal amount of resources possible.
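
(The CLI verbs, for reference--from the nova client of this era:)

nova shelve myinstance      # snapshot to glance and free the hypervisor
nova unshelve myinstance    # reschedule and boot from that snapshot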

Now to make this a real blog, I need to go add some links so that this is more shiny.
http://www.urbandictionary.com/define.php?term=TIL

Monday, July 27, 2015

Time Warner Cable (TWC) OpenStack Summit Talks in Tokyo -- Vote Now

(Using my soapbox to pitch some Time Warner Cable proposed talks for Tokyo.)

Many of these are our experience running an OpenStack Cloud in production at Time Warner Cable. Others are particular areas of interest. And one is just for fun....

Click the shiny link to vote up or vote down, however you feel.

Title | Author(s) | Forum (each title linked to its abstract and voting page on openstack.org):

- Duct Tape, Bubble Gum, and Bailing Wire | Eric Peterson (TWC), GoDaddy, and others | Ops
- Fernet Tokens: Why They're Better and How to Switch | Matt Fischer and the Fernet team | Ops
- Moving a Running OpenStack Cloud | Matt Fischer, Eric Peterson | Ops
- Customer Pain Points Revealed in TWC's OpenStack | Kevin Kirkpatrick, David Medberry | Ops
- Building the right underlay, lessons learned | Sean Lynn, Darren Kara | Ops
- Monitoring OpenStack with Monasca -- Totally Worth the Effort | Brad Klein, Ryan Bak | Ops/Monitoring
- Overcoming the Challenges: LBaaS in practice at TWC | Jason Rouault, Sean Lynn | Ops
- Upgrading OpenStack Without Breaking Everything (Including Neutron!) | Clayton O'Neill, Sean Lynn | Ops
- Integration & Deployment Testing of an OpenStack Cloud | Clayton O'Neill, Matt Fischer | Ops
- OpenStack Trivia | David Medberry, Kevin Kirkpatrick | Community
- Owls, Cats, Toads, Rats: Managing Magical Pets - VM persistence | David Medberry, Craig Delatte | Enterprise IT Strategies
- Other Ways to Contribute | David Medberry, Eric Peterson | How To Contribute
- LibVirt: Where are we today? | David Medberry, Sean Lynn | Related OSS Projects
- OpenVSwitch: Where are we today? | David Medberry, Sean Lynn | Related OSS Projects
- Monitoring: How to get the outcomes you want and want the outcomes you get! | Steve Travis, Ryan Bak, Brad Klein | Monitoring / Operations
- An all SSD Ceph cluster: Tips, Tricks, and lessons | Bryan Stillwell, Craig Delatte | Ops
- The evolution of OpenStack Storage Services at TWC | Adam Vinsh, Craig Delatte | Enterprise IT Strategies
- Building a self-repairing cloud with AI | Ryan Bak | Monitoring / Operations
- Configuring Swift Using Puppet-swift: considerations and examples | Adam Vinsh | Ops

Thursday, May 28, 2015

Request for Reviews from Online Stores

Dear googlemonopriceamazon, please don't ask me to do reviews for items delivered via slow delivery. If I request slow delivery, there is a REALLY GOOD CHANCE I'm not going to use the item right away. So, don't ask me to review it when I haven't even taken it out of the package. You should only ask for reviews when things are shipped overnight. (Though arguably, if you are in that kind of hurry you don't have time to do reviews.) So I guess, I'm just saying, stop asking for reviews altogether.

And if I could have expressed this in a brief way, it would have been a tweet.

And, no, don't file a patent for this. I freely grant this info to the world....

Sunday, May 17, 2015

OpenStack, a really big deal

Okay, most folks know I've spent the last 4 years involved with OpenStack and have been attending summits since Boston. This is the first time, however, that I've been overwhelmed by the number of other folks attending the summit--and we aren't even there yet.

I'm at 36,000 feet over NW USA headed to Vancouver from Denver on United UA 323. By sight alone, I've identified 10 different companies sending folks to OpenStack (and I'm sure there are many more companies represented on the flight that I don't know.) About 30 folks that I know (or at least know of) are on this very flight for a week of design, operations notes exchange, and marketing. WOW. I'm expecting more than 6,000 folks in YVR this week--maybe 7-8K.


Friday, March 27, 2015

Cinder Multi-Backend GOTCHA

When making puppet changes to handle OpenStack Cinder with multi-backends, we created a really painful problem. The multi-backend support basically moves our Cinder (with ceph backend) from a default host name for our cluster (like cluster01) to a backend-specific name (cluster01@ceph) so that we can add additional backends (like cephssd and solidfire and 3par).

Unfortunately, this had the really bad side effect of dropping the "host" that was providing the plain "cluster01" service. All attached volumes continued to work fine. However, it became impossible to launch new instances with these volumes, terminate (properly) instances with these volumes, or delete these older volumes.

The fix (once you understand the problem) is very straightforward:
cinder-manage volume update-host --currenthost cluster01 --newhost cluster01@ceph

NOTE: Don't do this unless these are really the same services. Presumably safe if you have a single cinder backend and are prepping for multi-backend.


Wednesday, March 11, 2015

OpenStack Clients on Mac OS Yosemite

Clearly I haven't written a blog post recently enough. That must explain the negative karma behind running into a tricky problem.

As we all know, "pipin' ain't easy". I held off upgrading to Yosemite for a long time so as not to break my functioning Mac with OpenStack clients. However, I found time and reason to upgrade last weekend and had helped several others through the Mac client solutions a week prior.

Alas, it was not to be a sweet journey for me. I ended up with the dreaded xmlrpc_client issue. Some coworkers determined that you could resolve this by uninstalling "six" and then reinstalling "six". Unfortunately, that doesn't really work. A "sudo -H pip uninstall six" did do the right thing, but "easy_install six" never succeeded. And I should note the reason that six from pip doesn't work in the first place is that Mac OS Yosemite itself ships a downrev version of six.

The trick, at least in my case, was to run "type -a easy_install" and note that there are two versions of easy_install. Pick the second one and you are off to the races. Here are the steps if you are still reading:

dmbp:~ dmedberry$ sudo easy_install six #fails
Password:
Traceback (most recent call last):
  File "/usr/bin/easy_install-2.7", line 7, in
    from pkg_resources import load_entry_point
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 3020, in
    working_set = WorkingSet._build_master()
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 616, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 629, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/Library/Python/2.7/site-packages/pkg_resources/__init__.py", line 807, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: setuptools==1.1.6
dmbp:~ dmedberry$ type -a easy_install
easy_install is /usr/bin/easy_install
easy_install is /usr/local/bin/easy_install
dmbp:~ dmedberry$ sudo /usr/local/bin/easy_install six

Friday, December 12, 2014

Ubuntu Sound Gotcha - Missing Devices

I was laid up at home for a day, so instead of using my laptop as a laptop, I went ahead and "docked" manually to my dual-screen gear (which has largely been attached to my "work" MacBook Pro). I used the thunderbolt and hdmi connectors on the side of my Spectre XT (and it was the first time I've ever gotten video to work out of the t-bolt. WIN.) The monitors are Acers with DVI connectors, so I'm using some monoprice cabling to convert to T-bolt and HDMI.

That setup will probably explain to some of you what went wrong... but sadly, it did not initially to me. I was using the setup and listening to music/youtube etc. At some point (and crucially, I'm not sure when exactly), I lost my audio output. The sound notify/menubar/whatever device was still present and still indicated I had sound output, but nothing was being produced.

I entered into the sound settings and lo and behold, there was no device listed in the "Play Sound Through" panel of the output setting. (And I didn't even notice that right away as my eyes were laser focused on the completely zeroed out volume indicator at the top that wouldn't move. Of course it wouldn't move as there was no output device.)

I gradually pieced together what may have happened--I suspect at some point AFTER connecting the monitors, I either suspended/resumed or actually rebooted. It appears that something in the audio detection algorithms determined I should be using HDMI audio (however, these Acer monitors have no speakers.) And by gradually pieced together, I mean I futzed around with this for a couple of hours.

Once it dawned on me that the only thing I had really changed was the monitors, I went ahead and removed them (which I had done several times during the futzing around), shut down entirely, and rebooted. VOILA. The "Built-in Audio" device reappears and I'm another happy camper.