03/04/2020

AWS EC2 instance migration

Recently I received some complaints about load problems on an AWS EC2 t2.medium instance running CentOS 7. Despite being a development environment, it was under heavy load.
I checked logs and monitoring and ruled out any kind of attack. After a talk with the dev team it was clear that the load was normal for the applications running (some kind of elasticsearch scheduled bullshit).

The load was 100% CPU, but I noticed some interesting behavior over the previous couple of weeks: a lot of steal time.
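A minimal sketch of how steal time can be measured without any monitoring tool, assuming the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal). The function name steal_percent is mine, not part of any tool.

```shell
# steal_percent BEFORE AFTER
# Each argument is one aggregate "cpu ..." line taken from /proc/stat.
# Prints the share of CPU time stolen by the hypervisor between the two samples.
steal_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2..9 are user..steal; guest time is already counted in user/nice.
            for (i = 2; i <= 9; i++) total += $i
            steal[NR] = $9
            sum[NR] = total
        }
        END {
            dt = sum[2] - sum[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", (steal[2] - steal[1]) * 100 / dt
        }'
}

# Usage on the instance:
#   before=$(grep '^cpu ' /proc/stat); sleep 5; after=$(grep '^cpu ' /proc/stat)
#   steal_percent "$before" "$after"
```

A value that stays high while the instance is busy is a good hint that the hypervisor is holding the CPU back.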

Looking at the EC2 CPU credits, it was crystal clear that we had run out of them, which turned on some heavy throttling.
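The same check can be done from the command line instead of the console. This is only a sketch, assuming the AWS CLI is installed and configured; the function name cpu_credit_balance and the instance ID in the usage line are placeholders of mine.

```shell
# cpu_credit_balance INSTANCE_ID START END
# START and END are ISO 8601 UTC timestamps, e.g. 2020-04-01T00:00:00Z.
# Prints hourly averages of the CloudWatch CPUCreditBalance metric.
cpu_credit_balance() {
    aws cloudwatch get-metric-statistics \
        --namespace AWS/EC2 \
        --metric-name CPUCreditBalance \
        --dimensions "Name=InstanceId,Value=$1" \
        --start-time "$2" \
        --end-time "$3" \
        --period 3600 \
        --statistics Average \
        --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average]' \
        --output text
}

# Usage (hypothetical instance ID):
#   cpu_credit_balance i-0123456789abcdef0 2020-04-01T00:00:00Z 2020-04-03T00:00:00Z
```

A balance that sits at zero for hours matches the throttling described above.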

Since the developers can’t reduce the load from the applications and the management won’t move from EC2, the solution I suggested was to move to a different kind of instance specifically designed for heavy computational workloads and without cpu credits.

So I made some snapshots and launched a new C5 instance, piece of cake, right?
Well no… as soon as I started the new instance it wouldn’t boot, and it returned “/dev/centos/root does not exist” in the logs. :\

So what’s going on here?
Pretty simple: there are significant hardware differences between EC2 instance types. For example, C type instances such as the C5 present their storage as NVMe devices, which requires a specific kernel module (nvme), and the same goes for the network interface, which needs the ena module.

The goal here is to build a new init image with these two modules inside, so that during boot the kernel can use these devices and find a usable volume to boot from and a NIC for the network. The only problem is that we can’t simply boot the system from a live distro and build the new init image there with those modules already loaded; remember, we’re on AWS, not on a good old VMware instance (sigh…).

First of all I terminated the new instance, which was basically useless, and went back to the original T2 instance.
Check which kernel version you’re using with “uname -a” and build a new init image including nvme and ena modules using mkinitrd, for example:

mkinitrd -v --with=nvme --with=ena -f /boot/initramfs-3.10.0-1062.18.1.el7.x86_64-nvme-ena.img 3.10.0-1062.18.1.el7.x86_64

Using lsinitrd you can check that your new init image has nvme and ena module files inside.
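As a rough sketch, the check can be scripted by filtering the lsinitrd listing for the two module files. The function names has_module and check_initramfs_listing are mine, and I am assuming the modules appear in the listing as nvme.ko and ena.ko, possibly with a compression suffix such as .xz.

```shell
# has_module NAME: reads an lsinitrd listing on stdin and succeeds if it
# contains a file called NAME.ko (a suffix such as .xz is accepted too).
has_module() {
    grep -q "/$1\.ko"
}

# check_initramfs_listing: reads the listing once, then tests both modules.
check_initramfs_listing() {
    listing=$(cat)
    for mod in nvme ena; do
        if printf '%s\n' "$listing" | has_module "$mod"; then
            echo "$mod: present"
        else
            echo "$mod: MISSING"
        fi
    done
}

# Usage, with the image path from the mkinitrd command above:
#   lsinitrd /boot/initramfs-3.10.0-1062.18.1.el7.x86_64-nvme-ena.img | check_initramfs_listing
```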

Now you have to edit your grub config file (/boot/grub2/grub.cfg) and change your first menu entry switching from the old init image to the new one.
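If you prefer not to edit the file by hand, something like the following sketch can do the swap. It is an assumption of mine, not what I actually ran: the function name use_new_initramfs is made up, it keeps a backup copy next to the config file, and it rewrites every entry that points at the default image of that kernel version rather than only the first menu entry.

```shell
# use_new_initramfs GRUB_CFG KVER
# Rewrites "initramfs-KVER.img" to "initramfs-KVER-nvme-ena.img" in GRUB_CFG,
# keeping the previous content in GRUB_CFG.bak.
use_new_initramfs() {
    cfg=$1
    kver=$2
    old="initramfs-$kver.img"
    new="initramfs-$kver-nvme-ena.img"
    cp -p "$cfg" "$cfg.bak" || return 1
    # Escape the dots so that sed matches them literally.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed "s/$old_re/$new/" "$cfg.bak" > "$cfg"
}

# Usage:
#   use_new_initramfs /boot/grub2/grub.cfg 3.10.0-1062.18.1.el7.x86_64
#   grep initramfs /boot/grub2/grub.cfg
```

Keep in mind that grub.cfg is a generated file, so a later grub2-mkconfig run would write it again from scratch.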

Save the /boot/grub2/grub.cfg file, CHECK AGAIN THAT YOU HAVE A GOOD SNAPSHOT OR AN AMI, and reboot; nothing should have changed.

Now you can make a new snapshot or AMI and build a new instance from it. Choose a C type instance and this time it should boot properly.

As you can see, the new C5 instance has different storage device names, a new NIC driver (ena), and both the ena and nvme modules loaded.
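For a quick check on the new instance, here is a small sketch that reads lsmod output and reports whether both modules are loaded. The function name modules_loaded is mine.

```shell
# modules_loaded: reads `lsmod` output on stdin, reports on nvme and ena,
# and returns success only when both modules are loaded.
modules_loaded() {
    awk '
        $1 == "nvme" { nvme = 1 }
        $1 == "ena"  { ena = 1 }
        END {
            print "nvme: " (nvme ? "loaded" : "not loaded")
            print "ena: "  (ena  ? "loaded" : "not loaded")
            exit !(nvme && ena)
        }'
}

# Usage:
#   lsmod | modules_loaded
#   lsblk    # the volumes should now show up as nvme* devices
```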

Life should be easier without the cloud… again.

13/08/2018

Quick check multipath status

Recently I had a big job in a customer’s datacenter, moving rack cabinets around for some work on the power supply lines. I love working on hardware or inside datacenters. Some people consider it low-profile work, but I have always found it very inspiring and it gives me so much satisfaction; sadly it happens rarely :(

BTW, I managed to complete this task without shutting down anything and without any downtime, thanks to power redundancy on almost every device (server, blade chassis, network or storage switch/device), a couple of 32A extension cables and a very precise action plan. Winner winner chicken dinner!

One critical aspect was the storage. We had a lot of systems which make extensive use of SAN over FC interfaces, and some of the SAN FC switches had only one PSU. Every storage path had redundancy, but cutting off half of your storage paths on production systems requires you to be very careful and to test everything. If you have a lot of servers with different environments (GNU/Linux, Windows Server and VMware ESX) and you need to cut off and restore paths multiple times, you need to be very precise in checking path status to avoid storage losses and potential data corruption.

Here are some quick hints for checking your multipath devices in those environments. Thanks to the command line interface you’ll be able to check many systems with very few commands, save a lot of time and avoid a lot of headaches.

GNU/Linux

On GNU/Linux checking multipath status is very easy: you only need to run “multipath -ll” and you’ll get the status of each path for every multipath device on your server. As for the HBAs, all you need to know is under the /sys/class/fc_host directory, where you’ll find one host* directory for each device; inside those directories you’ll find port_name and node_name, with the WWPN and WWN. With basic bash skills and ssh you can easily grab this information from each server; this is a trivial example.
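A minimal sketch of such a loop, under a few assumptions of mine: key-based ssh login as a user allowed to run multipath, path lines in the usual “H:C:T:L device major:minor state” form, and made-up function names (bad_paths, check_multipath, hba_ports) and host names.

```shell
# bad_paths: reads `multipath -ll` output on stdin and prints the path
# lines whose state contains "failed" or "faulty".
bad_paths() {
    grep -E '[0-9]+:[0-9]+:[0-9]+:[0-9]+ .*(failed|faulty)'
}

# check_multipath HOST...: runs multipath -ll on each host over ssh and
# prints how many paths are not healthy.
check_multipath() {
    for host in "$@"; do
        out=$(ssh -o BatchMode=yes "$host" 'multipath -ll' 2>&1) || {
            echo "$host: ssh or multipath failed"
            continue
        }
        bad=$(printf '%s\n' "$out" | bad_paths | wc -l | tr -d ' ')
        echo "$host: $bad failed/faulty path(s)"
    done
}

# hba_ports HOST: prints the WWPN of every FC HBA port on the host.
hba_ports() {
    ssh -o BatchMode=yes "$1" 'cat /sys/class/fc_host/host*/port_name'
}

# Usage (hypothetical host names):
#   check_multipath srv1 srv2 srv3
#   hba_ports srv1
```

Run it before cutting a path, after cutting it and after restoring it, and compare the counts.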

Windows

The only requirement is the fantastic and free PsExec utility from Mark Russinovich.
Edit a text file with a list of all your servers’ IPs or hostnames, one per line (server.txt).

PsExec @server.txt -e -u <USERNAME> mpclaim -s -d <DEVICE ID>

If you want to see all the details (for example node number and port number) of your HBAs, launch the Get-InitiatorPort command in a PowerShell instance with administrator privileges.

PsExec @server.txt -e -u <USERNAME> powershell Get-InitiatorPort

VMware ESXi

First of all you must enable the ssh daemon on each VMware host (follow this VMware KB article); if you want to log in with ssh keys, follow this KB article. To check multipath status, run “esxcli storage nmp device list”. The output is quite verbose, so it’s better to grab only the information we need by adding a nice “| grep Working”; each line shows the paths for one datastore on the VMware server. You can find the WWN and WWPN with “esxcli storage core adapter list”. As with GNU/Linux servers, you can easily cycle through your VMware servers using ssh and bash to grab this information with a single script.
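Instead of reading the “Working” lines by eye, a small filter can flag the devices with too few paths. This is only a sketch: the function name degraded_devices and the host name are mine, and I am assuming that each device block starts with the device identifier at the beginning of a line, followed by indented fields including “Working Paths:” with a comma-separated list.

```shell
# degraded_devices MIN: reads "esxcli storage nmp device list" output on
# stdin and prints every device that lists fewer than MIN working paths.
degraded_devices() {
    awk -v minpaths="$1" '
        /^[^ \t]/ { dev = $1 }            # unindented line: start of a device block
        /Working Paths:/ {
            line = $0
            sub(/^.*Working Paths:[ \t]*/, "", line)
            n = (line == "") ? 0 : split(line, paths, /,[ \t]*/)
            if (n < minpaths) print dev ": " n " path(s): " line
        }'
}

# Usage (hypothetical host name):
#   ssh root@esx01 'esxcli storage nmp device list' | degraded_devices 2
```

No output means every device still has at least the expected number of working paths.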

20/09/2016

Dell Latitude E7470

I finally changed my work laptop. Eight years ago I switched from an old IBM ThinkPad R50 (yes! It was a true IBM ThinkPad!) to a T500 ThinkPad from Lenovo.

It was a good pc, not very powerful but sturdy, with a full size keyboard and so many upgrade options, like any other ThinkPad. A war machine!
Now the glorious T500 needs to retire. Everything works, but I need an SSD, the screen resolution is ridiculous, and the CPU and RAM are inadequate to run any virtual machine locally. So I started to look around for a new pc, and these were the requirements:

  • CPU at least Core i5
    I don’t need huge computing power because I don’t have to render or compile source (I usually spend most of my working time in an ssh shell), and I don’t want a Boeing 747 fan at my side or a heavy PSU.
  • RAM at least 8GB
  • SSD storage (I think I don’t have to explain why…)
  • Display resolution at least 1920×1080 (I don’t want to go crazy with external display for work)
  • 14″ chassis (I hate those horrible 15.6″ chassis with the imho useless numeric keypad)
  • Business line laptop

I started looking for a laptop with these requirements and I came to the Dell Latitude 5000 series: a nice line, solid, reliable and with great customer care (this is my experience with any Dell product, pc or server).
Sadly I had a bad experience with a Dell partner, so I started looking around for an alternative… but last week one of our long-standing wholesale providers started to sell Dell products and I found the shiny Latitude E7470, which fits my requirements perfectly at an honest price… check, check, check!


So here it is, my brand new laptop, my first experience with a Latitude product.

My first impressions:

  • it’s thin and light (it’s branded as an ultrabook, although I don’t think it fits the Intel requirements for that) but it’s super sturdy!
  • the display is AWESOME! It fully deserves all the good feedback you can find online.
  • great I/O and options: it has 3 USB 3.0 ports (not bad for a thin laptop), two display outputs (mini DP and HDMI), a uSIM slot and also an integrated smartcard reader.
  • nice storage performance (more than 500 MB/s in sequential read and more than 250 MB/s in sequential write), and I read it’s possible to install a second SSD in another slot.

The only complaints I have are about some keys that need the FN key (for example the HOME and END keys, which I use a lot) and, obviously, the stupid Windows 10 scaling which blurs everything (but this is not a Dell problem).

And yes… I have to use Windows for now… :\

Here is the beast

e7470_1 e7470_2

30/04/2016

SSD galore!

I don’t know why, but I have always had a bad feeling about Samsung products: every time I bought or tried one of them I had so many problems…

In December 2014 I gave my brother a brand new Samsung 840 Evo SSD for his old MacBook Pro.
The original MacBook Pro hard drive was crap, 5400rpm and really, really slow; with this SSD it would take off like a rocket!

Everything was ok (except the stupid Apple policy regarding trim on ssd…) until last February, when the system became unstable; after some checks I found that the problem was the SSD.
I went back to the shop to start an RMA procedure, and finally yesterday (after almost two months!) they sent me a brand new 850 Evo with 3D V-NAND.

Let’s see how it works compared to my good old Crucial M4 (which is 4 years and 3 months old!).

This is the Crucial M4. Keep in mind that on this SSD I run the OS (Windows 10 Pro), my software and games (Far Cry 4 and Eve Online atm), I did nothing to preserve its lifetime and performance, and it’s 75% full.

CrystalDisk_CrucialM4

as-ssd-bench M4-CT128M4SSD2 29.04.2016 22.29.11

This is the new Samsung 850 Evo with 3D V-NAND (clean and absolutely empty).

CrystalDisk_850EVO as-ssd-bench Samsung SSD 850 29.04.2016 22.20.38

The difference on write test is HUGE, access time is also impressive!
To be honest I did not expect these results from my old Crucial M4: it still runs very well after so many years and so many writes on its back. Excellent product!

Let’s see if this new SSD will defeat my Samsung curse! ;)

22/04/2016

Time to upgrade

I can’t tolerate these Out Of Memory errors from the latest Call of Duty!
I can’t tolerate the thrashing each time I test something heavy on VMware!

Time to do some upgrades and switch to 16 gigs of shiny new Corsair RAM for my gaming/testing platform!

ram1

ram2
