17/06/2016

nmon-script

Nmon is wonderful: if you need to monitor your server resources in real time it’s your tool, if you need to collect resource statistics over time and save them it’s your tool, and if you need to check the status of your server’s resources at a precise moment it’s your tool.
I can’t imagine a scenario where you don’t need nmon: it’s more useful and flexible than sar, simpler and more straightforward than any web-based tool, and imho it’s the perfect companion for collectd.

Sadly, during my last server setup I noticed that the latest nmon package distributed by the EPEL repository lacks all the cron scripts you need to automate nmon startup and data collection; imho they are very useful even if you get nmon directly from the official GNU/Linux project site.

Here are some hints from the old packages. First of all, create the /var/log/nmon directory with the nobody user as owner.

nmon01
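If you prefer typing to reading a screenshot, this step could look like the sketch below. The prepare_nmon_dir name is mine (hypothetical), and the chown only happens when the function runs as root.

```shell
#!/bin/bash
# Sketch: create the nmon archive directory and hand it to the nobody user.
# prepare_nmon_dir is a hypothetical helper name, not part of nmon.
prepare_nmon_dir() {
    local dir="${1:-/var/log/nmon}"
    mkdir -p "$dir" || return 1
    if [ "$(id -u)" -eq 0 ]; then
        # Only root can give the directory away to another user.
        chown nobody "$dir"
    else
        echo "not root, ownership of $dir left unchanged" >&2
    fi
}
```

Called as root without arguments it works on /var/log/nmon; pass another path if you want to try it somewhere harmless first.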

Create a new cron file, for example /etc/cron.d/nmon-script.
This cron entry will launch /usr/bin/nmon-script every day (for example at midnight).

nmon02
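A minimal sketch of that cron file, written from a shell function so you can test it on a temporary path first. The write_nmon_cron name is hypothetical, and running the job as nobody is my assumption, based on the owner of /var/log/nmon.

```shell
#!/bin/bash
# Sketch: write the cron.d entry that starts nmon-script every midnight.
# write_nmon_cron is a hypothetical helper; running the job as "nobody"
# is an assumption based on the owner of /var/log/nmon.
write_nmon_cron() {
    local target="${1:-/etc/cron.d/nmon-script}"
    cat > "$target" <<'EOF'
# min hour day month weekday user command
0 0 * * * nobody /usr/bin/nmon-script
EOF
    chmod 644 "$target"
}
```

Files in /etc/cron.d need the extra user field between the schedule and the command, unlike a personal crontab.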

Now you have to create the /usr/bin/nmon-script file (remember to give it execute permission), which contains:

  • some configuration parameters loaded from /etc/sysconfig/nmon-script
  • commands to kill the running nmon and clean up old files (disabled in the example, note the leading # at line 15)

nmon03
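For reference, here is a rough sketch of such a script. It is not the original RPMForge script: the variable names (NMON_DIR, NMON_RETENTION_DAYS, NMON_OPTS), the function names and the default values are all my assumptions. The nmon flags used are -f (save to file), -s (seconds between snapshots), -c (number of snapshots) and -m (output directory).

```shell
#!/bin/bash
# Sketch of /usr/bin/nmon-script; NOT the original RPMForge script.
# Variable names, function names and defaults are assumptions.

# Load the settings, falling back to defaults if the file is missing.
CONFIG="${CONFIG:-/etc/sysconfig/nmon-script}"
[ -r "$CONFIG" ] && . "$CONFIG"
NMON_DIR="${NMON_DIR:-/var/log/nmon}"
NMON_RETENTION_DAYS="${NMON_RETENTION_DAYS:-30}"
NMON_OPTS="${NMON_OPTS:--f -s 60 -c 1440}"

# Stop the collector started the day before (ignore "no process found").
stop_nmon() {
    pkill -x nmon || true
}

# Delete archive files older than the retention period.
cleanup_old() {
    find "$NMON_DIR" -maxdepth 1 -type f -name '*.nmon' \
        -mtime +"$NMON_RETENTION_DAYS" -delete
}

# Start a new collector; NMON_OPTS is left unquoted on purpose so that
# it splits into separate options.
start_nmon() {
    nmon $NMON_OPTS -m "$NMON_DIR"
}

main() {
    stop_nmon
    # cleanup_old    # left disabled here, as in the example above
    start_nmon
}

# Do nothing on hosts where nmon is not installed.
if command -v nmon >/dev/null 2>&1; then
    main
fi
```

With -s 60 -c 1440 nmon takes one snapshot per minute for 24 hours, so each run covers exactly one day until the next cron launch.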

Create the /etc/sysconfig/nmon-script file, which contains some useful variables (the directory where nmon archive files are saved, the retention and the nmon options).

nmon04
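As plain text, a configuration file of this kind could look like the sketch below. The variable names and values are assumptions of mine, so adapt them to whatever your nmon-script actually reads.

```shell
# Sketch of /etc/sysconfig/nmon-script; names and values are assumptions.

# Directory where the nmon archive files are saved
NMON_DIR="/var/log/nmon"

# Retention: how many days of archive files to keep
NMON_RETENTION_DAYS=30

# nmon options: -f saves to file, -s is the interval in seconds,
# -c is the number of snapshots (60 s x 1440 = 24 hours)
NMON_OPTS="-f -s 60 -c 1440"
```

The file is just a list of shell assignments, so the script can load it with the . (source) command.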

That’s enough: at the next midnight nmon will start saving your resource statistics in /var/log/nmon/<hostname>_YYMMDD_0000.nmon files.

nmon05

You can download all the scripts and files to quickly set up the nmon-script:
nmon-script.tar.gz (656B)
SHA256 hash: 953667d8e2806e4858426fb000d7f3cfc898c53e26ffc7694bf2722442668aa8

[EDIT]

Nmon is not distributed by EPEL but by RPMForge!!
Although the RPMForge version is quite old it includes the nmon-script cron job; I suggest moving to the latest version from the official GNU/Linux project site, which does not include nmon-script.

18/04/2016

A quick update

Well, time has passed since the last update on this blog, so it’s time for a quick recap on some of my new year’s goals.

First of all I achieved one of the most important and desired goals: swimming!
During these first three and a half months I went swimming almost twice a week, sometimes three times, starting with 20 laps at a slow pace and working up to the current 40 laps in 40-45 minutes; as I expected, each time I go swimming I feel better and better. Actually it’s the only thing that makes me feel really good and the only weapon I have against my terrible work stress…

nuoto1

Talking about work, I can’t deny we had huge emergencies during the last month: as I predicted (I have been repeating the same thing for years…) we had great problems at our biggest customer with some stupid custom applications deployed on a huge WebSphere Portal cluster.
Remember the KISS principle? My company did the exact opposite. This application produced huge out-of-memory problems on the Portal JVM; I sent logs and begged the developers to fix the huge number of exceptions we collected, but nothing changed until the problem became really critical. In the end they fixed the exceptions and made changes to the code, and everything went back to working normally.

Remember: if you are working on some big enterprise software meatball like WebSphere Portal, DO NOT deploy custom applications on that product unless you are ABSOLUTELY sure of their quality!
Use some easy Tomcat or JBoss instances, hundreds of them if you need to scale out for a big workload: you will live better, spend a tiny fraction of the money and get a better result.

And what about Eve?
Well, I finally got a second account: I used the buddy program and made a brand new cyno/scout alt, and I’m skilling for the biggest and most ambitious project since I started playing: two jump freighter pilots!
Yes, it’s not a typo: I need two JF pilots to take my future Rhea into null space and also into hi-sec, so I need a second JF pilot in an NPC corp to fly safe and not lose this huge ship in some stupid war brawl…
Look at that beauty, isn’t it gorgeous?

Rhea4

30/01/2016

VMware ESXi Embedded Host Client

I know there are many good hypervisors, some of them free and full of advanced features (did someone say oVirt?), but if you want to work in virtualization you can’t ignore VMware.

Don’t get me wrong, I like VMware products and I use them every day on servers, on my lab workstation and also on my old work laptop, but sometimes customers tend to be too conservative and stuck on it.
For example I found many people who prefer to use the free VMware ESXi (without vCenter, vMotion or svMotion) instead of other solutions (free or low cost) with all the advanced features that any server hypervisor must have.

One of the most evident limitations of the free ESXi is the client, which requires a Windows OS; fortunately there’s a wonderful free solution for that: ESXi Embedded Host Client.

The installation is really easy: first of all you must download the installation package (esxui_signed.vib) from the official site and copy it to the ESXi host (you can use the datastore browser or copy it via scp).
After that you must log into the ESXi host via ssh and launch the “esxcli software vib install” command.

esxi01

That’s all, now you can point your browser at https://youresxiserver/ui and…

esxi02

What?!?!?
Keep calm: if you’re using ESXi 5.5 prior to Update 2 there are some known issues, and this is one of them.
To solve it we must edit /etc/vmware/rhttpproxy/endpoints.conf, but the file is locked (operation not permitted error), so we must copy it to a temporary location (for example /tmp), edit it there and copy it back over the original path.

esxi03

The only change you must make is to comment out the line starting with /ui by putting a # at the beginning (force the write on exit with the :x! vi command).

esxi04

Now copy the edited file into its original path with “cp /tmp/endpoints.conf /etc/vmware/rhttpproxy/endpoints.conf” and restart rhttpproxy daemon with “/etc/init.d/rhttpproxy restart”
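If you don’t feel like editing with vi, the copy, edit and copy back steps can be sketched with sed. This is just my shortcut, not an official procedure, and the comment_ui_endpoint function name and the temporary file name are hypothetical.

```shell
#!/bin/sh
# Sketch: comment out the /ui line of endpoints.conf through a temporary
# copy, because the original file cannot be edited in place.
# comment_ui_endpoint is a hypothetical helper name.
comment_ui_endpoint() {
    local conf="${1:-/etc/vmware/rhttpproxy/endpoints.conf}"
    local tmp="/tmp/endpoints.conf.$$"
    cp "$conf" "$tmp" || return 1
    # Put a # in front of every line that starts with /ui.
    sed -i 's|^/ui|#&|' "$tmp"
    # Overwrite the original with the edited copy, then clean up.
    cp "$tmp" "$conf" && rm -f "$tmp"
}
```

Running it twice is harmless, because a line that already starts with # no longer matches ^/ui. The rhttpproxy restart is still needed afterwards.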

esxi05

Now try to browse https://youresxiserver/ui/ url (don’t forget the trailing /, another bug in ESXi 5.5U2 and earlier versions) and…

esxi06

esxi07

Mission accomplished!

11/01/2016

Compress logs by date

Sometimes you find a service which rotates its logs but doesn’t rename them using an easy date format (for example logYYYYMMDD.log); that’s horrible if you have to archive those logs :\

Here is a simple bash script I use to compress them based on file modification date; check the comments to adapt it to your log names.

download: logmonth_1.0.sh
md5sum: 4b587eb3c2d9ac413d81a0bdc055c6cf
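To give an idea of the technique without opening the download, here is a much simpler sketch. It is not logmonth_1.0.sh: it stamps every file with its own modification day instead of grouping anything, the compress_by_mtime name is hypothetical, and it assumes GNU date (for date -r FILE).

```shell
#!/bin/bash
# Sketch: gzip rotated logs, naming each archive after the file's
# modification date. NOT the logmonth script, just the same idea.
# Assumes GNU date; compress_by_mtime is a hypothetical name.
compress_by_mtime() {
    local dir="$1" pattern="$2" today f stamp
    today="$(date +%Y%m%d)"
    # $pattern is left unquoted on purpose so the shell expands the glob.
    for f in "$dir"/$pattern; do
        [ -f "$f" ] || continue
        # Leave files that are already compressed alone.
        case "$f" in *.gz) continue ;; esac
        stamp="$(date -r "$f" +%Y%m%d)"
        # Skip files modified today: the service may still be writing them.
        [ "$stamp" = "$today" ] && continue
        # Compress, keep the original timestamp, then drop the plain file.
        gzip -c "$f" > "$f.$stamp.gz" && touch -r "$f" "$f.$stamp.gz" && rm -f "$f"
    done
}
```

For example compress_by_mtime /var/log/myapp 'server.log*' would turn a server.log.1 last written on 5 January 2016 into server.log.1.20160105.gz (the directory and the pattern here are made up).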

 

10/11/2015

Nagios check_oracle_health

Here we are with a new monitoring post, and remember, every day spent working on Nagios is always a great day! :)

This time I want to talk about an awesome Nagios plugin made by ConSol Labs named check_oracle_health.
As you can imagine this plugin works with Oracle databases and it’s incredibly helpful for every sysadmin who works with this product: it’s super easy to implement, super easy to understand and super light and efficient compared to the monstrous official Oracle Enterprise Manager.

Assuming you already have a fully functional Nagios server (the version is not important, I have tried this plugin on everything from 2.9 up to the latest) you can choose to install check_oracle_health on the Oracle server itself or on another server which has the Oracle client with sqlplus installed (to be honest I haven’t tried this second scenario, but I think it works the same way).
The plugin can work with perl DBD::Oracle or with the sqlplus client; in this tutorial I will use sqlplus.

First of all download the plugin tar.gz archive, decompress it and enter its directory.

1

After that proceed with the classic configure + make + make install procedure like any other GNU/Linux software source; if you want you can change some options, try “./configure --help” for more information.

2

3

4

Ok, now our plugin is ready to work; try to launch /usr/local/nagios/libexec/check_oracle_health to verify it’s ok (check the path if you changed it during the configure phase).
Now the nasty part: as I said I will use sqlplus, which requires the right environment variables to work (NLS_LANG, ORACLE_HOME, ORACLE_BASE, PATH); you can find them by logging in as the database user (for example oracle) and checking the user profile (for example ~/.bash_profile).

5

In our scenario we will use nrpe to remotely run our Nagios checks, so we have to export these variables for the nrpe daemon. To do this you can put them inside the init script of the nrpe daemon (/etc/init.d/nrpe), inside any file it includes (for example /etc/sysconfig/nrpe) or inside the unit file if you use systemd.
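As a sketch, the lines to add to the init script or to /etc/sysconfig/nrpe could look like this. Every value below is hypothetical: copy the real ones from the profile of your database user.

```shell
# Sketch for /etc/sysconfig/nrpe (or the nrpe init script).
# All paths and values are hypothetical examples, take the real ones
# from the oracle user's profile.
export ORACLE_BASE="/u01/app/oracle"
export ORACLE_HOME="$ORACLE_BASE/product/11.2.0/dbhome_1"
export NLS_LANG="AMERICAN_AMERICA.AL32UTF8"
# sqlplus lives in $ORACLE_HOME/bin, so it must be in the PATH of nrpe.
export PATH="$PATH:$ORACLE_HOME/bin"
```

With systemd the same values would go into the unit file as Environment= lines instead of export statements.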

Now on the Oracle database we have to create a user for the plugin and give it the right grants; you don’t want it to use sys or system, do you?
Export your ORACLE_SID variable with the right SID, log into sqlplus and launch these commands (replace [PASSWORD] with your supersecure password):

create user nagios identified by [PASSWORD];
grant create session to nagios;
grant select any dictionary to nagios;
grant select on V_$SYSSTAT to nagios;
grant select on V_$INSTANCE to nagios;
grant select on V_$LOG to nagios;
grant select on SYS.DBA_DATA_FILES to nagios;
grant select on SYS.DBA_FREE_SPACE to nagios;

8

Now let’s change the /etc/nagios/nrpe.conf file; the objective is to create a single nrpe command that can serve every service we define in the Nagios configuration.
To achieve this you can use the syntax below; it uses command arguments, so you need the dont_blame_nrpe=1 directive inside the nrpe.conf file or the arguments will not work.

command[check_oracle]=/usr/local/nagios/libexec/check_oracle_health --connect $ARG1$ --method sqlplus --user nagios --password [PASSWORD] --mode $ARG2$ --warning $ARG3$ --critical $ARG4$

The arguments are quite simple:

  • ARG1 is the SID of the database we want to monitor (check your tnsnames.ora file)
  • ARG2 is the specific check we will do with check_oracle_health (read the official documentation for a full list of modes)
  • ARG3 is the warning threshold (%)
  • ARG4 is the critical threshold (%)
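Since a missing dont_blame_nrpe=1 is the classic reason why the arguments are ignored, here is a quick way to check for it before restarting the daemon. The nrpe_args_enabled function name is hypothetical, and the default path is the one used in this post (on your system the file may be named differently).

```shell
#!/bin/bash
# Sketch: succeed only if the nrpe configuration allows command arguments.
# nrpe_args_enabled is a hypothetical helper name.
nrpe_args_enabled() {
    local cfg="${1:-/etc/nagios/nrpe.conf}"
    # Match an uncommented "dont_blame_nrpe=1", allowing spaces around "=".
    grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$cfg"
}
```

You could use it as: nrpe_args_enabled || echo "command arguments are disabled". Keep in mind that the nrpe daemon must also have been built with command arguments support, otherwise the directive has no effect.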

9

Restart nrpe daemon to activate all the changes

10

Now let’s check that everything works: on the Nagios server launch the check_nrpe plugin to simulate what the Nagios daemon will do.
This is the syntax:

check_nrpe -H [host or ip address of nrpe server] -c [nrpe command] -a [list of arguments separated by space]

Remember arguments we defined inside the nrpe.conf file:

  • ARG1 is the SID, for example MYORADB
  • ARG2 is the specific check we will do, for example tablespace-usage
  • ARG3 is the warning threshold, for example 80%
  • ARG4 is the critical threshold, for example 90%

11
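Putting the example values together, the test could be wrapped like this. The run_oracle_check function and the CHECK_NRPE variable are hypothetical names of mine, and the check_nrpe path is an assumption that depends on where your Nagios plugins are installed.

```shell
#!/bin/bash
# Sketch: call the check_oracle nrpe command with the four arguments.
# run_oracle_check and CHECK_NRPE are hypothetical names; the default
# path of check_nrpe is an assumption.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

run_oracle_check() {
    local host="$1" sid="$2" mode="$3" warn="$4" crit="$5"
    # ARG1=SID, ARG2=mode, ARG3=warning, ARG4=critical
    "$CHECK_NRPE" -H "$host" -c check_oracle -a "$sid" "$mode" "$warn" "$crit"
}
```

For the values above that would be: run_oracle_check uberoracle.domain.local MYORADB tablespace-usage 80 90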

The last thing you have to do is to finally configure the nrpe service inside Nagios, here is an example of the syntax:

define service{
  use generic-service
  host_name uberoracle.domain.local
  service_description ORACLE tablespaces use
  check_command check_nrpe!check_oracle!MYORADB tablespace-usage 80 90
  }

Our shiny new Oracle monitor! (on an ugly old Nagios 2.9…)

12
