Ganglia Monitoring System

[Figure: Ganglia web interface]

In this note we are going to install the Ganglia monitoring system on a Virtual Cluster.

Ganglia was initially developed at the University of California, Berkeley. It is free software released under the BSD license. It scales to multiple nodes and multiple clusters. O’Reilly has published a book on it.

Ganglia is used by some of the largest sites on the web, including Wikipedia and Twitter.

Architecture

Ganglia consists of three different services: gmond, gmetad and gweb.

  • gmond
    Lightweight process that monitors system resources and broadcasts them on the local subnet. It also receives broadcast messages from the neighbouring nodes and makes them accessible for polling by gmetad. It is important to note that only the current state is remembered by the gmond instances, not historical data.
  • gmetad
    Service that aggregates status information from multiple clusters. Per cluster it is sufficient to poll one gmond instance, since the state is shared among the nodes.
  • gweb
    Web dashboard for gmetad nodes.

Architecture Sketch

     gmond                gmetad      gweb
     =====                ======      ====
  * <-----> * <---[poll]---> * <-------> *
  | Cluster |                |
  * <-----> *                |
                             |
  * <-----> * <---[poll]-----+
  | Cluster |
  * <-----> *

gmond

  • Gathers data from local system on an independent schedule.
  • Implication: The system does not rely on external polling. Many independent pollers can query the cluster. E.g. gmond-zeromq publishes data on a zmq bus.
  • gmond seems to run single-threaded (cf. ps -eLf | grep gmond).
  • Can be extended to report metrics provided by scripts in any language. Especially easy: C, C++, Python. A gmetric tool is provided.
  • Metrics are shared between gmond nodes via multicast channels.

gmetad

  • Polls the gmond daemons for data.
  • Stores historic data in Round-Robin Database
  • Provides raw data for web interface
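The idea behind a round-robin store can be illustrated with a toy example. The following is only a sketch of the concept (a fixed number of slots where the newest sample overwrites the oldest), not the on-disk format or API of the database gmetad actually uses; the class name RoundRobinArchive is made up for this illustration.

```python
from collections import deque


class RoundRobinArchive:
    """Toy fixed-size archive: keeps only the newest `slots` samples."""

    def __init__(self, slots):
        # A deque with maxlen drops the oldest entry automatically.
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))

    def average(self):
        values = [value for _, value in self.samples]
        return sum(values) / len(values) if values else None


archive = RoundRobinArchive(slots=3)
for t, load in enumerate([0.1, 0.4, 0.7, 1.0]):
    archive.update(t, load)

# Only the three most recent samples survive; storage never grows.
print(list(archive.samples))
```

The point of the design is that disk usage stays constant no matter how long the monitoring runs, at the price of losing old samples.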

Installation

Ganglia Monitor

Installation via apt-get is a piece of cake:

ssh VLB1 sudo apt-get install ganglia-monitor

Now start the monitor daemon:

ssh VLB1 sudo service ganglia-monitor start

and test that it is collecting metrics by typing:

nc VLB1 8649

You should see an XML dump of the metrics in your terminal window.
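The same check can be scripted. Below is a minimal Python sketch, assuming the dump has the usual GANGLIA_XML / CLUSTER / HOST / METRIC nesting with NAME and VAL attributes; the function names and the sample document (including its IP address and load value) are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def read_xml_dump(host, port=8649, timeout=5.0):
    """Read everything the daemon sends on connect (like `nc host 8649`)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # the daemon closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_data):
    """Return {host_name: {metric_name: value_string}} from an XML dump."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        result[host.get("NAME")] = metrics
    return result


# Hand-written sample in the shape of a gmond dump (values are made up).
SAMPLE = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
  <CLUSTER NAME="Virtual Cluster">
    <HOST NAME="VLB1" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.12" TYPE="float" UNITS=" "/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

print(parse_metrics(SAMPLE))
```

Against a running node, parse_metrics(read_xml_dump("VLB1")) should give the live values.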

Ganglia Meta Daemon

We install gmetad and the web frontend on the host machine:

 sudo apt-get install gmetad

Now start the gmetad daemon by running

sudo service gmetad start

Test its functionality by running:

nc localhost 8651

It should respond with an XML document representing the state of all connected nodes (i.e. none).

To get more elaborate information about the meta daemon run it from the command line with enabled debug information:

sudo -u nobody gmetad --debug=10

IP Multicast Setup

Ganglia uses multicast channels to connect different gmond daemons with each other.

It seems surprisingly difficult to set up and test multicast networking. First we need to check whether multicast is supported by the kernel (it should be). Following Stackexchange, one can use:

ip maddr show
cat /proc/net/igmp
netstat -ng

to display information about the multicast configuration. Another very helpful source is http://sorcersoft.org/resources/notes/multicast.html

We make sure the multicast packets are sent over the right Ethernet interface by adding the following route:

ssh VLB1 sudo route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
ssh VLB2 sudo route add -net 224.0.0.0 netmask 240.0.0.0 dev eth0
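To check that multicast packets actually arrive, a small listener can help. The following Python sketch joins a multicast group and waits for one packet. It assumes gmond uses the group 239.2.11.71 on UDP port 8649 (the mcast_join and port values of a stock gmond.conf; check your own configuration); the function names are made up for this note.

```python
import ipaddress
import socket

GANGLIA_GROUP = "239.2.11.71"  # assumption: stock mcast_join address
GANGLIA_PORT = 8649


def is_multicast(address):
    """True if the address lies in 224.0.0.0/4, the range covered by the route."""
    return ipaddress.ip_address(address).is_multicast


def membership_request(group, interface_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq structure used by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def listen_once(group=GANGLIA_GROUP, port=GANGLIA_PORT, timeout=10.0):
    """Join the group and return (sender_ip, packet_size) of the first packet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership_request(group))
        sock.settimeout(timeout)  # raises socket.timeout if nothing arrives
        data, (sender, _) = sock.recvfrom(65535)
        return sender, len(data)
    finally:
        sock.close()
```

Running listen_once() on VLB2 while gmond is active on VLB1 should return the address of VLB1; a timeout suggests that the packets take the wrong interface or are filtered.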

Ganglia Web Frontend

Ganglia provides a nice PHP website that visualizes the data aggregated by gmetad. Installation and start of the application are rather easy:

sudo apt-get install ganglia-webfrontend
sudo cp /etc/ganglia/apache.conf /etc/apache2/sites-enabled/ganglia
sudo service apache2 reload

Remark: The apache.conf file is a single line:

Alias /ganglia /usr/share/ganglia-webfrontend

Now you should be able to open the web frontend at the URL http://localhost/ganglia on your host machine.

Configuration

Ganglia Monitor

We have two virtual nodes, VLB1 and VLB2, which run the gmond daemon and share their metrics on a multicast channel over the virtual network. To make gmetad aware of those nodes, edit /etc/ganglia/gmetad.conf to contain the following line:

 data_source "Virtual Cluster" 1 VLB1 VLB2

Now restart the gmetad daemon, e.g. using

 sudo service gmetad restart

and you should be able to see two virtual machines in the web frontend.
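To make the structure of the data_source line explicit, here is a small Python sketch that takes it apart. It assumes the layout described in the comments of the stock gmetad.conf: a quoted cluster name, an optional polling interval in seconds, then one or more hosts with an optional :port suffix (8649 if omitted). The function parse_data_source is made up for this note and is not part of Ganglia.

```python
import shlex


def parse_data_source(line, default_port=8649):
    """Split a gmetad data_source line into name, interval and host list."""
    tokens = shlex.split(line)  # honours the quotes around the cluster name
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: %r" % line)
    name, rest = tokens[1], tokens[2:]
    interval = None
    if rest[0].isdigit():  # assumption: a leading number is the poll interval
        interval, rest = int(rest[0]), rest[1:]
    hosts = []
    for entry in rest:
        host, _, port = entry.partition(":")
        hosts.append((host, int(port) if port else default_port))
    return {"name": name, "interval": interval, "hosts": hosts}


print(parse_data_source('data_source "Virtual Cluster" 1 VLB1 VLB2'))
```

Under that reading, the line above asks gmetad to poll every second, trying VLB1 first and VLB2 as a fallback.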

Debugging

Odds are that something went wrong along the way. To get a better understanding of the problem, start the daemons in the foreground with debug output enabled:

 # on the host machine
 sudo -u nobody gmetad -d 10

 # on the VMs
 sudo gmond -d 10

Extensions

There are three different ways to extend Ganglia with custom metrics:

  1. Using the gmetric tool
  2. Including modules in C/C++
  3. Including modules in Python (via mod_python module)

The gmetric tool allows you to set a metric to a specific value:

gmetric --name="my_metric" --value="18" --type=int32

It does not, however, provide repeated execution of a script scheduled by the gmond daemon; it has to be triggered by an external process like cron.
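When the value is computed in a program rather than typed by hand, gmetric can be called from there. A minimal Python sketch, using only the --name, --value and --type options shown above and assuming gmetric is installed at /usr/bin/gmetric; the helper names are made up.

```python
import subprocess


def gmetric_command(name, value, value_type="int32", gmetric="/usr/bin/gmetric"):
    """Build the argument list for one gmetric call."""
    return [gmetric,
            "--name=%s" % name,
            "--value=%s" % value,
            "--type=%s" % value_type]


def publish(name, value, value_type="int32"):
    """Send one metric value; raises CalledProcessError if gmetric fails."""
    subprocess.check_call(gmetric_command(name, value, value_type))


# Show the command without executing it.
print(gmetric_command("my_metric", 18))
```

Passing the arguments as a list avoids shell quoting problems when a metric name or value contains spaces.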

Crontab

We can add the following line in crontab -e to monitor the size of the www folder every minute:

# m h dom mon dow command
* * * * * gmetric --name="size_www" --type=int32 --value=`du -s /var/www | cut -f1`

To see whether this script is executed, use

tail -f /var/log/syslog | grep CRON

You should see messages like

Dec 27 12:51:01 VLB CRON[4136]: (user) CMD (gmetric --name="size_www" --type=int32 --value=`du -s /var/www | cut -f1`)

appear every minute. If another line

Dec 27 12:57:01 VLB CRON[4297]: (CRON) info (No MTA installed, discarding output)

appears next to it, then something went wrong.

Catches

  • Crontab uses a different execution environment than the login shell. To test the environment use something like:

    * * * * * env > ~/cron-env.txt
    

    In my case cron was using a different shell (dash) and the PATH variable did not contain the current directory (“.”). Therefore bash-specific variables (like $RANDOM) were not working as intended and I was not able to run scripts in my home directory directly.

  • Crontab sends stdout and stderr of the scripts via email. If you don't have an MTA like postfix installed, you will not be able to see the output of your scripts. Solution:

    • install an MTA
    • redirect output to a log file by appending >> ~/cron.log 2>&1 (in this order) to the crontab line.
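The order of the two redirections matters, because the shell processes them from left to right. The following Python sketch demonstrates this with a throwaway log file; it assumes a POSIX /bin/sh is available, and the helper run_with_redirect is made up for the demonstration.

```python
import os
import shlex
import subprocess
import tempfile


def run_with_redirect(redirect):
    """Run a command that prints one line to stdout and one to stderr.

    `redirect` is a shell redirection containing the placeholder {log}.
    Returns (log_file_text, text_that_still_reached_stdout).
    """
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "cron.log")
        command = ("{ echo out; echo err >&2; } "
                   + redirect.format(log=shlex.quote(log)))
        proc = subprocess.run(command, shell=True, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, universal_newlines=True)
        with open(log) as handle:
            return handle.read(), proc.stdout


# File first, then stderr: both streams end up in the log.
print(run_with_redirect(">> {log} 2>&1"))

# stderr first: it is attached to the old stdout (what cron would mail),
# and only stdout goes to the log.
print(run_with_redirect("2>&1 >> {log}"))
```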

Current Setup

My crontab has a single entry that runs a script

# m h dom mon dow command
* * * * * ~/ganglia-metrics.sh >> ~/crontab.log 2>&1

Note that the script is called with an explicit path and that its output is redirected to a log file. The ganglia-metrics.sh script looks as follows:

#!/bin/bash
GMETRIC=/usr/bin/gmetric

echo `date` "- executing ganglia-metrics.sh"

$GMETRIC --name="size_www" --type=int32 --value=`du -s /var/www | cut -f1`

# some more dummy metrics ...
$GMETRIC --name="date" --type=int32 --value=`date +%s`
$GMETRIC --name="rand" --type=int32 --value=$RANDOM

Note that the script uses a shebang ‘#!’ in order to be executed by the bash shell.

More examples can be found on GitHub. See https://github.com/vvuksan/ganglia-misc/tree/master/gmetric-python for a Python implementation of gmetric.

Python modules

Ganglia can be extended by Python modules. In contrast to the gmetric method explained before, these Python modules are executed by gmond and do not have to be scheduled by a cron job.

To enable Python modules one has to load the Python module wrapper as a module. You can see all installed native modules using:

ls -l /usr/lib/ganglia

Unfortunately the preinstalled gmond.conf version does not include a configuration template, even though the modpython.so file is provided. We have to add the following lines into gmond.conf (cf. https://bugs.launchpad.net/ubuntu/+source/ganglia/+bug/694208):

modules {
    module {
       name = "python_module"
       path = "/usr/lib/ganglia/modpython.so"
       params = "/usr/lib/ganglia/python_modules"
    }
}

include ('/etc/ganglia/conf.d/*.pyconf')

Now run

sudo mkdir -p /usr/lib/ganglia/python_modules /etc/ganglia/conf.d

to create the directories if necessary. Use

sudo gmond -m -d 10

to verify the module is loaded correctly. (You should see “loaded module: python_module” at the beginning, followed by no error messages.)

Install example python metric

Before we write our own Python metric, we install the ‘disk_free’ metric by Vladimir Vuksan from GitHub:

curl https://raw.github.com/ganglia/gmond_python_modules/master/diskfree/python_modules/diskfree.py | sudo tee /usr/lib/ganglia/python_modules/diskfree.py
curl https://raw.github.com/ganglia/gmond_python_modules/master/diskfree/conf.d/diskfree.pyconf | sudo tee /etc/ganglia/conf.d/diskfree.pyconf

Check that everything is working fine by running, e.g.

sudo gmond -m -d 10 | grep disk_free

Start gmond again and you should see disk_free metrics in the web interface.
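To get a feeling for what such a metric measures, the numbers can be computed by hand. The following is a standalone Python sketch based on os.statvfs; it is not the code of diskfree.py, and the function names are made up for this note.

```python
import os


def disk_free_percent(path="/"):
    """Percentage of the filesystem at `path` available to ordinary users."""
    stats = os.statvfs(path)
    if stats.f_blocks == 0:  # pseudo filesystems may report no blocks
        return 0.0
    return 100.0 * stats.f_bavail / stats.f_blocks


def disk_free_gb(path="/"):
    """Available space at `path` in GiB."""
    stats = os.statvfs(path)
    return stats.f_bavail * stats.f_frsize / float(1024 ** 3)


print("%.1f%% free, %.2f GiB" % (disk_free_percent("/"), disk_free_gb("/")))
```

Comparing these values with the output of df is a quick sanity check for the numbers shown in the web interface.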

Write our own module

Now that we know the Python module infrastructure works as expected, let's write our own:

cat << EOF | sudo tee /usr/lib/ganglia/python_modules/example.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-

def get_value(name):
    """Return a value for the requested metric"""
    return 17

def metric_init(lparams):
    """Initialize metric descriptors"""

    # create descriptors
    descriptors = []

    descriptors.append({
        'name': "example",
        'call_back': get_value,
        'time_max': 60,
        'value_type': 'float',
        'units': '%',
        'slope': 'both',
        'format': '%f',
        'description': "example metric",
        'groups': 'example'
    })

    return descriptors

def metric_cleanup():
    """Cleanup"""
    pass

# the following code is for debugging and testing
if __name__ == '__main__':
    descriptors = metric_init({})
    for d in descriptors:
        # fill the metric's own format string with the current value
        value = d['format'] % d['call_back'](d['name'])
        print('%s = %s' % (d['name'], value))
EOF

This saves the script in your Python modules directory. Test its functionality using:

 python /usr/lib/ganglia/python_modules/example.py
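Beyond running the module once, it can be useful to check the descriptors before gmond loads them. The sketch below is a hypothetical helper, not part of Ganglia: it assumes that every descriptor should carry the keys used in the example module (the optional 'groups' key is not checked) and that the format string must fit the value returned by the callback.

```python
REQUIRED_KEYS = ("name", "call_back", "time_max", "value_type",
                 "units", "slope", "format", "description")


def check_descriptor(descriptor):
    """Return a list of problems found in one metric descriptor."""
    problems = ["missing key: %s" % key
                for key in REQUIRED_KEYS if key not in descriptor]
    if "call_back" in descriptor and not callable(descriptor["call_back"]):
        problems.append("call_back is not callable")
    if not problems:
        try:
            # The format string must accept the value the callback returns.
            descriptor["format"] % descriptor["call_back"](descriptor["name"])
        except (TypeError, ValueError) as error:
            problems.append("format does not fit value: %s" % error)
    return problems


def get_value(name):
    """Same dummy callback as in the example module."""
    return 17


good = {"name": "example", "call_back": get_value, "time_max": 60,
        "value_type": "float", "units": "%", "slope": "both",
        "format": "%f", "description": "example metric", "groups": "example"}
bad = {"name": "broken", "call_back": "not a function"}

print(check_descriptor(good))
print(check_descriptor(bad))
```

An empty list means the descriptor passed these checks; it does not guarantee that gmond accepts it.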

Now add the python module to gmond configuration using e.g.

cat << EOF | sudo tee /etc/ganglia/conf.d/example.pyconf
modules {
    module {
        name = "example"
        language = "python"
    }
}

collection_group {
    collect_every = 10
    time_threshold = 180
    metric {
       name_match = "example"
    }
}
EOF

For more information see the official docs.

Why openness benefits research

The following text is jointly authored by David Shotton (david.shotton@zoo.ox.ac.uk) and Heinrich Hartmann (hartmann@uni-koblenz.de). Cf.  OpenCitations.net - blog.

Transparency is essential for trust and credibility in the research community, and true openness brings great opportunities for academia. The internet facilitates the free flow of information and knowledge, and permits new forms of communication both for researchers and for the general public. Already, today’s children can listen freely on the internet to university courses taught by world-leading scientists, and everybody has the best encyclopaedia ever written (Wikipedia) at their fingertips.  These are real game changers. Opening up the research literature is the next logical step.

Open publishing

We believe that the current academic publishing model – whereby researchers give their content to commercial publishers and then buy it back from them at enormous cost by means of journal subscription fees – has become absurd, since it is no longer helping the researcher to distribute his or her findings, but rather prevents the work from being widely read, by hiding it behind subscription paywalls.  Would it not be much better to let this information flow freely, accessible to everybody who wants to read it?

Of course, such a vision of openness for academic publishing raises issues of finance and quality control – who will pay for open access publishing, and how can we ensure that scientific rigor accompanies open publication?  While the internet enables dissemination of information at a fraction of the cost of traditional print publication, publishing clearly involves more than electronic dissemination.  It is for this reason that we, with others, are presently planning a high level conference on modern scientific communication, entitled

Rigor and Openness in 21st Century Science,

to be held in Oxford next spring.

However, new publication funding models are being developed, particularly in the United Kingdom, where Research Councils UK and the Wellcome Trust are insisting that papers reporting research results obtained as a result of their research funding should be published under an open Creative Commons CC-By attribution licence when an article processing charge (APC) is levied, so that the works are freely available for text mining and re-use [1].  What is significant is that they are backing their words with funding to enable it.  Cameron Neylon has recently written a commentary in Nature about the importance of this [2].

Furthermore, peer review is being carefully examined by several forward-looking publishers to determine how well open alternatives to the present system of confidential review actually work.

The role of social media in science

Much academic research is done in relative isolation, because topics have become so specialized that there may be only a few experts in the whole world who really understand each particular research problem.  These experts may be located on different continents, and may not know about one another – a situation that is particularly true for Ph.D. students and other young researchers, who may not yet be familiar with the literature in their field, and who may have formed few personal relationships with colleagues in other institutions through attendance at research conferences.  New forms of academic social media can play a role here, to catalyse interactions between geographically separated academics, and many experiments in this area are being conducted.

Academic social media can also play an important role in filtering the wealth of new articles published every day, and in alerting people to the small fraction of these that are most relevant to them.  Typically, junior researchers rely on recommendations from friends and colleagues about which articles are worth reading, but if academic social media can be used to broaden this recommendation network, they will provide a significant service.

Fears and benefits of openness

Of course researchers, particularly early in their careers, are cautious about sharing their discoveries too early or too widely, for fear they may get ‘scooped’, since they naturally and quite properly wish to obtain credit for their own work by being the first to publish it.  However, what is often missed by people of this mind-set is that working openly with other people can have benefits too.  It can be a lot more fun, can lead to more sustainable motivation, can result in incredibly rapid collaborative progress, and hence can often lead to better results.  An essential pre-requisite for this is the willingness to share one’s ideas and to make contact with like-minded people.  An example of a researcher who practices openness in his day-to-day research is Giorgio Gilestro, Lecturer in Systems Neurobiology with the Department of Life Sciences at Imperial College London, who publishes his research group’s Open Lab Book online.

Our personal experience, not least in the joint Open Citations and Related Work developments described in the next blog post, is that you gain more than you lose by being open!

References

[1]       Wellcome Trust announcement: Open access: CC-BY licence required for all articles which incur an open access publication fee – FAQ. Available from http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/WTVM055715.pdf.

[2]       Cameron Neylon (2012). Science publishing: Open access must enable open use. Nature 492: 348–349.  doi:10.1038/492348a.

DJA Meisenheim 2012

This year I gave a course on “Efficient Algorithms” for pupils aged 13-15 at the Deutsche Schuelerakademie in Meisenheim.
Part of the challenge for the students was to write a 25 page documentation of our coursework. We decided to use a wiki to organize the writings. The result can be found here: Documentation.

Education 2.0

University and school education has not changed much in a few hundred years. Basically it is still like this: teachers lecture to a class, and pupils or students have to sit quietly and write down what has been said. Exercises are done at home. While everybody agrees the system is not very good, it is believed to be the best one available. But is this still true?
In addition to that, education is unfair. I had the privilege to be raised in an academic family and had access to good education in my country. Not everybody is so lucky, and our system currently makes it very hard for “outsiders” to enter: You will not go to the German Gymnasium if education is not valued by your parents. You will not go to Harvard if you come from a bad school, even if you are very interested and hard working. Even worse, if you were born in rural Africa, India or China you can be as smart as you want, as interested as you want, and you will NEVER have a chance to get a good education.
The internet offers revolutionary new opportunities for everybody:
  • Publish text, audio and videos to the whole world
  • Talk to people around the world for free using Chat, Skype or Google-Hangout
  • Work together using Wikis, Forums or Q&A sites
These new possibilities are real game changers, and society is only slowly starting to embrace this new playground.
How will education transform in this new environment?
Here are some exciting recent developments!

Stanford University offered a free online course on Artificial Intelligence. But they did not just videotape their lectures. No, the material is presented in videos of 10 minutes length in which you only see pen and paper and hear the talking from behind (YouTube). In addition there is auto-graded homework each week and the students get a certificate when they pass the online exam.
The project went nuts! The initially expected 1000 students signed up on the first day. A total of 100,000 students from 190 countries including India, South Korea, New Zealand and the Republic of Azerbaijan followed the course. Moreover, the material was translated into 44 languages, including Bengali (according to wired.com). The online students discussed questions over the internet, and met for problem sessions in Google Hangouts!

Following this idea, MIT launched its online initiative MITx. “MITx will offer a portfolio of MIT courses for free to a virtual community of learners around the world.”

But the concept itself is already quite a bit older. Khan Academy is an online school and university with over 1m students which has been out there for a while. There is a very inspiring TED Talk by the founder Salman Khan available, which I highly recommend watching. He explains, for example, how you can switch the classroom concept:

Students take video lectures at home, with the ability to pause them and jump back to an earlier point without embarrassing themselves or annoying the teacher.
The homework is done in class! The teacher is merely a coordinator who sends good pupils to help the weak and occasionally explains something himself.

This is already being done. With huge success. Indeed, I already tried out something similar at a course at the DSA which I ran with my friend Rene - and it worked.

I am very excited to see how education in schools and university will evolve.