The IT infrastructure at CERN is well managed and reasonably well documented, but it can still sometimes take a day or two to figure out how to get something up and running there. I needed to deploy a Flask application that only allowed access to registered CERN users, which meant talking to the single sign-on (SSO) service to authenticate users. After initially thinking this would be a simple task, I spent the standard two days cursing and scratching my head.
This post serves as a brain dump of what I found out, what problems and solutions I came across, and how to (eventually) get the thing working. By the end of this tutorial, we’ll have a Flask application that allows users to log in using their CERN credentials.
This is a tutorial for setting up the system inside a CERN cloud virtual machine, and uses CERN’s internal authentication mechanisms. This means you will need a CERN computing account to follow along exactly. However, many of the steps here and the problems I found will be common to other setups with a Python web application, a WSGI server, Apache, and Shibboleth.
Super user requirements
Most of the work is done on the command line, and some of the commands require super user/root privileges, which can be gained by prefixing commands with `sudo` when run by the user who made the VM.
If an entire block of commands requires such privileges, I will say so in the preceding paragraph.
If only some commands in a block require it, I’ll add a comment just above the relevant command that applies only to the command below the comment.
So, watch out!
Creating a CERN virtual machine
To use the CERN OpenStack service, follow the short “Getting Started” guide to get a CERN OpenStack account. The most important point for us is on the next page, “Before you start”, which points you to the CERN resources page so that you can activate the OpenStack service on your account.
Within around half an hour, you should be able to log in to the CERN OpenStack portal.
Once you’re in, create a new instance using the `m1.small` flavour and choose to boot from the `SLC6 CERN Server - x86_64 [2014-08-05]` image, or a similarly recent SLC6 CERN Server image if that exact one isn’t available.
The “instance name” you give the VM will correspond to the URL used to access it.
I’ll assume the VM is called `ssotutorial`, which means you can access the machine at `ssotutorial.cern.ch`, but only inside the CERN network.
General external access isn’t allowed to cloud VMs for security reasons.
(You will need to call the VM something else, as names must be unique across the CERN network, so replace `ssotutorial` with the name you’re using.)
Once the virtual machine is provisioned and running, which can take a little while, you should be able to log in with the account you used to create it.
This will probably ask you for a password, which is your regular CERN password.
Setting up the VM
From now on, all commands should be run on the virtual machine, with the above caveat about root privileges.
Update any old packages and reboot the VM, both as root, to ensure any updated kernel becomes the running one.
Log back in to the VM once it has rebooted.
As the root user, add a deploy user to handle running the application.
Unless I say to run a command with super user privileges, all the following commands will be run as the `deploy` user.
Although I would rather deploy a Flask application with nginx, as I’m more familiar with it and prefer its configuration, Shibboleth support for nginx is not great, and trying it looks like it would be more painful than just going with Apache. The CERN documentation on SSL and SSO also deals only with Apache.
So, let’s install Apache and configure it to start automatically on boot. As root:
As Shibboleth must run over HTTPS, which uses port 443 by default, we must open that port in the firewall, which is pretty locked down out of the box. We will also open port 80 so that we can redirect visitors trying to visit over unencrypted HTTP. Again as root:
Visiting `ssotutorial` on port 80 should show the Apache test page.
To encrypt traffic between Apache and clients, we need an SSL certificate for the server. The CERN Grid Certification Authority issues host certificates to CERN users, so visit them and follow “New host certificate (requires certificate authentication)”. To get the certificate, we must generate a certificate signing request on the VM.
This will create two files,
The former is the certificate signing request, whose contents you need to paste into the field on the host certificate request form, and the latter is the private key.
The private key should be kept secret, as it is the proof that the server is who it says it is.
After pasting in the certificate signing request and submitting the form, download the Base64-encoded host certificate and certification authority (CA) certificate chain to the VM.
(You can download the certificates to your local machine and then upload them to the VM with `scp host.cert host-chain.p7b ssotutorial.cern.ch:.`)
Convert the CA certificate chain to the `.pem` format so Apache can read it.
Then, as root, move the host certificate, the CA certificate chain, and the private key we generated into place.
Install the SSL Apache module `mod_ssl` as root.
Edit the Apache SSL configuration in `/etc/httpd/conf.d/ssl.conf` as root to point to the correct certificates, making sure the following lines are present and that the directives only appear once in the file.
(You can use, for example, `sudo vi /etc/httpd/conf.d/ssl.conf` to edit a file with super user privileges.)
On CERN’s advice, add the following line after the `LoadModule ssl_module modules/mod_ssl.so` line in the same file.
Restart Apache as root.
Visiting the secure `ssotutorial` on port 443 should show the Apache test page.
Unless you have the CERN Grid Authority CA certificate installed in your browser, you will get a warning about the site’s certificate being invalid due to the CA not being recognised.
With SSL set up and working, Shibboleth, part of CERN’s single sign-on stack, will allow itself to run. However, for the Shibboleth daemon on the server (which we’ll install shortly) to be able to talk to CERN’s user database, we need to add our application to the approved list, which requires the authorisation of CERN IT.
The SSO management page allows you to register a new SSO application, so go there and fill in the form with an application name like “SSOTutorial”, an application URI and homepage like `https://ssotutorial.cern.ch`, and whatever application description you like.
It took about 30 minutes for my application to be approved, but it could take longer outside working hours. In the meantime, we’ll set up our Flask application.
Flask and uWSGI
Flask is a web application framework written in Python. It comes with a development server, but for production we need something more robust that can handle things like concurrent connections and multiple worker instances. I chose to use uWSGI as it has a nice Apache module — that we’ll get to soon — and a nice configuration.
First, we’ll set up the environment the application will run in. To pollute the global Python configuration as little as possible, we’ll install the application’s dependencies inside a virtualenv, which we’ll manage with virtualenvwrapper. This means installing pip, a Python package manager, which makes installing modules super simple.
Before we install Flask, the Python development package and a compiler must be installed in order to compile the C extensions used by Flask’s dependencies.
Finally, make and set up the virtual environment for the application and create the file structure.
Whenever we work with the application, like running it, it must be done inside the `ssotutorial` virtual environment.
If you need to ‘reactivate’ it, do
Fill `__init__.py` with an example Flask application that displays the time on the root URL
The `wsgi` method is the one that the WSGI server will call, but you can test the application now using the Flask test server by running
(You won’t be able to see the test server, which by default binds to `127.0.0.1:5000`, in the browser as port 5000 is blocked by the VM’s firewall, but you can do `curl 127.0.0.1:5000` in another session on the VM to see the page load successfully.)
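If you're curious what a WSGI server actually calls, here's a framework-free sketch of a WSGI callable that shows the time on the root URL, using only the standard library. It is not the Flask application from this post: the name `wsgi` just mirrors the method mentioned above, and the response text is made up for illustration.

```python
from datetime import datetime


def wsgi(environ, start_response):
    """A plain WSGI callable.

    The server passes in the request environment and a callback for the
    status line and headers, and expects an iterable of bytes back.
    """
    if environ.get("PATH_INFO", "/") == "/":
        status = "200 OK"
        text = "The time is {0}\n".format(datetime.now().isoformat())
    else:
        # Anything other than the root URL has no handler in this sketch.
        status = "404 Not Found"
        text = "Not found\n"
    body = text.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Flask does all of this for us, of course, but seeing the bare interface makes it a bit clearer what uWSGI expects when it is pointed at a callable. To poke at the sketch locally, `wsgiref.simple_server.make_server` from the standard library can serve it.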
To run the application behind a WSGI server, we’ll install uWSGI and also install Honcho to manage the processes.
Create a `Procfile` in the root `ssotutorial` directory and fill it with the uWSGI startup command.
The `--buffer-size` option sets the uWSGI buffer to 32 kB.
The default size is quite small, as hinted at in the uWSGI things to know guide, and is easily overloaded by the large headers used during the SSO procedure, which is explained in more detail later.
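To get a feel for why large headers overflow the buffer, the sketch below estimates how much space a set of request variables takes up in a uwsgi packet. It assumes that every key and every value is stored with a 16-bit length prefix and that the default buffer is 4096 bytes; both are my reading of the uWSGI documentation rather than something measured here, and the header names and values are invented.

```python
DEFAULT_BUFFER_SIZE = 4096     # assumed uWSGI default, in bytes
TUTORIAL_BUFFER_SIZE = 32768   # the 32 kB set with --buffer-size


def uwsgi_vars_size(variables):
    """Approximate the size in bytes of request variables in a uwsgi packet.

    Assumes each key and each value carries a 2-byte length prefix.
    """
    total = 0
    for key, value in variables.items():
        total += 2 + len(key.encode("utf-8"))
        total += 2 + len(value.encode("utf-8"))
    return total


def fits(variables, buffer_size):
    """True if the variables fit within a buffer of the given size."""
    return uwsgi_vars_size(variables) <= buffer_size


# Hypothetical request: ordinary variables plus one large SSO-style header
# listing 500 made-up group names.
example = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": "/login",
    "HTTP_HOST": "ssotutorial.cern.ch",
    "HTTP_X_EXAMPLE_GROUPS": ";".join(
        "group-{0:04d}".format(i) for i in range(500)
    ),
}

print(fits(example, DEFAULT_BUFFER_SIZE))   # too big for the assumed default
print(fits(example, TUTORIAL_BUFFER_SIZE))  # fits comfortably in 32 kB
```

A single long header is enough to blow past a few kilobytes, which is roughly the situation after the SSO service has attached a user's details to the request.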
To run the uWSGI server, inside the `ssotutorial` virtual environment and inside the root `ssotutorial` folder run
At this point you might notice things getting cumbersome with one SSH session to the VM.
When testing I found it useful to have (at least) two sessions open: one exclusively dealt with the application, like running `honcho start`, and the other was for editing other configuration files, restarting Apache, and so on, with super user privileges.
Apache and uWSGI
We’re at the stage where we can combine our previous efforts by getting Apache serving pages via uWSGI.
This requires us to install the `mod_proxy_uwsgi` Apache module mentioned earlier.
We first need to install the Apache development files, then download the uWSGI source code, and finally build and install the module.
Have Apache load the new module by editing the main Apache configuration file `/etc/httpd/conf/httpd.conf` to add the line
To test everything’s wired up correctly, we can temporarily add the proxy information to the bottom of the `VirtualHost` block in `/etc/httpd/conf.d/ssl.conf`, which needs to be edited as root.
This will send all traffic matching `/` and its descendants to uWSGI. We’ll create a configuration file specifically for the application in the `conf.d` folder later.
Finally, the SELinux permissions need to be relaxed to allow Apache to connect to the proxy.
For all these changes to take effect, restart Apache.
Visiting the secure `ssotutorial` on port 443 should now show the Flask page showing the time.
Fantastic! We’ve now got four major pieces in place: Apache, SSL, uWSGI, and Flask. The only thing left to do now is setting up Shibboleth so that we can authenticate our users.
By now the SSO application should have been approved, so we can proceed with installing and configuring Shibboleth. As root, install Shibboleth and its dependencies.
Edit the SELinux configuration from
permissive, as instructed by CERN, and manually apply the change now to avoid the need to reboot.
Next, download the CERN-specific Shibboleth configuration files.
Then, as root, enable the Shibboleth service to run on startup and move the files we just downloaded into place.
Edit `/etc/shibboleth/shibboleth2.xml` and replace all instances of `somehost.cern.ch` with the hostname of the VM, `ssotutorial.cern.ch`, and make sure this exact line is present.
Restart Apache and the Shibboleth daemon as root for the changes to take effect.
Shibboleth installs an Apache configuration file at `/etc/httpd/conf.d/shib.conf`, which sets up the URL `/secure` to require a valid Shibboleth session.
If you try to visit this now, however, you’ll get a 404 error.
This is because the `ProxyPass` rule we set up earlier, to proxy all traffic to uWSGI, is overriding the Shibboleth rule, and our Flask application doesn’t have a route set up for `/secure`.
To test that Shibboleth is working, you can comment out the `ProxyPass` line in `ssl.conf` and restart Apache.
Now when you visit the `/secure` path, you should be redirected to the CERN single sign-on page.
If you log in successfully, you’ll be redirected back to the `/secure` path, which will give you a 404 error because there’s no document there for Apache to serve.
OK, so all the pieces work, now it’s time (finally!) to make them work together.
We want to allow users to log in with their CERN credentials to our Flask application via SSO. Once they’ve logged in, they should see the homepage displaying some information that’s been passed to the application by Shibboleth.
The first step is creating an Apache configuration file for our application in
I’ve put together an example configuration in a gist.
I won’t go through what each Apache directive we’re using does, but I’ve not used anything that wasn’t already in the `ssl.conf` and `shib.conf` files provided when we installed `mod_ssl` and Shibboleth, respectively, so check them out for further details.
We can download the configuration file directly to the VM.
Read the file to get an idea of what’s going on.
You will need to edit the file to change the name of the host from
Because the settings in `shib.conf` are defaults provided for us to base our own work on, and we’ve now done that, we should stop them from being picked up by Apache.
There’s also a `welcome.conf` file which is provided when Apache is installed.
As root, add a `.disabled` extension to each file.
Restart Apache as root to reload the new set of configuration files.
The final step is changing our Flask app. All it does now is display the time, so we need to add some login buttons and display the information we get from the sign-in procedure. To make this change simpler, now is a good time to understand the authentication flow.
When Shibboleth is asked to provide authentication at a particular URL, `/login` in our case, it checks to see if the user is already authenticated by inspecting a cookie.
If the cookie is not present or is invalid, by containing invalid information or by having expired, the user is redirected to the configured SSO URL (CERN’s SSO login page, in this case).
If the user can’t successfully authenticate, they won’t get past the SSO login page and will only be able to browse URLs not protected by Shibboleth.
If the user provides valid credentials, they are first redirected to the
This is a special route, as the Shibboleth daemon watches it for incoming requests.
An incoming request from the SSO login page consists of a POSTed XML payload, which is parsed by Shibboleth.
The client is then redirected again, this time back to the URL they initially tried to access, along with some headers containing user information provided by the SSO service like their name, username, email, and so on.
(These headers can contain quite a lot of information, which is why we had to increase uWSGI’s buffer size earlier.)
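The flow above boils down to a small decision, which can be sketched as a toy function. This is only a model for thinking about it, not how Shibboleth is implemented: the cookie is represented as a dictionary with an `expires` timestamp (a made-up shape, since the real session cookie is opaque to our application), and the returned strings are just labels.

```python
import time

PROTECTED_PATHS = ("/login",)  # the URL Shibboleth guards in this tutorial


def next_step(path, cookie, now=None):
    """Return what happens to a request in a toy model of the SSO flow.

    `cookie` is None or a dict with an 'expires' Unix timestamp
    (a hypothetical stand-in for the real session cookie).
    """
    now = time.time() if now is None else now
    if path not in PROTECTED_PATHS:
        # Unprotected URLs are always reachable.
        return "serve page"
    if cookie is None or cookie.get("expires", 0) <= now:
        # Missing, invalid, or expired session: off to the SSO login page.
        return "redirect to SSO login"
    # Valid session: the request reaches the app with user headers attached.
    return "serve page with user headers"
```

The important consequence for the application is the last branch: by the time a request for the protected URL reaches Flask, the user has already been authenticated.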
Within our Flask application, we can watch this login URL for incoming requests and extract user information from the headers.
Remember, Flask won’t see a request at `/login` unless the user is authenticated, because Shibboleth will redirect them to the SSO login page if they are not.
We can store user information within the Flask session.
If this information isn’t present in the session, we can redirect users to `/login`, initiating the login request, and store it when the user is eventually returned to
Our updated Flask application, available as a gist, now allows the user to log in and log out, storing and destroying their user information as they do so. It uses the Flask-SSO Flask extension to simplify the mapping of the headers from the authentication procedure to the user session object. Let’s install the Flask-SSO module, and overwrite the old application with our updated version.
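To make the idea of mapping headers to a session concrete, here's a framework-free sketch using plain dictionaries. It is not Flask-SSO's API, and the header names in `ATTRIBUTE_MAP` are invented; the real names come from the SSO service and the Shibboleth attribute map.

```python
# Hypothetical mapping from request header names (as they appear in a WSGI
# environ) to the keys we want in the session.
ATTRIBUTE_MAP = {
    "HTTP_X_EXAMPLE_LOGIN": "username",
    "HTTP_X_EXAMPLE_FULLNAME": "fullname",
    "HTTP_X_EXAMPLE_EMAIL": "email",
}


def extract_user(environ, attribute_map=ATTRIBUTE_MAP):
    """Pick the mapped headers out of a WSGI environ and rename them."""
    return dict(
        (session_key, environ[header])
        for header, session_key in attribute_map.items()
        if header in environ
    )


def log_in(session, environ):
    """Store the user's details in a dict-like session."""
    session["user"] = extract_user(environ)
    return session


def log_out(session):
    """Forget the user's details; safe to call when nobody is logged in."""
    session.pop("user", None)
    return session
```

In the real application the session would be Flask's own session object and the extension takes care of the mapping, but the shape of the logic is the same: read headers on the login URL, store them, and drop them again on logout.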
Now inside the root `ssotutorial` directory with the `ssotutorial` virtualenv activated, start the uWSGI server again.
When we visit our application, not much has changed…
… but when we click the log in button, we’re redirected to the CERN SSO page, which redirects us back to the Flask app which displays our details. Cool!
So there we are. It’s been a long road, but a pretty neat result at the end! Once all the bits are set up, it’s just knowing how they fit together and what parameters to tweak.
Taking things from here is reasonably straightforward.
If you want to protect the whole site, change the protected location from
If you want to inspect what parameters are returned in the XML to the ADFS endpoint, you can take a look at the attribute map XML at
Feel free to ask any questions in the comments!