Set up the database

History

In this unit, we will be using the free database MariaDB, which is a clone of MySQL and was written by the same author - Michael "Monty" Widenius, from Finland. The history behind the naming is that Monty first created MySQL (named after his daughter, My) which became the most popular open-source database. He sold MySQL to Sun, who were in turn bought by Oracle - who are famous for their commercial database software. As a result, Monty created a clone of MySQL called MariaDB (named after his other daughter). MySQL and MariaDB are compatible in many ways and most of what we learn on this unit will apply equally to both of them, although both Oracle (for MySQL) and the open-source community (for MariaDB) have added new features to their databases.

Although we will be using MariaDB, in some places the name MySQL will still appear - most notably as the name of the command-line program that you use to access the database.

Install the database

Open your VM and install the database with the appropriate commands. On Debian that is:

$ sudo apt install mariadb-{server,client}

What's that funky squiggly bracket doing? Bash expands it to mariadb-server mariadb-client. Similarly if you typed: cc -o hello{,.c} it would get expanded to cc -o hello hello.c. There are loads of shorthand tricks for typing stuff quicker in a shell: try and pick them up as you go!

Question: What does echo {a,b,c}-{1,2,3} print? Try and guess before running it.

The mariadb-server package contains the server that stores the data and lets clients log in. mariadb-client is a command-line client (with the command name mysql); later on we will also use a Java client to access a database from a Java application.

The database server is now installed, but our system won't start it unless you ask it to. On Debian the service manager is SystemD. We can start the server running with:

$ sudo systemctl start mariadb

Check that the service is running with

$ sudo systemctl status mariadb
$ sudo journalctl -u mariadb

Set it to run by default with

$ sudo systemctl enable mariadb

Whilst Linux has more or less standardized on using SystemD as the service manager... it is unpopular in certain quarters. Other systems exist! Alpine Linux (which was the VM image we used to use) uses OpenRC. The BSDs mostly do it with a hodge-podge of shellscripts. Macs use something called launchctl.

The argument against SystemD is that it breaks compatibility with the POSIX OS standard and goes against The UNIX Way (do one thing well); that the developer Lennart Poettering is a bit controversial (soft skills are important!); and quite frankly the overreach of the project is incredible. As well as managing services it can also encrypt your home folder, manage WiFi connections, manage your log files and system name and much, much, more.

The arguments for it are that it is fast, gets rid of a bunch of janky shell scripts, and standardizes things in a way that makes sense for Linux. Linux distro's used to be much more diverse but nowadays the choice is mostly what package manager do you want to use and what desktop do you want by default.

For now SystemD seems to be what we've settled on for Linux. It's mostly fine once you learn it but do try a BSD system and see what it used to be like and if you prefer it!

Security

To log in to a database, you normally need a user account and an authentication factor (such as a password, or a private key). However, in the latest version, mysql user accounts are linked to system user accounts. You should probably secure it though. Running a public-facing database without security will end in databreaches and fines quicker than you can type metasploit.

The default set-up will allow anyone to log in and see some of the database, for example the test tables but this is not particularly secure. Most distributions come with a mysql_secure_installation script that sets up more secure access rights. Run it and set a password for the root user (otherwise it'll be the default root password or blank).

Creating a database

Right, you have a mysql server running! Lets connect it to a database! To create the database:

mysqladmin -u root -p create mydatabase

To connect to it:

mysql -u root -p mydatabase

That should drop you into a prompt! Congratulations! You have a running database.

Importing sample data

We have prepared some sample data that we will be using in this and the following weeks.

First, download the following two files and place them in the same folder as your Vagrantfile. This folder is important as the sample-data.sql script contains hard-coded absolute paths starting in /vagrant/ to import some of the data. You can download the files the same way as you did before with the secure setup file.

cd /vagrant
wget https://raw.githubusercontent.com/cs-uob/COMS10012/master/code/databases/sample-data.sql
wget https://raw.githubusercontent.com/cs-uob/COMS10012/master/code/databases/sampledata.tar
tar xvf sampledata.tar

This creates a folder sampledata with the files we need.

Load the sample data with the following command:

mysql -u root -p -e 'source /vagrant/sample-data.sql'

This pulls in some data in CSV files (have a look at the script if you want) and creates a default user "vagrant" who can log in to the database without a password, but can only read and not write the two sample databases "census" and "elections". There is another database called "data" which starts out empty, and "vagrant" can both read and write it.

It also seems to throw a bunch of errors and was left to us by the people who previously ran the unit. It's on our list of jobs to fix, but it seems to work anyway? YOLO. Pull requests appreciated.

You can now log in to the database and try the following:

  • mysql on its own logs you in, you now get the database prompt MariaDB [(none)]> to show that you haven't loaded a particular database yet.
  • show databases; on the database prompt gives you a list of databases which you have access too.
  • Select one with the USE command, for example use elections;. Your prompt should now read MariaDB [elections]>.
  • show tables; will show the tables in the selected database.
  • There's a Party table in the elections database, so select * from party; will show you the data.
  • You could also use describe party; to show a list of columns and types in the Party table.
  • Finally, exit or Control+D on a line of its own gets you back to the shell.

You can open the SQL and CSV files under /vagrant/sampledata to compare with the output you get from MariaDB. Study this until it makes sense to you. The setup.sql files contain the database schemas and the import.sql ones pull in the sample data.

On a lab machine

On a lab machine, to save disk space your VMs may not remain between reboots - and because they are not hosted on the network file system, if you log in to a different machine next time, your changes will not be saved either but you will get the VM reinstalled from scratch. To make sure the database is ready whenever you need it, open the Vagrantfile in a text editor and add the setup command to the provisioning script.