School of Code Part 1: Squirrel (SQL)

My brother-in-law recently moved into a position where he needs to pick up some systems programming basics. I could write him long emails about which references to check out, or I could just write the info here, so you too can benefit.

For starters, my brother-in-law needs to learn the basics of SQL for reporting needs. SQL (or Squirrel, as my wife calls it), is a standarized database querying language.

To understand SQL, you need to understand a database. A database is a system that organizes data, like a spreadsheet sort of, and provides powerful tools to add, change, and inspect that data.

Almost every website you interact with is essentially a database, with software in front of that database to make pretty web pages to push and pull data to and from the database. Facebook? A huge database that organizes people, their photos and posts, and their connections with other people. This website itself is essentially a database, and some code. The code being wordpress, which is a PHP program that knows how to take data you send from your browser and stick it into the database, then later retrieve that data and put it in some nice templates (or themes, if you will) for your browser.

Databases are for much more than websites though, databases are like one better than a spreadsheet, and they’re great for looking at, manipulating and reporting on data.

There are many database systems to choose from, when I work with my brother in law we’ll be toying with a popular free option: mysql. There are other free options such as postgres, as well as tiny/limited options like sqllite. There are, of course, paid options for serious or large database needs, oracle and mssql are popular variants. Microsoft office comes with a database program called access, which is fairly popular. And there are new and upcoming “nosql” (a fancy/trendy term for less-structured, which gives certain benefits in certain applications over rigidly structured options) database solutions such as cassandra, mongodb, couchdb, etc etc.

If you’re not creating software, you’re likely looking at using either mysql or postgres if you’re lucky, or perhaps access or mssql if you aren’t.

Like a spreadsheet has multiple sheets in it, a database can have multiple databases in it, creating one with SQL is easy:

  2. GRANT ALL ON *.* TO 'jason'@'localhost' IDENTIFIED BY 'somepassword';

Here, we’ve created a database called ‘wordpress’. It’s empty, and a user called ‘jason’ with password ‘somepassword’ can do whatever he likes with it.

A database has something called a table, which is a collection of columns, sort of like a single spreadsheet. So we can think of a database as a collection of databases, and each database being a collection of tables.

A table, like a single spreadsheet, can be as organized or disorganized as you like. Here’s how we create a very simple table:

  1. CREATE TABLE blog_entries (id LONG, entry LONGTEXT);

What we’ve done here is create a table with two columns of data. One column stores a number id for the row in the table, and the other stores a blog entry’s text.

Putting some data into this table is simple:

  1. INSERT INTO blog_entries (id, entry) VALUES (1, 'my first entry');
  2. INSERT INTO blog_entries (id, entry) VALUES (2, 'my second entry');

Now we’ve put two simplistic blog posts into our database, and we can easily work with our data like so:

  1. SELECT * FROM blog_entries;

This will print out all of the data and columns for the table:

id entry
1 my first entry
2 my second entry

Or we could do something like this, to count how many entries have the word “first” in them:

  1. SELECT count(*) FROM blog_entries WHERE entry LIKE '%first%';

We could next create a second table called “comments” with data like so:

id blog_id comment
1 1 my first comment
2 1 my second comment

Here we have two “id” type columns, the first is actually a unique id for each comment row, which is a good idea, always. The second id is what’s called a foreign key, and matches a blog row’s id. In this case both comments have a blog_id of 1, so both comments are associated with the first post we made.

Using more complex SQL we could pull associated data out of the database easily, which is exactly what wordpress is doing when it’s making html for your browser based on stored blog and comment data. SQL does everything your spreadsheet does, it can count, sum, sort, pull data from differing tables into one view that looks like a new table, and more.

The power of SQL and databases over spreadsheets and custom data organization methods really comes with the ability to search for and update, delete, or display only the data you need and disregaurd the rest. You can easily select a small subset of a table’s columns to just extract that data, or we could join data from 4 seperate tables for a particular report.

SQL, like all worthwhile programming topics, is easy to get into and get something done with, but mastery takes time. For starters, I would highly recommend downloading mysql and installing it on your computer, and seeking out some tutorials on the web or some great books to get you started.

The easiest way to get install mysql on your system as a beginner is to go find some LAMP variant. LAMP is a pervasive software bundle that stands for linux (an operating system), apache (web server software), mysql (the database), and php (programming software). Perhaps mysql is all you’ll need, but some of the best tools available will want you to have the apache and php pieces too.

Once you have mysql installed, I’d highly recommend the following resources for learning SQL:

Know that there are awesome tools to make SQL interaction easy:

  • Squirrel SQL is a great GUI tool that works on windows.
  • Sequal Pro is another great GUI tool, this one for OS X.
  • phpmyadmin is a PHP program you can run on your computer to manage your mysql databases in a browser.

When you get stuck (perhaps when you learn about and learn to fear the word “JOIN”), I find, like with regexes, it’s often easier to break a problem down into small pieces and build up to the right answer rather than come up with the fancy ‘right’ (read: complex) solution from the get-go. Generally speaking, if you can come up with something that works, that’s usually more important than something that’s perfect or awe-inspiring.

Later, I’d recommend that you learn about and at least grasp the following concepts, before you hurt yourself:

  • Know how to back your database up and restore it. There’s usually a command line tool to do this
  • Know what autoincrement, indexes, CRUD, data relationships, views, normalization, triggers, stored procedures, and transactionality are.
  • Understand database security issues, particularly SQL injection and permission problems.
  • Find tools, and programming languages to automate your work.
  • Know that the SQL “standard” has quotes around it for bad reasons, those being that every database bastardizes the language this way or that, you must try to avoid this at all costs, b/c your next job may not have mysql and may insist on access (heaven forbid..).

When you really get going, I’d highly recommend that you get comfortable with a programming language that can easily talk to your database of choice. Java, PHP, Ruby, C#, Python, C++, etc all have great built-in or third party libraries available.

From experience I can say for quick and dirty stuff, PHP has mysql built right in, and Java + Spring JDBC works for most everything else for me. There are also popular ORM (object to database mapping) solutions out there like Hibernate, if you want to make things more difficult for yourself. Also, Liquibase is awesome for managing/abstracting database schema changes.

Thanks to Alex Moffat and Eric Evans for their help in teaching me and helping me to understand these topics in the past.