Recursive Queries with MySQL

Discovered something neat with the new version of MySQL and thought it warranted a mention. Storing tree structures in a relational database is a common use case across many different areas of tech. The problem comes when you need to construct a query based on a subset of that tree.

But MySQL 8 has some nice new features that makes doing this a breeze.

For example, let’s assume you have a set of tables that look like this:

CREATE TABLE `files` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `parent_id` bigint(20) NOT NULL,
  `kind` enum('file','folder') NOT NULL,
  PRIMARY KEY (`id`),
  KEY `parent_id` (`parent_id`)
);

CREATE TABLE `tags` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
);

CREATE TABLE `tag_file` (
  `tag_id` bigint(20) NOT NULL,
  `file_id` bigint(20) NOT NULL,
  PRIMARY KEY (`tag_id`,`file_id`)
);

This is a pretty standard setup for storing a tree setup in a relational table, with the parent_id key referencing id of the parent. And this all works well … right up until you need to query a parent and all of it’s children. So let’s say you want to find all children that have a tag of ‘Important’.

MySQL 8 includes support for recursive common table expressions. Using this, this becomes a pretty easy query. You can create a CTE query and recursively call it! You could so something like this:

with recursive cte (id, parent_id) as (
    select id, parent_id from files where parent_id = ? and kind = 'folder'
    union all
    select p.id, p.parent_id from files p inner join cte on p.parent_id = cte.id where kind = 'folder'
)

select * from files inner join tag_file on (tag_file.file_id = file.id)
inner join tags on (tags.id = tag_file.tag_id) where kind = 'file'
and tag = 'Important' and parent_id in (select id from cte);

(That is a prepared statement, replace the ? with the ID of the parent.)

What surprised me was how fast that query is with the right keys. On a table that has nearly 600,000 items in it, that query completes in about 0.3 seconds. Slow, but considering the number of rows in the table quite fast.

Thanks to this post on Stack Overflow for the heads up.

Did something I wrote help you out?

That's great! I don't earn any money from this site - I run no ads, sell no products and participate in no affiliate programs. I do this solely because it's fun; I enjoy writing and sharing what I learn.

All the same, if you found this article helpful and want to show your appreciation, here's my Amazon.com wishlist.

Read More

Finding Multi-byte Characters in MySQL Fields

So I was recently helping a client with an issue in MySQL where a migration failed to transfer the full contents of some fields. This amounted to a little over 1% of the total messages transferred. In doing some research, we discovered that the one thing every message had in common was the presence of multi-byte (high unicode) characters. In many cases, this was due to a user pasting some text from Microsoft Word.

MySQL mathematical operations and NULL values

So I came across an interesting quirk in MySQL the other day. Let’s say you have a table schema and some values that look like this: +-------------------+------------------+------+-----+---------+-------+ | Field             | Type             | Null | Key | Default | Extra | +-------------------+------------------+------+-----+---------+-------+ | page_id       | varchar(30)      | YES  |     | NULL    |       | | clicks            | int(10) unsigned | YES  |     | NULL    |       | +-------------------+------------------+------+-----+---------+-------+ +---------+--------+ | page_id | clicks | +---------+--------+ | 1 | NULL | +---------+--------+ And then let’s say you pass the following SQL statement to MySQL: update page_click_count set clicks = clicks + 1 where page_id=1; If you come from a loosely-typed language such as PHP, you would probably expect clicks for page_id 1 to now be 1. But that’s not the case in MySQL. After the query is run, the table will still look like this: +---------+--------+ | page_id | clicks | +---------+--------+ | 1 | NULL | +---------+--------+ Not only does the query fail, but it fails with no warnings given. It appears that mathematical operations on null values silently fail. There are a couple of ways around this. The first and most obvious is to set NOT NULL and a default value on the column. In the example above, this would work. The NULL value in that field becomes a 0 and you can to normal mathematical operations on it. But what happens if, for whatever reason, you can’t do that? We actually have this situation in a few places at dealnews, where NULL represents a distinct value of that field that is different from 0. In this case, you can use COALESCE() to fill in the appropriate value for the field. update page_click_count set clicks = coalesce(clicks, 0) + 1 where page_id=1; Edit: Brian Moon informs me that this is actually part of the SQL specification. So hooray for specifications. Still, it’s kind of arcane; in working with MySQL (and PHP) for a decade now, this is the first time I’ve ever actually encountered this. Hopefully this helps someone who was as confused as I was.