Skip to content

Laravel & Recursive, Relation-Constrained, Tree-Shaken Queries

Disclaimer

The syntax for recursive SQL CTE queries varies depending on what you are using. While this will probably work with other DB engines, I've only used this with PostgreSQL. I am also using a PostgreSQL-exclusive SQL statement for optimal efficiency; however, I do note an alternative for non-PostgreSQL servers as well.

Use Case

Let's say we have a Category model with self-referential Parent & Children relationships set up. In addition, there is also a hasMany relationship set up for a Video model. So maybe we have some categories set up with their own sub-categories, and those sub-categories also have sub-categories, and those sub-sub-categories have a video.

Now, let's say we need to fetch only the categories that have at least one published video. That's easy enough - but if we're building out a hierarchical tree, we'll need to also query the ancestral tree all the way up. While this can be done in a few different ways within Laravel, it's typically not very efficient and requires multiple individual queries to be made.

With a recursive CTE (Common Table Expression) query, we can fetch only the required Categories to complete the hierarchy tree in a single query, resulting in much more efficient computation cycles for larger datasets which is my professional assumption... use at your own peril.

The Query

Dataset Example

What we're essentially going for here is to tree-shake the entire group of Categories so that only the following remains:

  • Categories that have at least one video
  • OR Categories that have a descendant at any depth with at least one video

Given the use case above, the visual representation of the categories is as follows:

Original

  • Example 1 (unpublished video ❌)
    • Example 2
      • Example 3 (published video ✔️)
    • Example 4
    • Example 5 (published video ✔️)
      • Example 6
      • Example 7
        • Example 8 (unpublished video ❌)
          • Example 9
        • Example 10 (published video ✔️)
  • Example 11
    • Example 12 (published video ✔️)
  • Example 13 (published video ✔️)
  • Example 14 (unpublished video ❌)


Tree-shaken

  • Example 1
    • Example 2
      • Example 3 (published video ✔️)
    • Example 5 (published video ✔️)
      • Example 7
        • Example 10 (published video ✔️)
  • Example 11
    • Example 12 (published video ✔️)
  • Example 13 (published video ✔️)

The problem here is that there is no limit to how many levels deep a given Category's descendants might go. In practice, I'd imagine the upper limit for most applications would be about 10 levels deep (and that would be a lot of levels), but the idea of having to know the maximum depth of sub-Categories for a given dataset at the time of executing the query feels like an anti-pattern. Instead, I wanted a solution that would work dynamically, regardless of the depth of subcategories we're dealing with.

Enter PostgreSQL CTE

Before looking into this, I hadn't worked with recursive queries before. I recommend reading about them more in-depth on PostgreSQL's website. Essentially what this does is utilize temporary tables for just this query while recursively executing the SQL statements specified within.

Our query will look something like this:

sql
WITH RECURSIVE Hierarchy AS (
	-- Anchor member: Categories directly related to videos OR the root category for tree
	SELECT c.*, v.id AS video_id
	FROM categories c
	LEFT JOIN videos v
		ON c.id = v.category_id
	WHERE v.published IS NOT NULL

	UNION

	-- Recursive member: Categories that are descendants of the directly related categories
	SELECT c.*, h.video_id
	FROM categories c
	JOIN Hierarchy h ON c.id = h.parent_id
)

-- Final select: Retrieve all categories in the hierarchy
SELECT DISTINCT ON (h.id) *
FROM Hierarchy h
WHERE video_id IS NOT NULL

Lets break down what's happening here.

The Common Table Expression

sql
WITH RECURSIVE Hierarchy AS (
	-- Anchor member: Categories directly related to videos OR the root category for tree
	SELECT c.*, v.id AS video_id
	FROM categories c
	LEFT JOIN videos v
		ON c.id = v.category_id
	WHERE v.published IS NOT NULL

	UNION

	-- Recursive member: Categories that are descendants of the directly related categories
	SELECT c.*, h.video_id
	FROM categories c
	JOIN Hierarchy h ON c.id = h.parent_id
)

-- Final select: Retrieve all categories in the hierarchy
SELECT DISTINCT ON (h.id) *
FROM Hierarchy h
WHERE video_id IS NOT NULL

The first line is what tells the SQL server that we're doing a recursive CTE.

Selecting the initial Categories

sql
WITH RECURSIVE Hierarchy AS (
	-- Anchor member: Categories directly related to videos OR the root category for tree
	SELECT c.*, v.id AS video_id
	FROM categories c
	LEFT JOIN videos v
		ON c.id = v.category_id
	WHERE v.published IS NOT NULL

	UNION

	-- Recursive member: Categories that are descendants of the directly related categories
	SELECT c.*, h.video_id
	FROM categories c
	JOIN Hierarchy h ON c.id = h.parent_id
)

-- Final select: Retrieve all categories in the hierarchy
SELECT DISTINCT ON (h.id) *
FROM Hierarchy h
WHERE video_id IS NOT NULL

In the next chunk we're querying for Catagories that have at least one published video. You'll notice that in additional to selecting all the columns in for the Categories, I've added on a video_id column; with this column added it creates a unique trail from a given video's related Category down that category's furthest ancestor.

Note that it actually doesn't matter if the video_id is for a published video or not - we're already specifying that the category needs to have a published video in the WHERE clause, so the value of video_id itself only needs to remain constant in the trail. It's a very minor thing, but I like to imagine it adds to the overall efficiency of the SQL query but honestly I have no idea if it really matters.

Chaining their ancestors

sql
WITH RECURSIVE Hierarchy AS (
	-- Anchor member: Categories directly related to videos OR the root category for tree
	SELECT c.*, v.id AS video_id
	FROM categories c
	LEFT JOIN videos v
		ON c.id = v.category_id
	WHERE v.published IS NOT NULL

	UNION

	-- Recursive member: Categories that are descendants of the directly related categories
	SELECT c.*, h.video_id
	FROM categories c
	JOIN Hierarchy h ON c.id = h.parent_id
)

-- Final select: Retrieve all categories in the hierarchy
SELECT DISTINCT ON (h.id) *
FROM Hierarchy h
WHERE video_id IS NOT NULL

The next SELECT within the recursive statement is the part that strings together the hierarchal trail; if a given video belongs to a root Category (i.e. one that has no parent_id), then this SELECT will return nothing. However if the Category does have a parent_id, then will return that parent. This is the key recursive element of this query; it works regardless of how deep the hierarchy is on a give Category trail.

Returning a clean dataset

sql
WITH RECURSIVE Hierarchy AS (
	-- Anchor member: Categories directly related to videos OR the root category for tree
	SELECT c.*, v.id AS video_id
	FROM categories c
	LEFT JOIN videos v
		ON c.id = v.category_id
	WHERE v.published IS NOT NULL

	UNION

	-- Recursive member: Categories that are descendants of the directly related categories
	SELECT c.*, h.video_id
	FROM categories c
	JOIN Hierarchy h ON c.id = h.parent_id
)

-- Final select: Retrieve all categories in the hierarchy
SELECT DISTINCT ON (h.id) *
FROM Hierarchy h
WHERE video_id IS NOT NULL

The results of the recursive CTE will potentially have multiple instances of a given Category; the higher the Category is in the hierarchy, the more likely it will have duplicated rows in the dataset here. That's because for each video, we'll get the full Category hierarchy. So if a given Category has multiple descendant Categories with a video, it will have as many duplicate rows in the returned dataset.

Obviously, we don't want a bunch of duplicate rows in the dataset, so we'll need ot purge them. Thankfully, PostgreSQL has a cool SELECT DISTINCT ON(COLUMN) construct. While SELECT DISTINCT will select unique rows of data, SELECT DISTINCT ON(COLUMN) will only consider that column for distinct results.

When paired with WHERE video_id IS NOT NULL, the final dataset returned is exactly what we want; every Category needed to build a Category-with-video(s) hierarchy tree and no duplicates.

idparent_idnamevideo_id
1"Example 1"1
21"Example 2"1
32"Example 3"1
51"Example 5"2
75"Example 7"3
107"Example 10"3
11"Example 11"4
1211"Example 12"4
13"Example 13"5

However, of course this data is flat - In order to get an actual hierarchy tree, we'll need to do some restructuring.

Tip: For Non-PostgreSQL Servers

INFO

If you're using PostgreSQL, you can skip to the next section

As of this writing (February 2024), the SELECT DISTINCT ON() is exclusive to PostgreSQL. However, if you adjust the final SQL statement to a normal DISTINCT, you'll get results like this:

sql
SELECT DISTINCT *
FROM Hierarchy gh
WHERE video_id IS NOT NULL
idparent_idnamevideo_id
1"Example 1"1
21"Example 2"1
32"Example 3"1
1"Example 1"2
51"Example 5"2
1"Example 1"3
51"Example 5"3
75"Example 7"3
107"Example 10"3
11"Example 11"4
1211"Example 12"4
13"Example 13"5

The issue here is that while we have enough data to do what we want, we actually have too much; each video has the full trail, leading to duplicate rows for ids 1 & 5.

This can be cleaned up with a post-query filtering step; the snippet below will filter out any duplicates and fill the $unique variable with only the unique rows (unique by ID), which will end with the same results as a PostgreSQL SELECT DISTINCT ON() statement.

php
	$uniqueIds = [];
	$unique = [];
	foreach($results as $result)
	{
		if (!in_array($result->id, $uniqueIds))
		{
			$uniqueIds[] = $result->id;
			$unique[] = $result;
		}
	}

Structuring into a proper hierarchy tree

Now that we have the data we want, let's hop over to the Laravel & PHP side of things and start using it.

First, let's wrap our CTE query into a proper Laravel Eloquent query builder statement:

php
use Illuminate\Support\Facades\DB;

public static function getHierarchyWithVideos()
{
	$results = DB::select(
		DB::raw('
			WITH RECURSIVE Hierarchy AS (
				-- Anchor member: Categories directly related to videos OR the root category for tree
				SELECT c.*, v.id AS video_id
				FROM categories c
				LEFT JOIN videos v
					ON c.id = v.category_id
				WHERE v.published IS NOT NULL

				UNION

				-- Recursive member: Categories that are descendants of the directly related categories
				SELECT c.*, h.video_id
				FROM categories c
				JOIN Hierarchy h ON c.id = h.parent_id
			)

			-- Final select: Retrieve all categories in the hierarchy
			SELECT DISTINCT ON (h.id) *
			FROM Hierarchy h
			WHERE video_id IS NOT NULL
		')
		->getValue(DB::connection()->getQueryGrammar())
	);
}

INFO

After initially having some trouble getting the output of this query with Eloquent, I stumbled upon an answer in a Laracast forum post that mentioned using the getQueryGrammar() here as the argument for getValue(). It wasn't obvious to me what exactly getQueryGrammar() was doing, and others have asked the same question - for now though, since this same syntax is used within the source code of Laravel itself I'm not really worried about it.

Since this is a recursive CTE and beyond what Eloquent supports, we can wrap it in a DB::raw() within a DB::select(), and the getValue() call will return the dataset as stdClass objects, i.e. not related to the actual Model class the table is for.

We now need to build out the multi-level tree and cast each object into the Model class that it actually is. To do this, we'll add an internal function within this one that we'll recursively call:

php
function parseHierarchy($categories, int $parentId = null)
{
	$hierarchy = [];
	foreach ($categories as $category) {

		if ($category->parent_id == $parentId) {

			// Call self for recursive build out
			$children = parseHierarchy($categories, $category->id);

			// $children will be null if not categories exist with the current category's id as their parent_id;
			if ($children) $category->children = $children;

			// Cast to our Laravel Model class; since we already populated the children
			// parameter, no eager loading is necessary for that relationship
			$category = new Category((array) $category);

			// Eager load any existing videos for current category
			$category->load(['videos' => function($query) {
				// Add any eager loading constraints
				$query->where('published', '<>', null);
			}]);

			// Push into main array
			$hierarchy[] = $category;
		}
	}
	return $hierarchy;
}

TL;DR Final Results

php
use Illuminate\Support\Facades\DB;

public static function getHierarchyWithVideos()
{
	$results = DB::select(
		DB::raw('
			WITH RECURSIVE Hierarchy AS (
				-- Anchor member: Categories directly related to videos OR the root category for tree
				SELECT c.*, v.id AS video_id
				FROM categories c
				LEFT JOIN videos v
					ON c.id = v.category_id
				WHERE v.published IS NOT NULL

				UNION

				-- Recursive member: Categories that are descendants of the directly related categories
				SELECT c.*, h.video_id
				FROM categories c
				JOIN Hierarchy h ON c.id = h.parent_id
			)

			-- Final select: Retrieve all categories in the hierarchy
			SELECT DISTINCT ON (h.id) *
			FROM Hierarchy h
			WHERE video_id IS NOT NULL
		')
		->getValue(DB::connection()->getQueryGrammar())
	);

	function parseHierarchy($categories, int $parentId = null)
	{
		$hierarchy = [];
		foreach ($categories as $category) {

			if ($category->parent_id == $parentId) {

				// Call self for recursive build out
				$children = parseHierarchy($categories, $category->id);

				// $children will be null if not categories exist with the current category's id as their parent_id;
				if ($children) $category->children = $children;

				// Cast to our Laravel Model class; since we already populated the children
				// parameter, no eager loading is necessary for that relationship
				$category = new Category((array) $category);

				// Eager load any existing videos for current category
				$category->load(['videos' => function($query) {
					// Add any eager loading constraints
					$query->where('published', '<>', null);
				}]);

				// Push into main array
				$hierarchy[] = $category;
			}
		}
		return $hierarchy;
	}

	return parseHierarchy($results);
}

There you have it; this will return you a tree-shaken & relation-constrained hierarchal tree of Models which have a self-referential parent/child relationship. Of course, you will need to update the table names, Model class, etc. to fit your application.

There are undoubtedly more optimizations you could make to this; the primary thing that comes to mind would be to eager-load the videos in the CTE itself to remove any eager loading at all.

I also played around with abstracting this functionality to be easily reusable for any Model & relationship; however I quickly realized the complexity that would required in order to support all the different relationship types in Laravel and dynamically build out the CTE.