Creating a Hybrid Cache System for Statamic: Part Two

September 3, 2023 —John Koster

#Invalidating the Cache When Templates Change

In this section, we will work to implement one of the more impressive aspects of our custom cache system: invalidating cached content if any of the templates used to create the response have been modified or removed. Before diving into all of the required code changes, let's think through the things we will need to do:

  1. Determine which templates were used to create the response;
  2. Detect when template files have been removed or modified;
  3. Store template meta-data with our cached responses.

We know our templates are stored on the filesystem, so we really only need to find a way to get a list of these paths. Let's assume we can get these eventually and work through some updates to our cache manager implementation.

In app/HybridCache/Manager.php:

<?php

namespace App\HybridCache;

class Manager
{
    protected ?string $cacheFileName = null;

    protected array $viewPaths = [];

    public static ?Manager $instance = null;

    // ...

    public function registerViewPath(string $path): void
    {
        if (! in_array($path, $this->viewPaths)) {
            $this->viewPaths[] = $path;
        }
    }

    public function getCacheData(): array
    {
        return [
            'viewPaths' => $this->viewPaths,
        ];
    }

    // ...
}

We also need to make the corresponding updates to our HybridCache facade.

In app/HybridCache/Facades/HybridCache.php:

<?php

namespace App\HybridCache\Facades;

use App\HybridCache\Manager;
use Illuminate\Support\Facades\Facade;

/**
 * @method static bool canHandle()
 * @method static void sendCachedResponse()
 * @method static string|null getCacheFileName()
 * @method static void registerViewPath(string $path)
 * @method static array getCacheData()
 *
 * @see \App\HybridCache\Manager
 */
class HybridCache extends Facade
{
    protected static function getFacadeAccessor()
    {
        return Manager::class;
    }
}

Our changes are minimal for now: we've just added a place to store the paths to our template files. Our registerViewPath method checks whether we've already seen the provided path; if not, it adds it to the internal viewPaths class member.

The other notable addition is the getCacheData method, which returns an associative array. It only returns our viewPaths. We will expand on this in later sections to include other data types. Another option would be to add explicit getters for each data type, but the associative array method also works as long as you are consistent with the implementation. Additionally, I would not expect any third-party system to interact with this method, and its use is essentially "internal."
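
To make the trade-off concrete, the following is a minimal, framework-free sketch that places both styles side by side. The ViewPathRegistry class and its getViewPaths method are hypothetical stand-ins for illustration and are not part of the cache system we are building; the strict in_array comparison is likewise an assumption of this sketch.

```php
<?php

// Hypothetical stand-in for the Manager, reduced to the view-path logic.
final class ViewPathRegistry
{
    /** @var string[] */
    private array $viewPaths = [];

    public function registerViewPath(string $path): void
    {
        // Strict comparison avoids PHP's loose string/number matching.
        if (! in_array($path, $this->viewPaths, true)) {
            $this->viewPaths[] = $path;
        }
    }

    // The associative-array style used in this article.
    public function getCacheData(): array
    {
        return ['viewPaths' => $this->viewPaths];
    }

    // The alternative: one explicit getter per data type.
    public function getViewPaths(): array
    {
        return $this->viewPaths;
    }
}

$registry = new ViewPathRegistry();
$registry->registerViewPath('/views/home.antlers.html');
$registry->registerViewPath('/views/layout.antlers.html');
$registry->registerViewPath('/views/home.antlers.html'); // duplicate, ignored

print_r($registry->getCacheData());
```

Either style exposes the same data; the array form simply keeps the public surface small while the set of tracked data types is still growing.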

The next step will be to retrieve the paths of the templates. We can accomplish this by making use of Laravel's view composers feature. View composers are intended to help organize the logic for injecting data into views, but we can take advantage of them to extract the path to each of the templates used to render our site. We will need to update our cache's service provider.

In app/HybridCache/Providers/HybridCacheServiceProvider.php:

1<?php
2 
3namespace App\HybridCache\Providers;
4 
5use App\HybridCache\Facades\HybridCache;
6use App\HybridCache\Manager;
7use Illuminate\Support\ServiceProvider;
8use Illuminate\View\View;
9 
10class HybridCacheServiceProvider extends ServiceProvider
11{
12 // ...
13 
14 public function boot()
15 {
16 $cacheStoragePath = storage_path('hybrid-cache');
17 
18 if (! file_exists($cacheStoragePath)) {
19 mkdir($cacheStoragePath, 0755, true);
20 }
21 
22 view()->composer('*', function (View $view) {
23 HybridCache::registerViewPath($view->getPath());
24 });
25 }
26}

Between lines 22 and 24 of our new addition, we've defined a view composer using the * pattern, which causes our callback to be invoked each time a view is rendered. Inside the callback, we use the getPath method defined in Laravel's View class to retrieve the path of each view and register it with our cache manager.

Now that we register our view paths with the cache manager, let's store them with the cached content. We will again be making changes to our ResponsePrepared listener implementation.

In app/HybridCache/Listeners/ResponsePreparedListener.php:

<?php

namespace App\HybridCache\Listeners;

use App\HybridCache\Facades\HybridCache;
use Illuminate\Routing\Events\ResponsePrepared;

class ResponsePreparedListener
{
    public function handle(ResponsePrepared $event)
    {
        $cacheFileName = HybridCache::getCacheFileName();

        if (! $cacheFileName) {
            return;
        }

        $content = $event->response->getContent();

        if (mb_strlen($content) == 0) {
            return;
        }

        $responseDependencies = HybridCache::getCacheData();

        $paths = [];

        $paths = array_merge($paths, $responseDependencies['viewPaths']);

        $timestamps = [];

        foreach ($paths as $path) {
            $timestamps[$path] = filemtime($path);
        }

        $cacheData = [
            'content' => $content,
            'paths' => $timestamps,
        ];

        file_put_contents($cacheFileName, json_encode($cacheData));
    }
}

After visiting the Cool Writings home page in my local development environment, the generated cache file now looks similar to the following:

{
    "content": "...HTML content...",
    "paths": {
        "\/cache_dev\/resources\/views\/home.antlers.html": 1692915830,
        "\/cache_dev\/resources\/views\/layout.antlers.html": 1692915976,
        "\/cache_dev\/resources\/views\/\/_nav.antlers.html": 1692813969,
        "\/cache_dev\/resources\/views\/\/_footer.antlers.html": 1692813969
    }
}

Fantastic! We now have the view paths stored alongside our cached HTML content. To make things feel magical, we must add some logic to invalidate our cache if these files change.
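
One detail worth keeping in mind is that filemtime returns false (and raises a warning) when a path cannot be read. The sketch below shows one way the timestamp map could be built defensively. The buildTimestampMap helper is hypothetical, and returning null to signal "do not cache this response" is an assumption of this example rather than something our listener currently does.

```php
<?php

/**
 * Builds the path => modified-time map stored in the cache file.
 * Returns null when any dependency cannot be read (an assumption of this sketch).
 */
function buildTimestampMap(array $paths): ?array
{
    $timestamps = [];

    foreach (array_unique($paths) as $path) {
        if (! is_file($path)) {
            return null;
        }

        // Drop PHP's cached stat data so we read the current modified time.
        clearstatcache(true, $path);
        $mtime = filemtime($path);

        if ($mtime === false) {
            return null;
        }

        $timestamps[$path] = $mtime;
    }

    return $timestamps;
}

// Usage with throwaway files standing in for templates.
$home = tempnam(sys_get_temp_dir(), 'view');
$layout = tempnam(sys_get_temp_dir(), 'view');

$map = buildTimestampMap([$home, $layout, $home]);
$missing = buildTimestampMap([$home, $layout.'.does-not-exist']);

unlink($home);
unlink($layout);
```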

We must implement the invalidation logic before sending our cached contents to ensure that most of the cache logic operates independently of Statamic Control Panel events. To do this, we will make more changes to our cache manager.

In app/HybridCache/Manager.php:

1<?php
2 
3namespace App\HybridCache;
4 
5class Manager
6{
7 // ...
8 
9 public function sendCachedResponse(): void
10 {
11 if (! $this->cacheFileName || ! file_exists($this->cacheFileName)) {
12 return;
13 }
14 
15 $cacheContents = json_decode(file_get_contents($this->cacheFileName), true);
16 
17 if (! $cacheContents) {
18 return;
19 }
20 
21 if (! isset($cacheContents['paths'])) {
22 return;
23 }
24 
25 if (! isset($cacheContents['content'])) {
26 return;
27 }
28 
29 foreach ($cacheContents['paths'] as $path => $cachedMTime) {
30 if (! file_exists($path) || filemtime($path) > $cachedMTime) {
31 @unlink($this->cacheFileName);
32 
33 return;
34 }
35 }
36 
37 echo $cacheContents['content'];
38 exit;
39 }
40}

The most critical changes we've made are between lines 29 and 35. We iterate each path that was stored alongside the cached content and compare their current modified date/time with the value stored in the cache file.

If a file no longer exists, or its current modified time is later than the stored value, we remove the cache file and return early from the sendCachedResponse method so that a new response is generated.

Another exciting thing about our current implementation is that the path invalidation logic is generalized. We have done this to be able to add different types of paths to our cached content later without having to think too hard about how we named something or add extra methods that are not necessary.
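
The freshness check can also be exercised in isolation. The following sketch lifts the loop into a hypothetical dependenciesAreFresh function and uses a temporary file with touch to simulate a template being edited and then removed. The function name and the fixed timestamps are assumptions made purely for the demonstration.

```php
<?php

/**
 * Returns true when every tracked file still exists and has not been
 * modified after the timestamp recorded in the cache file.
 */
function dependenciesAreFresh(array $paths): bool
{
    foreach ($paths as $path => $cachedMTime) {
        // Make sure we compare against the file's current state.
        clearstatcache(true, $path);

        if (! file_exists($path) || filemtime($path) > $cachedMTime) {
            return false;
        }
    }

    return true;
}

$template = tempnam(sys_get_temp_dir(), 'tpl');
touch($template, 1000);        // pretend the template was last edited at t=1000
$cached = [$template => 1000]; // what the cache file recorded

$freshBefore = dependenciesAreFresh($cached);      // template unchanged

touch($template, 2000);                            // simulate an edit
$freshAfterEdit = dependenciesAreFresh($cached);

unlink($template);                                 // simulate removal
$freshAfterDelete = dependenciesAreFresh($cached);
```

Because the function only sees a map of paths to timestamps, it does not care whether a path belongs to a template or to any other kind of file.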

With these changes out of the way, we should receive cached responses if we visit our site and refresh a few times. However, if we modify the layout.antlers.html template file, our cached response should automatically invalidate, and the changes should be reflected in our browser.

#Detecting Request Dependencies

In the last section, we implemented automatic cache invalidation when we made changes to the templates that helped create the final content of a request. In this section, we will work to implement something similar for the actual content of the cached page. Figuring out how to invalidate the cache based on the current page or entry would be straightforward, but what about the additional content on a cached page?

For example, what if we had a navigation menu on the page and updated the display title of one of those pages? How would we go about automatically invalidating the cache in that scenario? What about assets, taxonomy terms, asset meta-data, or even global variables? Before answering those questions, let's look at the case of accessing the current page or entry details.

To get the details of the current page or entry, we can take advantage of Statamic's cascade feature, which provides data to front-end templates. We can provide a callback function that will be invoked once Statamic has fetched the relevant data for the current request. For example, we could use the following to retrieve the file path of the current page or entry:

<?php

use Statamic\Facades\Cascade;
use Statamic\Structures\Page;

Cascade::hydrated(function (\Statamic\View\Cascade $cascade) {
    /** @var Page $content */
    $content = $cascade->content();

    if (! $content) {
        return;
    }

    $entry = $content->entry();

    if (! $entry) {
        return;
    }

    $path = $entry->path();

    // Do something with the path here.
});

Using the path retrieved using this method, we could add it to our cached content's paths array and have our cache invalidate if changes are made. However, using a similar method to track down all other content within the response would be unreliable and time-consuming.

To help solve our problem, we will use a lesser-known Statamic feature: changing the class bindings for various Stache repositories. The Stache is Statamic's internal system for storing and retrieving flat-file content for your site. Because we rely on it, and because we use file paths for cache invalidation, our cache implementation will be limited to the flat-file storage driver. But what even are Stache class bindings?

If you have ever used the {{ dump }} tag to inspect what values are available to you in a template, you may have noticed something similar to the following output:

array:60 [
  "api_url" => Statamic\Fields\Value {}
  "articles" => Statamic\Entries\EntryCollection {
    #items: array:3 [
      0 => Statamic\Entries\Entry {
        #id: "7ac0bdda-1b84-45f8-ac52-2575dd7e8251"
        #collection: "articles"
        #blueprint: null
        #date: Illuminate\Support\Carbon @848275200 {}
        #locale: "default"
        #afterSaveCallbacks: []
        #withEvents: true
        #template: null
        #layout: null
        #slug: "dance"
        #data: Illuminate\Support\Collection {}
        #supplements: Illuminate\Support\Collection {}
        #withComputedData: true
        #initialPath: "cache_dev/content/collections/articles/1996-11-18.dance.md"
        #published: true
        #selectedQueryColumns: null
        #selectedQueryRelations: []
        #origin: null
      }
      1 => Statamic\Entries\Entry {}
      2 => Statamic\Entries\Entry {}
    ]
    #escapeWhenCastingToString: false
  },
  // ...
]

If we look at the articles array in the output, we can see many references to Statamic\Entries\Entry; this is the class that the Stache uses by default when returning results. This class contains all our interaction methods, such as save and saveQuietly. Our goal will be to change which class Statamic returns; this will let us override some fundamental methods to help track when entries, assets, or other types of content are utilized when creating the response for site visitors.
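
Before creating the real classes, here is a minimal sketch of the idea in plain PHP. BaseEntry, TrackedEntry, and UsageTracker are hypothetical stand-ins (they are not Statamic classes): the subclass overrides a commonly called method and records that it was used before deferring to the parent.

```php
<?php

// Hypothetical tracker; in this article that role is played by the cache manager.
final class UsageTracker
{
    /** @var string[] */
    public static array $ids = [];

    public static function record(string $id): void
    {
        if (! in_array($id, self::$ids, true)) {
            self::$ids[] = $id;
        }
    }
}

// Stand-in for a framework class we do not control.
class BaseEntry
{
    public function __construct(protected string $id) {}

    public function id(): string
    {
        return $this->id;
    }
}

// Our replacement: same behaviour, plus a note that it was used.
class TrackedEntry extends BaseEntry
{
    public function id(): string
    {
        UsageTracker::record($this->id);

        return parent::id();
    }
}

$entries = [new TrackedEntry('a1'), new TrackedEntry('b2'), new TrackedEntry('c3')];

// Only the entries a "template" actually touches are recorded.
$rendered = $entries[0]->id().','.$entries[2]->id().','.$entries[0]->id();
```

The entry with the identifier b2 exists but is never asked for its id, so it never becomes a dependency of the rendered output.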

To get started, we will create four new files.

In app/Data/Asset.php:

<?php

namespace App\Data;

use Statamic\Assets\Asset as StatamicAsset;

class Asset extends StatamicAsset
{
}

In app/Data/Entry.php:

<?php

namespace App\Data;

use Statamic\Entries\Entry as StatamicEntry;

class Entry extends StatamicEntry
{
}

In app/Data/Term.php:

<?php

namespace App\Data;

use Statamic\Taxonomies\Term as StatamicTerm;

class Term extends StatamicTerm
{
}

In app/Data/Variables.php:

<?php

namespace App\Data;

use Statamic\Globals\Variables as StatamicVariables;

class Variables extends StatamicVariables
{
}

We deliberately placed our custom data classes outside of the HybridCache namespace. This decision was made because these classes can only be substituted once. If we tied them too closely to our cache system, it would hinder other developers from using their own unique implementations should we make our cache system available for wider use. Therefore, developers will need to tailor their data classes to work with our cache system.

The four files we created correspond to different data types we are interested in tracking during the response creation process. The Asset class is used to represent uploaded media assets, such as images, videos, documents, etc.; the Entry class will be used to represent each item of a collection, like a blog post or page; the Term class is used to represent an individual taxonomy term, and finally, the Variables class is used by Statamic when it returns a set of global variables.

In order to swap the Stache class bindings, we need to update our application's service provider.

In app/Providers/AppServiceProvider.php:

<?php

namespace App\Providers;

use App\Data\Asset;
use App\Data\Entry;
use App\Data\Term;
use App\Data\Variables;
use Illuminate\Support\ServiceProvider;
use Statamic\Contracts\Assets\Asset as AssetContract;
use Statamic\Contracts\Entries\Entry as EntryContract;
use Statamic\Contracts\Globals\Variables as VariablesContract;
use Statamic\Contracts\Taxonomies\Term as TermContract;
use Statamic\Statamic;

class AppServiceProvider extends ServiceProvider
{
    /**
     * Register any application services.
     */
    public function register(): void
    {
        $this->app->bind(AssetContract::class, Asset::class);
        $this->app->bind(EntryContract::class, Entry::class);
        $this->app->bind(TermContract::class, Term::class);
        $this->app->bind(VariablesContract::class, Variables::class);
    }
}

To ensure the rest of the article goes smoothly, we will run the following command from the root of our project to ensure we utilize our custom classes:

composer dump-autoload

If you use Laravel's php artisan serve command or tools like Valet, you may have to restart your site for the changes to take effect. If we were to repeat our {{ dump }} tag experiment, we should now see our custom classes listed instead of Statamic's.
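
If it is unclear why rebinding a contract changes which class comes back, the toy example below may help. ToyContainer is a deliberately tiny, hypothetical container written for this illustration; it is not Laravel's container, which does far more. EntryContract, DefaultEntry, and CustomEntry are likewise made-up names.

```php
<?php

interface EntryContract {}

class DefaultEntry implements EntryContract {}

class CustomEntry extends DefaultEntry {}

// A toy container that only knows how to map an abstract name to a class.
final class ToyContainer
{
    /** @var array<string, string> */
    private array $bindings = [];

    public function bind(string $abstract, string $concrete): void
    {
        $this->bindings[$abstract] = $concrete;
    }

    public function make(string $abstract): object
    {
        // Fall back to the requested name when nothing has been bound.
        $concrete = $this->bindings[$abstract] ?? $abstract;

        return new $concrete();
    }
}

$container = new ToyContainer();

$container->bind(EntryContract::class, DefaultEntry::class);
$before = $container->make(EntryContract::class);

// Rebinding the contract changes what every later make() call returns.
$container->bind(EntryContract::class, CustomEntry::class);
$after = $container->make(EntryContract::class);
```

Because CustomEntry extends DefaultEntry, code that expects the default class keeps working; it simply receives an object with a few methods overridden.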

Now that we are utilizing our custom data classes, we can begin retrieving the file paths for each piece of content. As we examined earlier, looking into the Cascade to fetch all these details will be error-prone and time-consuming. Our strategy to achieve this will be similar across the various data types.

We will start by changing our manager implementation and the corresponding facade class.

In app/HybridCache/Manager.php:

<?php

namespace App\HybridCache;

class Manager
{
    protected bool $canCache = true;

    protected ?string $cacheFileName = null;

    protected array $entryIds = [];

    protected array $termIds = [];

    protected array $globalPaths = [];

    protected array $viewPaths = [];

    protected array $assetPaths = [];

    public static ?Manager $instance = null;

    public function __construct()
    {
        self::$instance = $this;
    }

    public function canCache(): bool
    {
        return $this->canCache;
    }

    public function abandonCache(): void
    {
        $this->canCache = false;
    }

    public function getCacheFileName(): ?string
    {
        return $this->cacheFileName;
    }

    public function registerEntryId(string $id): void
    {
        if (! in_array($id, $this->entryIds)) {
            $this->entryIds[] = $id;
        }
    }

    public function registerTermId(string $id): void
    {
        if (! in_array($id, $this->termIds)) {
            $this->termIds[] = $id;
        }
    }

    public function registerGlobalPath(string $path): void
    {
        if (! in_array($path, $this->globalPaths)) {
            $this->globalPaths[] = $path;
        }
    }

    public function registerViewPath(string $path): void
    {
        if (! in_array($path, $this->viewPaths)) {
            $this->viewPaths[] = $path;
        }
    }

    public function registerAssetPath(string $path): void
    {
        if (! in_array($path, $this->assetPaths)) {
            // An asset path we cannot find means we cannot invalidate reliably.
            if (! file_exists($path)) {
                $this->canCache = false;

                return;
            }

            $this->assetPaths[] = $path;
        }
    }

    public function getCacheData(): array
    {
        return [
            'viewPaths' => $this->viewPaths,
            'entryIds' => $this->entryIds,
            'termIds' => $this->termIds,
            'globalPaths' => $this->globalPaths,
            'assetPaths' => $this->assetPaths,
        ];
    }

    public function canHandle(): bool
    {
        // Ignore all request types except GET.
        if ($_SERVER['REQUEST_METHOD'] != 'GET') {
            return false;
        }

        $cacheDirectory = realpath(__DIR__.'/../../storage/hybrid-cache');

        if (! $cacheDirectory) {
            return false;
        }

        $requestUri = mb_strtolower($_SERVER['REQUEST_URI']);

        $this->cacheFileName = $cacheDirectory.'/'.sha1($requestUri).'.json';

        return file_exists($this->cacheFileName);
    }

    public function sendCachedResponse(): void
    {
        // Bail out if no cache file was resolved or it has since disappeared.
        if (! $this->cacheFileName || ! file_exists($this->cacheFileName)) {
            return;
        }

        $cacheContents = json_decode(file_get_contents($this->cacheFileName), true);

        if (! $cacheContents) {
            return;
        }

        if (! isset($cacheContents['paths'])) {
            return;
        }

        if (! isset($cacheContents['content'])) {
            return;
        }

        // Invalidate when any dependency was removed or modified.
        foreach ($cacheContents['paths'] as $path => $cachedMTime) {
            if (! file_exists($path) || filemtime($path) > $cachedMTime) {
                @unlink($this->cacheFileName);

                return;
            }
        }

        echo $cacheContents['content'];
        exit;
    }
}

In app/HybridCache/Facades/HybridCache.php:

<?php

namespace App\HybridCache\Facades;

use App\HybridCache\Manager;
use Illuminate\Support\Facades\Facade;

/**
 * @method static bool canHandle()
 * @method static void sendCachedResponse()
 * @method static string|null getCacheFileName()
 * @method static void registerViewPath(string $path)
 * @method static void registerEntryId(string $id)
 * @method static void registerAssetPath(string $path)
 * @method static void registerTermId(string $id)
 * @method static void registerGlobalPath(string $path)
 * @method static array getCacheData()
 * @method static bool canCache()
 * @method static void abandonCache()
 *
 * @see \App\HybridCache\Manager
 */
class HybridCache extends Facade
{
    protected static function getFacadeAccessor()
    {
        return Manager::class;
    }
}

Most of our changes surround supplying different pieces of information to our cache manager.

Some of our methods suggest we supply identifiers instead of file paths to our cache manager. During initial development and experimentation, I found some situations where attempting to fetch the path for entries and terms directly would cause unexpected behaviors or errors. We will use the identifiers to retrieve the paths before saving our cached response.
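
The general shape of this "collect identifiers now, resolve paths later" approach is sketched below in plain PHP. DeferredPathCollector and its resolvePaths method are hypothetical names used only for illustration, and a plain array stands in for the content repository.

```php
<?php

// Hypothetical collector: stores identifiers now, resolves paths later.
final class DeferredPathCollector
{
    /** @var string[] */
    private array $entryIds = [];

    public function registerEntryId(string $id): void
    {
        if (! in_array($id, $this->entryIds, true)) {
            $this->entryIds[] = $id;
        }
    }

    /**
     * @param callable(string): ?string $lookup maps an identifier to a file path
     * @return string[]
     */
    public function resolvePaths(callable $lookup): array
    {
        $paths = [];

        foreach ($this->entryIds as $id) {
            $path = $lookup($id);

            // Identifiers that no longer resolve are skipped.
            if ($path !== null) {
                $paths[] = $path;
            }
        }

        return $paths;
    }
}

$collector = new DeferredPathCollector();
$collector->registerEntryId('dance');
$collector->registerEntryId('music');
$collector->registerEntryId('dance'); // duplicate, ignored

// A plain array stands in for the content repository.
$index = ['dance' => 'content/collections/articles/dance.md'];
$paths = $collector->resolvePaths(fn (string $id): ?string => $index[$id] ?? null);
```

Deferring the lookup keeps the registration methods cheap and free of side effects, which matters because they may be called many times while a page renders.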

Our other additions are the canCache and abandonCache methods. We will use the abandonCache method to prevent saving any cached response and the corresponding canCache method to help make that determination.

One situation where we might decide against saving a cached response is when we're generating a response but can't reliably locate the local metadata of an asset to invalidate our cache automatically. Later, we'll encapsulate this method within an Antlers tag, allowing us to prevent page caching directly from a template.
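
As a rough illustration of how the flag gates the write, consider the following self-contained sketch. CacheGate and storeResponse are hypothetical names; in our actual implementation the flag lives on the manager and the write happens in the response listener.

```php
<?php

// Hypothetical, stripped-down manager showing only the opt-out flag.
final class CacheGate
{
    private bool $canCache = true;

    public function canCache(): bool
    {
        return $this->canCache;
    }

    public function abandonCache(): void
    {
        // One-way switch: once abandoned, the request stays uncached.
        $this->canCache = false;
    }
}

/** Writes the response only when nothing has opted out; returns whether it wrote. */
function storeResponse(CacheGate $gate, string $file, string $content): bool
{
    if (! $gate->canCache()) {
        return false;
    }

    return file_put_contents($file, json_encode(['content' => $content])) !== false;
}

$file = tempnam(sys_get_temp_dir(), 'cache');

$gate = new CacheGate();
$wroteFirst = storeResponse($gate, $file, '<p>Hello</p>');

$gate->abandonCache(); // e.g. an asset without local metadata was rendered
$wroteSecond = storeResponse($gate, $file, '<p>Changed</p>');

$stored = json_decode(file_get_contents($file), true);
unlink($file);
```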

With our supporting methods out of the way, we can now look at how we will extract this information. Our goal is for the content to notify us when a template uses it. This method isn't entirely foolproof, as we sometimes record more content dependencies than a page strictly needs, such as when we render navigation menus. However, it is arguably better for our cache to invalidate too often than never to invalidate.

In app/Data/Asset.php:

<?php

namespace App\Data;

use App\HybridCache\Facades\HybridCache;
use Statamic\Assets\Asset as StatamicAsset;

class Asset extends StatamicAsset
{
    public function url()
    {
        if ($this->metaExists()) {
            HybridCache::registerAssetPath($this->disk()->path($this->metaPath()));
        } else {
            HybridCache::abandonCache();
        }

        return parent::url();
    }
}

In app/Data/Entry.php:

<?php

namespace App\Data;

use App\HybridCache\Facades\HybridCache;
use Statamic\Entries\Entry as StatamicEntry;

class Entry extends StatamicEntry
{
    public function __construct()
    {
        parent::__construct();
    }

    public function id($id = null)
    {
        $entryId = $this->fluentlyGetOrSet('id')->args(func_get_args());

        if ($this->id) {
            HybridCache::registerEntryId($this->id);
        }

        return $entryId;
    }
}

In app/Data/Term.php:

<?php

namespace App\Data;

use App\HybridCache\Facades\HybridCache;
use Statamic\Taxonomies\Term as StatamicTerm;

class Term extends StatamicTerm
{
    public function id()
    {
        $termId = parent::id();

        if ($termId) {
            HybridCache::registerTermId($termId);
        }

        return $termId;
    }
}

In app/Data/Variables.php:

<?php

namespace App\Data;

use App\HybridCache\Facades\HybridCache;
use Statamic\Globals\Variables as StatamicVariables;

class Variables extends StatamicVariables
{
    public function get($key, $fallback = null)
    {
        HybridCache::registerGlobalPath($this->path());

        return parent::get($key, $fallback);
    }
}

We designed these implementations to override methods that templates likely invoke during rendering. We chose these methods through experimentation. If you want to adopt a similar approach in your projects, you'll need to test and modify it based on your project's unique requirements. Of all these implementations, the asset one stands out as the least intuitive.

When someone calls the asset's url method, we check for the asset's local metadata. If metadata doesn't exist, we call the abandonCache method we mentioned earlier. But if metadata is present, we retrieve the full path from the asset container's disk; without this step, we'd only obtain a relative path.

We now have a system to detect our response's content dependencies, but we are not actively doing anything with them when creating the cached responses. Our next step will be to update our App\HybridCache\Listeners\ResponsePreparedListener implementation.

In app/HybridCache/Listeners/ResponsePreparedListener.php:

<?php

namespace App\HybridCache\Listeners;

use App\HybridCache\Facades\HybridCache;
use Illuminate\Routing\Events\ResponsePrepared;
use Statamic\Facades\Entry;
use Statamic\Facades\Term;

class ResponsePreparedListener
{
    public function handle(ResponsePrepared $event)
    {
        if (! HybridCache::canCache()) {
            return;
        }

        $cacheFileName = HybridCache::getCacheFileName();

        if (! $cacheFileName) {
            return;
        }

        $content = $event->response->getContent();

        if (mb_strlen($content) == 0) {
            return;
        }

        $responseDependencies = HybridCache::getCacheData();

        $paths = [];

        $paths = array_merge($paths, $responseDependencies['viewPaths']);
        $paths = array_merge($paths, $responseDependencies['globalPaths']);

        $paths = array_merge($paths, Entry::query()
            ->whereIn('id', $responseDependencies['entryIds'])
            ->get()
            ->map(fn ($entry) => $entry->path())
            ->all());

        $paths = array_merge($paths, Term::query()
            ->whereIn('id', $responseDependencies['termIds'])
            ->get()
            ->map(fn ($term) => $term->path())
            ->all());

        $paths = array_merge($paths, $responseDependencies['assetPaths']);

        $timestamps = [];

        foreach ($paths as $path) {
            $timestamps[$path] = filemtime($path);
        }

        $cacheData = [
            'content' => $content,
            'paths' => $timestamps,
        ];

        file_put_contents($cacheFileName, json_encode($cacheData));
    }
}

The changes utilize Statamic's data repositories to retrieve the absolute file paths for the entries and terms we have identifiers for. Statamic provides many repositories that simplify queries and filtering content across various conditions and scenarios. If you are unfamiliar with these repositories, you should check out the official documentation.

Because we are utilizing file paths for content dependencies, we do not need to update our cache invalidation logic. If we were to manually clear our cache and then update any content that appears within a cached response in a text editor, our cache should now automatically invalidate itself.
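
For testing, manually clearing the cache just means deleting the JSON files in the cache directory. A small helper along the following lines could do that. The clearCacheDirectory function is hypothetical and not part of our cache system, and the demonstration runs against a throwaway temporary directory rather than storage/hybrid-cache.

```php
<?php

/** Deletes every cached response in the directory and returns how many were removed. */
function clearCacheDirectory(string $directory): int
{
    $removed = 0;

    // glob() may return false on error, so fall back to an empty list.
    foreach (glob(rtrim($directory, '/').'/*.json') ?: [] as $file) {
        if (is_file($file) && unlink($file)) {
            $removed++;
        }
    }

    return $removed;
}

// Demonstration against a throwaway directory.
$directory = sys_get_temp_dir().'/hybrid-cache-demo-'.uniqid();
mkdir($directory, 0755, true);

file_put_contents($directory.'/'.sha1('/').'.json', '{}');
file_put_contents($directory.'/'.sha1('/about').'.json', '{}');
file_put_contents($directory.'/keep.txt', 'not a cache file');

$removed = clearCacheDirectory($directory);
$remaining = array_values(array_diff(scandir($directory), ['.', '..']));

// Clean up the demonstration directory.
unlink($directory.'/keep.txt');
rmdir($directory);
```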
