Batch updating content
Updating fields or manipulating files across a potentially large amount of content is a common task, in particular when working with an existing site. As Kirby does not use a database that would provide global search-and-replace operations, we have to work with loops that come with great flexibility but also certain limitations.
This recipe provides an extendable boilerplate route to comfortably update large amounts of content or files.
Whenever running batch operations on content, make sure to create a full backup first.
It all starts with a simple loop
Before addressing more advanced examples and related challenges, let's have a look at the bare-bones workflow:
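A minimal sketch of that workflow could look as follows (the field name myfield and the prefix string are placeholders to adapt to your setup):

```php
// fetch every page of the entire site; `true` includes drafts as well
$collection = site()->index(true);

foreach ($collection as $page) {
    // prepend a prefix to the (hypothetical) field `myfield`
    // and write the result back to the content file
    $page->update([
        'myfield' => 'prefix-' . $page->myfield()->value()
    ]);
}
```

Note that $page->update() requires an authenticated user with write permissions; when running such a script outside a logged-in context, kirby()->impersonate('kirby') can grant the necessary rights.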
This loops through all pages of the entire site (the true boolean passed to index() includes drafts as well), adds a prefix to every page's myfield field and saves the result back to the file system.
Of course, this could also be limited to any subset of pages, e.g. to specifically target all articles that are children of a hypothetical parent page. All that counts is that the $collection variable holds a Pages collection.
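For instance, the collection could be narrowed down like this (the page ID blog and the template name article are hypothetical):

```php
// all children of a hypothetical `blog` page
$collection = page('blog')->children();

// or: only its listed children using a hypothetical `article` template
$collection = page('blog')->children()->listed()->template('article');
```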
A (temporary) route to trigger the batch loop
While we could create a dedicated plugin for this purpose or temporarily hack it into a controller or template, a clean and easy way is to create a route in the site's config file:
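The route could look like this (the pattern name and the field logic inside the loop are placeholders to adapt):

```php
// site/config/config.php
return [
    'routes' => [
        [
            'pattern' => 'batch-update-content',
            'action'  => function () {
                // only proceed for logged-in admins
                $user = kirby()->user();
                if ($user === null || $user->isAdmin() !== true) {
                    return false;
                }

                $collection = site()->index(true);

                foreach ($collection as $page) {
                    $page->update([
                        'myfield' => 'prefix-' . $page->myfield()->value()
                    ]);
                }

                return 'Batch update finished.';
            }
        ]
    ]
];
```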
Now our update loop can be triggered by visiting the URL mydomain.com/batch-update-content. The route includes a check for admin status and will fail unless an admin user is logged in.
It is a good idea to delete or comment out this route when done, to keep things clean and safe.
On multi-language sites, we have to specify a language both when reading and when writing. Use $page->content('en')->myfield()->value() to retrieve the desired content, and pass the target language to $page->update() as an additional second argument: $page->update(['myfield' => $newValue], 'en').
All code samples in this recipe are templates to start from and leave out additional checks for missing fields etc. To avoid runtime errors, consider adding guard clauses like if ($page->content()->has('myfield')) when crafting your custom update loops.
Advanced batch processing
The foreach loop in the code above is our workhorse and can be adjusted to pretty much any scriptable update task at hand:
Updating structure fields
While the first example deals with the one-dimensional case of prefixing a string in a text field, structure fields require a few extra lines of code, as their content is stored in YAML format:
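A sketch of such an update, assuming a hypothetical structure field mystructure with a heading subfield:

```php
foreach ($collection as $page) {
    // decode the structure field's stored YAML into a plain PHP array
    $items = $page->mystructure()->yaml();

    foreach ($items as $key => $item) {
        // hypothetical change: prefix every `heading` subfield
        $items[$key]['heading'] = 'prefix-' . ($item['heading'] ?? '');
    }

    // re-encode the array as YAML before saving
    $page->update([
        'mystructure' => \Kirby\Data\Data::encode($items, 'yaml')
    ]);
}
```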
Changing a tag field
Similarly, we can search and replace an existing tag (tags fields store a comma-separated list) with a new version of it:
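A sketch, assuming a tags field named tags and the hypothetical tag names oldtag and newtag:

```php
foreach ($collection as $page) {
    // split the comma-separated list into an array of single tags
    $tags = $page->tags()->split(',');

    // replace the old tag wherever it occurs
    $tags = array_map(
        fn ($tag) => $tag === 'oldtag' ? 'newtag' : $tag,
        $tags
    );

    // de-duplicate and store the list as a comma-separated string again
    $page->update([
        'tags' => implode(', ', array_unique($tags))
    ]);
}
```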
The split() method similarly applies when dealing with a multiselect field.
Updating content within blocks
When dealing with content in a blocks field, we have to create a modified clone of the original Blocks object and replace the field's value with it (see also this recipe for the underlying principle).
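A sketch of this, assuming a hypothetical blocks field myblocks and prefixing the text of all blocks of type text:

```php
foreach ($collection as $page) {
    $updated = [];

    foreach ($page->myblocks()->toBlocks() as $block) {
        $old = $block->toArray();

        // hypothetical change: prefix the content of all `text` blocks
        if ($old['type'] === 'text') {
            $old['content']['text'] = 'prefix-' . ($old['content']['text'] ?? '');
        }

        $updated[] = $old;
    }

    // blocks field content is stored as JSON
    $page->update([
        'myblocks' => json_encode($updated)
    ]);
}
```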
Similarly, we could change other attributes of Blocks objects; for instance, we could manipulate the $old['type'] value to convert all blocks of type text into a (hypothetical) custom block type.
The same logic can be used to loop over files and do something with them. For example, we could loop over all attached files with the template image and resize them to a maximum width of 2000 pixels.
For compatibility with the next iteration in this recipe's code, we loop over each page's images while looping through all pages; of course the script could also be rewritten to directly deal with a collection of files:
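A sketch of that nested loop; kirby()->thumb() is used here to overwrite the original file with a resized version (after changing originals this way, Kirby's media cache may need clearing):

```php
foreach ($collection as $page) {
    foreach ($page->images()->template('image') as $file) {
        // skip files that are already small enough
        if ($file->width() <= 2000) {
            continue;
        }

        // overwrite the original file with a resized version
        kirby()->thumb($file->root(), $file->root(), ['width' => 2000]);
    }
}
```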
This would make the second foreach loop obsolete, as $page would then already be a file object rather than a page object.
The sky is the limit: within the loop, we could even access an API, retrieve data from a database, deal with complex blocks, alter layout fields' data, etc.
Dealing with script timeouts
While straightforward in theory, running such loops over a great number of pages or files may eventually lead to server timeouts. Although we can sometimes adjust the cutoff time in the server configuration, or may be able to trigger long-running scripts on the PHP command line, a robust way of updating content is to process the batch in a staggered fashion.
For this purpose, we introduce an $offset index, contained in the route's URL, in combination with a $limit variable, and extend the routing script to repeatedly call itself until done:
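A sketch of the staggered route (the pattern name, $limit value and field logic are placeholders to adapt):

```php
// site/config/config.php
return [
    'routes' => [
        [
            'pattern' => 'batch-update-content/(:num)',
            'action'  => function ($offset) {
                $user = kirby()->user();
                if ($user === null || $user->isAdmin() !== true) {
                    return false;
                }

                $limit      = 50; // pages per run; tune to your server
                $offset     = (int)$offset;
                $collection = site()->index(true)->slice($offset, $limit);

                // no pages left: we are done
                if ($collection->count() === 0) {
                    return 'Batch update finished.';
                }

                foreach ($collection as $page) {
                    $page->update([
                        'myfield' => 'prefix-' . $page->myfield()->value()
                    ]);
                }

                // instruct the browser to load the next chunk
                $next = url('batch-update-content/' . ($offset + $limit));

                return '<!DOCTYPE html><html><body>' .
                       'Processed ' . ($offset + $limit) . ' pages so far …' .
                       '<script>window.location.href = ' . json_encode($next) . ';</script>' .
                       '</body></html>';
            }
        ]
    ]
];
```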
This is a suitable approach for dealing with updates on sites with up to several thousand pages (performance may vary, depending largely on the host system). Every time the script has processed a batch, the browser is instructed to load a new URL with the new offset included, thus avoiding excessive execution times.
Kirby's go() function could be used here instead of returning some minimal valid HTML5 that calls the next URL; however, go() answers with an HTTP redirect, and browsers abort long chains of consecutive redirects, so triggering the next request from within the returned markup is the more robust choice.
A version for really large numbers of pages
We might still run into timeout or memory issues if the indexed number of pages is very large. This is due to $site->index() having to crawl the entire content store every time the route calls itself.
To mitigate this problem, we can store the collection to be processed in a log file and then process it in small chunks, so that index() is only called on the first run:
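A sketch of that approach; the log file location is a placeholder, and F refers to Kirby's file toolkit class. If drafts need to be included, looking up each ID with site()->findPageOrDraft() instead of find() may be necessary:

```php
// site/config/config.php
return [
    'routes' => [
        [
            'pattern' => 'batch-update-content/(:num)',
            'action'  => function ($offset) {
                $user = kirby()->user();
                if ($user === null || $user->isAdmin() !== true) {
                    return false;
                }

                $limit  = 50;
                $offset = (int)$offset;
                $log    = kirby()->root('logs') . '/batch-update.log';

                // on the first run, index the site once and store all page IDs
                if ($offset === 0 || file_exists($log) === false) {
                    \Kirby\Toolkit\F::write(
                        $log,
                        implode("\n", site()->index(true)->keys())
                    );
                }

                $ids   = explode("\n", \Kirby\Toolkit\F::read($log));
                $chunk = array_slice($ids, $offset, $limit);

                // nothing left: remove the log file and stop
                if (count($chunk) === 0) {
                    \Kirby\Toolkit\F::remove($log);
                    return 'Batch update finished.';
                }

                // only turn the current chunk of IDs into a Pages collection
                $collection = site()->find(...$chunk);

                foreach ($collection as $page) {
                    $page->update([
                        'myfield' => 'prefix-' . $page->myfield()->value()
                    ]);
                }

                $next = url('batch-update-content/' . ($offset + $limit));
                return '<script>window.location.href = ' . json_encode($next) . ';</script>';
            }
        ]
    ]
];
```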
Only the page IDs applicable to the current run are turned into a Pages collection (this is important, as creating large Pages collections from IDs is another potential bottleneck on content-heavy sites). Finally, the script cleans up after itself.
Voilà – your Swiss Army knife for content updates
Now we hold a universal boilerplate for any kind of batch updates, without server timeout worries. To adapt it to different use cases, all we need to change are:
- the $collection variable's definition, to control which pages to process, and
- the code contained within the foreach loop, as outlined in the examples above.
The ideal value for $limit has to be determined through trial and error. A value of 25, or even 100 and more, should be safe for most tasks that alter content fields. If pages have large numbers of files attached, and especially when manipulating those files rather than just content fields, a much lower number may be advisable: resizing images, for example, is a time-consuming process and can quickly hit the 30- or 60-second timeout threshold on some shared hosting environments.