Grabbing a Filtered Directory Tree Using PHP Iteration

Iterate or not to iterate … that is the question!  Apologies to Shakespeare for the misquote, dear readers, but I had to pull your attention away from whatever was holding it somehow!  So, now that I’ve got your attention (and presumably haven’t lost it yet!), I would like to discuss the pressing topic of grabbing entire directory trees with a single command.  Granted, this would normally be a rather mundane task, so to add a further twist, I want to exclude certain directories at the same time.

RecursiveDirectoryIterator

In the SPL (Standard PHP Library) there lives an incredibly useful iterator known as RecursiveDirectoryIterator.  As an example, we parse the directory structure of a typical Laminas project:

├── composer.json
├── config
│   ├── application.config.php
│   ├── autoload
│   │   ├── global.php
│   │   └── laminas-developer-tools.local-development.php
│   ├── development.config.php.dist
│   └── modules.config.php
├── COPYRIGHT.md
├── data
│   └── cache
├── docker-compose.yml
├── Dockerfile
├── LICENSE.md
├── module
│   ├── Application
│   │   ├── config
│   │   │   └── module.config.php
│   │   ├── src
│   │   │   ├── Controller
│   │   │   │   ├── IndexControllerFactory.php
│   │   │   │   └── IndexController.php
│   │   │   ├── Module.php
│   │   │   └── Service
│   │   │       └── Calendar.php
│   │   ├── test
│   │   │   └── Controller
│   │   │       └── IndexControllerTest.php
│   │   └── view
│   │       ├── application
│   │       │   └── index
│   │       │       ├── calendar.phtml
│   │       │       └── index.phtml
│   │       ├── error
│   │       │   ├── 404.phtml
│   │       │   └── index.phtml
│   │       └── layout
│   │           └── layout.phtml
├── public
│   ├── css
│   ├── img
│   ├── index.php
│   ├── js
│   └── web.config
├── README.md
├── Vagrantfile
└── vendor
    ├── autoload.php
    ├── bin
    ├── composer
    │   ├── autoload_classmap.php
    │   └── LICENSE
    ├── laminas
    │   ├── laminas-component-installer
    │   ├── laminas-config
    etc.

As you can imagine, the vendor directory has a ton of open-source software installed via Composer.  Further, the public directory includes lots of stylesheets and JavaScript.  So the task at hand is to iterate through the directory structure, excluding these two directories.  Your first thought might be to define a path, create a RecursiveDirectoryIterator, and be done with it.  Throw in a simple foreach() loop and we’re good to go, right?  (Don’t answer!  Rhetorical question.) Before we dive into the code, please be aware that by default each step of a RecursiveDirectoryIterator iteration yields the full filename (including path) as the key, and an SplFileInfo object as the value.
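To see those defaults for yourself, here is a minimal sketch.  It assumes a throwaway directory created under sys_get_temp_dir() rather than a real Laminas project; the rdi_demo_ prefix and the $seen array are names made up for this illustration.

```php
<?php
// Throwaway tree so the example does not depend on a real project (names are hypothetical).
$base = sys_get_temp_dir() . '/rdi_demo_' . uniqid();
mkdir($base . '/config', 0777, true);
touch($base . '/composer.json');

$iteration = new RecursiveDirectoryIterator($base, FilesystemIterator::SKIP_DOTS);

$seen = [];
foreach ($iteration as $key => $value) {
    // $key is the full pathname (a string); $value is an SplFileInfo for that entry.
    $seen[basename($key)] = [
        'key_is_string' => is_string($key),
        'is_file_info'  => $value instanceof SplFileInfo,
        'is_dir'        => $value->isDir(),
    ];
    echo $key, ' => ', get_class($value), ($value->isDir() ? ' (directory)' : ' (file)'), "\n";
}

// Release the directory handle, then remove the throwaway tree.
unset($iteration, $value);
unlink($base . '/composer.json');
rmdir($base . '/config');
rmdir($base);
```

Because the value is an SplFileInfo object, methods such as isDir(), isFile() and getFilename() are available inside the loop without any extra filesystem calls on your part.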

So, let’s get down to producing some code to achieve the desired results.  A good place to start might be to define a function (or class method) that determines the acceptance criteria.  In this case we want to be able to exclude one or more directory paths from the final output.  Thus we define a simple function accept() that returns FALSE if the given path includes any of the directory paths in the $excludes array:

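// Returns FALSE if $name contains any of the strings in $excludes
function accept(string $name, array $excludes = []) : bool
{
    $result = TRUE;
    foreach ($excludes as $item) {
        // strpos() returns 0 for a match at the very start, hence the strict !== FALSE
        if (strpos($name, $item) !== FALSE) {
            $result = FALSE;
            break;
        }
    }
    return $result;
}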
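To make the behaviour concrete, here is a quick check of accept() against a few paths.  The paths are hypothetical, chosen only for illustration, and the function is repeated so the snippet runs on its own.  Note the last case: because the test is a plain substring match, a file whose name merely starts with “public” is excluded as well.

```php
<?php
// Same logic as the accept() function above, repeated so this snippet is self-contained.
function accept(string $name, array $excludes = [])
{
    foreach ($excludes as $item) {
        if (strpos($name, $item) !== FALSE) {
            return FALSE;
        }
    }
    return TRUE;
}

$excludes = ['/vendor', '/public'];

// Hypothetical paths, for illustration only.
$paths = [
    '/path/to/laminas_project/config/modules.config.php',           // kept
    '/path/to/laminas_project/vendor/autoload.php',                 // excluded by '/vendor'
    '/path/to/laminas_project/public/index.php',                    // excluded by '/public'
    '/path/to/laminas_project/module/Application/public_notes.txt', // also excluded: substring match
];

$results = [];
foreach ($paths as $p) {
    $results[$p] = accept($p, $excludes);
    echo ($results[$p] ? 'keep   ' : 'exclude'), ' ', $p, "\n";
}
```

If that last result is not what you want, one option would be to compare whole path segments instead of substrings; for the purposes of this article the simple version is enough.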

Next we define a function show() that performs the actual iteration, using accept() to include or exclude iteration entries.

function show(string $path, Iterator $iteration, array $excludes)
{
    $output = '';
    foreach ($iteration as $key => $value) {
        if (accept($key, $excludes)) {
            // strip the base path and the separator that follows it, leaving a relative path
            $output .= ltrim(str_replace($path, '', $key), '/') . "\n";
        }
    }
    return $output;
}

Finally, we create the RecursiveDirectoryIterator instance, giving it the initial path, and a flag to eliminate the “dot” directories (i.e. “.” and “..”).

$path = '/path/to/laminas_project';
$excludes = ['/vendor','/public'];
$iteration = new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS);
echo show($path, $iteration, $excludes);

And here is the resulting output:

README.md
module
phpcs.xml
composer.phar
COPYRIGHT.md
composer.json
docker-compose.yml
.gitignore
Vagrantfile
phpunit.xml.dist
data
config
composer.json.bak
CHANGELOG.md
Dockerfile
LICENSE.md
composer.lock

Wait, you might cry out (well, maybe not, but for the sake of argument, imagine an outraged developer screaming insults at the PHP engine :-), what happened to all the subdirectories and associated files?  Good question!  Oddly enough, the RecursiveDirectoryIterator was doing its job!  Looped over directly, it returns each entry in the path specified and then stops.  So, in the case of the RecursiveDirectoryIterator, its “recursion” isn’t that it goes “deep” all by itself, but rather that it knows how to hand out a child iterator for each subdirectory (via hasChildren() and getChildren()); something else has to do the actual descending.  To up its game, so to speak, we need to call upon the mighty RecursiveIteratorIterator class.
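Before moving on, here is a minimal sketch of what RecursiveDirectoryIterator offers on its own.  It does know about children: hasChildren() reports whether the current entry is a directory, and getChildren() returns a new iterator for it.  The walk() helper and the throwaway directory names are hypothetical, written only for this illustration; descending by hand like this is exactly the chore the next class takes off our hands.

```php
<?php
// Throwaway tree (hypothetical names) so the snippet does not need a real project.
$base = sys_get_temp_dir() . '/rdi_children_' . uniqid();
mkdir($base . '/config/autoload', 0777, true);
touch($base . '/README.md');
touch($base . '/config/autoload/global.php');

// Hypothetical helper: descends by hand via hasChildren() / getChildren().
function walk(RecursiveDirectoryIterator $dir, array &$found): void
{
    for ($dir->rewind(); $dir->valid(); $dir->next()) {
        $found[] = $dir->key();        // full pathname of the current entry
        if ($dir->hasChildren()) {     // TRUE when the current entry is a directory
            walk($dir->getChildren(), $found);
        }
    }
}

// A plain foreach only sees the top level ...
$topLevel = [];
foreach (new RecursiveDirectoryIterator($base, FilesystemIterator::SKIP_DOTS) as $key => $value) {
    $topLevel[] = substr($key, strlen($base) + 1);
}

// ... whereas the manual walk reaches every level.
$all = [];
walk(new RecursiveDirectoryIterator($base, FilesystemIterator::SKIP_DOTS), $all);
$all = array_map(function ($p) use ($base) {
    return substr($p, strlen($base) + 1);
}, $all);

// Directory order depends on the filesystem, so sort before printing.
sort($topLevel);
sort($all);
print_r($topLevel);  // README.md, config
print_r($all);       // README.md, config, config/autoload, config/autoload/global.php

// Remove the throwaway tree.
unset($value);
unlink($base . '/config/autoload/global.php');
rmdir($base . '/config/autoload');
rmdir($base . '/config');
unlink($base . '/README.md');
rmdir($base);
```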

RecursiveIteratorIterator

The relationship between a recursive iterator and RecursiveIteratorIterator is like that of a bodybuilder to steroids.  This class drives the associated iterator, descending into each child in turn until all child nodes have been explored.  When associated with RecursiveDirectoryIterator, it is perfect for parsing entire directory sub-trees.  One word of caution, however: if you point it to the wrong path, especially one with thousands of files and hundreds of subdirectories … you can quickly hit PHP’s memory or execution time limits and enter PHP Fatal Error territory. That consideration aside, RecursiveIteratorIterator is a really cool classname, isn’t it?  It gives one a warm, fuzzy Department-Of-Redundancy-Department kind of feeling, doesn’t it?  (Monty Python fans take note!)

All joking aside, let’s have a look at its application to the code described above.  Really, the only thing that needs to be done is to wrap the RecursiveDirectoryIterator instance into a RecursiveIteratorIterator instance, and we’re good to go.  The modified code might appear as follows:

$iteration = new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS);
$recurse = new RecursiveIteratorIterator($iteration);
echo show($path, $recurse, $excludes);

And here are the results we expected from the start (only files are listed, as RecursiveIteratorIterator defaults to its LEAVES_ONLY mode):

README.md
module/Signups/view/signups/index/events.phtml
module/Signups/view/signups/index/index.phtml
module/Signups/src/Module.php
module/Signups/src/Controller/IndexController.php
module/Signups/config/module.config.php
module/Test/view/test/list/index.phtml
module/Test/view/test/index/index.phtml
module/Test/src/Module.php
module/Test/src/Controller/ListController.php
module/Test/src/Controller/IndexController.php
module/Test/config/module.config.php
module/Test/config/module.config.php.bak
module/Application/test/Controller/IndexControllerTest.php
module/Application/view/application/index/calendar.phtml
module/Application/view/application/index/index.phtml
module/Application/view/error/404.phtml
module/Application/view/error/index.phtml
module/Application/view/layout/layout.phtml
module/Application/src/Module.php
module/Application/src/Models/EventsModel.php
module/Application/src/Factory/AdapterFactory.php
module/Application/src/Factory/EventsModelFactory.php
module/Application/src/Controller/IndexControllerFactory.php
module/Application/src/Controller/IndexController.php
module/Application/src/Service/Calendar.php
module/Application/config/module.config.php
phpcs.xml
composer.phar
COPYRIGHT.md
composer.json
docker-compose.yml
.gitignore
Vagrantfile
phpunit.xml.dist
data/cache/.gitkeep
config/application.config.php
config/modules.config.php
config/autoload/README.md
config/autoload/laminas-developer-tools.local-development.php
config/autoload/db.local.php
config/autoload/development.local.php
config/autoload/global.php
config/autoload/.gitignore
config/autoload/local.php.dist
config/autoload/development.local.php.dist
config/development.config.php
config/development.config.php.dist
composer.json.bak
CHANGELOG.md
Dockerfile
LICENSE.md
composer.lock
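As for the Fatal Error territory mentioned earlier, one way to hedge against an unexpectedly deep tree is RecursiveIteratorIterator::setMaxDepth().  The following is a minimal sketch, again assuming a throwaway directory with made-up names rather than the Laminas project.  It keeps regular files only, so the result does not depend on how entries sitting right at the depth limit are reported.

```php
<?php
// Throwaway tree (hypothetical names): one file at each of three levels.
$base = sys_get_temp_dir() . '/rii_depth_' . uniqid();
mkdir($base . '/module/src', 0777, true);
touch($base . '/top.txt');
touch($base . '/module/Module.php');
touch($base . '/module/src/Deep.php');

$iteration = new RecursiveDirectoryIterator($base, FilesystemIterator::SKIP_DOTS);
$recurse = new RecursiveIteratorIterator($iteration);
$recurse->setMaxDepth(1);   // depth 0 is the contents of $base, depth 1 is one level below

$files = [];
foreach ($recurse as $key => $value) {
    if ($value->isFile()) {
        $files[] = substr($key, strlen($base) + 1);
    }
}
sort($files);
print_r($files);   // module/Module.php and top.txt; module/src/Deep.php is never reached

// Release the directory handles, then remove the throwaway tree.
unset($recurse, $iteration, $value);
unlink($base . '/module/src/Deep.php');
rmdir($base . '/module/src');
unlink($base . '/module/Module.php');
rmdir($base . '/module');
unlink($base . '/top.txt');
rmdir($base);
```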

But wait … there’s more!  Let me pose you a question: wouldn’t it be nice to do all this with a single iterator?  Hah!  That got your attention, didn’t it?  So, without further ado, let’s have a look at the last iterator class to be discussed in this article: FilterIterator.

FilterIterator

As with RecursiveIteratorIterator, the FilterIterator class cannot stand alone: it provides a wrapper for an existing iterator.  But there’s a bigger problem: this class is marked abstract, which means you cannot use it directly!  This makes perfect sense when you understand that the abstract method accept() (sound familiar?) simply cannot be defined by the PHP core development team.  They have no idea what needs to be filtered.  Accordingly, its definition is left to the developer.  This still doesn’t stop it from being super-annoying, however!  Why do I need to develop an entirely new class which extends FilterIterator just because it’s abstract?  Arghhhh!!!  Hang on folks … there is another way!

Many of us tend to forget one of the most discussed new features of PHP 7.0: the anonymous class.  This feature was discussed endlessly, and was the subject of many an article or blog post.  Eventually it was forgotten and faded into obscurity.  But it just so happens that an anonymous class might be just the ticket in the situation we are discussing in this article.

Imagine the following:

  • We create an iterator in the form of an anonymous class that extends FilterIterator
  • In the anonymous class we define a static property to contain an array of directory paths to exclude
  • We move the logic from the accept() function described above into a class method
  • VOILA: we’re done!

The only real change that needs to be made in accept() is to not accept any arguments, and to substitute parent::current() in place of $name.  (Here current() hands back an SplFileInfo object, which converts to its full pathname when used as a string.)  If $excludes becomes a public static property, it can be assigned from the calling program. Here is how the alternative code solution might appear:

$iteration = new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS);
$recurse = new RecursiveIteratorIterator($iteration);
$filter = new class($recurse) extends FilterIterator {
    public static $excludes = [];
    // the bool return type matches FilterIterator::accept() and avoids a deprecation notice on PHP 8.1+
    public function accept() : bool {
        // current() is an SplFileInfo; casting it to a string yields the full pathname
        $name = (string) parent::current();
        foreach (self::$excludes as $item) {
            if (strpos($name, $item) !== FALSE) return FALSE;
        }
        return TRUE;
    }
};
$filter::$excludes = $excludes;
foreach ($filter as $key => $value)
    // strip the base path and the separator that follows it, leaving a relative path
    echo ltrim(str_replace($path, '', $key), '/') . "\n";

An added benefit is that we no longer need the show() function.  In this example the iteration itself already includes filtering, so all we need to do is iterate through the pre-filtered result.  The resulting output is not shown here as it’s identical to the output from the previous code example.  So, to summarize: wrapping everything in a FilterIterator gives us a single iteration that doesn’t need any additional logic.
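One caveat about the static property: it is shared by every object created from that same anonymous class expression, so two filters built that way cannot carry different exclusion lists.  If that matters, a possible variant is to pass the list through the constructor instead.  The sketch below is only an illustration: makeFilter() is a hypothetical helper, it matches against the key rather than the current value, and an ArrayIterator with made-up pathnames as keys stands in for the directory iteration so that the snippet runs without touching the filesystem.

```php
<?php
// Hypothetical factory: each call returns a filter that carries its own exclusion list.
function makeFilter(Iterator $inner, array $excludes): FilterIterator
{
    return new class($inner, $excludes) extends FilterIterator {
        private $excludes;

        public function __construct(Iterator $inner, array $excludes)
        {
            parent::__construct($inner);
            $this->excludes = $excludes;
        }

        public function accept(): bool
        {
            // The key plays the role of the full pathname, as in the directory iteration.
            foreach ($this->excludes as $item) {
                if (strpos((string) $this->key(), $item) !== false) {
                    return false;
                }
            }
            return true;
        }
    };
}

// Stand-in for the directory iteration: keys are made-up pathnames.
$entries = [
    '/project/README.md'           => null,
    '/project/vendor/autoload.php' => null,
    '/project/public/index.php'    => null,
];

$noVendor = makeFilter(new ArrayIterator($entries), ['/vendor']);
$noPublic = makeFilter(new ArrayIterator($entries), ['/public']);

$keptA = array_keys(iterator_to_array($noVendor));
$keptB = array_keys(iterator_to_array($noPublic));

print_r($keptA);  // README.md and public/index.php survive
print_r($keptB);  // README.md and vendor/autoload.php survive
```

In real use you would hand makeFilter() the RecursiveIteratorIterator from the earlier example in place of the ArrayIterator.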

That’s about all for today, dear readers.  Happy coding!