Just Your Garden-Variety Path

Wherein we learn that when we follow the well-beaten path we are still sometimes lead astray. Humans are best at following humans — it’s the root of our nature. And hundreds of millions of humans, consequently, can most certainly be wrong.

THE WEEKLY CHALLENGE – PERL & RAKU #182 Task 2


“The best place to find God is in a garden. You can dig for him there.”

— George Bernard Shaw


Common Path

Submitted by: Julien Fiegehenn

Given a list of absolute Linux file paths, determine the deepest path to the directory that contains all of them.

Example
Input:
    /a/b/c/1/x.pl
    /a/b/c/d/e/2/x.pl
    /a/b/c/d/3/x.pl
    /a/b/c/4/x.pl
    /a/b/c/d/5/x.pl

Ouput:
    /a/b/c

ANALYSIS

Years ago, a mentor once said to me about the Unix operating system, in its then-multitude of commercial flavors: “It’s text files all the way down. It’s just text”.

In the spirit of that sentiment, instead of setting up some parallel abstraction of a filesystem using hashes, and descending this as a linked list until we find the first hash with more than one key, we’ll do something completely different: we’ll simply parse the strings with regular expressions.

METHOD

How? Well, the root-most directory in all the strings will be common to each, right? Or at least that’s the idea. So why don’t we make a pass over all the strings, locate the topmost common root directory and substitute it out — deleting it — from every string? This shortens all the strings a bit and then we go again. If we perform every string substitution successfully in the pass, we continue and append that root part to the output, but once we fail a match then we’ve found our first divergent path. We then jump out of the loop and whatever root portion we’ve gathered by that point is our result.

The method works in part because we keep out eye on the ball as to what we really want from the data. And that is the directory string common to all paths. We’re systematically destroying the path data as we build up the result, but that doesn’t matter here. You may even notice the processing on the paths leaves them in a incomplete state, as in the last pass we physically crop the strings until somewhere in the middle of the pass, leaving some done and some not. Again this is unimportant. It’s the completed root string we’re after.

Some caveats: first we’re going to assume we have been given a set of well-constructed absolute paths. Meaning no relative portions in the form of backtracking or no-ops with dot directories. And no links either, hard or soft, to complicate things. Not really sure how that’d even work but let’s disqualify them outright anyway so I don’t have to think it through. Every directory we see will be assumed to be exactly what it says it is.

And no pathological directory names designed specifically to break the regex. Just no. I’m tired and want to have a little fun for change.

PERL 5 SOLUTION

The regular expression we’re going to design is going to somehow match directory delimiters, so right off the bat we should probably pick a different delimiter pair when we perform that match, to avoid having to escape anything. Some people like curly brackets, and dollar signs are in my experience pretty common, but I find these don’t scan easily for me — that’s one of the problems with Perl harnessing all of those extra characters for its own nefarious uses. Really, when it comes down to it I should pick something from the Unicode character set, but for a nice clean look I prefer a vertical pipe and the /x modifier to place a space in there. It works for me until my regex needs a logical OR.

There are in fact two regular expressions required, but only the first is complex, where we identify the current leading path element. Let’s break it down:

  1. We will want to capture the result, to refer to it later, so we start with a leading parenthesis.
  2. An absolute path should start with leading slash. But in case it for some reason it might not, we can still add a question mark without any penalty. I don’t see it breaking anything.
  3. The meat of the matter is that we now wish to match anything at all except a directory delimiter. To do this we make a negated character class by making the first character a carat. The only character we want to exclude is the forward slash, and we want to allow as many characters as we find, as long as there is at least one there.
  4. Close those parentheses and we’re done.

Here’s what we’ve got:

m| (/?[^/]+) |x

Some people say Perl looks like line-noise. I have no need for the haters and find this quite beautiful.

Just a couple of notes remain on the processing. Notice we do need to assign the capture to a real, named variable because Perl gets confused when we try using $1 in a separate match. It’s reserved within the second regex, even if we don’t actually use it. I think things are much more explicit this way anyways.

Another thing is the loop, where I avoid while (1) { ... } because I’ve always thought, since I were a wee bairn, that it looked quite tacky — we’re force-fitting an infinite loop with a conditional that cannot fail. Sure it works and I understand fully why it exists, but that doesn’t make it look any better to me. It’s a necessary and useful kludge is what it is, a historical hand-me-down. Enter Perl, where at least some other person seems to have felt the same way and installed a special case for us, where if we leave the conditional out entirely we get the same behavior. I find this more pleasing. There’s no flimsy pretext — there is nothing that will stop the loop from spinning.

Until, of course, we step off the ride. There might be a metaphor for life in there somewhere.

    use warnings;
    use strict;
    use utf8;
    use feature ":5.26";
    use feature qw(signatures);
    no warnings 'experimental::signatures';
    
    my @files = qw(
                    /a/b/c/1/x.pl
                    /a/b/c/d/e/2/x.pl
                    /a/b/c/d/3/x.pl
                    /a/b/c/4/x.pl
                    /a/b/c/d/5/x.pl                 
                );
    
    my $common = '';
    my $root;
    
    LOOP: while () {
        $files[0] =~ m| (/?[^/]+) |x or last;
        $root     = $1;
        s/$root// or last LOOP for @files;
        $common  .= $root;
    }
    
    say $common;


The Perl Weekly Challenge, that idyllic glade wherein we stumble upon the holes for these sweet descents, is now known as

The Weekly Challenge – Perl and Raku

It is the creation of the lovely Mohammad Sajid Anwar and a veritable swarm of contributors from all over the world, who gather, as might be expected, weekly online to solve puzzles. Everyone is encouraged to visit, learn and contribute at

https://theweeklychallenge.org