Directed Acyclic Graph – Programming Excursions in Perl and Raku

Wherein we sing the praises of the buddy system as we descend down into geometric patterns…

THE WEEKLY CHALLENGE – PERL & RAKU #117

episode one:
“Missing Without Action”

Task 1

Missing Row

You are given text file with rows numbered 1-15 in random order but there is a catch: one row in missing in the file.

11, Line Eleven
1, Line one
9, Line Nine
13, Line Thirteen
2, Line two
6, Line Six
8, Line Eight
10, Line Ten
7, Line Seven
4, Line Four
14, Line Fourteen
3, Line three
15, Line Fifteen
5, Line Five

Method

A keen observer might note that the directions state the problem with the file, but not exactly what to do about this situation. A snarkier person might simply identify this fact and report back: “You’re right!” and be done with it; and with this response be technically correct, the worst kind of correct. I suppose one could even go further and simply state: “If you say so…” and be just as valid, as there is no right or wrong response to a question not asked.

I am not that person. I do try to find the good in things, where there is good to be had. So let’s ask some questions ourselves, and fill in the blanks. We spend a remarkable amount of our days filling in the blanks, you know; assuming things work the way we think they do, and will continue to do so even if we look away. Closing your eyes before stepping into traffic is never a good idea. Object permanence is a notion we become acquainted with quite early in life. The peek-a-boo game may be riotous fun for a while, but eventually we catch on. Or, you know, most of us do. I can’t speak for everyone.

But that is neither here nor there. Today we have a situation — an observation — and what, I say, can we do with it? We have no idea as to the provenance of the file, so no matter our detective skills I think it quite unlikely we’ll have any headway in tracking down who stole the missing line. On the other hand, it does seem likely that we could, if we could identify the placement, infer what the missing text was from the lines we do have. So that’s a start.

Although nothing would give me greater pleasure than using Machine Learning to make a solid guess at the missing text, our document corpus is rather sparse, in only a dozen examples or so; training a model would present some difficulties in this hair-brained approach. It certainly doable, though, but that adventure will have to wait for now.

Alternately, we could report the missing line, or lines, and let the user decide for themselves what to do with this information. This is quite reasonable and sensible goal, so let’s do that.

The Plan

Humans are excellent at identifying patterns; it’s a talent we’ve cultivated throughout history to give us the evolutionary edge to — you know — not die. Recognizing a situation that will kill you is, all-in-all, a pretty valuable talent to have, and the ones who are best at it have a tendency to not die, which in turn allows them the opportunity to pass these street-smarts and genetic predispositions on to their offspring. As such we, the descendants of the ones who lived, are really good at recognizing patterns. So what can we see in these squiggly lines before us today?

Well, it looks like each line has a number, a comma, and an accompanying text that labels the line with the earlier number, only here written out in English words. We’re going to make a leap now, and presume the numbers are line numbers, the text reenforcing this fact, and the comma, well, is a comma. Matching up each line to its physical line-number in the file would thus leave conspicuous holes in the location of any missing lines. If we figure out what those holes are we can then report their line-numbers. Then having done our structural analysis the user can take things from there.

PERL 5 SOLUTION

The meat of the solution is to split each line at the comma. We can add a an optional parameter to split to limit the number of elements in the resultant list to 2, the leading number and the following text. The comma, having done its duty, is removed. Having created a tuple, the split string can be assigned as a key and value pair to a lookup hash. If we find the largest value from the list of hash keys that puts an upper bound on the expected line numbers in our file. We can then iterate through a list of numbers up to the maximum and see whether a hash key exists for each line. Better yet, we set this up as a filter on the list of line numbers, only allowing through those lines that have no keys. These are the missing lines.

In the absence of more detailed information we’ve made more than a few assumptions here. Our report presenting our findings is longer than our logic, but the two list processing operations drive a lot of data-manipulation in a very compact form. I love working with the Perl list functions.

use warnings;
use strict;
use utf8;
use feature ":5.26";
use feature qw(signatures);
no warnings 'experimental::signatures';
use List::Util qw( max );

my %lookup  = map  { split /,/, $_, 2 } <> ;
my @missing = grep { ! exists $lookup{$_} } (1..max(keys %lookup));

## output
if (my $count = scalar @missing) {
    say $count, ($count == 1 
        ? " line is "
        : " lines are "), "missing:";
    say "line $_" for @missing;
}
else {
    say "all lines accounted for!";
}

The results of our efforts is a nice little report on the integrity of the file.

I took the liberty of striking out another random line from the file to jazz things up. Hope you like it:

[colincrain@boris:~/Code/PWC]$  perl misng-in-seq.pl missing.txt 
2 lines are missing:
line 4
line 12

raku solution

In Raku we get to use some proper Set Theory, which is cool, to say the least. There is a bit of explicitly coercing numbers to strings to numbers and back to strings going on, because hash keys are strings and using the Raku set operations the string “10” is not the same thing as the number 10, so the number will never be an element of the set. I’m not sure what I think of this, but it appears to be the case. We just need to keep our numbers and strings straight, which we’re not used to being required to do in Perl. Such is life. It did, however allow me to catch a bug where coercing the keys to numbers before determining the maximum value ended up counting the number of keys, so the range was 1 to 13 and I was never checking keys “14” and “15”. All this worked just fine under the hood in Perl, but at least the type behavior in Raku is consistent. Here we need to tell max to calculate its result on the list elements as integers, rather than the list as an integer, which, as we noted, is its length.

unit sub MAIN ( $file = './missing.txt' ) ;

my %set = $file.IO
    .lines
    .map( *.comb: /\d+/, 1 )
    .Set;
    
my @missing =  (1..%set.keys.max({$_.Int})).grep: { $_.Str ∉ %set } ;

if @missing.elems -> $count {
    say $count, ($count == 1
        ?? " line is "
        !! " lines are "), "missing:";
    "line $_".say for @missing;
}
else {
    say "all lines accounted for!";
}

episode two:
“Only One Peak, But So Many Ways Down the Mountain”

task 2

Find Possible Paths

Submitted by: E. Choroba

You are given size of a triangle.

Write a script to find all possible paths from top to the bottom right corner.

In each step, we can either move horizontally to the right (H), or move downwards to the left (L) or right (R).

BONUS: Try if it can handle triangle of size 10 or 20.

Example 1:

Input: $N = 2

           S
          / \
         / _ \
        /\   /\
       /__\ /__\ E

Output: RR, LHR, LHLH, LLHH, RLH, LRH

Example 2:

Input: $N = 1

           S
          / \
         / _ \ E

Output: R, LH

Method

When I first looked at this problem I didn’t see the forest from the trees, so to speak. We have in front of us something that looks a lot like a tree, binary or similar, only more interconnected: there’s a familiar parent-child relationship but also some limited parent-parent bonding, which is a curious twist. The potential pathways we do have are restricted according to a set of rules; the technical term for this is a Directed Acyclic Graph, or DAG.

DAGS are fun, and can pop up to model a variety of real-world scenarios. Two approaches immediately suggest themselves:

treat the problem as a tree-traversal through some sort of node structures, or
alternately perhaps as a graph of edges and vertices, using either nodes or a specialized graph framework to abstract the system.

In either case, though, we start at the top and then proceed to ricochet down from node to node like a multiplexed pachinko machine tracing all the possible pathways.

In my first stab at it I modeled this behavior in a two-step process. Starting at the top of the pyramid we descend downward, choosing either a left-hand L path or a right-hand R. Lists of pathways are collected to get to each possible base position at every level: at the first level these are the nodes 2 and 3, in the second 4, 5, and 6:

                                     1
                                ↙︎     ↘︎
                            ↙︎             ↘︎
                         2     →     →     3
                    ↙︎     ↘︎            ↙︎     ↘︎
               ↙︎              ↘︎    ↙︎             ↘︎
            4     →     →     5     →     →     6
       ↙︎     ↘︎             ↙︎     ↘︎             ↙︎     ↘︎
   ↙︎             ↘︎     ↙︎             ↘︎    ↙︎              ↘︎
7     →     →     8     →     →     9     →     →     10

Descending the triangle from the apex all of the downward paths leading to the nodes found at each level are appended with the appropriate character, and when combined these become the set of all possible ways to get to that position. Or almost: once this has been done, however, we aren’t quite finished, because we must also account for horizontal movement — each position can be approached from the left as well. So, for example, we will still need to add all the paths to 5 to the paths to 6, with an additional H appended. Because the Hs will compound, we need to do these assignments from left to right.

We can’t just compute the right corner of each level; we will need to complete the lists of paths for each position from left to right across the nodes, and we further need to keep the partial path data around to compute the positions on the next level. In the end once we’ve added the horizontal pathways on the very last level, the solution is the set of paths for the rightmost position. This we begin to see is going to start becoming a very long list of options, as every option at each vertex creates a new valid pathway and all of these can be traced to the lower right corner. Perhaps we could compute the number of paths mathematically, but it’s clear from the examples we want to know exactly what the descriptions of those paths are. So we are stuck assembling potentially a very large number of strings to describe all the ways to go.

This technique is all well and good, and works well, just computing whatever data is relevant at a given placement and extending the solutions from there, adding to each partial solution as we go in a large multidimensional array.

But then I went sideways.

I started thinking about the triangular numbers we can see down the right hand side, and triangular matrices in general, and then I thought maybe we could do something a little more… dynamic.

All this talk of matrices, partial solutions and the lower right corner holding the answer made me consider a dynamic computing paradigm, with all the usual accompanying haziness surrounding a specific definition of what exactly that means. A matrix of partial solutions to a complex problem, assembling the solution algorithmically as we traverse across the multidimensional space — sounds a lot like dynamic programming to me. It’s not a perfect fit, but dynamic programming isn’t the most rigorously defined concept itself. We aren’t really making a dynamic decision at each cell, per se, but then again we could also say our decision is very simple: aggregate. Let’s go with it and see what happens.

So how can we use this thinking to adapt our algorithm? It all has to do with the movement: down and to the right. The real insight came when I skewed the triangle so the left side was a vertical axis. Then it began to come together: we can move down, or to the right, or in a combined move down-and-to-the-right motion. Notice an inexorable pattern?

1
↓ ↘︎
↓      ↘︎
2 → → 3
↓ ↘︎    ↓ ↘︎
↓     ↘︎ ↓    ↘︎
4 → → 5 → → 6
↓ ↘︎    ↓ ↘︎      ↓ ↘︎
↓     ↘︎ ↓    ↘︎  ↓      ↘︎
7 → → 8 → → 9 → → 10

When we look at the now-rectangular grid as a matrix, every cell is connected to between one and three others. This still seems quite complex, with little gain, until we consider the left-hand column, where each cell descending can only be reached from the one above it.

According to the rules, each directional pathway is associated with a label; our skewing slightly alters the topology but doesn’t really fundamentally change anything. This left column can only be composed of chains of Ls, one more added at each level. That’s not only not complicated, but easy, and can be precomputed from the size of the triangle. From there we can move a step rightward and compute the second column values from top to bottom, from the first column values or elements directly above if any. Then we can toss out the first column values as we no longer need them. Now instead of growing, every column we compute will be one element shorter than the previous, until the last column only contains the single final result list.

PERL 5 SOLUTION

In the first incarnation, we descend the triangle layer by layer. Even if we only keep the data for a solitary layer as we compute the next, the required number of positions grows as we descend, as the triangle base gets larger. We need to make two passes, first for the vertical connections, then the horizontal, compounding additional Hs as we move left to right.

my $tri_size = shift // 2;
my @mat = ( [ '' ] );

for (1..$tri_size) {
    my @level;
    for my $pos (0..@mat-1) {
        for my $sol ( $mat[$pos]->@* ) {
            push $level[$pos]->@*   , $sol . 'L';
            push $level[$pos+1]->@* , $sol . 'R';
        }
    }
    for my $pos (1..@level-1) { 
        push $level[$pos]->@*, map { $_ . 'H'} $level[$pos-1]->@*;
    }
    @mat = @level; 
}

say "results: ", scalar $mat[-1]->@*;;
say $_ for $mat[-1]->@*;

In the second algorithm we move across the data sideways instead of vertically. As before we discard data we no longer need, but the number of positions to compute is reduced as we move towards the lower right corner. The number of solutions at each position can still grow massive for large sided triangles — there’s no getting around that — but memory is managed more efficiently along the way, and we only need a single iteration at each step.

my $tri_size = shift // 10;
my $mat = [ map { ['L' x $_] } (0..$tri_size) ];   ## construct the left column

for my $pos (1..$tri_size) {            
    my @next;
    for my $level ($pos..$tri_size) {      ## column position, working downward
            push $next[$level]->@*, (map { $_ . 'L' } $next[$level-1]->@*) 
                if defined $next[$level-1];
            push $next[$level]->@*, (map { $_ . 'R' } $mat->[$level-1]->@* );
            push $next[$level]->@*, (map { $_ . 'H' } $mat->[$level]->@*)
    }
    $mat = \@next;
}

say "results: ", scalar $mat->[-1]->@*;;
say $_ for $mat->[-1]->@*;

Running the count of solutions through the Online Encyclopedia of Integer Sequences finds sequence

A006318

Large Schröder numbers (or large Schroeder numbers, or big Schroeder numbers).

1, 2, 6, 22, 90, 394, 1806, 8558, 41586, 206098, 1037718, 5293446,
27297738, 142078746, 745387038, 3937603038, 20927156706, 111818026018,
600318853926, 3236724317174, 17518619320890, 95149655201962,
518431875418926, 2832923350929742, 1552146764887509

The number of solutions to the triangle of a given size T(n) is the next-larger Schroeder number, S(n+1).

I was able to verify results to n = 13, or 142,078,746 paths, just counting the paths and not printing them. At T(13) we still started running out of memory, as you can see by the system time spend swapping to the SSD. Satisfied, I didn’t try for T(14).

[colincrain@boris:~/Code/PWC]$  time perl triangular-tour.pl 13
result:
142078746

real	6m52.523s
user	2m42.176s
sys	2m48.754s

On the subject of triangles of size 10 and 20: well 10 is reasonable, with the improved algorithm computing all 1,037,718 paths in less than a second. However we can see the Large Schröder number S(21), corresponding to a triangle of size 20, is 17,518,619,320,890. I think we can safely say that 17 trillion strings isn’t really a practical thing to try and produce.

Raku Solution

unit sub MAIN (Int $tri-size = 5) ;

my @mat = (0..$tri-size).map({ ('L' x $_ ).Array });

for 1..$tri-size -> $pos {
    my @next;
    for $pos..$tri-size -> $level {
        @next[$level-1].defined && 
            @next[$level].push: |@next[$level-1].map({ $_ ~ 'L' });
        @next[$level].push: |@mat[$level-1].map({ $_ ~ 'R' });
        @next[$level].push: |@mat[$level].map({ $_ ~ 'H' });
    }
    @mat = @next;
}

say "results: ", @mat[*-1].elems;;
.say for |@mat[*-1];

The Perl Weekly Challenge, that idyllic glade wherein we stumble upon the holes for these sweet descents, is now known as

The Weekly Challenge – Perl and Raku

It is the creation of the lovely Mohammad Sajid Anwar and a veritable swarm of contributors from all over the world, who gather, as might be expected, weekly online to solve puzzles. Everyone is encouraged to visit, learn and contribute at

https://perlweeklychallenge.org

Tag: Directed Acyclic Graph

Who Went Missing on the Triangular Tour?

THE WEEKLY CHALLENGE – PERL & RAKU #117

Task 1

Missing Row

Method

The Plan

PERL 5 SOLUTION

raku solution

task 2

Find Possible Paths

Example 1:

Example 2:

Method

PERL 5 SOLUTION

Raku Solution