# Drawing a Line from Root to Leaf to Table

Wherein we find the through lines that define us and the values accrued along the way.

#### THE WEEKLY CHALLENGE – PERL & RAKU #93

episode one:
“A Bore is a Straight Line That Find a Wealth in Division”

### Max Points

##### Submitted by: Mohammad S Anwar

You are given set of co-ordinates @N.

Write a script to count maximum points on a straight line when given co-ordinates plotted on 2-d plane.

##### Example 1:
``````|
|     x
|   x
| x
+ _ _ _ _

Input: (1,1), (2,2), (3,3)
Output: 3
``````
##### Example 2:
``````|
|
| x       x
|   x
| x   x
+ _ _ _ _ _

Input: (1,1), (2,2), (3,1), (1,3), (5,3)
Output: 3``````

#### Method

Two points determine a line. Thus between every two points given a line segment can be scribed, and we have created a line segment. This segment in turn exists on a mathematical construct extended infinitely in both directions.

Our task is to see how many of these infinite lines, one for each pair of points, overlap, and to see if they are in fact the same line — at its heart this is a problem of combinatorics, of combinations of two points. A single point can be determined to either fall or not on a given line, but we first need a line to check against, and the only way to find those is to look at pairs of points. In fact, if we’re going to look at every pair of points anyway, we already have all the information we need to derive a solution.

• Every pair of points determines a line segment AB with endpoints (x1, y1) and (x2, y2).
• Use those points to find a line equation in the form y = mx + b
• points in the segments sharing the same line are in the same line, right? (spoiler: yes)

Each segment will fall on exactly one infinite line of the form y = mx + b, with m being the slope of the line and b the intersection with the y-axis.

To calculate the slope and constant of the through line from the pairs of points we calculate the delta for the ys over the delta for the xs. The y-offset will be the same for both points, so we can use either to calculate b:

m = (y2y1) / (x2x1)

y = mx + bb = y1mx1

Once we have a pair of coefficients (m,b), these can be turned into a unique hash key. Segments on the same line will find their way into the same hash bucket, and we can count them.

It’s often repeated that one should avoid using floats for hash keys, as the type cannot perfectly represent certain numbers. Perl will quite happily stringily our numbers for us, but as we’re dealing with pairs of numbers anyway, we’ll provide our own routine for generating keys using `sprintf`.

So far so good, but there’s a few problems with this plan. The first is asymptotic: in the slope-intercept equation of a line, when the line approaches vertical, the slope becomes infinite. Allowing for this and providing “Inf” for a value doesn’t solve the problem either, because at that point the x-axis offset becomes blurred as well. Normally we’d just define the line as “x = 3” or such and be done with it, but there’s no allowance made for this at the moment. I’m also rather concerned about lines that are getting very close to vertical and losing precision. How many 0s should we include, anyway?

The thing is, we don’t really care what the equation actually is. All we care about is a set of coefficients that uniquely identify a given line. One way to get out is provide more coefficients. If we ran the equations as “run over rise” at the same time we calculate the normal slope we can hand over that version as well. It’s not a very useful equation in the conventional sense, but here it serves our purpose just fine. When the normal slope goes infinite the inverted slope goes to 0, with the x-intecept recorded. Given the 4 values, we can hand them over to an exponential format in `sprintf`, use a lot of zeros, and create some arbitrarily long and complicated hash key for the lookup. Again, all we need is a unique mapping.

The other problem is a little bit stickier in that we haven’t actually answered the right question. What we’ve done is found out how many segments share the same line, not how many points, and considering the points we’re looking for are by definition colinear, multiple segments will end up sharing a common point. In fact, each point will have a line segment running to every other collinear point. For example, running the combinations on the set

{ (1,1), (2,2), (3,3), (4,4) }

gives a total of 6 line segments between the various points, and it should be obvious they all lie on the same line. What we’ve done here is counted the binomial coefficient n-choose-2, rather than the number of of individual points on the line. What to do? Binomial coefficients are all over the place, but running the equation for binomial coefficients backwards isn’t something commonly done. I thought about this for a while, and here’s where we can be a little clever:

The formula for the binomial coefficient for n chose 2 is

C(n,2) = n (n – 1) / 2

which, being quadratic, is not exactly difficult but kind of bother to solve for n. But with the way we’re counting line segments here, n will, for one, always be positive, and (n – 1) will always be less than n. It follows that n2 > n(n – 1), and (n – 1)2 < n(n – 1). Which means that if we take the square root of twice our value and round up, that’s n.

Let C(n,2) = B
n (n – 1) / 2 = B
n (n – 1) = 2B
n – 1 < √2B < n
→ n = ⎡ √2B ⎤

That’s the ceiling function at the end. Nice.

##### PERL 5 SOLUTION

Honestly at the end of the methodology outlined above, we’ve provided a solution to the challenge. We’ve found the quantity of collinear points on the space. Which is just a dimensionless number, like 3.

Which, frankly, isn’t very satisfying.

What, for example, are those points? What can we say about the line? These are the obvious questions, and shall we say they… bothered me. Nagged. Hung over like a dark, oppressive cloud. So I stayed up and altered my nice clean code, grafting in some containers to keep track of everything throughout. Which is kind of surprisingly messy. The data structures go unusually deep, with hashes of arrays of arrays of arrays, the last being the pairs of points in the space. Dereferencing became non-trivial, but certainly doable.

I took the tack of welding on the new capability on top of the old, rather than a proper refactor. The first hash, for example, which counts the instances, could be eliminated, as the value matches the size of the array of collected points in the second one I grafted on. The clever binomial trick could be eliminated too, as being superfluous — we have the actual points to count in the end. But where’s the fun in that? I also added a random point generator so we could realistically look at large data sets without having to add them in.

```random selection of 1000 points in a 1000 x 1000 point space

binomial is 15
quantity is 6
points are:
[ 567 849 ]
[ 721 541 ]
[ 789 405 ]
[ 808 367 ]
[ 824 335 ]
[ 956 71 ]

line is y = -2 x + 1983

```

As might well be expected, as the number of random points increases the longest collinear set of points is more likely to fall a line with a simpler slope coefficient.

``````use warnings;
use strict;
use feature ":5.26";
use Algorithm::Combinatorics qw( combinations );
use List::Util qw( max );
use POSIX qw( ceil );

## ## ## ## ## MAIN:

my \$RANDOM = 200;     ## set to sub in random number of data points

## input array is aggregated point pairs: 1, 1, 2, 2, 3, 4 -> (1,1,) (2,2) (3,4)
my @input = @ARGV;
my @points;
my %lines;
my %l2;
my %line_lookup;
my (\$x, \$y);
while (@input > 0) {
(\$x, \$y, @input) = @input;
push @points, [\$x, \$y];
}

if (\$RANDOM) { @points = random_points(\$RANDOM) } ;

## make combinations of points and hash line coefficients between them
my \$iter = combinations( \@points, 2 );
while ( my \$c = \$iter->next ) {
my \$line = get_line(\$c);
my \$key = make_hashkey(\$line);
\$lines{\$key}++;                     ## the simple counting hash
\$line_lookup{\$key} = \$line;         ## added: xref to remember line coordinates for output
}

## calculate the simple solution
my \$max = max( values %lines );
say "binomial is \$max";
say "quantity is ", reverse_binomial( 2 * \$max );

## added: create a more detailed report
my %inline;
my (\$k, \$v);
while  ( (\$k,\$v) = each %l2 ) {
if (scalar(@\$v) == \$max) {
for my \$c (@\$v) {
for my \$p (@\$c) {
say "point @\$p";
my (\$s, \$t) = @\$p;
\$inline{ "\${s}_\${t}" } = \$p;
}
}
last;
}
}

say '';
say "points are: ";
say "[ \$inline{\$_}->@* ]" for sort {    \$inline{\$a}->  \$inline{\$b}->
|| \$inline{\$a}->  \$inline{\$b}->  } keys %inline;

say '';
if (\$line_lookup{\$k}-> == "Inf") {
say "line is x = \$line_lookup{\$k}->";
}
else {
say "line is y = ", \$line_lookup{\$k}->, " x + ", \$line_lookup{\$k}->;
}

## ## ## ## ## SUBS:

sub get_line {
my \$combo = shift;
my (\$p1, \$p2) = \$combo->@*;
my (\$m, \$b, \$n, \$c);
if ( (\$p2-> - \$p1->) == 0 ) {
(\$m, \$b) = ("Inf",0) ;
}
else {
\$m = (\$p2->-\$p1->) / (\$p2->-\$p1->);
\$b = \$p1-> - (\$m * \$p1->);
}
if ( (\$p2-> - \$p1->) == 0 ) {
(\$n, \$c) = ("Inf",0)
}
else {
\$n =  (\$p2->-\$p1->) / (\$p2->-\$p1->);
\$c = \$p1-> - (\$n * \$p1->);
}
return [\$m, \$b, \$n, \$c];
}

sub make_hashkey {
my \$key = sprintf "%.6e_%.6e_%.6e_%.6e", @\$quad;
return \$key;
}

sub reverse_binomial {
use POSIX qw(ceil);
my \$b = shift;
return ceil(sqrt \$b);
}

sub random_points {
## any set of points can be transformed into the x, y > 0 quadrant so we'll
## just start there
my \$range = \$_;
my \$count = \$_;
my @out;

for (1..\$count) {
my \$x = int(rand(100)+1);
my \$y = int(rand(100)+1);
push @out, [\$x, \$y];
}

return @out;
}
``````

episode two:
“I Must Walk a Crooked Road”

### Sum Path

##### Submitted by: Mohammad S Anwar

You are given binary tree containing numbers `0-9` only.

Write a script to sum all possible paths from root to leaf.

##### Example 1:
``````Input:
1
/
2
/ \
3   4

Output: 13
as sum two paths (1->2->3) and (1->2->4)
``````
##### Example 2:
``````Input:
1
/ \
2   3
/   / \
4   5   6

Output: 26
as sum three paths (1->2->4), (1->3->5) and (1->3->6)``````

#### Method

One of the most bothersome aspects of binary tree algorithms isn’t the manipultion of the tree itself, but rather inputting the data into a program in the first place. One way to get around the difficulties and accomplish this is to serialize the data: the tree is considered as a single continuous array of level node values, with each level containing 2 times the elements of the level previous, with a starting index immediately after the end of the previous level. Empty nodes are still allocated space in the indexing to preserve pattern continuity. In Perl null-set empty nodes would be allocated undef.

Because the levels are of known, calcuable size, any node in the tree can be directly addressed with a simple formula. The serialized transform remains a binary tree, with any action on the familiar form available to the latter, with a suitable transform applied to the function. It sidesteps a lot of mess.

The numbering of the indices follows a strict formal progression. Each level starts at the index 2^n – 1, with n being the level of the tree, starting at 0.

``````                               0
1                     2
3         4           5         6
7     8   9    10     11    12  13   14``````

Not displayed here, the next level of the tree would be level 4, comprised of 2^4 positions, with a starting index at position 2^4 – 1. A quick visual inspection confirms this.

Knowing the encoding, we can directly address individual nodes by their position in the array. For a given index n, the children, if any, for that index will be located at the positions 2n+1 and 2n+2. To find the parent, if the index is even, subtract 1, if odd, 2, then divide the result by 2.

To solve the task, we can easily set up a recursive routine to trace paths descending through every node of the tree, with a base case summing the path to the terminus and adding that total to the overall sum.

As a bonus, we’ll gather the completed paths at the termination case and report on those paths found at the end.

##### PERL 5 SOLUTION

``````[colincrain@MacBook-Pro:~/Code/PWC]\$  perl 93_2_root_to_leaf_to_table.pl 2 3 4 5 undef undef 6 7 8 9 1 2 3 4 5
sum 68

paths found:
2 → 3 → 5 → 7
2 → 3 → 5 → 8
2 → 4 → 6 → 4
2 → 4 → 6 → 5``````

I must say it was tempting to build a proper Node object, and maybe a BinaryTree to hold the metadata, but the serialized encoding makes everything so easy it was hard to justify. So all we need to do is recurse whenever a child node is defined, keeping a copy of that path with us, and if no children present themselves sum the working path. A package variable keeps the running grand total.

```use warnings;
use strict;
use feature ":5.26";

## ## ## ## ## MAIN:

# tree:
#                           3
#                4                     5
#           1         8           6         2
#        ∅     9   ∅     ∅     9     7   4     ∅
#
# paths: (3,4,1,9)  = 17
#        (3,4,8)    = 15
#        (3,5,6,9)  = 23
#        (3,5,6,7)  = 21
#        (3,5,2,4)  = 14
#                  ------
#                     90

our @tree = @ARGV;
@ARGV == 0 and @tree = (   3,
4, 5,
1, 8, 6, 2,
undef, 9, undef, undef, 9, 7, 4, undef   );
our \$sum = 0;
our @paths;

descend(0, []);

say "sum \$sum\n";
say "paths found:";
say join ' → ', @\$_ for @paths;

## ## ## ## ## SUBS:

sub descend {
my (\$idx, \$partial_path) = @_;
my @path = @\$partial_path;
push @path, \$tree[\$idx];

## base case
unless ( defined \$tree[2*\$idx+1] or defined \$tree[2*\$idx+2]) {
\$sum += \$_ for @path;
push @paths, \@path;
return;
}

for ( 1, 2 ) {
descend( 2*\$idx+\$_, \@path ) if defined \$tree[2*\$idx+\$_];
}
}

```

The Perl Weekly Challenge, that idyllic glade wherein we stumble upon the holes for these sweet descents, is now known as

The Weekly Challenge – Perl and Raku

It is the creation of the lovely Mohammad Sajid Anwar and a veritable swarm of contributors from all over the world, who gather, as might be expected, weekly online to solve puzzles. Everyone is encouraged to visit, learn and contribute at

https://perlweeklychallenge.org