September 2020 – Programming Excursions in Perl and Raku

Wherein we look down from above to measure the depth of dark trapped water and take the time to number those positives that remain when we view the world in black and white.

episode one:
“Be Positive, You Count”

TASK #1 › Count Set Bits

Submitted by: Mohammad S Anwar

You are given a positive number $N.

Write a script to count the total numbrer of set bits of the binary representations of all numbers from 1 to $N and return $total_count_set_bit % 1000000007.

Example 1:

Input: $N = 4

Explanation: First find out the set bit counts of all numbers i.e. 1, 2, 3 and 4.

    Decimal: 1
    Binary: 001
    Set Bit Counts: 1

    Decimal: 2
    Binary: 010
    Set Bit Counts: 1

    Decimal: 3
    Binary: 011
    Set Bit Counts: 2

    Decimal: 4
    Binary: 100
    Set Bit Counts: 1

    Total set bit count: 1 + 1 + 2 + 1 = 5

Output: Your script should print `5` as `5 % 1000000007 = 5`.

Example 2:

Input: $N = 3

Explanation: First find out the set bit counts of all numbers i.e. 1, 2 and 3.

    Decimal: 1
    Binary: 01
    Set Bit Count: 1

    Decimal: 2
    Binary: 10
    Set Bit Count: 1

    Decimal: 3
    Binary: 11
    Set Bit Count: 2

    Total set bit count: 1 + 1 + 2 = 4

Output: Your script should print `4` as `4 % 1000000007 = 4`.

Method

The way I see it, there are three parts to this task. Over a loop of values, we need to first create a binary representation of a number, then count and sum the 1s present in those values. For the third and puzzlingly, seemingly unrelated part, we then modulo the total by one billion and seven.

In weeks past, we’ve found easy ways to convert decimal to binary. Then, for counting the 1s, well this is the same as summing every digit; these, being binary, are only ever going to be either 1 or 0. Breaking the digit string into a list of elements and then adding these together accomplishes this well; the single-number count is then added to a running total for all of the binary numbers of 1 up to the target value.

As for the modulo phase, a little explanation is in order. The value 1000000007 is ultimately arbitrary, and is selected because it:

is large, but not too large, and
is prime

Beyond these criteria, there is no meaning behind that specific choice of number, and 1,000,000,033 would do just as well, or 2,000,000,033. Speaking to the first point, that the number is large, what this selection does is produce a verifiable, reproducible result that fits into common integer data types without any risk of overflow. Apparently this value has become the go-to choice for coding competitions where the result need to be controlled for the specific architecture used to calculate it.

By use of some mathematical equalities, the taking of the modulo value can even be done at every step of a calculation involving very, very large numbers, constructing the “correct” result without ever internally exceeding the range of a 32-bit integer. Although sometimes easily accommodated, this is not always the simplest thing to do to every algorithm, and here there is no requirement to process either an unusually big range or for that matter work properly on 32-bit machines. So even should we wish to include values past 2³² in our intermediary steps, the 64-bit normal architecture these days gives us considerably more range.

So, sure. But what use is all this taking of the modulo beyond creating a unique answer? It kind of insults my data processing sensibilities as the result created exists in a tautological space: the answer is created this way because it is the answer. There is surely use for techniques to work with theoretically huge numbers while actually manipulating smaller, modulo representations of them; this is a very interesting and complex idea that shows up in cryptography computations, for instance. And crypography, it should be noted, is non-trivial. But that doesn’t seem relevant to be the problem space here. To me it seems to pollute the integrity of the data to no great purpose; the only saving grace is the point at which we become a one-way function is very large and hence not likely to come into play.

In any case the bottleneck here never was the accumulated value but rather the very large string representation of a binary number. A check over at the OEIS reveals no particularly easy way to generate the sequence mathematically; perhaps the best plan would be to assault the number using powers of two to chop it recursively into more manageable pieces and get the counts of 1s for those. It gets really messy really fast, and although I believe it may provide a way to compute the 1_000_000_000_000th member of this series without the need to create the terabyte-sized string to write it as 1s and 0s first. Then again it seems to require a terabyte sized sequence of smaller sums to do it so maybe nothing is gained. I’m pretty confident I’m over-thinking this by this point.

So we’re not going to bother to do that. I do wonder if anyone will.

Perl Solution

My first thought was to count the 1s involved chopping the binary string representation into an array of elements, then sum the elements. It’s a very easy to follow idea, but on reflection involves creating a potentially very large intermediary array at every iteration should the binary string be itself large, and the length of the binary string is already going to be about three times the size of a given decimal value, give or take. This ends up being a lot of copying data to no great end, so I came up with a different way.

Using an iterator over the length of the string, we can instead leave it intact, non-destructively examining each character in place using substr(). In this form, we don’t even need to copy the binary representation at all, and the whole thing speeds up around 100%. Nice.

[colincrain:~/PWC]$  perl 79_1_count_set_match.pl 10000000
out:  114434632

use warnings;
use strict;
use feature ":5.26";

## ## ## ## ## MAIN:

my $input = shift // 100;
my $tot;

for my $i (1..$input) {
    my $bin = sprintf "%b", $i;
    my $j = length $bin;
    while (--$j >= 0 ) {
        substr( $bin, $j, 1 ) and $tot++;
    }
}
my $out = $tot % 1000000007;

say "out:  $out";

Raku Solution

In Raku things get quite condensed, to say the least, when we break apart the string into an array and sum the contents. This isn’t perhaps the most efficient way to go about things, but it’s certainly concise.

unit sub MAIN (Int $input = 100000) ;

my $tot += .base(2).comb.sum for ^$input;

say "input: $input";
say "total: ", $tot %= 1000000007;

episode two:
“Still Waters Run Deep”

TASK #2 › Trapped Rain Water

Submitted by: Mohammad S Anwar

You are given an array of positive numbers @N.
Write a script to represent it as Histogram Chart and find out how much water it can trap.

Example 1:

Input: @N = (2, 1, 4, 1, 2, 5)
The histogram representation of the given array is as below.

     5           #
     4     #     #
     3     #     #
     2 #   #   # #
     1 # # # # # #
     _ _ _ _ _ _ _
       2 1 4 1 2 5

Looking at the above histogram, we can see, it can trap 1 unit of rain water between 1st and 3rd column. Similarly it can trap 5 units of rain water between 3rd and last column. Therefore your script should print 6.

Example 2:

Input: @N = (3, 1, 3, 1, 1, 5)
The histogram representation of the given array is as below.

     5           #
     4           #
     3 #   #     #
     2 #   #     #
     1 # # # # # #
     _ _ _ _ _ _ _
       3 1 3 1 1 5

Looking at the above histogram, we can see, it can trap 2 units of rain water between 1st and 3rd column. Also it can trap 4 units of rain water between 3rd and last column.
Therefore your script should print 6.

Method

Water falls from the sky and gathers, so that’s where we’ll start: at the top, descending. As we traverse the range from maximum to minimum, each level of the histogram is systematically examined, and an array is created from the indices of elements that extend to that height or beyond. A single result represents the solitary highest point that in itself cannot contain a volume, but any two adjectant elements in this array represent a gap that, given the opportunity, will fill with water.

Water is assumed to collect in the interstitial areas up to the boundaries of the histogram columns, thus at every level the non-inclusive spaces between a pair of indices represents an expanse of water at a depth of one unit. Every pair of indices together therefore expresses a volume of water, and by examining each set in turn, a running tally gathers the total amount of units counted so far. Adjacent indices, representing a level plain, will simply have a gap of zero and will not fill. Repeating this process as we descend down the histogram until we arrive at the minimum value ultimately yields the total volume of water trapped by the whole structure.

It’s funny about these challenges: first we calculated the internal rectilinear volumes within a histogram, assessing the spaces contained within the terrain described. Now we’ve calculated the volume of the mountainside lakes formed by the peaks. Next perhaps we can calculate the surface area of the water to determine how many mosquitos can breed, which will in turn determine the number of bats the ecosystem can support, which perhaps can live in caves in the interior spaces. Just in time for Halloween.

Perl Solution

Of note in the Perl 5 solution is the simultaneous calculation and assignment of both the minimum and the maximum of the array by taking a slice of the sorted list, which I thought was a nicely efficient way of going about things.

To calculate our pooled water, we digest the @peaks array by shifting off one member and comparing it to the head of the remaining array; the difference of these indices, minus 1, is the size of the gap between the two. When the array size drops to one there can be no more gaps and we move to the next level.

I decided it would be nice to provide a visual aide and draw the histogram, and so wrote a routine to do this. When I was finished I noticed how similar the top-to-bottom looping was to what I was already doing, so instead I decided to combine the two, and condensed drawing the data to a single line inline to the other processing. Normally I’d say output should be always be compartmentalized to its own logical space, but this was too clean to pass up. As drawing the histogram is itself rather whimsical, breaking my own self-imposed norms could be considered more of the same. I am nothing if not consistent in my whimsy.

[colincrain:~/PWC]$  perl 79_2_water_inside_a_duck.pl 
histogram:

6   #        
5   #       #
4   #       #
3 # # #     #
2 # # #     #
1 # # # # # #
- - - - - - -
  3 6 3 1 1 5

volume: 10

use warnings;
use strict;
use feature ":5.26";

## ## ## ## ## MAIN:

my @input = @ARGV;
@input = (3,6,3,1,1,5) if scalar @ARGV == 0;

my ($min,$max) = (sort @input)[0,$#input];

say "histogram:\n";

my $vol;
for my $level (reverse($min..$max)) {
    my @peaks = grep { $input[$_] >= $level } (0..$#input);

    ## draw the histogram while we're here
    say "$level ", join ' ', map { $input[$_] >= $level ? '#' : ' ' } (0..$#input);

    while (scalar @peaks > 1) {
        my $start_idx = shift @peaks;
        $vol += $peaks[0] - $start_idx - 1;
    }
}

## out
say join ' ', ("-") x (scalar @input + 1);
say '  ', join ' ', @input;

say "\nvolume: $vol";

Raku Solution

In Raku we get min and max out of the box, so we can skip that step right off the bat. Also we get to use rotor to process our @peaks array, which avoids the somewhat clunky and not exactly intuitive “shifting and comparing to the first element” process for gathering successive index pairs in that array. So that’s nice.

The function rotor(2=>-1) says to divide the designated array into slices of 2 elements, and after each slice, backtrack 1 index from the end of the last selection. In this way we incrementally grab out pairs to the topic, the first composed of the first and second element, the second the second and third, until no more complete groups can be made. I cannot express how handy this behavior is in automatically selecting views of a list.

After that we again print relevant data for the histogram while we’re looping through the levels, only in this case we can be a little more sophisticated with our value field, using fmt much like sprintf to print the data in a 2-character, left-justified space. And then, of course, once I had done that I had to accommodate 2-digit data on the bottom line as well, which in turn involved widening histogram columns a bit to match. Whew! I suppose I could have found the widest space required to draw that largest value and automatically scaled everything to that, but we’re getting a little involved for a simple demonstration here, no? Perhaps I’ll get to it, or then again maybe I’ll leave it as an exercise for the reader. It’s not a particularly hard task, but after a while all this digression does start to distract from the main course. I think 2 digits is enough to get the idea, so there you go.

[colincrain:~/PWC]$  raku 79-2-water-inside-a-duck.raku 9 12 10 7 8 6 9 8
histogram:

12 |    ##                  
11 |    ##                  
10 |    ## ##               
9  | ## ## ##          ##   
8  | ## ## ##    ##    ## ##  
7  | ## ## ## ## ##    ## ##  
6  | ## ## ## ## ## ## ## ##
---+------------------------
   | 9  12 10 7  8  6  9  6  

volume: 6

unit sub MAIN (*@input) ;

@input.elems == 0 and @input = 9,12,10,6,8,6,9,6; 
my $vol;
say "histogram:\n";

for (@input.min..@input.max).reverse -> $level {
    my @peaks = @input.keys.grep({ @input[$_] >= $level });
    $vol += ($_[1] - $_[0] - 1) for @peaks.rotor(2=>-1);
    
    ## draw the histogram while we're looping through the levels
    say $level.fmt("%-2s | "), (^@input).map({ @input[$_] >= $level ?? '##' !! '  ' }).join: ' '; 
}

## out
say '---+' ~ '---' x @input.elems;
# say ('   |', |@input, "\n").join: ' ';
say '   | ' ~ @input.map(*.fmt("%-3s")).join ~ "\n";

say "volume: ", $vol;

Month: September 2020

Count, Set, Match: Standing Water in Mountain Pools

TASK #1 › Count Set Bits

Example 1:

Example 2:

Method

Perl Solution

Raku Solution

TASK #2 › Trapped Rain Water

Example 1:

Example 2:

Method

Perl Solution

Raku Solution