Time Out-of-Joint — at Set Intervals – Programming Excursions in Perl and Raku

Work, eat, sleep, work, eat, sleep, work…

THE WEEKLY CHALLENGE – PERL & RAKU #127

episode one:
“Not One of Us”

Task 1

Disjoint Sets

Submitted by: Mohammad S Anwar

You are given two sets with unique integers.

Write a script to figure out if they are disjoint.

The two sets are disjoint if they don’t have any common members.

Example

Input: @S1 = (1, 2, 5, 3, 4)
       @S2 = (4, 6, 7, 8, 9)
Output: 0 as the given two sets have common member 4.

Input: @S1 = (1, 3, 5, 7, 9)
       @S2 = (0, 2, 4, 6, 8)
Output: 1 as the given two sets do not have common member.

Method

When we use the word “set”, what we mean — at least in the land of core vanilla Perl — could be a number of things. Here we are given unique arrays, which is ok, because arrays have elements like set members, and we can say the same things about these elements as we can members, such as an element with this value exists or not. On the other hand, elements have an order and a placement, which set members do not. Viewed this way, it’s really more accurate to consider the set as being the collection of values of the elements, not the elements themselves. The array is not the set, but rather a means for addressing the specific members of the set should we wish to select one or many for some operation.

Another imperfect way to store a set in Perl would be to use a hash, with the members being the keys. This data structure more closely mimics the model from set theory, but as stated is not itself a perfect fit either. For one, anything can be a member of a set, in theory, including another set, but a hash key is limited to anything that can be stringified. This is not a terrible limitation in practice, generally, but remains one nevertheless . Hash keys offer excellent random access using the exists and delete keywords, and provide this access completely independent of scaling, from tens to tens of millions of elements.

A disadvantage to modeling a set with a hash is that inputting hash values is problematic compared to using an array, which has a natural extension into a list of command line values, and another difference to take under consideration is that in this model we have the value for the key unallocated, so we have that association waiting around for us should we want to use it, free of charge. In a multiset, that is a set allowing repeated elements, we could record a quantity in the value, creating a structure known as a bag. This idea makes tallying frequency distributions extremely easy. The keys and values can also be used together with references to create nested data structures, such and trees.

This use of key-value association doesn’t seem like what we might traditionally consider set behavior, but this relationship itself can be modeled in set theory, bringing the whole affair around full-circle. So hashes really are sets after all!

Right? No seriously, right? Well I suppose that could be considered a little confusing. Personally I consider hashes a better fit, but arrays can be an easier choice for I/O. The thing is, both arrays and hashes can be considered “right”, as set theory doesn’t care how you write the sets; the relationships work the same whether you draw lists of bracketed items: “{apple, pear, handgun}” or circles with shapes in them. The notation is irrelevant to the underlying logic, but it can make that logic easier or harder to use.

PERL 5 SOLUTION

Keeping to the spirit of the set theory data model, we’ll input an array as in the examples, but immediately convert that data into a hash for processing.

The I/O portion of the script, getting the array sets into their containers to be operated upon is left as an exercise to the reader.

By converting the array to a hash with a simple mapping, keeping the value undefined, we can, as suggested above, check for existence in the set by using the exists keyword. The second set is kept as an array to make iterating across its values a simple, straightforward operation. If none of the elements in the second set are found in the first, then we can conclude that the sets are disjoint.

use warnings;
use strict;
use utf8;
use feature ":5.26";
use feature qw(signatures);
no warnings 'experimental::signatures';


my @S1 = (1, 2, 5, 3, 4);
my @S2 = (4, 6, 7, 8, 9);

say disjoint(\@S1, \@S2);



sub disjoint ($s1, $s2) {
    my %sethash = map { $_ => undef } $s1->@*;

    for my $member ( @S2 ) {
        return 0 if exists $sethash{"$member"};
    }   
    return 1;
}

episode two:
“Anger Management 101 – Identify the Problem “

task 2

Conflict Intervals

Submitted by: Mohammad S Anwar

You are given a list of intervals.

Write a script to find out if the current interval conflicts with any of the previous intervals.

Example

Input: @Intervals = [ (1,4), (3,5), (6,8), (12, 13), (3,20) ]
Output: [ (3,5), (3,20) ]

    - The 1st interval (1,4) do not have any previous intervals to compare with, 
          so skip it.
    - The 2nd interval (3,5) does conflict with previous interval (1,4).
    - The 3rd interval (6,8) do not conflicts with any of the previous intervals 
          (1,4) and (3,5), so skip it.
    - The 4th interval (12,13) again do not conflicts with any of the previous intervals
          (1,4), (3,5) and (6,8), so skip it.
    - The 5th interval (3,20) conflicts with the first interval (1,4).

Input: @Intervals = [ (3,4), (5,7), (6,9), (10, 12), (13,15) ]
Output: [ (6,9) ]

    - The 1st interval (3,4) do not have any previous intervals to compare with,
          so skip it.
    - The 2nd interval (5,7) do not conflicts with the previous interval 
          (3,4), so skip it.
    - The 3rd interval (6,9) does conflict with one of the previous intervals 
          (5,7).
    - The 4th interval (10,12) do not conflicts with any of the previous intervals 
          (3,4), (5,7) and (6,9), so skip it.
    - The 5th interval (13,15) do not conflicts with any of the previous intervals 
          (3,4), (5,7), (6,9) and (10,12), so skip it.

Method

Although the examples seem to imply that the intervals are perhaps sorted on their upper bound, a couple of points come to mind. Starting from the first:

Um, what?
Is that really a way to sort intervals? I mean, sure, why not? The first value makes a little more sense, but then again a basic part of the idea here is that some intervals will overlap others, so by definition at least some of either the first of second values will be out of order, depending on the method chosen. They really can’t be sorted. They’re expected to be interwoven.

So we’re going to punt on figuring that detail out and say the intervals will be assumed to be in no particular order. This is the only thing that makes any sense to me. Any apparent ordering in the examples will be considered coincidental.

This in turn means we will need to compare each element, as it’s processed, to every other previously included element. This will make the time complexity blow up, sure, but we’re not going to worry about that right now. Without any more specific connections to a real-world data set it’s hard to get around this. But, for example, if these were time frames, then there are only so many hours in the day, so it might make sense to merge overlapping intervals and only store the one merged structure rather than the original two, saving a bit of space and computational complexity. But that would be a data-specific example that may not be worth the overhead in what we have here. Without more context it becomes hard to say, and as such premature optimization to add the effort.

In the spirit of the Set Theory theme to tonight’s proceedings, we’ll again consider the intervals as a set, and as we add members to the set we’ll check them against all of the other members as we did in the task above, but in this case the function will be more complicated than simply checking existence. Here we need to compare the enclosed spans. This we can do with relational operators.

There are a total of four ways that two intervals can overlap and conflict. Note I am not going to consider two intervals sharing a point at a boundary to be a conflict. For example the time intervals four to eight and eight to ten do not conflict, even though they share the point-in-time of eight o’clock. The point is considered infinitesimal so there is no overlap, which is the true source of the conflict.

But back to our four ways. Given intervals A and B each with an upper and lower boundary:

A _lower is inside B (or B _upper is inside A)
B _lower is inside A (or A _upper is inside B)
A surounds B completely (or B is contained entirely within A)
B surounds A completely (or A is contained entirely within B)

As you can see each definition is accompanied by frame-reversed version, switching the names but describing the same overlap. These are not considered different conflicts, just different names for the same conflict. A comparison using either reference frame can be used to detect one of these conflicts.

It turns out we only need three checks to cover all possibilities. The “Set” interval is the existing set member being compared to.

Set _upper > New _lower > Set _lower
Set _upper > New _upper > Set _lower
Set_lower > New_lower & New _upper > Set _upper

The third possibility checked is the new interval enveloping a set member, and the fourth possibility, the new interval being inside a set member, will already be detected by case 1 (and again in case 2, if we don’t short-circuit out).

Note the frame reference is a bit arbitrary, but we do need to consistently pick one or the other to make the three cases work together with logical consistency to minimize the checks.

PERL 5 SOLUTION

use warnings;
use strict;
use utf8;
use feature ":5.26";
use feature qw(signatures);
no warnings 'experimental::signatures';

my @input;
# @input = ( [1,4], [3,5], [6,8], [12, 13], [3,20]  );  ## ex-1
# @input = ( [3,4], [5,7], [6,9], [10, 12], [13,15] );  ## ex-2
@input = ( [1,4],  [5,20], [30,50], [25,60]);  

find_running_conflicts( @input );

sub find_running_conflicts ( @intervals ) {
    my %set;
    my @conflicts;
    
    for my $ivl ( @intervals ) {
        push @conflicts, $ivl if has_conflict( $ivl, \%set );
        $set{ join ':', $ivl->@* } = $ivl;
    }
    
    if (scalar @conflicts) {
        say "conflicts found:";
        say join ', ', 
            map { '[' . (join ',', $_->@*) . ']' } 
            @conflicts;
    }
    else {
        say "no conflicts";
    }

}

sub has_conflict ($ivl, $set) {
    my $si;
    for my $k (keys $set->%*) {
        $si = $set->{$k};
        if (     $si->[1]  > $ivl->[0] > $si->[0] 
            or   $si->[1]  > $ivl->[0] > $si->[0] ) {
               return 1;
        }
        if (     $si->[0]  > $ivl->[0] 
            and  $ivl->[1] > $si->[1] ) {
               return 1;
        }
    }
    return 0;
}

The Perl Weekly Challenge, that idyllic glade wherein we stumble upon the holes for these sweet descents, is now known as

The Weekly Challenge – Perl and Raku

It is the creation of the lovely Mohammad Sajid Anwar and a veritable swarm of contributors from all over the world, who gather, as might be expected, weekly online to solve puzzles. Everyone is encouraged to visit, learn and contribute at

https://theweeklychallenge.org

THE WEEKLY CHALLENGE – PERL & RAKU #127

Task 1

Disjoint Sets

Example

Method

PERL 5 SOLUTION

task 2

Conflict Intervals

Example

Method

PERL 5 SOLUTION

Share this:

Related

Leave a comment Cancel reply