Cool Key Lime Pie

Wherein we look down the line, the bore, and discover whether we have everything we need or need to look elsewhere…

THE WEEKLY CHALLENGE – PERL & RAKU #207 Task 1


Color is the keyboard, the eyes are the harmonies, the soul is the piano with many strings. The artist is the hand that plays, touching one key or another, to cause vibrations in the soul.

— Wassily Kandinsky


Keyboard Word

Submitted by: Mohammad S Anwar

You are given an array of words.

Write a script to print all the words in the given array that can be types using alphabet on only one row of the keyboard.

Let us assume the keys are arranged as below:

Row 1: qwertyuiop
Row 2: asdfghjkl
Row 3: zxcvbnm
Example 1
Input: @words = ("Hello","Alaska","Dad","Peace")
Output: ("Alaska","Dad")
Example 2
Input: @array = ("OMG","Bye")
Output: ()

ANALYSIS

Not counting any added-on computer-specific extras, at the top or around the sides, a normal English-language keyboard contains five rows of keys: one comprising the number digits, three containing letters and at the bottom a space bar and some special qualifiers. We are asked whether we can construct a given word using only one of these available rows. The letters we will need to do this are kept to the central portion of those middle rows, with the outer keys in each devoted to various brackets and punctuation, along with functions, such as to select character case or ratchet the text down to the beginning of the next line, or delete a character already written.

Thus we have three rows on our keyboard that come into play here, as all of the letter options of the alphabet are contained within. We can also note the apostrophe is located on one of these rows as well, but we’ll get to that. Suffice to say for now it works — we’ll be needing that. Or not. But again, getting ahead of ourselves here.

The rows as laid out are physical lists of keys, distributed left-to-right. Given a word, we can look at each letter in turn and see which row contains it, and if we go through the whole word without changing rows then we have a winner.

Doen’t seem so hard. We could go about this a few ways.

The physical model of the problem suggests scanning and searching for each letter in turn along the lines, and while this certainly would work as outlined above it also sounds terribly inefficient. There’s a whole lot of rescanning over the same keys over and over involved here. There must be some better way to look at, and for, our letters.

I think the best abstraction to apply here is set theory. Readers of thses pages will know that me and set theory go way back, and I consider us to be very close.

Viewed this way, each row, then, contains a proper subset of the alphabet and now the problem at hand can be rephrased: to determine whether the letters that make up a word, considered as a set of letters, in wholly contained within one of these three other sets.

Same thing said just a little bit differently, really.

The slight problem with this approach, though is that Perl doesn’t really do sets as such. There’s no “set” operator, or data type for that matter, out-of-the-box. But we can, however, make perfectly good sets out of what Perl does provide us, in what turns out to be a fast and very efficient manner. It only works for strings and things that can be stringified but it’s perfect for the character data we’re using here. Perl, as we know, is very good for manipulating text.

Perl has only three basic data types for holding scraps of information. These are individual items contained as scalars, then ordered lists of these scalars, and finally associated mappings of scalar strings to other scalars. It’s this last type, so-called associative arrays, or hashes, that we can use to make our set models.

A hash is a collection of data, with keys comprised exclusively of strings, that map to other scalars which can be characters, references or pretty much anything. Given a key, the item that the key points to is returned. The thing is, though, that the keys are kept as a collection themselves, with no inherent ordering and Perl’s own private method for locating them within this collection. Lookup is fast and efficient and for all practical purposes completely independent of scale. The association dereferencing is swift and constant, no matter whether the hash has ten or ten billion elements. It’s kind of freaky, actually, that. And hashes, as a core datatype, are heavily optimized. Perl can find a key, if it exists, very quickly.

The keys, then, considered just by themselves, make a pretty good model of a set, don’t they? And they do. It doesn’t even matter what the other half of the association is, either, although we can choose to use it if we want, because we get the whole package when we buy in.

Special Cases

Before we begin let’s consider what it means to be a word. Not is a deep, metaphysical existential-crisis sort of way, but just superfically: what characters are we looking at, really? Letters, obviously, but actually there are some other legitimate options one might not think of out-of-hand. Punctuation can be viewed in modern phrasing as metadata applied to lists of words, and is thus by definition not part of those words it is referring to. Sounds about right. But some special characters are actually parts of words and not punctuation. In English these would be hyphens and apostrophes, to make compound and contracted words respecively. So on the other hand if we’re looking at any proper words we’ll need to accommodate these cases.

Numbers, though, are never a part of any text word I can think of, when the written language is used properly. Used improperly, such as in “leetspeek”, the sky’s the limit, but we really can’t go into the realm of evey possible encoding of a word in every possible filthy degenerate way. It’s too much. We have standards. We are decent people.

I mean, practically it would be trivial to expand our method to do this, but I do think it loses some the focus around the problem, with what amounts to raising the noise-floor to little practical benefit. So leetspeek is out, with all the numbers. Let’s keep it tight. Dictionary words it is.

Lastly, what of capitalization? The temptation would be to lowercase everything as a data-purification ritual, as parsing apart a sentence could yield a word beginning with a capital letter. The examples even show words that are not, in themselves, normally capitalized. It’s interesting speculative hypothesis to consider this possibility, but also there is no mention made of any sentences, and I think the best assumption should be that we are given a word, relaxed, as it would be found in its natural habitat. We want the words to be comfortable. Comfortable words make good thoughts and sound logic. Do not piss off the words — it’s never pretty.

And more so, compassion is a virtue.

Ahh, but some words, specifically proper nouns, are valid words that contain capital letters, and I see no reason to leave these out. Diminishing those proud letters seems to me morally dubious if not blatantly, unconscionably wrong.

It’s about respect for one’s elders, kids.

So we need to take capitals into account when they are required. I mean, maybe we don’t have to, but my moral compass demands that I do the right thing here. YMMV.

METHOD

We can go about constucting our set model two ways. In one, we construct a single hash of all three rows of keyboard characters and use these keys to point to their respective rows. This is a very Perlish thing to do. We keep track of the row of the first character found and if any other rows don’t match we throw the deadbeats out. We talked about this. We run a respectable establishment here, after all.

Alternately, though, we can compose three separate lookups, one for each keyboard row. In this case we can just use the keys in each collection as sets, not even bothering with the associated values, and use the exists keyword to see whether or not a particular key is found.

It’s little difference either way we go about this, but the second makes it slightly easier to accommodate the special cases we talked about above, so we’ll use that. It’s also more pure to the underlying set theory, if you care about such things. It closely models the methodology, which might matter more were we to upscale the technique for some other purpose.

I mean, maybe?

Sure, why not?

The apostrophe is, using this method, just another character. The fact we can’t actually make any contracted words using the middle row is an unfortunate fact of life and is not my fault. Then again, we can just make something up to test the mechanics, as I have no intention of evaluating whether the input is a meaningful word. That’d be a very difficult problem indeed, and I’d argue not even linguistically meaningful. I’m a dyed-in-the-wool descriptionist and you’ll claw my coinage from my cold, dead hands. I will die on that hill.

I’m going to contract “skedaddle” into “k’dad’ll”. As in: “K’dad’ll do ya.” See what I did there?

Why?

Because I can, obviously. Do try to keep up.

Hyphens are solid, proper characters too, but the hyphen character is up on the topmost row with the digits, without any letter characters to join it, or be joined by it. Thus the hyphen is not included in any of the letter-row sets and any hyphenated word will fail the test. Such is life. This is however robustly modeled and as such is also fine.

On the other hand what would you call that thing at start of 3-2-1-go? I’m going to go with “construct”, rather tan “word”. Lots of symbols can find their way into text without being proper words. Like, as we said before, leetspeak. Are the numbers just shorthand abbreviations for the words they represent? This is actually a very interesting question, and have spoken about it before. But I digress…

For capitalizations we could get clever and create some sort of capitalization flag.

Somehow, I don’t know, seems legit. Let’s roll with it.

Then, if you look, we have a modifier key on two rows that we can use to change case: the “shift” key on the lower row and the “caps lock” key on middle. As we are being asked what is possible, not what is practical, toggling the caps lock on, typing a letter and toggling it off again is a perfectly legitimate option. So we could set a flag on detecting a capital, then check to see if the lowercase letter is in one of these two rows before continuing, as we can only allow a capital if we can type the upper-case character using only any single row.

Or, alternately, to be double-plus clever we can encode the capital letters when we first build our hash sets, and afterwards treat every letter same as any other, and not worry about any setting or unsetting of flags. I like this idea better.

In the end, I will note, that the keyboard itself needs to be hard-coded. The distribution of the keys is unique, odd, and a little bit arbitrary on top of it all, so it is what it is and we will need to start with a row-wise keyboard map. But these will be defined as the physical ordered lists of keys that we described at the beginning, translated into arrays, which we will then convert to our various hashes. We’ll leave out all extraneous punctuation keys and simply remember where the case-changing keys are when we process the middle and lower rows. If we’re hard-coding the keys we can hard-code the knowledge of the case keys too.

PERL 5 SOLUTION

The comments walk the reader through the steps pretty well I think. Of note the examples given presented the output data as lists such that one would find initializing an array in Perl code — rather than a more human-readable style — and this is what I decided to replicate.

Nothing the list wise say operator, a join and a map can’t handle.

use warnings;
use strict;
use utf8;
use feature ":5.26";
use feature qw(signatures);
no warnings 'experimental::signatures';

## the middle three keyboard rows
my @top = qw( q w e r t y u i o p       );
my @mid = qw(  a s d f g h j k l   '    );
my @bot = qw(   z x c v b n m           );

## the lookups, accommodating capitals in middle and bottom rows
my $top = { map { $_ => undef } @top };
my $mid = { map { $_ => undef, uc($_) => undef } @mid };
my $bot = { map { $_ => undef, uc($_) => undef } @bot };

## input
my @words = @ARGV;
@words == 0 and @words = qw( hello Alaska dad peace k'dad'll);
my @out; 


WORD: for my $word (@words) {
    my $row;
    my $char = substr $word, 0, 1;
    
    ## select the row containing the first character
    for ( $top, $mid, $bot ) {
        $row = $_ and last if exists $_->{$char}
    }

    ## short-circuit out if we can't find any remaining letter in the row
    for ( 1..length($word)-1 ) {
        next WORD if not exists $row->{substr $word, $_, 1}
    }
    
    ## ergo, the word can be created from the row, so add it to the output
    push @out, $word;  
}

## the output from the examples is in my opinion unnecessarily complicated
## but reproducing the formatting provided a fun challenge in itself

say '( ', (join ', ', map {qq("$_")} @out) ,' )';




The Perl Weekly Challenge, that idyllic glade wherein we stumble upon the holes for these sweet descents, is now known as

The Weekly Challenge – Perl and Raku

It is the creation of the lovely Mohammad Sajid Anwar and a veritable swarm of contributors from all over the world, who gather, as might be expected, weekly online to solve puzzles. Everyone is encouraged to visit, learn and contribute at

https://theweeklychallenge.org