Wherein we cross the Rubicon into the medium of mixed metaphors and frolic with Gaulling multilingual monstrosities
THE WEEKLY CHALLENGE – PERL & RAKU #97
episode one:
“Stabbed in the Back Over Passing Notes in Class”
Task 1
Caesar Cipher
Submitted by: Mohammad S Anwar
You are given string $S
containing alphabets A..Z
only and a number $N
.
Write a script to encrypt the given string $S
using Caesar Cipher
with left shift of size $N
.
Example
Input: $S = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG", $N = 3
Output: "QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD"
Plain: ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher: XYZABCDEFGHIJKLMNOPQRSTUVW
Plaintext: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Ciphertext: QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD
Method
Wait. Left shift? Right shift? Red shift? It that my left of your left? With a brief confusion right out of the gate, I’ll chose to mimic the example and not think about it too much.
The Caesar cypher is a near trivial substitution cypher, shifting the alphabet as written a certain number of characters to the left or right and wrapping around to the beginning or end should the encyphered letter fall above “Z” or below “A”.
The reassignment can be done with two lists, shifting one relative to the other and restarting the transposition as required. After shifting, those letters beyond “Z” are moved to start the list at “A”, so that all letters in the upper list end up with a complement in the lower. Note that it’s all on the wheel — shifting 5 characters one way is the same as shifting 21 characters the other.
A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z shift 5 | | | | |A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z | | | | | | | | | | | | | | | | | | | | | | | | | ^-- cut here V|W|X|Y|Z|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T| <-- and replace
To encode, one looks up the letter on the top list, then scans downward to find its shifted encoding. To decode, the same operation is done in reverse.
Although this simplest of ciphers can hardly even be called cryptographic, there is reason to believe it may have been effective in Caesar’s time. Although with the relatively consistent declension in Latin grammar it would have been relatively easy to spot recurring word endings, and from there to work out the offset, there are no contemporaneous accounts surviving of anyone actually figuring this out. Letter frequency analysis, used to attack less regular cryptograms, would not be invented until the 9th century. It has also been suggested that most anyone in the position to actually intercept a courier’s message would be illiterate anyway, including the courier.
As such the cypher functions more as an obstacle than an impenetrable fortress, making any message require an effort to be read, more akin to sealing an envelope to turn away prying eyes than locking the message in a vault.
For our transliteration, whitespace will be preserved as that is the way it works in the example. This runs contrary to the most elementary rules of cyphering, as word boundaries reveal short words, which being limited in their possible range provide an attack vector into the encoding. Cyphertexts are normally broken into equally sized segments of characters to confound this method.
I find it interesting that weak as this cypher may be now, as Caesar would have used it, it would have been a bit stronger. Most punctuation, and even spacing between words, only begins to arise around the Early Medieval era. It turns out that practically, reading fluency in a language makes word gaps somewhat superfluous; the relative consistency of Latin declensions would make it easy to identify the start and stopping of individual words, and the grammar rules would dictate sentence structure. In a pinch one could default to one sentence per line. It was only when early medieval scribes, at the time primarily Irish monks, began in earnest to copy Latin originals that they would insert additional space to help keep the more unfamiliar texts straight. The original Latin-language writers would always have had the option of using an interpunctus, a dot between words, but this was not normally done in common communications. Without word boundaries to orient the reader decyphering would be more complex.
PERL 5 SOLUTION
In Perl, a straight rendition of the visual encoding method above would be to use the translation operator, tr/abc/xyz/
. Given a string of characters and a matching association, the operator swaps each letter in the first list for its positional counterpart in the second, both efficiently and extremely quickly. Here the lists are laid out linearly but the function is nearly identical. The problem with this is the operator works its magic building its routing table at compile time, and as such no variable interpolation can happen: somehow we’d have to already know what our encoding alphabet looks like even before we compute it, so as to be able to explicitly write out the characters. Here’s how we’d write out the example above:
$str =~ tr/ABCDEFGHIJKLMNOPQRSTUVWXYZ/VWXYZABCDEFGHIJKLMNOPQRSTU/;
There’s an obvious chicken-and-egg problem here, which is generally worked around using an eval
statement to interpolate and then spin off a new Perl context to compile it up. There are no small security concerns about turning eval
loose on random user input, so we’re not going to do that today.
The next best plan is to use a hash table, with each letter used as a key pointing to a translated replacement. This requires some calculation upfront to create the hash but once that is done it’s smooth sailing. The difficulties, as they are, come from deriving the pattern of the character offsets.
A good way to do this would be by constructing an array comprised of the letters in the alphabet repeated twice. For input the letters would be indexed 0-25, and for the encoded side we could add any offset up to 26. The wrap around is taken care of by restarting the letters with A at postion 26.
I like this approach, but ultimately decided to pursue a more mathematical solution. You see, letters as computer text already have a numerical mapping built in to them, being the ASCII codes that are actually recorded in memory for each character. With a base numerical association between the letters and the numbers 0 through 25, we could then use modulo arithmetic to seamlessly loop around when shifting the values.
The only complication to using ASCII codes is that the numbering doesn’t start at 0. Yes ASCII starts at 0, but for the upper-case letters A through Z, the numbering range is 65 to 90. But this is no mind, we only need to upper case the text, get its number, subtract 65, add or subtract the offset as determined, run the result modulo 26, add 65 again, and then view the code as a letter again.
Ok, well granted it is a lot of steps, but they’re small and can be chained into a single expression. In Perl we have the functions ord
and chr
to shift back and forth between character representations and ascii, and the rest is just some arithmetic.
To do the actual translation, I’d rather not break the text up into an array. Sure that would work, but it would be nice to not have to recopy the text into a data structure. To this end we can unleash the regular expression engine on the input string in a grand substitution scheme.
We’re ignoring anything that’s not a letter, so first off we create a character class for just the alphabet, [a-zA-Z]
. When any single lower- or uppercase letter is matched, it is captured and handed off to a small helper routine that performs the numerical massaging magic we’ve outlined above, returning the newly translated character to be substituted in for the original using the code block /e
switch. Add the global /g
switch and we’re streaming away, encoding our string in-place.
If we wanted to go there, the expression easily fits into one long line, but having a helper routine breaks up the actions and winds up much easier on the eyes.
~/PWC/97_1_et-tu-brute.pl
--------------------------------------------------------------------------------
THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD
The script:
use warnings;
use strict;
use feature ":5.26";
my $n = shift // 3;
my $str = shift // "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG";
say $str;
$str =~ s/([a-zA-Z])/encode($1,$n)/ge;
say $str;
sub encode {
my $out = chr(((ord(uc $_[0])-65-$_[1])%26)+65);
}
raku solution
In Raku we just inlined everything, because, you know, you only live once. chr
and ord
work the same, only with unicode codepoints, which mimics ASCII for first 128 characters.
unit sub MAIN (Int $shift = 3, Str $str = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG") ;
$_ = $str;
s:g/ (<[a..zA..Z]>) /{ chr(((ord(uc $0)-65-$shift)%26)+65) }/;
$str.say;
.say;
episode two:
“Flippern Bit Ausflippen”
task 2
Binary Substrings
Submitted by: Mohammad S Anwar
You are given a binary string $B
and an integer $S
.
Write a script to split the binary string $B
of size $S
and then find the minimum number of flips required to make it all the same.
Example 1:
Input: $B = “101100101”, $S = 3
Output: 1
Binary Substrings:
"101": 0 flip
"100": 1 flip to make it "101"
"101": 0 flip
Example 2:
Input $B = “10110111”, $S = 4
Output: 2
Binary Substrings:
"1011": 0 flip
"0111": 2 flips to make it "1011"
Method
First of all: what a strange activity to find oneself doing. That said, I rather enjoy these odd little puzzles, with a lot of small moving parts working towards some exotic goal. In this version we need to segment the initial string, making sure it divides out evenly, then compare the bits in each part against those in all of the remaining segments, calculating the changes required to alter the others to match the first, and then sum the number of changes. Finally, after comparing each section to its compatriots, the minimum sum of these changes, representing the most efficient way to go about things, will be the result requested. The given goal, the number of bits flipped to equalize the segments to one of the values without specifiying which, does not serve any obvious practical purpose in itself. It could be a widget in a longer chain of data processing, but who knows? This robs us the opportunity to provide background and commentary on the context, but on the other hand allows us to focus on the quiltwork: stiching together smaller disparate ideas to form a cohesive whole.
Whew! There’s a lot of moving parts in that mechanism.
So let’s break that down again:
- make sure the initial segmentation is even (or none of this makes any sense)
- then for each segment:
- convert to decimal
- xor with other segments to find bit differences
- sum those different bits with each of the remaining segments
- compare the result against, and possibly replace, a running minimum of flips required
- the minimum is the result requested
As I said, a lot of little parts. Each one is its own little puzzle, and further gluing them together gracefully becomes its own project.
PERL 5 SOLUTION
There are a couple of implementation notes to go along with this sequence of actions. One is about identifying the bit differences between two binary strings: we can cut straight to the meat of the matter by using a bitwise XOR on the values. Identical bits, either 0 or 1, will produce a zero, differences will produce a 1. And as we only want the number of differences and don’t care about the underlying structure, we can get this result by summing the individual bits in the XOR result.
This seems much more efficient than breaking apart the binary strings and directly comparing bits, but there’s a caveat: the bitwise operators work on decimal numbers, necessatating a conversion to base-10, and to sum the bits it’s easiest to work in binary. Fortunately it’s easy to switch back and forth.
Another refactoring we can make is that once we have originally produced the bitstring segments we no longer ever need to see them in binary notation, so we can immediately convert the list of values to base-10. This cleans up the code underneath, at the expense of easily debugging exactly what’s happening on a bit-by-bit basis in the binary notation. But, as I said, a refactor move. By this point we’re cleaning up the code — we’ve removed print
statements on some internal variables, set up to peek in on the action as we created the algorithm, and we’re just honing a process that’s already vetted. Add some comments should we ever need to revisit this later and we’re good to go.
use warnings;
use strict;
use feature ":5.26";
sub minflips {
## given a segment length and binary bitstring,
## segment the binary string and determine the minimum flips required to
## produce consistency among sections, using any section as the model
my ($size, $bin) = @_;
return undef if length($bin)%$size;
my @sections;
## section into equal sized parts and convert to decimal
push @sections, oct('0b'.substr($bin, 0, $size, '')) while length $bin > 0;
my $min = "Inf";
for ( 0..$#sections ) {
my $flips = bitflips($_, @sections);
$min = $flips if $flips < $min;
}
return $min;
}
sub bitflips {
## given a list of decimal numbers and an index,
## compute the number of flipped bits to transform all numbers in the list
## to the value indicated at index $idx
my ($idx, @sections) = @_;
my $flips = 0;
my $base = splice @sections, $idx, 1;
for my $sect (@sections) {
my $xored = sprintf "%b", $base ^ $sect; ## base and sect are decimal here
$flips += $_ for split //, $xored;
}
return $flips;
}
use Test::More;
is(minflips(3, "101100101"), 1, 'ex-1');
is(minflips(4, "10110111"), 2, 'ex-2');
is(minflips(3, "110001011101010"), 6, '5 sections');
is(minflips(3, "110001011101010111001010101010010000111101111110"), 21, 'many sections');
is(minflips(3, "000000"), 0, 'no work done - 0s');
is(minflips(2, "010101"), 0, 'no work done - mixed');
is(minflips(2, "0101010"), undef, 'not divisible');
done_testing();
raku solution
The Raku version ends up considerably more compact, as we can change methods into tight little nodes of data processing. comb
is waiting for us already to dice the input string into easily digestible bites. parse-base(2)
then can look at the results as binary and tell us what number they are in decimal. For the bitflips()
routine, a map
allows us to work on the entire @sections
array, exclusive-or-ing against the element at a given index. Once that is done more chaining allows us to convert to base 2, dice the digits and sum them to find the count of 1s, first for each pairing and then for the entire list. I find lining up the function chains by dots to be quite readable and airy.
unit sub MAIN (Int $size = 3, Str $bin = "110001011101010111001010101010010000111101111000") ;
die "$size does not evenly divide length of $bin" unless $bin.chars %% $size;
## section into equal sized parts and convert to decimal
my @sections = $bin.comb($size).map({.parse-base(2)});
## establish flips required to equalize to each value in list
my @flips.push( bitflips($_, @sections) ) for 0..@sections.end;
## report minimum
say qq:to/__END__/;
binary string : $bin
segment size : $size
bits flipped : { @flips.min }
__END__
sub bitflips( Int $idx, @sections ) {
## xor each pair with indexed value and sum
my $flips = 1;
@sections.map({ ($_ +^ @sections[$idx]).base(2)
.comb
.sum })
.sum;
}
The Perl Weekly Challenge, that idyllic glade wherein we stumble upon the holes for these sweet descents, is now known as
The Weekly Challenge – Perl and Raku
It is the creation of the lovely Mohammad Sajid Anwar and a veritable swarm of contributors from all over the world, who gather, as might be expected, weekly online to solve puzzles. Everyone is encouraged to visit, learn and contribute at