Wherein we exercise restraint, to avoid saying something we do not wish to say
THE WEEKLY CHALLENGE – PERL & RAKU #148
“Be careful when you cast out your demons that you don’t throw away the best of yourself.”
– Friedrich Nietzsche
Eban Numbers
Task One, Submitted by: Mohammad S Anwar
Write a script to generate all Eban Numbers <= 100.
An Eban number is a number that has no letter ‘e’ in it when the number is spelled in English (American or British).
Example
2, 4, 6, 30, 32 are the first 5 Eban numbers.
Background
Some time ago I brought up the fact here that a man had once written an entire English book without using the letter “e” — far-and-away the most commonly used letter in the language. Ever since reading about it in Simon Singh’s excellent The Code Book many years ago I’ve tried to work this tidbit into conversations as best I can. It’s a remarkable accomplishment that I would not think possible.
Texts written under arbitrary constraints such as these are known as “lipograms”, from the Ancient Greek λείπειν (missing) γράμματος (letter). The Greek formation is appropriate, as the idea goes back millennia. Wordplay is not a modern science.
In English, depending on the corpus you measure, the letter “e” occurs about 11-13% of the time, or about 1 in 8 letters used. But we are not simply extracting the letter, as in the title this week, leaving the remaining text intact. No, to write a book one should generally use proper words, with all their letters unmolested. If we’re going to play this game we should set the rules correctly to keep it challenging.
Perhaps a better measure then would be the frequency of words using the letter “e” somewhere within them. We don’t care where. The question is not the letter, but rather how many words are we still allowed to use?
If we look at the six volumes of Edward Gibbon’s The History Of The Decline And Fall Of The Roman Empire, we find some 1748821 words, 758260 of which contain the proscribed symbol. That’s about 43% of the words. For anyone wondering whether Gibbon’s highfalutin prose might be affecting these numbers, a similar examination of Ernest Hemingway’s The Sun Also Rises yields the figure 42%, so it does seem a pretty good estimate of the prevalence in English as she is spoke. The number for that venerable text, by the way, is a little over 40%, but then again it is a Portuguese phrase-book. It’s bound to be a little off, which it certainly is.
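For anyone who wants to try the same measurement on a text of their own, here is a minimal sketch. The helper name e_word_fraction and its idea of a "word" (a run of ASCII letters and apostrophes) are my assumptions, not the tokenization behind the figures above, so expect the percentages to drift a little.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of the words in a text that contain an "e".
# Assumption: a "word" is any run of ASCII letters and apostrophes.
sub e_word_fraction {
    my ($text) = @_;
    my @words = $text =~ /[A-Za-z']+/g;         # crude tokenizer
    return 0 unless @words;                     # avoid dividing by zero
    my $with_e = grep { /e/i } @words;          # count, case-insensitively
    return $with_e / @words;
}

my $sample = "If you do not use the letter, I will hit you with this stick.";
printf "%.1f%% of the words contain an e\n", 100 * e_word_fraction($sample);
```

Feeding it a whole book is just a matter of slurping the file into a string first.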
In the past we’ve worked quite a bit with rearranging and selecting numbers based on the digits in their positional representation, breaking them apart and reassembling them in various ways. However, in going from digit placement within numbers to letter placement in numbers written out as words, I think we’ve finally jumped the shark on number theory, and landed with a splash right in the middle of Natural Language Processing. If Number Theory can tend toward the mystical, lipograms tend towards the comical.
If you use the letter ‘e’, I will hit you with this stick. If you do not use the letter ‘e’, I will hit you with this stick. What should you do?
METHOD
It’s tempting to try to write a textual conversion routine to translate all two-digit numbers into their written form. On the other hand, it’s much more sensible to use Neil Bowers’ excellent Lingua::EN::Numbers to do the tricky part for us.
Let’s weigh in on the merits. It is, in no particular order:
- the right way to do it
- good practice in the right way to do it
- dead simple
Now, addressing the first point: Natural Language Processing, like all language in general, is fraught with hairy little edge-cases that need to be sorted out to do things properly. And so, in keeping company with the CSV format and Date and Time Manipulation code, it’s wise to hand over such processing to a dedicated library whose sole purpose is to keep such stuff straight.
Ok, fine. Then again it wouldn’t be too crazy to address every case. It would, after all, be easy to visually inspect the output. And it’s only 99 written numbers if we start from 1.
Why 1 you ask? Well, as for zero, just look at it, all round and smug, sitting there disguised as an ‘O’, trying to hide that ‘e’ within its ranks. I’m normally a big fan of zero, but here it just will not do. And furthermore, all negative numbers contain the letter “e” in the word “negative” so we won’t even need to dictate to only use positive values. Even if we wanted more, for this challenge they’re all we’re going to get.
On the other hand there are an infinite number of real values below 100, so we will need to disallow letting in that unwholesome creed if we ever want to get this task finished:
“thirty-four point six six six six six two six…”
Yeah, that’s not going to work.
We’ll start with implementing a solution using the module and see how that goes.
PERL 5 SOLUTION
Getting our list this way looks pretty straightforward. The library gives us the words, and we filter the list using a regular expression. It’s easily adaptable to other ranges or other lipograms.
use feature 'say';
use Lingua::EN::Numbers qw( num2en );

for (0..99) {
    my $word = num2en( $_ );    # e.g. 42 becomes "forty-two"
    next if $word =~ /e/;       # skip any name containing an "e"
    say $word;
}
The result?
two
four
six
thirty
thirty-two
thirty-four
thirty-six
forty
forty-two
forty-four
forty-six
fifty
fifty-two
fifty-four
fifty-six
sixty
sixty-two
sixty-four
sixty-six
As I said: dead simple, piece of cake.
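Reading that list back as digits suggests a shortcut, which I offer as an observation on the output rather than as the method used above. Below 100 the only e-free ones words are two, four and six, the only e-free tens words are thirty through sixty, and every teen is tainted. A sketch of that rule, with is_eban_below_100 being a made-up name and the rule itself only claimed for 1 through 99:

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper, only meant for 1..99: true when the digits alone
# guarantee an e-free English name.
#   [3-6]?[246]  an optional tens digit 3..6 followed by 2, 4 or 6
#   [3-6]0       a bare thirty, forty, fifty or sixty
sub is_eban_below_100 {
    my ($n) = @_;
    return $n =~ /^(?:[3-6]?[246]|[3-6]0)$/ ? 1 : 0;
}

my @eban = grep { is_eban_below_100($_) } 1 .. 99;
say "@eban";
```

This gives the numbers themselves rather than their names, and it agrees with the nineteen names listed above. It says nothing about what happens once the hundreds and thousands arrive.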
But what if we wanted to do it the hard way? I’m looking around, and I don’t see anyone to stop us. I spoke to the turtles in the tank behind me about it, and they don’t care.
Let’s do it.
Method #2
Forgetting the lipogramming for now, simply getting a list of written-out numbers less than 100 is a challenge in itself.
Some observations: We have names for the digits in the ones place and a different list for the tens, and there’s no consistency in transmuting a given digit name into its tens-place name. Vowels get added and dropped haphazardly. Then on top of this there’s a whole separate set of terms in the middle there for our difficult teenage years, and I see their struggle to fit in as oddly resonant with the big picture. Fortunately for us we’re not going to touch the hundreds place or any sort of general-purpose solution to this word-search. Today we’ll just dip our toe in and say we went swimming. It’s cold in there.
Developing a codified set of rules to create the sequence, or convert a given number to its written version, is a maze of irregularities. Following the pattern, as such, is hard; even elucidating the pattern is hard. On the other hand representational numbers written as digits are quite orderly — if we include leading zeros extremely so — so that seems a good place to start.
Every number can be described as a collection of value-position pairs; that’s what the representational system is. “Four tens and two ones” is the same as 42. Now we have special words for the digits in the tens place, and the label for the ones position is implicit. So instead of “four tens” we say “forty” and randomly toss out that “u” because why the hell not? Without rhyme or reason to guide us we might as well compile a separate list of each set of terms. We now have a tens word and a ones word, so we connect them with a hyphen to make a compound descriptor. 42 has become “forty-two”.
With a list of tens-place names and a list of ones-place names, we start with a cross-product of those lists, with each matched up in a pairing with every other in the other set, to yield our 100 number names. They won’t be perfect English yet, but we’ll fix that.
A couple of other notes:
- Zeros are implicit, and hence silent. The separate terms for the tens already imply the zero following, and we don’t mention it if there are zero tens.
- We’re not going to include zero by itself either. We’re going to stick to positive numbers for the reasons we mentioned.
- The teenagers will just need to be fixed (how many times have we heard that before?). There’s no reasonable way to get from “one” to “eleven” and then to “twelve” and after that start a new paradigm at “thirteen”. The teens do have a semblance of sense to them (see comment above), but the prefixes have their own peculiar irregularities to deal with. I say if we’re going to enumerate the prefixes, we might as well enumerate the whole mess from 11 to 19.
PERL SOLUTION
Remarkably it’s not too bad when we’re finished, but remember this only works for two-digit values and isn’t random-access. We could make a lookup hash for that, though, if we wanted to. For now we’re just making an ordered list.
We start with two lists of ten values, for 0 through 9 in two positions for the 1s and 10s. To make the cleaning step a little clearer we’ll use the null sign, ∅, for zero. It’s an arbitrary but necessary placeholder for the cross-product that gets stripped out later, so it doesn’t matter exactly what we use. This produces some incorrect names such as “twenty-∅”, so the irregularities are simple to spot. It’s not going to be a problem to strip out the extra bits after, so that’s how we’ll do it.
We start by constructing names as one compound word in the form “{tens}-{ones}”. With our loops structured properly, these will be produced in the right order. Once we have this list, we iterate through it with two substitution expressions. The first strips out any leading and trailing nulls and their associated hyphens, and the second quietly slips in the special names for the teens.
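To watch the two clean-up steps in isolation, here is a small trace on a handful of raw names. The sub tidy_name and the cut-down two-entry teens table are stand-ins of mine for the demonstration, and I’ve used a plain hash lookup for the teens step where the full script uses a second substitution.

```perl
use strict;
use warnings;
use utf8;                 # the pattern below contains a literal ∅
use feature 'say';

# Abbreviated on purpose: the real table runs from eleven to nineteen.
my %teens = ( 'ten-one' => 'eleven', 'ten-two' => 'twelve' );

# Hypothetical helper mirroring the two steps described above.
sub tidy_name {
    my ($name) = @_;
    $name =~ s/^∅-|-?∅$//g;                         # drop nulls and their hyphens
    $name = $teens{$name} if exists $teens{$name};  # swap in the teen name
    return $name;
}

for my $raw ( '∅-four', 'thirty-∅', 'ten-two', 'sixty-six', '∅-∅' ) {
    say '[', tidy_name($raw), ']';                  # brackets expose the empty string
}
```

The last line of output is an empty pair of brackets, which is the leftover from “∅-∅”.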
Lastly, the first element, “∅-∅” gets obliterated but remains in existence as an empty string, so to be proper we need to shift that off the final list, giving us just the numbers from 1 to 99 written out. Or then again the indices of the array are the corresponding numeric forms at this point, so if we wanted a lookup table this could be quite useful. For this task, however, we’ll remove it, to make our list correct before further processing. We’re already reinventing the wheel, so we may as well be thorough about it.
For output we filter this list using grep, only allowing words without “e”, and end up with the same list we generated before. Nice.
use utf8;               # the source contains a literal ∅
use feature 'say';

my @ones = qw( ∅ one two three four five six seven eight nine );
my @tens = qw( ∅ ten twenty thirty forty fifty sixty seventy eighty ninety );
my %teens = qw( ten-one   eleven
                ten-two   twelve
                ten-three thirteen
                ten-four  fourteen
                ten-five  fifteen
                ten-six   sixteen
                ten-seven seventeen
                ten-eight eighteen
                ten-nine  nineteen );

# cross product of tens and ones, produced in numeric order
my @out;
for my $t ( @tens ) {
    for my $o ( @ones ) {
        push @out, "$t-$o";
    }
}

for (@out) {
    s/^∅-|-?∅$//g;                              # strip nulls and their hyphens
    s/$_/$teens{$_}/ if exists $teens{$_};      # slip in the irregular teens
}

shift @out;             # drop the empty string left over from "∅-∅"
say $_ for grep { ! /e/ } @out;
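As for the lookup table, here is one way it might go, as a sketch rather than part of the submitted solution. Instead of shifting off the empty first element it leaves the array alone, so that the index is the number, and it patches the teens in by position rather than by substitution. The array names @name and @eban are mine.

```perl
use strict;
use warnings;
use utf8;                 # literal ∅ in the lists and the pattern
use feature 'say';

my @ones  = qw( ∅ one two three four five six seven eight nine );
my @tens  = qw( ∅ ten twenty thirty forty fifty sixty seventy eighty ninety );
my @teens = qw( eleven twelve thirteen fourteen fifteen
                sixteen seventeen eighteen nineteen );

# Build the names in numeric order, so that $name[42] is "forty-two".
my @name;
for my $t (@tens) {
    for my $o (@ones) {
        ( my $n = "$t-$o" ) =~ s/^∅-|-?∅$//g;   # strip nulls and hyphens
        push @name, $n;
    }
}
@name[ 11 .. 19 ] = @teens;     # overwrite the irregular teens by position

# Index 0 holds the empty string from "∅-∅"; we skip it instead of shifting.
my @eban = grep { $name[$_] !~ /e/ } 1 .. $#name;
say "$_ => $name[$_]" for @eban;
```

Keeping the numbers alongside the names also lets us print what the task literally asks for, the Eban numbers, with the words along for the ride.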
RAKU SOLUTION
In Raku we have options. Options are nice.
We can utilize the built-in X cross-product operator to assemble a set of two-element tuples. These can then be mapped to apply a join across each to add the hyphen to make a single compound word:
my @out = ([X] @tens, @ones).map: *.join('-');
Alternately we could add the Z operator into the mix:
my @out = [X~] (@tens Z~ '-' xx @tens.elems), @ones;
which will first zip the tens with a long-enough list of hyphens before taking the cross product with concatenation.
Or maybe this:
my @out = [X~] ( @tens.map({$_~'-'}) ), @ones;
to map a hyphen onto each tens name before assembling a concatenated cross product.
I think we’ll stick with our original choice, but I wanted to explore some other ways to get there. There is, after all, more than one way to do it.
unit sub MAIN () ;
my @ones = < ∅ one two three four five six seven eight nine >;
my @tens = < ∅ ten twenty thirty forty fifty sixty seventy eighty ninety >;
my %teens = < ten-one eleven
ten-two twelve
ten-three thirteen
ten-four fourteen
ten-five fifteen
ten-six sixteen
ten-seven seventeen
ten-eight eighteen
ten-nine nineteen >;
my @out = ([X] @tens, @ones).map: *.join('-');
for @out {
    s:g/ ^ \∅\- | \-?\∅ $ //;
    s/ $_ /%teens{$_}/ if %teens{$_}:exists;
}
@out.shift; ## list of names goes from 1-99 now
.say for @out.grep: { ! /e/ };
The Perl Weekly Challenge, that idyllic glade wherein we stumble upon the holes for these sweet descents, is now known as
The Weekly Challenge – Perl and Raku
It is the creation of the lovely Mohammad Sajid Anwar and a veritable swarm of contributors from all over the world, who gather, as might be expected, weekly online to solve puzzles. Everyone is encouraged to visit, learn and contribute at