Puzzle Pong - part 1

James Buckland and I have decided to start a game of Puzzle Pong. These puzzles are more about the design of the program than about the actual answer, so I’ve decided to cover some of my inferior designs in this blog post as well. If you are only interested in the best solution, see the subsection titled “Third Approach”.

He challenged me to solve the following puzzle in his blog post about enumerating crossword puzzles. I recommend you give it a read.

The Puzzle

For every subset of the 26 character English alphabet, there is a set of words which can be spelled using only characters from that subset. For example, given the set {a, b, c, d}, /usr/share/dict/american-english says there are 24 words which can be spelled - {A, Ac, Ada, B, Ba, C, Ca, Cd, D, Dacca, Dada, a, ad, add, b, baa, bad, c, cab, cad, d, dB, dab, dad}.

Which such subset has the highest words to characters ratio?

First Approach

(Spoiler - this approach is fundamentally wrong.)

My first idea was to encode every word in the dictionary as a set of characters. Then construct a map from set-of-characters to the count of how many times I’ve seen that set. Lastly I’d iterate over the map and pick the set with the best ratio.

Implementation details

The representation of character sets is critical to making this program performant. I chose to implement them in C++ as bit sets inside a uint32_t. I chose C++ because I wanted a C-like language in which I could easily do efficient bitwise operations. Since there are only 26 characters in the alphabet, 32 bits is plenty to encode each word. The encoding function is pretty straightforward. I made the decision to ignore characters outside [a-zA-Z] for simplicity.

uint32_t WordToSet(const std::string& word) {
  uint32_t ret = 0;
  for (const char c : word) {
    if (c >= 'a' && c <= 'z') {
      ret |= (1 << (c - 'a'));
    } else if (c >= 'A' && c <= 'Z') {
      ret |= (1 << (c - 'A'));
    }
  }
  return ret;
}
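
To make the encoding concrete, here is a small sketch that works through a few words. EncodeLetters is a hypothetical constexpr variant of WordToSet, written for C strings only so that the results can be checked at compile time; it is an illustration and not part of the original program.

```cpp
#include <cstdint>

// Hypothetical compile-time variant of WordToSet (illustration only).
// Bit i of the result is set when the letter 'a' + i appears in the word,
// in either case. Characters outside [a-zA-Z] are skipped.
constexpr uint32_t EncodeLetters(const char* word) {
  uint32_t set = 0;
  for (; *word != '\0'; ++word) {
    const char c = *word;
    if (c >= 'a' && c <= 'z') {
      set |= uint32_t{1} << (c - 'a');
    } else if (c >= 'A' && c <= 'Z') {
      set |= uint32_t{1} << (c - 'A');
    }
  }
  return set;
}
```

With this encoding, "Dacca" uses only a, c and d, so bits 0, 2 and 3 are set and the result is 13. Anagrams and repeated letters collapse to the same value, and apostrophes are skipped, so "Astarte's" and "stare" share an encoding.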

While I was at it, I wrote the decoding function. This will be used at the end for printing the winning set.

std::string SetToWord(uint32_t set) {
  std::string ret = "";
  // Check all 26 letters; bit i stands for the letter 'a' + i.
  for (int i = 0; i < 26; i++) {
    if (set & (1u << i)) {
      ret += static_cast<char>('a' + i);
    }
  }
  return ret;
}

I usually try to avoid string appending but this function is only called once so it didn’t seem like a big deal.

I also needed a function that told me the size of each set. The Hacker’s Delight gave me several promising choices. I initially went with pop4() since it supposedly has the best performance when the number of bits is small.

I was planning to benchmark this choice against some others until I found out about __builtin_popcount(). This lets the compiler use CPU-specific instructions when counting the number of 1-bits in an int.

In practice, this program spends a minority of its time counting the sizes of sets so the benefit is minimal.
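
As a rough illustration of the set-size helper, here is a minimal sketch. The names SetSize and SetSizePortable are placeholders invented for this example. SetSize assumes GCC or Clang, since __builtin_popcount is a compiler extension; the second function is a simple portable loop shown only for comparison, not one of the Hacker's Delight routines mentioned above.

```cpp
#include <cstdint>

// Size of a character set stored as a bit set (hypothetical helper name).
// Assumes GCC or Clang: __builtin_popcount is a compiler extension.
constexpr int SetSize(uint32_t set) {
  return __builtin_popcount(set);
}

// Portable alternative for comparison: repeatedly clear the lowest 1-bit.
constexpr int SetSizePortable(uint32_t set) {
  int size = 0;
  while (set != 0) {
    set &= set - 1;  // Clears the lowest set bit.
    ++size;
  }
  return size;
}
```

Both functions agree on every input; the builtin simply gives the compiler the option of emitting a single instruction where the target CPU has one.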

Putting it all together

int main() {
  // Index is a bit set encoding the characters of a string.
  // Value is the number of words seen so far with exactly that bit set.
  static uint32_t counts[1 << 26] = {0};

  std::ifstream infile("/usr/share/dict/american-english");
  std::string word;
  // Populate counts.
  while (infile >> word) {
    ++counts[WordToSet(word)];
  }

  // Find best ratio in counts.
  uint32_t best_chars = 0;
  int best_num_chars = 0;
  int best_count = 0;
  for (uint32_t chars = 0; chars < (1u << 26); ++chars) {
    int num_chars = __builtin_popcount(chars);
    int count = counts[chars];
    // Equivalent to (count / num_chars) >= (best_count / best_num_chars)
    // but avoids floats.
    if (count * best_num_chars >= best_count * num_chars) {
      best_num_chars = num_chars;
      best_count = count;
      best_chars = chars;
    }
  }

  std::cout << best_num_chars << "  " << best_count << "  "
            << SetToWord(best_chars) << std::endl;

  return 0;
}

There is not much to comment on here. I enjoyed my method of fraction comparison that avoided floats. This made me comfortable using -Ofast (which enables optimizations beyond -O3 but might have out-of-spec behavior on float math in some situations).
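
The float-free comparison is just cross-multiplication. Here is a minimal sketch of the idea in isolation; RatioGreater and its parameter names are made up for this example, and it assumes both denominators are positive so that multiplying through does not flip the inequality.

```cpp
#include <cstdint>

// Returns true if a_num / a_den > b_num / b_den, using only integers.
// Assumes a_den > 0 and b_den > 0 (hypothetical helper, illustration only).
// 64-bit operands keep the products far away from overflow.
constexpr bool RatioGreater(int64_t a_num, int64_t a_den,
                            int64_t b_num, int64_t b_den) {
  return a_num * b_den > b_num * a_den;
}
```

For example, 7/3 beats 9/4 because 7 * 4 = 28 is greater than 9 * 3 = 27. The program above uses >= rather than >, so ties go to the later candidate, and because the running best starts at 0/0 the first candidate always wins the first comparison.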

Performance (and Correctness)

This program is quite speedy! On my 6 year old laptop, it ran in about a 20th of a second. It says the best set is {‘a’, ‘e’, ‘r’, ‘s’, ‘t’}, which contains 60 words.

60 seemed like a surprisingly small number for such common characters, so I asked the program to print which words it thought were in this set. It returned the following:

{“Astarte”, “Astarte’s”, “Easter”, “Easter’s”, “Easters”, “Sartre”, “Teresa”, “Teresa’s”, “Terra’s”, “aerates”, “arrest”, “arrest’s”, “arrests”, “assert”, “asserts”, “aster”, “aster’s”, “asters”, “eater’s”, “eaters”, “errata’s”, “erratas”, “rarest”, “raster”, “rate’s”, “rates”, “reassert”, “reasserts”, “restart”, “restart’s”, “restarts”, “restate”, “restates”, “retreat’s”, “retreats”, “stare”, “stare’s”, “stares”, “starter”, “starter’s”, “starters”, “stater”, “tare’s”, “tares”, “tartest”, “taster”, “taster’s”, “tasters”, “tatter’s”, “tatters”, “tear’s”, “tears”, “teaser”, “teaser’s”, “teasers”, “treat’s”, “treats”}

Now I could see that there was an error in my design: I was being far too strict. I was counting words that could be spelled using the characters in {‘a’, ‘e’, ‘r’, ‘s’, ‘t’} but I was also requiring that all those characters be used. And likewise for every other set of characters. This result was interesting but ultimately not what I wanted.
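
The difference between the two counts can be seen on a toy word list. The sketch below is an illustration under stated assumptions: Encode is a hypothetical constexpr stand-in for WordToSet, CountExact and CountSpellable are invented names, and kSample holds six words taken from the {a, b, c, d} example in the puzzle statement.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical constexpr stand-in for WordToSet (same encoding, C strings).
constexpr uint32_t Encode(const char* word) {
  uint32_t set = 0;
  for (; *word != '\0'; ++word) {
    const char c = *word;
    if (c >= 'a' && c <= 'z') set |= uint32_t{1} << (c - 'a');
    else if (c >= 'A' && c <= 'Z') set |= uint32_t{1} << (c - 'A');
  }
  return set;
}

// What the first approach counted: words whose letter set equals `letters`.
template <std::size_t N>
constexpr int CountExact(const uint32_t (&word_sets)[N], uint32_t letters) {
  int count = 0;
  for (const uint32_t w : word_sets) {
    if (w == letters) ++count;
  }
  return count;
}

// What the puzzle asks for: words whose letter set lies within `letters`.
template <std::size_t N>
constexpr int CountSpellable(const uint32_t (&word_sets)[N], uint32_t letters) {
  int count = 0;
  for (const uint32_t w : word_sets) {
    if ((w | letters) == letters) ++count;
  }
  return count;
}

// Six words from the {a, b, c, d} example, pre-encoded.
constexpr uint32_t kSample[] = {Encode("a"),   Encode("ad"),  Encode("add"),
                                Encode("bad"), Encode("cab"), Encode("dab")};
```

For the set {a, b, d}, the exact count is 2 ("bad" and "dab"), while the spellable count is 5 (everything except "cab"). The first approach computes the former; the puzzle wants the latter.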

Second Approach

My second idea was to solve this problem by brute force. As before, encode the dictionary as 32-bit ints, but this time loop over all 2^26 sets of characters and compute the ratio for each. This is not elegant, but I was confident it would get me the correct answer.

Implementation details

This approach mostly reused helper functions from the previous one. The only new function I needed was for determining if one bit set was a subset of another.

My first idea was

// Args: two bit sets stored in uint32_t's
// Returns true if the first set is a subset of the second.
bool Subset(uint32_t first, uint32_t second) {
  return ~(~first | second) == 0;
}

A coworker pointed out the much more elegant:

bool Subset(uint32_t first, uint32_t second) {
  return (first | second) == second;
}

Both of these only use a handful of assembly instructions.
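
To see why the two versions are interchangeable, note that by De Morgan's law ~(~first | second) is the same as first & ~second, which is zero exactly when first has no bit outside second. The sketch below writes out all three forms and checks that they agree on every pair of sets over a small alphabet. The function names are invented for this example.

```cpp
#include <cstdint>

// Three equivalent subset tests on bit sets (names are illustrative).
constexpr bool SubsetViaComplement(uint32_t first, uint32_t second) {
  return ~(~first | second) == 0;
}

constexpr bool SubsetViaUnion(uint32_t first, uint32_t second) {
  return (first | second) == second;
}

constexpr bool SubsetViaMask(uint32_t first, uint32_t second) {
  return (first & ~second) == 0;  // No bit of first falls outside second.
}

// Checks that all three forms agree for every pair of sets drawn from an
// alphabet of `alphabet_size` letters (keep this small; it is quadratic).
constexpr bool AllFormsAgree(int alphabet_size) {
  const uint32_t limit = uint32_t{1} << alphabet_size;
  for (uint32_t a = 0; a < limit; ++a) {
    for (uint32_t b = 0; b < limit; ++b) {
      const bool expected = SubsetViaMask(a, b);
      if (SubsetViaComplement(a, b) != expected ||
          SubsetViaUnion(a, b) != expected) {
        return false;
      }
    }
  }
  return true;
}
```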

Putting it all together

int main() {
  std::vector<uint32_t> char_sets;

  std::ifstream infile("/usr/share/dict/american-english");
  std::string word;
  while (infile >> word) {
    char_sets.emplace_back(WordToSet(word));
  }

  int best_num_chars = 0;
  int best_num_words = 0;
  uint32_t best_char_set = 0;
  for (uint32_t i = 0; i < (1u << 26); ++i) {
    int num_words = 0;
    int num_chars = __builtin_popcount(i);

    // Count the number of words in the dictionary which can be spelled using
    // only the letters represented by i.
    for (const uint32_t char_set : char_sets) {
      if (Subset(char_set, i)) {
        ++num_words;
      }
    }

    // Equivalent to if(num_words/num_chars >= best_num_words/best_num_chars)
    // but without needing to use floating point numbers
    if (num_words * best_num_chars >= best_num_words * num_chars) {
      best_num_words = num_words;
      best_num_chars = num_chars;
      best_char_set = i;
    }
  }

  std::cout << "num_words: " << best_num_words << std::endl
            << "set was: " << SetToWord(best_char_set) << std::endl;

  return 0;
}

Performance (and Correctness)

This code runs in about 25 minutes and produces the answer: abcdefghiklmnorstuvwy. That is to say, all characters except {‘j’, ‘q’, ‘x’, ‘z’}. Upon reflection, this isn’t such a surprising answer.

This program’s memory use is quite nice. I allocate one 32 bit int for each word in the dictionary and a handful of temporary and accumulation variables. The memory use scales linearly with the size of the dictionary and is not affected by the number of characters in the language.

Performance-wise, this solution leaves a lot to be desired. For each of the 2^26 sets of characters, I am looping over ~100,000 words in the dictionary. What if there was a way to do less work for each character set…?

Third Approach

What if instead of looping over ~100k words, I did subset lookups in the map from the first approach? So for a set like {‘a’, ‘e’, ‘r’}, sum the values at {}, {‘a’}, {‘e’}, {‘r’}, {‘a’, ‘e’}, {‘a’, ‘r’}, {‘e’, ‘r’} and {‘a’, ‘e’, ‘r’}. If the initial set has n elements, then it has 2^n subsets. When n is small, this is less work than looping over the whole dictionary; when n is large, this is more work than looping over the whole dictionary.

This felt like a promising strategy but I didn’t have any good ideas for how to efficiently iterate over all subsets of a bit set. I mentioned this to a coworker and he suggested I check out the chess programming wiki. Apparently, chess programs use bit sets for internal representations so this wiki has lots of helpful functions for doing computations on bit sets. Indeed, it has a page on Traversing Subsets of a Set.
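
The core of that traversal is a single step, n = (n - set) & set, which moves from one subset of set to the next in increasing numeric order and wraps back to 0 after the full set. Here is a minimal sketch of it in isolation; NextSubset and CountSubsets are names invented for this example, and the subtraction relies on unsigned wrap-around.

```cpp
#include <cstdint>

// One step of the subset traversal: given a subset n of `set`, returns the
// next subset in increasing numeric order, or 0 once n == set.
// Relies on unsigned arithmetic wrapping modulo 2^32.
constexpr uint32_t NextSubset(uint32_t n, uint32_t set) {
  return (n - set) & set;
}

// Walks every subset of `set` (including the empty set and `set` itself)
// and returns how many were visited.
constexpr int CountSubsets(uint32_t set) {
  int count = 0;
  uint32_t n = 0;
  do {
    ++count;
    n = NextSubset(n, set);
  } while (n != 0);
  return count;
}
```

For the set with bits 0 and 2 (the value 5), the walk visits 0, 1, 4, 5 and then returns to 0, so four subsets are seen, matching 2^2.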

Armed with this, I modified my solution from approach 2 into the following:

Putting it all together

int main() {
  // For sets with many characters.
  std::vector<uint32_t> char_sets;
  // For sets with not many characters.
  static uint32_t counts[1 << 26] = {0};

  std::ifstream infile("/usr/share/dict/american-english");
  std::string word;
  while (infile >> word) {
    char_sets.emplace_back(WordToSet(word));
    ++counts[WordToSet(word)];
  }

  int best_num_chars = 0;
  int best_num_words = 0;
  uint32_t best_char_set = 0;
  for (uint32_t i = 0; i < (1u << 26); ++i) {
    int num_chars = __builtin_popcount(i);

    // Compute how many words can be spelled with those letters.
    int num_words = 0;
    if (num_chars <= 14) {
      uint32_t n = 0;
      do {
        num_words += counts[n];
        n = (n - i) & i;
      } while ( n );
    } else {
      for (const uint32_t char_set : char_sets) {
        if (Subset(char_set, i)) {
          ++num_words;
        }
      }
    }

    if (num_words * best_num_chars >= best_num_words * num_chars) {
      best_num_words = num_words;
      best_num_chars = num_chars;
      best_char_set = i;
    }
  }

  std::cout << "num_words: " << best_num_words << std::endl
            << "set was: " << SetToWord(best_char_set) << std::endl;

  return 0;
}

Performance

This program runs in about 16 minutes and produces the same answer as my second one. I’m sure more improvements are possible but overall I’m much happier with this design.

On the negative side, this approach uses more memory than the previous one. As before, it allocates a 32 bit int for each word in the dictionary; but now it additionally allocates a 32 bit int for each possible subset of the alphabet. This amount of memory scales exponentially with the number of characters! Luckily for me 26 characters corresponds to about 250 megabytes which is a manageable amount. If English had 32 characters then my laptop would not have had enough RAM to allocate such a map.
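
The memory figures above follow from simple arithmetic: one 4-byte counter per subset of the alphabet. A small sketch, with TableBytes as a made-up helper name:

```cpp
#include <cstdint>

// Bytes needed for a table holding one 32-bit counter per subset of an
// alphabet with `num_letters` letters (hypothetical helper, illustration only).
constexpr uint64_t TableBytes(int num_letters) {
  return (uint64_t{1} << num_letters) * sizeof(uint32_t);
}
```

For 26 letters this gives 268,435,456 bytes (256 MiB), and for 32 letters it gives 17,179,869,184 bytes (16 GiB). Each extra letter doubles the table.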

One interesting feature I’d like to point out is the choice of 14 as the threshold on num_chars. Log base 2 of 100,000 is around 16.6 so my initial guess for this value was 16. This is because for sets of size 15 and 16, it costs fewer than 100k lookups in counts to compute how many words can be spelled with those characters. In practice, I get best performance with this threshold at 14. My suspicion is that this is a cache locality issue. Looping over a vector probably has favorable cache behavior but indexing into a table probably has more cache misses.
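
The initial guess of 16 can be reproduced by counting operations only. The sketch below finds the largest set size whose 2^n subset lookups still number fewer than the dictionary size. BreakEvenSetSize is a name invented for this example, and the dictionary size of 100,000 is the rough figure used in this post, not a measured value. It deliberately ignores the cache effects discussed above, which is why the empirically best threshold can be lower.

```cpp
#include <cstdint>

// Largest n such that 2^n < dictionary_size, i.e. the largest set size for
// which walking all subsets needs fewer lookups than scanning every word.
// Pure operation count; says nothing about cache behavior.
constexpr int BreakEvenSetSize(uint64_t dictionary_size) {
  int n = 0;
  while (n < 62 && (uint64_t{1} << (n + 1)) < dictionary_size) {
    ++n;
  }
  return n;
}
```

With 100,000 words the result is 16, since 2^16 = 65,536 is below 100,000 but 2^17 = 131,072 is not.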

Conclusion

I challenge James to solve the following puzzle:

Given the list of numbers [1, 2, 3, 4, 5], unlimited parentheses, and the operations {+, -, *, /, ^, !}, it is possible to construct many integers. For example,

1 = 1 + 2 - 3 - 4 + 5

2 = 1 + (((2 + 3) * 4!) / 5!)

3 = …

What is the smallest positive integer that cannot be written this way?

For some inspiration, please enjoy:

https://www.youtube.com/watch?v=ukUkVaOyI0o

https://www.youtube.com/watch?v=-ruC5A9EzzE

To formalize the rules slightly:

Numbers must appear in order, each exactly once.

You may not take the factorial of a factorial (e.g. (3!)! = 720).

Concatenation is forbidden (e.g. 1 + 23 + 4 + 5).

Unary negative is forbidden (e.g. 13 = (-1) + 2 + 3 + 4 + 5).

Bonus problem - For what list of five integers (all between 0 and 9 inclusive) is the answer to this puzzle the smallest? If this is not unique, please provide all lists which have the lowest non-expressible value.

Bonus bonus problem - What list of five positive integers has the smallest sum and cannot construct 1? Again, if there is more than one list with this minimal sum, please provide all of them.

Yet another bonus problem (feel free to ignore these). For the starting list [1,2,3,4,5], what is the smallest non-negative integer which can only be made in one way? For what starting list of five integers (all between 0 and 9 inclusive) is the answer to the previous question the smallest?

Written on June 17, 2017