Wednesday, July 26, 2023

Puzzles #564 - #567 - Turing Test (Nurikabe)

chaotic_iak's definition of a puzzle came up again as a topic of discussion. I very, very strongly disagree with some of the points in here, especially about excluding generated puzzles that meet every other requirement. (I also disagree that something can subjectively be treated as puzzle or not-puzzle based on perceived quality, though I acknowledge that these critiques are leveled much more at logic puzzles than at other sorts of puzzles.) The final point in the human authorship section is this:

"I'm willing to call these computer-assisted puzzles puzzles."
This distinction feels almost entirely meaningless to me, and I set out with an experiment to try to prove it. The thing I really dislike about this is that the exact same puzzle is considered a puzzle if it was made one way, and not if it was made another way. And if including such a "puzzle?" in a scenario like this makes it one, when in some cases all I did was generate a puzzle and include the first one I got with no assistance or curation... then how would that be different from someone generating that puzzle and solving it?
My definition of a puzzle would be something like this:
"A puzzle is something that can be solved, that a solver knows when they've solved it, and in which meaning can be found".

"Can be solved" - the puzzle must have a solution. There should typically be exactly one solution - though that solution could be "number of solutions" or "superset of all forced steps" or "an extracted phrase" or anything like that.
"knows when they've solved it" - the solution should be checkable or otherwise clearly correct when found. This can be accomplished through external checkers, but often won't need to be.
"in which meaning can be found" - this can include anything from finding a deeper idea (that was likely specifically embedded) to a particularly nice layout to even just the simple enjoyment of solving a puzzle. Emphasis on can here - just because one person does not find any meaning doesn't invalidate meaning that someone else found here.
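For Nurikabe specifically, the first two parts of that definition can be made concrete. What follows is only a rough sketch under my own assumptions: the function names (`regions`, `is_solution`, `count_solutions`) and the grid encoding (0 for "no clue") are made up for illustration, and the brute-force count is only practical on tiny grids, not the 7x7 puzzles below. It is not how any of the tools mentioned in this post work.

```python
from itertools import product

def regions(cells):
    """Split a set of (row, col) cells into orthogonally connected groups."""
    cells = set(cells)
    groups = []
    while cells:
        stack = [cells.pop()]
        group = set(stack)
        while stack:
            r, c = stack.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in cells:
                    cells.remove(nb)
                    group.add(nb)
                    stack.append(nb)
        groups.append(group)
    return groups

def is_solution(clues, shaded):
    """'knows when they've solved it': check a shading against the rules.

    clues  -- grid of ints, 0 meaning no clue (an assumed encoding)
    shaded -- grid of bools with the same shape
    """
    h, w = len(clues), len(clues[0])
    sea = {(r, c) for r in range(h) for c in range(w) if shaded[r][c]}
    land = {(r, c) for r in range(h) for c in range(w) if not shaded[r][c]}
    # Clue cells are never shaded.
    if any(clues[r][c] for r, c in sea):
        return False
    # All shaded cells form a single connected region.
    if len(regions(sea)) > 1:
        return False
    # No 2x2 block is fully shaded.
    for r in range(h - 1):
        for c in range(w - 1):
            if {(r, c), (r + 1, c), (r, c + 1), (r + 1, c + 1)} <= sea:
                return False
    # Every island contains exactly one clue, equal to the island's size.
    for island in regions(land):
        nums = [clues[r][c] for r, c in island if clues[r][c]]
        if len(nums) != 1 or nums[0] != len(island):
            return False
    return True

def count_solutions(clues):
    """'can be solved': try every shading (tiny grids only) and count hits."""
    h, w = len(clues), len(clues[0])
    total = 0
    for bits in product((False, True), repeat=h * w):
        shaded = [list(bits[r * w:(r + 1) * w]) for r in range(h)]
        if is_solution(clues, shaded):
            total += 1
    return total

print(count_solutions([[0, 0, 0], [0, 1, 0], [0, 0, 0]]))  # 1: unique
print(count_solutions([[2, 0, 0], [0, 0, 0], [0, 0, 2]]))  # 2: ambiguous
print(count_solutions([[1, 0, 0], [0, 0, 0], [0, 0, 2]]))  # 0: unsolvable
```

Note that nothing in either check asks where the clue grid came from. The third part of the definition, meaning, is deliberately not something a checker like this can decide.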

None of these parts make a qualitative judgment: a poor quality logic puzzle with a single solution is still a puzzle, it's just not a good one. In the same vein, a generated logic puzzle is only as good as the generator and the effort put into tuning it to give good puzzles, but it's still a puzzle.

Anyway. I used 7x7 Nurikabe to test as it was the largest size I could get from multiple generators and was also supported by pzprrt. A more proper test would probably use larger puzzles, but for a quick test to prove my point this was sufficient.

Under chao's definition, somewhere between 2 and 6 of these are puzzles. When I posted these links, the best accuracy anyone had on identifying sources was 40%: no better than random guessing. In fact, there was one puzzle everyone falsely identified as being human made and one everyone falsely identified as being computer generated!

Two of these puzzles were generated by myamya's generator, taking the first two puzzles. I kept generating to try to find better ones for another category, but couldn't find any that were better - I got lucky.
Four of these puzzles were generated on puzzle-team's website. I took the first two puzzles I got in one category, then continued generating for a while and picked two I found interesting (a "curation" step).
Two of these puzzles were made with heavy assistance from semiexp's pzprrt tool, where I put down values and question marks arbitrarily until I had a puzzle that worked, and then I included it.
The last two of these puzzles were handmade using no external tools other than validating that there was a single solution. (For blog puzzle numbering, I am counting these last two categories as my own.)

These links are presented in alphabetical order of URL.


(for best experience, take a guess yourself before looking)

None of this is to say that chao's definition doesn't have its good points, and it certainly describes what goes into a great puzzle well. But like I said above, when the definition results in something being a puzzle to some people and not to others, or when a change of unknown context changes it as well, I just can't think it's a good definition.
