Your question, and the various answers, focus on digital (logic state machine) behaviors.
I suggest you start with a FULL state table, or a logic table if no flip-flops are needed, and make your metric the percentage of correct responses.
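Here is a minimal sketch of that metric for the combinational case (no flip-flops). It assumes a hypothetical nested-tuple circuit representation, and `evaluate` and `fitness` are illustrative names, not from any library. It enumerates the full truth table and reports the percentage of rows the candidate gets right.

```python
from itertools import product

# Hypothetical representation: a circuit is a nested tuple such as
# ("IN", i), ("NOT", x), ("AND", a, b), ("OR", a, b), ("XOR", a, b).
def evaluate(circuit, inputs):
    op = circuit[0]
    if op == "IN":
        return inputs[circuit[1]]
    if op == "NOT":
        return 1 - evaluate(circuit[1], inputs)
    a = evaluate(circuit[1], inputs)
    b = evaluate(circuit[2], inputs)
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    if op == "XOR":
        return a ^ b
    raise ValueError(f"unknown gate {op!r}")


def fitness(circuit, target, n_inputs):
    """Percentage of rows of the full truth table the circuit gets right."""
    rows = list(product((0, 1), repeat=n_inputs))
    correct = sum(evaluate(circuit, r) == target(*r) for r in rows)
    return 100.0 * correct / len(rows)


# Target: 2-input XOR. Candidate: a plain OR gate (wrong only on row 1,1).
xor_target = lambda a, b: a ^ b
candidate = ("OR", ("IN", 0), ("IN", 1))
print(fitness(candidate, xor_target, 2))  # 75.0
```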
Your question shows that you have already tried this. As you note, for certain non-solutions, adding an output inverter is all it takes to be completely successful. Your question then becomes: "How can there be learning unless the optimal answer, or at least one successful answer, is already known?"
That is a superb question.
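To make the non-solution case concrete, here is a small sketch. For brevity it models circuits as plain Python functions (an assumption), and `add_output_inverter` is a hypothetical helper. Scored by the percentage of correct rows, a circuit that is exactly inverted gets 0%, the worst possible score, even though a single output inverter takes it to 100%.

```python
from itertools import product


def fitness(circuit, target, n_inputs):
    """Percentage of truth-table rows where circuit and target agree."""
    rows = list(product((0, 1), repeat=n_inputs))
    return 100.0 * sum(circuit(*r) == target(*r) for r in rows) / len(rows)


def add_output_inverter(circuit):
    # The "one missing inverter": negate whatever the circuit outputs.
    return lambda *r: 1 - circuit(*r)


target = lambda a, b: a ^ b        # we want XOR
xnor = lambda a, b: 1 - (a ^ b)    # a non-solution that is exactly inverted
print(fitness(xnor, target, 2))                       # 0.0
print(fitness(add_output_inverter(xnor), target, 2))  # 100.0
```

The metric by itself gives no hint that the worst-scoring candidate is one edit away from perfect.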
How do humans learn in unknown environments? They try something, anything, in small pieces and in large pieces. They try something.
This means you need to define what counts as "small pieces" (for example, adding an inverter at random in any or all paths).
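One possible "small piece" operator, sketched under the assumption that circuits are nested tuples such as ("AND", ("IN", 0), ("IN", 1)); `paths`, `wrap_in_not`, and `small_mutation` are hypothetical names. It picks any signal path at random, the output included, and inserts one inverter there. Applying it repeatedly covers the "any or all paths" case.

```python
import random

# Hypothetical gate-level representation: nested tuples; leaves are ("IN", index).


def paths(circuit, prefix=()):
    """Yield the path (tuple of child indices) to every node, root included."""
    yield prefix
    if circuit[0] != "IN":
        for i, child in enumerate(circuit[1:], start=1):
            yield from paths(child, prefix + (i,))


def wrap_in_not(circuit, path):
    """Return a copy of circuit with an inverter inserted at the given path."""
    if not path:
        return ("NOT", circuit)
    i = path[0]
    children = list(circuit)
    children[i] = wrap_in_not(circuit[i], path[1:])
    return tuple(children)


def small_mutation(circuit, rng=random):
    """'Small piece': add one inverter on a randomly chosen signal path."""
    return wrap_in_not(circuit, rng.choice(list(paths(circuit))))


c = ("AND", ("IN", 0), ("IN", 1))
print(sorted(paths(c)))    # [(), (1,), (2,)]
print(wrap_in_not(c, ()))  # ('NOT', ('AND', ('IN', 0), ('IN', 1)))
print(small_mutation(c, random.Random(0)))
```

The empty path `()` is the output, so the output inverter is always one of the candidate moves.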
You also need to define what counts as "large pieces", and this requires graph surgery. That raises the question of whether the system must "understand" the circuit at all, or whether random exploration is the path to enlightenment.
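For "large pieces", one common stand-in is subtree replacement from genetic programming: cut out a random part of the circuit and splice in a freshly generated random one. This is a swapped-in technique, not necessarily what you mean by graph surgery. The tree representation is also an assumption (a real netlist is a DAG whose fan-out a tree cannot share), and all function names are hypothetical.

```python
import random

GATES = ("AND", "OR", "XOR")


def random_circuit(n_inputs, depth, rng):
    """Grow a random gate tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return ("IN", rng.randrange(n_inputs))
    if rng.random() < 0.2:
        return ("NOT", random_circuit(n_inputs, depth - 1, rng))
    op = rng.choice(GATES)
    return (op,
            random_circuit(n_inputs, depth - 1, rng),
            random_circuit(n_inputs, depth - 1, rng))


def paths(circuit, prefix=()):
    """Path (tuple of child indices) to every node, root first."""
    yield prefix
    if circuit[0] != "IN":
        for i in range(1, len(circuit)):
            yield from paths(circuit[i], prefix + (i,))


def replace_at(circuit, path, new):
    """Return a copy of circuit with the subtree at path replaced by new."""
    if not path:
        return new
    children = list(circuit)
    children[path[0]] = replace_at(circuit[path[0]], path[1:], new)
    return tuple(children)


def large_mutation(circuit, n_inputs, rng, depth=3):
    """'Large piece': cut out a random subtree and splice in a new random one."""
    cut = rng.choice(list(paths(circuit)))
    return replace_at(circuit, cut, random_circuit(n_inputs, depth, rng))


c = ("AND", ("IN", 0), ("IN", 1))
print(large_mutation(c, n_inputs=2, rng=random.Random(1)))
```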
Again, how do humans explore new situations? They make changes and see what happens. That means they need a grasp of the inputs and the outputs.
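The "make changes and see what happens" loop can be sketched as a random-mutation hill climber: try one small random change, score it against the full truth table, and keep it if the score does not drop. The representation, the two edit types (insert an inverter, swap a gate type), and every function name here are assumptions for illustration.

```python
import random
from itertools import product

# Hypothetical tree representation: ("IN", i), ("NOT", x), or (gate, a, b).
OPS = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "XOR": lambda a, b: a ^ b}


def evaluate(c, row):
    if c[0] == "IN":
        return row[c[1]]
    if c[0] == "NOT":
        return 1 - evaluate(c[1], row)
    return OPS[c[0]](evaluate(c[1], row), evaluate(c[2], row))


def score(c, target, n):
    """Fraction of truth-table rows the circuit gets right."""
    rows = list(product((0, 1), repeat=n))
    return sum(evaluate(c, r) == target(*r) for r in rows) / len(rows)


def paths(c, prefix=()):
    yield prefix
    if c[0] != "IN":
        for i in range(1, len(c)):
            yield from paths(c[i], prefix + (i,))


def edit_at(c, path, fn):
    """Return a copy of c with the node at path replaced by fn(node)."""
    if not path:
        return fn(c)
    parts = list(c)
    parts[path[0]] = edit_at(c[path[0]], path[1:], fn)
    return tuple(parts)


def mutate(c, rng):
    """One small random change: swap a 2-input gate type, or insert an inverter."""
    def change(node):
        if node[0] in OPS and rng.random() < 0.5:
            return (rng.choice(list(OPS)),) + node[1:]
        return ("NOT", node)
    return edit_at(c, rng.choice(list(paths(c))), change)


def hill_climb(start, target, n, steps, seed=0):
    """Try a change, see what happens, keep it if the score does not drop."""
    rng = random.Random(seed)
    best, best_score = start, score(start, target, n)
    for _ in range(steps):
        cand = mutate(best, rng)
        s = score(cand, target, n)
        if s >= best_score:
            best, best_score = cand, s
        if best_score == 1.0:
            break
    return best, best_score


nand = lambda a, b: 1 - (a & b)
print(hill_climb(("AND", ("IN", 0), ("IN", 1)), nand, 2, steps=200))
```

Starting from a candidate that scores 0%, every change is accepted, so the early steps amount to a random walk; whether and when it reaches 100% depends on the seed.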
When you realized that adding just one inverter, at the output, would be the next and final step, how did you, the human, reach that conclusion? Likely you explored all possible changes, using your vision to examine a logic table. The table, together with your ability to spot useful patterns (and you have trained your brain to recognize what is "useful" in digital systems), led to the realization: "Oh. Just add an output inverter."
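The pattern you spotted can be written down as a rule over the table. This sketch models circuits as plain Python functions (an assumption) and uses a hypothetical `diagnose` helper: it compares candidate and target row by row, and if every row of a single-output table is wrong, the fix is an output inverter.

```python
from itertools import product


def truth_table(f, n_inputs):
    """Map every input row to the function's output bit."""
    return {row: f(*row) for row in product((0, 1), repeat=n_inputs)}


def diagnose(candidate, target, n_inputs):
    """Compare the two tables row by row and name the obvious fix, if any."""
    got = truth_table(candidate, n_inputs)
    want = truth_table(target, n_inputs)
    wrong = [row for row in want if got[row] != want[row]]
    if not wrong:
        return "already correct"
    if len(wrong) == len(want):
        # Single output bit wrong on every row: inverting it fixes all rows.
        return "every row is inverted: add an output inverter"
    return f"{len(wrong)} of {len(want)} rows wrong: {wrong}"


nand = lambda a, b: 1 - (a & b)
and_gate = lambda a, b: a & b
print(diagnose(and_gate, nand, 2))  # every row is inverted: add an output inverter
```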
How did your brain get there? By breadth-first, depth-first, or random search.
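As an illustration of the breadth-first option, this sketch tries every one-edit change before any two-edit change and stops at the first circuit that matches the whole truth table. The tuple representation, the names, and the two edit types (add one inverter, or swap one gate) are assumptions. Starting from an AND gate with a NAND target, it finds the output inverter at depth 1.

```python
from collections import deque
from itertools import product

# Hypothetical representation: ("IN", i), ("NOT", x), or (gate, a, b).
GATES = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "XOR": lambda a, b: a ^ b}


def evaluate(c, row):
    if c[0] == "IN":
        return row[c[1]]
    if c[0] == "NOT":
        return 1 - evaluate(c[1], row)
    return GATES[c[0]](evaluate(c[1], row), evaluate(c[2], row))


def is_solution(c, target, n):
    """True if c matches the target on every row of the full truth table."""
    return all(evaluate(c, r) == target(*r) for r in product((0, 1), repeat=n))


def paths(c, prefix=()):
    """Path (tuple of child indices) to every node, root first."""
    yield prefix
    if c[0] != "IN":
        for i in range(1, len(c)):
            yield from paths(c[i], prefix + (i,))


def node_at(c, path):
    for i in path:
        c = c[i]
    return c


def edit_at(c, path, fn):
    """Return a copy of c with the node at path replaced by fn(node)."""
    if not path:
        return fn(c)
    parts = list(c)
    parts[path[0]] = edit_at(c[path[0]], path[1:], fn)
    return tuple(parts)


def neighbours(c):
    """Every circuit one edit away: one inverter added, or one gate swapped."""
    for p in paths(c):
        yield edit_at(c, p, lambda node: ("NOT", node))
        op = node_at(c, p)[0]
        if op in GATES:
            for g in GATES:
                if g != op:
                    yield edit_at(c, p, lambda node, g=g: (g,) + node[1:])


def bfs(start, target, n, max_depth=2):
    """Try all 1-edit changes, then all 2-edit changes, and so on."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        c, depth = queue.popleft()
        if is_solution(c, target, n):
            return c, depth
        if depth < max_depth:
            for nb in neighbours(c):
                if nb not in seen:
                    seen.add(nb)
                    queue.append((nb, depth + 1))
    return None


nand = lambda a, b: 1 - (a & b)
print(bfs(("AND", ("IN", 0), ("IN", 1)), nand, 2))
# (('NOT', ('AND', ('IN', 0), ('IN', 1))), 1)
```

Changing `popleft()` to `pop()` makes the frontier a stack, which gives a depth-first variant; picking neighbours at random instead of exhaustively gives the random-search variant.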
In RF design, where there are numerous specs to meet (fidelity, power consumption, filters used), the heuristics can be much more flexible.
In logic design, by contrast, one missing inverter can impair the entire design.