Third try - Score policy
In part 1, I played the 2048 game by always moving in a preferred direction, and I called that a direction policy algorithm. Another simple way of choosing the direction is to look at the score each possible move would add, and to choose the direction with the highest score difference. Here's an example of what the score differences for a move in each direction look like. The difference is always the same for left and right, and always the same for up and down, because the same tiles merge along a row or column whichever way it is pushed.
So we still don't know which direction to move in - there are always at least two equally "good" moves.
We can solve that problem by again applying a direction policy (preferring one direction over the other). For example, prefer left over right, and up over down.
But there's still another problem: what should I do if there's no difference in score between moving horizontally and moving vertically?
I will overcome that problem by applying another direction policy, which is only taken into consideration when the score difference gives no hint whether to move horizontally or vertically.
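To make the tie-breaking concrete, here is a minimal Python sketch of such a score policy. It is not the code used for the experiments in this post. It assumes the board is a 4x4 list of lists with 0 for empty cells, and it folds both tie-breaking rules into a single priority ordering of the four directions. The names `merge_row_left`, `move` and `choose_direction` are made up for illustration, and skipping moves that leave the board unchanged is my own assumption.

```python
def merge_row_left(row):
    """Slide one row to the left, merging equal neighbours once.
    Returns the new row and the score gained by the merges."""
    tiles = [t for t in row if t != 0]
    merged, gain, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            gain += tiles[i] * 2
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), gain


def move(board, direction):
    """Return (new_board, score_gain) for 'left', 'right', 'up' or 'down'."""
    vertical = direction in ("up", "down")
    reverse = direction in ("right", "down")
    rows = [list(col) for col in zip(*board)] if vertical else [list(r) for r in board]
    new_rows, total = [], 0
    for row in rows:
        line = row[::-1] if reverse else row
        merged, gain = merge_row_left(line)
        new_rows.append(merged[::-1] if reverse else merged)
        total += gain
    if vertical:
        new_rows = [list(col) for col in zip(*new_rows)]  # transpose back
    return new_rows, total


def choose_direction(board, priority):
    """Pick the move with the highest score gain.
    Ties go to whichever direction comes first in `priority`.
    Moves that do not change the board are skipped (an assumption)."""
    best, best_gain = None, -1
    for direction in priority:
        new_board, gain = move(board, direction)
        if new_board == board:
            continue
        if gain > best_gain:  # strict '>' keeps the earlier direction on ties
            best, best_gain = direction, gain
    return best


board = [
    [2, 2, 0, 0],
    [0, 0, 0, 0],
    [4, 0, 0, 0],
    [4, 0, 0, 0],
]
print(choose_direction(board, ("left", "up", "right", "down")))
```

On this example board, left and right each gain 4 points and up and down each gain 8, so the vertical moves win and the priority ordering only decides between up and down.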
This gives us even more possible combinations than the original direction policy: 24 to be exact (the factorial of 4, also written 4!).
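One way to arrive at that count, assuming each combination is a full priority ordering of the four directions, is to enumerate the permutations. This is only a sketch of the counting, and the names `DIRECTIONS` and `policies` are made up for illustration.

```python
from itertools import permutations

DIRECTIONS = ("left", "right", "up", "down")

# Every ordering of the four directions is one candidate policy.
policies = list(permutations(DIRECTIONS))

print(len(policies))  # 24, i.e. 4!
print(policies[0])    # ('left', 'right', 'up', 'down')
```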
I tested all of them, and here are some of the results:
The worst one scored 1777.27 on average, and the best one scored 4206.12. Interestingly, no policy was worse than random. And the best one was almost 4 times as good as random guessing, just by looking at 2 possible next boards (one for horizontal and one for vertical) on each move.
Here you can see how the direction probabilities change with rank (higher rank = higher score). The blue line represents the most chosen direction, the orange one the second most chosen, the yellow one the third most chosen, and the green one the least chosen.
Here is the Score over Rank:
Although I managed to get a score 4 times as high as random play, it is still far away from the 2048 tile, which appears at a score of around 20000. I could hand-code further rules, make the algorithm longer and longer, and test more and more combinations, but that's boring and time-consuming. In my next post, I will disclose what I did about that.