Andreas' Blog: März 2016

Third try - score Policy

In part 1, I played the 2048 game by moving into a preferred direction, and i called it a direction policy algorithm. Another simple way of choosing the direction to move in is to look at the scores a move in each direction will result in, and choose the direction with the highest score difference.

Here's an example of what the score differences for a move in each direction look like. It's obvious that it will always be the same for left and right and for up and down.
So we still don't know which direction to move in - there are always at least two equal "good" moves.
We can solve that problem by again, applying a direction policy (moving rather into one direction than into the other). For example, prefer left over right, and up over down.
But still, there's another problem: what shall i do if there's no difference in score between horizontal and vertical moving?
I will overcome that problem by applying another direction policy, which will only be taken into consideration when the score difference gives us no hint whether we should move horizontal or vertical.
This gives us even more different possible combinations than the original direction policy, 24 to be exactly (faculty of 4, also known as 4!).

I tested all of them, and here are some of the results:

The worst one scored 1777.27 in avg., and the best one scored 4206.12 . Interestingly, there was no policy which was worse than random. And the best was almost 4 times as good as random guessing, just by looking at 2 possible next boards (one for horizontal and one for vertical) each move.

Here you can see the change of direction probabilities over each rank (higher rank = higher score), the blue line represents the most chosen direction, the orange one the second most chosen direction, the yellow one the third most chosen direction, and the green line the least chosen direction.

Here is the Score over Rank:

I would love to paste my excel sheet with all the data in here, but the formatting gets weird when i do that.

Although i managed to get a score 4 times as high as random playing, it is still far away from the 2048 tile, which occurs at around 20000. And i could further hand-code rules and make the algorithm longer and longer, and test more and more combinations, but thats boring and time-consuming. In my next post, i will disclose what i did about that.

I am Andreas Faust, a Mechatronics Apprentice at Balluff GmbH. In my free time, i fool around in C++ with genetic algorithms and neural nets. I will post some of the stuff i coded so far and will post stuff that i am working on right now.
I got into the thematics one year ago when i was trying to figure out a way to create an algorithm that plays the famous game 2048.
It took me a long time to create a reasonable good algorithm, and my best solution so far is not a neural net, but rather a kind of brute-force tree search, but i will talk about that later on.

First Try: Random Playing

The simplest algorithm one could think of would be one that chooses moves randomly. It also gives a reference for all other, more advanced algorithms.
Here is the console output for 1000000 random games:

Playing 1000000 games...
avg score: 1131.2

So, the average score of playing 2048 with random moves is about 1130. This gives a frame of reference for all other tries.

Second Try: Playing with a direction policy

A very forward attempt would be an algorithm that has an order of preferred directions. This is very simple to implement. In pseudo-code, it would be something like this:

if you can move left, move left,
else
if you can move up, move up,
else
if you can move right, move right,
else
if you can move down, move down,
else
you lost.

Playing 6000 games...
avg score: 2251.3

As you can see, the average score nearly doubled. The reason i "only" let it play 6k games was that its enough for the average to stabilize.

Remember, the algorithm doesn't look at the Board at all, it only has a policy which it follows strictly. Policies are very important in decision-making algorithms. Here, our policy is hand-coded, but there are algorithms out there which can learn their own policies.

Very interesting is also the distribution of directions it moved in. Since it can't always move left, it has to move up sometimes, and when neither can move up nor left, it has to move right. The same applies for moving down.

Here is the distribution plotted:

The numbers were as follows:

1 left: 910310 74.862 %
2 up:    251313 20.667 %
3 right: 54236 4.460 %
4 down: 115       0.009 %

total:   1215974 moves

Second try the second: another direction policy

There are 3 possible direction policies which will behave differently. If we look at the pseudo-code again, we can exchange left with right and up with down and it will be still the same policy, since we can just look from another angle onto the Board and left will change places with right and up with down.
So, left and right are equal and up and down are equal. I will now abbreviate vertical moves with v and horizontal ones with h .
So far we had the order of vhvh which includes hvhv as equivalent, since we can just turn the board by 90 degrees and get the vhvh again.
But what about vhhv ? It includes hvvh because of the 90 degree thingy.
Another one would be vvhh/hhvv .
So all three possible different direction policies are:
vhvh, vhhv and vvhh
And here are all their different average scores over 6000 games:

vhvh 2251.3
vhhv 2545.2
vvhh 782.3

The results are quite interesting. Apparently, the best direction policy is vhhv with a small advantage over vhvh . But what i think is even more interesting, is, that vvhh performs even worse than random playing (avg. 1130). It's a good policy to end the game quickly.

Here are the different methods plotted:

And here with percentage of moves:

Andreas' Blog

Montag, 14. März 2016

Algorithm solving 2048 - Part 2

Third try - score Policy

Donnerstag, 10. März 2016

Algorithm solving 2048 - Part 1

First Try: Random Playing

Second Try: Playing with a direction policy

Second try the second: another direction policy

Über mich