I am Andreas Faust, a Mechatronics Apprentice at Balluff GmbH. In my free time, I fool around in C++ with genetic algorithms and neural nets. I will post some of the stuff I have coded so far, as well as things I am working on right now.
I got into this topic a year ago, when I was trying to figure out a way to create an algorithm that plays the famous game 2048.
It took me a long time to create a reasonably good algorithm. My best solution so far is not a neural net but rather a kind of brute-force tree search; I will talk about that later on.
First Try: Random Playing
The simplest algorithm one could think of is one that chooses moves at random. It also gives a baseline for all other, more advanced algorithms. Here is the console output for 1000000 random games:
Playing 1000000 games...
avg score: 1131.2
So, the average score of playing 2048 with random moves is about 1130. This gives a frame of reference for all other tries.
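Here is a minimal sketch of what such a simulation can look like in C++. This is a simplified illustration, not the exact code I used for the numbers above; the move mechanics and the 90/10 odds for spawning a 2 or a 4 follow the standard game rules:

#include <array>
#include <cstdio>
#include <random>
#include <utility>

// Minimal 2048 simulation. Directions: 0 = left, 1 = up, 2 = right, 3 = down.
struct Board {
    std::array<std::array<int, 4>, 4> g{};  // tile values, 0 = empty
    long long score = 0;
    std::mt19937 rng{12345};

    void spawn() {  // place a 2 (90 %) or a 4 (10 %) on a random empty cell
        std::array<std::pair<int, int>, 16> empty;
        int n = 0;
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c)
                if (g[r][c] == 0) empty[n++] = {r, c};
        if (n == 0) return;
        auto [r, c] = empty[std::uniform_int_distribution<int>(0, n - 1)(rng)];
        g[r][c] = std::uniform_int_distribution<int>(0, 9)(rng) == 0 ? 4 : 2;
    }
    void reset() { g = {}; score = 0; spawn(); spawn(); }

    // Slide one line towards index 0, merging equal neighbours once.
    bool slideLine(std::array<int, 4>& line) {
        std::array<int, 4> out{};
        int i = 0, last = 0;
        for (int v : line) {
            if (v == 0) continue;
            if (v == last) { out[i - 1] = 2 * v; score += 2 * v; last = 0; }
            else           { out[i++] = v; last = v; }
        }
        bool changed = out != line;
        line = out;
        return changed;
    }
    // Address cell j of row/column k, read in the right order for direction dir.
    int* cell(int dir, int k, int j) {
        switch (dir) {
            case 0:  return &g[k][j];      // left:  row k, left to right
            case 2:  return &g[k][3 - j];  // right: row k, right to left
            case 1:  return &g[j][k];      // up:    column k, top to bottom
            default: return &g[3 - j][k];  // down:  column k, bottom to top
        }
    }
    // Apply a move: slide every row/column. Returns true if anything changed.
    bool slide(int dir) {
        bool changed = false;
        for (int k = 0; k < 4; ++k) {
            std::array<int, 4> line;
            for (int j = 0; j < 4; ++j) line[j] = *cell(dir, k, j);
            changed |= slideLine(line);
            for (int j = 0; j < 4; ++j) *cell(dir, k, j) = line[j];
        }
        return changed;
    }
    bool canMove(int dir) { Board copy = *this; return copy.slide(dir); }
    void move(int dir)    { if (slide(dir)) spawn(); }
    bool gameOver()       { for (int d = 0; d < 4; ++d) if (canMove(d)) return false; return true; }
};

int main() {
    const int kGames = 1000000;  // lower this for a quick test run
    Board b;
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick(0, 3);
    long long total = 0;
    std::printf("Playing %d games...\n", kGames);
    for (int game = 0; game < kGames; ++game) {
        b.reset();
        while (!b.gameOver())
            if (int d = pick(rng); b.canMove(d)) b.move(d);  // reroll illegal moves
        total += b.score;
    }
    std::printf("avg score: %.1f\n", double(total) / kGames);
}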
Second Try: Playing with a direction policy
A very straightforward attempt is an algorithm with a fixed order of preferred directions. This is very simple to implement. In pseudo-code, it would be something like this:
if you can move left, move left,
else
if you can move up, move up,
else
if you can move right, move right,
else
if you can move down, move down,
else
you lost.
Playing 6000 games...
avg score: 2251.3
As you can see, the average score nearly doubled. The reason I "only" let it play 6000 games is that this is enough for the average to stabilize.
Remember, the algorithm doesn't look at the board at all; it only has a policy which it follows strictly. Policies are very important in decision-making algorithms. Here, our policy is hand-coded, but there are algorithms out there that can learn their own policies.
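Reusing the Board sketch from the random-play example above, the policy loop itself is only a few lines. This version also tallies how often each direction is used, which produces the distribution I discuss next:

// Assumes the Board struct from the random-play sketch above.
#include <cstdio>

int main() {
    const int kGames = 6000;
    const int policy[4] = {0, 1, 2, 3};  // preference: left, up, right, down
    const char* names[4] = {"left", "up", "right", "down"};
    long long total = 0, moves[4] = {};
    Board b;
    std::printf("Playing %d games...\n", kGames);
    for (int game = 0; game < kGames; ++game) {
        b.reset();
        while (!b.gameOver())
            for (int d : policy)  // take the first legal move in preference order
                if (b.canMove(d)) { b.move(d); ++moves[d]; break; }
        total += b.score;
    }
    std::printf("avg score: %.1f\n", double(total) / kGames);
    long long all = moves[0] + moves[1] + moves[2] + moves[3];
    for (int d = 0; d < 4; ++d)
        std::printf("%-6s %8lld %7.3f %%\n", names[d], moves[d], 100.0 * moves[d] / all);
}

The other orderings tried further down come from simply permuting the policy array.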
Also very interesting is the distribution of directions it moved in. Since it can't always move left, it sometimes has to move up, and when it can move neither up nor left, it has to move right. The same applies to moving down.
Here is the distribution plotted:
[plot: number of moves per direction]
The numbers were as follows:
1 left: 910310 74.862 %
2 up: 251313 20.667 %
3 right: 54236 4.460 %
4 down: 115 0.009 %
total: 1215974 moves
Second Try, the Second: Another direction policy
There are three direction policies that actually behave differently. If we look at the pseudo-code again, we can exchange left with right and up with down, and it is still the same policy: we are just looking at the board from another angle, where left changes places with right and up with down.
So left and right are interchangeable, and so are up and down. I will now abbreviate vertical moves with v and horizontal ones with h.
So far we had the order vhvh, which includes hvhv as equivalent, since we can just turn the board by 90 degrees and get vhvh again.
But what about vhhv? It includes hvvh by the same 90-degree argument.
Another one would be vvhh, which likewise includes hhvv.
So the three genuinely different direction policies are:
vhvh, vhhv and vvhh
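To double-check that these really are the only classes, a small self-contained program can enumerate all 24 preference orderings and canonicalize each one under the eight symmetries of the square (four rotations, four reflections). A sketch:

#include <algorithm>
#include <array>
#include <cstdio>
#include <set>
#include <string>

// Directions are encoded so that a 90-degree rotation is d -> (d + 1) % 4
// and the four reflections are d -> (k - d) % 4 for k = 0..3:
// 0 = left, 1 = up, 2 = right, 3 = down.
int main() {
    const char* names[4] = {"left", "up", "right", "down"};
    std::array<int, 4> order = {0, 1, 2, 3};
    std::set<std::string> classes;
    do {
        // Canonical form: the lexicographically smallest relabeling of this
        // ordering under the 8 board symmetries.
        std::string canon;
        for (int k = 0; k < 4; ++k)
            for (int refl = 0; refl < 2; ++refl) {
                std::string img;
                for (int d : order)
                    img += char('0' + (refl ? (k - d + 4) % 4 : (d + k) % 4));
                if (canon.empty() || img < canon) canon = img;
            }
        classes.insert(canon);
    } while (std::next_permutation(order.begin(), order.end()));

    std::printf("distinct policies up to symmetry: %zu\n", classes.size());
    for (const std::string& c : classes) {
        for (char ch : c) std::printf("%s ", names[ch - '0']);
        std::printf("\n");
    }
}

It prints exactly 3 distinct policies, confirming the counting argument above.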
And here are their average scores over 6000 games each:
vhvh 2251.3
vhhv 2545.2
vvhh 782.3
The results are quite interesting. Apparently, the best direction policy is vhhv, with a small advantage over vhvh. But what I find even more interesting is that vvhh performs even worse than random play (avg. 1130). It's a good policy for ending the game quickly.
Here are the different methods plotted:
[plot: average score of each method]
And here with percentage of moves:
[plot: percentage of moves per direction for each method]