## Board/Card Game AI - Questions concerning state/action space - Deep Reinforcement Learning


OK, I now know how a machine can learn to play Atari games (Breakout): Playing Atari with Deep Reinforcement Learning

With the same technique it is even possible to play FPS games (Doom): Playing FPS Games with Deep Reinforcement Learning

Further studies even investigated multiagent scenarios (Pong): Multiagent Cooperation and Competition with Deep Reinforcement Learning

And here is another great article in the context of deep reinforcement learning (accessible, and a must-read for beginners): Demystifying Deep Reinforcement Learning

I was thrilled by these results and immediately wanted to try them in some simple "board/card game scenarios", i.e. writing AIs for some simple games in order to learn more about "deep learning". Of course, it was naive to think I could apply the techniques above directly to my scenarios. All the examples above are based on convolutional nets (image recognition) and some other assumptions, which might not be applicable in my scenarios.

Can you give me hints or further articles that deal with my questions below? As a beginner, I do not have an overview yet. Preferably, your suggestions should be connected to the following areas: deep learning, reinforcement learning (and possibly multiagent systems).

(1)

If you have a card game and the AI shall play a card from its hand, you could think of the cards (among other things) as the current game state. You can easily define some sort of neural net and feed it the card data; in a trivial case the cards are just numbered. I do not know which net type would be suitable, but I guess deep reinforcement learning strategies could then be applied easily.

However, I can only imagine this working if there is a constant number of hand cards. In the examples above, the number of pixels is also constant, for instance. What if a player can have different numbers of cards? What if a player can have an infinite number of cards? Of course, that last part is just a theoretical question, as no game has an infinite number of cards.

(2)

In the initial examples, the action space is constant. What can you do if it is not? This more or less follows from my previous problem: if you have 3 cards, you can play card 1, 2, or 3; if you have 5 cards, you can play card 1, 2, 3, 4, or 5; and so on. It is also common in card games that playing a particular card is not allowed. Could this be tackled with a negative reward?

So, which "tricks" can be used, e.g. always assuming a constant number of cards with "filler values"? That only works in the finite case (and is unrealistic anyway; even humans could not play well that way). Are there articles that already examine such things?

For many cases you don't have to build everything from scratch; instead you can use OpenAI Gym – Eka – 2016-10-27T02:31:18.933

I see this as a framework for many algorithms. Which one should I take? How does it solve my problem with the action space? What is behind the observation space, and how does it provide an infinite state space with deep learning, i.e. with which algorithm? I would use frameworks, of course; however, my question is also about the theoretical background. You should understand a little bit about the technology, for example in order to specify a meaningful action space. – Stefe Klauou – 2016-10-27T06:54:48.973



Instead of having the AI learn which action to take, you can alternatively train it to judge how "good" a position is. To determine which move to make, you don't ask the AI "This is the current state; what move should I make?" Instead, you iterate through all possible moves and feed each resulting state into the AI, asking "How good do you think this new state is?" You then choose the move whose resulting state the AI liked best. (You could probably even combine this with a traditional minimax approach.) I'm new to this area myself, but I'd guess you would use this approach when the action space is large, and in particular when most possible actions are not legal options in most states.
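This "afterstate evaluation" idea can be sketched in a few lines. The state representation, the `legal_moves`/`apply` helpers, and the random stand-in for a trained value network below are all illustrative assumptions, not a real API:

```python
import random

def evaluate(state):
    """Stand-in for a trained value network: returns a scalar
    'goodness' score for a state. Hypothetical placeholder."""
    return random.random()

def legal_moves(state):
    """Game-specific: enumerate the moves allowed in `state`.
    Here, any card in hand may be played."""
    return state["hand"]

def apply_move(state, move):
    """Game-specific: return the state resulting from playing `move`."""
    return {
        "hand": [c for c in state["hand"] if c != move],
        "played": state["played"] + [move],
    }

def choose_move(state):
    # Score every afterstate and pick the move whose resulting
    # state the evaluator likes best.
    return max(legal_moves(state), key=lambda m: evaluate(apply_move(state, m)))

state = {"hand": [3, 7, 12], "played": []}
print(choose_move(state) in state["hand"])  # True
```

Note that the network only ever sees one state at a time, so the variable number of moves never enters the network's input or output dimensions.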


1. Filler values are totally fine. In the case of image recognition, the filling is the background of the image (examples). For instance, in Belot you have a total of 32 cards, which can be 32 boolean features: set the ones the player holds to 1 and the rest to 0. Note that in most games you'll need more features than just the cards in your hand, e.g. the number of the round, the cards that have been played so far, the calls that have been made, etc.
2. Defining the scope of the "action space" will be specific to the game. For Belot, it can be a number encoding for each of the 32 cards.
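The boolean encoding from point 1 can be sketched as follows; the card numbering is an arbitrary assumption (any fixed mapping of the 32 Belot cards to indices works):

```python
NUM_CARDS = 32  # total cards in a Belot deck

def encode_hand(hand_ids):
    """Fixed-length boolean encoding: feature i is 1 iff the player
    holds card i. The hand size can vary; the vector length cannot."""
    features = [0] * NUM_CARDS
    for card in hand_ids:
        features[card] = 1
    return features

# A 3-card hand and an 8-card hand both produce 32 features.
print(sum(encode_hand([0, 5, 31])))  # 3
```

Extra features (round number, cards played so far, calls made) would simply be concatenated onto this vector, keeping the input size constant.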

You can find articles via Google; here is a paper about ML for a card game. Instead of articles, I'd recommend checking out a course on ML (e.g. Coursera and Udacity have good free online courses).

Thanks already. You are right concerning (2); at least, it should be the best way for most cases. However, I think it is probably not the perfect way for some specific scenarios, and I guess I have one in my first example. In my game, (as a human) I would say that playing number "33" is best. If I don't have it, then number "44"; if I don't have that either, etc. This is because of side effects in the game mechanics. For sure, creating heuristics for the current cards in hand would help a lot, but isn't that like creating part of the AI already? So this would not be a generic solution? – Stefe Klauou – 2016-10-27T07:06:03.423

I had something a bit different in mind, but I messed it up trying to generalize it. I'll edit the answer with a suggestion that's fitting to some card games, but not an option for others. – Iliyan Bobev – 2016-10-27T11:14:01.567

Yes, the action space must be defined per game. However, I am wondering about a general strategy to handle my question. For Belot, 32 actions can be specified, but is that all? Do I have to care whether a specific action can be taken or not (e.g. whether the card is currently available)? I can think of two scenarios: (a) allow it, but use a negative reward and no state change; (b) do not allow it, and retry until something useful can be done. There might be more strategies, and I don't know which is best concerning performance, training quality, etc. Hence my question! – Stefe Klauou – 2016-11-02T08:34:26.443

The rules of the game should be enforced by a separate mechanism. You have to optimize the choice over the available set of actions, not over all encoded actions. So it's option (b), but instead of "retry", I'd say "filter": take the optimum from the available actions. – Iliyan Bobev – 2016-11-02T13:29:35.047
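This "filter" option is often called action masking. A minimal sketch, assuming the network outputs one Q-value per encoded action and the game engine supplies the currently legal action indices:

```python
def best_legal_action(q_values, legal_actions):
    """Option (b) as 'filter': ignore the Q-values of actions the
    rules forbid and take the argmax over the remaining ones."""
    return max(legal_actions, key=lambda a: q_values[a])

q = [0.1, 0.9, 0.3, 0.7]   # hypothetical network output for 4 encoded actions
legal = [0, 2, 3]          # action 1 is not allowed in the current state
print(best_legal_action(q, legal))  # 3
```

No retrying and no negative reward for illegal moves is needed: the agent simply never selects them, because the mask is applied outside the network.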


Considering your use case, I would not use Deep Learning methods... what is the point?

Instead of just winning, a good AI is fun to play with. In practice, when fine-tuning game mechanics, you will want to analyze the game for churning events. Then it would be nice if you could tell the AI, "Hey, this is messed up; could you come up with a nicer way of playing when this situation happens?", and the AI would reply, "Okay, sure, I didn't know that me winning all the time was not what humans considered fun... I'll be more fun next time, while still trying to win."

Lately I have been toying around with Computational Creativity and specifically Partial Order Causal Link planners (POCL) and Agents.

POCL planners attempt to create plans that fulfill goals; this makes them computationally efficient, as they only need to repair a flaw in a goal (having the best possible cards on the table) and iterate back towards the initial condition (specific cards on the table, some cards in hand, etc.). I believe that with conflict-driven POCL you could easily introduce bluffing. I have written a POCL algorithm in a declarative way, so you don't have to code the action space; instead, the actions are configured using modal logic.
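The goal-regression core of such planners can be illustrated with a toy backward-chaining sketch. This is only the flavor of POCL, not a real one (no causal links, ordering constraints, or conflict resolution), and the two card-game actions are invented for illustration:

```python
# Toy backward-chaining sketch in the POCL spirit: start from an open
# goal (a "flaw"), pick an action whose effects achieve it, and add that
# action's preconditions as new open goals, iterating back towards the
# initial condition. All action names here are illustrative.
actions = {
    "play_trump": {"pre": {"hold_trump"}, "eff": {"best_card_on_table"}},
    "draw_card":  {"pre": set(),          "eff": {"hold_trump"}},
}

def plan(initial, goal):
    open_goals = [goal]
    steps = []
    while open_goals:
        g = open_goals.pop()
        if g in initial:
            continue  # already satisfied by the initial condition
        for name, a in actions.items():
            if g in a["eff"]:
                steps.append(name)          # this action repairs the flaw
                open_goals.extend(a["pre"]) # its preconditions become flaws
                break
        else:
            return None  # no action achieves g: planning fails
    steps.reverse()  # we chained backwards, so reverse into execution order
    return steps

print(plan(set(), "best_card_on_table"))  # ['draw_card', 'play_trump']
```

A real POCL planner additionally records which step supports which precondition (the causal links) and resolves threats between steps, which is where the conflict-driven bluffing mentioned above would hook in.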

Then you would have agents that use the plan artifacts generated by the POCL algorithm in order to play in a fun way (the evaluation function of the agent), while also trying to win (searching towards the best odds of winning). The fun thing about agents is that you can compose them and discover personalities easily; I have no idea how deep learning methods would provide that as easily.

So, by using POCL and agents, you could first teach the agents to win efficiently, or optimize the plans to provide good "basic moves" using some heuristic system (as you would when using reinforcement learning). I don't know about the computational-complexity issues of specific games; however, context-aware POCL algorithms (i.e. with a reduced action space) have been implemented, so if you add a bit more strategic gameplay abstraction, POCL should be fine (remember to use some kind of damping factor to reduce the path length of plans, in a similar way to PageRank).

In all programming, a good mental model makes many things a lot easier. With deep learning here, you would be using image recognition or similar algorithms/methods to solve a different problem, because nobody prevents you from using the wrong tool for the problem at hand. In real games, there are players (agents), strategies (POCL plans), bluffing (POCL conflicts), and rules (the action space of POCL, defined by modal logic). Of course, some games might have computational-complexity issues; however, those are usually solvable by minor optimizations to the algorithms, which provide a good mental model of the problem.

Nice point. I'm looking to use a prior AI victory vs. a human as a switch for trying less-optimal-seeming play in the subsequent game. It seems to me that for game AI, my ultimate goal is human engagement. For a small subset of players, analogous to chess masters, engagement may be a function of an unbeatable AI, but for the vast majority of the human player base, losing every time is no kind of fun! In Mbrane, we made a point of designing weak, classical AIs that reason like humans. – DukeZhou – 2018-01-02T22:54:57.423