I've recently begun revisiting some old course material and putting it online. Some of it concerns basic game theory, e.g. one example on iterated removal of dominated strategies and another on mixed strategies. That made me think of a file where I've kept a few fun puzzles to use as examples.

One of these is on the so-called Braess's paradox, which I came across a few years ago while thinking about autonomous vehicles. Indeed Braess originally formulated it while studying traffic flows, though the phenomenon is more general, and pops up here and there in different kinds of network models. It is worth keeping in mind, especially for AI applications. The 'paradox' can be understood in terms of game theory as well, which is why I had some puzzle notes on it. I decided to dig up my notes and drop them in this blog post with a bit of extra text. Who knows, perhaps it will be revised further and upgraded to the tutorials section some day?

Imagine a large number of cars - \(N\) - going at rush hour every day from some location \(S\) to \(G\). Assume that these are driven by the usual self-interested and perfectly rational beings of Nash-equilibrium game theory we have seen before. (Also assume that there is no car pooling, so that drivers and cars coincide, if you are nit-picky about details.)

So, this is an \(N\) player game. Each driver has a choice of two routes: they can either choose to take the road from \(S\) to point \(a\), and then to \(G\), or they can go from \(S\) to point \(b\) to \(G\). So, there are two strategies: \(SaG\) and \(SbG\). Moreover, for some roads there is congestion, so that the cost of commuting (in, for example, time) depends on how many cars are using the road, while for others the cost is constant no matter how many cars are using that particular stretch.

Denote the costs associated with some road segment \(uv\) as \(t_{uv}\), and for the segments listed above associate the costs as in Figure 1 below. Namely

\begin{align*} &t_{Sa}(x) = \frac{x}{N}\\ &t_{aG} = 1\\ &t_{Sb} = 1\\ &t_{bG}(x) = \frac{x}{N}, \end{align*}where \(x \in [0,N]\) is the number of cars on that road.

In this case each driver wants to *minimize* their cost (keep this in mind, as often games are thought of as maximizing reward).

Now, the value (= total cost) for strategy \(SaG\) is \(v(SaG) = t_{Sa}(x) + t_{aG} = \frac{x}{N} + 1\), and similarly \(v(SbG) = t_{Sb} + t_{bG}(x) = 1 + \frac{x}{N}\).

As you probably can guess from the symmetry of the game and the two value functions, each of the \(N\) players will minimize their cost by picking a route randomly and with equal probability. This is because when \(x_a = \frac{N}{2}\) cars go over \(a\) and \(x_b = \frac{N}{2}\) over \(b\), then \[\left. v(SaG) \right|_{x = x_a} = \left. v(SbG) \right|_{x = x_b} = 1 + \frac{N}{2N} = 1.5.\]

If more than \(\frac{N}{2}\) cars take either \(Sa\) or \(bG\), then it is beneficial to change strategy, because the opposite route will be cheaper. So, the best thing to do is to randomize over the two strategies.
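This equilibrium is easy to sanity check numerically. A small Python sketch (the particular value of \(N\) is arbitrary):

```python
# Route costs in the two-route network, as functions of the number of
# cars x on the congested segment (out of N in total).
N = 1000

def cost_SaG(x):
    return x / N + 1          # t_Sa(x) + t_aG

def cost_SbG(x):
    return 1 + x / N          # t_Sb + t_bG(x)

# At the even split, both routes cost 1.5 ...
assert cost_SaG(N // 2) == cost_SbG(N // 2) == 1.5

# ... and a driver deviating from SaG to SbG only makes their new
# route more congested, so the deviation does not pay off.
assert cost_SbG(N // 2 + 1) > cost_SaG(N // 2)
```

By symmetry the same check rules out a deviation in the other direction.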

All good this far. But now the traffic authority builds a new super-highway-tunnel from location \(a\) to \(b\). It is really efficient and adds no extra cost to the drivers: \(t_{ab} = 0\), as in Figure 2.

This means that there is a new strategy available in addition to the previous two: \(SabG\). That is, first drive from \(S\) to \(a\) then to \(b\) and from there to \(G\).

The value function for the new strategy is quite straightforward: \(v(SabG) = \frac{x}{N} + 0 + \frac{x}{N} = \frac{2x}{N}\).

And, as you might have expected, the equilibrium shifts now that the game has changed. Assume that we start out as before, with \(x_a = \frac{N}{2}\) cars driving route \(Sa\) and the remaining \(\frac{N}{2}\) going \(Sb\). A driver on \(Sa\) is now better off continuing to \(b\): the rest of their trip costs only \(0 + \frac{x_{bG}}{N} \approx \frac{1}{2}\), while going straight \(aG\) costs \(1\). But once the \(Sa\) drivers reroute through the tunnel, segment \(bG\) carries all \(N\) cars, and now the players going \(Sb\) want to change as well: their total cost has risen to \(1 + \frac{N}{N} = 2\), while the drivers taking \(SabG\) pay only \(\frac{1}{2} + 0 + 1 = 1.5\).

So, randomly going either \(Sa\) or \(Sb\) is not a Nash equilibrium any more. More drivers will want to go to \(a\). How many?

*All of them.*

The new equilibrium is simply that every player chooses strategy \(SabG\). The associated cost is then \(v(SabG) = \frac{2N}{N} = 2\). You can check this by verifying that no other route will give a lower cost for an individual driver - it will only lower the cost for others. There is thus no incentive to change for a perfectly rational driver.

Note that even the individual cost in this equilibrium is more than before (2 > 1.5). Still, nobody wants to change.
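The no-incentive-to-change claim can be verified with a minimal sketch, letting one driver deviate from the all-\(SabG\) profile:

```python
N = 1000

def t(x):
    return x / N              # congestion cost on segments Sa and bG

# Everyone takes SabG: both congested segments carry all N cars.
cost_all_SabG = t(N) + 0 + t(N)
assert cost_all_SabG == 2.0

# A single deviator to SaG still shares Sa with all N cars, then pays
# the fixed cost 1 on aG; symmetrically for a deviation to SbG.
cost_deviate_SaG = t(N) + 1
cost_deviate_SbG = 1 + t(N)

# Neither deviation is strictly cheaper, so all-SabG is a Nash equilibrium.
assert cost_deviate_SaG >= cost_all_SabG
assert cost_deviate_SbG >= cost_all_SabG
```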

The above is an example of what is called *Braess's paradox* in the study of road networks and traffic flow. It might seem a bit contrived at first, but examples have been reported in real traffic: *closing* roads or highway lanes can actually improve traffic flow. More generally, adding resources to a network does not necessarily lead to better outcomes.

It is not quite a 'paradox'. In game theoretic terms it is an example of when the Nash equilibrium does not agree with other types of equilibria. Clearly everyone could be better off by (for example) agreeing not to use the new road, or say by allowing a central route planner to direct which road to use.

You might even ask if there is a correlated equilibrium - where each driver accepts *and respects* the outcome of some random process assigning them a route - leading to an even lower cost over time.

So, let's associate \(P(SaG) = p_1, P(SbG) = p_2, P(SabG) = p_3\), with \(p_1 + p_2 + p_3 = 1\). Given this distribution, let \(E[X_{Sa}]\) and \(E[X_{bG}]\) denote the expected number of cars on segments \(Sa\) and \(bG\), respectively.

Now we can write the expected commuting cost of a player as \(\bar{v} = p_1 v(SaG) + p_2 v(SbG) + p_3 v(SabG) = p_1 \left(\frac{E[X_{Sa}]}{N} + 1\right) + p_2 \left(1 + \frac{E[X_{bG}]}{N}\right) + p_3 \left(\frac{E[X_{Sa}]}{N} + 0 + \frac{E[X_{bG}]}{N}\right).\)

Noting that \(\frac{E[X_{Sa}]}{N} = p_1 + p_3\), and similarly \(\frac{E[X_{bG}]}{N} = p_2 + p_3\), we can write the above as:

\(\bar{v} = p_1 ( p_1 + p_3 + 1) + p_2 (1 + p_2 + p_3) + p_3 (p_1 + p_3 + p_2 + p_3) = (p_3 + p_1)^2 + (p_3 + p_2)^2 + p_1 + p_2.\)

Using, e.g. \(p_3 = 1 - p_2 - p_1\), this can be written as \(\bar{v} = 2 + p_1^2 + p_2^2 - p_1 - p_2\), which has a minimum at \(p_1 = p_2 = 0.5\). Thus, the correlated equilibrium is at \(p_1 = 0.5, p_2 = 0.5, p_3 = 0\).

That is, the best thing to do is to stop using the new road completely and let the traffic authority randomly assign routes - which coincidentally happens to be exactly the same as letting every driver randomize without the \(ab\) connection.
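The minimum can be confirmed with a brute-force search over the probability simplex (a quick Python sketch; the grid resolution is arbitrary):

```python
# Brute-force check that the expected cost
#   v = (p1 + p3)^2 + (p2 + p3)^2 + p1 + p2
# is minimised on the simplex at p1 = p2 = 0.5, p3 = 0.
steps = 200
best = None
for i in range(steps + 1):
    for j in range(steps + 1 - i):
        p1, p2 = i / steps, j / steps
        p3 = 1 - p1 - p2
        v = (p1 + p3) ** 2 + (p2 + p3) ** 2 + p1 + p2
        if best is None or v < best[0]:
            best = (v, p1, p2, p3)

v, p1, p2, p3 = best
assert abs(v - 1.5) < 1e-9
assert (p1, p2, p3) == (0.5, 0.5, 0.0)
```

The minimal expected cost is 1.5, the same as in the original network without the tunnel.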

I think Braess's paradox is a pretty neat example of how added choices or resources can result in less efficiency as a whole for systems of 'rational agents'.

It is related to more complex flow problems on graphs and networks, and shows up in applications other than traffic.

That said, I think this simple example might provide an intuition pump closer to home when discussing the future of autonomous vehicles. It is easy to imagine that many potential instances of this paradox have been dampened by human driver 'irrationality'. Choice of route can come down to scenery, joy of driving, simple overconfidence in knowing the 'best route', or many other things that make perfect sense when not thought about too much. But then, if everyone buys a car, wouldn't they want the 'best' car, taking the 'best' route - just like all the other cars? (Something already seen with GPS route planners in many places.)

But more on that, perhaps, some other time.

Sometimes it is not possible to find a Nash equilibrium using pure strategies, e.g. via iterated removal of dominated strategies. But in these cases there will (for finite games) be an equilibrium where one or more of the players mix their strategies.

To see what I mean, consider the following game, where the row player can choose between the strategies **U** and **D**, while the column player can choose between **L** and **R**. Each cell lists the row player's payoff first, then the column player's:

|       | **L**   | **R**   |
|-------|---------|---------|
| **U** | (2, -3) | (1, 2)  |
| **D** | (1, 1)  | (4, -1) |

There is no dominance between U and D, nor between L and R. In fact, for any cell in the matrix there is one player who will want to move away from it. (Say the strategy combination U-R is picked first. Now the column player is satisfied, but the row player thinks they could do better and changes to D; then the column player will want to change to L, so the row player changes back to U, and …)

But what if strategies could be mixed, in the sense that players pick one at random according to some distribution? Then, if the same game is played over and over, the players would want to do well on average, and tune the probabilities with which they choose between strategies accordingly.

Could the *expected* utility be such that there is a Nash equilibrium (that is, a state where no player would want to pick another of their strategies)?

The strategies U, D and L, R are called *pure* strategies, while a combination of them is called *mixed*. A mixed strategy is essentially a probability distribution over a player's pure strategies.

Let's assume that the row player chooses U with probability \(p\), then they must pick the other, D, with probability \(1-p\).

In the same way, assume that the column player picks L with probability \(q\) and R with probability \(1-q\).

Like this:

Our task then is to set up some kind of equation system and solve for \(q\) and \(p\).

The key observation is that *for a player to be indifferent between the available strategies, the expected utilities of those strategies must be equal*. Meaning that the row player should determine \(p\) such that their opponent (the column player) expects the same reward from both of their available strategies (L and R).

If this isn't the case, then there is always a 'better' choice for the opponent. So, *to determine the probability distribution for one player we look at which point the opposing player is indifferent*.

In this spirit, let's start by determining the probabilities \(q\) and \(1-q\) for the column player strategies. The above means that for the row player the following must hold \[E\left[U\right] = E\left[D\right].\]

So, what is \(E\left[U\right]\)? Well, with probability \(q\) the column player chooses L, in which case the utility for the row player will be \(2\), and with probability \(1-q\) the column player chooses R, in which case the utility for the row player will be \(1\).

So, if the game is played over and over, the row player can *expect* the following reward on average from strategy U.

\[ E\left[U\right] = 2\times q + 1\times\left(1 - q\right) = q + 1. \]

Using the same reasoning we can see that the row player's expected utility for strategy D is:

\[ E\left[D\right] = 1\times q + 4\times\left(1 - q\right) = 4 - 3q. \]

Now, because the row player must be indifferent we know that \[ q + 1 = 4 - 3q, \] and solving it gives \(q = \frac{3}{4}\).

And here is our answer then. The probability for strategy L is \(q = \frac{3}{4}\), while the probability for strategy R is \(1-q = 1 - \frac{3}{4} = \frac{1}{4}\).

For completeness we then do the same for the row player, calculating \(p\) and \(1-p\). Now we need to see what the expected reward is for the column player.

\[ E\left[L\right] = -3\times p + 1\times\left(1 - p\right) = 1 - 4p, \] \[ E\left[R\right] = 2\times p + (-1)\times\left(1 - p\right) = 3p -1. \]

Then requiring the column player to be indifferent \[E\left[L\right] = E\left[R\right],\] \[1 - 4p = 3p -1,\] from which we have \(p = \frac{2}{7}\).

So, the row player picks strategy U with probability \(p=\frac{2}{7}\), and strategy D with probability \(1-p = \frac{5}{7}\).

That's it. Again, note that the probability for a strategy is computed from the observation that the opposing player must be indifferent.
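Both indifference conditions can also be checked numerically. A small Python sketch, using the payoffs from the expected-utility calculations above (exact fractions avoid rounding issues):

```python
from fractions import Fraction

# Payoff matrix read off from the expected-utility calculations above:
# rows are U, D; columns are L, R; entries are (row, column) payoffs.
payoffs = {
    ("U", "L"): (2, -3), ("U", "R"): (1, 2),
    ("D", "L"): (1, 1),  ("D", "R"): (4, -1),
}

p = Fraction(2, 7)   # P(U) for the row player
q = Fraction(3, 4)   # P(L) for the column player

# Row player's expected utility for each pure strategy, given q.
E_U = q * payoffs[("U", "L")][0] + (1 - q) * payoffs[("U", "R")][0]
E_D = q * payoffs[("D", "L")][0] + (1 - q) * payoffs[("D", "R")][0]
assert E_U == E_D == Fraction(7, 4)

# Column player's expected utility for each pure strategy, given p.
E_L = p * payoffs[("U", "L")][1] + (1 - p) * payoffs[("D", "L")][1]
E_R = p * payoffs[("U", "R")][1] + (1 - p) * payoffs[("D", "R")][1]
assert E_L == E_R == Fraction(-1, 7)
```

At \(p = \frac{2}{7}\) and \(q = \frac{3}{4}\), both players are indeed indifferent between their pure strategies.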

The basis for this document is a game theory exercise session I taught for an AI course. The notes are quite verbose to start with, because I'm trying to explain every step extensively. Moreover some terms are not directly defined even though I try to explain things on a quite basic level. It was assumed that the students had attended an introductory lecture. So the game is in normal form, 'equilibrium' means Nash equilibrium, and so on. Also please note that in the text I use the terms reward, payoff, and utility interchangeably.

Basically, this is an in-depth example, not a formal introduction.

Iterated elimination is about removing strategies which are **dominated** by other ones. A player's strategy is dominated if all associated utility values (rewards) are *strictly less* than those of some other strategy (or a mixing of other strategies, but that can be left out for now).

Games between two players are often written in a so-called game matrix. The first (row) player's strategies are written as rows and the second (column) player's as columns. The utility/payoff outcome of each combination of strategies is then written in the cells of the matrix, so that the utility for the row player is the first value and the utility for the column player is the second.

Here is the example matrix; each cell lists the row player's utility first and the column player's second:

|       | **X**  | **Y**  | **Z**  |
|-------|--------|--------|--------|
| **A** | (8, 5) | (7, 1) | (2, 7) |
| **B** | (7, 4) | (3, 3) | (1, 2) |
| **C** | (0, 1) | (0, 0) | (3, 5) |

Here the row player has three choices: A, B, or C, while the column player has three others: X, Y, or Z. In (basic) game theory the players are often assumed to be perfectly rational and self-interested: they want to do as well as possible for themselves and don't care about others. Based on that, we can ask if it is possible to exclude one or more strategies for one or both players, and in the process simplify the possible outcomes. When one strategy is removed, further simplification may become feasible, and so the whole process can be iterated.

To find *dominated* strategies - those which may be removed - we have to compare the utility values for the strategies for the respective player. Let's do that for the matrix above.

**The Row player** can choose any of the rows, and their utility is the first of each pair of values. First, compare **A** (values \(8,7,2\)) and **B** (values \(7,3,1\)): \(8 > 7\), \(7 > 3\), \(2 > 1\). Row B is smaller than A in every column. We say that A *dominates* B. If you think about it, this means that whatever strategy (X, Y, or Z) the Column player picks, if the Row player has to choose between A and B - *it is always better to choose A*!

So we found one dominated strategy. Let's continue. **A** vs **C**: \(8 > 0\), \(7 > 0\), \(2 < 3\). Here we see that two components of A are greater than those of C, but one is not. Therefore neither does A dominate C, nor does C dominate A.

Then **B** vs **C**: \(7 > 0, 3 >0, 1<3\) - neither of them dominates the other.

**The Column player** can choose between X, Y, and Z. So, now we compare the second values of the cells, column against column, across every row.

**X** vs **Y** : \(5>1, 4 > 3, 1 > 0\); so every value of X is greater than the corresponding value of Y. This means that X *dominates* Y. (Because for the column player it is always better to choose X over Y no matter what the row player does.)

**X** vs **Z** : \(5 < 7, 4 > 2, 1 < 5\) - no dominance for either strategy.

**Y** vs **Z** : \(1 < 7, 3 >2, 0 < 5\) - again no dominance.

When we have found one or more dominated strategies these can be discarded from the game.

As we have found two of them in this iteration you may ask *'will it matter which one is removed first?'*

The answer is *no*, as long as there is a *strict dominance*. That is, the utility values of a dominated strategy must always be *strictly less* than those of the dominating one. (If all values are only less or equal it is called *weak dominance* and then the order can matter [so be careful!].) Here, we are only concerned with strict dominance.
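To see why the order can matter under weak dominance, here is a minimal sketch in Python with a small hypothetical game (the payoff values are invented for this illustration, not taken from the example above):

```python
# Hypothetical 3x2 game showing why removal order matters under WEAK
# dominance. Cells are (row player's payoff, column player's payoff).
payoffs = {
    ("T", "L"): (1, 1), ("T", "R"): (0, 0),
    ("M", "L"): (1, 1), ("M", "R"): (2, 1),
    ("B", "L"): (0, 0), ("B", "R"): (2, 1),
}

def weakly_dominated(r, rows, cols):
    """Row strategy r is weakly dominated: some other row does at least
    as well against every column, and strictly better against one."""
    return any(
        all(payoffs[(s, c)][0] >= payoffs[(r, c)][0] for c in cols)
        and any(payoffs[(s, c)][0] > payoffs[(r, c)][0] for c in cols)
        for s in rows if s != r
    )

def col_weakly_dominated(c, rows, cols):
    """The same check for a column strategy c."""
    return any(
        all(payoffs[(r, s)][1] >= payoffs[(r, c)][1] for r in rows)
        and any(payoffs[(r, s)][1] > payoffs[(r, c)][1] for r in rows)
        for s in cols if s != c
    )

# M weakly dominates both T and B, so either may be removed first.
assert weakly_dominated("T", ["T", "M", "B"], ["L", "R"])
assert weakly_dominated("B", ["T", "M", "B"], ["L", "R"])

# Remove T first: R then weakly dominates L, and play ends in the R
# column with payoff (2, 1). Remove B first: L weakly dominates R
# instead, and play ends in the L column with payoff (1, 1).
assert col_weakly_dominated("L", ["M", "B"], ["L", "R"])
assert col_weakly_dominated("R", ["T", "M"], ["L", "R"])
```

Depending on which weakly dominated row is removed first, the surviving outcomes differ - which is exactly why only strict dominance is order-independent.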

Anyway, let's just cross out the dominated strategies from the table.

First **Y**:

Then **B**:

Now we are left with a simplified matrix, and can iterate the algorithm comparing the remaining utilities in the same way.

So compare **A** and **C** again: \(8 >0, 2 < 3\). Still no dominance here.

Then compare **X** and **Z**: \(5 < 7\), \(1 < 5\). Aha, so now Z *dominates* X. It didn't before, but since the row player has discarded strategy B, that row no longer enters the comparison. So, column X can be removed:

Next iteration - we are in effect left with a 2x1 matrix. There is no choice left for the column player - it will always be best for them to choose strategy Z - but the row player may still pick either A or C. So, let's take this for another round and compare **A** and **C**: \(2 < 3\). This time around C *dominates* A, meaning A too can be removed:

We are now left with a single cell, and the knowledge that the optimal strategy for the Row player is **C** and the optimal strategy for the column player is **Z**. The associated utility is 3 for the Row player, and 5 for the Column player.

If you look at that reward you can see that no player is able to do better *on their own*. That is, the Row player would not rather choose A or B (given that the column player has chosen Z), and the column player would not rather choose X or Y (given that the row player has chosen C). (Sure, there are better rewards elsewhere, but that would require both players to agree to change, and once there, at least one player can get an even better reward by choosing something else.)

This is iterated elimination, which is used to simplify games. Note that this time it took us all the way to a single combination of strategies, which isn't always the case. Often you may be left with multiple strategies per player.
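The whole procedure is short enough to sketch in Python. The payoff entries below are the ones read off from the comparisons worked through above; the code may discover the dominances in a different order, but it ends at the same single cell:

```python
# Iterated elimination of strictly dominated strategies. Each entry is
# (row player's utility, column player's utility).
payoffs = {
    ("A", "X"): (8, 5), ("A", "Y"): (7, 1), ("A", "Z"): (2, 7),
    ("B", "X"): (7, 4), ("B", "Y"): (3, 3), ("B", "Z"): (1, 2),
    ("C", "X"): (0, 1), ("C", "Y"): (0, 0), ("C", "Z"): (3, 5),
}

def eliminate(rows, cols):
    """Repeatedly remove strictly dominated pure strategies."""
    changed = True
    while changed:
        changed = False
        # Row strategies: compare utilities across the remaining columns.
        for r in list(rows):
            if any(all(payoffs[(s, c)][0] > payoffs[(r, c)][0] for c in cols)
                   for s in rows if s != r):
                rows.remove(r)
                changed = True
        # Column strategies: compare across the remaining rows.
        for c in list(cols):
            if any(all(payoffs[(r, s)][1] > payoffs[(r, c)][1] for r in rows)
                   for s in cols if s != c):
                cols.remove(c)
                changed = True
    return rows, cols

rows, cols = eliminate(["A", "B", "C"], ["X", "Y", "Z"])
assert (rows, cols) == (["C"], ["Z"])
```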


While updating these pages back last summer, I took a brief detour to write an exporter from jupyter notebooks to Emacs orgmode. I haven't written much about it here before, but jupyter notebook is one of my go-to tools for data analysis and prototyping.

At the same time, I use orgmode for note taking, journaling, these pages, and so on. Now and then it becomes necessary to convert from jupyter to org. Jupyter notebooks can be exported to org via the fantastic pandoc program. However, I wanted to be able to treat code blocks in jupyter as org-babel source, and also to choose how latex, markdown, and other content is imported into the org file. This lies outside the scope of pandoc, so I decided to write my own nbconvert exporter plugin, called nbcorg.

You can install it from PyPI using `pip install nbcorg`.

After installation, new org export targets are available under the jupyter export menu, as well as in the standard `jupyter nbconvert` command line program.

It's still in an early version and has been available on github and pypi since July, but I thought I'd finally mention it here as well.

In the darkening Finnish autumn of 2018 I decided to take part in a seminar on Digital Ethics at Aalto University. It proved to be a good decision - as you probably gather from these web pages, human relations with other computational processes are something I am always curious about - and the course provided an interesting perspective based on sociology, design, law, and ethics. It was also a lot of fun to engage in discussions with the other participants.

One term which kept coming back in the articles I read for the course was "governance". Very broad, clearly, and especially popular in circles discussing super-intelligent AI, but equally applicable to the wider spectrum of digital technology discussed during the seminar. Other terms that kept returning were "accountability", "fairness", "transparency", and so on. Terms related to technology performance - not the performance of a task, but performance in society.

I found that quite interesting, because as a species we humans are very dependent on our tools and technology. On a large scale, governance is about trying to steer that entwined and inseparable dynamic system of matter and thought.

On the largest scale it is about playing an infinite game.

But, on the scale of that seminar - of digital ethics - it is about those terms: how to build technology for today, and for next year, which is fair, transparent, accountable, and so on. This is equal portions business and engineering, as much a judicial matter as a choice of design.

So, for my course project I decided to look at governance of technology intended to supplement or replace human functions (a traditional reason behind new technology). It resulted in a mini-review, where I tried to use proposed approaches to governance to identify concerns of performance, and finally to discuss how performance along those axes may determine whether a technology adapts to or disrupts society.

I presented the poster at the FCAI days 2018. It has soon been a year since then, which somehow reminded me to put it up here.

The site has been revamped, including the RSS; I hope I did not break your feeds.

I didn't post anything on these pages for a very long time. A couple of years, I think. I might write an update one of these days.

In any case, around midsummer I found myself with an unplanned few weeks of nothing and decided to spend one of them revamping the web pages. The first result is what you see here. Most of the old stuff is still around, but I now rely on static pages (generated from org-mode sources, because that is how I have been taking notes for the last few years).

WordPress, which the site ran on previously, worked brilliantly for most things, and I am sure my own quite limited knowledge and skill (not to mention interest, sadly) in web design can never reach the aesthetics of even the most basic plugins I could just have dropped in under WP.

Still, I had gone around mulling a revamp for a couple of years already. First, my WP login pages got hammered, day and night, by scripts trying to break in. Of course I employed IP blocking, etc., and had a strong password. Nobody ever got in, but it was annoying to see the logs over and over. Second, PHP has always turned me off as a programming language, so instead of fixing some issues I had with the previous page I just felt apathy and ran with it. (Well, not sure I will be saner with Emacs lisp, but at least I feel that I can script-process my static pages, if I really want to.) Third, given that this is just me putting out words in the void, WP was overkill.

Yeah, perhaps I'll get famous one of these days and really regret not having a comment section. Feel free to point it out when that happens.
