Smart Bandit Algorithms in Testing Mobile Banner Design

By Ioana Cailean on May 7, 2015

Make smarter choices and watch the rewards roll in with bandit algorithms

This blog post is part of our new regular Trademob Tech series. The series takes a look at engineering topics facing Trademob and the mobile advertising industry. Today’s post was written by Dr. Alexander Weiß and the Data Analytics team.

Using Bandit Algorithms at Trademob to Test Banner Design

When it comes to real-time bidding, the key to success is choosing wisely from the available options. For example, it is crucial to determine the best of several banner designs to be shown to users. The straightforward approach to finding the champion would be an A/B test: show each design to a fixed number of users and count the number of positive reactions. These could be clicks, installs or any other intended user actions. The design triggering the most reactions wins.

However, there is a drawback with this approach. If you have 10 designs and give each of them 1,000 trials, you know beforehand that you will waste 9,000 trials on suboptimal designs. A smarter approach would be to dynamically decide whether it is worth spending more money on a particular choice, or whether that choice has already proven bad enough to be dropped. In other words, the exploration phase should be kept as short as possible so that the findings can be exploited as soon as possible.

Decision algorithms with this kind of dynamic component are called bandit algorithms. The name derives from another application for this class of algorithms: a gambler in a casino tries his luck with one-armed bandits, the old-fashioned slot machines which were played by pulling the handle or ‘arm’ on the side of the machine. Each slot machine has its own particular success rate which can only be determined by trying. Of course, the gambler also wants to minimize the number of coins he puts into the machines with low success rates.

There are many bandit algorithms, owing to the large number of possible decision rules. A popular example is the Bayesian Bandit Algorithm, also known as Thompson Sampling. For each design, we have the number of trials and the number of successes. Assuming that the trials are independent with a constant success rate, we can use these two parameters to determine a distribution for the success rate, which will be a Beta Distribution. Once we have the distribution for each design, we sample one value from each design's particular distribution. In the next trial, we select the design with the highest drawn value. Since the Beta Distribution's variance decreases with an increasing number of trials while its mean converges to the empirical probability, we get a smooth transition from exploration to exploitation.
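As an illustration, here is a minimal sketch of the procedure in Python. The number of designs, the simulated success rates and all names are hypothetical, not taken from a production system:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical true success rates of three banner designs (unknown to the algorithm).
true_rates = [0.04, 0.05, 0.065]

trials = np.zeros(len(true_rates))     # impressions served per design
successes = np.zeros(len(true_rates))  # positive reactions per design

for _ in range(10_000):
    # With a uniform Beta(1, 1) prior, the posterior after observing
    # `successes` hits in `trials` attempts is Beta(successes + 1, trials - successes + 1).
    samples = rng.beta(successes + 1, trials - successes + 1)

    # Show the design with the highest sampled success rate ...
    design = int(np.argmax(samples))
    reward = rng.random() < true_rates[design]

    # ... and update its counts with the observed reaction.
    trials[design] += 1
    successes[design] += reward

print(trials)  # the bulk of the trials should concentrate on the best design
```

Note how designs with poor empirical success rates are selected less and less often, yet never ruled out entirely: that is the dynamic trade-off between exploration and exploitation in action.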

Various Beta Distributions. Red: 30 trials, 10 successes; Green: 50 trials, 10 successes; Blue: 110 trials, 10 successes. With more and more trials the variance decreases and the mean converges to the empirical probability.
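The numbers behind the figure are easy to verify. Assuming the same uniform Beta(1, 1) prior as above, a design with s successes in n trials has a Beta(s + 1, n − s + 1) posterior, whose mean and standard deviation SciPy can report directly:

```python
from scipy.stats import beta

for n, s in [(30, 10), (50, 10), (110, 10)]:
    posterior = beta(s + 1, n - s + 1)
    print(f"{n:3d} trials, {s} successes: "
          f"mean={posterior.mean():.3f}, std={posterior.std():.3f}")
```

The standard deviation shrinks from roughly 0.08 at 30 trials to below 0.03 at 110 trials, while the mean approaches the empirical rate of 10/110 ≈ 0.09.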

The challenge of selecting the best design becomes harder if the banner has several dimensions (size, color, position on screen) that we want to take into account one by one. Under the assumption of independence among the input dimensions, a Bayesian Regression Model can be used to translate the bandit approach to this setup. The number of trials and the number of successes for each input dimension are now used to model Gaussian Distributions. Randomly drawn values for each dimension are then summed and mapped to a value between 0 and 1, yielding a score for a particular banner design (see [1] for details). The design with the highest score is once again our best choice for the next trial.

Multidimensional Approach. Each value within a dimension is modelled with a Gaussian Distribution. From each of them a random value is drawn. The sum of these is mapped by a Sigmoid Function to a value between 0 and 1.
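A deliberately simplified sketch of the selection step is shown below. The feature values, means and variances are invented for illustration, and the posterior update after each trial (which the full model in [1] performs via Bayesian inference) is omitted:

```python
import math
from itertools import product

import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical Gaussian posteriors (mean, variance) for the weight of each
# feature value, as they might look after a number of observed trials.
posteriors = {
    "size":     {"small": (0.1, 0.30), "large": (0.4, 0.20)},
    "color":    {"red": (0.3, 0.25), "blue": (-0.1, 0.25)},
    "position": {"top": (0.2, 0.15), "bottom": (0.0, 0.35)},
}

def score(design):
    """Draw one random weight per dimension, sum the draws, and squash
    the sum with a sigmoid to a value between 0 and 1."""
    total = sum(rng.normal(mu, math.sqrt(var))
                for mu, var in (posteriors[dim][val] for dim, val in design.items()))
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid

# Candidate banner designs: the cross product of all feature values.
candidates = [dict(zip(posteriors, values))
              for values in product(*(posteriors[dim] for dim in posteriors))]

best = max(candidates, key=score)
print(best)  # the design to show in the next trial
```

After the trial, the Gaussian for each feature value involved would be updated with the observed outcome, playing the same role as the Beta update in the one-dimensional case.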

In summary: don’t waste money on A/B tests to find the most effective banner designs when smart bandit algorithms can do the job faster, more cheaply and more efficiently. This allows for better decision making when it comes to real-time bidding, resulting in more successful campaigns, and hitting the jackpot every time.
[1] http://research.microsoft.com/apps/pubs/default.aspx?id=122779