Yaml
authors: 
- ["Raphael Diener", "https://twitter.com/diener_raphael"]
last-edit: 2024-01-10
tags: 
- testing
- programming
- test-category
- nondeterministic

A/B Testing
...

A/B testing is a form of nondeterministic black-box testing
that compares the quality of a change (version B) with the current version (version A) of the software.

A/B testing is most common in online applications,
since they are maintained over a long period and have the large user base needed for statistically sound assessments.

Default A/B Setup
...

The most basic setup for A/B testing consists of four parts: a load balancer, the two services A and B, and a database.

New sessions are established through a load balancer, which distributes incoming traffic according to the testing strategy (50/50, 10/90, ...).
Once the connection is established, the user interacts with the assigned service directly.
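
As a minimal sketch of such a traffic split, the snippet below assigns each new session to version A or B at random according to a configurable weight. The function name `pick_version` and the parameter `weight_b` are made up for illustration; a real load balancer would typically configure this split rather than code it by hand.

```python
import random


def pick_version(weight_b: float, rng: random.Random) -> str:
    """Return "B" with probability weight_b, otherwise "A"."""
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight_b must be between 0 and 1")
    # rng.random() is uniform in [0, 1), so this is true with probability weight_b.
    return "B" if rng.random() < weight_b else "A"


# Simulate a 90/10 split over 10,000 new sessions (seeded for repeatability).
rng = random.Random(42)
assignments = [pick_version(0.1, rng) for _ in range(10_000)]
share_b = assignments.count("B") / len(assignments)
print(f"share of sessions on version B: {share_b:.3f}")
```

The assignment is made once per session, which matches the setup described above: after the split, the user stays on the assigned service.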

The two services A and B then send the relevant telemetry to a database, in which you collect the results of your experiment.
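
One possible shape for this telemetry store, sketched with Python's built-in `sqlite3` module. The table layout (`version`, `session_id`, `event`) and the helper names are assumptions for illustration only; a production setup would likely use a dedicated analytics database.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) telemetry store and create its table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "version TEXT NOT NULL, session_id TEXT NOT NULL, event TEXT NOT NULL)"
    )
    return conn


def record_event(conn: sqlite3.Connection, version: str, session_id: str, event: str) -> None:
    """Called by service A or B whenever something worth measuring happens."""
    conn.execute(
        "INSERT INTO events (version, session_id, event) VALUES (?, ?, ?)",
        (version, session_id, event),
    )
    conn.commit()


def count_events(conn: sqlite3.Connection, event: str) -> dict:
    """Count occurrences of one event type per version."""
    rows = conn.execute(
        "SELECT version, COUNT(*) FROM events WHERE event = ? GROUP BY version",
        (event,),
    )
    return dict(rows.fetchall())
```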

In some cases the split can be made within the application itself.
If you are developing a browser game, you could implement the switch inside the code you ship (based on the device's MAC address, for example), thus eliminating the need for an extra server instance.
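
A minimal sketch of such an in-application switch, assuming the application has some stable device identifier available as a string. Hashing the identifier gives every device the same version on each start without any server-side state. The names `assign_variant`, `experiment`, and `percent_b` are hypothetical.

```python
import hashlib


def assign_variant(device_id: str, experiment: str, percent_b: int = 50) -> str:
    """Deterministically map a device to "A" or "B".

    The experiment name is mixed into the hash so that different
    experiments split the same devices independently of each other.
    """
    digest = hashlib.sha256(f"{experiment}:{device_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "B" if bucket < percent_b else "A"


# The same device always lands in the same group.
print(assign_variant("00:1A:2B:3C:4D:5E", "new-menu"))
```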

After a certain period of time, you then evaluate the data collected in your database.
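
How you evaluate the data depends on the metric. As one common option (not something this setup prescribes), a two-proportion z-test can compare conversion rates between the two versions. The sketch below uses only the standard library and assumes each session either converted or did not; the function name and the example numbers are made up.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally often.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Illustrative numbers only: 100 of 1000 sessions converted on A, 150 of 1000 on B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the observed difference is unlikely to be pure chance, but it says nothing about whether the difference will persist over time.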

This approach can be scaled to work with any number of changes you want to test:

Testing multiple changes at once
...

The simplest approach to testing multiple changes at once is to launch the whole matrix of combinations and distribute the users across them equally.
If you want to test two independent changes simultaneously, that requires four versions running in parallel: one baseline without either change and three experimental versions (first change only, second change only, and both).
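
A minimal sketch of enumerating that matrix, assuming each change is a simple on/off switch. The change names `new_checkout` and `dark_theme` are hypothetical. The combination with every switch off is the baseline.

```python
from itertools import product


def variant_matrix(changes: list) -> list:
    """Return every on/off combination of the given changes."""
    return [
        dict(zip(changes, flags))
        for flags in product([False, True], repeat=len(changes))
    ]


# Two on/off changes give 2 * 2 = 4 versions, including the baseline.
variants = variant_matrix(["new_checkout", "dark_theme"])
for variant in variants:
    print(variant)
```

Each additional on/off change doubles the number of versions that have to run in parallel.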

Each split you introduce reduces the statistical soundness of the experiment, since fewer and fewer users test each change.
Hence it is not advisable to test more than one thing at a time this way.
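
To make the loss of precision concrete, the sketch below computes the standard error of a measured conversion rate as the same traffic is divided among more versions. It assumes on/off changes, an equal split, and a binary metric; the traffic figure and the 10% rate are illustrative, not measured.

```python
import math


def users_per_variant(total_users: int, n_changes: int) -> int:
    """Users per version when testing every on/off combination of n_changes."""
    return total_users // (2 ** n_changes)


def standard_error(rate: float, users: int) -> float:
    """Standard error of a proportion measured on the given number of users."""
    return math.sqrt(rate * (1 - rate) / users)


total = 100_000  # illustrative traffic during the test period
for n_changes in range(1, 5):
    users = users_per_variant(total, n_changes)
    se = standard_error(0.10, users)
    print(f"{n_changes} change(s): {users} users per version, standard error {se:.4f}")
```

Every extra change halves the users per version, which multiplies the standard error by roughly the square root of two.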

Reliability
...

Results gathered from A/B tests are extremely time-sensitive.
Since they are based only on observation, they describe current behaviors and trends, not underlying principles.

This is why you can run the same test five years apart and find that the first result favors version A while the later one favors version B.

This can partially be explained by the fact that change is generally perceived as something good if it's done under the right pretense.
