Implementing a Beeswarm plot in Mathematica

26

13

I am looking for a Beeswarm plot implementation in Mathematica.

Consider the following data:

data = {RandomVariate[NormalDistribution[], 100], RandomVariate[NormalDistribution[], 100]};

Let’s visualize the data as a simple 1D scatter plot:

disperse = 0.1;
ListPlot[MapIndexed[{#2[[1]] + 
 RandomReal[{-disperse, disperse}], #1} &, data, {2}], 
 PlotStyle -> {Red, Green, Blue}, PlotMarkers -> {"\[CircleDot]"}, 
 Axes -> None]

We get the following 1D scatter plots (sometimes known as “Stripcharts”). In this example the vertical coordinate of points corresponds to data while the horizontal one is random.

1-D Scatterplot or Stripchart

Sometimes it makes sense to look at the distribution of the data in addition to the data points, i.e. to show every data point individually in a non-overlapping manner. A Beeswarm plot does exactly this.

The data in a stripchart can be rearranged in a pleasing manner so that one can understand the underlying data without resorting to statistical analysis. The figure below shows some sample Beeswarm plots.

Sample Beeswarm plot

Another property of a beeswarm plot is that data points can be colored individually, which lets the user see further delineations within the data. The plot below shows an example of a beeswarm plot where subsets of data are colored differently.

Beeswarm plot with individually colored markers

I would like to put it up to the Mathematica gurus to advise on the best manner to generate such charts. Sample working code, of course, would be great!

Pam

Posted 2014-02-19T22:21:04.300

Reputation: 1 759

I suggest you start by looking at these Q&As: http://mathematica.stackexchange.com/questions/tagged/packing

– Mr.Wizard – 2014-02-19T22:25:28.650

Also, and perhaps more relevant: http://mathematica.stackexchange.com/q/2594/121

– Mr.Wizard – 2014-02-19T22:26:59.360

5

If you concisely explain what a "beeswarm plot" is, it'll increase your chances of getting an answer. People are less likely to become interested if they have to look up various R functions just to understand the question. The page you linked to is not too clear to someone unfamiliar with R or its "stripchart". – Szabolcs – 2014-02-19T22:34:10.657

A Beeswarm plot is a 1-D scatter plot with two special properties: points are represented as non-overlapping circles, and the circles (data markers) are closely packed. – Pam – 2014-02-19T23:29:12.063

@Pam Please include that in the question body itself. Also, directly embedding an image or two as examples would be helpful. – Mr.Wizard – 2014-02-19T23:41:13.047

1

Sorry, I downvoted temporarily until a reasonable explanation of a Beeswarm plot is included in the question. – Dr. belisarius – 2014-02-20T00:10:10.727

@Pam Please also explain what sort of data is used to generate such a plot, and how it's used exactly. Is it the circle coordinates? – Szabolcs – 2014-02-20T00:13:18.847

1

Maybe you can use RLink to call that R function from M? https://reference.wolfram.com/mathematica/RLink/guide/RLink.html Then you do not need an implementation in M for it.

– Nasser – 2014-02-20T00:20:42.020

1

I think this could have been one of those 20+ upvoted questions with several interesting implementations, if only you had put in a little effort to make the question clear and actually attract some answers. People love graphics. Remember that the answers are only as good as the question, and the answers will only have as much effort put into them as the question itself. It's a big missed opportunity.

– Szabolcs – 2014-02-20T16:41:00.540

Szabolcs: See my edits. I hope this is sufficient. – Pam – 2014-02-20T21:15:32.633

@Pam Yes, it's much better now. – Szabolcs – 2014-02-20T21:48:33.907

1

In the bottom picture, are the "normal" and the "uniform" switched? – bill s – 2014-02-20T23:43:59.350

Ok, removing my downvote and upvoting. Great work. – Dr. belisarius – 2014-02-21T00:57:58.317

@bills They're definitely switched. – Szabolcs – 2014-02-21T02:27:25.617

Answers

20

This is a solution based on interval operations.

Usage and examples

First, let's look at how to use the function. The code is at the end.

Let's generate some sample data and plot it:

data1 = RandomVariate[ExponentialDistribution[1], 200];
data2 = RandomVariate[NormalDistribution[2, 1], 200];

beeswarmPlot[data1]

Now let's plot two together:

beeswarmPlot[{data1, data2}]

We can also specify the circle radius explicitly, in plot coordinates:

beeswarmPlot[data2, 0.2]

Or we can change the colour while keeping the radius selection automatic:

apricot = RGBColor[1.`, 0.340007`, 0.129994`];
cornflower = RGBColor[0.392193`, 0.584307`, 0.929395`];
beeswarmPlot[{data1, data2}, Automatic, PlotStyle -> {apricot, cornflower}]


The code

Note: I'm going for readability here, not performance. Performance can be improved significantly at the cost of readability, which is already impaired by the large amount of code used just for option handling.

I am going to use these helper functions:

intervalInverse[Interval[]] := Interval[{-Infinity, Infinity}]
intervalInverse[Interval[int__]] :=
 Interval @@ Partition[
   Replace[Flatten[{int}],
    {{-Infinity, mid___, Infinity} :> {mid},
     {-Infinity, mid__} :> {mid, Infinity},
     {mid__, Infinity} :> {-Infinity, mid},
     {mid___} :> {-Infinity, mid, Infinity}
     }
    ], 2]

intervalComplement[a_Interval, b__Interval] := 
 IntervalIntersection[a, intervalInverse@IntervalUnion[b]]
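
As a rough illustration of what these two helpers return (assuming the definitions above have already been evaluated; the input intervals are made-up values chosen only for the example):

```mathematica
(* everything on the real line that is NOT in [1, 2] or [4, 5] *)
intervalInverse[Interval[{1, 2}, {4, 5}]]
(* Interval[{-Infinity, 1}, {2, 4}, {5, Infinity}] *)

(* the parts of [0, 10] that are not covered by [2, 3] or [5, 7] *)
intervalComplement[Interval[{0, 10}], Interval[{2, 3}], Interval[{5, 7}]]
(* Interval[{0, 2}, {3, 5}, {7, 10}] *)
```

The results keep the shared endpoints because Interval only represents closed intervals; for the packing below that is what lets a new circle sit exactly at the edge of a blocked range.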

This is the code for calculating the point coordinates and packing the circles. This is the only function that needs to be changed to implement a different packing method.

(* data is assumed to be a sorted vector of numbers *)
beeswarm[data_, radius_] :=
 Module[{points, left, right, int},
  points = {};
  Do[
   int = Interval @@ Cases[points, {x_, y_} /; y > pt - radius :> x + {-1, 1} Sqrt[radius^2 - (pt - y)^2]];
   right = Min[intervalComplement[Interval[{0,  Infinity}], int]];
   left =  Max[intervalComplement[Interval[{-Infinity, 0}], int]];
   AppendTo[points, {If[right < -left, right, left], pt}],
   {pt, data}
  ];
  points
 ]
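
The geometry behind the Cases line can be worked through by hand for a single pair of circles. The following is only a sketch with made-up numbers: d, placed, half, blocked and centre are illustrative names, and d plays the role of the second argument of beeswarm (which beeswarmPlot sets to the circle diameter).

```mathematica
(* illustrative values: one disk already placed at the origin, a new data
   value at height 3/5, and disk diameter d = 1 *)
d = 1; placed = {0, 0}; pt = 3/5;

(* half-width of the x-range in which a new centre would be closer than d *)
half = Sqrt[d^2 - (pt - placed[[2]])^2]
(* 4/5 *)

(* the x positions blocked by the existing disk *)
blocked = Interval[placed[[1]] + {-1, 1} half]
(* Interval[{-4/5, 4/5}] *)

(* the nearest free positions on either side of the axis are the interval
   ends; on a tie, If[right < -left, right, left] in beeswarm picks the left *)
centre = {Min[blocked], pt}
(* {-4/5, 3/5} *)

EuclideanDistance[centre, placed]
(* 1, so the two disks just touch *)
```

With several disks already placed, the blocked ranges are merged by Interval and the same choice is made from the two free positions closest to the axis.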

And this is the plotting function that provides a user friendly interface (option handling) and assembles the final Graphics object.

Options[beeswarmPlot] =
  Join[
   Options[Graphics],
   {PlotStyle -> Automatic}
  ];

SetOptions[beeswarmPlot, Frame -> True];
SetOptions[beeswarmPlot, FrameTicks -> {None, Automatic}];

beeswarmPlot[data_?(VectorQ[#, NumericQ] &), radius : (_?NumericQ | Automatic) : Automatic, opt : OptionsPattern[]] := beeswarmPlot[{data}, radius, opt]
beeswarmPlot[data : {__?(VectorQ[#, NumericQ] &)}, radius : (_?NumericQ | Automatic) : Automatic, opt : OptionsPattern[]] := 
 Module[{r, order, flatData, colours, colfun},

  (* generate colour indices and sort them together with the data *)
  flatData = Flatten[data];
  order = Ordering[flatData];
  colours = Flatten@Table[ConstantArray[i, Length[data[[i]]]], {i, Length[data]}];
  flatData = flatData[[order]];
  colours = colours[[order]];

  (* automatic radius selection *)
  r = If[radius === Automatic, 4 Mean@Differences[flatData], 2 radius];

  (* handle the PlotStyle option *)
  colfun = With[
    {ps = OptionValue[PlotStyle]},
    Switch[ps,
      Automatic, ColorData[1],
      _List, Function[i, ps[[ Mod[i, Length[ps], 1] ]] ],
      _, ps &
    ]
  ];

  (* call the packing function and build the graphics using the result *)
  Graphics[
   MapThread[{colfun[#2], Disk[#1, 0.95 r/2]} &, {beeswarm[flatData, r], colours}],
   Sequence @@ FilterRules[{opt}, Options[Graphics]],
   Frame -> OptionValue[Frame],
   FrameTicks -> OptionValue[FrameTicks]
  ]
 ]
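
The automatic radius rule can be checked by hand. Because the differences of sorted data telescope, their mean is just (max - min)/(n - 1), so the automatic circle diameter is four times the average gap between neighbouring values. A small sketch with made-up data (sorted, meanGap and diameter are illustrative names, not part of the function above):

```mathematica
sorted = Sort[{3, 1, 4, 1, 5, 9, 2, 6}];

(* average gap between neighbouring sorted values *)
meanGap = Mean@Differences[sorted]
(* 8/7 *)

(* same thing as the data range divided by n - 1 *)
meanGap == (Max[sorted] - Min[sorted])/(Length[sorted] - 1)
(* True *)

(* what beeswarmPlot uses as r when radius is Automatic *)
diameter = 4 meanGap
(* 32/7 *)

(* radius of the disks that are actually drawn, slightly shrunk by 0.95 *)
0.95 diameter/2
(* about 2.17 *)
```

If the automatic choice makes the swarm too wide or too narrow for a particular data set, passing an explicit radius as the second argument overrides it.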

Szabolcs

Posted 2014-02-19T22:21:04.300

Reputation: 213 047

Nice! Will play around with real data and send you some feedback soon! – Pam – 2014-02-21T13:43:33.933

@Szabolcs: one nice addition would be the ability to combine or separate datasets, i.e. the ability to have multiple distinct beeswarms on a plot … much like the distribution chart shown in the example below… – Pam – 2014-02-21T17:16:37.483

2

@Pam That should be easy to do by post-processing the output: Graphics[{First@beeswarmPlot[data1, .05], Translate[First@beeswarmPlot[data2, 0.05], {3, 0}]}, Frame->True]. It's true that if we're aiming for a complete and polished function, there are so many things one could add, e.g. changing to horizontal orientation, different packing methods, etc. Much of that is not difficult, it's just a bit of work and it would take a large amount of code. I tried to focus on the non-trivial swarm packing here.

– Szabolcs – 2014-02-21T17:23:11.367

Nice. That works well! – Pam – 2014-02-21T18:10:56.667

A trivial question, but I can’t seem to get PlotLabel to work with multiple beeswarms on the same graphic… Any thoughts? – Pam – 2014-02-24T01:01:20.900

@Pam Do you need a separate PlotLabel for each beeswarm on the same graphic, or only a single one? Technically, one graphic may only have one PlotLabel, so if you need multiple, we need a slightly different solution. – Szabolcs – 2014-02-24T01:23:48.180

@Pam, does this help? Graphics[{First@beeswarmPlot[data1, 0.05], Translate[First@beeswarmPlot[data2, 0.05], {3, 0}]}, Frame -> True, FrameTicks -> {{Automatic, Automatic}, {{{0, "exponential"}, {3, "normal"}}, None}}, GridLines -> {{0, 3}, None}] – Szabolcs – 2014-02-24T01:28:02.453

Yup… thanks… that works really well. – Pam – 2014-02-24T14:27:31.760

9

It seems to me that the appropriate packing method depends on the data, and normally distributed data don't make sense here because they don't cluster nicely. Here is a simple implementation of square packing and a rudimentary hex packing (it's not quite fully hex because it depends on the number of dots on the rows on either side of the current row). I'm sure there are better approaches.

testintegers = RandomInteger[{1, 30}, 200];

Options[beeswarmPlot] = {PackingMethod -> "Square"};

beeswarmPlot[data : {__?NumericQ}, opts:OptionsPattern[{beeswarmPlot, ListPlot, Graphics}]] := 
 With[{gathered = Sort[Gather[data], First[#1] < First[#2] &], 
   hex = Boole[ToLowerCase@OptionValue[PackingMethod] === "hex"]}, 
  ListPlot[(Join @@ (MapIndexed[
       Transpose[{Range[-Length[#1]/2 - 
            hex Mod[First[#2], 2]/2, (Length[#1] - 1)/2 - 
            hex Mod[First[#2], 2]/4], #1}] &, gathered])), 
   FilterRules[{opts}, Options[ListPlot]], 
   PlotMarkers -> {"\[CircleDot]"}, Axes -> None, 
   PlotRangePadding -> 1, 
   AspectRatio -> Length[gathered]/Max[Length /@ gathered]]]


beeswarmPlot[testintegers]

Beeswarm plot of testintegers with square packing

beeswarmPlot[testintegers, PackingMethod -> "Hex"]
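
To make the layout logic easier to follow, here is a sketch of the two steps on tiny made-up inputs (values, rows and row are illustrative names): grouping equal values into rows, and spreading one row horizontally around zero.

```mathematica
(* step 1: one row per distinct value, rows sorted by that value *)
values = {3, 1, 3, 2, 1, 3};
rows = Sort[Gather[values], First[#1] < First[#2] &]
(* {{1, 1}, {2}, {3, 3, 3}} *)

(* step 2, square packing: unit-spaced x positions for a row of four points *)
row = {7, 7, 7, 7};
Transpose[{Range[-Length[row]/2, (Length[row] - 1)/2], row}]
(* {{-2, 7}, {-1, 7}, {0, 7}, {1, 7}} *)

(* step 2, "Hex" on an odd-numbered row: the same row shifted by half a step *)
Range[-Length[row]/2 - 1/2, (Length[row] - 1)/2 - 1/4]
(* {-5/2, -3/2, -1/2, 1/2} *)
```

Because rows are formed from exactly equal values, this approach suits integer or otherwise discretised data; continuous data would first need to be binned or rounded.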

Beeswarm plot of testintegers with hex packing

Verbeia

Posted 2014-02-19T22:21:04.300

Reputation: 33 191

8

Versions 8 and beyond offer DistributionChart which resembles the example BeeSwarm plots:

data =
  { RandomVariate[NormalDistribution[],100]
  , RandomVariate[NormalDistribution[],100]
  };
DistributionChart[data]

sample distribution chart

There are numerous styles of chart, selected using the ChartElementFunction option:

Column @ Table[
  Labeled[DistributionChart[data, ChartElementFunction -> f], f, Top]
, {f, ChartElementData["DistributionChart"]}
]

samples of various ChartElementFunction choices

There are numerous other options that affect the appearance of the chart -- see the documentation.

WReach

Posted 2014-02-19T22:21:04.300

Reputation: 62 787

I know and I use DistributionChart and its derivative BoxWhisker chart quite extensively. This does not serve my purpose. While you can see the shape of the distribution it is impossible to color individual data sets. – Pam – 2014-02-20T22:26:20.690

5

Another rich source of display options is the family of histogram functions. For example:

data1 = RandomVariate[NormalDistribution[0, 1], {500, 2}];
data2 = 5 + RandomVariate[NormalDistribution[1, 1], {500, 2}];
Histogram3D[{data1, data2}, ChartElements -> Graphics3D[Sphere[]], 
 Axes -> False]

Histogram3D of data1 and data2 with sphere chart elements

Choosing to view this from above (by adding the option ViewPoint -> Above), gives a view that looks something like a beeswarm:
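
Spelled out, that is the same call as before with the one extra option added (the data definitions are repeated so the snippet can be evaluated on its own):

```mathematica
data1 = RandomVariate[NormalDistribution[0, 1], {500, 2}];
data2 = 5 + RandomVariate[NormalDistribution[1, 1], {500, 2}];

(* same 3D histogram, looked at from directly above *)
Histogram3D[{data1, data2}, ChartElements -> Graphics3D[Sphere[]],
 Axes -> False, ViewPoint -> Above]
```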

The same Histogram3D viewed from above

bill s

Posted 2014-02-19T22:21:04.300

Reputation: 62 963

Nice one… but ideally this would be a 2D plot so that it can be overlaid with other plots. For instance a DistributionPlot or BoxWhisker overlaid with a Beeswarm... – Pam – 2014-02-20T22:55:07.850

@Pam use ViewPoint->{0,0,Infinity} + Overlay and it works :) – Kuba – 2014-02-21T01:41:27.203