**Data visualization cheat sheet**

*C Supakorn*

It is often said that “a picture is worth a thousand words”. Text supported by appropriate graphs reduces or removes the need for lengthy explanations. In statistics, a rule of thumb for effective communication is to present numbers pictorially using charts and graphs. Different kinds of data call for different kinds of plots, charts and graphs, and selecting a suitable graph for a given dataset is not always easy.

There are two main objectives for using data visualization in statistics:

- A visual context enables viewers to more easily detect patterns, trends or correlations.
- Pictures can make it easier to communicate statistical results to an audience.

To communicate your data effectively, it is essential to use a suitable type of plot, graph or chart. This blog post offers guidelines for selecting the right plot type for your data. The following list outlines some diagram types commonly used in statistics:

**Items**

**Scatter plot** uses dots to represent the relationship between two numeric variables and shows how being high or low on one numeric variable relates to being high or low on a second numeric variable.

**Dot plot** displays dots to represent individual observations. It is best suited to relatively small datasets or a small number of groups.

**Line plot** illustrates how a variable changes with respect to another variable, for example time. Multiple lines can be plotted to compare different variables.

**Box plot** is used to display the sample distribution of a variable and detect extreme values or other unusual characteristics.
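As a rough illustration of what a box plot draws, the sketch below computes the quartiles, whisker ends and flagged extreme values for a small sample. It assumes the common 1.5 × IQR convention for identifying extreme values, which is only one of several conventions; the function name `box_plot_summary` and the sample numbers are made up for this example.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot typically draws (1.5 * IQR rule assumed)."""
    data = sorted(values)
    # Quartiles; the "inclusive" method treats the data as the whole population.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme values still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical sample with one unusually large value.
summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the value 30 lies above the upper fence, so it would be drawn as a separate point beyond the whisker.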

**Q-Q (quantile-quantile) plot** is useful for assessing whether a sample may have come from a particular probability distribution. It can thus, for example, be used to check an assumption of normality.
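A minimal sketch of how the points of a normal Q-Q plot could be computed is shown below. It assumes the plotting positions (i - 0.5) / n, one of several conventions in use, and fits the reference normal distribution using the sample mean and standard deviation. The function name `normal_qq_points` and the data are illustrative only.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(sample):
    """Pair each ordered observation with the matching normal quantile."""
    data = sorted(sample)
    n = len(data)
    # Reference distribution fitted to the sample (an assumption of this sketch).
    reference = NormalDist(mean(data), stdev(data))
    # Plotting positions (i - 0.5) / n keep the probabilities strictly inside (0, 1).
    return [
        (reference.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(data, start=1)
    ]

# Hypothetical measurements.
sample = [4.1, 5.0, 5.2, 5.9, 6.3, 7.0, 7.4]
points = normal_qq_points(sample)
for theoretical, observed in points:
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the sample is roughly normal, plotting the observed values against the theoretical quantiles gives points close to a straight line.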

**Density plot** shows the distribution of a numeric variable.

**Bland-Altman plot** provides an effective way of assessing the agreement between two different methods for measuring some quantity. It can also show “limits of agreement”, which are intended to represent boundaries on the acceptable difference between the methods.
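The quantities behind a Bland-Altman plot can be sketched as follows. This assumes the conventional limits of agreement, the mean difference plus or minus 1.96 standard deviations of the differences; other multipliers are sometimes used. The function name `bland_altman` and the paired measurements are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Compute plot points, bias and limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)                 # average difference between the methods
    spread = 1.96 * stdev(diffs)       # assumed multiplier for the limits
    return {
        "points": list(zip(means, diffs)),  # x = pair mean, y = pair difference
        "bias": bias,
        "lower_limit": bias - spread,
        "upper_limit": bias + spread,
    }

# Hypothetical readings of the same four items by two methods.
result = bland_altman([10.0, 12.0, 14.0, 16.0], [9.0, 12.5, 13.0, 16.5])
print(result["bias"], result["lower_limit"], result["upper_limit"])
```

Each point plots the difference between the two methods against their mean, with horizontal lines drawn at the bias and at the two limits.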

**Contour plot** shows how a variable changes according to two other variables which could, for example, be directions on a map.

**Surface plot** is another way to show how a variable changes according to two other variables. It can be helpful in regression analysis for viewing the relationship between a dependent variable and two independent variables.

**Biplot** provides a graphical representation of the relationships between data units and variables. It is often used to display results from principal components or canonical variates analyses.

**Pie chart** shows the proportions of a variable that occur in different categories.

**Bar chart** shows the number of observations of a variable occurring in different categories.

**Histogram** displays the frequency distribution of a quantitative variable. Unlike a bar chart, there are no spaces between contiguous columns.
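To make the idea of contiguous columns concrete, here is a small sketch that counts observations in equal-width bins that touch each other. The bin start, width and count are assumptions chosen for the example, and `histogram_counts` is a hypothetical helper name.

```python
def histogram_counts(values, start, bin_width, n_bins):
    """Count values in contiguous bins [start + k*w, start + (k+1)*w)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // bin_width)
        if 0 <= index < n_bins:   # values outside the covered range are ignored
            counts[index] += 1
    return counts

# Hypothetical data, binned as [0, 2), [2, 4), [4, 6), [6, 8), [8, 10).
counts = histogram_counts([1, 2, 2, 3, 5, 7, 8, 8, 9], start=0, bin_width=2, n_bins=5)
print(counts)
```

Because each bin begins exactly where the previous one ends, the columns are drawn with no gaps between them.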

**Stem & leaf chart** is another way of showing the distribution of a quantitative variable. Each number is split into a “stem” (the first digit or digits) and a “leaf” (the next digit or digits). The chart has a row for each stem, containing the leaves of the numbers with that stem.
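The splitting described above can be sketched in a few lines. This version assumes non-negative integers and uses the last digit as the leaf and the remaining digits as the stem; the function name `stem_and_leaf` is made up for the example.

```python
def stem_and_leaf(values):
    """Build stem-and-leaf rows for non-negative integers (leaf = last digit)."""
    data = sorted(values)
    # One row per stem in the covered range, including stems with no leaves.
    rows = {stem: [] for stem in range(data[0] // 10, data[-1] // 10 + 1)}
    for v in data:
        rows[v // 10].append(v % 10)
    return [
        f"{stem} | {' '.join(map(str, leaves))}".rstrip()
        for stem, leaves in rows.items()
    ]

# Hypothetical data.
chart = stem_and_leaf([12, 15, 21, 23, 23, 30, 38, 41])
print("\n".join(chart))
```

Each printed row shows a stem, a separator and the sorted leaves belonging to that stem.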

**Radar chart** or **star plot** displays two or more quantitative variables in the form of a two-dimensional chart. There are axes for the variables, evenly distributed from 0 to 360 degrees. There is a plot for each individual sample, in which its values along each axis are joined by straight lines to form a polygon. The shapes of the polygons can be used to identify individuals with similar characteristics.
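The geometry of a radar chart can be sketched as below: each variable gets an axis at an evenly spaced angle, and the sample's value on that axis becomes one vertex of the polygon. The sketch assumes the values are already on a comparable scale and places the first axis at 0 degrees; `radar_polygon` is a hypothetical helper name.

```python
import math

def radar_polygon(values):
    """Return (x, y) polygon vertices for one sample on evenly spaced axes."""
    n = len(values)
    vertices = []
    for k, radius in enumerate(values):
        angle = 2 * math.pi * k / n   # axes spread evenly over 360 degrees
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices

# Hypothetical sample measured on four variables.
vertices = radar_polygon([1, 2, 3, 4])
for x, y in vertices:
    print(f"{x:6.2f} {y:6.2f}")
```

Joining the vertices in order, and closing back to the first one, gives the polygon for that sample.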

**Wind rose diagram** plots data, such as wind speeds, that are observed at angles around a circle. It is used in meteorology to give a concise view of how wind speed and direction are distributed at a location.

**Trellis diagram** contains a grid of plots generated for different values of one or more categorical variables. It can be used to show how the distribution of a variable or the relationship between two variables changes according to the categories.

Remark: some images are modified from Wikimedia Commons (2012).

John Tukey, a famous American statistician, stated that his favorite part about analytics is “…taking boring flat data and bringing it to life through visualization”. There are many criteria for choosing a suitable graph to create powerful presentations. This flow chart can guide you through the process.

Using data visualization in your reports and presentations can help validate the information that is delivered to the audience. A variety of data visualizations can present data in different ways. Which chart, plot, diagram or graph you use to present your data depends on your objectives and on the type of data in your sample.
