## Statistics 101

China Supakorn and Jane Cohen

“Statistics 101” in this blog means an introduction to statistics in daily life. Statistics is a process for converting data into equations and summaries that can help us solve problems. This science helps us understand our past and make predictions about the future. Using statistics, we can analyze data from different fields to monitor changing patterns, then use that analysis to draw conclusions and make forecasts. In this blog, we briefly review the history of statistics, from past to present, to understand its impact on our daily lives.

A brief history of statistics

Statistics developed gradually during the 17th and 18th centuries, and a lot of work was completed and published at the end of the 19th century. Sir Ronald Fisher, one of the fathers of modern statistics, showed how statistics can be used to analyze very complicated data sets, and developed many of the methods that we still use today. He also founded the Rothamsted Statistics Department. Today, statistics is integrated into science, engineering, agriculture, medicine, the arts and other diverse fields of study. It is also frequently used in politics: in one well-known example, the American statistician Nate Silver built a forecasting system based on one of Fisher’s ideas and correctly predicted the results for all 50 states in the 2012 U.S. presidential election.

Some statistical techniques in daily life

Statistics play a big role in our daily lives, even when we do not notice it. Here are just a few examples.

1. Census

A census collects information about every member of a population. The term mostly applies to national surveys, although a census can also cover a small, precisely defined population. For example, we could take a census of pig producers in northern Thailand, musicians in European countries, or people aged 80 and above in Japan. One example that affects trade is the annual economic census. Data are collected from individual businesses, then compared and summarized. This information is used to measure trends and to create estimates and forecasts, which allow businesses and policymakers to plan their activities for several years ahead.

1. Sampling

It is not always possible to collect data from every member of a population, so a smaller sample is often collected instead. For example, every few years the World Health Organization (WHO) and the Food and Agriculture Organization of the United Nations (FAO) collect data to learn about human health and agricultural products in different countries. These international organizations use smaller, random samples to try to understand the characteristics of the whole population.
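To see why a random sample can stand in for a whole population, here is a minimal Python sketch. The population is simulated with made-up numbers (it is not WHO or FAO data), and the function names are our own, chosen only for illustration. The sketch draws a simple random sample, estimates the population mean from it, and reports the standard error of that estimate.

```python
import random
import statistics


def simulate_population(size, seed=0):
    """Build a made-up population of measurements (hypothetical data)."""
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(size)]


def estimate_mean_from_sample(population, sample_size, seed=1):
    """Draw a simple random sample without replacement and summarize it."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    sample_mean = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    standard_error = statistics.stdev(sample) / sample_size ** 0.5
    return sample_mean, standard_error


population = simulate_population(100_000)
sample_mean, standard_error = estimate_mean_from_sample(population, 500)
population_mean = statistics.mean(population)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error {standard_error:.2f})")
```

With only 500 of the 100,000 values, the sample mean should land close to the population mean, and the standard error gives a rough idea of how far off it is likely to be.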

1. Prediction

Prediction using regression is a method of forecasting based on previous data. Researchers record data for the dependent (target) variable and the independent (predictor) variables. These data are then used to fit a best-fit model that describes the relationship between the variables, so that the predictors can be used to predict values of the target. A vivid example is portrayed in the 2011 movie “Moneyball”. In this dramatized version of real-life events, Billy Beane uses predictive analytics to dramatically improve the results of his low-performing Major League Baseball team so that it becomes one of the highest-performing teams in the U.S.
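The simplest form of this idea is a straight line fitted by least squares. The Python sketch below uses a small made-up dataset (fertilizer applied versus crop yield, both hypothetical) and our own helper names `fit_line` and `predict`; it is meant only to illustrate the technique, not any particular study.

```python
def fit_line(x_values, y_values):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict the target for a new value of the predictor."""
    return intercept + slope * x


# Hypothetical data: predictor (fertilizer, kg) and target (yield, tonnes)
fertilizer_kg = [10, 20, 30, 40, 50]
yield_tonnes = [2.1, 2.9, 4.2, 4.8, 6.0]

slope, intercept = fit_line(fertilizer_kg, yield_tonnes)
forecast = predict(slope, intercept, 60)

print(f"slope: {slope:.3f}, intercept: {intercept:.2f}")
print(f"predicted yield at 60 kg: {forecast:.2f} tonnes")
```

Note that the forecast at 60 kg lies outside the range of the recorded data, so it assumes the straight-line relationship continues to hold there.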

The importance of statistics in daily life

In the 21st century, more data are collected about our daily lives than ever before. As computers become more powerful, we can easily analyze and interpret ever larger datasets. Statistical analysis is becoming increasingly important in many research fields, allowing us to fully understand ideas and disseminate them not just to our peers, but to the wider population. A few examples will illustrate this point.

1. Medical science

News reports on health and disease often cite statistics stating the devastating impact on populations; indeed, as I type this in March 2020, the worldwide media is feeding us daily updates on the outbreak of the coronavirus disease COVID-19 and its mortality rate. If the news simply reports the number of people who have the disease or who have died from it, that is an interesting fact, but it might not mean much to your life. When statistics become involved, people have a better idea of how the disease may affect them personally. For example, the WHO[i] (2018) reported that the number of people with diabetes rose from 108 million in 1980 to 422 million in 2014, and that diabetes was the seventh leading cause of death in 2016. These studies suggest that 85 to 95 percent of diabetes cases can be prevented by eating a healthy diet, taking regular physical activity and maintaining a normal body weight.
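As a small illustration of turning raw counts into something easier to interpret, the Python sketch below takes the two diabetes figures quoted above and computes the overall fold increase and the average annual growth rate this implies. The constant-growth assumption is ours, made only for this sketch, and the figures are raw counts that are not adjusted for growth in the world population over the same period.

```python
# Figures quoted above from the WHO fact sheet (millions of people)
cases_1980_millions = 108
cases_2014_millions = 422
years = 2014 - 1980

# Overall change, expressed as a ratio
fold_increase = cases_2014_millions / cases_1980_millions

# Average annual growth rate, assuming constant compound growth
# (an assumption of this sketch, not a WHO estimate)
annual_growth = fold_increase ** (1 / years) - 1

print(f"fold increase over {years} years: {fold_increase:.1f}x")
print(f"average annual growth: {annual_growth:.1%}")
```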

1. Social Science

Statistics in social science is both science and art. It is used to present and compare data visually in the form of histograms, pie charts, and other graphs. It is used to monitor and improve the quality of products and processes within a business organization. Statistical analysis is essential for the development of social science theories, which are tested for validity through robust analysis of real-world data. In political science, statistics is used to evaluate data on presidential elections and political parties, public opinion and voting, and the use of social media to promote policy, and to predict outcomes. Presidential and prime ministerial candidates conduct polls to determine their campaign strategies. Many political consultants have developed models to predict the winners of elections. The list of uses goes on.

1. Agriculture

Statistics have always been an essential part of biological and agricultural development and research. The increasing size and complexity of datasets have driven the development of new statistical analysis methods. For example, genomic selection in plant and animal breeding is the process of identifying superior individuals based on breeding values predicted from the analysis of molecular marker loci. The statistical analysis involves a very large number of markers and relatively few observations. Nowadays, breeders can assess the efficiency of genomic selection and use this to guide phenotypic selection.

Large datasets cannot easily be analyzed by hand, but in the last few decades, the use of statistical software has made data processing more convenient. Indeed, most introductory statistics classes will teach students how to use one or more statistical software packages to aid analysis and interpretation. In any research field, the use of statistical software for data analysis is inevitable. With such a broad range of free and paid packages available to implement statistical methods, how do you choose the right one for you or your organization?

[i] Reference

World Health Organization. 2018. Diabetes. Available from: https://www.who.int/news-room/fact-sheets/detail/diabetes (October 30th, 2018).