Definition 

A 


Aberrant value 
A value that stands apart from all others, giving the impression
that it does not belong to the same dataset. 

Absolute frequency 
Number of elements belonging to a specified class. 

Accumulated absolute frequency 
The accumulated absolute frequency of index number i is the sum of the
absolute frequencies of the variable values from the first such value
to the ith one. 

Accumulated relative frequency 
The accumulated relative frequency of index number i is the sum of the
relative frequencies of the variable values from the first such value
to the ith one. 
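
As a minimal sketch of how the absolute, relative and accumulated
frequencies relate to one another, the following Python snippet builds
a frequency table for a small sample. The function name
frequency_table and the data are illustrative assumptions, not part of
this glossary.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(data):
    """Return rows of (value, absolute, relative, accumulated absolute,
    accumulated relative), ordered by increasing value."""
    counts = Counter(data)
    values = sorted(counts)
    n = len(data)
    absolute = [counts[v] for v in values]
    relative = [a / n for a in absolute]
    # Running sum from the first value up to the i-th one.
    cum_abs = list(accumulate(absolute))
    cum_rel = [c / n for c in cum_abs]
    return list(zip(values, absolute, relative, cum_abs, cum_rel))


sample = [2, 3, 3, 1, 2, 3, 4, 2, 3, 1]  # hypothetical data
table = frequency_table(sample)
for row in table:
    print(row)
```

The last accumulated absolute frequency equals the sample size, and the
last accumulated relative frequency equals 1.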

Arithmetic mean 
The mean is the measure most used to locate the sample centre. It is
obtained by adding together all of the sample elements and dividing
the total by the sample size. 


B 




Bar chart 
A graphical method consisting of marking points along the x-axis of a
system of coordinate axes, one for each class, and then plotting at
each point a vertical bar whose height is equal or proportional to the
absolute or relative frequency of that class.

Bivariate data 
A pair of values corresponding to a specific individual or
experimental outcome. 

Bivariate distributions 
Statistical distributions in which two variables are analysed. 

Box-and-whisker plot
It is a graphical method highlighting certain characteristics of the
sample. The set of sample values located between the 1st and the 3rd
quartiles, Q.25 and Q.75, is represented by a rectangle (box) with the
median indicated by a bar. Two lines then join the rectangle's sides
to the so-called adjacent values.


C 


Census 
The scientific study of a universe of people, institutions or physical
objects in which all elements are analysed, in order to obtain
knowledge and make quantitative inferences regarding important
characteristics of that population.

Certain event 
An event that occurs with probability 1. The sample space is itself an
event, and it is a certain event.

Class range 
The difference between the maximum and minimum value of the class. 

Coincidence 
Phenomena with uncertain individual results, but which possess
long-term regularity, making it possible to obtain a general behaviour
pattern. 

Complementary event 
The complementary event of event A is the event corresponding to all
the results of the sample space S that are not in A. 

Contingency table 
A contingency table is a representation of either qualitative or
quantitative data, used especially for bivariate data, that is, data
that can be classified according to two criteria. In a contingency
table the rows correspond to one of the criteria and the columns to
the other.
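
A minimal sketch of how such a table can be built from a list of
pairs is given below. The function name contingency_table and the data
(a made-up gender/answer classification) are assumptions for
illustration only.

```python
from collections import Counter


def contingency_table(pairs):
    """Count how often each (row criterion, column criterion) pair occurs.

    Returns the row labels, the column labels and the table of counts.
    """
    pairs = list(pairs)
    counts = Counter(pairs)
    rows = sorted({a for a, _ in pairs})
    cols = sorted({b for _, b in pairs})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical bivariate qualitative data.
observations = [("F", "yes"), ("F", "no"), ("M", "yes"),
                ("F", "yes"), ("M", "no"), ("M", "no")]
rows, cols, table = contingency_table(observations)
print(cols)
for label, line in zip(rows, table):
    print(label, line)
```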

Continuous data 
Quantitative data that can take the form of all numerical values
contained in the variation interval. 

Correlation coefficient 
A measure of the level of linear association between two variables. 
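
The glossary does not name a specific coefficient. The sketch below
assumes the Pearson product-moment correlation coefficient, which is
the usual choice for measuring linear association; the function name
pearson_r is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)
```

The result lies between -1 and 1; values near either end indicate a
strong linear association, and values near 0 indicate a weak one.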

Cumulative function 
Function of the accumulated frequencies of a dataset. 


D 


Descriptive statistics 
The descriptive study of the data of a sample (or a population), where
the information contained in the dataset is summarised by means of
tables, plots and the calculation of some characteristics of the
dataset: statistics, in the case of a sample, or parameters, in the
case of a population.

Deterministic experiment 
A deterministic experiment is characterised by producing the same
result, as long as it is repeated under the same conditions. 

Discrete data 
Quantitative data that can take only a finite, or countably infinite,
number of different values.

Disjointed events 
Disjointed events or mutually exclusive events are events in which the
occurrence of one of them implies the non-occurrence of the other one.

Distribution with long tails 
The frequencies are distributed so that there are a large number of
classes at the ends, with small frequencies compared to the central
classes. 


E 


Elementary event 
An event that corresponds to a single possible result of a random
experiment. 

Empirical distribution function 
A function F(x), defined for every real number x, that gives the
proportion of elements of the sample that are less than or equal to x.
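
A minimal sketch of this function in Python follows. The name ecdf and
the sample values are illustrative assumptions.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function F of the sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        # bisect_right counts the elements that are <= x.
        return bisect_right(ordered, x) / n

    return F


F = ecdf([3, 1, 4, 1, 5])  # hypothetical sample
print(F(0), F(1), F(3), F(5))
```

F is a step function that is 0 below the smallest sample value and 1
from the largest sample value onwards.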

Estimate 
The value taken by the estimator when it is computed from a specific
sample.

Estimator 
This is a sample statistic (random variable), the specific values of
which constitute estimates of the parameters in question. See
Statistic (2). 

Event 
A set of possible results of a random experiment or, in other words,
a subset of the sample space S.

Extreme and quartile diagram 
It is a graphical representation highlighting certain characteristics
of the sample. The set of sample values located between the 1st and
the 3rd quartiles, Q.25 and Q.75, is represented by a rectangle (box)
with the median indicated by a sign. Two lines then join the
rectangle's sides to the maximum and minimum values, respectively. 


F 


Frequency distribution 
See bar chart. 

Frequency polygon 
A line that links the upper ends of the bars of a bar chart.

Frequency table 
A table showing the distribution of the variable, in other words, the
values or forms that the variable can take on, as well as the
frequency with which those values occur. 


H 


Histogram 
A histogram is a graphical plot of continuous data, formed of a
succession of adjacent rectangles. Each rectangle refers to a class
interval and the area of each one corresponds to the relative
frequency (or absolute frequency). The total area of the histogram is,
therefore, equal to 1 (respectively equal to n, the sample size). 
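
Because the area of each rectangle, and not its height, corresponds to
the frequency, the height of a rectangle is the relative frequency
divided by the class width. The sketch below illustrates this. It
assumes classes that are closed on the left and open on the right,
except the last one, which is closed on both sides; the glossary does
not fix this convention, and the function name and data are
hypothetical.

```python
def histogram_heights(data, edges):
    """Heights of the histogram rectangles for the given class edges,
    scaled so that each area equals the class relative frequency."""
    n = len(data)
    heights = []
    last = len(edges) - 2
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        # [lo, hi) for every class, [lo, hi] for the last one.
        count = sum(1 for x in data if lo <= x < hi or (i == last and x == hi))
        heights.append(count / n / (hi - lo))
    return heights


data = [1, 2, 2, 3, 5, 7, 8, 9]  # hypothetical sample
edges = [0, 2, 4, 10]            # classes of unequal width
heights = histogram_heights(data, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
total_area = sum(h * w for h, w in zip(heights, widths))
print(heights, total_area)
```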


I 


Impossible event 
This is the event that results from the intersection of disjointed or
mutually exclusive events.

Interquartile range 
A measure of the variability of a sample, corresponding to the
difference between the values of the third and first quartiles. This
provides information on the range of the interval containing the
middle 50% of the observations. 


L 


Location measures 
Measures that locate and characterize the centre of a sample. 


M 


Mean deviation 
This is the arithmetic mean of the absolute values of the deviation of
each xi data value from the mean. 
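
A minimal sketch of this calculation, with a hypothetical function
name and made-up data:

```python
def mean_deviation(data):
    """Arithmetic mean of the absolute deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(abs(x - mean) for x in data) / n


print(mean_deviation([2, 4, 6, 8]))
```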

Median 
It is a measure used to locate the data distribution centre,
corresponding to the value that divides the sample in half, in other
words half of the elements of the dataset are less than or equal to
the median and the other half is greater than or equal to the median. 
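
The sketch below shows one common way of computing the median. For a
sample with an even number of elements it assumes the usual convention
of averaging the two middle values; any value between them would also
satisfy the definition above. The function name is hypothetical.

```python
def median(data):
    """Median of a non-empty sample of numbers."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even sample size: average the two central values (a convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]), median([4, 1, 3, 2]))
```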

Mill rate 
Proportion relative to one thousand. 

Modal class 
The value that occurs most frequently if the data are discrete, or the
class interval with the greatest frequency if the data are continuous. 

Mode 
The value that occurs most frequently in a dataset, if the data are
discrete, or the class interval with the greatest frequency if the
data are continuous or grouped. 


P 


Parameter 
It is a number that describes a characteristic of the population. Even
though it is a fixed number, it is usually unknown. An unknown
parameter can be estimated from a statistic (or estimator). 

Percentile 
See p-Quantile.

Pictogram 
A graphical representation in which the data are represented by an
image (or by a symbol) that is proportional to the frequency. 

Pie chart 
A graphic consisting of a circle divided into a number of sectors. The
number of sectors is equal to the number of classes in the frequency
table of the sample under analysis. The sector angles are proportional
to the class frequency. 

Population 
A collection of individual units, which can be people or experiment
results, that are to be the focus of study and have one or more common
characteristics. 

Population increase 
The difference between population numbers at two different points in
time. The population increase is calculated by the addition of the
natural balance and migration balance. 

p-Quantile (Percentile and Quartile)
The value Qp is known as the p-quantile, 0 < p < 1, or the 100p-th
percentile, when 100p% of the sample elements are less than or equal
to Qp and the remaining elements are greater than or equal to Qp. The
0.25 and 0.75 quantiles are respectively called the 1st and 3rd
quartiles.
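
Several conventions exist for computing quantiles from a sample, and
the glossary does not single one out. The sketch below assumes one of
the simplest: Qp is taken as the smallest sample value such that at
least 100p% of the elements are less than or equal to it. The function
name and data are hypothetical. The interquartile range is then the
difference between the 0.75 and 0.25 quantiles.

```python
from math import ceil


def quantile(data, p):
    """p-quantile (0 < p < 1) as the smallest sample value with at least
    a proportion p of the elements less than or equal to it."""
    if not 0 < p < 1:
        raise ValueError("p must satisfy 0 < p < 1")
    ordered = sorted(data)
    n = len(ordered)
    # Small tolerance guards against floating-point error in n * p.
    k = max(ceil(n * p - 1e-12), 1)
    return ordered[k - 1]


data = [7, 1, 5, 3, 8, 2, 6, 4]  # hypothetical sample
q1 = quantile(data, 0.25)
q3 = quantile(data, 0.75)
iqr = q3 - q1
print(q1, q3, iqr)
```

Other conventions, such as interpolating between neighbouring values,
can give slightly different results for the same sample.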

Probability models 
Mathematical models used to describe random phenomena. 

Probability (frequency definition) 
The probability of an event A, represented by P(A), is defined as the
value obtained for the relative frequency observed for A over a large
number of repetitions of the random experiment.
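
The idea can be illustrated with a simulation. The sketch below
assumes a fair six-sided die as the random experiment and the event A
"a six is rolled"; both choices, and the function name, are
illustrative. A fixed seed is used only to make the run reproducible.

```python
import random


def relative_frequency_of_six(trials, seed=0):
    """Relative frequency of rolling a six in the given number of trials."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return hits / trials


for trials in (100, 10_000, 100_000):
    print(trials, relative_frequency_of_six(trials))
```

As the number of trials grows, the relative frequency tends to settle
near 1/6, the probability of the event under the fair-die assumption.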


Q 


Qualitative data 
Data that represent the information concerning a quality, category or
characteristic that cannot be measured but can be classified in
various forms. 

Quantitative data 
Data that represent the information resulting from measurable
characteristics, exhibited to different extents, that can be discrete
(discrete data) or continuous (continuous data).


R 


Random experiment 
An experiment with the following characteristics: it can be performed
repeatedly, in the same circumstances or in an independent manner; the
possible results are known; and there is insufficient knowledge to
know in advance which of the possible results will be obtained when
the experiment is performed or the phenomenon observed.

Random variable 
A random variable X is a function that associates a number with each
point of the sample space S. 

Regression line 
It is the line that best fits the points of a scatter plot. 
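
The glossary does not say which criterion defines the best fit. The
sketch below assumes the most common one, ordinary least squares, which
minimises the sum of squared vertical distances between the points and
the line y = a + b x. The function name is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```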

Relative frequency 
The ratio between the number of elements belonging to a specified
class and the total number of elements of the dataset under analysis. 


S 


Sample 
A set of data or observations, collected from a subset of the
population. Sometimes a sample is studied with the objective of
drawing conclusions on the population from which it was collected. 

Sample range 
A measure of the variability of a sample, corresponding to the
difference between the maximum and minimum value of the dataset. 

Sample size 
Number of elements of the sample. 

Sample space 
The set of all individual results that can be produced by a random
experiment (or observed when random phenomena are analysed).

Scatter plot 
The scatter plot is a graphical representation of bivariate data, in
which each data pair (xi, yi) is represented by a coordinate point (xi,
yi) in a system of coordinate axes. 

Skewed distribution 
A skewed distribution may be represented by a histogram with a
frequency distribution that is markedly asymmetrical, containing
values on one side that are substantially smaller than those on the
other side of the distribution. 

Skewed sample 
A sample that does not correctly represent the whole population. 

Standard deviation 
A measure of the variability of a sample in relation to its mean,
corresponding to the square root of the variance. It is expressed in
the same units as the original data.

Statistics (1) 
A discipline with the basic objective of collecting, compiling,
analysing and interpreting data. 

Statistic (2) 
It is a number that describes the sample. The value of a statistic is
calculated from the sample’s observed values. The statistic is used to
estimate an unknown parameter. 

Statistical inference 
This is a fundamental phase of statistical analysis, during which,
once certain properties are known (obtained via a descriptive analysis
of the sample), expressed by means of propositions, more general
statements are formulated, which express the existence of laws (relative
to the population). 

Stem-and-leaf diagram
Also known as the stemplot, it is a type of data representation that
can be deemed to be halfway between a table and a graph, given that
the true sample values are displayed, but in a presentation that
brings to mind a histogram. It consists of writing the leading digit
(or digits) of each value on the left-hand side of a vertical line,
followed on the right by the remaining digits.
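
A minimal sketch of such a diagram is given below. It assumes
non-negative integer data, with the tens as stems and the units as
leaves; the function name and data are hypothetical, and stems with no
values are simply omitted.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf diagram (tens | units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(str(leaf))
    return [f"{stem} | {' '.join(leaves)}"
            for stem, leaves in sorted(rows.items())]


lines = stem_and_leaf([12, 15, 21, 23, 23, 30, 38, 19])
for line in lines:
    print(line)
```

Each row shows the actual values, while the row lengths give the same
visual impression as the bars of a histogram.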

Survey 
The scientific study of one part of a population for the purpose of
studying attitudes, habits and preferences of the population with
regard to events, circumstances and subjects of general interest. 

Symmetrical distribution 
A symmetrical distribution may be represented by a histogram with a
frequency distribution that is more or less centred around a mean
class. 


V 


Variability measures 
Measures that indicate and describe the variability of a dataset.

Variable 
A common characteristic of a population, possessing different values
from one individual to the next. 

Variance 
A measure obtained by adding together all the squares of the
deviations of data from the mean and dividing the total by the number
of observations less one. 
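
A minimal sketch of this calculation follows, together with the
standard deviation as the square root of the variance. The function
names and data are hypothetical.

```python
from math import sqrt


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_std(data):
    """Standard deviation: the square root of the sample variance."""
    return sqrt(sample_variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean 5
print(sample_variance(data), sample_std(data))
```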


