What is meant by non-zero correlation

(Chapter 7 - page 2/3)

Product-moment correlation

We are now looking for a measure that not only tells us how close the connection (or how high the proportion of common variation) is between two features, but which also tells us something about the direction of the connection.

In addition, this measure should be independent of the scale or unit of measurement; i.e. we want a measure that describes the relationship between two variables even when they are scaled differently (such as willingness to take risks / extraversion or achievement motivation / openness).

Furthermore, the measure should be independent of the sample size so that we can compare measures of association from different studies.

This measure is the so-called "product-moment correlation coefficient". It was developed by PEARSON and BRAVAIS. Whenever correlation is mentioned in the following, the product-moment correlation coefficient is meant, unless otherwise noted.

The formula of the correlation was constructed in such a way that the numerical size of the coefficient can never be greater than +1 and never less than -1 (unless one miscalculates!).

The higher the absolute value of the correlation (the closer it is to +1 or -1), the closer the relationship between the variables under consideration.

The sign of the correlation says nothing about the closeness of the connection, only about its direction. The absence of a connection is expressed by a correlation close to zero.
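The two properties described so far (independence from the measuring scale, and a sign that carries only the direction) can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `pearson_r` and the height/weight values are made up for illustration, and the function uses the usual deviation-score definition of the coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Made-up illustrative values (an assumption, not data from the text)
height_cm = [160, 165, 170, 175, 180, 185]
weight_kg = [55, 62, 60, 72, 70, 80]

r_metric = pearson_r(height_cm, weight_kg)

# Change of units: each variable is multiplied by a positive constant
height_in = [h / 2.54 for h in height_cm]
weight_lb = [w * 2.20462 for w in weight_kg]
r_imperial = pearson_r(height_in, weight_lb)

# Reversing the direction of one variable flips the sign only
r_reversed = pearson_r(height_cm, [-w for w in weight_kg])

print(r_metric, r_imperial, r_reversed)
```

Up to floating-point rounding, `r_metric` and `r_imperial` agree, while `r_reversed` has the same magnitude with the opposite sign.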

Usually correlations are interpreted as follows:

        .00        = no connection
        .00 to .25 = low connection
        .25 to .50 = medium connection
        .50 to .75 = high connection
        .75 to 1.0 = very high (near-complete) connection

These values are rough guidelines, not exact cut-offs.
For example, a correlation of .08 should not be interpreted as a "low correlation" but as "no correlation".

However, the phrase "connection between two characteristics" must not be misunderstood. It is not meant that there is a causal relationship between two features (x and y). The correlation coefficient itself does not permit any statement about a cause-effect relationship, for example in the sense that the feature x causes the feature y.

For example, a high positive correlation between intelligence and school performance cannot simply be interpreted to mean that school performance is caused by intelligence. Such a coefficient initially only says that a student with high (low) intelligence will usually also show good (poor) school performance. This connection could, however, arise from a number of alternative causes; for example, a third variable (such as the socio-economic status of the parents) may be responsible.
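The third-variable explanation can be illustrated with a small simulation. This is a toy sketch under loudly stated assumptions: the variable names, the noise level of 0.5, the sample size and the seed are all hypothetical, and in this artificial model neither simulated feature influences the other; both depend only on a common third variable.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Hypothetical third variable z (standardised, e.g. parental status)
status = [rng.gauss(0, 1) for _ in range(n)]

# Both features depend on z plus independent noise, never on each other
intelligence = [z + rng.gauss(0, 0.5) for z in status]
performance = [z + rng.gauss(0, 0.5) for z in status]

r_xy = pearson_r(intelligence, performance)
print(round(r_xy, 2))
```

Under these assumptions the theoretical correlation is 1 / (1 + 0.25) = .80, so the simulated coefficient comes out high even though the model contains no cause-effect link between the two features.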

Since science should be as precise as possible, we cannot be satisfied with estimates but must try to calculate exact values. We therefore cannot do without the formula for the product-moment correlation coefficient.

    The correlation is the sum of the products of the deviations of all xi and yi from their respective means, divided by the square root of the product of the sum of squared deviations of all xi and the sum of squared deviations of all yi.
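Written out in symbols, the verbal definition corresponds to the standard form of the coefficient shown below. The notation is an assumption, since the original formula is not reproduced in the text: n is the number of test persons, and x-bar and y-bar are the means of the x and y values. Note that the denominator is the square root of the product of the two sums of squared deviations; this is what keeps the coefficient between -1 and +1.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \;\cdot\; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```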

Application rules:

  1. The measured values of each test person for feature 1 (x values) and feature 2 (y values) are entered in a table so that the two values for each test person are side by side.
  2. The x and y values are squared and (when calculating by hand) entered in the table.
  3. The product xy is formed for each pair of measured values.
  4. The values in the x, y, x², y² and xy columns are summed.
  5. The sums obtained are entered in the formula.
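The steps above collect the five sums of x, y, x², y² and xy, which suggests the raw-score (computational) form of the formula. The text does not show this form itself, so the version below is an assumption: it is the standard computational formula, algebraically equivalent to the definition via deviations from the means, with n denoting the number of test persons.

```latex
r_{xy} = \frac{n \sum xy - \sum x \sum y}
              {\sqrt{\left[\, n \sum x^2 - \left(\sum x\right)^2 \right]
                     \left[\, n \sum y^2 - \left(\sum y\right)^2 \right]}}
```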

We want to apply this rule to the example shown at the beginning:

x = intelligence quotient (IQ) / y = arithmetic test performance (RT)

Person    x    y     x²    y²   x·y
1       120   10  14400   100  1200
2       118    7  13924    49   826
3       100    4  10000    16   400
4       102    4  10404    16   408
5        96    1   9216     1    96
6        90    3   8100     9   270
7       112    6  12544    36   672
8       115    8  13225    64   920
9       116    8  13456    64   928
10      104    5  10816    25   520
11       95    4   9025    16   380
12      108    6  11664    36   648
13      111    7  12321    49   777
14      119   10  14161   100  1190
15      101    5  10201    25   505

In this (fictitious) example there is a very high positive correlation of .92 (the zero in front of the decimal point is usually left out and one reads: "Point ninety-two").
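The calculation for this example can be retraced in a few lines. The sketch below follows the application rules (column sums entered into the raw-score formula); the variable names are chosen for illustration, and the raw-score form is an assumption about which version of the formula the text uses.

```python
from math import sqrt

# x = IQ, y = arithmetic test performance, in the order of test persons 1 to 15
iq = [120, 118, 100, 102, 96, 90, 112, 115, 116, 104, 95, 108, 111, 119, 101]
rt = [10, 7, 4, 4, 1, 3, 6, 8, 8, 5, 4, 6, 7, 10, 5]

n = len(iq)

# Rule 4: column sums of x, y, x², y² and xy
sum_x = sum(iq)
sum_y = sum(rt)
sum_x2 = sum(x * x for x in iq)
sum_y2 = sum(y * y for y in rt)
sum_xy = sum(x * y for x, y in zip(iq, rt))

# Rule 5: enter the sums into the raw-score formula
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 1607 88 173457 606 9740
print(round(r, 2))                            # 0.92
```

With these sums the numerator is 4684 and the denominator is about 5110.8, which reproduces the coefficient of .92 reported above.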

In order to be able to calculate the product-moment correlation coefficient, the following requirements must be met:

  1. The data to be correlated must have been measured at the interval scale level.
  2. The relationship between the two features x and y must be linear.
  3. In addition, the features must be distributed a) unimodally and b) symmetrically.
  4. Conditions 2) and 3) are always met if the characteristics are jointly (bivariately) normally distributed.

The normal distribution will not be described in detail here.
Just this much:
The normal distribution is bell-shaped: values that deviate only slightly from the mean are observed very frequently, while extreme values are very rare. The greater the deviation from the mean, the lower the probability of actually observing such an extreme value.

Example:
People over 2.00 meters tall are rare, as are those under 1.40 meters. Most people are of roughly average height (around 1.70 meters).

Furthermore, linearity means that the cloud of points in a bivariate distribution (the scatter plot) is best represented by a straight line and not by a curvilinear function (parabola, hyperbola, etc.).
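Why the linearity requirement matters can be seen in a small constructed case. The sketch below is an illustration with made-up values, not an example from the text: on x values placed symmetrically around zero, a parabola determines y completely, yet the coefficient is zero, whereas a straight line yields a coefficient of 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Made-up x values, symmetric around zero
xs = [-3, -2, -1, 0, 1, 2, 3]

parabola = [x ** 2 for x in xs]   # y fully determined by x, but not linearly
line = [2 * x + 1 for x in xs]    # y is a linear function of x

r_parabola = pearson_r(xs, parabola)
r_line = pearson_r(xs, line)

print(r_parabola, r_line)
```

A coefficient near zero therefore rules out only a linear connection; a curvilinear relationship can remain hidden, which is why the scatter plot should be inspected before the coefficient is interpreted.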