Some attributes have not been analyzed yet: chlorides, sulphates, alcohol and density.
Relationship between chlorides and sulphates
The scatter plot of chlorides against sulphates shows some outliers:
- Most wines have sulphates < 1.0 g / dm^3
- Most wines have chlorides < 0.15 g / dm^3
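The thresholds above were chosen by eye from the scatter plot. A common data-driven alternative, which is not the method used in this analysis, is Tukey's rule: flag values more than 1.5 times the interquartile range below the first quartile or above the third quartile. A minimal sketch, where the helper `iqr_outlier()` and the test vector are hypothetical:

```r
# Flag values outside Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR.
# k = 1.5 is the conventional choice; it is an assumption here, not the
# threshold used elsewhere in this analysis.
iqr_outlier <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

# Small made-up vector: only the last value lies beyond the upper fence
x <- c(1:10, 100)
which(iqr_outlier(x))  # 11
```

Applied to this data, something like `wine[!iqr_outlier(wine$chlorides) & !iqr_outlier(wine$sulphates), ]` would give a comparable filtered data frame, although its cut-offs will likely differ from 0.15 and 1.0.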

Before removing outliers, chlorides and sulphates appear to have a weak positive relationship (r = 0.37):
cor.test(wine$chlorides, wine$sulphates)
##
## Pearson's product-moment correlation
##
## data: wine$chlorides and wine$sulphates
## t = 15.978, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3282127 0.4127694
## sample estimates:
## cor
## 0.3712605
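As a check on how `cor.test()` arrives at these numbers, the t statistic and the 95% confidence interval can be recomputed from r and the sample size n alone: t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2, and the interval comes from Fisher's z transformation, tanh(atanh(r) ± z_0.975 / sqrt(n - 3)). This is a sketch of those standard formulas for the Pearson method; the helper name `pearson_summary()` is hypothetical.

```r
# Recompute the t statistic and the Fisher-z confidence interval
# reported by cor.test(), using only r and the sample size n.
pearson_summary <- function(r, n, conf = 0.95) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  z <- atanh(r)                        # Fisher transformation of r
  se <- 1 / sqrt(n - 3)                # standard error on the z scale
  crit <- qnorm(1 - (1 - conf) / 2)    # 1.96 for a 95% interval
  c(t = t_stat,
    lower = tanh(z - crit * se),
    upper = tanh(z + crit * se))
}

# r = 0.3712605 and df = 1597 (so n = 1599) are taken from the output above
round(pearson_summary(0.3712605, 1599), 3)
# t ≈ 15.978, lower ≈ 0.328, upper ≈ 0.413
```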
After removing the outliers, however, the scatter plot shows no particular relationship between chlorides and sulphates, and differences between quality levels are hard to see. This is revisited in ‘Final plot’.

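Recalculating the correlation without the outliers (chlorides < 0.15 and sulphates < 1) gives r = -0.012 with p = 0.63, so there is no evidence of a linear relationship between chlorides and sulphates; the positive correlation seen earlier was driven by the outliers.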
wine2 = subset(wine, (chlorides < 0.15) & (sulphates < 1))
cor.test(wine2$chlorides, wine2$sulphates)
##
## Pearson's product-moment correlation
##
## data: wine2$chlorides and wine2$sulphates
## t = -0.47664, df = 1496, p-value = 0.6337
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.06293036 0.03834876
## sample estimates:
## cor
## -0.0123224
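The drop from r = 0.37 to r = -0.01 shows how a handful of extreme points can create a correlation on their own. The sketch reproduces the effect on small made-up data (the points are hypothetical, not taken from the wine data): a grid with zero correlation by construction, plus three points that are large on both variables.

```r
# Made-up data: a 5 x 5 grid of points has zero correlation by construction
grid_pts <- data.frame(x = rep(1:5, times = 5), y = rep(1:5, each = 5))
# Three hypothetical outliers that are large on both variables
outliers <- data.frame(x = c(20, 25, 22), y = c(20, 22, 25))
all_points <- rbind(grid_pts, outliers)

cor(all_points$x, all_points$y)   # strongly positive, about 0.94

# Drop the outliers with fixed thresholds, as done for wine2 above
trimmed <- subset(all_points, (x < 10) & (y < 10))
cor(trimmed$x, trimmed$y)         # essentially 0
```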
Relationship between alcohol and density
As alcohol content increases, density decreases. This is expected, since ethanol is less dense than water.

cor.test(wine$alcohol, wine$density)
##
## Pearson's product-moment correlation
##
## data: wine$alcohol and wine$density
## t = -22.838, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5322547 -0.4583061
## sample estimates:
## cor
## -0.4961798
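To put r = -0.50 in perspective, squaring it gives the share of the variance in alcohol that a straight-line fit on density would explain (for simple linear regression, R^2 equals r^2):

```r
# Square the correlation reported above to get the proportion of variance
# a simple linear fit of alcohol on density would explain
r <- -0.4961798
r_squared <- r^2
round(r_squared, 3)   # 0.246
```

So density accounts for roughly a quarter of the variation in alcohol on a linear scale.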
To check whether the negative relationship still holds after grouping by quality, the data were plotted separately for each quality level:
* The same negative relationship appears at every quality level.

To confirm this numerically, the correlation between alcohol and density was calculated for each quality level:
w6 = subset(wine, quality == 6)
print('correlation alcohol/density of Quality 6')
## [1] "correlation alcohol/density of Quality 6"
cor.test(w6$alcohol, w6$density)
##
## Pearson's product-moment correlation
##
## data: w6$alcohol and w6$density
## t = -2.5897, df = 16, p-value = 0.01975
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8058649 -0.1026367
## sample estimates:
## cor
## -0.543465
w5 = subset(wine, quality == 5)
print('correlation alcohol/density of Quality 5')
## [1] "correlation alcohol/density of Quality 5"
cor.test(w5$alcohol, w5$density)
##
## Pearson's product-moment correlation
##
## data: w5$alcohol and w5$density
## t = -10.038, df = 197, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6668522 -0.4815944
## sample estimates:
## cor
## -0.581718
w4 = subset(wine, quality == 4)
print('correlation alcohol/density of Quality 4')
## [1] "correlation alcohol/density of Quality 4"
cor.test(w4$alcohol, w4$density)
##
## Pearson's product-moment correlation
##
## data: w4$alcohol and w4$density
## t = -16.484, df = 636, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5992936 -0.4903238
## sample estimates:
## cor
## -0.5471226
w3 = subset(wine, quality == 3)
print('correlation alcohol/density of Quality 3')
## [1] "correlation alcohol/density of Quality 3"
cor.test(w3$alcohol, w3$density)
##
## Pearson's product-moment correlation
##
## data: w3$alcohol and w3$density
## t = -7.8593, df = 679, p-value = 1.518e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3561655 -0.2183697
## sample estimates:
## cor
## -0.2887623
w2 = subset(wine, quality == 2)
print('correlation alcohol/density of Quality 2')
## [1] "correlation alcohol/density of Quality 2"
cor.test(w2$alcohol, w2$density)
##
## Pearson's product-moment correlation
##
## data: w2$alcohol and w2$density
## t = -3.3461, df = 51, p-value = 0.001544
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6231186 -0.1739388
## sample estimates:
## cor
## -0.424285
w1 = subset(wine, quality == 1)
print('correlation alcohol/density of Quality 1')
## [1] "correlation alcohol/density of Quality 1"
cor.test(w1$alcohol, w1$density)
##
## Pearson's product-moment correlation
##
## data: w1$alcohol and w1$density
## t = -1.5924, df = 8, p-value = 0.15
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.8558534 0.2011769
## sample estimates:
## cor
## -0.4905907
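The six nearly identical blocks above could be condensed into one helper that runs `cor.test()` within each group and collects the estimate, p-value and confidence interval into a single table. A sketch, where `cor_by_group()` and the demo data are hypothetical; it assumes every group has at least 4 complete rows, because `cor.test()` only reports a confidence interval in that case:

```r
# Run cor.test(x, y) within each level of `group` and gather the key results
# into one data frame. Column names are passed as strings.
cor_by_group <- function(data, x, y, group) {
  pieces <- split(data, data[[group]], drop = TRUE)  # skip empty levels
  rows <- lapply(names(pieces), function(level) {
    d <- pieces[[level]]
    ct <- cor.test(d[[x]], d[[y]])
    data.frame(group = level,
               n = sum(complete.cases(d[[x]], d[[y]])),
               r = unname(ct$estimate),
               p_value = ct$p.value,
               lower = ct$conf.int[1],
               upper = ct$conf.int[2])
  })
  do.call(rbind, rows)
}

# Hypothetical demo data: group "a" trends up, group "b" trends down
demo <- data.frame(
  g = rep(c("a", "b"), each = 6),
  x = c(1:6, 1:6),
  y = c(2, 4, 5, 7, 9, 12, 12, 10, 9, 7, 4, 3)
)
cor_by_group(demo, "x", "y", "g")
```

With the wine data from this analysis, `cor_by_group(wine, "alcohol", "density", "quality")` should return one row per quality level with the same estimates as the individual `cor.test()` calls.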
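Across the six quality levels, every correlation between alcohol and density is negative (r from -0.29 to -0.58). Only quality 1, which has just 10 wines, is not significant at the 5% level (p = 0.15, and its 95% confidence interval includes 0). The following plot shows alcohol against density for each quality level, with a boxplot of alcohol in each panel for comparing alcohol across quality levels.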
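library(ggplot2)

# Alcohol vs. density in one panel per quality level, with a boxplot of
# alcohol in each panel for comparing alcohol across quality levels
ggplot(wine, aes(x=density, y=alcohol, color=quality)) +
  facet_wrap(~quality) +
  geom_point(alpha=0.5) +
  # group = quality makes the one-box-per-panel grouping explicit and avoids
  # the "continuous x aesthetic" warning when quality is numeric
  geom_boxplot(aes(group=quality), alpha=0.1) +
  ylab('Alcohol [%]') +
  xlab('Density (g / cm^3)')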
