Exploring the Distribution of Variables
The distributions of the respondents' socioeconomic index (sei) and of their age when the first child was born (agekdbrn) were explored, using sex as the main grouping variable. The results are summarized in the tables and charts below.
Descriptives

RESPONDENT SOCIOECONOMIC INDEX, by RESPONDENTS SEX

Statistic                             MALE        FEMALE
Mean                                  49.039      47.184
  Std. Error of Mean                  .7894       .6745
95% Confidence Interval for Mean
  Lower Bound                         47.489      45.860
  Upper Bound                         50.589      48.508
5% Trimmed Mean                       48.192      46.521
Median                                43.400      38.400
Variance                              372.604     371.655
Std. Deviation                        19.3030     19.2784
Minimum                               17.1        17.1
Maximum                               96.0        97.2
Range                                 78.9        80.1
Interquartile Range                   32.0        32.2
Skewness                              .510        .564
  Std. Error of Skewness              .100        .086
Kurtosis                              1.001       .976
  Std. Error of Kurtosis              .200        .171

R'S AGE WHEN 1ST CHILD BORN, by RESPONDENTS SEX

Statistic                             MALE        FEMALE
Mean                                  25.00       23.01
  Std. Error of Mean                  .221        .181
95% Confidence Interval for Mean
  Lower Bound                         24.56       22.66
  Upper Bound                         25.43       23.37
5% Trimmed Mean                       24.72       22.72
Median                                24.00       22.00
Variance                              29.280      26.644
Std. Deviation                        5.411       5.162
Minimum                               14          13
Maximum                               46          44
Range                                 32          31
Interquartile Range                   7           7
Skewness                              .819        .869
  Std. Error of Skewness              .100        .086
Kurtosis                              .735        .439
  Std. Error of Kurtosis              .200        .171
In terms of the socioeconomic index, the 5% trimmed mean was 48.2 for males and 46.5 for females. The trimmed mean describes, with a single number, where data values are typically found while limiting the influence of extreme values. For age when the first child was born, the 5% trimmed means were 24.7 for males and 22.7 for females.
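The idea behind a trimmed mean can be illustrated with a minimal Python sketch. The function name and the ages below are hypothetical, and the sketch simply drops whole observations from each tail; SPSS may handle the trimmed fraction differently, so results need not match its output to the last decimal.

```python
import math

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the observations from each tail."""
    data = sorted(values)
    k = math.floor(len(data) * proportion)  # whole observations cut per tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

# Hypothetical ages at first birth, with one extreme value (46).
ages = [18, 19, 20, 21, 21, 22, 22, 23, 23, 24,
        24, 25, 25, 26, 26, 27, 28, 29, 30, 46]

ordinary = sum(ages) / len(ages)
trimmed = trimmed_mean(ages)
print(round(ordinary, 2), round(trimmed, 2))
```

With one observation removed from each tail, the extreme value no longer pulls the average upward, which is why the trimmed mean sits below the ordinary mean for right-skewed data such as these.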
The normality tests reveal that neither the socioeconomic index nor the age when the first child was born is normally distributed, so parametric tests cannot be relied on to make inferences concerning the Census data.
Tests of Normality

                                            Kolmogorov-Smirnov (a)        Shapiro-Wilk
Variable                          Sex       Statistic   df     Sig.       Statistic   df     Sig.
RESPONDENT SOCIOECONOMIC INDEX    MALE      .178        598    .000       .904        598    .000
                                  FEMALE    .209        817    .000       .909        817    .000
R'S AGE WHEN 1ST CHILD BORN       MALE      .095        598    .000       .957        598    .000
                                  FEMALE    .143        817    .000       .942        817    .000

a. Lilliefors Significance Correction
Since the Shapiro-Wilk significance values are below 0.05, we conclude that neither the socioeconomic index nor the age when the first child was born is normally distributed when sex is the grouping variable, so t statistics cannot be relied on to reach valid conclusions. This calls for a nonparametric test such as the Mann-Whitney test.
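As an illustration of the nonparametric alternative mentioned above, the following Python sketch computes the Mann-Whitney U statistic with a normal approximation for the two-sided p-value. It is a simplified version under stated assumptions: no tie correction and no continuity correction are applied to the variance, and the two samples are hypothetical values, not the survey data.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U for sample x, its z score, and a two-sided normal-approximation p."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p

# Hypothetical ages at first birth for two small groups.
group_a = [22, 25, 27, 29, 31, 34]
group_b = [18, 19, 21, 21, 23, 24]
u, z, p = mann_whitney_u(group_a, group_b)
print(u, round(z, 3), round(p, 4))
```

Because the test works on ranks rather than raw values, it does not require the normality assumption that the Shapiro-Wilk results call into question.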
T Statistics
Socioeconomic Index
The tables show the summary statistics and the t-test for the socioeconomic index.

Group Statistics: RESPONDENT SOCIOECONOMIC INDEX

RESPONDENTS SEX    N       Mean      Std. Deviation   Std. Error Mean
MALE               887     49.109    19.4399          .6527
FEMALE             1024    48.458    19.5677          .6115

Independent Samples Test: RESPONDENT SOCIOECONOMIC INDEX

                                               Equal variances   Equal variances
                                               assumed           not assumed
Levene's Test for Equality of Variances
  F                                            .256
  Sig.                                         .613
t-test for Equality of Means
  t                                            .728              .728
  df                                           1909              1873.685
  Sig. (2-tailed)                              .467              .466
  Mean Difference                              .6515             .6515
  Std. Error Difference                        .8948             .8944
  95% Confidence Interval of the Difference
    Lower                                      -1.1034           -1.1026
    Upper                                      2.4065            2.4057
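From the table above, Levene's test is not significant (F = .256, Sig. = .613 > 0.05), so we fail to reject the null hypothesis of equal variances: homogeneity of variance is met for the socioeconomic index of the two groups. The t-test is also not significant (t = .728, df = 1909, p = .467 > 0.05), so there is no evidence that the mean socioeconomic index differs between males and females.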
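The t statistics reported for the socioeconomic index can be approximately reproduced from the Group Statistics alone. The sketch below is a minimal Python illustration (the function name is an assumption, not part of any SPSS output); it computes both the pooled-variance t and the Welch version with its Satterthwaite degrees of freedom. Small differences from the SPSS figures are expected because the group means are rounded in the table.

```python
import math

def t_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled and Welch t statistics from group sizes, means, and SDs."""
    diff = mean1 - mean2
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2

    # Equal variances assumed: pool the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch standard error and df.
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    return {
        "pooled": {"t": diff / se_pooled, "df": n1 + n2 - 2, "se": se_pooled},
        "welch": {"t": diff / se_welch, "df": df_welch, "se": se_welch},
    }

# Values taken from the Group Statistics table (MALE, then FEMALE).
result = t_from_summary(887, 49.109, 19.4399, 1024, 48.458, 19.5677)
for name, row in result.items():
    print(name, round(row["t"], 3), round(row["df"], 1), round(row["se"], 4))
```

Both versions give a t value close to .73, far below the usual critical value of about 1.96, in line with the non-significant result in the table.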
T Statistics of the Age
The mean age when the first child was born was 25 years for males and 23 years for females, with a standard deviation of about 5 years in both groups.
Group Statistics: R'S AGE WHEN 1ST CHILD BORN

RESPONDENTS SEX    N       Mean      Std. Deviation   Std. Error Mean
MALE               623     25.00     5.444            .218
FEMALE             866     22.87     5.128            .174

Independent Samples Test: R'S AGE WHEN 1ST CHILD BORN

                                               Equal variances   Equal variances
                                               assumed           not assumed
Levene's Test for Equality of Variances
  F                                            .724
  Sig.                                         .395
t-test for Equality of Means
  t                                            7.704             7.629
  df                                           1487              1291.153
  Sig. (2-tailed)                              .000              .000
  Mean Difference                              2.130             2.130
  Std. Error Difference                        .276              .279
  95% Confidence Interval of the Difference
    Lower                                      1.588             1.582
    Upper                                      2.672             2.677
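For the age when the first child was born, Levene's test is not significant (F = .724, Sig. = .395 > 0.05), so homogeneity of variance is met. The t-test itself is significant (t = 7.704, df = 1487, p < .001), so we reject the null hypothesis of equal means: males were on average about 2.1 years older than females when their first child was born.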
Error Bar Chart for the Socioeconomic Index
The mean socioeconomic index for each sex, along with its 95% confidence interval, is represented in this bar chart. The confidence intervals for the two sexes overlap, which is consistent with the non-significant result of the t-test. The error bars are narrow compared with the range of the socioeconomic index, which indicates that the group means are estimated fairly precisely because of the large sample sizes.
Error Bar Chart for the Age
The mean age when the first child was born for each sex, along with its 95% confidence interval, is represented in this bar chart. The confidence intervals for the two sexes do not overlap, which is consistent with the significant result of the t-test. The error bars are narrow compared with the range of ages when the first child was born, which indicates that the group means are estimated fairly precisely because of the large sample sizes.
Analysis for the Race as Grouping Variable

Descriptives

RESPONDENT SOCIOECONOMIC INDEX, by RACE OF RESPONDENT

Statistic                             WHITE       BLACK       OTHER
Mean                                  49.952      42.687      47.755
  Std. Error of Mean                  .5064       1.1103      1.5554
95% Confidence Interval for Mean
  Lower Bound                         48.959      40.501      44.685
  Upper Bound                         50.946      44.874      50.825
5% Trimmed Mean                       49.296      41.588      46.893
Median                                43.550      35.100      37.700
Variance                              378.537     321.772     420.969
Std. Deviation                        19.4560     17.9380     20.5175
Minimum                               17.1        20.1        17.1
Maximum                               97.2        87.9        97.2
Range                                 80.1        67.8        80.1
Interquartile Range                   31.0        24.7        33.2
Skewness                              .441        .980        .616
  Std. Error of Skewness              .064        .151        .184
Kurtosis                              1.078       .349        .931
  Std. Error of Kurtosis              .127        .300        .366
The table above shows 5% trimmed means of 49.3 for white and 41.6 for black respondents. Within each group the measures of central tendency for the socioeconomic index are far apart (for example, a mean of 49.95 against a median of 43.55 for white respondents), which suggests that the index is not normally distributed within the racial groups. Normality was examined further with the Shapiro-Wilk test.
Tests of Normality

                                            Kolmogorov-Smirnov (a)        Shapiro-Wilk
Variable                          Race      Statistic   df     Sig.       Statistic   df     Sig.
RESPONDENT SOCIOECONOMIC INDEX    WHITE     .190        1476   .000       .920        1476   .000
                                  BLACK     .224        261    .000       .846        261    .000
                                  OTHER     .194        174    .000       .894        174    .000

a. Lilliefors Significance Correction
The table shows that the respondents' socioeconomic index is not normally distributed in any racial group, since the Shapiro-Wilk significance values are below 0.05. Had a significance value been greater than 0.05, we would have concluded that the data were consistent with a normal distribution. This is further supported by the plot below.
Analysis for the Race as Grouping Variable

Descriptives

R'S AGE WHEN 1ST CHILD BORN, by RACE OF RESPONDENT

Statistic                             WHITE       BLACK       OTHER
Mean                                  24.08       21.75       24.28
  Std. Error of Mean                  .157        .341        .492
95% Confidence Interval for Mean
  Lower Bound                         23.77       21.08       23.31
  Upper Bound                         24.38       22.42       25.26
5% Trimmed Mean                       23.78       21.36       24.12
Median                                23.00       20.00       24.00
Variance                              28.282      24.672      32.400
Std. Deviation                        5.318       4.967       5.692
Minimum                               14          13          14
Maximum                               46          38          38
Range                                 32          25          24
Interquartile Range                   7           6           8
Skewness                              .868        1.155       .341
  Std. Error of Skewness              .072        .167        .209
Kurtosis                              .739        .983        .635
  Std. Error of Kurtosis              .145        .333        .416
The 5% trimmed mean of the age when the first child was born was higher for white than for black respondents, at 23.8 and 21.4 years respectively. These values indicate where most respondents in each racial group lie.
Tests of Normality

                                            Kolmogorov-Smirnov (a)        Shapiro-Wilk
Variable                          Race      Statistic   df     Sig.       Statistic   df     Sig.
R'S AGE WHEN 1ST CHILD BORN       WHITE     .117        1143   .000       .950        1143   .000
                                  BLACK     .164        212    .000       .899        212    .000
                                  OTHER     .088        134    .014       .971        134    .006

a. Lilliefors Significance Correction
The results of the normality tests show that the respondents' age when the first child was born is not normally distributed, since the p-values are below 0.05 according to the Shapiro-Wilk values in the table above. This is further supported by the plots below, which show the skewness of the data using histograms.
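The skewness figures in the Descriptives tables offer a quick numeric check alongside the histograms. The Python sketch below uses the standard large-sample formula for the standard error of skewness, which appears to reproduce the values printed in the tables, and divides the skewness by it to obtain a z score. The function names are hypothetical, and the cut-off of 1.96 is a common rule of thumb rather than a formal test.

```python
import math

def skewness_se(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def skewness_z(skewness, n):
    """Skewness divided by its standard error."""
    return skewness / skewness_se(n)

# Skewness values and group sizes from the tables for age at first birth.
groups = {"WHITE": (0.868, 1143), "BLACK": (1.155, 212), "OTHER": (0.341, 134)}
for race, (skew, n) in groups.items():
    print(race, round(skewness_se(n), 3), round(skewness_z(skew, n), 2))
```

A z score well above 1.96 indicates clear positive skew. By this rule of thumb the white and black groups are markedly skewed, while the smaller OTHER group is not, even though its Shapiro-Wilk test is significant.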
In conclusion, the data are not normally distributed, so parametric tests such as the t-test may not give reliable information. This can be addressed by carrying out a nonparametric test such as the Mann-Whitney test.
Test for Homogeneity of Variance

Group Statistics: R'S AGE WHEN 1ST CHILD BORN

RACE OF RESPONDENT    N       Mean      Std. Deviation   Std. Error Mean
WHITE                 1143    24.08     5.318            .157
BLACK                 212     21.75     4.967            .341
The table shows that the mean age when the first child was born was about 24 years for white and 22 years for black respondents, with a standard deviation of about five years in each group.

Independent Samples Test: R'S AGE WHEN 1ST CHILD BORN

                                               Equal variances   Equal variances
                                               assumed           not assumed
F                                              2.39
Sig.                                           0.123
t                                              5.906             6.190
df                                             1353              307.691
Sig. (2-tailed)                                .000              .000
95% Confidence Interval of the Difference
  Lower                                        1.553             1.586
  Upper                                        3.098             3.064
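From the output, Levene's test is not significant (F = 2.39, Sig. = 0.123 > 0.05), so the hypothesis of equal variances is not rejected and the "equal variances assumed" row applies. The t-test is significant (t = 5.906, df = 1353, p < .001), so we conclude that there is a significant difference in the age when the first child was born between white and black respondents.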
Group Statistics: RESPONDENT SOCIOECONOMIC INDEX

RACE OF RESPONDENT    N       Mean      Std. Deviation   Std. Error Mean
WHITE                 1476    49.952    19.4560          .5064
BLACK                 261     42.687    17.9380          1.1103
For the socioeconomic index by race, the mean was 49.95 for white and 42.69 for black respondents.

Independent Samples Test: RESPONDENT SOCIOECONOMIC INDEX

                                               Equal variances   Equal variances
                                               assumed           not assumed
F                                              13.3
Sig.                                           0
t                                              5.624             5.953
df                                             1735              376.552
Sig. (2-tailed)                                .000              .000
95% Confidence Interval of the Difference
  Lower                                        4.7316            4.8654
  Upper                                        9.7984            9.6646
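From the output, the hypothesis of equal variances must be rejected because the significance value of Levene's test is low (F = 13.3, Sig. < .001), so the "equal variances not assumed" row applies. The t-test is significant (t = 5.953, df = 376.552, p < .001), so we conclude that there is a significant difference in socioeconomic index between white and black respondents.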
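The two-step reading of the SPSS output (Levene's test first, then the matching t-test row) can be written as a small decision rule. The Python sketch below is illustrative only; the function name and the 0.05 threshold are assumptions, and the significance values are those of Levene's test in the four t-test tables of this report.

```python
def choose_t_row(levene_sig, alpha=0.05):
    """Pick the t-test row to read, given the Sig. value of Levene's test."""
    if levene_sig < alpha:
        # Variances differ significantly: read the Welch-type row.
        return "equal variances not assumed"
    return "equal variances assumed"

# Levene's Sig. values as printed in the tables above.
levene_results = {
    "sei by sex": 0.613,
    "age by sex": 0.395,
    "age by race": 0.123,
    "sei by race": 0.0,  # printed as 0 in the output
}
for label, sig in levene_results.items():
    print(label, "->", choose_t_row(sig))
```

Only the socioeconomic index by race fails Levene's test, so it is the only comparison for which the "equal variances not assumed" row should be read.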
Charts for Race
The mean age when the first child was born for each race, along with its 95% confidence interval, is represented in this bar chart. The confidence interval for black respondents does not overlap the interval for white respondents, which is consistent with the result of the t-test, while the intervals for white and other respondents overlap. The error bars are narrow compared with the range of ages when the first child was born, which indicates that the group means are estimated fairly precisely because of the large sample sizes.
The mean socioeconomic index for each race, along with its 95% confidence interval, is represented in this bar chart. The confidence interval for black respondents does not overlap the interval for white respondents, which is consistent with the result of the t-test, while the wider interval for other respondents overlaps both. The error bars are narrow compared with the range of the socioeconomic index, which indicates that the group means are estimated fairly precisely because of the large sample sizes.
To ensure that only two categories appear in the error bar chart, apply an exclusion criterion during the analysis so that the data set in use contains only the two categories of interest, for example by selecting cases or by transforming the grouping variable.
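Outside SPSS, the same exclusion step amounts to filtering the records before charting or testing. The Python sketch below shows the idea on a handful of hypothetical records; the field names `race` and `sei` and the helper `keep_groups` are illustrative assumptions, not names taken from the data set.

```python
def keep_groups(records, key, groups):
    """Return only the records whose `key` value is one of `groups`."""
    wanted = set(groups)
    return [row for row in records if row[key] in wanted]

# Hypothetical records; the real data set is not reproduced here.
records = [
    {"race": "WHITE", "sei": 43.5},
    {"race": "BLACK", "sei": 35.1},
    {"race": "OTHER", "sei": 37.7},
    {"race": "WHITE", "sei": 61.0},
]

two_groups = keep_groups(records, "race", ["WHITE", "BLACK"])
print(sorted({row["race"] for row in two_groups}))
```

Any chart or test run on the filtered records then involves only the two retained categories.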