Exploring Distribution of Variables


The distributions of the social demographic index (sei, labelled RESPONDENT SOCIOECONOMIC INDEX in the tables) and of the respondent's age when the first child was born (agekdbrn) were explored, using sex as the main grouping variable. The results are summarized in the tables and charts.

Descriptives

RESPONDENT SOCIOECONOMIC INDEX by RESPONDENTS SEX

Statistic                                MALE       FEMALE
Mean                                     49.039     47.184
Std. Error of Mean                       .7894      .6745
95% Confidence Interval, Lower Bound     47.489     45.860
95% Confidence Interval, Upper Bound     50.589     48.508
5% Trimmed Mean                          48.192     46.521
Median                                   43.400     38.400
Variance                                 372.604    371.655
Std. Deviation                           19.3030    19.2784
Minimum                                  17.1       17.1
Maximum                                  96.0       97.2
Range                                    78.9       80.1
Interquartile Range                      32.0       32.2
Skewness                                 .510       .564
Std. Error of Skewness                   .100       .086
Kurtosis                                 -1.001     -.976
Std. Error of Kurtosis                   .200       .171

R'S AGE WHEN 1ST CHILD BORN by RESPONDENTS SEX

Statistic                                MALE       FEMALE
Mean                                     25.00      23.01
Std. Error of Mean                       .221       .181
95% Confidence Interval, Lower Bound     24.56      22.66
95% Confidence Interval, Upper Bound     25.43      23.37
5% Trimmed Mean                          24.72      22.72
Median                                   24.00      22.00
Variance                                 29.280     26.644
Std. Deviation                           5.411      5.162
Minimum                                  14         13
Maximum                                  46         44
Range                                    32         31
Interquartile Range                      7          7
Skewness                                 .819       .869
Std. Error of Skewness                   .100       .086
Kurtosis                                 .735       .439
Std. Error of Kurtosis                   .200       .171

In terms of the social demographic index, the 5% trimmed mean was 48.2 for males and 46.5 for females. This statistic describes with a single number where the data values are typically found, after the most extreme 5% of cases at each end have been set aside. For the age when the first child was born, the 5% trimmed means were 24.7 for males and 22.7 for females.
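The idea behind a trimmed mean can be sketched in a few lines of Python. This is only an illustration: the scores are made up, the function name is hypothetical, and it drops whole cases from each tail, whereas SPSS may weight the boundary cases fractionally, so it should not be expected to reproduce the table values exactly.

```python
import math

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of the cases from each tail.

    Assumption: whole cases are dropped (floor of proportion * n);
    SPSS may weight the boundary cases fractionally instead.
    """
    ordered = sorted(values)
    k = math.floor(proportion * len(ordered))  # cases cut per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Made-up scores for illustration only (not the survey data):
# one extreme value pulls the ordinary mean up, the trimmed mean less so.
scores = [17, 20, 22, 23, 24, 24, 25, 25, 26, 26,
          27, 27, 28, 28, 29, 30, 31, 32, 35, 96]
plain = sum(scores) / len(scores)
trimmed = trimmed_mean(scores, 0.05)
print(round(plain, 2), round(trimmed, 2))
```

With a right-skewed sample like this one, the trimmed mean sits below the ordinary mean, which is the same pattern the tables show for both variables.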

An investigation of normality reveals that neither the social demographic index nor the age is normally distributed, so the normality assumption behind parametric tests is not met and such tests should not be relied on to make inferences concerning the Census data.

Tests of Normality

                                              Kolmogorov-Smirnov (a)      Shapiro-Wilk
Variable                         Sex          Statistic  df    Sig.       Statistic  df    Sig.
RESPONDENT SOCIOECONOMIC INDEX   MALE         .178       598   .000       .904       598   .000
RESPONDENT SOCIOECONOMIC INDEX   FEMALE       .209       817   .000       .909       817   .000
R'S AGE WHEN 1ST CHILD BORN      MALE         .095       598   .000       .957       598   .000
R'S AGE WHEN 1ST CHILD BORN      FEMALE       .143       817   .000       .942       817   .000

a. Lilliefors Significance Correction

Since the Shapiro-Wilk significance values are below 0.05, we conclude that neither the social demographic index nor the age when the first child was born is normally distributed when sex is the grouping variable, so t-statistics cannot be relied on to reach valid conclusions. This calls for a non-parametric test such as the Mann-Whitney test.
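For readers who want to see what the Mann-Whitney test does, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the procedure SPSS runs: the ages are made up, the function name is hypothetical, ties receive average ranks, and the p-value comes from a normal approximation without tie or continuity correction.

```python
from statistics import NormalDist

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Assumptions: ties get average ranks, but no tie or continuity
    correction is applied to the variance, so p is approximate.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    p = 2 * NormalDist().cdf(z)  # z <= 0 because u is the smaller U
    return u, z, min(p, 1.0)

# Made-up ages for illustration only (not the survey data).
male_ages = [24, 27, 25, 30, 22, 28, 26, 31]
female_ages = [21, 23, 20, 25, 22, 19, 24, 22]
u, z, p = mann_whitney_u(male_ages, female_ages)
print(u, round(z, 3), round(p, 4))
```

Because the test works on ranks rather than raw values, it does not require the normality that the Shapiro-Wilk results call into question.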

T-statistics

Social Demographic Index

The table shows the summary statistics of the social demographic index.

Group Statistics: RESPONDENT SOCIOECONOMIC INDEX

RESPONDENTS SEX    N       Mean      Std. Deviation    Std. Error Mean
MALE               887     49.109    19.4399           .6527
FEMALE             1024    48.458    19.5677           .6115

Levene's Test for Equality of Variances and t-test for Equality of Means: RESPONDENT SOCIOECONOMIC INDEX

                                             Equal variances    Equal variances
                                             assumed            not assumed
Levene's Test, F                             .256
Levene's Test, Sig.                          .613
t                                            .728               .728
df                                           1909               1873.685
Sig. (2-tailed)                              .467               .466
Mean Difference                              .6515              .6515
Std. Error Difference                        .8948              .8944
95% Confidence Interval of the Difference
  Lower                                      -1.1034            -1.1026
  Upper                                      2.4065             2.4057

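From the table above, Levene's test is not significant (F = 0.256, p = 0.613 > 0.05), so we fail to reject the null hypothesis of homogeneity of variance in the social demographic index of the two groups, and the "Equal variances assumed" row applies. The t-test is also not significant (t = 0.728, df = 1909, p = 0.467 > 0.05), so there is no evidence that the mean social demographic index differs between males and females.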
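The t statistics in the table can be checked from the Group Statistics alone. The sketch below uses the standard pooled and Welch formulas; the function name is hypothetical, and because it starts from the rounded means and standard deviations printed in the table, small differences in the last decimal are to be expected.

```python
from math import sqrt

def t_from_summary(n1, m1, s1, n2, m2, s2):
    """Return (pooled t, pooled df, Welch t, Welch df) from group summaries."""
    diff = m1 - m2
    # Pooled version (equal variances assumed)
    df_pooled = n1 + n2 - 2
    var_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df_pooled
    t_pooled = diff / sqrt(var_pooled * (1 / n1 + 1 / n2))
    # Welch version (equal variances not assumed)
    a, b = s1**2 / n1, s2**2 / n2
    t_welch = diff / sqrt(a + b)
    df_welch = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t_pooled, df_pooled, t_welch, df_welch

# Group Statistics for the socioeconomic index by sex, as reported above.
t_p, df_p, t_w, df_w = t_from_summary(887, 49.109, 19.4399,
                                      1024, 48.458, 19.5677)
print(round(t_p, 3), df_p, round(t_w, 3), round(df_w, 1))
```

Both versions give nearly the same t here because the two standard deviations and sample sizes are similar.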

T-statistics of the Age

The mean age when the first child was born was 25 years for males and 23 years for females, with a standard deviation of about 5 years in both groups.

Group Statistics: R'S AGE WHEN 1ST CHILD BORN

RESPONDENTS SEX    N       Mean      Std. Deviation    Std. Error Mean
MALE               623     25.00     5.444             .218
FEMALE             866     22.87     5.128             .174

Levene's Test for Equality of Variances and t-test for Equality of Means: R'S AGE WHEN 1ST CHILD BORN

                                             Equal variances    Equal variances
                                             assumed            not assumed
Levene's Test, F                             .724
Levene's Test, Sig.                          .395
t                                            7.704              7.629
df                                           1487               1291.153
Sig. (2-tailed)                              .000               .000
Mean Difference                              2.130              2.130
Std. Error Difference                        .276               .279
95% Confidence Interval of the Difference
  Lower                                      1.588              1.582
  Upper                                      2.672              2.677

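For the age when the first child was born, Levene's test is not significant (F = 0.724, p = 0.395 > 0.05), so homogeneity of variance can be assumed. The t-test itself has a very low significance value (t = 7.704, df = 1487, p < 0.001), so we reject the null hypothesis of equal means: males were on average about 2.1 years older than females when their first child was born.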

Error Bar Chart for the Social Demographic Index

The mean social demographic index for each sex, along with its 95% confidence interval, is represented in this chart. The confidence intervals for the two sexes overlap (47.5 to 50.6 for males and 45.9 to 48.5 for females in the Descriptives table), which is consistent with the non-significant result from the t-test. The error bars have a small range compared to the range of the social demographic index, which indicates that the mean index is estimated fairly precisely because of the large sample sizes.

Error Bar Chart for the Age

The mean age when the first child was born for each sex, along with its 95% confidence interval, is represented in this chart. The confidence intervals for the two sexes do not overlap, which is consistent with the result from the t-test. The error bars have a small range compared to the range of the age when the first child was born, which indicates that the mean age is estimated fairly precisely because of the large sample sizes.
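Whether two error bars overlap can be checked from the reported means and standard errors. The sketch below is a rough check only: it assumes a normal critical value (about 1.96) rather than the t quantile that SPSS uses, so the bounds may differ slightly in the second decimal, and the function names are illustrative.

```python
from statistics import NormalDist

def mean_ci(mean, std_error, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std_error, mean + z * std_error

def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Means and standard errors of age when the first child was born,
# taken from the Descriptives table by sex.
male = mean_ci(25.00, 0.221)
female = mean_ci(23.01, 0.181)
print(male, female, intervals_overlap(male, female))
```

Non-overlapping 95% intervals point to a significant difference in means, but the reverse does not hold in general, so the t-test remains the formal check.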

Analysis for the Race as Grouping Variable

Descriptives: RESPONDENT SOCIOECONOMIC INDEX by RACE OF RESPONDENT

Statistic                                WHITE      BLACK      OTHER
Mean                                     49.952     42.687     47.755
Std. Error of Mean                       .5064      1.1103     1.5554
95% Confidence Interval, Lower Bound     48.959     40.501     44.685
95% Confidence Interval, Upper Bound     50.946     44.874     50.825
5% Trimmed Mean                          49.296     41.588     46.893
Median                                   43.550     35.100     37.700
Variance                                 378.537    321.772    420.969
Std. Deviation                           19.4560    17.9380    20.5175
Minimum                                  17.1       20.1       17.1
Maximum                                  97.2       87.9       97.2
Range                                    80.1       67.8       80.1
Interquartile Range                      31.0       24.7       33.2
Skewness                                 .441       .980       .616
Std. Error of Skewness                   .064       .151       .184
Kurtosis                                 -1.078     -.349      -.931
Std. Error of Kurtosis                   .127       .300       .366

The table above shows 5% trimmed means of 49.3 for white, 41.6 for black and 46.9 for other respondents. Within each group the mean and the median of the social demographic index are far apart (for example 49.95 against 43.55 for white respondents), which suggests that the index is not normally distributed within the race groups. Normality was therefore tested further with the Shapiro-Wilk statistic.

Tests of Normality: RESPONDENT SOCIOECONOMIC INDEX

                         Kolmogorov-Smirnov (a)      Shapiro-Wilk
RACE OF RESPONDENT       Statistic  df     Sig.      Statistic  df     Sig.
WHITE                    .190       1476   .000      .920       1476   .000
BLACK                    .224       261    .000      .846       261    .000
OTHER                    .194       174    .000      .894       174    .000

a. Lilliefors Significance Correction

The table shows that the respondents' social demographic index is not normally distributed in any race group, since the Shapiro-Wilk significance values are below 0.05. Had a significance value been greater than 0.05, we would have concluded that normality was not rejected. This is further supported by the plot below.

Analysis for the Race as Grouping Variable

Descriptives: R'S AGE WHEN 1ST CHILD BORN by RACE OF RESPONDENT

Statistic                                WHITE      BLACK      OTHER
Mean                                     24.08      21.75      24.28
Std. Error of Mean                       .157       .341       .492
95% Confidence Interval, Lower Bound     23.77      21.08      23.31
95% Confidence Interval, Upper Bound     24.38      22.42      25.26
5% Trimmed Mean                          23.78      21.36      24.12
Median                                   23.00      20.00      24.00
Variance                                 28.282     24.672     32.400
Std. Deviation                           5.318      4.967      5.692
Minimum                                  14         13         14
Maximum                                  46         38         38
Range                                    32         25         24
Interquartile Range                      7          6          8
Skewness                                 .868       1.155      .341
Std. Error of Skewness                   .072       .167       .209
Kurtosis                                 .739       .983       -.635
Std. Error of Kurtosis                   .145       .333       .416

The 5% trimmed mean of the age when the first child was born was higher for white than for black respondents, at 23.8 and 21.4 years respectively. These values show where most individuals in each race group lie.

Tests of Normality: R'S AGE WHEN 1ST CHILD BORN

                         Kolmogorov-Smirnov (a)      Shapiro-Wilk
RACE OF RESPONDENT       Statistic  df     Sig.      Statistic  df     Sig.
WHITE                    .117       1143   .000      .950       1143   .000
BLACK                    .164       212    .000      .899       212    .000
OTHER                    .088       134    .014      .971       134    .006

a. Lilliefors Significance Correction

The results of the normality test show that the respondents' age when the first child was born is not normally distributed, since the Shapiro-Wilk significance values in the table above are below 0.05. This is further supported by the plots below, whose histograms show the skewness of the data.

In conclusion, the data are not normally distributed, so parametric tests such as t-statistics may not give reliable information. This can be addressed by carrying out a non-parametric test such as the Mann-Whitney test.

Test for Homogeneity of Variance

Group Statistics: R'S AGE WHEN 1ST CHILD BORN

RACE OF RESPONDENT    N       Mean      Std. Deviation    Std. Error Mean
WHITE                 1143    24.08     5.318             .157
BLACK                 212     21.75     4.967             .341

The table shows that the mean age when the first child was born was 24 years for white and 22 years for black respondents, with a standard deviation of about five years in each group.

R'S AGE WHEN 1ST CHILD BORN, WHITE and BLACK

                                             Equal variances    Equal variances
                                             assumed            not assumed
F                                            2.39
Sig                                          0.123
t                                            5.906              6.190
df                                           1353               307.691
Sig. (2-tailed)                              .000               .000
95% Confidence Interval of the Difference
  Lower                                      1.553              1.586
  Upper                                      3.098              3.064

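From the output, the hypothesis of equal variances is not rejected, because the significance value of the variance test is above 0.05 (F = 2.39, Sig = 0.123). The t-test, however, has a very low significance value (t = 5.906, df = 1353, p < 0.001), so we conclude there is a significant difference in age when the first child was born between white and black respondents.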

Group Statistics: RESPONDENT SOCIOECONOMIC INDEX

RACE OF RESPONDENT    N       Mean      Std. Deviation    Std. Error Mean
WHITE                 1476    49.952    19.4560           .5064
BLACK                 261     42.687    17.9380           1.1103

For the social demographic index by race, the mean was 49.95 for white and 42.69 for black respondents.

RESPONDENT SOCIOECONOMIC INDEX, WHITE and BLACK

                                             Equal variances    Equal variances
                                             assumed            not assumed
F                                            13.3
Sig                                          0
t                                            5.624              5.953
df                                           1735               376.552
Sig. (2-tailed)                              .000               .000
95% Confidence Interval of the Difference
  Lower                                      4.7316             4.8654
  Upper                                      9.7984             9.6646

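From the output, the hypothesis of equal variances must be rejected because the significance value of the variance test is low (F = 13.3, Sig reported as 0), so the "Equal variances not assumed" row is read. Its t-test is significant (t = 5.953, df = 376.552, p < 0.001), so we conclude there is a significant difference in the social demographic index between white and black respondents.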
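The way the variance test decides which row of the t-test output to read can be written as a simple rule. The sketch below assumes the conventional 0.05 threshold; the function name is hypothetical, and the two significance values are the ones reported for the race comparisons above.

```python
def row_to_read(levene_sig, alpha=0.05):
    """Pick the t-test row based on the equality-of-variances test."""
    if levene_sig < alpha:
        # Equal variances rejected: use the unequal-variances row.
        return "Equal variances not assumed"
    return "Equal variances assumed"

# Significance values of the variance test for the two race comparisons.
age_row = row_to_read(0.123)  # age when the first child was born
sei_row = row_to_read(0.0)    # socioeconomic index (Sig reported as 0)
print(age_row)
print(sei_row)
```

In both comparisons the two rows lead to the same conclusion about the means, so the choice of row affects only the exact t and df that are reported.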

Charts for Race

The mean age when the first child was born for each race, along with its 95% confidence interval, is represented in this chart. The confidence interval for black respondents (21.1 to 22.4) does not overlap those for white (23.8 to 24.4) or other (23.3 to 25.3) respondents, which is consistent with the result from the t-test between white and black; the intervals for white and other respondents do overlap. The error bars have a small range compared to the range of the age when the first child was born, which indicates that the mean age is estimated fairly precisely because of the large sample sizes.

The mean social demographic index for each race, along with its 95% confidence interval, is represented in this chart. The confidence interval for white respondents (49.0 to 50.9) lies entirely above that for black respondents (40.5 to 44.9), which is consistent with the result from the t-test; the wider interval for other respondents (44.7 to 50.8) overlaps both. The error bars have a small range compared to the range of the social demographic index, which indicates that the mean index is estimated fairly precisely because of the large sample sizes.

To ensure that only two categories appear in the error bar chart, you can apply an exclusion criterion during the analysis so that the data set used contains only the two categories of interest, for example by a transformation that filters or recodes the grouping variable.
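Outside SPSS, the same exclusion step amounts to filtering the cases before charting. The following is a minimal sketch with made-up records; the field names "race" and "sei" and the helper name are assumptions for illustration only.

```python
# Hypothetical records; the field names "race" and "sei" are assumptions.
respondents = [
    {"race": "WHITE", "sei": 43.5},
    {"race": "BLACK", "sei": 35.1},
    {"race": "OTHER", "sei": 37.7},
    {"race": "WHITE", "sei": 61.0},
]

def keep_groups(rows, groups, key="race"):
    """Return only the rows whose grouping value is in `groups`."""
    wanted = set(groups)
    return [row for row in rows if row[key] in wanted]

# Keep only the two categories to be compared in the chart.
two_groups = keep_groups(respondents, ["WHITE", "BLACK"])
print(sorted({row["race"] for row in two_groups}))
```

Filtering before the analysis, rather than hiding a category in the chart afterwards, keeps the chart and the t-test based on the same cases.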

