Data Analytics


Datasets were initially composed of only a limited number of items. Today, improved technologies and innovations have made it easier for organizations to store larger amounts of data (Kurasova et al., 2014). Data of this kind is so intricate that traditional processing procedures cannot be used, and clustering is considered a major challenge in processing it. Large data sets contain clusters, and locating them is important. Clustering techniques have been used to solve numerous important problems, and a number of clustering algorithms have been developed for this purpose (Shirkhorshidi et al., 2014). We shall examine cluster analysis techniques and their use on administrative expenditure data, based on the article "Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease (ESRD) patients who initiated hemodialysis" (Liao et al., 2016).

Cluster analysis has been used to identify hidden information, structures, and groups in big data sets. Chronic kidney disease (CKD) affects almost 1 million patients in the US alone (Liao et al., 2016). It is a complex healthcare issue that is raising public health concerns and intensifying as a global epidemic. Hemodialysis (HD) and peritoneal dialysis (PD) are the commonly used therapeutic procedures. HD is considered the most expensive treatment method for ESRD patients, yet few reports have examined the economic effects of ESRD patients transitioning to hemodialysis therapy. Therefore, to inform healthcare decisions relating to the cost burden of hemodialysis treatment, it is important to evaluate the spending patterns of ESRD patients who have received HD and to categorize those patients into clusters.

The objectives of applying cluster analysis were to assess the changes in treatment costs in ESRD patients before and after starting hemodialysis, and to evaluate the clusters for differences in comorbidity and other characteristics to see whether clinical and demographic factors explain the differences in costs. Patients older than 18 years with more than two ESRD diagnoses who had received HD therapy were studied (Liao et al., 2016). Hierarchical cluster analysis and K-means cluster analysis were both applied to costs for the year before starting HD treatment and the year after receiving HD treatment in order to classify patients into clusters. Demographic, clinical, and cost data were collected for the pre-HD and post-HD periods and analyzed by cluster.
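To make the K-means step concrete, the following is a minimal sketch of Lloyd's algorithm applied to pairs of (pre-HD cost, post-HD cost). It is not the procedure or software used by Liao et al. (2016). The function name `kmeans`, the helper `make_group`, the choice of four clusters, and all of the data values are assumptions made for illustration; the data is synthetic.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Synthetic (pre-HD, post-HD) annual costs in thousands of dollars.
# These values are invented for illustration and are NOT the study data.
data_rng = random.Random(1)


def make_group(pre, post, n):
    """Hypothetical helper: n patients scattered around a (pre, post) cost pair."""
    return [(data_rng.gauss(pre, 5.0), data_rng.gauss(post, 5.0)) for _ in range(n)]


costs = (
    make_group(50, 55, 60)      # stable, low spending
    + make_group(60, 190, 15)   # moderate increase
    + make_group(180, 880, 5)   # sharp increase
    + make_group(900, 160, 5)   # sharp decrease
)

labels, centroids = kmeans(costs, k=4)
for j, (pre, post) in enumerate(centroids):
    print(f"cluster {j}: n={labels.count(j)}, mean pre={pre:.0f}, mean post={post:.0f}")
```

K-means depends on its starting centroids, so a single run may settle on a poor grouping. In practice the algorithm is usually rerun from several random starts and the solution with the lowest within-cluster sum of squares is kept.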

Approximately 18,400 ESRD patients were studied (Liao et al., 2016). Both the K-means and the hierarchical cluster methods yielded groups with distinct all-cause expenditure patterns. Hierarchical cluster analysis with the single linkage, complete linkage, centroid, average, and McQuitty similarity methods produced solutions that included clusters with unrealistic sample sizes. Based on sample size and the changes in cost patterns, the K-means solution with four clusters was selected. Cluster 1, categorized as average to high, had a sample size of 113. Cluster 2 had a sample size of 89. Cluster 3, categorized as average to average, had a sample size of 16,624. Cluster 4 had a sample size of 1,554 (Liao et al., 2016). The average cost before and after HD grew in cluster 1 from $185,070 to $884,605. In cluster 2, spending declined from $910,930 to $157,997. In cluster 3, expenditures were steady and low, while in cluster 4 costs rose from $57,909 to $193,140 (Liao et al., 2016).

In conclusion, cluster analysis, and K-means in particular, was found to be useful for assessing healthcare claims data with changing cost patterns. The analysis indicated that costs for the majority of ESRD patients remained stable after commencing hemodialysis treatment. A small fraction of patients drove the increase in costs after beginning HD treatment, mostly because of an increased comorbidity burden among those patients.
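The cluster sizes and cost figures reported above can be checked with a few lines of arithmetic. The short sketch below uses only the numbers quoted in this text; the variable names are illustrative, and cluster 3 is left out of the cost comparison because no dollar figures are given for it here.

```python
# Figures as quoted in the text above (Liao et al., 2016).
cluster_sizes = {1: 113, 2: 89, 3: 16_624, 4: 1_554}

# Mean cost (pre-HD, post-HD) in dollars; cluster 3 is omitted because
# the text gives no figures for it.
mean_cost = {
    1: (185_070, 884_605),
    2: (910_930, 157_997),
    4: (57_909, 193_140),
}

total = sum(cluster_sizes.values())
share = {c: n / total for c, n in cluster_sizes.items()}
fold_change = {c: post / pre for c, (pre, post) in mean_cost.items()}

for c in sorted(cluster_sizes):
    line = f"cluster {c}: n={cluster_sizes[c]:>6} ({share[c]:.1%} of {total})"
    if c in fold_change:
        line += f", post/pre cost ratio {fold_change[c]:.2f}"
    print(line)
```

The four cluster sizes sum to 18,380, which is consistent with the "approximately 18,400" patients studied. Cluster 3 accounts for about 90% of patients. The post/pre cost ratio is roughly 4.8 in cluster 1, 0.17 in cluster 2, and 3.3 in cluster 4.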


Kurasova, O., Marcinkevicius, V., Medvedev, V., Rapecka, A., & Stefanovic, P. (2014, November). Strategies for big data clustering. 2014 IEEE 26th International Conference on Tools with Artificial Intelligence (pp. 740-747). IEEE.

Liao, M., Li, Y., Kianifard, F., Obi, E., & Arcona, S. (2016). Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis. BMC Nephrology, 17(1), 1.

Shirkhorshidi, A. S., Aghabozorgi, S., Wah, T. Y., & Herawan, T. (2014, June). Big data clustering: a review. International Conference on Computational Science and Its Applications (pp. 707-720). Springer International Publishing.
