La Trobe

Cleaning Big Data Streams: A Systematic Literature Review

journal contribution
posted on 2023-08-17, 02:37 authored by Obaid Haylan B Alotaibi, Eric Pardede, Sarath Tomy
In today’s big data era, cleaning big data streams has become challenging because of the variety of data formats and the massive volume of data being generated. Many studies have proposed techniques to overcome these challenges, such as cleaning big data in real time. This systematic literature review presents recently developed techniques used for the cleaning process and for each data cleaning issue. Following the PRISMA framework, we search four databases, namely IEEE Xplore, ACM Library, Scopus, and Science Direct, to select relevant studies. From the selected studies, we identify the techniques that have been used to clean big data streams and the evaluation methods used to examine their efficiency. We also define the cleaning issues that may appear during the cleaning process, namely missing values, duplicated data, outliers, and irrelevant data. Based on our study, we identify future directions for cleaning big data streams.

History

Publication Date

2023-07-26

Journal

Technologies

Volume

11

Issue

4

Article Number

101

Pagination

24p.

Publisher

MDPI

ISSN

2227-7080

Rights Statement

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
