Efficient data storage: adaptively changing chunk size in cloud computing storage systems
Cloud computing enables users to utilise a shared pool of resources on demand. BLOB storage is a type of cloud storage for unstructured data. Most data management systems use a chunk size equal to the size of a given BLOB. Despite its simplicity, this strategy copes poorly with the heterogeneity of the system's nodes, the variation in BLOB sizes, and the resulting inconsistency in data access. We propose an adaptive strategy that determines an adequate chunk size by taking into account metrics on the available resources (bandwidth, storage usage), the size of the BLOB/file and the access rate of the chunks. Our experimental results show that the proposed approach achieves an execution time about 24% better than a method based on a single chunk size and about 96% better than a method based on a random choice of chunk size.
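For illustration, the sketch below shows one way such an adaptive chunk-size choice could look. The function name, metric inputs and all thresholds are assumptions made for exposition, not the strategy published in the paper.

```python
# Hypothetical sketch of the adaptive chunk-size idea described above: pick a
# chunk size for a BLOB from resource metrics (bandwidth, storage usage), the
# BLOB size, and the observed access rate. All cutoffs are illustrative
# assumptions, not the paper's actual values.

def adaptive_chunk_size(blob_size, bandwidth_mbps, storage_usage, access_rate,
                        min_chunk=4 * 2**20, max_chunk=256 * 2**20):
    """Return a chunk size in bytes for one BLOB."""
    # Start from a bandwidth-driven base: faster links tolerate larger chunks.
    base = min_chunk * max(1.0, bandwidth_mbps / 100.0)

    # Hot BLOBs (high access rate) favour smaller chunks so concurrent
    # readers can fetch disjoint pieces in parallel.
    if access_rate > 10.0:      # accesses per second, illustrative cutoff
        base /= 2.0

    # Nearly full storage nodes favour smaller chunks to ease placement.
    if storage_usage > 0.8:     # fraction of node capacity in use
        base /= 2.0

    # Clamp to sane bounds and never exceed the BLOB itself.
    return int(min(max(base, min_chunk), max_chunk, blob_size))

# Example: a 1 GiB BLOB on a 1 Gbps link, 60% full node, 25 accesses/s.
print(adaptive_chunk_size(2**30, 1000.0, 0.6, 25.0))
```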
Citation
M. CHALABI Baya (2023-01-09), "Efficient data storage: adaptively changing chunk size in cloud computing storage systems", International Journal of Grid and Utility Computing, Inderscience Publishers.
BlobSeer Scalability: A Multi-version Managers Approach
With the emergence of cloud computing, the amount of data generated in fields such as physics, medicine and social networks is growing exponentially. This increase in the volume of data and its large scale make processing more complex. Current datasets are very different in nature, ranging from small to very large, from structured to unstructured, and from largely complete to noisy and incomplete. In addition, these datasets evolve over time, often at very rapid rates. Given these characteristics, traditional data management systems are not adapted to support them. For example, relational database management systems (RDBMS) manage only databases whose data conform to a schema, whereas current databases contain a mix of structured and semi- or unstructured data. Furthermore, relational systems lack support for version management, which is very important in a data management system. As a data management system dedicated to large-scale datasets, we consider BlobSeer, a concurrency-optimized data management system for data-intensive distributed applications. BlobSeer targets applications that handle massive unstructured data in large-scale distributed environments, and uses versioning for the concurrent manipulation of large binary objects (BLOBs) in order to provide efficient access to data. To this end, BlobSeer uses a version manager to generate a new snapshot version of a BLOB every time a client writes to or appends to it. However, if the number of BLOBs or of primitives (write, append, read) grows while a single version manager handles them all, that manager becomes overloaded and a performance bottleneck. To avoid this bottleneck, we propose a multi-version-manager approach in which each version manager maintains a subset of the BLOBs.
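As a rough illustration of the proposed idea, the sketch below partitions BLOBs across several version managers by hashing the BLOB identifier, so that snapshot-version requests for different BLOBs are served by different managers. The class names and the hash-based assignment are assumptions made for exposition, not BlobSeer's actual protocol.

```python
# Minimal sketch of the multi-version-manager idea: BLOBs are partitioned
# across several version managers (here by hashing the BLOB id), so no single
# manager serialises every snapshot-version request.

import hashlib

class VersionManager:
    """Tracks the latest snapshot version for the BLOBs it owns."""
    def __init__(self, name):
        self.name = name
        self.versions = {}            # blob_id -> latest version number

    def new_snapshot(self, blob_id):
        self.versions[blob_id] = self.versions.get(blob_id, 0) + 1
        return self.versions[blob_id]

class VersionManagerPool:
    """Routes each BLOB to exactly one of several version managers."""
    def __init__(self, n_managers):
        self.managers = [VersionManager(f"vm-{i}") for i in range(n_managers)]

    def manager_for(self, blob_id):
        h = int(hashlib.md5(blob_id.encode()).hexdigest(), 16)
        return self.managers[h % len(self.managers)]

    def write(self, blob_id):
        # Each write/append produces a new snapshot version of the BLOB,
        # handled only by the manager that owns this BLOB.
        return self.manager_for(blob_id).new_snapshot(blob_id)

pool = VersionManagerPool(4)
for _ in range(3):
    pool.write("blob-A")
print(pool.write("blob-A"))           # -> 4, all via blob-A's own manager
```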
Citation
M. CHALABI Baya (2019-09-14), "BlobSeer Scalability: A Multi-version Managers Approach", Journal of Networking Technology, DLINE.
Implementation of Solution Cloud Computing with MapReduce Model
In recent years, large-scale computer systems have emerged to meet demands for massive storage, supercomputing, and applications that use very large data sets. The emergence of cloud computing offers the potential to analyse and process such data sets.
MapReduce is the most popular programming model used to support the development of these applications. It was initially designed by Google for its large-scale data centres, to provide Web search services with rapid response and high availability.
In this paper we test the K-means clustering algorithm in a cloud computing environment. The algorithm is implemented on MapReduce, and was chosen because its characteristics are representative of many iterative data analysis algorithms. We then modify the CloudSim framework to simulate the MapReduce execution of K-means clustering on different cloud infrastructures, depending on their size and the characteristics of the target platforms.
The experiments show that the MapReduce implementation of K-means clustering gives good results, especially for large data sets, and that the cloud infrastructure influences these results.
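To make the MapReduce formulation of K-means concrete, the sketch below expresses the map and reduce phases of one iteration in plain Python. It is a single-process illustration under assumed function names; the paper's experiments run the algorithm on a simulated MapReduce deployment in CloudSim, not this toy driver.

```python
# One K-means iteration in MapReduce style: the mapper assigns each point to
# its nearest centroid, the reducer recomputes each centroid as the mean of
# its assigned points. Function names and the driver loop are illustrative.

import math
from collections import defaultdict

def mapper(point, centroids):
    # Emit (nearest-centroid-index, point) for one input point.
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists)), point

def reducer(idx, points):
    # New centroid = component-wise mean of all points in this cluster.
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        groups = defaultdict(list)
        for p in points:                                  # map phase
            key, value = mapper(p, centroids)
            groups[key].append(value)
        centroids = [reducer(k, v)                        # reduce phase
                     for k, v in sorted(groups.items())]
    return centroids

data = [(1, 1), (1.5, 2), (8, 8), (9, 9), (0.5, 1)]
print(kmeans(data, centroids=[(1, 1), (8, 8)], iterations=5))
```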
Citation
M. CHALABI Baya (2014), "Implementation of Solution Cloud Computing with MapReduce Model", Journal of Physics: Conference Series, IOP Publishing.