Load balancing and service discovery using Docker Swarm for microservice based big data applications

Research
Open access
Published: 07 January 2023

Neelam Singh¹,
Yasir Hamid²,
Sapna Juneja³,
Gautam Srivastava^4,5,6,
Gaurav Dhiman^1,7,8,9,
Thippa Reddy Gadekallu^9,10 &
…
Mohd Asif Shah^11,12

Journal of Cloud Computing, volume 12, Article number: 4 (2023)

Abstract

Big Data applications require extensive resources and environments to store, process, and analyze colossal collections of data in a distributed manner. Containerization with cloud computing provides a pertinent remedy to accommodate big data requirements; however, it requires a precise and appropriate load-balancing mechanism. The load on servers increases exponentially with increased resource usage, thus making load balancing an essential requirement. Moreover, adjusting containers accurately and rapidly according to the load on each service is one of the crucial aspects of big data applications. This study provides a review of containerized environments like Docker for big data applications with load balancing. A novel scheduling mechanism of containers for big data applications established on Docker Swarm and Microservice architecture is proposed. The concept of Docker Swarm is utilized to effectively handle big data applications' workload and service discovery. Results show that increasing workloads with respect to big data applications can be effectively managed by utilizing microservices in containerized environments, and that load balancing is efficiently achieved using Docker Swarm. The implementation is done using a case study deployed on a single server and then scaled to four instances. Applications developed using containerized microservices reduce the average time for deployment and continuous integration.

Introduction

The Big Data era led to the advent of tools, technologies, and architectures with improved efficiency, elasticity, and resiliency. Big data applications need sophisticated architectures with inherent capabilities to scale and optimize. To enhance the scalability and elasticity of big data application deployment, the environments in use need to be continuously improved and updated. Organizations are using cloud-based services to enhance performance and to lessen overall cost. Containerization, a cloud-based technology, is gaining popularity since it is lightweight in nature. Docker is one of the predominant, extensively used container-based virtualization tools, and since it is an open source project, it can be used to develop, run, and deploy an application efficiently.

Big data applications require extensive resources and environments to store, process, and analyze colossal collections of data in a distributed manner. Containerization with cloud computing provides a pertinent remedy to accommodate big data requirements; however, it requires a precise and appropriate load-balancing mechanism. The load on servers increases exponentially with increased resource usage, thus making load balancing an essential requirement. Moreover, adjusting containers accurately and rapidly according to the load on each service is one of the crucial aspects of big data applications. This study provides a review of containerized environments like Docker for big data applications with load balancing. A scheduling mechanism of containers for big data applications established on Docker Swarm and Microservice architecture is proposed, and the concept of Docker Swarm is utilized to effectively handle big data applications' workload and service discovery.

Improving the performance of an application is an ongoing struggle with increased demands and usage. Technical innovations are always aiming towards achieving a higher degree of efficiency and performance. All organizations require an environment that provides performance, reliability, and fault tolerance. Cloud Computing has created a niche for itself where performance, resiliency, availability, and a cost-effective solution are required. Modern technologies are employing Cloud Computing to gain benefits, as the Cloud enables the ubiquitous availability of resources in a cost-effective yet competent manner [1,2,3,4,5,6]. The Cloud has made resources like infrastructure, platform, and software available to the end user without any management effort. The Cloud offers everything as a service, including emerging technologies like the Internet of Things (IoT) or Big Data [3]. Various organizations are offering cloud-based solutions for handling Big Data, as is shown in [7]. Elasticity in a cloud environment with a multi-tier structure is achieved by scaling the quantity of physical resources [8, 9]. Resources can be scaled in two ways: by horizontal scaling, i.e. adding more virtual machines (VMs) [10], or by vertical scaling, i.e. adding more resources to the deployed VMs [11]. Both methods require additional time, suffer from latency issues, and may incur additional cost. To expedite these processes and optimize the cost of application development, different paradigms and architectures have been studied and evaluated.

Containerization is one of the cloud-based techniques which is gaining popularity because containers are lightweight, scalable, and highly available when compared to virtual machines. Containers are best suited for continuous integration and continuous delivery (CI/CD) workflows. Docker [12], an open source project, is a widely used container-based virtualization tool assisting in the development, execution, and deployment of applications in containerized environments. Docker can manage workloads dynamically in real time due to its portability and lightweight nature. Applications executed in a Docker container remain isolated from the underlying host environment.

Docker can prove beneficial for deploying big data applications. Applications can be deployed in containers to serve massive workloads. It is a challenging task to manage numerous containers for a single application. Thankfully, Docker comes with a cluster management tool called Docker Swarm to handle multiple clusters. Docker Swarm provides clustering and an orchestration mechanism and thus can deploy several containers across different host machines. Docker Swarm also provides a fault tolerance mechanism, not only detecting failed containers on a host machine but also redeploying the same container on another host machine [13]. Big data applications suffer from major issues, such as conventional data analysis methods not adjusting to on-the-fly or real-time streaming input data. The methods used may also pose computational and speed-up overhead. Moreover, there is no theoretical derivation of parallelization speed-up factors. Machine Learning (ML) programs may exhibit skewed distribution if the load on each machine is not balanced, as they still lack methods to synchronize during waiting times for the exchange of parameter updates. As such, synchronization is another open challenge in handling big data using ML algorithms. Singh et al. [14] proposed a container-based microservice architecture to handle monolithic design challenges like scalability, integration, and throughput. However, the focus was not on big data applications, which require massive effort and raise deployment issues. Another issue that needs to be addressed is how to assign containers accurately in real time to manage service loads. In [15], the authors propose a container scheduling approach that employs ML, analyzing data sets using Random Forest (RF).

Most current research fails to address the cause and effect of the decrease in service execution performance when the load on the nodes increases. Another area of concern is how to assign service load dynamically at run time for big data applications.

This research aims to distribute big data applications implemented as microservices inside a Docker Swarm according to the resource utilization of the respective host machines and to provide service discovery, both of which are important for microservice architectures. The main focus of this research is on memory utilization according to given memory limits. In this paper, we propose a Docker Swarm based mechanism that observes the memory consumption of each host machine and then assigns load to a given host machine based on its memory usage, using microservices for big data applications. The performance of the work is evaluated based on the load assignment according to memory utilization. The contributions of the work focus on improving performance during the higher workloads that may occur due to big data processing and on scaling the services to improve efficiency.

The paper is structured as follows: Section "Related work" gives a comprehensive summary of the work done in this area. Section "Architectural design of Docker based load balancing and service discovery scheme for microservice based big data application" gives an account of the proposed work and methodology, and Section "Result and discussion" analyzes the results of the proposed work. In Section "Conclusion", we conclude our work.

Related work

Big data applications require an extensive set of resources like storage, processing power, and communication channels due to their inherent characteristics. To handle this gigantic pile of data, techniques, frameworks, environments, and methodologies are continuously reviewed, analyzed, and developed. This section explores the work done on big data analytics using Cloud computing and the use of Docker as well as Docker Swarm for managing and orchestrating clusters for load balancing.

A Microservice-based architecture for big data knowledge discovery, which aims to address scalability and efficiency issues in processing, is proposed by Singh et al. [16]. Naik et al. [17] have demonstrated the inner workings of a model for big data processing centered on Docker containers in multiple clouds, with automatic assignment of big data clusters using Hadoop and Pachyderm. In the development phase, the environment used ensures that the code works correctly, but the code may fail during the testing or production phase due to environmental changes and/or differences. Containerization comes into play to handle this issue. Hardikar et al. [18] explored several facets of containerization like automation, deployment, scaling, and load balancing, with Docker as the runtime environment and Kubernetes deployed for orchestration. The focus is mainly on containerization; handling big data microservices is not addressed directly in the study. A container scheduling approach for microservices based on neighbourhood division, called CSBND, was proposed in [19] to optimize system performance in terms of response time and load balancing. That research did not handle big data and microservice-based applications deployed on containers.

Big data analytics

Big Data Analytics deals with discovering knowledge from large datasets, popularly known as big data, for strategic planning, decision making, and prediction purposes [20]. To analyze these colossal datasets, dynamic environments are required which are scalable enough to manage varying workloads, as conventional methods often fail to process data of this size. Big Data Analytics is an assortment of tools, technologies, and methodologies combined in a system, platform, or framework to perform knowledge discovery through processes like data gathering, cleaning, modelling, and visualization [21]. Techniques like machine learning and deep neural networks are utilized to perform the analysis process.

The authors in [20, 22] provide an insight into various machine learning and deep learning algorithms which prove to be beneficial in Big Data Analytics. These processes require sophisticated architecture for storage, processing, and visualization. Cloud computing is considered to be an effective solution for this. The authors in [23] illustrated the affinity of Big Data with the cloud with respect to its characteristics. A web server load balancing mechanism focused on memory utilization using Docker Swarm was proposed by Bella et al. [24]. That work focused on web server load balancing; however, service discovery is not considered in the paper. Big data applications require extensive use of resources, and resource utilization for big data is also not discussed.

Containerization using Docker

To increase the efficiency of methods and optimize the development as well as the deployment cost of applications over the cloud, numerous architectures, frameworks, environments, and paradigms have been examined extensively in the literature. Docker, an open source containerization tool, is fast emerging as an alternative for application deployment over any cloud-based architecture. Container-centric virtualization is a substitute for hypervisor-based virtualization in which containers share resources like hardware, operating system, and supporting libraries while maintaining abstraction and isolation [25]. Docker is a well-known lightweight tool providing prompt development and relocation with improved efficiency and flexibility in resource provision [26].

Unlike with VMs, a single host can be used to create numerous containers in multiple user spaces [27]. Container-based applications built using Microservice architecture require traffic management and load balancing at high workloads. This issue is handled through container load balancing. A load balancer for containers results in higher availability and scalability of applications for client requests. This ensures seamless performance of Microservice applications running in containers. Tools like Docker Swarm as well as Kubernetes provide support to manage and deploy containers. Figure 1 illustrates the distribution of application client load to containerized microservices using a load balancer.

Fig. 1 Container load balancing

Docker Swarm

Management of containers is an important and crucial aspect of containerization. Load balancing is required to handle requests dynamically. To manage Docker clusters, Docker Swarm, a cluster administration and orchestration tool that links and controls all Docker nodes, is used [28]. Docker Swarm offers features like reliability, security, availability, scalability, and maintainability. It helps in the balanced distribution of any load and checks host machines for failed containers. If any failed containers are found, Docker Swarm redeploys them [23]. It is an enhancement of Docker.

Docker Swarm is made up of two types of nodes: manager and worker nodes. All membership and allocation processes are handled by the manager node, while worker nodes execute swarm-based services in Docker Swarm. The manager node uses its own IP address and port to expose swarm services to all clients. Requests from clients are channelled to a chosen worker node by the swarm manager's internal load balancing mechanism so that requests are evenly distributed [29]. Although the Docker Swarm load balancing process distributes the load, it does not provide the ability to monitor resource utilization against available limits. This can lead to uneven load distribution, making any Big Data Microservice prone to collapse. In this study, we distribute Microservice-based loads in Docker Swarm by checking the resource consumption of host machines, creating an even load distribution within the available limits.

Microservice architecture

Monolithic architectures are the most common conventional architectures used to deploy applications. These architectures work on more or less three basic layers, i.e. presentation, business, and data logic, in order to handle simple to complex tasks. The architecture is simple and easy to use since everything is under one autonomous deployment unit. However, the architecture may limit the application's ability to scale and make updates a difficult task when a complex task needs to be managed. Microservice architectures aim to minimize the issues that exist in monolithic architectures by dividing the entire application into lightweight and loosely coupled components [30, 31]. Every component has its individual code repository and can be updated independently, making any complex application far more scalable, resilient, and efficient. Service Discovery and Load Balancing are two critical and fundamental aspects of Microservices. Service Discovery can be defined as a registry of running instances for one or many services. It is required for Microservices to collaborate. A system's scalability, throughput, execution time, response time, and performance are largely influenced by load balancing [32].
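
To make the definition of service discovery as a registry of running instances concrete, the following is a minimal in-memory sketch. It is not Docker Swarm's mechanism; the class name, service names, and addresses are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ServiceRegistry:
    """Illustrative in-memory registry: service name -> set of instance addresses."""

    def __init__(self):
        self._instances = defaultdict(set)

    def register(self, service, address):
        # Called when an instance starts and becomes reachable.
        self._instances[service].add(address)

    def deregister(self, service, address):
        # Called when an instance stops or is detected as failed.
        self._instances[service].discard(address)

    def lookup(self, service):
        # Sorted so callers see a stable order; unknown services give [].
        return sorted(self._instances.get(service, ()))


registry = ServiceRegistry()
registry.register("api", "10.0.0.5:5000")      # hypothetical addresses
registry.register("api", "10.0.0.6:5000")
registry.register("redis", "10.0.0.7:6379")
registry.deregister("api", "10.0.0.5:5000")
print(registry.lookup("api"))
```

A caller that needs the API service asks the registry by name instead of hard-coding an address, which is what allows instances to come and go without reconfiguring their clients.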

Container-based virtualization and Microservices make a perfect association, as containers provide a decentralized environment and are lightweight in nature. Today, Docker is used to build modules called Microservices [33], to decentralize packages and distribute jobs into distinct, stand-alone applications that collaborate with each other. Microservices can be considered as small applications that must be deployed in their individual VM instances to have discrete environments. But dedicating an entire VM [34] instance to just a part of an application is not an efficient approach. Docker containers require fewer computing resources than virtual machines; therefore, deploying hundreds or thousands of Microservices on Docker containers will reduce performance overhead and increase the overall efficiency of the applications [35]. In this study, we distribute the load of Big Data applications inside a Docker Swarm by utilizing the resources of host machines. The main objective is to balance the load by checking the memory consumption of all host machines against known memory limits. This research aims at service discovery and server-side load balancing for Big Data applications based on Microservices using Docker Swarm.

Architectural design of Docker based load balancing and service discovery scheme for microservice based big data application

A fault tolerant and decentralized architecture is provided by Docker Swarm. A set of Docker hosts can be combined into a swarm using swarm mode. Services can be created and scaled with health checks along with built-in load balancing and service discovery features. Big Data applications require an extensive set of resources, which must be properly load balanced. A Docker-based Load Balancing and Service Discovery system is used for Big Data applications. Figure 2 provides a look at the Microservices stack in use for a big data application.

Fig. 2 Microservices stack of a big data application

We containerize our application using Docker as Microservices. We used Docker Swarm for orchestration, service discovery, and load balancing.

The Big Data application stack will provide the given functionality in the form of Microservices:

Extraction of links from the input URL using a front end PHP application running on an Apache server.
Interaction of the Web application with an API server (Python) to manage link extraction and return a JSON response.
An image of Redis cache (used by the API server) to check for already scraped pages and avoid repeated fetches.
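
The caching behaviour in the third item can be sketched as a cache-aside lookup. This is an illustrative assumption about how the API server might consult the cache, not the authors' code: a plain dict stands in for Redis, and `fetch_links` and `fake_scrape` are hypothetical names.

```python
import json


def fetch_links(url, cache, scrape):
    """Return the links for `url`, calling `scrape` only on a cache miss."""
    cached = cache.get(url)
    if cached is not None:
        # Cache hit: the page was already scraped, so skip the fetch.
        return json.loads(cached)
    links = scrape(url)
    # Store a JSON string, as one would in a string-valued key-value store.
    cache[url] = json.dumps(links)
    return links


calls = []


def fake_scrape(url):
    # Stand-in for the real link extraction; records each invocation.
    calls.append(url)
    return [{"text": "Home", "href": url + "/home"}]


cache = {}  # a dict stands in for the Redis cache in this sketch
first = fetch_links("http://example.com", cache, fake_scrape)
second = fetch_links("http://example.com", cache, fake_scrape)
print(len(calls))
```

The second call is served from the cache, so the scraper runs only once for a repeated URL.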

The experimental setup is made up of four Swarm nodes: a master node and three worker nodes, as given in Table 1 and Fig. 3, respectively. The master node, where the Swarm commands are run, is implemented using an NGINX service. Swarm itself is responsible for scheduling, Domain Name Service (DNS) service discovery, scaling, and container load balancing on all nodes. The Docker Swarm load balancer runs on every node and balances load requests as required.

Fig. 3 Service Discovery for the Containers

Four services are created within the Docker Swarm: a Master Load Balancer service to enable load balancing, and three other services that are the Microservices of the implemented Big Data application scenario. The three worker nodes host the PHP front end Microservice, the Python API Microservice, and the Redis cache Microservice, respectively. The port numbers and respective containers of the services are listed in Table 1. For example, the load balancer service is reached at http://192.168.0.23:80, and http://192.168.0.24:8080 is used to access the Apache web server. Docker Swarm is responsible for routinely supervising and distributing the containers running these services.

The proposed algorithm for the entire process is given in Algorithm 1. The load balancing methodology is covered in Algorithm 2.

Algorithm 1: μBigLB (service orchestration)

Step 1: Automate installation of requirements using a Dockerfile and build an isolated image (Container 1).

Step 2: Use Microservice for Big Data Application:

a. Create full path URLs of extracted paths
b. Extract anchor and link texts
c. Return object, move main logic to function

Step 3:

a. Run server
b. Map host and container ports
c. Expose link extraction (from Step 2) as a web service API in a second Python file (Microservice).

Step 4:

a. Create independent image of all the code
b. Create front end using PHP in a different folder
c. Integrate services using docker-compose.yml

Step 5: Create second container for front end PHP application.

Step 6: Create third container for Redis for caching purposes.
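
Step 2 of Algorithm 1 (full path URLs, anchor and link texts, main logic in a function that returns an object) can be sketched with the Python standard library alone. This is a minimal illustration under our own assumptions, not the authors' implementation; `LinkParser`, `extract_links`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects anchor text and absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # absolute URL of the anchor being parsed, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Step 2a: resolve relative paths against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Step 2b: keep both the anchor text and the link target.
            text = "".join(self._text).strip()
            self.links.append({"text": text, "href": self._href})
            self._href = None


def extract_links(html, base_url):
    # Step 2c: main logic lives in a function that returns an object.
    parser = LinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


sample = '<p><a href="/docs">Docs</a> and <a href="http://other.org/x">Other</a></p>'
links = extract_links(sample, "http://example.com/index.html")
print(links)
```

The returned list of dictionaries serializes directly to JSON, which fits the API server's role of returning a JSON response to the front end.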

Algorithm 2: load balancing

Step 1: Create NGINX service for load balancing.

Step 2: Run memory monitoring service in each worker node.

Step 3: Check load and service discovery; if a worker node fails, redirect its load to an active worker node.
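
Steps 2 and 3 of Algorithm 2 can be read as a selection rule over the monitored worker nodes. The sketch below is one possible interpretation, assuming that load goes to the active node with the lowest memory utilization relative to its limit; the data layout, node names, and figures are hypothetical.

```python
def pick_node(nodes):
    """Pick the active worker with the lowest memory utilization.

    `nodes` maps a node name to a dict with keys "alive" (bool),
    "mem_used" and "mem_limit" (same unit, e.g. MiB).
    """
    candidates = {
        name: info["mem_used"] / info["mem_limit"]
        for name, info in nodes.items()
        # Skip failed nodes and nodes that already reached their limit.
        if info["alive"] and info["mem_used"] < info["mem_limit"]
    }
    if not candidates:
        raise RuntimeError("no active worker node with free memory")
    return min(candidates, key=candidates.get)


# Hypothetical monitoring snapshot of three workers.
nodes = {
    "worker1": {"alive": True, "mem_used": 300, "mem_limit": 512},
    "worker2": {"alive": False, "mem_used": 50, "mem_limit": 512},
    "worker3": {"alive": True, "mem_used": 120, "mem_limit": 512},
}
chosen = pick_node(nodes)

# If the chosen worker fails, load is redirected to another active worker.
nodes["worker3"]["alive"] = False
fallback = pick_node(nodes)
print(chosen, fallback)
```

Here worker2 is ignored despite its low usage because it is marked as failed, which mirrors the redirect in Step 3.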

Service discovery is managed by applying the Docker Remote API from the services to extract service and instance information. It is a built-in service discovery mechanism of the orchestrator. In order to test the load balancing aspect of Docker Swarm, our "linkextractor" Microservice is scaled to run multiple instances using the Docker command: docker service scale linkextractor=4.

This creates 4 replicas of our Microservice; if the service is requested with curl a few times, the responses come from different IP addresses. The calls were distributed round-robin over the four instances. This load-balancing mechanism of the container orchestrator, implemented by the Docker Swarm "service" abstraction, removes the complexity of client-side load balancing. The effect on latency and CPU/memory usage is monitored using the Docker command docker stats, which provides container runtime metrics like CPU usage, memory usage and limits, and network I/O.
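
The round-robin behaviour observed over the four replicas can be illustrated with a short sketch. It only models the rotation itself, not Swarm's internal load balancer, and the replica addresses are hypothetical.

```python
from collections import Counter
from itertools import cycle

# Hypothetical addresses of the four linkextractor replicas.
replicas = ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14"]

# cycle() yields the replicas in order and wraps around indefinitely.
rotation = cycle(replicas)
assigned = [next(rotation) for _ in range(8)]

# Count how many of the eight requests each replica received.
per_replica = Counter(assigned)
print(per_replica)
```

With eight requests, every replica is picked exactly twice, which is the even spread one would expect to see when repeatedly requesting the scaled service.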

We will consider the following parameters for each scenario (container) implemented:

CPU usage
Memory usage and limits
Network throughput
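
Memory usage against a limit can be reduced to a single utilization figure. The sketch below assumes a usage string of the form "used / limit" with binary unit suffixes, similar to the memory column printed by docker stats; the exact format and the function names are assumptions made for illustration.

```python
import re

# Binary unit multipliers assumed for the size strings.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def to_bytes(value):
    """Convert a string such as '128MiB' or '1.5GiB' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([A-Za-z]+)", value.strip())
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognized size: {value!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def memory_utilization(mem_usage):
    """Return used memory as a percentage of the limit for 'used / limit'."""
    used, limit = (to_bytes(part) for part in mem_usage.split("/"))
    return 100.0 * used / limit


pct = memory_utilization("128MiB / 512MiB")
print(pct)
```

A percentage of this kind is what allows containers with different memory limits to be compared on the same scale.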

Result and discussion

Using the Docker Remote API, service discovery is performed first, as it is one of the crucial elements of any Microservice architecture, so that Microservices can discover and collaborate with each other as shown in Fig. 3. Service discovery helps to allocate and assign work to nodes with lesser load and helps in automatic and continuous integration. Containers are assigned based on workloads once service discovery is performed.

Once all the container images are discovered and loaded, our application is executed to check that it works in the containerized environment, i.e. that it extracts links from the given URL. Figure 4 shows the links extracted from a test URL.

Fig. 4 Links extracted from a given URL

Once our application is tested, it is scaled from one to four instances to check the effect on latencies and CPU/memory usage with respect to memory limits.

The results in Table 2 and Fig. 5 show that all four container instances share comparatively similar workloads. Based on these results, it is concluded that containerized microservices for big data applications based on the proposed architecture can be effectively managed on Docker Swarm. More instances can be added to scale up and handle the deployment and continuous integration process in a much better way.

Fig. 5 Memory and CPU usage of four containers in Swarm

Monolithic applications suffer from scalability and integration issues, which makes it challenging to handle big data applications with them; the proposed architecture manages such applications more easily.

The given case study illustrates the need for containerization in applications working on Big Data. The following section lists both the merits and demerits of the proposed strategy:

Merits

Containers are well suited for complex applications deployed as microservices and can thus help balance loads across servers more efficiently than VMs (Virtual Machines).
According to the requirements of a given application, functionality can be scaled by deploying more containers, which can be managed effectively using Docker Swarm. This process is difficult to address using virtualized environments.
Containers can be very easily duplicated or deleted according to requirements, and the Swarm can handle this aspect in an efficient manner.

Demerits

Containers provide scalability; however, portability can be affected by placing dependencies on containers.
Containers are susceptible to attacks as they share the OS kernel. This can affect service discovery and load balancing across servers in case of an attack or any malicious activity.
Though containers can be duplicated at an amazing speed, they consume a huge amount of resources, making them a costlier strategy compared to other techniques like virtualization.

Conclusion

It is often a difficult and time-consuming process to manage Big Data applications because of their predominant characteristics. Microservices are considered to be a better option for providing a scalable and fault tolerant approach to Big Data application management. Service discovery and load balancing are both important aspects of Microservices that need to be addressed in modern systems. In this study, the benefits of containerization for Microservice-based Big Data applications were illustrated. The load balancing and service discovery facets of Microservices are properly handled by Docker containers and the attached orchestration tool, Docker Swarm.

The proposed concept shows the usefulness of the Docker tool suite in orchestrating a multi-service stack such as is needed for Big Data applications. This technique can be utilized to avoid a single point of failure in Big Data applications, making applications more scalable, resilient, and portable. In the future, the computational complexity and cost efficiency of the proposed work need to be examined and addressed. The techniques presented can also be developed and implemented for Big Data applications in multi-cloud scenarios.

Availability of data and materials

Not Applicable.

References

Fox A, Griffith R, Joseph A, Katz R, Konwinski A, Lee G et al (2009) Above the clouds: a berkeley view of cloud computing. Rep UCBIEECS 28
Armbrust M et al (2010) A view of cloud computing. Commun ACM 53(4):50–58
Article Google Scholar
Rimal BP, Jukan A, Katsaros an Goeleven D (2011) Architectural requirements for cloud computing systems: an Enterprise cloud approach. J Grid Comput 9(1):3–26
Article Google Scholar
Buyya R, Yeo CS, Venugopal S (2008) Marketoriented cloud computing: vision, hype, and reality for delivering IT services as computing utilities. In: Proceedings of the 10th IEEE international conference on high performance computing and communications
Google Scholar
Vouk MA (2008) Cloud computing issues, research and implementations. In: 30th international conference on information technology interfaces (ITI 2008), Cavtat/Dubrovnik, pp 31–40
P. Mell and T. Grance, “Draft nist working definition of cloud computing”,2009. Available: http://csrc.nist.gov/groups/SNS/cloud-computing/index.html
Google Scholar
Wan J, Cai H, Zhou K (2015) Industrie 4.0: enabling technologies. In: Proceedings of 2015 International Conference on Intelligent Computing and Internet of Things, pp 135–140. https://doi.org/10.1109/ICAIOT.2015.7111555
Chapter Google Scholar
Liu Z, Zhang Q, Zhani MF, Boutaba R, Liu Y, Gong Z (2015) DREAMS: dynamic resource allocation for MapReduce with data skew. In: 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), pp 18–26. https://doi.org/10.1109/INM.2015.7140272
Chapter Google Scholar
Wei G, Vasilakos AV, Zheng Y, Xiong N (2010) A game-theoretic method of fair resource allocation for cloud computing services. J Supercomput 54(2):252–269
Article Google Scholar
Jiang J, Lu J, Zhang G, Long G (2013) Optimal Cloud Resource Auto-Scaling for Web Applications. In: 2013 13th IEEE/ACM international symposium on cluster, Cloud, and Grid Computing, pp 58–65. https://doi.org/10.1109/CCGrid.2013.73
Chapter Google Scholar
Shi X, Dong J, Djouadi S, Feng Y, Ma X, Wang Y (2016) PAPMSC: power-aware performance management approach for virtualized web servers via stochastic control. J Grid Comput 14(1):171–191
Article Google Scholar
Preeth EN, Mulerickal FJ, Mulerickal BP, Sastri Y (2015) Evaluation of Docker containers based on hardware utilization. In: 2015 International Conference on Control Communication & Computing India (ICCC), pp 697–700. https://doi.org/10.1109/ICCC.2015.7432984
Chapter Google Scholar
Ismail BI et al (2015) Evaluation of Docker as edge computing platform. In: 2015 IEEE Conference on Open Systems (ICOS), pp 130–135. https://doi.org/10.1109/ICOS.2015.7377291
Chapter Google Scholar
Singh V, Peddoju SK (2017) Container-based microservice architecture for cloud applications. In: 2017 International Conference on Computing, Communication and Automation (ICCCA), pp 847–852. https://doi.org/10.1109/CCAA.2017.8229914
Chapter Google Scholar
Lv J, Wei M, Yu Y (2019) A container scheduling strategy based on machine learning in microservice architecture. In: 2019 IEEE International Conference on Services Computing (SCC), pp 65–71. https://doi.org/10.1109/SCC.2019.00023
Chapter Google Scholar
Singh N, Singh DP, Pant B, Tiwari UK (2021) μBIGMSA-microservice-based model for big Data knowledge discovery: thinking beyond the monoliths. Wirel Pers Commun 116(4):2819–2833
Article Google Scholar
Naik N, Jenkins P, Savage N, Katos V (2016) Big data security analysis approach using computational intelligence techniques in R for desktop users. IEEE Symposium Series on Computational Intelligence (SSCI) 2016:1–8. https://doi.org/10.1109/SSCI.2016.7849907
Article Google Scholar
Hardikar S, Ahirwar P, Rajan S Containerization: cloud computing based inspiration Technology for Adoption through Docker and Kubernetes. In: 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC), vol 2021, pp 1996–2003. https://doi.org/10.1109/ICESC51422.2021.9532917
Guo Y, Yao W (2018) A container scheduling strategy based on neighborhood division in micro service. In: NOMS 2018–2018 IEEE/IFIP Network Operations and Management Symposium, pp 1–6. https://doi.org/10.1109/NOMS.2018.8406285
Chapter Google Scholar
Singh N, Singh DP, Pant B (2017) A comprehensive study of big data machine learning approaches and challenges. In: 2017 International Conference on Next Generation Computing and Information Systems (ICNGCIS), pp 80–85. https://doi.org/10.1109/ICNGCIS.2017.14
Chapter Google Scholar
Trnka A (2014) Big data analysis. Eur J Sci Theol 10(1):143–148
MathSciNet Google Scholar
Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E (2015) Deep learning applications and challenges in big data analytics. J Big Data 2(1):1–21
Article Google Scholar
Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU (2015) The rise of ‘big data’ on cloud computing: review and open research issues. Inf Syst 47:98–115
Bella MRM, Data M, Yahya W (2018) Web server load balancing based on memory utilization using Docker swarm. In: 2018 International Conference on Sustainable Information Engineering and Technology (SIET), pp 220–223. https://doi.org/10.1109/SIET.2018.8693212
Soltesz S, Pötzl H, Fiuczynski ME, Bavier A, Peterson L (2007) Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors. SIGOPS Oper Syst Rev 41(3):275–287
Felter W, Ferreira A, Rajamony R, Rubio J (2015) An updated performance comparison of virtual machines and Linux containers. In: 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp 171–172. https://doi.org/10.1109/ISPASS.2015.7095802
Turnbull J (2014) The Docker book. Available: www.dockerbook.com
Docker.com. Docker Swarm. https://docs.docker.com/engine/swarm/. Accessed 24 Aug 2020
Docker Swarm mode key concepts. Available: https://docs.docker.com/engine/swarm/key-concepts/. Accessed 24 Aug 2020
Al-Masri E (2018) Enhancing the microservices architecture for the internet of things. In: 2018 IEEE International Conference on Big Data (Big Data), pp 5119–5125. https://doi.org/10.1109/BigData.2018.8622557
Imran, Ahmad S, Kim DH (2021) A task orchestration approach for efficient mountain fire detection based on microservice and predictive analysis in IoT environment. J Intell Fuzzy Syst 40(3):5681–5696
Dhiman G et al (2022) Federated learning approach to protect healthcare data over big data scenario. Sustainability 14(5):2500
Singh P et al (2022) A fog-cluster based load-balancing technique. Sustainability 14(13):7961
Kanwal S et al (2022) Mitigating the coexistence technique in wireless body area networks by using superframe interleaving. IETE J Res 2022:1–15
Kour K et al (2022) Smart-hydroponic-based framework for saffron cultivation: a precision smart agriculture perspective. Sustainability 14(3):1120

Acknowledgments

Not applicable.

Code availability

Not applicable.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun, 248002, India
Neelam Singh & Gaurav Dhiman
Abu Dhabi Polytechnic, Abu Dhabi, United Arab Emirates
Yasir Hamid
KIET Group of Institutions, Delhi NCR, Ghaziabad, India
Sapna Juneja
Department of Mathematics and Computer Science, Brandon University, Brandon, Canada
Gautam Srivastava
Research Centre of Interneural Computing, China Medical University, 40402, Taichung, Taiwan
Gautam Srivastava
Dept. of Computer Science and Math, Lebanese American University, 1102, Beirut, Lebanon
Gautam Srivastava
Govt. Bikram College of Commerce, Patiala, Punjab, India
Gaurav Dhiman
University Centre for Research and Development, Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, 140413, India
Gaurav Dhiman
Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon
Gaurav Dhiman & Thippa Reddy Gadekallu
School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
Thippa Reddy Gadekallu
Department of Economics, Kebri Dehar University, Kebri Dehar, Somali, 250, Ethiopia
Mohd Asif Shah
School of Business, Woxsen University, Hyderabad, Telangana, 502345, India
Mohd Asif Shah

Authors

Neelam Singh
Yasir Hamid
Sapna Juneja
Gautam Srivastava
Gaurav Dhiman
Thippa Reddy Gadekallu
Mohd Asif Shah

Contributions

Conceptualization by Neelam Singh; methodology by Sapna Juneja; software and formal analysis by Yasir Hamid; investigation and writing by Gautam Srivastava; resources, data collection and writing by Gaurav Dhiman; validation by Thippa Reddy Gadekallu and Mohd Asif Shah. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Mohd Asif Shah.

Ethics declarations

Competing interests

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Singh, N., Hamid, Y., Juneja, S. et al. Load balancing and service discovery using Docker Swarm for microservice based big data applications. J Cloud Comp 12, 4 (2023). https://doi.org/10.1186/s13677-022-00358-7

Received: 15 September 2022
Accepted: 02 November 2022
Published: 07 January 2023
DOI: https://doi.org/10.1186/s13677-022-00358-7

Keywords

Big data
Containerization
Docker
Microservice
Docker Swarm
Load-balancing