The Alvio Simulator
The Alvio Simulator is a C++ event-driven simulator designed and developed to evaluate scheduling policies in high-performance architectures. Like other simulators, given a workload and an architecture definition, it simulates how the jobs would be scheduled under a specific scheduling policy (such as First-Come-First-Serve or the Backfilling policies). The main contribution of this simulator is that it models not only the job workflow in the system but also different resource allocation policies (such as First-Continuous-Fit). The researcher can therefore evaluate how different combinations of scheduling policies and resource selection policies affect the performance of the system. The other novel contribution is the modeling of how the jobs running in the system use the resources of the architecture. The researcher can specify the resources available in the architecture (for instance, the amount of memory bandwidth in each node) and, for each job in the workload, the amount of each resource that it uses.
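To make the idea of a resource selection policy concrete, here is a minimal sketch of one plausible reading of First-Continuous-Fit: pick the first run of consecutive free processors that is large enough for the job. The function name `firstContiguousFit` and the representation of the machine as a vector of free/busy flags are assumptions made for illustration; they are not taken from the Alvio source code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper, not part of the Alvio code base.
// freeCpus[i] is true when processor i is idle.
// Returns the index of the first run of `needed` consecutive free
// processors, or std::nullopt when no such run exists.
std::optional<std::size_t> firstContiguousFit(const std::vector<bool>& freeCpus,
                                              std::size_t needed) {
    assert(needed > 0 && "a job must request at least one processor");
    std::size_t runStart = 0;   // where the current run of free CPUs begins
    std::size_t runLength = 0;  // how many free CPUs the current run holds
    for (std::size_t i = 0; i < freeCpus.size(); ++i) {
        if (freeCpus[i]) {
            if (runLength == 0) runStart = i;
            if (++runLength == needed) return runStart;
        } else {
            runLength = 0;      // a busy CPU breaks the run
        }
    }
    return std::nullopt;
}
```

A scheduling policy such as First-Come-First-Serve would decide which job to start next, and a function like this one would then decide where it runs. Swapping either part independently is what allows combinations of the two kinds of policy to be compared.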
This simulator is released under the Apache 2 License. The following resources are available:
- General, architectural and functional description: this section describes the internals of the simulator, its functionality and the models currently available to the researcher for evaluating different high-performance architecture scenarios.
- This document describes the internal structure of the simulator, mainly in terms of data structures, modules and development issues. All the classes and namespaces of the simulator are described in detail. This resource is mainly oriented to developers.
- This tar.gz file contains a KDevelop project, including the configure files, in which the source code of the Alvio simulator is distributed under an Apache License. Note that this is currently a developer version: although it can be used for research, it contains many files related to the KDevelop project. If you download the simulator, please mail us at alvio@guim.net.
- This document describes how to set up the environment for using the simulator. It also covers the simulator's software dependencies and other useful information for using the Alvio simulator.
- The license of this simulator is based on the Apache 2 License. Check the license requirements in the LICENSE file.
Abstract
The main simulation component is the simulator engine, which instantiates and manages all the events of the simulation. These events are based on the simulated workload. When a JOB_ARRIBAL is triggered, a new ISIS-Dispatcher entity is created for the job. In the new version of the system, the dispatcher decides which center to use based on scheduling information gathered from the scheduling policy components (as in the previous version of the system) and from the prediction system. The prediction system includes a historical database containing all the information about the performance variables of jobs that have finished in the system, and a set of predictors that estimate the job performance variables using different techniques (e.g., statistical or data mining techniques). Once the dispatcher has chosen where to submit the job, it contacts the scheduling component that manages the resource and submits the job. At this point, the job becomes scheduled in the local scenario. The scheduling component manages and allocates jobs based on the reservation table, which models how the computational resources (processors, memory, etc.) are used. Once the job has been executed, the scheduling component provides information about the job execution to the ISIS-dispatcher that is managing it. The dispatcher then contacts the prediction service and provides feedback about the job execution. The prediction service updates its historical database and may update the prediction models that it is using.
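The event flow described above can be illustrated with a much smaller engine. The sketch below keeps only two event types (arrival and finish), a single scheduling component with a First-Come-First-Serve queue, and a plain processor counter in place of the reservation table. All names (`Job`, `Event`, `EventType`, `simulateFcfs`) are hypothetical and chosen for this example; the dispatcher and prediction service are omitted, and the real engine is considerably richer.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <queue>
#include <vector>

// Hypothetical types for illustration; not the Alvio classes.
enum class EventType { JobArrival, JobFinish };

struct Job {
    double arrival;  // submission time
    double runtime;  // execution time once started
    int cpus;        // processors requested
};

struct Event {
    double time;
    EventType type;
    std::size_t job;  // index into the workload
};

// Keeps the earliest event on top of the priority queue. On equal
// timestamps a finish is handled before an arrival, so processors
// released at time t are visible to jobs arriving at time t.
struct Later {
    bool operator()(const Event& a, const Event& b) const {
        if (a.time != b.time) return a.time > b.time;
        return a.type == EventType::JobArrival && b.type == EventType::JobFinish;
    }
};

// Runs the workload under FCFS and returns the start time of each job.
std::vector<double> simulateFcfs(const std::vector<Job>& workload, int totalCpus) {
    std::priority_queue<Event, std::vector<Event>, Later> events;
    for (std::size_t i = 0; i < workload.size(); ++i) {
        assert(workload[i].cpus <= totalCpus && "job can never fit");
        events.push(Event{workload[i].arrival, EventType::JobArrival, i});
    }

    std::vector<double> startTime(workload.size(), -1.0);
    std::deque<std::size_t> waiting;  // FCFS wait queue
    int freeCpus = totalCpus;

    while (!events.empty()) {
        const Event ev = events.top();
        events.pop();
        if (ev.type == EventType::JobArrival) {
            waiting.push_back(ev.job);
        } else {
            freeCpus += workload[ev.job].cpus;  // job finished, release CPUs
        }
        // FCFS: only the head of the queue may start; later jobs wait.
        while (!waiting.empty() && workload[waiting.front()].cpus <= freeCpus) {
            const std::size_t j = waiting.front();
            waiting.pop_front();
            freeCpus -= workload[j].cpus;
            startTime[j] = ev.time;
            events.push(Event{ev.time + workload[j].runtime, EventType::JobFinish, j});
        }
    }
    return startTime;
}
```

With 4 processors and three jobs given as {arrival, runtime, cpus} = {0, 10, 4}, {1, 5, 2} and {2, 1, 1}, the second and third jobs both start at time 10. The third job would fit earlier only if the policy allowed it to jump the queue, which is the gap Backfilling policies exploit.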
The design of the predictor internals, the way the scheduler manages missed deadlines, and the way the predictor and the scheduler interact are based on the formalizations provided in the work of Tsafrir et al. Although our experiments used only one prediction service, the current version of the simulator allows us to define a model of the ISIS-dispatcher environment that includes several prediction services. This will allow us to carry out further studies of the scalability of the provided solutions and to consider the possibility of having multiple systems from which to gather predictions. Furthermore, as described in the figure, in the evaluations given in this paper the dispatcher entities know all the scheduling resources that are available in the system. We are currently working on a new component (the information service) that allows the dispatcher to dynamically gather information about the resources available at a given point in time. We will thus be able to evaluate the effect of dispatchers having access to information about the whole system, depending on the information service used.
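As an illustration of the feedback loop between a scheduler and a prediction service, the sketch below shows a simple history-based predictor: it stores the runtimes of each user's most recently finished jobs and predicts the average of them. This is an assumed example of a statistical predictor, not the predictor implemented in Alvio; the class name `RecentAveragePredictor`, its methods and the window parameter are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <numeric>
#include <string>

// Hypothetical predictor for illustration; not an Alvio class.
class RecentAveragePredictor {
public:
    // `window` is how many finished jobs per user are remembered.
    explicit RecentAveragePredictor(std::size_t window) : window_(window) {
        assert(window > 0 && "window must hold at least one job");
    }

    // Feedback path: called when a job finishes, so that the
    // historical data for its user is updated.
    void recordFinished(const std::string& user, double runtime) {
        std::deque<double>& history = history_[user];
        history.push_back(runtime);
        if (history.size() > window_) history.pop_front();  // drop the oldest
    }

    // Returns the mean of the remembered runtimes for `user`, or
    // `fallback` (e.g. the user's own estimate) when there is no history.
    double predict(const std::string& user, double fallback) const {
        const auto it = history_.find(user);
        if (it == history_.end() || it->second.empty()) return fallback;
        const std::deque<double>& history = it->second;
        return std::accumulate(history.begin(), history.end(), 0.0) /
               static_cast<double>(history.size());
    }

private:
    std::size_t window_;
    std::map<std::string, std::deque<double>> history_;
};
```

Keeping the predictor behind a small interface like this is one way several prediction services could coexist, since a dispatcher would only need `predict` and `recordFinished` from each of them.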
The authors
The authors of this simulator are:
- Francesc Guim Bernat - http://francesc.guim.net/
- Julita Corbalan - http://personals.ac.upc.edu/juli/
- Ivan Rodero - http://personals.ac.upc.edu/irodero/
Enterprise solutions (updated soon)
Would you like to evaluate your scheduling architecture? Cluster systems? Farms of computers? Special resources? Contact us at hpcsolutions |at| guim.net! We can bring our experience to your enterprise.
The related papers
Francesc Guim, Julita Corbalan, "Using Runtime Prediction with Economic Considerations in the ISIS-Dispatcher", submitted to Workshops on Job Scheduling Strategies for Parallel Processing 2008.
I Rodero, F Guim, J Corbalan, "Modeling and Evaluating Interoperable Grid Systems", International Grid Interoperability and Interoperation Workshop 2008 (IGIIW 2008) in conjunction with IEEE International Conference on e-Science, Indianapolis, USA, December 2008, pp 508-515.
A Kertesz, I Rodero, F Guim, "Meta-Brokering Solutions for Expanding Grid Middleware Limitations", Workshop on Secure, Trusted, Manageable and Controllable Grid Services, in conjunction with Euro-Par, 2008.
F Guim, J Corbalan, J Labarta, "Resource Sharing Usage Aware Resource Selection Policies for Backfilling Strategies", The 2008 High Performance Computing & Simulation Conference (HPCS 2008).
F. Guim, Julita Corbalan, Jesus Labarta, "Modeling the impact of resource sharing in Backfilling Policies using the Alvio Simulator", 15th Annual Meeting of the IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS).
I. Rodero, F. Guim, J. Corbalan, L. Fong, S.M. Sadjadi, "Interoperable Grid Scheduling Strategies", Future Generation Computer Systems (FGCS), 2008.
I. Rodero, F. Guim, J. Corbalan, L. Fong, S.M. Sadjadi, "Broker Selection Strategies in Interoperable Grid Systems", IEEE International Symposium on Cluster Computing and the Grid (CCGrid), 2009.
F. Guim, Julita Corbalan, Jesus Labarta, "The resource Usage Aware Backfilling", to be submitted to Workshops on Job Scheduling Strategies for Parallel Processing 2009.
Francesc Guim, Ivan Rodero, Julita Corbalan, A. Goyeneche, "The Grid Backfilling: a Multi-Site Scheduling Architecture with Data Mining Prediction Techniques", Coregrid Workshop In Grid Middleware 2007.
Francesc Guim, Julita Corbalan, "A Job Self-Scheduling Policy for HPC Infrastructures", Workshops on Job Scheduling Strategies for Parallel Processing 2007.
F. Guim, J. Corbalan, J. Labarta, "Prediction f based Models for Evaluating Backfilling Scheduling Policies", The 8th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT).
More info
For more information, bug reports or any further comments, mail alvio |at| guim.net
First Publication
The first release was V1.1. The simulator is distributed under the Apache 2 License.
Many thanks to...
- Dan Tsafrir, who provided us with his easysim simulator, which has been used to validate our models and has been the basis for the internals and design of the prediction modules and prediction scheduling policies. http://www.cs.huji.ac.il/~dants/