Dynamic data processing cloud computing companies have started to integrate frameworks for parallel data processing in their product portfolio, making it easy for customers to access these services and to deploy their programs currently used have been designed for static, homogeneous cluster setups and disregard the particular nature of a cloud.
Consequently, the allocated compute resources may be inadequate for big parts of the submitted job and unnecessarily increase processing time and cost opportunities and challenges for efficient parallel data processing in clouds. It is the first data processing framework to explicitly exploit the dynamic resource allocation offered by today’s IaaS clouds for both, task scheduling and execution. Particular tasks of a processing job can be assigned to different types of virtual machines which are automatically instantiated and terminated during the job execution.
A growing number of companies have to process huge amounts of data in a cost-efficient manner. Classic representatives for these companies are operators of Internet search engines. The vast amount of data they have to deal with every day has made traditional database solutions prohibitively expensive .Instead; these companies have popularized an architectural paradigm based on a large number of commodity servers. Problems like processing crawled documents or regenerating a web index are split into several independent subtasks, distributed among the available nodes, and computed in parallel.
In recent years a variety of systems to facilitate MTC has been developed. Although these systems typically share common goals (e.g. to hide issues of parallelism or fault tolerance), they aim at different fields of application. Map Reduce is designed to run data analysis jobs on a large amount of data, which is expected to be stored across a large set of share-nothing commodity servers. Once a user has fit his program into the required map and reduce pattern, the execution framework takes care of splitting the job into subtasks, distributing and executing them. A single Map Reduce job always consists of a distinct map and reduce program.
1. Job Scheduling and ExecutionAfter having received a valid Job Graph from the user, Nephele’s Job Manager transforms it into a so-called Execution Graph. An Execution Graph is Nephele’s primary data structure for scheduling and monitoring the execution of a Nephele job. Unlike the abstract Job Graph, the Execution Graph contains all the concrete information required to schedule and execute the received job on the cloud.
2. Parallelization and Scheduling StrategiesIf constructing an Execution Graph from a user’s submitted Job Graph may leave different degrees of freedom to Nephele. The user provides any job annotation which contains more specific instructions we currently pursue simple default strategy: Each vertex of the Job Graph is transformed into one Execution Vertex. The default channel types are network channels. Each Execution Vertex is by default assigned to its own Execution Instance unless the user’s annotations or other scheduling restrictions (e.g. the usage of in-memory channels) prohibit it.
Web Technologies : ASP.NET 2.0 or above
Database : SQL SERVER 2005 or above
Web Server : IIS
Operating System : WINDOWS XP or above
Code Behind : C#.NET
This project is developed for academic purpose, so it has some limitations. The zip file consists all required documents, database and full project source code.
Comments/Suggestions are invited. Happy coding......!