Paper Review:
Summary:
Mesos is a thin layer application between different frameworks and resources, dynamically assigning tasks to a cluster of machines. Because of the different tasks and constraints of frameworks, the task assignment should be made with a lot of considerations. The overall system seems quite simple and elegant, which makes the results even better.
Strong points:
- The system takes a lot of things into consideration during the assignment: the constraints of frameworks, the data location, the time of the tasks, the priority and the fairness. But yet all these considerations could be expressed in simple values and dynamically changed during the run;
- Mesos leaves a lot of room for the frameworks, which we can see from it’s implementation with Hadoop. It takes advantages of the fine-grained task assigning of Hadoop and reduce the overhead when Mesos and Hadoop work together. This is something that static partitioning tool could have a hard time to achieve;
- Mesos master is implemented with minimum communication between the upper and lower layers. Also the state of master is replicated in other standby masters and managed by ZooKeeper. The data in the memory of Mesos master could be constructed from the frameworks and slaves, which makes the system more resilient.
Weak point:
- While the tasks are isolated with Linux Container and separated cores and memory chips, there are still shared system resources like the network bandwidth, which is not taken into consideration during the run as well;
- The actual task assignment is still performed by a centralized machine and the bandwidth of other standby masters are wasted. The Mesos master might face short-time network exhaustion during run caused by similar tasks finished all at once. This might explain why the Hadoop tasks have spiky share of cluster performance;
- There isn’t much in the paper talking about the resource offer preference is calculated from the data location and other factors (it simply mentioned how to select the suitable frameworks with lottery scheduling with given preference si). I guess the preference could be useful if we can take advantage of the data location by assigning the tasks to nodes which already have the necessary data to save the transfer time. But I guess this is something that’s hard to keep track of and it requires the the frameworks to cooperate, which makes Mesos less adaptive after all.
Paper Notes:
- Sharing a cluster is hard. There are two common solutions: statically partition the cluster and run one framework per partition or allocate a set of VMs to each framework. However, there is no way to perform fine-grained sharing across the frameworks. Mesos is a thin layer that lies between frameworks and cluster resources.
- A centralized scheduler cannot achieve the complexity and scalability. It cannot adapt to new frameworks. Also it overlaps with the frameworks’ scheduler.
- Mesos will offer frameworks with resources based on an organizational policy such as fair sharing and the frameworks decide which ones to accept and the tasks to run on them.
- Mesos has minimal interfaces and pushes the control to the frameworks to 1) leave more room for various frameworks and 2) keeps Mesos simple, scalable and low in system requirements
- Frameworks resources could be revoked due to a buggy job or a greedy framework; however, frameworks can be assigned with resources with guarantee so the tasks won’t be killed as long as the framework is below the guaranteed allocation.
- Linux containers are used for task isolation
- ZooKeeper for leader election from all the standby masters. Mesos master’s memory could be reconstructed from the frameworks and resource slaves
- Mandatory and preferred resources; lottery scheduling; resource reservation for short tasks