public class SlotPool extends RpcEndpoint implements SlotPoolGateway, AllocatedSlotActions
ExecutionGraph. It will will attempt to acquire new slots
from the ResourceManager when it cannot serve a slot request. If no ResourceManager is currently available,
or it gets a decline from the ResourceManager, or a request times out, it fails the slot request. The slot pool also
holds all the slots that were offered to it and accepted, and can thus provides registered free slots even if the
ResourceManager is down. The slots will only be released when they are useless, e.g. when the job is fully running
but we still have some free slots.
All the allocation or the slot offering will be identified by self generated AllocationID, we will use it to eliminate ambiguities.
TODO : Make pending requests location preference aware TODO : Make pass location preferences to ResourceManager when sending a slot request
| Modifier and Type | Class and Description |
|---|---|
protected static class |
SlotPool.AvailableSlots
Organize all available slots from different points of view.
|
static class |
SlotPool.ProviderAndOwner
An implementation of the
SlotOwner and SlotProvider interfaces
that delegates methods as RPC calls to the SlotPool's RPC gateway. |
RpcEndpoint.MainThreadExecutor| Modifier and Type | Field and Description |
|---|---|
protected Map<SlotSharingGroupId,SlotSharingManager> |
slotSharingManagers
Managers for the different slot sharing groups.
|
log, rpcServer| Modifier | Constructor and Description |
|---|---|
protected |
SlotPool(RpcService rpcService,
org.apache.flink.api.common.JobID jobId,
SchedulingStrategy schedulingStrategy) |
|
SlotPool(RpcService rpcService,
org.apache.flink.api.common.JobID jobId,
SchedulingStrategy schedulingStrategy,
Clock clock,
org.apache.flink.api.common.time.Time rpcTimeout,
org.apache.flink.api.common.time.Time idleSlotTimeout) |
| Modifier and Type | Method and Description |
|---|---|
CompletableFuture<LogicalSlot> |
allocateSlot(SlotRequestId slotRequestId,
ScheduledUnit task,
SlotProfile slotProfile,
boolean allowQueuedScheduling,
org.apache.flink.api.common.time.Time allocationTimeout)
Requests to allocate a slot for the given
ScheduledUnit. |
void |
connectToResourceManager(ResourceManagerGateway resourceManagerGateway)
Connects the SlotPool to the given ResourceManager.
|
void |
disconnectResourceManager()
Disconnects the slot pool from its current Resource Manager.
|
CompletableFuture<org.apache.flink.types.SerializableOptional<ResourceID>> |
failAllocation(AllocationID allocationID,
Exception cause)
Fail the specified allocation and release the corresponding slot if we have one.
|
protected org.apache.flink.runtime.jobmaster.slotpool.SlotPool.AllocatedSlots |
getAllocatedSlots() |
protected SlotPool.AvailableSlots |
getAvailableSlots() |
SlotOwner |
getSlotOwner()
Gets the slot owner implementation for this pool.
|
SlotProvider |
getSlotProvider()
Gets the slot provider implementation for this pool.
|
CompletableFuture<Boolean> |
offerSlot(TaskManagerLocation taskManagerLocation,
TaskManagerGateway taskManagerGateway,
SlotOffer slotOffer)
Slot offering by TaskExecutor with AllocationID.
|
CompletableFuture<Collection<SlotOffer>> |
offerSlots(TaskManagerLocation taskManagerLocation,
TaskManagerGateway taskManagerGateway,
Collection<SlotOffer> offers)
Offers multiple slots to the
SlotPool. |
CompletableFuture<Void> |
postStop()
User overridable callback.
|
CompletableFuture<Acknowledge> |
registerTaskManager(ResourceID resourceID)
Register TaskManager to this pool, only those slots come from registered TaskManager will be considered valid.
|
CompletableFuture<Acknowledge> |
releaseSlot(SlotRequestId slotRequestId,
SlotSharingGroupId slotSharingGroupId,
Throwable cause)
Releases the slot with the given
SlotRequestId. |
CompletableFuture<Acknowledge> |
releaseTaskManager(ResourceID resourceId,
Exception cause)
Unregister TaskManager from this pool, all the related slots will be released and tasks be canceled.
|
void |
start()
Starts the rpc endpoint.
|
void |
start(JobMasterId jobMasterId,
String newJobManagerAddress)
Start the slot pool to accept RPC calls.
|
void |
suspend()
Suspends this pool, meaning it has lost its authority to accept and distribute slots.
|
protected void |
timeoutPendingSlotRequest(SlotRequestId slotRequestId) |
callAsync, getAddress, getEndpointId, getHostname, getMainThreadExecutor, getRpcService, getSelfGateway, getTerminationFuture, runAsync, scheduleRunAsync, scheduleRunAsync, shutDown, stop, validateRunsInMainThreadclone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitgetAddress, getHostnameprotected final Map<SlotSharingGroupId,SlotSharingManager> slotSharingManagers
@VisibleForTesting protected SlotPool(RpcService rpcService, org.apache.flink.api.common.JobID jobId, SchedulingStrategy schedulingStrategy)
public SlotPool(RpcService rpcService, org.apache.flink.api.common.JobID jobId, SchedulingStrategy schedulingStrategy, Clock clock, org.apache.flink.api.common.time.Time rpcTimeout, org.apache.flink.api.common.time.Time idleSlotTimeout)
public void start()
RpcEndpointIMPORTANT: Whenever you override this method, call the parent implementation to enable rpc processing. It is advised to make the parent call last.
start in class RpcEndpointpublic void start(JobMasterId jobMasterId, String newJobManagerAddress) throws Exception
jobMasterId - The necessary leader id for running the job.newJobManagerAddress - for the slot requests which are sent to the resource managerExceptionpublic CompletableFuture<Void> postStop()
RpcEndpointThis method is called when the RpcEndpoint is being shut down. The method is guaranteed to be executed in the main thread context and can be used to clean up internal state.
IMPORTANT: This method should never be called directly by the user.
postStop in class RpcEndpointpublic void suspend()
suspend in interface SlotPoolGatewaypublic SlotOwner getSlotOwner()
This method does not mutate state and can be called directly (no RPC indirection)
public SlotProvider getSlotProvider()
This method does not mutate state and can be called directly (no RPC indirection)
public void connectToResourceManager(ResourceManagerGateway resourceManagerGateway)
SlotPoolGatewayconnectToResourceManager in interface SlotPoolGatewayresourceManagerGateway - The RPC gateway for the resource manager.public void disconnectResourceManager()
SlotPoolGatewayThe slot pool will still be able to serve slots from its internal pool.
disconnectResourceManager in interface SlotPoolGatewaypublic CompletableFuture<LogicalSlot> allocateSlot(SlotRequestId slotRequestId, ScheduledUnit task, SlotProfile slotProfile, boolean allowQueuedScheduling, org.apache.flink.api.common.time.Time allocationTimeout)
SlotPoolGatewayScheduledUnit. The request
is uniquely identified by the provided SlotRequestId which can also
be used to release the slot via AllocatedSlotActions.releaseSlot(SlotRequestId, SlotSharingGroupId, Throwable).
The allocated slot will fulfill the requested ResourceProfile and it
is tried to place it on one of the location preferences.
If the returned future must not be completed right away (a.k.a. the slot request can be queued), allowQueuedScheduling must be set to true.
allocateSlot in interface SlotPoolGatewayslotRequestId - identifying the requested slottask - for which to allocate slotslotProfile - profile that specifies the requirements for the requested slotallowQueuedScheduling - true if the slot request can be queued (e.g. the returned future must not be completed)allocationTimeout - for the operationLogicalSlotpublic CompletableFuture<Acknowledge> releaseSlot(SlotRequestId slotRequestId, @Nullable SlotSharingGroupId slotSharingGroupId, Throwable cause)
AllocatedSlotActionsSlotRequestId. If the slot belonged to a
slot sharing group, then the corresponding SlotSharingGroupId has to be
provided. Additionally, one can provide a cause for the slot release.releaseSlot in interface AllocatedSlotActionsslotRequestId - identifying the slot to releaseslotSharingGroupId - identifying the slot sharing group to which the slot belongs, null if nonecause - of the slot release, null if nonepublic CompletableFuture<Collection<SlotOffer>> offerSlots(TaskManagerLocation taskManagerLocation, TaskManagerGateway taskManagerGateway, Collection<SlotOffer> offers)
SlotPoolGatewaySlotPool. The slot offerings can be
individually accepted or rejected by returning the collection of accepted
slot offers.offerSlots in interface SlotPoolGatewaytaskManagerLocation - from which the slot offers originatetaskManagerGateway - to talk to the slot offereroffers - slot offers which are offered to the SlotPoolpublic CompletableFuture<Boolean> offerSlot(TaskManagerLocation taskManagerLocation, TaskManagerGateway taskManagerGateway, SlotOffer slotOffer)
offerSlot in interface SlotPoolGatewaytaskManagerLocation - location from where the offer comes fromtaskManagerGateway - TaskManager gatewayslotOffer - the offered slotpublic CompletableFuture<org.apache.flink.types.SerializableOptional<ResourceID>> failAllocation(AllocationID allocationID, Exception cause)
failAllocation in interface SlotPoolGatewayallocationID - Represents the allocation which should be failedcause - The cause of the failurepublic CompletableFuture<Acknowledge> registerTaskManager(ResourceID resourceID)
registerTaskManager in interface SlotPoolGatewayresourceID - The id of the TaskManagerpublic CompletableFuture<Acknowledge> releaseTaskManager(ResourceID resourceId, Exception cause)
releaseTaskManager in interface SlotPoolGatewayresourceId - The id of the TaskManagercause - for the releasing of the TaskManager@VisibleForTesting protected void timeoutPendingSlotRequest(SlotRequestId slotRequestId)
@VisibleForTesting protected org.apache.flink.runtime.jobmaster.slotpool.SlotPool.AllocatedSlots getAllocatedSlots()
@VisibleForTesting protected SlotPool.AvailableSlots getAvailableSlots()
Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.