public abstract class ResourceManager<WorkerType extends ResourceIDRetrievable> extends FencedRpcEndpoint<ResourceManagerId> implements ResourceManagerGateway, LeaderContender
It offers the following methods as part of its RPC interface for remote interaction:
- registerJobManager(JobMasterId, ResourceID, String, JobID, Time) registers a JobMaster at the resource manager
- requestSlot(JobMasterId, SlotRequest, Time) requests a slot from the resource manager

Nested classes/interfaces inherited from class RpcEndpoint: RpcEndpoint.MainThreadExecutor

| Modifier and Type | Field and Description |
|---|---|
| static String | RESOURCE_MANAGER_NAME |

Fields inherited from class RpcEndpoint: log, rpcServer

| Constructor and Description |
|---|
| ResourceManager(RpcService rpcService, String resourceManagerEndpointId, ResourceID resourceId, HighAvailabilityServices highAvailabilityServices, HeartbeatServices heartbeatServices, SlotManager slotManager, MetricRegistry metricRegistry, JobLeaderIdService jobLeaderIdService, ClusterInformation clusterInformation, FatalErrorHandler fatalErrorHandler, JobManagerMetricGroup jobManagerMetricGroup) |

| Modifier and Type | Method and Description |
|---|---|
| void | cancelSlotRequest(AllocationID allocationID): Cancels the slot allocation request at the resource manager. |
| protected CompletableFuture<Void> | clearStateAsync(): This method can be overridden to add a (non-blocking) state clearing routine to the ResourceManager that will be called when leadership is revoked. |
| protected void | closeJobManagerConnection(org.apache.flink.api.common.JobID jobId, Exception cause): This method should be called by the framework once it detects that a currently registered job manager has failed. |
| protected void | closeTaskManagerConnection(ResourceID resourceID, Exception cause): This method should be called by the framework once it detects that a currently registered task executor has failed. |
| protected static Collection<ResourceProfile> | createSlotsPerWorker(int numSlots) |
| CompletableFuture<Acknowledge> | deregisterApplication(ApplicationStatus finalStatus, String diagnostics): Cleans up the application and shuts down the cluster. |
| void | disconnectJobManager(org.apache.flink.api.common.JobID jobId, Exception cause): Disconnects the JobManager of the job specified by the given jobId from the ResourceManager. |
| void | disconnectTaskManager(ResourceID resourceId, Exception cause): Disconnects a TaskManager specified by the given resourceID from the ResourceManager. |
| CompletableFuture<Integer> | getNumberOfRegisteredTaskManagers(): Gets the currently registered number of TaskManagers. |
| protected int | getNumberRequiredTaskManagerSlots() |
| void | grantLeadership(UUID newLeaderSessionID): Callback method when the current ResourceManager is granted leadership. |
| void | handleError(Exception exception): Handles errors occurring in the leader election service. |
| void | heartbeatFromJobManager(ResourceID resourceID): Sends a heartbeat from the job manager to the resource manager. |
| void | heartbeatFromTaskManager(ResourceID resourceID, SlotReport slotReport): Sends a heartbeat from the task manager to the resource manager. |
| protected abstract void | initialize(): Initializes the framework specific components. |
| protected abstract void | internalDeregisterApplication(ApplicationStatus finalStatus, String optionalDiagnostics): The framework specific code to deregister the application. |
| protected void | jobLeaderLostLeadership(org.apache.flink.api.common.JobID jobId, JobMasterId oldJobMasterId) |
| void | notifySlotAvailable(InstanceID instanceID, SlotID slotId, AllocationID allocationId): Sent by the TaskExecutor to notify the ResourceManager that a slot has become available. |
| protected void | onFatalError(Throwable t): Notifies the ResourceManager that a fatal error has occurred and it cannot proceed. |
| CompletableFuture<Void> | postStop(): User overridable callback. |
| protected CompletableFuture<Void> | prepareLeadershipAsync(): This method can be overridden to add a (non-blocking) initialization routine to the ResourceManager that will be called when leadership is granted but before leadership is confirmed. |
| void | registerInfoMessageListener(String address): Registers an info message listener. |
| CompletableFuture<RegistrationResponse> | registerJobManager(JobMasterId jobMasterId, ResourceID jobManagerResourceId, String jobManagerAddress, org.apache.flink.api.common.JobID jobId, org.apache.flink.api.common.time.Time timeout): Registers a JobMaster at the resource manager. |
| CompletableFuture<RegistrationResponse> | registerTaskExecutor(String taskExecutorAddress, ResourceID taskExecutorResourceId, int dataPort, HardwareDescription hardwareDescription, org.apache.flink.api.common.time.Time timeout): Registers a TaskExecutor at the resource manager. |
| protected void | releaseResource(InstanceID instanceId, Exception cause) |
| protected void | removeJob(org.apache.flink.api.common.JobID jobId) |
| CompletableFuture<ResourceOverview> | requestResourceOverview(org.apache.flink.api.common.time.Time timeout): Requests the resource overview. |
| CompletableFuture<Acknowledge> | requestSlot(JobMasterId jobMasterId, SlotRequest slotRequest, org.apache.flink.api.common.time.Time timeout): Requests a slot from the resource manager. |
| CompletableFuture<TransientBlobKey> | requestTaskManagerFileUpload(ResourceID taskManagerId, FileType fileType, org.apache.flink.api.common.time.Time timeout): Requests the file upload from the given TaskExecutor to the cluster's BlobServer. |
| CompletableFuture<TaskManagerInfo> | requestTaskManagerInfo(ResourceID resourceId, org.apache.flink.api.common.time.Time timeout): Requests information about the given TaskExecutor. |
| CompletableFuture<Collection<TaskManagerInfo>> | requestTaskManagerInfo(org.apache.flink.api.common.time.Time timeout): Requests information about the registered TaskExecutors. |
| CompletableFuture<Collection<org.apache.flink.api.java.tuple.Tuple2<ResourceID,String>>> | requestTaskManagerMetricQueryServicePaths(org.apache.flink.api.common.time.Time timeout): Requests the paths for the TaskManager's MetricQueryService to query. |
| void | revokeLeadership(): Callback method when the current ResourceManager loses leadership. |
| void | sendInfoMessage(String message) |
| CompletableFuture<Acknowledge> | sendSlotReport(ResourceID taskManagerResourceId, InstanceID taskManagerRegistrationId, SlotReport slotReport, org.apache.flink.api.common.time.Time timeout): Sends the given SlotReport to the ResourceManager. |
| void | start(): Starts the rpc endpoint. |
| abstract Collection<ResourceProfile> | startNewWorker(ResourceProfile resourceProfile): Allocates a resource using the resource profile. |
| abstract boolean | stopWorker(WorkerType worker): Stops the given worker. |
| void | unRegisterInfoMessageListener(String address): Unregisters an info message listener. |
| protected abstract WorkerType | workerStarted(ResourceID resourceID): Callback when a worker was started. |

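The table above describes prepareLeadershipAsync() as running after leadership is granted but before it is confirmed, and clearStateAsync() as running when leadership is revoked. The following is a minimal, self-contained sketch of that ordering only. It does not use any Flink classes: `LeadershipSketch`, its `events` list, and its `confirmedSessionId` field are hypothetical stand-ins, and the real ResourceManager may sequence its internal steps differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in, not the Flink ResourceManager.
class LeadershipSketch {
    // Records the order in which the hooks run.
    final List<String> events = new ArrayList<>();
    // Null while this instance does not hold confirmed leadership.
    UUID confirmedSessionId;

    // Counterpart of prepareLeadershipAsync(): non-blocking initialization.
    CompletableFuture<Void> prepareLeadershipAsync() {
        events.add("prepare");
        return CompletableFuture.completedFuture(null);
    }

    // Counterpart of clearStateAsync(): non-blocking state clearing.
    CompletableFuture<Void> clearStateAsync() {
        events.add("clear");
        return CompletableFuture.completedFuture(null);
    }

    // Leadership is confirmed only after the preparation future completes.
    CompletableFuture<Void> grantLeadership(UUID newLeaderSessionId) {
        return prepareLeadershipAsync().thenRun(() -> {
            confirmedSessionId = newLeaderSessionId;
            events.add("confirmed");
        });
    }

    // Revocation drops the session id, then runs the clearing routine.
    CompletableFuture<Void> revokeLeadership() {
        confirmedSessionId = null;
        events.add("revoked");
        return clearStateAsync();
    }

    public static void main(String[] args) {
        LeadershipSketch rm = new LeadershipSketch();
        rm.grantLeadership(UUID.randomUUID()).join();
        rm.revokeLeadership().join();
        System.out.println(rm.events); // prints [prepare, confirmed, revoked, clear]
    }
}
```

Returning a CompletableFuture from both hooks lets a subclass do slow work (for example, recovering previously allocated workers) without blocking the caller; that motivation is an interpretation of the "(non-blocking)" wording above, not a statement from the source.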
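Methods inherited from class FencedRpcEndpoint: callAsyncWithoutFencing, getFencingToken, getMainThreadExecutor, getUnfencedMainThreadExecutor, runAsyncWithoutFencing, setFencingToken
Methods inherited from class RpcEndpoint: callAsync, getAddress, getEndpointId, getHostname, getRpcService, getSelfGateway, getTerminationFuture, runAsync, scheduleRunAsync, scheduleRunAsync, shutDown, stop, validateRunsInMainThread
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface FencedRpcGateway: getFencingToken
Methods inherited from interface RpcGateway: getAddress, getHostname
Methods inherited from interface LeaderContender: getAddress

public static final String RESOURCE_MANAGER_NAME
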
public ResourceManager(RpcService rpcService, String resourceManagerEndpointId, ResourceID resourceId, HighAvailabilityServices highAvailabilityServices, HeartbeatServices heartbeatServices, SlotManager slotManager, MetricRegistry metricRegistry, JobLeaderIdService jobLeaderIdService, ClusterInformation clusterInformation, FatalErrorHandler fatalErrorHandler, JobManagerMetricGroup jobManagerMetricGroup)

public void start() throws Exception
Description copied from class: RpcEndpoint
Starts the rpc endpoint.
IMPORTANT: Whenever you override this method, call the parent implementation to enable rpc processing. It is advised to make the parent call last.
Overrides: start in class RpcEndpoint
Throws: Exception - indicating that something went wrong while starting the RPC endpoint

public CompletableFuture<Void> postStop()
Description copied from class: RpcEndpoint
User overridable callback.
This method is called when the RpcEndpoint is being shut down. The method is guaranteed to be executed in the main thread context and can be used to clean up internal state.
IMPORTANT: This method should never be called directly by the user.
Overrides: postStop in class RpcEndpoint

public CompletableFuture<RegistrationResponse> registerJobManager(JobMasterId jobMasterId, ResourceID jobManagerResourceId, String jobManagerAddress, org.apache.flink.api.common.JobID jobId, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: ResourceManagerGateway
Registers a JobMaster at the resource manager.
Specified by: registerJobManager in interface ResourceManagerGateway
Parameters:
- jobMasterId - The fencing token for the JobMaster leader
- jobManagerResourceId - The resource ID of the JobMaster that registers
- jobManagerAddress - The address of the JobMaster that registers
- jobId - The Job ID of the JobMaster that registers
- timeout - Timeout for the future to complete

public CompletableFuture<RegistrationResponse> registerTaskExecutor(String taskExecutorAddress, ResourceID taskExecutorResourceId, int dataPort, HardwareDescription hardwareDescription, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: ResourceManagerGateway
Registers a TaskExecutor at the resource manager.
Specified by: registerTaskExecutor in interface ResourceManagerGateway
Parameters:
- taskExecutorAddress - The address of the TaskExecutor that registers
- taskExecutorResourceId - The resource ID of the TaskExecutor that registers
- dataPort - port used for data communication between TaskExecutors
- hardwareDescription - of the registering TaskExecutor
- timeout - The timeout for the response.

public CompletableFuture<Acknowledge> sendSlotReport(ResourceID taskManagerResourceId, InstanceID taskManagerRegistrationId, SlotReport slotReport, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: ResourceManagerGateway
Sends the given SlotReport to the ResourceManager.
Specified by: sendSlotReport in interface ResourceManagerGateway
Parameters:
- taskManagerRegistrationId - id identifying the sending TaskManager
- slotReport - which is sent to the ResourceManager
- timeout - for the operation
Returns: Acknowledge once the slot report has been received.

public void heartbeatFromTaskManager(ResourceID resourceID, SlotReport slotReport)
Description copied from interface: ResourceManagerGateway
Sends a heartbeat from the task manager to the resource manager.
Specified by: heartbeatFromTaskManager in interface ResourceManagerGateway
Parameters:
- resourceID - unique id of the task manager
- slotReport - Current slot allocation on the originating TaskManager

public void heartbeatFromJobManager(ResourceID resourceID)
Description copied from interface: ResourceManagerGateway
Sends a heartbeat from the job manager to the resource manager.
Specified by: heartbeatFromJobManager in interface ResourceManagerGateway
Parameters:
- resourceID - unique id of the job manager

public void disconnectTaskManager(ResourceID resourceId, Exception cause)
Description copied from interface: ResourceManagerGateway
Disconnects a TaskManager specified by the given resourceID from the ResourceManager.
Specified by: disconnectTaskManager in interface ResourceManagerGateway
Parameters:
- resourceId - identifying the TaskManager to disconnect
- cause - for the disconnection of the TaskManager

public void disconnectJobManager(org.apache.flink.api.common.JobID jobId, Exception cause)
Description copied from interface: ResourceManagerGateway
Disconnects the JobManager of the job specified by the given jobId from the ResourceManager.
Specified by: disconnectJobManager in interface ResourceManagerGateway
Parameters:
- jobId - JobID for which the JobManager was the leader
- cause - for the disconnection of the JobManager

public CompletableFuture<Acknowledge> requestSlot(JobMasterId jobMasterId, SlotRequest slotRequest, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: ResourceManagerGateway
Requests a slot from the resource manager.
Specified by: requestSlot in interface ResourceManagerGateway
Parameters:
- jobMasterId - id of the JobMaster
- slotRequest - The slot to request

public void cancelSlotRequest(AllocationID allocationID)
Description copied from interface: ResourceManagerGateway
Cancels the slot allocation request at the resource manager.
Specified by: cancelSlotRequest in interface ResourceManagerGateway
Parameters:
- allocationID - The allocation ID of the slot request to cancel

public void notifySlotAvailable(InstanceID instanceID, SlotID slotId, AllocationID allocationId)
Description copied from interface: ResourceManagerGateway
Sent by the TaskExecutor to notify the ResourceManager that a slot has become available.
Specified by: notifySlotAvailable in interface ResourceManagerGateway
Parameters:
- instanceID - TaskExecutor's instance id
- slotId - The SlotID of the freed slot
- allocationId - to which the slot has been allocated

public void registerInfoMessageListener(String address)
Registers an info message listener.
Specified by: registerInfoMessageListener in interface ResourceManagerGateway
Parameters:
- address - address of the info message listener to register at this resource manager

public void unRegisterInfoMessageListener(String address)
Unregisters an info message listener.
Specified by: unRegisterInfoMessageListener in interface ResourceManagerGateway
Parameters:
- address - of the info message listener to unregister from this resource manager

public CompletableFuture<Acknowledge> deregisterApplication(ApplicationStatus finalStatus, @Nullable String diagnostics)
Cleans up the application and shuts down the cluster.
Specified by: deregisterApplication in interface ResourceManagerGateway
Parameters:
- finalStatus - of the Flink application
- diagnostics - diagnostics message for the Flink application or null

public CompletableFuture<Integer> getNumberOfRegisteredTaskManagers()
Description copied from interface: ResourceManagerGateway
Gets the currently registered number of TaskManagers.
Specified by: getNumberOfRegisteredTaskManagers in interface ResourceManagerGateway

public CompletableFuture<Collection<TaskManagerInfo>> requestTaskManagerInfo(org.apache.flink.api.common.time.Time timeout)
Description copied from interface: ResourceManagerGateway
Requests information about the registered TaskExecutors.
Specified by: requestTaskManagerInfo in interface ResourceManagerGateway
Parameters:
- timeout - of the request

public CompletableFuture<TaskManagerInfo> requestTaskManagerInfo(ResourceID resourceId, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: ResourceManagerGateway
Requests information about the given TaskExecutor.
Specified by: requestTaskManagerInfo in interface ResourceManagerGateway
Parameters:
- resourceId - identifying the TaskExecutor for which to return information
- timeout - of the request

public CompletableFuture<ResourceOverview> requestResourceOverview(org.apache.flink.api.common.time.Time timeout)
Description copied from interface: ResourceManagerGateway
Requests the resource overview.
Specified by: requestResourceOverview in interface ResourceManagerGateway
Parameters:
- timeout - of the request

public CompletableFuture<Collection<org.apache.flink.api.java.tuple.Tuple2<ResourceID,String>>> requestTaskManagerMetricQueryServicePaths(org.apache.flink.api.common.time.Time timeout)
Description copied from interface: ResourceManagerGateway
Requests the paths for the TaskManager's MetricQueryService to query.
Specified by: requestTaskManagerMetricQueryServicePaths in interface ResourceManagerGateway
Parameters:
- timeout - for the asynchronous operation

public CompletableFuture<TransientBlobKey> requestTaskManagerFileUpload(ResourceID taskManagerId, FileType fileType, org.apache.flink.api.common.time.Time timeout)
Description copied from interface: ResourceManagerGateway
Requests the file upload from the given TaskExecutor to the cluster's BlobServer. The corresponding TransientBlobKey is returned.
Specified by: requestTaskManagerFileUpload in interface ResourceManagerGateway
Parameters:
- taskManagerId - identifying the TaskExecutor to upload the specified file
- fileType - type of the file to upload
- timeout - for the asynchronous operation
Returns: TransientBlobKey after uploading the file to the BlobServer.

protected void closeJobManagerConnection(org.apache.flink.api.common.JobID jobId, Exception cause)
This method should be called by the framework once it detects that a currently registered job manager has failed.
Parameters:
- jobId - identifying the job whose leader shall be disconnected.
- cause - The exception that caused the JobManager to fail.

protected void closeTaskManagerConnection(ResourceID resourceID, Exception cause)
This method should be called by the framework once it detects that a currently registered task executor has failed.
Parameters:
- resourceID - Id of the TaskManager that has failed.
- cause - The exception that caused the TaskManager to fail.

protected void removeJob(org.apache.flink.api.common.JobID jobId)
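
closeTaskManagerConnection is documented as the method to call once a registered task executor is detected as failed, and heartbeatFromTaskManager is how a task manager reports that it is alive. One plausible way to connect the two is a heartbeat timeout; the sketch below illustrates that idea with plain Java only. It is an assumption-laden illustration, not Flink's heartbeat implementation: `HeartbeatSketch`, `checkTimeouts`, `isRegistered`, and the use of String ids and explicit millisecond timestamps are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in; Strings replace ResourceID and no Flink classes are used.
class HeartbeatSketch {
    private final long timeoutMillis;
    // Last heartbeat timestamp per registered task executor.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    // Ids whose connection was closed, kept only for inspection.
    final List<String> closed = new ArrayList<>();

    HeartbeatSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void registerTaskExecutor(String resourceId, long nowMillis) {
        lastHeartbeat.put(resourceId, nowMillis);
    }

    // Heartbeats from unknown task executors are ignored.
    void heartbeatFromTaskManager(String resourceId, long nowMillis) {
        lastHeartbeat.computeIfPresent(resourceId, (id, previous) -> nowMillis);
    }

    // Closes the connection of every task executor whose last heartbeat is too old.
    void checkTimeouts(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeat.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                expired.add(entry.getKey());
            }
        }
        for (String id : expired) {
            closeTaskManagerConnection(id, new TimeoutException("Heartbeat of " + id + " timed out."));
        }
    }

    // Removes the task executor; closing an unknown id is a no-op.
    void closeTaskManagerConnection(String resourceId, Exception cause) {
        if (lastHeartbeat.remove(resourceId) != null) {
            closed.add(resourceId);
        }
    }

    boolean isRegistered(String resourceId) {
        return lastHeartbeat.containsKey(resourceId);
    }

    public static void main(String[] args) {
        HeartbeatSketch rm = new HeartbeatSketch(1000);
        rm.registerTaskExecutor("tm-1", 0);
        rm.registerTaskExecutor("tm-2", 0);
        rm.heartbeatFromTaskManager("tm-1", 900);
        rm.checkTimeouts(1500);
        System.out.println(rm.closed); // prints [tm-2]
    }
}
```

Passing the current time as an argument keeps the sketch deterministic; a real implementation would more likely rely on a scheduler or clock abstraction.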
protected void jobLeaderLostLeadership(org.apache.flink.api.common.JobID jobId,
JobMasterId oldJobMasterId)
protected void releaseResource(InstanceID instanceId, Exception cause)
public void sendInfoMessage(String message)
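protected void onFatalError(Throwable t)
Notifies the ResourceManager that a fatal error has occurred and it cannot proceed.
Parameters:
- t - The exception describing the fatal error

public void grantLeadership(UUID newLeaderSessionID)
Callback method when the current ResourceManager is granted leadership.
Specified by: grantLeadership in interface LeaderContender
Parameters:
- newLeaderSessionID - unique leadershipID

public void revokeLeadership()
Callback method when the current ResourceManager loses leadership.
Specified by: revokeLeadership in interface LeaderContender

public void handleError(Exception exception)
Handles errors occurring in the leader election service.
Specified by: handleError in interface LeaderContender
Parameters:
- exception - Exception being thrown in the leader election service

protected abstract void initialize() throws ResourceManagerException
Initializes the framework specific components.
Throws: ResourceManagerException - which occurs during initialization and causes the resource manager to fail.

protected CompletableFuture<Void> prepareLeadershipAsync()
This method can be overridden to add a (non-blocking) initialization routine to the ResourceManager that will be called when leadership is granted but before leadership is confirmed.
Returns: CompletableFuture that completes when the computation is finished.

protected CompletableFuture<Void> clearStateAsync()
This method can be overridden to add a (non-blocking) state clearing routine to the ResourceManager that will be called when leadership is revoked.
Returns: CompletableFuture that completes when the state clearing routine is finished.

protected abstract void internalDeregisterApplication(ApplicationStatus finalStatus, @Nullable String optionalDiagnostics) throws ResourceManagerException
The framework specific code to deregister the application.
This method also needs to make sure all pending containers that are not registered yet are returned.
Parameters:
- finalStatus - The application status to report.
- optionalDiagnostics - A diagnostics message or null.
Throws: ResourceManagerException - if the application could not be shut down.

@VisibleForTesting
public abstract Collection<ResourceProfile> startNewWorker(ResourceProfile resourceProfile)
Allocates a resource using the resource profile.
Parameters:
- resourceProfile - The resource description
Returns: Collection of ResourceProfile describing the launched slots

protected abstract WorkerType workerStarted(ResourceID resourceID)
Callback when a worker was started.
Parameters:
- resourceID - The worker resource id

public abstract boolean stopWorker(WorkerType worker)
Stops the given worker.
Parameters:
- worker - The worker.

protected int getNumberRequiredTaskManagerSlots()

protected static Collection<ResourceProfile> createSlotsPerWorker(int numSlots)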
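
The three framework-specific worker hooks (startNewWorker, workerStarted, stopWorker) suggest a lifecycle: request a worker, receive a callback when it has started, and stop it later. Below is a minimal in-memory sketch of that lifecycle, assuming a fixed number of slots per worker. Everything in it is a hypothetical stand-in rather than Flink code: `WorkerLifecycleSketch`, `Profile` (in place of ResourceProfile), `Worker` (in place of WorkerType), and the choice to return null from workerStarted for a worker that was never requested.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a framework-specific ResourceManager subclass.
class WorkerLifecycleSketch {
    // Stand-in for ResourceProfile.
    static final class Profile {
        final double cpuCores;
        final int memoryMb;

        Profile(double cpuCores, int memoryMb) {
            this.cpuCores = cpuCores;
            this.memoryMb = memoryMb;
        }
    }

    // Stand-in for WorkerType.
    static final class Worker {
        final String resourceId;

        Worker(String resourceId) {
            this.resourceId = resourceId;
        }
    }

    private final int slotsPerWorker;
    private int pendingWorkers = 0;
    private final Map<String, Worker> startedWorkers = new HashMap<>();

    WorkerLifecycleSketch(int slotsPerWorker) {
        this.slotsPerWorker = slotsPerWorker;
    }

    // Requests one worker and reports the slots it is expected to offer.
    Collection<Profile> startNewWorker(Profile profile) {
        pendingWorkers++;
        return Collections.nCopies(slotsPerWorker, profile);
    }

    // Callback once a requested worker is up; null if no request is pending (sketch's choice).
    Worker workerStarted(String resourceId) {
        if (pendingWorkers == 0) {
            return null;
        }
        pendingWorkers--;
        Worker worker = new Worker(resourceId);
        startedWorkers.put(resourceId, worker);
        return worker;
    }

    // Returns true only if the worker was known and has now been removed.
    boolean stopWorker(Worker worker) {
        return startedWorkers.remove(worker.resourceId) != null;
    }

    public static void main(String[] args) {
        WorkerLifecycleSketch rm = new WorkerLifecycleSketch(2);
        System.out.println(rm.startNewWorker(new Profile(1.0, 1024)).size()); // prints 2
        Worker worker = rm.workerStarted("worker-1");
        System.out.println(rm.stopWorker(worker)); // prints true
        System.out.println(rm.stopWorker(worker)); // prints false
    }
}
```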
Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.