Environments
Environments are the core of any DRL project, since they specify the “game” and its interaction logic.
Tetris_scheduling Environment
This file provides the scheduling environment class Env, which can be used to load and simulate scheduling-problem instances.
- class environments.env_tetris_scheduling.Env(config: dict, data: List[List[Task]])
Bases:
Env
Environment for scheduling optimization. This class inherits from the base gym environment, so the functions step, reset, _state_obs and render are implemented and can be used by default.
If you want to customize the given rewards, you can adapt the function compute_reward.
- Parameters
config – Dictionary with parameters to specify environment attributes
data – Scheduling problem to be solved, i.e., a list of instances
- __init__(config: dict, data: List[List[Task]])
- reset() → List[float]
Resets the episode information trackers
Updates the number of runs
Loads new instance
- Returns
First observation by calling the class function self.state_obs
- step(action: Union[int, float], **kwargs) → Tuple[List[float], Any, bool, Dict]
Step function
- Parameters
action – Action to be performed on the current state of the environment
- Returns
Observation, reward, done, infos
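A minimal interaction loop as a sketch; config and instance_data are placeholders for the project’s parsed config and data, and the action_space attribute is assumed from the gym base class:

```python
from environments.env_tetris_scheduling import Env

env = Env(config, instance_data)   # config dict and instance list as above

obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random valid-format action
    obs, reward, done, info = env.step(action)  # observation, reward, done, infos
```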
- get_instance_info() → Tuple[int, int, int, int]
Retrieves info about the instance size and configuration from an instance sample
- Returns
The number of jobs, the number of tasks, and the maximum runtime of this data point
- property state_obs: List[float]
Transforms the state (task state and factory state) into a gym observation. Scales the values to [0, 1] and applies one-hot encoding
- Returns
Observation
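For illustration, a minimal sketch of the scaling step described above (the function name and the division-by-max rule are assumptions, not the confirmed implementation):

```python
import numpy as np

def scale_to_unit_interval(values: np.ndarray) -> np.ndarray:
    # Hypothetical min-max style scaling: divide by the largest entry so
    # all features land in [0, 1]; a zero vector is returned unchanged.
    max_value = values.max()
    return values / max_value if max_value > 0 else values
```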
- static to_one_hot(x: int, max_size: int) → array
Converts an index to a one-hot encoding
- Parameters
x – Index whose value should be 1
max_size – Size of the one hot encoding vector
- Returns
One hot encoded vector
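Based on the signature and docstring, the behavior is plausibly equivalent to this sketch:

```python
import numpy as np

def to_one_hot(x: int, max_size: int) -> np.ndarray:
    # Vector of length max_size with a single 1 at index x.
    one_hot = np.zeros(max_size)
    one_hot[x] = 1
    return one_hot

to_one_hot(2, 5)  # array([0., 0., 1., 0., 0.])
```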
- static check_valid_job_action(job_action: array, job_mask: array) → bool
Check if job action is valid
- Parameters
job_action – Job action as one hot vector
job_mask – One hot vector with ones for each valid job
- Returns
True if job_action is valid, else False
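A sketch of the validity check implied by the docstring (the element-wise product test is an assumption):

```python
import numpy as np

def check_valid_job_action(job_action: np.ndarray, job_mask: np.ndarray) -> bool:
    # The action is valid if its one-hot index is also set in the mask.
    return bool(np.sum(job_action * job_mask) > 0)

job_mask = np.array([1, 0, 1])                          # jobs 0 and 2 are valid
check_valid_job_action(np.array([0, 1, 0]), job_mask)   # False
check_valid_job_action(np.array([0, 0, 1]), job_mask)   # True
```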
- get_selected_task(job_idx: int) → Tuple[int, Task]
Helper function to get the selected task (the next possible task) from the job index alone
- Parameters
job_idx – job index
- Returns
Index of the task in the task list and the selected task
- choose_machine(task: Task) → int
Performs the logic by which the machine is chosen (in the case of the flexible JSSP). Currently implemented: choose, from the set of possible machines, the one with the earliest possible start time
- Parameters
task – Task
- Returns
Machine on which the task will be scheduled.
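A hypothetical sketch of the documented selection rule; machines and machine_free_at are illustrative names, not attributes of the real classes:

```python
def choose_machine_sketch(machines: list, machine_free_at: dict) -> int:
    # Among the machines that can process the task, pick the one that
    # becomes available earliest.
    return min(machines, key=lambda m: machine_free_at[m])

choose_machine_sketch([0, 2], {0: 12, 1: 3, 2: 7})  # -> 2 (free at t=7)
```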
- get_action_mask() → array
Gets the action mask. It is needed for the heuristics, the machine selection (and the agent, if it is masked). 0 -> available, 1 -> not available
- Returns
Action mask
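The 0/1 convention can be used directly to recover the indices of available actions (env stands for an instantiated environment):

```python
import numpy as np

mask = env.get_action_mask()            # 0 -> available, 1 -> not available
available_actions = np.where(mask == 0)[0]
```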
- execute_action(job_id: int, task: Task, machine_id: int) → None
Executes a valid action: sets the machine and updates the job and task
- Parameters
job_id – job_id of the task to be executed
task – Task
machine_id – ID of the machine on which the task is to be executed
- Returns
None
- compute_reward() → Any
Calculates the reward that will later be returned to the agent. Uses the self.reward_strategy string to discriminate between different reward strategies. Default is ‘dense_reward’.
- Returns
Reward
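A sketch of the dispatch described above. Only ‘dense_reward’ is confirmed as the default here; the other strategy strings and the dense formula are illustrative assumptions:

```python
def compute_reward(self):
    # Strategy names other than the documented 'dense_reward' default are
    # assumptions; the real mapping may differ.
    if self.reward_strategy == 'sparse_reward':
        return self.sparse_makespan_reward()
    if self.reward_strategy == 'mr2_reward':
        return self.mr2_reward()
    # Hypothetical dense reward, e.g. the negative tardiness of the
    # current solution.
    return -self.calculate_tardiness()
```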
- sparse_makespan_reward() → int
Computes the reward based on the final makespan at the end of the episode; returns 0 at all other steps.
- Returns
(int) sparse reward
- mr2_reward() → Any
Computes the mr2 reward based on https://doi.org/10.1016/j.engappai.2022.104868
- Returns
mr2 reward
- check_done() → bool
Check if all jobs are done
- Returns
True if all jobs are done, else False
- calculate_tardiness() → int
Calculates the tardiness of all jobs (this was previously the reward-calculation function)
- Returns
(int) tardiness of last solution
- get_makespan()
Returns the current makespan (the time the latest of all scheduled tasks finishes)
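For reference, the standard definitions behind these two quantities, as a sketch; the attributes finished and deadline are assumed names:

```python
def makespan(tasks) -> int:
    # Time at which the latest scheduled task finishes.
    return max(task.finished for task in tasks)

def total_tardiness(jobs) -> int:
    # Sum over jobs of how far each finishes past its deadline (0 if on time).
    return sum(max(0, job.finished - job.deadline) for job in jobs)
```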
- log_intermediate_step() → None
Logging function for intermediate steps
- Returns
None
- close()
This is a relic of the OpenAI Gym API and is currently unnecessary.
- seed(seed=1)
This is a relic of the OpenAI Gym API. Currently unnecessary, because the environment is deterministic, so no seed is used.
- render(mode='human')
Visualizes the current status of the environment
- Parameters
mode – “human”: Displays the gantt chart, “image”: Returns an image of the gantt chart
- Returns
PIL.Image.Image if mode='image', else None
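Typical usage, assuming an instantiated environment env:

```python
env.render()                      # mode='human': displays the Gantt chart
img = env.render(mode='image')    # returns a PIL.Image.Image of the chart
img.save('schedule.png')
```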
Tetris_scheduling_indirect_action Environment
- class environments.env_tetris_scheduling_indirect_action.IndirectActionEnv(config: dict, data: List[List[Task]])
Bases:
Env
Environment for scheduling optimization according to https://www.sciencedirect.com/science/article/pii/S0952197622001130.
Main differences to the vanilla environment:
ACTION: indirect action mapping
REWARD: mr2 reward (which means we have to train on the same data again and again)
OBSERVATION: different observation (“normalization” appears to be division by the maximum to [0, 1] in the paper code). Not every part makes sense here, due to the different interaction logic
INTERACTION LOGIC WARNING:
Original paper: time steps are run through; the agent can take as many actions as it wants per time step, but may not schedule into the past.
Our adaptation: we still play Tetris, meaning that we schedule whole blocks of work at a time
- Parameters
config – Dictionary with parameters to specify environment attributes
data – Scheduling problem to be solved, i.e., a list of instances
- __init__(config: dict, data: List[List[Task]])
- step(action: int, **kwargs)
Step Function
- Parameters
action – Action to be performed on the current state of the environment
kwargs – should include “action_mode”, because the interaction patterns of the heuristics and the agent differ and need to be processed differently
- Returns
Observation, reward, done, infos
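A call might then look as follows; the concrete action_mode values are assumptions based on the description above, only the kwarg name itself is documented:

```python
# 'agent' vs. 'heuristic' is a guess at how the interaction patterns
# are distinguished.
obs, reward, done, info = env.step(action, action_mode='agent')
```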
- reset() → ndarray
Resets the episode information trackers
Updates the number of runs
Loads new instance
- Returns
First observation by calling the class function self.state_obs
- property state_obs: ndarray
Transforms the state (task state and factory state) into a gym observation. Scales the values to [0, 1] and applies one-hot encoding. Cf. https://www.sciencedirect.com/science/article/pii/S0952197622001130, section 4.2.1
- Returns
Observation
- get_action_mask() → array
Gets the action mask. In this environment, all actions are always treated as valid, because the interaction logic accepts it; note that we only allow non-masked algorithms. The heuristics, however, still need the job mask. 0 -> available, 1 -> not available
- Returns
Action mask
- get_next_tasks()
Returns the next tasks that can be scheduled
Util Functions
- class environments.environment_loader.EnvironmentLoader
Bases:
object
Loads the right environment as named in the passed config file. Also checks if the environment is compatible with the chosen algorithm.
- classmethod load(config: dict, check_env_agent_compatibility: bool = True, register_gym_env: bool = False, **kwargs) → Tuple[Any, str]
Loading function. Returns the created environment together with a string identifier
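Typical usage, assuming a parsed config dictionary:

```python
from environments.environment_loader import EnvironmentLoader

# config is a placeholder for the parsed configuration dictionary.
env, env_name = EnvironmentLoader.load(config)
```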
- classmethod check_environment_agent_compatibility(config: dict, env_name: Optional[str] = None, algo_name: Optional[str] = None)
Check if environment and algorithm are compatible. E.g., some environments may depend on action masking.