Environments

Environments are the core of any DRL project, since they specify the “game” and the interaction logic.

Tetris_scheduling Environment

This file provides the scheduling environment class Env, which can be used to load and simulate scheduling-problem instances.

class environments.env_tetris_scheduling.Env(config: dict, data: List[List[Task]])

Bases: Env

Environment for scheduling optimization. This class inherits from the base gym environment, so the functions step, reset, _state_obs and render are implemented and can be used by default.

If you want to customize the given rewards, you can adapt the function compute_reward.

Parameters
  • config – Dictionary with parameters to specify environment attributes

  • data – Scheduling problem to be solved, i.e., a list of instances
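
As noted above, rewards can be customized by adapting compute_reward. The following is a minimal sketch of such an override; the dense makespan penalty and the scaling factor are purely illustrative choices, not the environment's actual reward logic (get_makespan is documented below).

    from environments.env_tetris_scheduling import Env

    class MyEnv(Env):
        def compute_reward(self):
            # Illustrative only: a dense penalty proportional to the current makespan.
            # get_makespan() is documented below; the -0.01 scaling is arbitrary.
            return -0.01 * self.get_makespan()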

__init__(config: dict, data: List[List[Task]])
reset() List[float]
  • Resets the episode information trackers

  • Updates the number of runs

  • Loads new instance

Returns

First observation by calling the class function self.state_obs

step(action: Union[int, float], **kwargs) Tuple[List[float], Any, bool, Dict]

Step function

Parameters

action – Action to be performed on the current state of the environment

Returns

Observation, reward, done, infos
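
A typical interaction loop, sketched under the assumption that config and data have already been created elsewhere (their exact contents depend on the project configuration), and with a placeholder action instead of an agent or heuristic:

    from environments.env_tetris_scheduling import Env

    env = Env(config, data)           # config and data are assumed to exist
    obs = env.reset()                 # first observation

    done = False
    while not done:
        mask = env.get_action_mask()  # 0 -> available, 1 -> not available
        action = 0                    # placeholder; normally chosen by an agent or a heuristic
        obs, reward, done, infos = env.step(action)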

get_instance_info() Tuple[int, int, int, int]

Retrieves info about the instance size and configuration from an instance sample

Returns

(number of jobs, number of tasks and the maximum runtime) of this datapoint

property state_obs: List[float]

Transforms the state (task state and factory state) into a gym observation. Scales the values to the range [0, 1] and applies one-hot encoding.

Returns

Observation

static to_one_hot(x: int, max_size: int) array

Converts an index to a one-hot encoding

Parameters
  • x – Index at which the value should be 1

  • max_size – Size of the one hot encoding vector

Returns

One hot encoded vector
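
The intended behavior, sketched with NumPy (the actual implementation may differ):

    import numpy as np

    def to_one_hot_sketch(x: int, max_size: int) -> np.ndarray:
        # Vector of length max_size with a single 1 at index x.
        vec = np.zeros(max_size)
        vec[x] = 1
        return vec

    # to_one_hot_sketch(2, 5) -> array([0., 0., 1., 0., 0.])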

static check_valid_job_action(job_action: array, job_mask: array) bool

Check if job action is valid

Parameters
  • job_action – Job action as one hot vector

  • job_mask – One hot vector with ones for each valid job

Returns

True if job_action is valid, else False
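
Conceptually, the check verifies that the 1 in job_action falls on a position that is also 1 in job_mask. A NumPy sketch of this logic, not necessarily the actual implementation:

    import numpy as np

    def check_valid_job_action_sketch(job_action: np.ndarray, job_mask: np.ndarray) -> bool:
        # The selected job (the 1 in job_action) must coincide with a 1 in job_mask.
        return bool(np.sum(job_action * job_mask) >= 1)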

get_selected_task(job_idx: int) Tuple[int, Task]

Helper function to get the selected task (the next possible task) from the job index alone

Parameters

job_idx – job index

Returns

Index of the task in the task list and the selected task

choose_machine(task: Task) int

This function performs the logic by which the machine is chosen (in the case of the flexible JSSP). Currently implemented: choose, from the set of possible machines, the one with the earliest possible start time.

Parameters

task – Task

Returns

Machine on which the task will be scheduled.
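
A sketch of the described earliest-possible-start-time rule, assuming hypothetical inputs for the candidate machines and their earliest free times (these names are illustrative, not part of the documented API):

    def choose_machine_sketch(candidate_machines, machine_ready_times) -> int:
        # candidate_machines: machine ids on which this task may run (assumption:
        # taken from the task/instance data).
        # machine_ready_times: earliest possible start time per machine id.
        return min(candidate_machines, key=lambda m: machine_ready_times[m])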

get_action_mask() array

Gets the action mask. It is needed for the heuristics, the machine selection and (if masked) the agent. 0 -> available, 1 -> not available

Returns

Action mask
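
Since 0 marks available actions and 1 marks unavailable ones, the valid actions can be recovered from the mask, for example (env is assumed to exist):

    import numpy as np

    mask = env.get_action_mask()            # env is assumed to exist
    valid_actions = np.where(mask == 0)[0]  # indices of available actions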

execute_action(job_id: int, task: Task, machine_id: int) None

This function executes a valid action: it sets the machine and updates the job and task

Parameters
  • job_id – job_id of the task to be executed

  • task – Task

  • machine_id – ID of the machine on which the task is to be executed

Returns

None

compute_reward() Any

Calculates the reward that will later be returned to the agent. Uses the self.reward_strategy string to discriminate between different reward strategies. Default is ‘dense_reward’.

Returns

Reward

sparse_makespan_reward() int

Computes a reward based on the final makespan at the end of the episode; otherwise returns 0.

Returns

(int) sparse reward
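
A sketch of this behavior, using the documented check_done and get_makespan methods; whether the makespan enters with a negative sign (so that shorter schedules score higher) is an assumption of this sketch, not confirmed by the documentation:

    def sparse_makespan_reward_sketch(env) -> int:
        # 0 until the episode is finished, then a makespan-based value.
        if not env.check_done():
            return 0
        return -int(env.get_makespan())  # sign and scaling are assumptions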

mr2_reward() Any

Computes mr2 reward based on https://doi.org/10.1016/j.engappai.2022.104868

Returns

mr2 reward

check_done() bool

Check if all jobs are done

Returns

True if all jobs are done, else False
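
Conceptually this reduces to checking that every task of every job has been completed; a sketch assuming each Task exposes a done flag (an assumption about the Task class):

    def check_done_sketch(tasks) -> bool:
        # tasks: flat list of Task objects of the current instance (assumption).
        return all(task.done for task in tasks)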

calculate_tardiness() int

Calculates the tardiness of all jobs (this was previously the calc reward function)

Returns

(int) tardiness of last solution

get_makespan()

Returns the current makespan (the time the latest of all scheduled tasks finishes)
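
Both quantities follow the standard scheduling definitions. A sketch on plain lists of completion times and deadlines; the exact aggregation (e.g., summing tardiness over all jobs) and the attribute names are assumptions of this sketch:

    def get_makespan_sketch(finish_times) -> int:
        # Makespan: completion time of the task that finishes last.
        return max(finish_times)

    def calculate_tardiness_sketch(job_finish_times, job_deadlines) -> int:
        # Tardiness: time by which each job finishes past its deadline
        # (0 if on time), summed over all jobs.
        return sum(max(0, finish - deadline)
                   for finish, deadline in zip(job_finish_times, job_deadlines))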

log_intermediate_step() None

Log Function

Returns

None

close()

This is a relic of the OpenAI Gym API and is currently unnecessary.

seed(seed=1)

This is a relic of the OpenAI Gym API. Currently unnecessary, because the environment is deterministic, so no seed is used.

render(mode='human')

Visualizes the current status of the environment

Parameters

mode – “human”: displays the Gantt chart, “image”: returns an image of the Gantt chart

Returns

PIL.Image.Image if mode=image, else None
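
Usage sketch for the two render modes (env is assumed to exist; saving the image is just one way to use the returned object):

    env.render(mode="human")        # displays the Gantt chart
    img = env.render(mode="image")  # returns a PIL.Image.Image of the Gantt chart
    img.save("schedule.png")        # e.g., persist the rendered schedule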

Tetris_scheduling_indirect_action Environment

class environments.env_tetris_scheduling_indirect_action.IndirectActionEnv(config: dict, data: List[List[Task]])

Bases: Env

Scheduling environment for scheduling optimization according to https://www.sciencedirect.com/science/article/pii/S0952197622001130.

Main differences to the vanilla environment:

  • ACTION: Indirect action mapping

  • REWARD: m-r2 reward (which means we have to train on the same data again and again)

  • OBSERVATION: the observation differs (“normalization” appears to be division by the maximum to [0, 1] in the paper code). Not every part carries over, due to the different interaction logic

  • INTERACTION LOGIC WARNING:

  • original paper: time steps are run through; the agent can take as many actions as it wants per time step, but may not schedule into the past.

  • our adaptation: we still play Tetris, meaning that we schedule whole blocks of work at a time

Parameters
  • config – Dictionary with parameters to specify environment attributes

  • data – Scheduling problem to be solved, i.e., a list of instances

__init__(config: dict, data: List[List[Task]])
step(action: int, **kwargs)

Step Function

Parameters
  • action – Action to be performed on the current state of the environment

  • kwargs – should include “action_mode”, because the interaction patterns of heuristics and the agent differ and need to be processed differently

Returns

Observation, reward, done, infos

reset() ndarray
  • Resets the episode information trackers

  • Updates the number of runs

  • Loads new instance

Returns

First observation by calling the class function self.state_obs

property state_obs: ndarray

Transforms the state (task state and factory state) into a gym observation. Scales the values to the range [0, 1] and applies one-hot encoding. Cf. https://www.sciencedirect.com/science/article/pii/S0952197622001130, section 4.2.1

Returns

Observation

get_action_mask() array

Gets the action mask. In this environment all actions are always treated as valid, because the interaction logic accepts any action. Note that only non-masked algorithms are allowed. The heuristics, however, still need the job mask. 0 -> available, 1 -> not available

Returns

Action mask

get_next_tasks()

Returns the next tasks that can be scheduled

Util Functions

class environments.environment_loader.EnvironmentLoader

Bases: object

Loads the right environment as named in the passed config file. Also checks if the environment is compatible with the chosen algorithm.

classmethod load(config: dict, check_env_agent_compatibility: bool = True, register_gym_env: bool = False, **kwargs) Tuple[Any, str]

Loading function

classmethod check_environment_agent_compatibility(config: dict, env_name: Optional[str] = None, algo_name: Optional[str] = None)

Check if environment and algorithm are compatible. E.g., some environments may depend on action masking.
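
A minimal usage sketch, assuming a config dictionary that names one of the environments documented above; interpreting the returned string as the environment name is an assumption of this sketch:

    from environments.environment_loader import EnvironmentLoader

    env, env_name = EnvironmentLoader.load(config)  # config is assumed to exist
    obs = env.reset()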