Action Space
An action is specified by an action type (e.g., `CLICK_COORDS`) and the necessary fields for that action type (e.g., `coords=[30, 60]`).
Supported Action Types
MiniWoB++ environments support the following action types:
| Name | Description |
|---|---|
| `NONE` | Do nothing for the current step. |
| `MOVE_COORDS` | Move the cursor to the specified coordinates. |
| `CLICK_COORDS` | Click on the specified coordinates. |
| `DBLCLICK_COORDS` | Double-click on the specified coordinates. |
| `MOUSEDOWN_COORDS` | Start dragging on the specified coordinates. |
| `MOUSEUP_COORDS` | Stop dragging on the specified coordinates. |
| `SCROLL_UP_COORDS` | Scroll up on the mouse wheel at the specified coordinates. |
| `SCROLL_DOWN_COORDS` | Scroll down on the mouse wheel at the specified coordinates. |
| `CLICK_ELEMENT` | Click on the specified element using JavaScript. |
| `PRESS_KEY` | Press the specified key or key combination. |
| `TYPE_TEXT` | Type the specified string. |
| `TYPE_FIELD` | Type the value of the specified task field. |
| `FOCUS_ELEMENT_AND_TYPE_TEXT` | Click on the specified element using JavaScript, and then type the specified string. |
| `FOCUS_ELEMENT_AND_TYPE_FIELD` | Click on the specified element using JavaScript, and then type the value of the specified task field. |
Some action types perform similar actions (e.g., `CLICK_COORDS` and `CLICK_ELEMENT`).
A common practice is to specify, in the config, the subset of action types that the agent can use, as described below.
Action Configs
The list of selected action types, along with other configurations, can be customized by passing a `miniwob.action.ActionSpaceConfig` object to the `action_space_config` argument during environment construction.
An `ActionSpaceConfig` object has the following fields:
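| Key | Type | Description |
|---|---|---|
| `action_types` | `Sequence[ActionTypes]` | An ordered sequence of action types to include. |
| `screen_width` | `float` | Screen width. Will be overridden by the environment constructor. |
| `screen_height` | `float` | Screen height. Will be overridden by the environment constructor. |
| `coord_bins` | `tuple[int, int]` | If specified, bin the x and y coordinates to these numbers of bins. Mouse actions will be executed at the middle of the specified partition. |
| `scroll_amount` | `int` | The amount to scroll for scroll actions. |
| `scroll_time` | `int` | Time in milliseconds to wait for scroll action animation. |
| `allowed_keys` | `Sequence[str]` | An ordered sequence of allowed keys and key combinations for the `PRESS_KEY` action. |
| `text_max_len` | `int` | Maximum text length for the `TYPE_TEXT` action. |
| `text_charset` | `str` or `Set[str]` | Character set for the `TYPE_TEXT` action. |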
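To make the coordinate binning concrete, here is a minimal standalone sketch of how a pair of bin indices could be mapped to the middle of its partition. It does not call MiniWoB++: the helper name `bin_to_pixel` is hypothetical, equal-width bins are an assumption, and the 160 x 210 screen size is only an example value.

```python
from typing import Tuple


def bin_to_pixel(
    bin_index: Tuple[int, int],
    coord_bins: Tuple[int, int],
    screen_size: Tuple[float, float],
) -> Tuple[float, float]:
    """Map (x_bin, y_bin) to the pixel at the middle of that partition.

    Hypothetical helper; assumes the screen is split into equal-width bins.
    """
    coords = []
    for index, bins, size in zip(bin_index, coord_bins, screen_size):
        if not 0 <= index < bins:
            raise ValueError(f"bin index {index} is outside [0, {bins})")
        # Bin width is size / bins; add half a bin to reach the middle.
        coords.append((index + 0.5) * size / bins)
    return (coords[0], coords[1])


# Example: a 160 x 210 screen split into 8 x 10 bins (each bin is 20 x 21 pixels).
print(bin_to_pixel((0, 0), (8, 10), (160, 210)))  # (10.0, 10.5)
print(bin_to_pixel((7, 9), (8, 10), (160, 210)))  # (150.0, 199.5)
```

With binning, the agent chooses among a small discrete set of click targets instead of continuous pixel coordinates, at the cost of not being able to hit targets smaller than a bin precisely.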
Presets
The following preset names can be specified in place of the `ActionSpaceConfig` object:
- `"all_supported"`: Select all supported actions, including redundant ones.
- `"shi17"`: The action space from (Shi et al., 2017) World of Bits: An Open-Domain Platform for Web-Based Agents.
- `"liu18"`: The action space from (Liu et al., 2018) Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration.
- `"humphreys22"`: The action space from (Humphreys et al., 2022) A data-driven approach for learning to control computers.
Adding `"_mac_os"` to the preset name changes the key modifiers in `allowed_keys` from Control to Meta.
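The effect of the `"_mac_os"` suffix on a list of key combinations can be sketched as follows. This is an illustration only, not the library's implementation: the function name `control_to_meta` is hypothetical, and it assumes that modifiers appear as two-character prefixes (`C-`, `S-`, `A-`, `M-`) in front of the key.

```python
from typing import List, Sequence

# Modifier prefixes assumed for this sketch.
MODIFIERS = ("C-", "S-", "A-", "M-")


def control_to_meta(allowed_keys: Sequence[str]) -> List[str]:
    """Replace the Control modifier with Meta in each key combination."""
    converted = []
    for combo in allowed_keys:
        prefix, rest = "", combo
        # Peel off modifier prefixes, always leaving at least one character
        # for the key itself (so a plain "C" key is left alone).
        while len(rest) > 2 and rest[:2] in MODIFIERS:
            modifier = rest[:2]
            prefix += "M-" if modifier == "C-" else modifier
            rest = rest[2:]
        converted.append(prefix + rest)
    return converted


print(control_to_meta(["C-a", "C-S-<ArrowLeft>", "<Enter>", "C"]))
# ['M-a', 'M-S-<ArrowLeft>', '<Enter>', 'C']
```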
Key combinations
The `PRESS_KEY` action type issues a key combination via Selenium.
Each key combination in the `allowed_keys` config follows these rules:
- Modifiers are specified using the prefixes `C-` (Control), `S-` (Shift), `A-` (Alt), or `M-` (Meta).
- Printable character keys (`a`, `1`, etc.) are specified directly. Shifted characters (`A`, `!`, etc.) are equivalent to `S-` plus the non-shifted counterpart.
- Special keys are enclosed in `<…>`. The list of valid names is specified in `miniwob.constants.WEBDRIVER_SPECIAL_KEYS`.
Examples of valid key combinations: `"7"`, `"<Enter>"`, `"C-S-<ArrowLeft>"`.
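The rules above can be sketched as a small standalone parser. This is not the library's parser: the function name `parse_key_combination` is hypothetical, special key names are not checked against `WEBDRIVER_SPECIAL_KEYS`, and the shifted-character rule is handled only for uppercase letters, since the mapping for symbols such as `!` depends on the keyboard layout.

```python
from typing import List, Tuple

# Prefix-to-modifier mapping taken from the rules above.
MODIFIER_PREFIXES = {"C-": "Control", "S-": "Shift", "A-": "Alt", "M-": "Meta"}


def parse_key_combination(combo: str) -> Tuple[List[str], str]:
    """Split a key combination into (modifiers, key)."""
    modifiers: List[str] = []
    rest = combo
    # Peel off modifier prefixes, leaving at least one character for the key.
    while len(rest) > 2 and rest[:2] in MODIFIER_PREFIXES:
        modifiers.append(MODIFIER_PREFIXES[rest[:2]])
        rest = rest[2:]
    if len(rest) > 2 and rest.startswith("<") and rest.endswith(">"):
        key = rest[1:-1]  # special key such as Enter or ArrowLeft
    elif len(rest) == 1:
        key = rest
        # An uppercase letter is treated as Shift plus the lowercase letter.
        if key.isalpha() and key.isupper():
            if "Shift" not in modifiers:
                modifiers.append("Shift")
            key = key.lower()
    else:
        raise ValueError(f"invalid key combination: {combo!r}")
    return modifiers, key


print(parse_key_combination("7"))                # ([], '7')
print(parse_key_combination("<Enter>"))          # ([], 'Enter')
print(parse_key_combination("C-S-<ArrowLeft>"))  # (['Control', 'Shift'], 'ArrowLeft')
print(parse_key_combination("A"))                # (['Shift'], 'a')
```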
Action Object
The action passed to the `step` method should be a `dict` whose fields depend on the action types selected in the config:
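| Key | Type | Description | Inclusion |
|---|---|---|---|
| `action_type` | `int` | Action type index from the `action_types` list in the config. | Always. |
| `coords` | `np.ndarray` | Left and top coordinates. Depending on the `coord_bins` config, these are either absolute coordinates or bin indices. | When any coordinate-based action type (e.g., `CLICK_COORDS`) is selected. |
| `ref` | `int` | Element `ref` ID. | When any element-based action type (e.g., `CLICK_ELEMENT`) is selected. |
| `key` | `int` | Key index from the `allowed_keys` list in the config. | When the `PRESS_KEY` action type is selected. |
| `text` | `str` | Text to type. | When any text-typing action type (e.g., `TYPE_TEXT`) is selected. |
| `field` | `int` | Index from the task field list. | When any field-typing action type (e.g., `TYPE_FIELD`) is selected. |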
For instance, if the config only contains the action types `CLICK_COORDS` and `PRESS_KEY`, the action object can be:
```python
import numpy as np

action = {
    "action_type": 0,  # Index of CLICK_COORDS in the configured action types
    "coords": np.array([100, 50]),
    "key": 0,  # Ignored by the CLICK_COORDS action
}
```
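As a rough illustration of how the set of keys in the action object follows from the selected action types, the sketch below derives the expected keys from action type names. The helper `required_action_keys` is hypothetical and is not part of MiniWoB++; the name-based rules (suffixes such as `_COORDS`, `_TEXT`, and `_FIELD`) are an assumption used for illustration, not the library's actual logic.

```python
from typing import List, Sequence


def required_action_keys(action_types: Sequence[str]) -> List[str]:
    """Return the sorted keys an action dict would need (illustrative rules)."""
    keys = {"action_type"}  # always included
    for name in action_types:
        if name.endswith("_COORDS"):
            keys.add("coords")
        if "ELEMENT" in name:
            keys.add("ref")
        if name == "PRESS_KEY":
            keys.add("key")
        if name.endswith("_TEXT"):
            keys.add("text")
        if name.endswith("_FIELD"):
            keys.add("field")
    return sorted(keys)


selected = ["CLICK_COORDS", "PRESS_KEY"]
print(required_action_keys(selected))  # ['action_type', 'coords', 'key']

# The action_type value is the position of the action type in the selected list.
print(selected.index("PRESS_KEY"))  # 1
```

Under these assumptions, every key is present in every action even when the chosen action type ignores it, which matches the `"key": 0` entry in the `CLICK_COORDS` example above.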