In the scenario as you present it, reinforcement learning (RL) should work, but you may gain very little by applying it instead of simpler search algorithms.

As you explain in comments, each application is trained separately, the task in each case is arbitrary, and there is no way for the agent to apply contextual knowledge to identify likely actions. The only feedback will be task completion. There are no heuristics available, only a list of available actions on each step.

In terms of RL problem definition, this is very similar to solving a maze in the fewest steps. The problem is episodic (it terminates), so setting a cost per action (a negative reward, e.g. $r = -1$ on each step) is the simplest way for the agent to get feedback on how it is doing. The highest total reward will then correspond to the shortest path.
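
To spell out why that works, assuming no discounting ($\gamma = 1$, which is my assumption and not something stated in the question): an episode that reaches the goal after $T$ actions has total reward

$$G = \sum_{t=1}^{T} r_t = \sum_{t=1}^{T} (-1) = -T$$

so maximising $G$ is the same as minimising the number of actions $T$.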

Simple maze solvers are often used as toy problems in RL tutorials. You could use pretty much any example of a gridworld maze from such a tutorial as a starting point to make an agent for your problem, and replace the up/right/left/down actions with your more abstract list of text links or menu options. The state representation will depend on application behaviour, but could be as simple as the current list of available actions.
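
As a rough sketch of what that might look like, here is tabular Q-learning where the gridworld moves are replaced by text labels. Everything named here is hypothetical: `TOY_APP` is a made-up stand-in for your application (each state maps action labels to the next state), and `q_learning` and `greedy_path` are illustrative helpers, not part of any library. The hyperparameters are arbitrary choices, and I use named states for readability where a real agent might use the list of available actions.

```python
import random
from collections import defaultdict

# Hypothetical toy application: state -> {action label: next state}.
TOY_APP = {
    "home": {"File": "file_menu", "Help": "help_page", "Settings": "settings"},
    "file_menu": {"Export": "export_dialog", "Back": "home"},
    "help_page": {"Back": "home"},
    "settings": {"Advanced": "advanced", "Back": "home"},
    "advanced": {"Export": "export_dialog", "Back": "settings"},
    "export_dialog": {"Confirm": "done", "Cancel": "home"},
    "done": {},
}


def q_learning(app, start, goal, episodes=500, alpha=0.5, gamma=1.0,
               epsilon=0.2, max_steps=50, seed=0):
    """Tabular Q-learning with a reward of -1 per action."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated return, starts at 0
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            if state == goal or not app[state]:
                break  # episode ends at the goal (or at a dead end)
            actions = sorted(app[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state = app[state][action]
            # Value of the best follow-up action; 0 if no actions remain.
            best_next = max((q[(next_state, a)] for a in app[next_state]),
                            default=0.0)
            target = -1 + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_path(app, q, start, goal, max_steps=50):
    """Follow the highest-valued action from each state."""
    path, state = [], start
    while state != goal and app[state] and len(path) < max_steps:
        action = max(sorted(app[state]), key=lambda a: q[(state, a)])
        path.append(action)
        state = app[state][action]
    return path


q_table = q_learning(TOY_APP, "home", "done")
print(greedy_path(TOY_APP, q_table, "home", "done"))
```

Starting the Q-values at zero is optimistic here, because every true value is negative, and that pushes the agent to try untested actions early on.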

For this final part, I am assuming the following things, implied by the question and comments:

  • The application behaves completely deterministically.
  • The entry point to the application is always the same.
  • The output goal is a single, shortest-possible list of simple actions (labels to select from) that reach the goal from the start.

If all the above are true, then RL is effectively a random search for the goal state, followed by further random iterations to remove unnecessary steps and find the shortest path. In that case, other tree-searching algorithms may be simpler to implement and perform better than RL. You could try depth-first search (DFS) and/or breadth-first search (BFS), perhaps with loop detection/avoidance. These avoid the randomness inherent in an RL-based search, and are likely to be far more efficient because of that.
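
For comparison, a minimal BFS sketch with loop avoidance, under the three assumptions listed above. `TOY_APP` is a hypothetical stand-in for the application (each state maps action labels to the next state) and `shortest_action_path` is an illustrative name. Because BFS explores states in order of distance from the start, the first path that reaches the goal is a shortest one.

```python
from collections import deque

# Hypothetical toy application: state -> {action label: next state}.
TOY_APP = {
    "home": {"File": "file_menu", "Help": "help_page", "Settings": "settings"},
    "file_menu": {"Export": "export_dialog", "Back": "home"},
    "help_page": {"Back": "home"},
    "settings": {"Advanced": "advanced", "Back": "home"},
    "advanced": {"Export": "export_dialog", "Back": "settings"},
    "export_dialog": {"Confirm": "done", "Cancel": "home"},
    "done": {},
}


def shortest_action_path(app, start, goal):
    """Breadth-first search returning a shortest list of action labels."""
    queue = deque([(start, [])])  # (state, actions taken to reach it)
    visited = {start}             # loop avoidance: expand each state once
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, next_state in app[state].items():
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, path + [action]))
    return None  # goal is unreachable from start


print(shortest_action_path(TOY_APP, "home", "done"))
# ['File', 'Export', 'Confirm']
```

Against a live application you cannot usually jump straight to an arbitrary state, so each expansion would mean restarting from the entry point and replaying the stored path. That relies on the deterministic behaviour and fixed entry point assumed above.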