Supervisor: Sommer

This paper extends the Agentic Policy Search (APS) framework proposed by Sommer et al. in "Adaptive Self-Improvement for Smarter Energy Systems using Agentic Policy Search" [1]. The main objective is to guide the policy search process in a more
interpretable and structured manner, particularly in highly stochastic environments.
While agentic policy search enables flexible and adaptive policy generation, it often
lacks transparency and systematic guidance, making it difficult to understand,
compare, and reliably improve discovered policies. This limitation becomes especially
pronounced in volatile domains such as energy systems, where uncertainty,
high variability, and the need to transfer policies across environments call for robust and explainable
decision-making strategies. Addressing these challenges motivates a structured
search over policy classes, which enables a more principled exploration of
the policy space while improving interpretability and robustness.
To this end, a hierarchical search over the policy classes defined by
Warren Powell [2] is introduced. At the class level, a Knowledge Gradient (KG)
strategy is employed to efficiently allocate search effort across different policy classes.
Within each class, candidate policies are optimized using the Huxley–Gödel Machine
(HGM) [3], enabling the identification of strong representative policies. In addition,
the simulation environment is extended and reworked to improve the robustness of
generated policies and to facilitate transfer to a modified CityLearn environment
[4]. A dedicated parameter tuning phase is incorporated to further enhance policy
performance, and a memory system is introduced to retain and reuse relevant
information across the search process.
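
The abstract does not specify how the KG strategy represents its beliefs about each policy class. As an illustration only, the sketch below assumes the textbook KG policy for independent normal beliefs with known measurement noise: each class carries a Gaussian belief about the score of its best policy, the class whose next evaluation has the largest expected improvement in the best posterior mean is evaluated next, and its belief is then updated with a conjugate normal update. The class names follow Powell's four policy classes (PFA, CFA, VFA, DLA); `ClassBelief`, `knowledge_gradient`, `select_class`, `update`, the noise variance, and the simulated "true" scores are hypothetical and are not taken from the actual implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ClassBelief:
    """Hypothetical Gaussian belief about the best-policy score of one class."""
    name: str
    mean: float  # posterior mean of the class score
    var: float   # posterior variance of that mean


def _norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(beliefs: list[ClassBelief], noise_var: float) -> dict[str, float]:
    """KG value of one more evaluation per class (independent normal beliefs).

    KG_x = s_x * f(zeta_x), f(z) = z * Phi(z) + phi(z),
    zeta_x = -|mu_x - max_{x' != x} mu_x'| / s_x,
    s_x = var_x / sqrt(var_x + noise_var).
    """
    if len(beliefs) < 2:
        raise ValueError("KG needs at least two classes to compare")
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    kg = {}
    for i, b in enumerate(beliefs):
        best_other = max(o.mean for j, o in enumerate(beliefs) if j != i)
        # Std. dev. of the change in the posterior mean after one more evaluation.
        s = b.var / math.sqrt(b.var + noise_var)
        if s == 0.0:
            kg[b.name] = 0.0  # belief already certain: nothing left to learn
            continue
        zeta = -abs(b.mean - best_other) / s
        kg[b.name] = s * (zeta * _norm_cdf(zeta) + _norm_pdf(zeta))
    return kg


def select_class(beliefs: list[ClassBelief], noise_var: float) -> str:
    """Return the name of the class with the largest KG value."""
    kg = knowledge_gradient(beliefs, noise_var)
    return max(kg, key=kg.get)


def update(belief: ClassBelief, observation: float, noise_var: float) -> ClassBelief:
    """Conjugate normal update after observing one noisy class score."""
    if belief.var == 0.0:
        return belief  # a certain belief does not move
    precision = 1.0 / belief.var + 1.0 / noise_var
    new_var = 1.0 / precision
    new_mean = new_var * (belief.mean / belief.var + observation / noise_var)
    return ClassBelief(belief.name, new_mean, new_var)


if __name__ == "__main__":
    NOISE_VAR = 0.05  # hypothetical evaluation noise
    beliefs = [ClassBelief(n, 0.5, 0.10) for n in ("PFA", "CFA", "VFA", "DLA")]
    # Hypothetical ground truth; in practice the score would come from running
    # the within-class search (e.g. HGM) and evaluating its best policy.
    true_scores = {"PFA": 0.45, "CFA": 0.70, "VFA": 0.40, "DLA": 0.60}
    rng = random.Random(0)
    for step in range(8):
        name = select_class(beliefs, NOISE_VAR)
        idx = next(i for i, b in enumerate(beliefs) if b.name == name)
        score = rng.gauss(true_scores[name], math.sqrt(NOISE_VAR))
        beliefs[idx] = update(beliefs[idx], score, NOISE_VAR)
        print(f"step {step}: evaluated {name}, posterior mean {beliefs[idx].mean:.3f}")
```

Because KG values an evaluation by how likely it is to change which class currently looks best, a class whose belief is already tight, or whose mean lies far behind the leader, receives little further effort. This is one way to realise the efficient allocation of search effort across classes described above.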
Experimental results indicate that appropriately chosen policy classes yield
promising performance in highly stochastic settings while providing improved interpretability
of the underlying decision structure. However, the stochastic nature
of large language models can lead to weak policies even within suitable classes.
Moreover, since the proposed approach requires exploration across multiple policy
classes, several iterations may be necessary before an effective policy is identified,
which increases the computational effort.

Room 04.137, Martensstr. 3, Erlangen
or
Join the Zoom meeting:
https://fau.zoom-x.de/j/68350702053?pwd=UkF3aXY0QUdjeSsyR0tyRWtLQ0hYUT09
Meeting ID: 683 5070 2053
Passcode: 647333