Quantifying capability gaps via information relaxation and deep reinforcement learning in infinite-horizon Markov decision processes: A military air battle management application.

Bibliographic Details
Title: Quantifying capability gaps via information relaxation and deep reinforcement learning in infinite-horizon Markov decision processes: A military air battle management application.
Authors: Liles IV, Joseph M. (joseph.liles@us.af.mil); Robbins, Matthew J.; Lunday, Brian J.
Source: Journal of the Operational Research Society. May 2026, Vol. 77, Issue 5, pp. 1322-1337 (16 pages).
Subjects: Markov processes, Air warfare, Stochastic control theory, Reinforcement learning, Mathematical optimization
Abstract: This paper presents a novel application of information relaxation techniques to quantify upper bounds on solution quality in a complex, stochastic, and dynamic assignment problem in military air battle management. Information relaxation refers to relaxing the non-anticipativity constraints in a sequential decision-making problem that require a decision-maker to act only on currently available information. We introduce a temporal event horizon, an adjustable window into future stochastic outcomes, to explore the marginal value of information in shaping decision policies. Whereas previous work has investigated information relaxation with regard to problems that can be solved more easily under a deterministic relaxation, we demonstrate a methodology for applying the approach to a continuous-time, continuous-space problem that remains computationally challenging even after relaxation. We formulate the problem as a discounted, infinite-horizon Markov decision process and solve it by employing a deep neural network-based approximate policy iteration algorithm in concert with several designed computational experiments. We demonstrate how a multidimensional sensitivity analysis of the event horizon and other problem features helps quantify potential improvements to decision policy effectiveness resulting from either a change to tactics or a modification to capabilities. Our findings provide a methodology for objective, data-driven insights that can augment traditionally subjective capability gap analysis to guide decision-making and establish more effective requirements for acquisition programs. [ABSTRACT FROM AUTHOR]
Copyright of Journal of the Operational Research Society is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Description
ISSN: 0160-5682
DOI: 10.1080/01605682.2025.2528915