
Title: Evaluating the Stability of Non-Adaptive Trading in Continuous Double Auctions
The continuous double auction (CDA) is the predominant mechanism in modern securities markets. Many agent-based analyses of CDA environments rely on simple non-adaptive trading strategies like Zero Intelligence (ZI), which (as their name suggests) are quite limited. We examine the viability of this reliance through empirical game-theoretic analysis in a plausible market environment. Specifically, we evaluate the strategic stability of equilibria defined over a small set of ZI traders with respect to strategies found by reinforcement learning (RL) applied over a much larger policy space. RL can indeed find beneficial deviations from equilibria of ZI traders, by conditioning on signals of the likelihood a trade will execute or the favorability of the current bid and ask. Nevertheless, the surplus earned by well-calibrated ZI policies is empirically observed to be nearly as great as what the adaptive strategies can earn, despite their much more expressive policy space. Our findings generally support the use of equilibrated ZI traders in CDA studies.
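For readers unfamiliar with Zero Intelligence trading, the flavor of policy at issue can be sketched in a few lines: a ZI trader quotes by shading uniformly at random off its private value and ignores all market state. The function and parameter names below are illustrative, not the paper's exact parameterization.

```python
import random

def zi_limit_order(private_value: float, is_buy: bool, max_shade: float = 10.0) -> float:
    """Quote a limit price by shading uniformly at random off the private value.

    A buyer bids at or below its value and a seller asks at or above it,
    so any execution yields the agent non-negative surplus. The strategy
    ignores the order book entirely -- hence "zero intelligence".
    """
    shade = random.uniform(0.0, max_shade)
    return private_value - shade if is_buy else private_value + shade
```

The adaptive RL strategies studied in the paper differ precisely in conditioning on market state (e.g., execution likelihood or the current bid and ask), which a policy like this never observes.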
Authors:
Award ID(s):
1741190
Publication Date:
NSF-PAR ID:
10105519
Journal Name:
17th International Conference on Autonomous Agents and MultiAgent Systems
Sponsoring Org:
National Science Foundation
More Like this
  1. We present an agent-based model of manipulating prices in financial markets through spoofing: submitting spurious orders to mislead traders who learn from the order book. Our model captures a complex market environment for a single security, whose common value is given by a dynamic fundamental time series. Agents trade through a limit-order book, based on their private values and noisy observations of the fundamental. We consider background agents following two types of trading strategies: the non-spoofable zero intelligence (ZI) that ignores the order book and the manipulable heuristic belief learning (HBL) that exploits the order book to predict price outcomes. We conduct empirical game-theoretic analysis upon simulated agent payoffs across parametrically different environments and measure the effect of spoofing on market performance in approximate strategic equilibria. We demonstrate that HBL traders can benefit price discovery and social welfare, but their existence in equilibrium renders a market vulnerable to manipulation: simple spoofing strategies can effectively mislead traders, distort prices and reduce total surplus. Based on this model, we propose to mitigate spoofing from two aspects: (1) mechanism design to disincentivize manipulation; and (2) trading strategy variations to improve the robustness of learning from market information. We evaluate the proposed approaches, taking into account potential strategic responses of agents, and characterize the conditions under which these approaches may deter manipulation and benefit market welfare. Our model provides a way to quantify the effect of spoofing on trading behavior and market efficiency, and thus it can help to evaluate the effectiveness of various market designs and trading strategies in mitigating an important form of market manipulation.
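To make concrete how spurious orders can mislead traders who learn from the order book, consider a generic order-book imbalance signal of the kind a book-reading learner might consume. The function and book representation below are illustrative, not the paper's HBL belief model.

```python
def book_imbalance(bids, asks):
    """Signed volume imbalance in [-1, 1]; positive values suggest buy pressure.

    bids/asks are lists of (price, quantity) levels. A generic, illustrative
    signal, not the HBL belief function from the paper.
    """
    bid_vol = sum(qty for _, qty in bids)
    ask_vol = sum(qty for _, qty in asks)
    total = bid_vol + ask_vol
    return (bid_vol - ask_vol) / total if total else 0.0

# A balanced book reads neutral; one large spurious bid deep in the book
# swings the signal sharply positive even though it is never meant to trade.
balanced = book_imbalance([(99, 10)], [(101, 10)])
spoofed = book_imbalance([(99, 10), (95, 50)], [(101, 10)])
```

A strategy that conditions on such a signal inherits its vulnerability, which is the lever the spoofer pulls; a ZI trader that never reads the book is immune by construction.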
  2. We study learning-based trading strategies in markets where prices can be manipulated through spoofing: the practice of submitting spurious orders to mislead traders who use market information. To reduce the vulnerability of learning traders to such manipulation, we propose two variations based on the standard heuristic belief learning (HBL) trading strategy, which learns transaction probabilities from market activities observed in an order book. The first variation selectively ignores orders at certain price levels, particularly where spoof orders are likely to be placed. The second considers the full order book, but adjusts its limit order price to correct for bias in decisions based on the learned heuristic beliefs. We employ agent-based simulation to evaluate these variations on two criteria: effectiveness in non-manipulated markets and robustness against manipulation. Background traders can adopt (non-learning) zero intelligence strategies or HBL, in its basic form or the two variations. We conduct empirical game-theoretic analysis upon simulated payoffs to derive approximate strategic equilibria, and compare equilibrium outcomes across a variety of trading environments. Results show that agents can strategically make use of the option to block orders to improve robustness against spoofing, while retaining a comparable competitiveness in non-manipulated markets. Our second HBL variation exhibits a general improvement over standard HBL, in markets with and without manipulation. Further explorations suggest that traders can enjoy both improved profitability and robustness by combining the two proposed variations.
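The first variation's idea (ignoring price levels where spoof orders tend to sit) can be sketched as a simple book filter applied before beliefs are estimated. The function and threshold below are hypothetical, in the spirit of the variation rather than its exact rule.

```python
def prune_bids(bids, keep_within: float):
    """Keep only bid levels within keep_within of the best bid.

    Deep levels behind the best bid -- where spoof orders are typically
    parked, since they are unlikely to execute -- are ignored before the
    learner estimates transaction probabilities. A hypothetical filter,
    not the paper's exact blocking rule.
    """
    best = max(price for price, _ in bids)
    return [(price, qty) for price, qty in bids if best - price <= keep_within]
```

The trade-off the paper evaluates is visible even here: a tighter threshold discards more potential spoof volume but also more genuine information from deep levels.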
  4. Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games. Experiment code is available at https://github.com/indylab/nxdo.
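The restricted-game/best-response loop that PSRO and XDO build on can be illustrated on a tiny matrix game. Below is a toy double-oracle loop on rock-paper-scissors, using fictitious play as a cheap stand-in for an exact restricted-game solve; all names are illustrative and none of this is the XDO algorithm itself, which mixes best responses per infostate in extensive-form games.

```python
# Toy double-oracle loop on rock-paper-scissors (row player's payoffs).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
N = 3

def solve_restricted(rows, cols, iters=2000):
    """Approximate equilibrium of the restricted game via fictitious play
    (a simple stand-in for an exact linear-programming solve)."""
    rcount = [0] * len(rows)
    ccount = [0] * len(cols)
    rcount[0] = ccount[0] = 1
    for _ in range(iters):
        br_r = max(range(len(rows)),
                   key=lambda i: sum(A[rows[i]][cols[j]] * ccount[j]
                                     for j in range(len(cols))))
        br_c = min(range(len(cols)),
                   key=lambda j: sum(A[rows[i]][cols[j]] * rcount[i]
                                     for i in range(len(rows))))
        rcount[br_r] += 1
        ccount[br_c] += 1
    return ([c / sum(rcount) for c in rcount],
            [c / sum(ccount) for c in ccount])

def double_oracle():
    rows, cols = [0], [0]  # start with a single pure strategy each
    while True:
        rmix, cmix = solve_restricted(rows, cols)
        full_r, full_c = [0.0] * N, [0.0] * N
        for i, p in zip(rows, rmix):
            full_r[i] += p
        for j, p in zip(cols, cmix):
            full_c[j] += p
        # Best responses in the FULL game against the restricted equilibrium.
        br_row = max(range(N), key=lambda i: sum(A[i][j] * full_c[j] for j in range(N)))
        br_col = min(range(N), key=lambda j: sum(A[i][j] * full_r[i] for i in range(N)))
        grew = False
        if br_row not in rows:
            rows.append(br_row); grew = True
        if br_col not in cols:
            cols.append(br_col); grew = True
        if not grew:  # no beneficial deviation outside the restricted sets
            return rows, cols
```

Starting from rock alone, the loop discovers paper, then scissors, and stops once no strategy outside the restricted sets improves on the restricted equilibrium.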
  5. Given the aging infrastructure and the anticipated growing number of highway work zones in the U.S.A., it is important to investigate work zone merge control, which is critical for improving work zone safety and capacity. This paper proposes and evaluates a novel highway work zone merge control strategy based on cooperative driving behavior enabled by artificial intelligence. The proposed method assumes that all vehicles are fully automated, connected, and cooperative. It inserts two metering zones in the open lane to make space for merging vehicles in the closed lane. In addition, each vehicle in the closed lane learns how to adjust its longitudinal position optimally to find a safe gap in the open lane using an off-policy soft actor critic reinforcement learning (RL) algorithm, considering its surrounding traffic conditions. The learning results are captured in convolutional neural networks and used to control individual vehicles in the testing phase. By adding the metering zones and taking the locations, speeds, and accelerations of surrounding vehicles into account, cooperation among vehicles is implicitly considered. This RL-based model is trained and evaluated using a microscopic traffic simulator. The results show that this cooperative RL-based merge control significantly outperforms popular strategies such as late merge and early merge in terms of both mobility and safety measures. It also performs better than a strategy assuming all vehicles are equipped with cooperative adaptive cruise control.
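The gap the learned policy must find can be made concrete with a crude positional check: a merge gap is acceptable when the open-lane leader and follower bracket the ego vehicle with enough headway on both sides. This is a toy rule-based stand-in with hypothetical names and thresholds, not the paper's soft actor-critic controller, which additionally conditions on speeds and accelerations.

```python
def safe_gap(ego_pos: float, lead_pos: float, follow_pos: float,
             min_headway: float = 8.0) -> bool:
    """Crude positional check for a merge gap in the open lane.

    ego_pos is the merging vehicle's longitudinal position in the closed
    lane; lead_pos/follow_pos bracket the candidate gap in the open lane.
    All positions in meters. A toy stand-in for the learned merge policy.
    """
    return (lead_pos - follow_pos >= 2 * min_headway
            and lead_pos - ego_pos >= min_headway
            and ego_pos - follow_pos >= min_headway)
```

The RL approach in the paper effectively learns when and how to reposition so that a check like this becomes satisfiable, rather than applying a fixed rule.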