We present an agent-based model of price manipulation in financial markets through spoofing: submitting spurious orders to mislead traders who learn from the order book. Our model captures a complex market environment for a single security, whose common value is given by a dynamic fundamental time series. Agents trade through a limit-order book, based on their private values and noisy observations of the fundamental. We consider background agents following two types of trading strategies: non-spoofable zero intelligence (ZI), which ignores the order book, and manipulable heuristic belief learning (HBL), which exploits the order book to predict price outcomes. We conduct empirical game-theoretic analysis on simulated agent payoffs across parametrically different environments and measure the effect of spoofing on market performance in approximate strategic equilibria. We demonstrate that HBL traders can benefit price discovery and social welfare, but their existence in equilibrium renders a market vulnerable to manipulation: simple spoofing strategies can effectively mislead traders, distort prices, and reduce total surplus. Based on this model, we propose two approaches to mitigating spoofing: (1) mechanism design to disincentivize manipulation; and (2) trading strategy variations to improve the robustness of learning from market information. We evaluate the proposed approaches, taking into account potential strategic responses of agents, and characterize the conditions under which they may deter manipulation and benefit market welfare. Our model provides a way to quantify the effect of spoofing on trading behavior and market efficiency, and thus can help evaluate the effectiveness of various market designs and trading strategies in mitigating an important form of market manipulation.
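To make the manipulation concrete, here is a minimal sketch of the kind of spoofing tactic described above: a large spurious buy order kept one tick behind the best bid to inflate apparent demand, cancelled and replaced whenever the quote moves so it is never executed. The class, message format, and parameter values are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Order:
    side: str    # "buy" or "sell"
    price: int   # limit price in ticks
    size: int    # order volume

class SimpleSpoofer:
    """Illustrative spoofing tactic (hypothetical parameters): rest a
    large buy order one tick behind the best bid to signal demand, and
    cancel/replace it whenever the quote moves so it never trades."""

    def __init__(self, size=100, offset=1):
        self.size = size      # spurious volume, chosen to be conspicuous
        self.offset = offset  # distance (in ticks) behind the best bid
        self.live = None      # the currently resting spoof order, if any

    def act(self, best_bid):
        """Return (action, order) messages that keep the spoof in place."""
        target = best_bid - self.offset
        if self.live is None:
            self.live = Order("buy", target, self.size)
            return [("submit", self.live)]
        if self.live.price != target:
            stale = self.live  # cancel before the market can reach it
            self.live = Order("buy", target, self.size)
            return [("cancel", stale), ("submit", self.live)]
        return []
```

An HBL trader reading the book sees this resting volume as genuine buying interest, which biases its learned transaction probabilities; a ZI trader, ignoring the book, is unaffected.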
Evaluating the Stability of Non-Adaptive Trading in Continuous Double Auctions
            The continuous double auction (CDA) is the predominant mechanism in modern securities markets. Many agent-based analyses of CDA environments rely on simple non-adaptive trading strategies like Zero Intelligence (ZI), which (as their name suggests) are quite limited. We examine the viability of this reliance through empirical game-theoretic analysis in a plausible market environment. Specifically, we evaluate the strategic stability of equilibria defined over a small set of ZI traders with respect to strategies found by reinforcement learning (RL) applied over a much larger policy space. RL can indeed find beneficial deviations from equilibria of ZI traders, by conditioning on signals of the likelihood a trade will execute or the favorability of the current bid and ask. Nevertheless, the surplus earned by well-calibrated ZI policies is empirically observed to be nearly as great as what the adaptive strategies can earn, despite their much more expressive policy space. Our findings generally support the use of equilibrated ZI traders in CDA studies. 
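For reference, a ZI quote in the spirit of Gode and Sunder's zero-intelligence-constrained traders can be sketched in a few lines: the limit price is drawn uniformly at random on the profitable side of the trader's valuation, with no conditioning on the order book. The price bounds and uniform draw are illustrative assumptions; the "well-calibrated ZI policies" of the study would additionally tune parameters such as shading ranges.

```python
import random

def zi_limit_price(side, valuation, price_range=(0.0, 200.0)):
    """Zero-intelligence quote (Gode-Sunder ZI-C style, simplified):
    a limit price drawn uniformly at random on the profitable side of
    the trader's private valuation, ignoring the order book entirely.
    The price bounds here are arbitrary illustrative values."""
    lo, hi = price_range
    if side == "buy":
        return random.uniform(lo, valuation)   # never bid above own value
    return random.uniform(valuation, hi)       # never ask below own value
```

The adaptive RL strategies in the study deviate from this by conditioning on order-book signals, e.g. the likelihood a trade will execute or the favorability of the current bid and ask.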
- Award ID(s): 1741190
- PAR ID: 10105519
- Date Published:
- Journal Name: 17th International Conference on Autonomous Agents and MultiAgent Systems
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments the problem remains understudied, despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a sample-efficient model-based algorithm (DR-NVI) with finite-sample complexity guarantees for learning robust variants of various notions of game-theoretic equilibria. We also establish an information-theoretic lower bound for solving RMGs, which confirms the near-optimal sample complexity of DR-NVI with respect to problem-dependent factors such as the size of the state space, the target accuracy, and the horizon length.
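As a worked illustration of the robustness notion, the following single-agent sketch computes a distributionally robust Bellman backup over a total-variation uncertainty set around nominal transition estimates. This is a deliberate simplification: the choice of uncertainty set, the single-agent reduction, and all names here are assumptions for exposition, not the paper's multi-agent DR-NVI algorithm.

```python
import numpy as np

def tv_robust_expectation(p, v, rho):
    """Worst-case expected value of v over all distributions within
    total-variation distance rho of the nominal distribution p:
    shift up to rho probability mass from the highest-value outcomes
    onto the lowest-value outcome."""
    q = p.astype(float).copy()
    worst = int(np.argmin(v))
    budget = rho
    for s in np.argsort(v)[::-1]:        # outcomes, best first
        if s == worst or budget <= 0:
            break
        shift = min(q[s], budget)
        q[s] -= shift
        q[worst] += shift
        budget -= shift
    return float(q @ v)

def robust_value_iteration(P, R, gamma=0.95, rho=0.1, iters=500):
    """Robust Bellman iteration for a single agent: P is an (S, A, S)
    array of nominal transition probabilities, R an (S, A) reward
    array; each (s, a) pair has its own TV uncertainty set."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[R[s, a] + gamma * tv_robust_expectation(P[s, a], V, rho)
                       for a in range(A)] for s in range(S)])
        V = Q.max(axis=1)
    return V
```

In the multi-agent RMG setting, each agent performs this worst-case evaluation against its own uncertainty set while the other agents' strategies are held at the candidate equilibrium.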
- We study learning-based trading strategies in markets where prices can be manipulated through spoofing: the practice of submitting spurious orders to mislead traders who use market information. To reduce the vulnerability of learning traders to such manipulation, we propose two variations based on the standard heuristic belief learning (HBL) trading strategy, which learns transaction probabilities from market activities observed in an order book. The first variation selectively ignores orders at certain price levels, particularly where spoof orders are likely to be placed. The second considers the full order book, but adjusts its limit order price to correct for bias in decisions based on the learned heuristic beliefs. We employ agent-based simulation to evaluate these variations on two criteria: effectiveness in non-manipulated markets and robustness against manipulation. Background traders can adopt (non-learning) zero intelligence strategies or HBL, in its basic form or the two variations. We conduct empirical game-theoretic analysis upon simulated payoffs to derive approximate strategic equilibria, and compare equilibrium outcomes across a variety of trading environments. Results show that agents can strategically make use of the option to block orders to improve robustness against spoofing, while retaining a comparable competitiveness in non-manipulated markets. Our second HBL variation exhibits a general improvement over standard HBL, in markets with and without manipulation. Further explorations suggest that traders can enjoy both improved profitability and robustness by combining the two proposed variations.
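The belief component that both variations modify can be illustrated with a simplified Gjerstad-Dickhaut-style estimate, the family HBL builds on: the probability that a bid at price p transacts is estimated from counts of observed accepted bids, asks, and rejected bids. The counting rule and variable names below are a textbook simplification under stated assumptions, not the exact formulation evaluated in the paper.

```python
def buy_transact_belief(p, accepted_bids, asks, rejected_bids):
    """Heuristic belief that a bid at price p will transact, estimated
    from order-book history (simplified Gjerstad-Dickhaut form):
    orders at or below p that succeeded count as evidence for success,
    bids at or above p that went unfilled count as evidence against."""
    tbl = sum(1 for b in accepted_bids if b <= p)  # transacted bids <= p
    al = sum(1 for a in asks if a <= p)            # asks <= p (matchable)
    rbg = sum(1 for b in rejected_bids if b >= p)  # rejected bids >= p
    denom = tbl + al + rbg
    return (tbl + al) / denom if denom else 0.0
```

In terms of this sketch, the first variation would simply exclude orders at designated spoof-prone price levels from these counts; the second keeps all counts but corrects the resulting limit price for the bias that spoof orders induce.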
- The effectiveness of Intelligent Tutoring Systems (ITSs) often depends upon their pedagogical strategies, the policies used to decide what action to take next in the face of alternatives. We induce policies based on two general Reinforcement Learning (RL) frameworks, POMDP and MDP, given the limited feature space. We conduct an empirical study where the RL-induced policies are compared against a random yet reasonable policy. Results show that when the contents are controlled to be equal, the MDP-based policy can improve students' learning significantly more than the random baseline, while the POMDP-based policy cannot outperform the latter. The possible reason is that the features selected for the MDP framework may not be the optimal feature space for POMDP.
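As a sketch of what MDP-based policy induction involves here, the following tabular value iteration computes a pedagogical action per state from estimated transition and reward models (e.g., transitions between discretized student-knowledge states, rewards from learning gains). The shapes, reward interpretation, and hyperparameters are illustrative assumptions; the paper's feature space and its POMDP counterpart are not reproduced.

```python
import numpy as np

def induce_mdp_policy(P, R, gamma=0.95, tol=1e-8):
    """Tabular value iteration: P is an (S, A, S) array of transition
    probabilities estimated from logged student data, R an (S, A)
    array of rewards (e.g., measured learning gains). Returns the
    greedy pedagogical action for each state."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (P @ V)          # (S, A) action values
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return Q.argmax(axis=1)              # policy: state -> action index
```

A POMDP-based policy would instead maintain a belief distribution over latent student states and plan over belief states, which is substantially harder when the available features poorly identify the latent state.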