The 2023 update to the Artificial Intelligence Patent Dataset (AIPD) extends the original AIPD to all United States Patent and Trademark Office (USPTO) patent documents (i.e., patents and pre-grant publications, or PGPubs) published through 2023, while incorporating an improved patent landscaping methodology to identify AI within patents and PGPubs. This new approach substitutes BERT for Patents for the Word2Vec embeddings used previously, and uses active learning to incorporate additional training data closer to the “decision boundary” between AI and not-AI to help improve predictions. We show that this new approach achieves substantially better performance than the original methodology on a set of patent documents where the two methods disagreed—on this set, the AIPD 2023 achieved precision of 68.18 percent and recall of 78.95 percent, while the original AIPD achieved 50 percent and 21.05 percent, respectively. To help researchers, practitioners, and policy-makers better understand the determinants and impacts of AI invention, we have made the AIPD 2023 publicly available on the USPTO’s economic research web page.
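As a rough way to compare the two precision/recall pairs reported above on a single scale, the sketch below computes the F1 score (the harmonic mean of precision and recall). F1 is not reported in the abstract; it is derived here purely for illustration, and the function name `f1_score` is a hypothetical helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, converted to fractions.
aipd_2023_f1 = f1_score(0.6818, 0.7895)
original_aipd_f1 = f1_score(0.50, 0.2105)

print(f"AIPD 2023 F1:     {aipd_2023_f1:.4f}")
print(f"Original AIPD F1: {original_aipd_f1:.4f}")
```

Under this derived measure, the gap between the two methods on the disagreement set comes mostly from the difference in recall.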
This content will become publicly available on October 4, 2026
Automated neural patent landscaping in the small data regime using citations and CPC codes
Patent landscaping is the process of identifying all patents related to a particular technological area, and is important for assessing various aspects of the intellectual property context. Traditionally, constructing patent landscapes is intensely laborious and expensive, and the rapid expansion of patenting activity in recent decades has driven an increasing need for efficient and effective automated patent landscaping approaches. In particular, it is critical that we be able to construct patent landscapes using a minimal number of labeled examples, as labeling patents for a narrow technology area requires highly specialized (and hence expensive) technical knowledge. We present an automated neural patent landscaping system that demonstrates significantly improved performance on difficult examples (0.69 on ‘hard’ examples, versus 0.6 for previously reported systems), and also significant improvements with much less training data (overall 0.75 on as few as 24 examples). Furthermore, in evaluating such automated landscaping systems, acquiring good data is a challenge; we demonstrate a higher-quality training data generation procedure by merging the “seed/anti-seed” approach of Abood and Feltenberger (Artif Intell Law 26:103–125, 2018) with active learning to collect difficult labeled examples near the decision boundary. Using this procedure, we created a new dataset of labeled AI patents for training and testing. As in prior work, we compare our approach with a number of baseline systems, and we release our code and data for others to build upon. (Code and data may be downloaded from https://doi.org/10.34703/gzx1-9v95/QDLKVW. Code and data are released under the Creative Commons NC-BY 4.0 license at https://creativecommons.org/licenses/by-nc/4.0/.)
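The idea of collecting labeled examples near the decision boundary can be illustrated with a minimal uncertainty-sampling sketch. This is not the authors' released code: the function name `select_uncertain`, the 0.5 threshold, and the example scores are all assumptions made for illustration.

```python
import numpy as np


def select_uncertain(probabilities, k, threshold=0.5):
    """Return indices of the k items whose scores lie closest to the threshold."""
    probs = np.asarray(probabilities, dtype=float)
    distance = np.abs(probs - threshold)
    # Ascending sort puts the most uncertain items (smallest distance) first.
    return np.argsort(distance, kind="stable")[:k].tolist()


# Hypothetical classifier scores for six unlabeled patents (P(in-landscape)).
scores = [0.97, 0.52, 0.08, 0.41, 0.66, 0.49]
picked = select_uncertain(scores, k=3)
print(picked)  # indices of the patents to send to a human labeler
```

In an active-learning loop, the selected items would be labeled by an expert, added to the training set, and the classifier retrained before the next round of selection.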
- Award ID(s): 1749917
- PAR ID: 10644695
- Publisher / Repository: Springer
- Date Published:
- Journal Name: Artificial Intelligence and Law
- ISSN: 0924-8463
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- One means of supporting design-by-analogy (DbA) in practice involves giving designers efficient access to source analogies as inspiration to solve problems. The patent database has been used for many DbA support efforts, as it is a preexisting repository of catalogued technology. Latent Semantic Analysis (LSA) has been shown to be an effective computational text processing method for extracting meaningful similarities between patents for useful functional exploration during DbA. However, this has only been shown to be useful at a small scale (100 patents). Considering the vastness of the patent database and realistic exploration at a large scale, it is important to consider how these computational analyses change with orders of magnitude more data. We present an analysis of 1,000 random mechanical patents, comparing the ability of LSA with that of Latent Dirichlet Allocation (LDA) to categorize patents into meaningful groups. Resulting implications for large(r) scale data mining of patents for DbA support are detailed.
- The United States (US) and the People’s Republic of China (China) have the most patents in nanotechnology in their own depositories and overall in the international depositories. This paper compares nanotechnology landscapes between 2001 and 2017 as reflected in the United States Patent and Trademark Office (USPTO) and China National Intellectual Property Administration (CNIPA). It presents the evolution of nanotechnology patent development in the US and China, the differences between nanotechnology topics addressed in the USPTO and CNIPA patents, key players in nanotechnology fields in both domestic and foreign markets, and the player collaboration patterns. Bibliographic, content, and social network analyses are used. The longitudinal changes of granted patents and ranked countries, patent families, technology fields, and key players in domestic and overseas markets are outlined. Collaboration networks of assignees and the influential players have been identified based on network parameters. Results show that the US market attracts more international collaborations and has a higher level of knowledge exchange and resource sharing than the Chinese market. Companies play a vital role with regard to US nanotechnology development, resulting in more within-industry collaborations. In contrast, universities and research institutes are the dominant contributors to China’s nanotechnology development, leading to more academia-industry collaborations in China’s market.
- How important is access to patent documents for subsequent innovation? We examine the expansion of the USPTO Patent Library system after 1975. Patent libraries provided access to patents before the Internet. We find that after patent library opening, local patenting increases by 8–20 percent relative to similar regions. Additional analyses suggest that disclosure of technical information drives this effect: inventors increasingly take up ideas from outside their region, and the effect is strongest in technologies where patents are more informative. We thus provide evidence that disclosure plays an important role in cumulative innovation. (JEL D83, K11, O31, O34, R11)
- Patents are key strategic resources which enable firms to appropriate innovation returns and prevent rival imitation. Patent examiners – individuals who may be subject to various sources of bias – play a central role in determining which inventions are awarded patent rights. Using a novel dataset, we explore whether one increasingly prevalent source of bias – political ideology – manifests in examiner decision-making. Reassuringly, our analysis suggests that the political ideology of patent examiners is largely unrelated to patent office outcomes. However, we do find evidence suggesting that politically active conservative-leaning examiners are more likely to grant patents than politically active liberal-leaning examiners, but only for patent applications where there is ambiguity regarding what constitutes patentable subject matter and hence examiners have greater discretion.
