

Search for: All records

Creators/Authors contains: "Smith, Brett"


  1. A comprehensive computational study on the underlying reactivity of iron tetra-NHC complexes for C2 + N1 aziridination catalysis is presented. 
  2. A chiral tetra-NHC iron(II) complex and its disparate reactivity with multiple organic azides are reported. 
  3. Abstract Motivation: Large language models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains such as biomedicine. Solutions such as pretraining and domain-specific fine-tuning add substantial computational overhead, requiring further domain expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo, and GPT-4, to generate meaningful biomedical text rooted in established knowledge. Results: Compared to the existing RAG technique for Knowledge Graphs, the proposed method utilizes minimal graph schema for context extraction and uses embedding methods for context pruning. This optimization in context extraction results in more than 50% reduction in token consumption without compromising the accuracy, making a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (if available) to substantiate the claims. Further benchmarking on human-curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework’s capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5 and GPT-4. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM in a token-optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a cost-effective fashion. 
Availability and implementation: SPOKE KG can be accessed at https://spoke.rbvi.ucsf.edu/neighborhood.html. It can also be accessed using REST-API (https://spoke.rbvi.ucsf.edu/swagger/). KG-RAG code is made available at https://github.com/BaranziniLab/KG_RAG. Biomedical benchmark datasets used in this study are made available to the research community in the same GitHub repository. 
  4. Abstract Extreme precision radial velocity (EPRV) measurements contend with internal noise (instrumental systematics) and external noise (intrinsic stellar variability) on the road to 10 cm s−1 “exo-Earth” sensitivity. Both of these noise sources are well-probed using “Sun-as-a-star” RVs and cross-instrument comparisons. We built the Solar Calibrator (SoCal), an autonomous system that feeds stable, disk-integrated sunlight to the recently commissioned Keck Planet Finder (KPF) at the W. M. Keck Observatory. With SoCal, KPF acquires signal-to-noise ratio (S/N) ∼ 1200, R = 98,000 optical (445–870 nm) spectra of the Sun in 5 s exposures at unprecedented cadence for an EPRV facility using KPF’s fast readout mode (<16 s between exposures). Daily autonomous operation is achieved by defining an operations loop using state machine logic. Data affected by clouds are automatically flagged using a reliable quality control metric derived from simultaneous irradiance measurements. Comparing solar data across the growing global network of EPRV spectrographs with solar feeds will allow EPRV teams to disentangle internal and external noise sources and benchmark spectrograph performance. To facilitate this, all SoCal data products are immediately available to the public on the Keck Observatory Archive. We compared SoCal RVs to contemporaneous RVs from NEID, the only other immediately public EPRV solar data set. We find agreement at the 30–40 cm s−1 level on timescales of several hours, which is comparable to the combined photon-limited precision. Data from SoCal were also used to assess a detector problem and wavelength calibration inaccuracies associated with KPF during early operations. Long-term SoCal operations will collect upwards of 1000 solar spectra per six-hour day using KPF’s fast readout mode, enabling stellar activity studies at high S/N on our nearest solar-type star. 
  5. We have explored the ligand topology of high-valent Fe(IV)–oxo complexes for screening a large molecular database with machine learning. 
  7. Vernet, Joël R; Bryant, Julia J; Motohara, Kentaro (Ed.)
    The Keck Planet Finder (KPF) is a fiber-fed, high-resolution, echelle spectrometer that specializes in the discovery and characterization of exoplanets using Doppler spectroscopy. In designing KPF, the guiding principles were high throughput to promote survey speed and access to faint targets, and high stability to keep uncalibrated systematic Doppler measurement errors below 30 cm s−1. KPF achieves optical illumination stability with a tip-tilt injection system, octagonal cross-section optical fibers, a double scrambler, and active fiber agitation. The optical bench and optics with integral mounts are made of Zerodur to provide thermo-mechanical stability. The spectrometer includes a slicer to reformat the optical input, green and red channels (445-600 nm and 600-870 nm), and achieves a resolving power of ∼97,000. Additional subsystems include a separate, medium-resolution UV spectrometer (383-402 nm) to record the Ca II H & K lines, an exposure meter for real-time flux monitoring, a solar feed for sunlight injection, and a calibration system with a laser frequency comb and etalon for wavelength calibration. KPF was installed and commissioned at the W. M. Keck Observatory in late 2022 and early 2023 and is now in regular use for scientific observations. This paper presents an overview of the as-built KPF instrument and its subsystems, design considerations, and initial on-sky performance. 
  8. Abstract Knowledge representation and reasoning (KR&R) has been successfully implemented in many fields to enable computers to solve complex problems with AI methods. However, its application to biomedicine has been lagging, in part due to the daunting complexity of the molecular and cellular pathways that govern human physiology and pathology. In this article, we describe concrete uses of the Scalable PrecisiOn Medicine Knowledge Engine (SPOKE), an open knowledge network that connects curated information from thirty‐seven specialized and human‐curated databases into a single property graph, with 3 million nodes and 15 million edges to date. Applications discussed in this article include drug discovery, COVID‐19 research, and chronic disease diagnosis and management. 