Achieving precise alignment between textual instructions and generated images in text-to-image generation is a significant challenge, particularly in rendering written text within images. State-of-the-art models like Stable Diffusion 3 (SD3), Flux, and AuraFlow still struggle with accurate text depiction, resulting in misspelled or inconsistent text. We introduce a training-free method with minimal computational overhead that significantly enhances text rendering quality. Specifically, we introduce an overshooting sampler for pretrained rectified flow (RF) models, by alternating between over-simulating the learned ordinary differential equation (ODE) and reintroducing noise. Compared to the Euler sampler, the overshooting sampler effectively introduces an extra Langevin dynamics term that can help correct the compounding error from successive Euler steps and therefore improve the text rendering. However, when the overshooting strength is high, we observe over-smoothing artifacts on the generated images. To address this issue, we propose an Attention Modulated Overshooting sampler (AMO), which adaptively controls the strength of overshooting for each image patch according to its attention score with the text content. AMO demonstrates a 32.3% and 35.9% improvement in text rendering accuracy on SD3 and Flux without compromising overall image quality or increasing inference cost.
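The alternation between over-simulating the ODE and reintroducing noise can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the interpolation convention x_t = t·x_1 + (1−t)·x_0 with x_0 standard Gaussian noise, picks the re-noising coefficients so that the noise level at the target time matches that convention, and uses a toy closed-form velocity field in place of a pretrained model. The names `overshoot_sample`, `point_mass_velocity`, and the strength parameter `c` are hypothetical.

```python
import math
import random


def overshoot_sample(velocity, x0, num_steps=50, c=1.0, rng=None):
    """Sketch of an overshooting sampler on a uniform time grid in [0, 1].

    velocity(x, t) returns the velocity as a list of floats.
    c is an assumed overshooting strength; c = 0 gives plain Euler steps.
    """
    rng = rng or random.Random()
    x = list(x0)
    for k in range(num_steps):
        t = k / num_steps          # current time
        s = (k + 1) / num_steps    # target time of this step
        # Over-simulate the ODE past s, up to time o (capped at 1).
        o = min(s + c * (s - t), 1.0)
        v = velocity(x, t)
        x_o = [xi + (o - t) * vi for xi, vi in zip(x, v)]
        # Go back from o to s by rescaling and adding fresh noise so that
        # the noise level at time s is (1 - s) under the assumed convention.
        a = s / o
        b = math.sqrt(max((1.0 - s) ** 2 - (a * (1.0 - o)) ** 2, 0.0))
        x = [a * xi + b * rng.gauss(0.0, 1.0) for xi in x_o]
    return x


def point_mass_velocity(mu):
    """Toy velocity field whose target distribution is the single point mu."""
    def velocity(x, t):
        return [(m - xi) / (1.0 - t) for m, xi in zip(mu, x)]
    return velocity


if __name__ == "__main__":
    target = [2.0, -1.0]
    sample = overshoot_sample(point_mass_velocity(target), [0.3, 0.7],
                              num_steps=20, c=2.0, rng=random.Random(0))
    print(sample)
```

With `c = 0` the overshoot time equals the target time, the added noise vanishes, and the loop reduces to the Euler sampler. The attention-modulated variant described in the abstract would make `c` vary per image patch, which this sketch does not attempt.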
An AI-Resilient Text Rendering Technique for Reading and Skimming Documents
Readers find text difficult to consume for many reasons. Summarization can address some of these difficulties, but introduce others, such as omitting, misrepresenting, or hallucinating information, which can be hard for a reader to notice. One approach to addressing this problem is to instead modify how the original text is rendered to make important information more salient. We introduce Grammar-Preserving Text Saliency Modulation (GP-TSM), a text rendering method with a novel means of identifying what to de-emphasize. Specifically, GP-TSM uses a recursive sentence compression method to identify successive levels of detail beyond the core meaning of a passage, which are de-emphasized by rendering words in successively lighter but still legible gray text. In a lab study (n=18), participants preferred GP-TSM over pre-existing word-level text rendering methods and were able to answer GRE reading comprehension questions more efficiently.
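The mapping from successive compression rounds to gray levels can be illustrated with a short sketch. It is not the GP-TSM implementation: the compression step itself is assumed to have already run and is represented by a hypothetical list of kept word indices per round, and the gray range and HTML output are illustrative choices.

```python
def detail_levels(num_words, kept_per_round):
    """Count how many compression rounds each word survives.

    kept_per_round is an assumed input: for each round, the indices of the
    words that remain after that round of sentence compression.
    """
    levels = [0] * num_words
    alive = set(range(num_words))
    for kept in kept_per_round:
        alive &= set(kept)  # a word dropped in an earlier round stays dropped
        for i in alive:
            levels[i] += 1
    return levels


def gray_hex(level, max_level, lightest=150):
    """Map a survival level to a gray color; higher levels are darker.

    The lightest value (150 of 255) is an assumption chosen to stay legible.
    """
    g = round(lightest - lightest * level / max_level) if max_level else 0
    return f"#{g:02x}{g:02x}{g:02x}"


def render_html(words, kept_per_round):
    """Render words as spans whose color reflects their level of detail."""
    levels = detail_levels(len(words), kept_per_round)
    top = len(kept_per_round)
    return " ".join(
        f'<span style="color:{gray_hex(lv, top)}">{w}</span>'
        for w, lv in zip(words, levels)
    )


if __name__ == "__main__":
    words = "The quick brown fox jumps".split()
    print(render_html(words, [[0, 2, 3, 4], [3, 4]]))
```

Words that survive every round form the core of the sentence and are rendered darkest; words dropped in the first round are rendered lightest.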
- Award ID(s): 2107391
- PAR ID: 10542290
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400703300
- Page Range / eLocation ID: 1 to 22
- Format(s): Medium: X
- Location: Honolulu HI USA
- Sponsoring Org: National Science Foundation
More Like this
-
We introduce and experimentally demonstrate the utility of tag-based genetic regulation, a new genetic programming (GP) technique that allows programs to dynamically adjust which code modules to express. Tags are evolvable labels that provide a flexible mechanism for referencing code modules. Tag-based genetic regulation extends existing tag-based naming schemes to allow programs to “promote” and “repress” code modules in order to alter expression patterns. This extension allows evolution to structure a program as a gene regulatory network where modules are regulated based on instruction executions. We demonstrate the functionality of tag-based regulation on a range of program synthesis problems. We find that tag-based regulation improves problem-solving performance on context-dependent problems; that is, problems where programs must adjust how they respond to current inputs based on prior inputs. Indeed, the system could not evolve solutions to some context-dependent problems until regulation was added. Our implementation of tag-based genetic regulation is not universally beneficial, however. We identify scenarios where the correct response to a particular input never changes, rendering tag-based regulation an unneeded functionality that can sometimes impede adaptive evolution. Tag-based genetic regulation broadens our repertoire of techniques for evolving more dynamic genetic programs and can easily be incorporated into existing tag-enabled GP systems.
-
An ensemble data-learning approach based on proper orthogonal decomposition (POD) and Galerkin projection (EnPOD-GP) is proposed for thermal simulations of multi-core CPUs to improve on the training efficiency and model accuracy of a previously developed global POD-GP method (GPOD-GP). GPOD-GP generates one set of basis functions (or POD modes) to account for thermal behavior in response to variations in dynamic power maps (PMs) in the entire chip, which is computationally intensive to cover possible variations of all power sources. EnPOD-GP, however, acquires multiple sets of POD modes to significantly improve training efficiency and effectiveness, and its simulation accuracy is independent of any dynamic PM. Compared to finite element simulation, both GPOD-GP and EnPOD-GP offer a computational speedup of over 3 orders of magnitude. For a processor with a small number of cores, GPOD-GP provides a more efficient approach. When high accuracy is desired and/or a processor with more cores is involved, EnPOD-GP is preferable in terms of training effort and simulation accuracy and efficiency. Additionally, the error resulting from EnPOD-GP can be precisely predicted for any random spatiotemporal power excitation.
-
When rheological models of polymer blends are used for inverse modeling, they can characterize polymer mixtures from rheological observations. This requires repeated evaluation of potentially expensive rheological models. We explored surrogate models based on Gaussian processes (GP-SM) as a cheaper alternative for describing the rheology of polydisperse binary blends. We used the time-dependent diffusion double reptation (TDD-DR) model as the true model; it takes a 5-dimensional input vector specifying the binary blend as input and yields a function called the relaxation spectrum as output. We used the TDD-DR model to generate training data of different sizes [Formula: see text], via Latin hypercube sampling. The optimal values of the GP-SM hyper-parameters, assuming a separable covariance kernel, were obtained by maximum likelihood estimation. The GP-SM interpolates the training data by design and offers reasonable predictions of relaxation spectra with uncertainty estimates. In general, the accuracy of GP-SMs improves as the size of the training data [Formula: see text] increases, as does the cost for training and prediction. The optimal hyper-parameters were found to be relatively insensitive to [Formula: see text]. Finally, we considered the inverse problem of inferring the structure of the polymer blend from a synthetic dataset generated using the true model. Surprisingly, the solutions to the inverse problem obtained using GP-SMs and TDD-DR were qualitatively similar. GP-SMs can be several orders of magnitude cheaper than expensive rheological models, which provides a proof-of-concept validation for using GP-SMs for inverse problems in polymer rheology.
-
Physics-based differentiable rendering is becoming increasingly crucial for tasks in inverse rendering and machine learning pipelines. To address discontinuities caused by geometric boundaries and occlusion, two classes of methods have been proposed: 1) the edge-sampling methods that directly sample light paths at the scene discontinuity boundaries, which require nontrivial data structures and precomputation to select the edges, and 2) the reparameterization methods that avoid discontinuity sampling but are currently limited to hemispherical integrals and unidirectional path tracing. We introduce a new mathematical formulation that enjoys the benefits of both classes of methods. Unlike previous reparameterization work that focused on hemispherical integrals, we derive the reparameterization in the path space. As a result, to estimate derivatives using our formulation, we can apply advanced Monte Carlo rendering methods, such as bidirectional path tracing, while avoiding explicit sampling of discontinuity boundaries. We show differentiable rendering and inverse rendering results to demonstrate the effectiveness of our method.