Social media cyberbullying has a detrimental effect on human life. As online social networking grows daily, the amount of hate speech also increases. Such terrible content can cause depression and actions related to suicide. This paper proposes a trustable LSTM Autoencoder Network for cyberbullying detection on social media using synthetic data. We have demonstrated a cutting-edge method to address data availability difficulties by producing machine-translated data. However, several languages such as Hindi and Bangla still lack adequate investigations due to a lack of datasets. We carried out experimental identification of aggressive comments on Hindi, Bangla, and English datasets using the proposed model and traditional models, including Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), LSTM-Autoencoder, Word2vec, Bidirectional Encoder Representations from Transformers (BERT), and Generative Pre-trained Transformer 2 (GPT-2) models. We employed evaluation metrics such as f1-score, accuracy, precision, and recall to assess the models’ performance. Our proposed model outperformed all the models on all datasets, achieving the highest accuracy of 95%. Our model achieves state-of-the-art results among all the previous works on the dataset we used in this paper.
more »
« less
Probing the limit of hydrologic predictability with the Transformer network
For a number of years since their introduction to hydrology, recurrent neural networks like long short-term memory (LSTM) networks have proven remarkably difficult to surpass in terms of daily hydrograph metrics on community-shared benchmarks. Outside of hydrology, Transformers have now become the model of choice for sequential prediction tasks, making it a curious architecture to investigate for application to hydrology. Here, we first show that a vanilla (basic) Transformer architecture is not competitive against LSTM on the widely benchmarked CAMELS streamflow dataset, and lagged especially prominently for the high-flow metrics, perhaps due to the lack of memory mechanisms. However, a recurrence-free variant of the Transformer model can obtain mixed comparisons with LSTM, producing very slightly higher Kling-Gupta efficiency coefficients (KGE), along with other metrics. The lack of advantages for the vanilla Transformer network is linked to the nature of hydrologic processes. Additionally, similar to LSTM, the Transformer can also merge multiple meteorological forcing datasets to improve model performance. Therefore, the modified Transformer represents a rare competitive architecture to LSTM in rigorous benchmarks. Valuable lessons were learned: (1) the basic Transformer architecture is not suitable for hydrologic modeling; (2) the recurrence-free modification is beneficial so future work should continue to test such modifications; and (3) the performance of state-of-the-art models may be close to the prediction limits of the dataset. As a non-recurrent model, the Transformer may bear scale advantages for learning from bigger datasets and storing knowledge. This work lays the groundwork for future explorations into pretraining models, serving as a foundational benchmark that underscores the potential benefits in hydrology.
more »
« less
- Award ID(s):
- 2221880
- PAR ID:
- 10541025
- Publisher / Repository:
- Elsevier (ScienceDirect)
- Date Published:
- Journal Name:
- Journal of Hydrology
- Volume:
- 637
- ISSN:
- 0022-1694
- Page Range / eLocation ID:
- 131389
- Subject(s) / Keyword(s):
- Transformer Long short-term memory Streamflow CAMELS Deep learning
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Model compression is significant for the wide adoption of Recurrent Neural Networks (RNNs) in both user devices possessing limited resources and business clusters requiring quick responses to large-scale service requests. This work aims to learn structurally-sparse Long Short-Term Memory (LSTM) by reducing the sizes of basic structures within LSTM units, including input updates, gates, hidden states, cell states and outputs. Independently reducing the sizes of basic structures can result in inconsistent dimensions among them, and consequently, end up with invalid LSTM units. To overcome the problem, we propose Intrinsic Sparse Structures (ISS) in LSTMs. Removing a component of ISS will simultaneously decrease the sizes of all basic structures by one and thereby always maintain the dimension consistency. By learning ISS within LSTM units, the obtained LSTMs remain regular while having much smaller basic structures. Based on group Lasso regularization, our method achieves 10:59 speedup without losing any perplexity of a language modeling of Penn TreeBank dataset. It is also successfully evaluated through a compact model with only 2:69M weights for machine Question Answering of SQuAD dataset. Our approach is successfully extended to non-LSTM RNNs, like Recurrent Highway Networks (RHNs). Our source code is available.more » « less
-
Abstract. As a genre of physics-informed machine learning, differentiable process-based hydrologic models (abbreviated as δ or delta models) with regionalized deep-network-based parameterization pipelines were recently shown to provide daily streamflow prediction performance closely approaching that of state-of-the-art long short-term memory (LSTM) deep networks. Meanwhile, δ models provide a full suite of diagnostic physical variables and guaranteed mass conservation. Here, we ran experiments to test (1) their ability to extrapolate to regions far from streamflow gauges and (2) their ability to make credible predictions of long-term (decadal-scale) change trends. We evaluated the models based on daily hydrograph metrics (Nash–Sutcliffe model efficiency coefficient, etc.) and predicted decadal streamflow trends. For prediction in ungauged basins (PUB; randomly sampled ungauged basins representing spatial interpolation), δ models either approached or surpassed the performance of LSTM in daily hydrograph metrics, depending on the meteorological forcing data used. They presented a comparable trend performance to LSTM for annual mean flow and high flow but worse trends for low flow. For prediction in ungauged regions (PUR; regional holdout test representing spatial extrapolation in a highly data-sparse scenario), δ models surpassed LSTM in daily hydrograph metrics, and their advantages in mean and high flow trends became prominent. In addition, an untrained variable, evapotranspiration, retained good seasonality even for extrapolated cases. The δ models' deep-network-based parameterization pipeline produced parameter fields that maintain remarkably stable spatial patterns even in highly data-scarce scenarios, which explains their robustness. Combined with their interpretability and ability to assimilate multi-source observations, the δ models are strong candidates for regional and global-scale hydrologic simulations and climate change impact assessment.more » « less
-
null (Ed.)Basin-centric long short-term memory (LSTM) network models have recently been shown to be an exceptionally powerful tool for stream temperature (Ts) temporal prediction (training in one period and making predictions for another period at the same sites). However, spatial extrapolation is a well-known challenge to modeling Ts and it is uncertain how an LSTM-based daily Ts model will perform in unmonitored or dammed basins. Here we compiled a new benchmark dataset consisting of >400 basins across the contiguous United States in different data availability groups (DAG, meaning the daily sampling frequency) with or without major dams and studied how to assemble suitable training datasets for predictions in basins with or without temperature monitoring. For prediction in unmonitored basins (PUB), LSTM produced an RMSE of 1.129 °C and R2 of 0.983. While these metrics declined from LSTM's temporal prediction performance, they far surpassed traditional models' PUB values, and were competitive with traditional models' temporal prediction on calibrated sites. Even for unmonitored basins with major reservoirs, we obtained a median RMSE of 1.202°C and an R2 of 0.984. For temporal prediction, the most suitable training set was the matching DAG that the basin could be grouped into, e.g., the 60% DAG for a basin with 61% data availability. However, for PUB, a training dataset including all basins with data is consistently preferred. An input-selection ensemble moderately mitigated attribute overfitting. Our results indicate there are influential latent processes not sufficiently described by the inputs (e.g., geology, wetland covers), but temporal fluctuations are well predictable, and LSTM appears to be a highly accurate Ts modeling tool even for spatial extrapolation.more » « less
-
Neural population activity is theorized to reflect an underlying dynamical structure. This structure can be accurately captured using state space models with explicit dynamics, such as those based on recurrent neural networks (RNNs). However, using recurrence to explicitly model dynamics necessitates sequential processing of data, slowing real-time applications such as brain-computer interfaces. Here we introduce the Neural Data Transformer (NDT), a non-recurrent alternative. We test the NDT’s ability to capture autonomous dynamical systems by applying it to synthetic datasets with known dynamics and data from monkey motor cortex during a reaching task well-modeled by RNNs. The NDT models these datasets as well as state-of-the-art recurrent models. Further, its non-recurrence enables 3.9ms inference, well within the loop time of real-time applications and more than 6 times faster than recurrent baselines on the monkey reaching dataset. These results suggest that an explicit dynamics model is not necessary to model autonomous neural population dynamics. Code: https://github.com/snel-repo/neural-data-transformersmore » « less
An official website of the United States government

