ProtNLM: Protein Natural Language Model
The Protein Natural Language Model (ProtNLM) is a deep learning model developed by Google Research in collaboration with UniProt. It treats protein annotation as a sequence-to-text task: given a protein's amino acid sequence, it generates a short natural-language name or description for that protein.
ProtNLM is trained on protein sequences paired with their names in UniProt and uses a transformer-based sequence-to-sequence architecture. The model learns the relationship between amino acid sequences and the names assigned to them, and generates names in natural language for sequences that have no informative annotation.
ProtNLM has potential applications in protein research and drug discovery because it gives researchers a first functional hint for proteins that would otherwise be labeled "Uncharacterized protein". UniProt uses the model to annotate such proteins, and the predictions are freely available on the UniProt website. The Colab notebook linked below provides supporting evidence for the predictions.
Suggested readings:
Read the ProtNLM preprint → https://storage.googleapis.com/brain-…
Read the ProtNLM help page on the UniProt website → https://www.uniprot.org/help/ProtNLM
See ProtNLM predictions on the UniProt website → https://www.uniprot.org/uniprotkb?que…
Explore the Colab notebook that provides evidence for ProtNLM predictions → https://colab.research.google.com/git…
Explore the curation example shown in the video, starting with the UniProt entry containing the ProtNLM prediction → https://www.uniprot.org/uniprotkb/D2G…