Publications

Publishing our work allows us to share ideas and work collaboratively to advance the field of computer science.



Research Area

  • Algorithms and Theory 1317
  • Data Management 166
  • Data Mining and Modeling 353
  • Distributed Systems and Parallel Computing 340
  • Economics and Electronic Commerce 340
  • Education Innovation 68
  • General Science 329
  • Hardware and Architecture 145
  • Health & Bioscience 360
  • Human-Computer Interaction and Visualization 803
  • Information Retrieval and the Web 414
  • Machine Intelligence 3782
  • Machine Perception 1449
  • Machine Translation 145
  • Mobile Systems 107
  • Natural Language Processing 1068
  • Networking 312
  • Quantum Computing 124
  • Responsible AI 221
  • Robotics 198
  • Security, Privacy and Abuse Prevention 491
  • Software Engineering 200
  • Software Systems 448
  • Speech Processing 542

We believe open collaboration is essential for progress

We're proud to work with academic and research institutions to push the boundaries of AI and computer science. Learn more about our student and faculty programs, as well as our global outreach initiatives.

IEEE Publications

IEEE provides a wide range of quality publications that make the exchange of technical knowledge and information possible among technology professionals.


  • Get an IEEE Xplore Digital Library trial for IEEE members
  • Review impact factors of IEEE journals
  • Access the IEEE thesaurus and taxonomy
  • Find article templates in Word and LaTeX formats
  • Get author education resources
  • Visit the IEEE Xplore Digital Library
  • Learn more about IEEE author tools
  • Review the IEEE plagiarism policy
  • Get information about all stages of publishing with IEEE


Why choose IEEE publications?

IEEE publishes the leading journals, transactions, letters, and magazines in electrical engineering, computing, biotechnology, telecommunications, power and energy, and dozens of other technologies.

In addition, IEEE publishes more than 1,800 leading-edge conference proceedings every year, which are recognized by academia and industry worldwide as the most vital collection of consolidated published papers in electrical engineering, computer science, and related fields.

Spotlight on IEEE publications

IEEE Xplore®


  • About IEEE Xplore
  • Visit the IEEE Xplore Digital Library
  • See how to purchase articles and standards
  • Find support and training
  • Browse popular content
  • Sign up for a free trial

IEEE Spectrum Magazine


  • Visit the IEEE Spectrum website
  • Visit the Institute for IEEE member news

IEEE Access


  • Visit IEEE Access

Proceedings of the IEEE

  • See recent issues

Benefits of publishing

Authors: Why publish with IEEE?


  • PSPB Accomplishments in 2023 (PDF, 228 KB)
  • IEEE statement of support for Open Science
  • IEEE signs San Francisco Declaration on Research Assessment (DORA)
  • Read about how IEEE journals maintain top citation rankings

Open Access Solutions


  • Visit IEEE Open


Visit the IEEE Author Center

Find author resources:

  • IEEE Collabratec®
  • Choosing a journal
  • Writing
  • Author Tools
  • How to Publish with IEEE (English) (PPT, 3 MB)
  • How to Publish with IEEE (Chinese) (PPT, 3 MB)
  • Benefits of Publishing with IEEE (PPT, 7 MB)
  • View author tutorial videos
  • Read the IEEE statement on appropriate use of bibliometric indicators


Publication types and subscription options

  • Journal and magazine subscriptions
  • Digital library subscriptions
  • Buy individual articles from IEEE Xplore

For organizations:

  • Browse IEEE subscriptions
  • Get institutional access
  • Subscribe through your local IEEE account manager


Publishing information

IEEE publishing makes the exchange of technical knowledge possible with the highest quality and the greatest impact.

  • Open access publishing options
  • Intellectual Property Rights (IPR)
  • Reprints of articles
  • Services for IEEE organizations


Contact information

  • Contact IEEE Publications
  • About the Publication Services & Products Board

Related Information

Network. Collaborate. Create with IEEE Collabratec®.

All within one central hub—with exclusive features for IEEE members. 

  • Experience IEEE Collabratec


Join/Renew IEEE or a Society

Receive member access to select content, product discounts, and more.

  • Review all member benefits


Try this easy-to-use, globally accessible data repository that provides significant benefits to researchers, data analysts, and the global technical community.

  • Start learning today


IEEE is committed to helping combat and mitigate the effects of climate change.  

  • See what's new on the IEEE Climate Change site

Google Scholar Search Help

Get the most out of Google Scholar with some helpful tips on searches, email alerts, citation export, and more.

Finding recent papers

Your search results are normally sorted by relevance, not by date. To find newer articles, try the following options in the left sidebar:

  • click "Since Year" to show only recently published papers, sorted by relevance;
  • click "Sort by date" to show just the new additions, sorted by date;
  • click the envelope icon to have new results periodically delivered by email.

Locating the full text of an article

Abstracts are freely available for most of the articles. Alas, reading the entire article may require a subscription. Here are a few things to try:

  • click a library link, e.g., "FindIt@Harvard", to the right of the search result;
  • click a link labeled [PDF] to the right of the search result;
  • click "All versions" under the search result and check out the alternative sources;
  • click "Related articles" or "Cited by" under the search result to explore similar articles.

If you're affiliated with a university, but don't see links such as "FindIt@Harvard", please check with your local library about the best way to access their online subscriptions. You may need to search from a computer on campus, or to configure your browser to use a library proxy.

Getting better answers

If you're new to the subject, it may be helpful to pick up the terminology from secondary sources. E.g., a Wikipedia article for "overweight" might suggest a Scholar search for "pediatric hyperalimentation".

If the search results are too specific for your needs, check out what they're citing in their "References" sections. Referenced works are often more general in nature.

Similarly, if the search results are too basic for you, click "Cited by" to see newer papers that referenced them. These newer papers will often be more specific.

Explore! There's rarely a single answer to a research question. Click "Related articles" or "Cited by" to see closely related work, or search for the author's name and see what else they have written.

Searching Google Scholar

Use the "author:" operator, e.g., author:"d knuth" or author:"donald e knuth".

Put the paper's title in quotations: "A History of the China Sea".
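For a concrete, purely illustrative sketch of how these operators combine, the snippet below builds a Scholar query URL you could paste into a browser tab. The author and title are just the examples above, and the `scholar.google.com/scholar?q=` pattern is assumed from the address the search box itself produces; this is meant for manual searching, not automated crawling, which Scholar's robots.txt discourages (see the note later in this document).

```python
from urllib.parse import quote_plus

# Hypothetical query using the operators described above.
author_query = 'author:"d knuth"'              # restrict results to a specific author
title_query = '"A History of the China Sea"'   # exact-title match via quotation marks
combined = f"{title_query} {author_query}"     # operators can be combined freely

# Build a URL to open in a normal browser tab (not for automated crawling).
url = "https://scholar.google.com/scholar?q=" + quote_plus(combined)
print(url)
```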

You'll often get better results if you search only recent articles, but still sort them by relevance, not by date. E.g., click "Since 2018" in the left sidebar of the search results page.

To see the absolutely newest articles first, click "Sort by date" in the sidebar. If you use this feature a lot, you may also find it useful to set up email alerts to have new results automatically sent to you.

Note: On smaller screens that don't show the sidebar, these options are available in the dropdown menu labelled "Year" right below the search button.

Select the "Case law" option on the homepage or in the side drawer on the search results page.

The "Related articles" link finds documents similar to the given search result.

Advanced search is in the side drawer. The advanced search window lets you search in the author, title, and publication fields, as well as limit your search results by date.

Select the "Case law" option and do a keyword search over all jurisdictions. Then, click the "Select courts" link in the left sidebar on the search results page.

Tip: To quickly search a frequently used selection of courts, bookmark a search results page with the desired selection.

Access to articles

For each Scholar search result, we try to find a version of the article that you can read. These access links are labelled [PDF] or [HTML] and appear to the right of the search result. For example:

A paper that you need to read

Access links cover a wide variety of ways in which articles may be available to you - articles that your library subscribes to, open access articles, free-to-read articles from publishers, preprints, articles in repositories, etc.

When you are on a campus network, access links automatically include your library subscriptions and direct you to subscribed versions of articles. On-campus access links cover subscriptions from primary publishers as well as aggregators.

Off-campus access

Off-campus access links let you take your library subscriptions with you when you are at home or traveling. You can read subscribed articles when you are off-campus just as easily as when you are on-campus. Off-campus access links work by recording your subscriptions when you visit Scholar while on-campus, and looking up the recorded subscriptions later when you are off-campus.

We use the recorded subscriptions to provide you with the same subscribed access links as you see on campus. We also indicate your subscription access to participating publishers so that they can allow you to read the full-text of these articles without logging in or using a proxy. The recorded subscription information expires after 30 days and is automatically deleted.

In addition to Google Scholar search results, off-campus access links can also appear on articles from publishers participating in the off-campus subscription access program. Look for links labeled [PDF] or [HTML] on the right hand side of article pages.

Anne Author, John Doe, Jane Smith, Someone Else

In this fascinating paper, we investigate various topics that would be of interest to you. We also describe new methods relevant to your project, and attempt to address several questions which you would also like to know the answer to. Lastly, we analyze …

You can disable off-campus access links on the Scholar settings page. Disabling off-campus access links will turn off recording of your library subscriptions. It will also turn off indicating subscription access to participating publishers. Once off-campus access links are disabled, you may need to identify and configure an alternate mechanism (e.g., an institutional proxy or VPN) to access your library subscriptions while off-campus.

Email Alerts

Do a search for the topic of interest, e.g., "M Theory"; click the envelope icon in the sidebar of the search results page; enter your email address, and click "Create alert". We'll then periodically email you newly published papers that match your search criteria.

You don't need a Google account to create alerts; you can enter any email address of your choice. If the email address isn't a Google account or doesn't match your Google account, then we'll email you a verification link, which you'll need to click to start receiving alerts.

To be notified when new articles cite yours, it works best to create a public profile, which is free and quick to do. Once you get to the homepage with your photo, click "Follow" next to your name, select "New citations to my articles", and click "Done". We will then email you when we find new articles that cite yours.

To follow new citations to a specific paper, search for the title of your paper, e.g., "Anti de Sitter space and holography"; click on the "Cited by" link at the bottom of the search result; and then click on the envelope icon in the left sidebar of the search results page.

To follow new articles by a colleague, first do a search for their name and see if they have a Scholar profile. If they do, click on it, click the "Follow" button next to their name, select "New articles by this author", and click "Done".

If they don't have a profile, do a search by author, e.g., [author:s-hawking], and click on the mighty envelope in the left sidebar of the search results page. If you find that several different people share the same name, you may need to add co-author names or topical keywords to limit results to the author you wish to follow.

We send the alerts right after we add new papers to Google Scholar. This usually happens several times a week, except that our search robots meticulously observe holidays.

There's a link to cancel the alert at the bottom of every notification email.

If you created alerts using a Google account, you can manage them all here. If you're not using a Google account, you'll need to unsubscribe from the individual alerts and subscribe to the new ones.

Google Scholar library

Google Scholar library is your personal collection of articles. You can save articles right off the search page, organize them by adding labels, and use the power of Scholar search to quickly find just the one you want - at any time and from anywhere. You decide what goes into your library, and we’ll keep the links up to date.

You get all the goodies that come with Scholar search results - links to PDF and to your university's subscriptions, formatted citations, citing articles, and more!

Library help

Find the article you want to add in Google Scholar and click the “Save” button under the search result.

Click “My library” at the top of the page or in the side drawer to view all articles in your library. To search the full text of these articles, enter your query as usual in the search box.

Find the article you want to remove, and then click the “Delete” button under it.

  • To add a label to an article, find the article in your library, click the “Label” button under it, select the label you want to apply, and click “Done”.
  • To view all the articles with a specific label, click the label name in the left sidebar of your library page.
  • To remove a label from an article, click the “Label” button under it, deselect the label you want to remove, and click “Done”.
  • To add, edit, or delete labels, click “Manage labels” in the left column of your library page.

Only you can see the articles in your library. If you create a Scholar profile and make it public, then the articles in your public profile (and only those articles) will be visible to everyone.

Your profile contains all the articles you have written yourself. It’s a way to present your work to others, as well as to keep track of citations to it. Your library is a way to organize the articles that you’d like to read or cite, not necessarily the ones you’ve written.

Citation Export

Click the "Cite" button under the search result and then select your bibliography manager at the bottom of the popup. We currently support BibTeX, EndNote, RefMan, and RefWorks.

Please respect our robots.txt when you access Google Scholar using automated software. As the wearers of crawler's shoes and webmaster's hat, we cannot recommend adherence to web standards highly enough.

Sorry, we're unable to provide bulk access. You'll need to make an arrangement directly with the source of the data you're interested in. Keep in mind that a lot of the records in Google Scholar come from commercial subscription services.

Sorry, we can only show up to 1,000 results for any particular search query. Try a different query to get more results.

Content Coverage

Google Scholar includes journal and conference papers, theses and dissertations, academic books, pre-prints, abstracts, technical reports and other scholarly literature from all broad areas of research. You'll find works from a wide variety of academic publishers, professional societies and university repositories, as well as scholarly articles available anywhere across the web. Google Scholar also includes court opinions and patents.

We index research articles and abstracts from most major academic publishers and repositories worldwide, including both free and subscription sources. To check current coverage of a specific source in Google Scholar, search for a sample of their article titles in quotes.

While we try to be comprehensive, it isn't possible to guarantee uninterrupted coverage of any particular source. We index articles from sources all over the web and link to these websites in our search results. If one of these websites becomes unavailable to our search robots or to a large number of web users, we have to remove it from Google Scholar until it becomes available again.

Our meticulous search robots generally try to index every paper from every website they visit, including most major sources and also many lesser known ones.

That said, Google Scholar is primarily a search of academic papers. Shorter articles, such as book reviews, news sections, editorials, announcements and letters, may or may not be included. Untitled documents and documents without authors are usually not included. Website URLs that aren't available to our search robots or to the majority of web users are, obviously, not included either. Nor do we include websites that require you to sign up for an account, install a browser plugin, watch four colorful ads, and turn around three times and say coo-coo before you can read the listing of titles scanned at 10 DPI... You get the idea, we cover academic papers from sensible websites.

If a "site:" search doesn't find all of a website's papers, that's usually because we index many of these papers from other websites, such as the websites of their primary publishers. The "site:" operator currently only searches the primary version of each paper.

It could also be that the papers are located on examplejournals.gov, not on example.gov. Please make sure you're searching for the "right" website.

That said, the best way to check coverage of a specific source is to search for a sample of their papers using the title of the paper.

If you're asking whether we cover a particular journal: we index papers, not journals. You should also ask about our coverage of universities, research groups, proteins, seminal breakthroughs, and other dimensions that are of interest to users. All such questions are best answered by searching for a statistical sample of papers that has the property of interest - journal, author, protein, etc. Many coverage comparisons are available if you search for [allintitle:"google scholar"], but some of them are more statistically valid than others.

Currently, Google Scholar allows you to search and read published opinions of US state appellate and supreme court cases since 1950, US federal district, appellate, tax and bankruptcy courts since 1923 and US Supreme Court cases since 1791. In addition, it includes citations for cases cited by indexed opinions or journal articles which allows you to find influential cases (usually older or international) which are not yet online or publicly available.

Legal opinions in Google Scholar are provided for informational purposes only and should not be relied on as a substitute for legal advice from a licensed lawyer. Google does not warrant that the information is complete or accurate.

We normally add new papers several times a week. However, updates to existing records take 6-9 months to a year or longer, because in order to update our records, we need to first recrawl them from the source website. For many larger websites, the speed at which we can update their records is limited by the crawl rate that they allow.

Inclusion and Corrections

If you've spotted an error in a search result, we apologize, and we assure you the error was unintentional. Automated extraction of information from articles in diverse fields can be tricky, so an error sometimes sneaks through.

Please write to the owner of the website where the erroneous search result is coming from, and encourage them to provide correct bibliographic data to us, as described in the technical guidelines. Once the data is corrected on their website, it usually takes 6-9 months to a year or longer for it to be updated in Google Scholar. We appreciate your help and your patience.

If you can't find your papers when you search for them by title and by author, please refer your publisher to our technical guidelines.

You can also deposit your papers into your institutional repository or put their PDF versions on your personal website, but please follow your publisher's requirements when you do so. See our technical guidelines for more details on the inclusion process.

We normally add new papers several times a week; however, it might take us some time to crawl larger websites, and corrections to already included papers can take 6-9 months to a year or longer.

Google Scholar generally reflects the state of the web as it is currently visible to our search robots and to the majority of users. When you're searching for relevant papers to read, you wouldn't want it any other way!

If your citation counts have gone down, chances are that either your paper or papers that cite it have either disappeared from the web entirely, or have become unavailable to our search robots, or, perhaps, have been reformatted in a way that made it difficult for our automated software to identify their bibliographic data and references. If you wish to correct this, you'll need to identify the specific documents with indexing problems and ask your publisher to fix them. Please refer to the technical guidelines .

If you've found an error in a court opinion, please do let us know. Please include the URL for the opinion, the corrected information and a source where we can verify the correction.

We're only able to make corrections to court opinions that are hosted on our own website. For corrections to academic papers, books, dissertations and other third-party material, click on the search result in question and contact the owner of the website where the document came from. For corrections to books from Google Book Search, click on the book's title and locate the link to provide feedback at the bottom of the book's page.

General Questions

Citation-only results are articles which other scholarly articles have referred to, but which we haven't found online. To exclude them from your search results, uncheck the "include citations" box on the left sidebar.

To locate the full text of an article, first click on links labeled [PDF] or [HTML] to the right of the search result's title. Also, check out the "All versions" link at the bottom of the search result.

Second, if you're affiliated with a university, using a computer on campus will often let you access your library's online subscriptions. Look for links labeled with your library's name to the right of the search result's title. Also, see if there's a link to the full text on the publisher's page with the abstract.

Keep in mind that final published versions are often only available to subscribers, and that some articles are not available online at all. Good luck!

If Scholar keeps forgetting your settings, check your browser's cookie handling: your web browser remembers your settings in a "cookie" on your computer's disk, and sends this cookie to our website along with every search. Check that your browser isn't configured to discard our cookies. Also, check if disabling various proxies or overly helpful privacy settings does the trick. Either way, your settings are stored on your computer, not on our servers, so a long hard look at your browser's preferences or internet options should help cure the machine's forgetfulness.

The phrase "Stand on the shoulders of giants" is our acknowledgement that much of scholarly research involves building on what others have already discovered. It's taken from Sir Isaac Newton's famous quote, "If I have seen further, it is by standing on the shoulders of giants."

  • Privacy & Terms
  • Advanced search
  • Peer review


Discover relevant research today


Advance your research field in the open


Reach new audiences and maximize your readership

ScienceOpen puts your research in the context of publications.

For Publishers

ScienceOpen offers content hosting, context building and marketing services for publishers. See our tailored offerings:

  • For academic publishers to promote journals and interdisciplinary collections
  • For open access journals to host journal content in an interactive environment
  • For university library publishing to develop new open access paradigms for their scholars
  • For scholarly societies to promote content with interactive features

For Institutions

ScienceOpen offers state-of-the-art technology and a range of solutions and services:

  • For faculties and research groups to promote and share your work
  • For research institutes to build up your own branding for OA publications
  • For funders to develop new open access publishing paradigms
  • For university libraries to create an independent OA publishing environment

For Researchers

Make an impact and build your research profile in the open with ScienceOpen

  • Search and discover relevant research in over 92 million Open Access articles and article records
  • Share your expertise and get credit by publicly reviewing any article
  • Publish your poster or preprint and track usage and impact with article- and author-level metrics
  • Create a topical Collection to advance your research field

Create a Journal powered by ScienceOpen

Launching a new open access journal or an open access press? ScienceOpen now provides full end-to-end open access publishing solutions – embedded within our smart interactive discovery environment. A modular approach allows open access publishers to pick and choose among a range of services and design the platform that fits their goals and budget.


What can a Researcher do on ScienceOpen?

ScienceOpen provides researchers with a wide range of tools to support their research – all for free. Here is a short checklist to make sure you are getting the most out of the technological infrastructure and content that we have to offer.

ScienceOpen on the Road

Upcoming Events

  • 20 – 22 February – ResearcherToReader Conference

Past Events

  • 09 November – Webinar for the Discoverability of African Research
  • 26 – 27 October – Attending the Workshop on Open Citations and Open Scholarly Metadata
  • 18 – 22 October – ScienceOpen at Frankfurt Book Fair
  • 27 – 29 September – Attending OA Tage, Berlin
  • 25 – 27 September – ScienceOpen at Open Science Fair
  • 19 – 21 September – OASPA 2023 Annual Conference
  • 22 – 24 May – ScienceOpen sponsoring Pint of Science, Berlin
  • 16 – 17 May – ScienceOpen at 3rd AEUP Conference
  • 20 – 21 April – ScienceOpen attending Scaling Small: Community-Owned Futures for Open Access Books
  • 18 – 20 April – ScienceOpen at the London Book Fair

What is ScienceOpen?

  • Smart search and discovery within an interactive interface
  • Researcher promotion and ORCID integration
  • Open evaluation with article reviews and Collections
  • Business model based on providing services to publishers


Some of our partners:

UCL Press

Eight Ways (and More) To Find and Access Research Papers

This blog is part of our Research Smarter series. You’ll discover the various search engines, databases and data repositories to help you along the way. Click on any of the following links for an in-depth look at how to find relevant research papers, journals, and authors for your next project using the Web of Science™. You can also check out our ultimate guides here, which include tips to speed up the writing process.

If you’re in the early stages of your research career, you’re likely struggling to learn all you can about your chosen field and evaluate your options. You also need an easy and convenient way to find the right research papers upon which to build your own work and keep you on the proper path toward your goals.

Fortunately, most institutions have access to thousands of journals, so your first step should be to check with library staff and find out what is available via your institutional subscriptions.

For those who may be unfamiliar with other means of access, this blog post – the first in a series devoted to helping you “research smarter” – will provide a sampling of established data sources for scientific research. These include search engines, databases, and data repositories.

Search Engines and Databases

You may have already discovered that the process of searching for research papers offers many choices and scenarios. Some search engines, for example, can be accessed free of charge. Others require a subscription. The latter group generally includes services that index the contents of thousands of published journals, allowing for detailed searches on data fields such as author name, institution, title or keyword, and even funding sources. Because many journals operate on a subscription model too, the process of obtaining full-text versions of papers can be complicated.

On the other hand, a growing number of publishers follow the practice of Open Access (OA) , making their journal content freely available. Similarly, some authors publish their results in the form of preprints, posting them to preprint servers for immediate and free access. These repositories, like indexing services, differ in that some concentrate in a given discipline or broad subject area, while others cover the full range of research.

Search Engines

Following is a brief selection of reputable search engines by which to locate articles relevant to your research.

Google Scholar is a free search engine that provides access to research in multiple disciplines. The sources include academic publishers, universities, online repositories, books, and even judicial opinions from court cases. Based on its indexing, Google Scholar provides citation counts to allow authors and others to track the impact of their work.  

The Directory of Open Access Journals (DOAJ) allows users to search and retrieve the article contents of nearly 10,000 OA journals in science, technology, medicine, social sciences, and humanities. All journals must adhere to quality-control standards, including peer review.

PubMed, maintained by the US National Library of Medicine, is a free search engine covering the biomedical and life sciences. Its coverage derives primarily from the MEDLINE database, covering materials as far back as 1951.
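As a hedged aside not covered in the original blog post, PubMed also exposes a public programmatic interface, the NCBI E-utilities. The sketch below assumes the documented esearch endpoint and a hypothetical query term; real use should follow NCBI's usage guidelines and rate limits.

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

# Minimal sketch: search PubMed via the NCBI E-utilities "esearch" endpoint.
# The query term is just an example; an api_key parameter raises the rate limit.
params = urlencode({
    "db": "pubmed",
    "term": "vaping AND pregnancy",  # hypothetical topic
    "retmode": "json",
    "retmax": 5,
})
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + params

with urlopen(url) as response:
    result = json.load(response)

# esearch returns matching PubMed IDs (PMIDs) plus the total hit count.
print(result["esearchresult"]["count"])
print(result["esearchresult"]["idlist"])
```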

JSTOR affords access to more than 12 million journal articles in upwards of 75 disciplines, providing full-text searches of more than 2,000 journals, and access to more than 5,000 OA books.

Selected Databases

The following selection samples a range of resources, including databases which, as discussed above, index the contents of journals either in a given specialty area or the full spectrum of research. Others listed below offer consolidated coverage of multiple databases. Your institution is likely subscribed to a range of research databases; speak to your librarian to see which databases you have access to and how to go about your search.

Web of Science includes the Web of Science Core Collection, which covers more than 20,000 carefully selected journals, along with books, conference proceedings, and other sources. The indexing also captures citation data, permitting users to follow the thread of an idea or development over time, as well as to track a wide range of research-performance metrics. The Web of Science also features EndNote™ Click, a free browser plugin that offers one-click access to the best available legal and legitimate full-text versions of papers. See here for our ultimate guide to finding relevant research papers on the Web of Science.

Science.gov covers the vast territory of United States federal science, including more than 60 databases and 2,200-plus websites. The many allied agencies whose research is reflected include NASA, the US Department of Agriculture, and the US Environmental Protection Agency.

CiteSeerx is devoted primarily to information and computer science. The database includes a feature called Autonomous Citation Indexing, designed to extract citations and create a citation index for literature searching and evaluation.

Preprint and Data Repositories

An early form of OA literature involved authors, as noted above,  making electronic, preprint versions of their papers freely available. This practice has expanded widely today. You can find archives devoted to a single main specialty area, as well as general repositories connected with universities and other institutions.

The specialty archive is perhaps best exemplified by arXiv (conveniently pronounced “archive,” and one of the earliest examples of a preprint repository). Begun in 1991 as a physics repository, arXiv has expanded to embrace mathematics, astronomy, statistics, economics, and other disciplines. The success of arXiv spurred the development of, for example, bioRxiv, devoted to an array of topics within biology, and, for chemistry, ChemRxiv.
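As another hedged aside, arXiv offers a public Atom-feed API for programmatic searching. The sketch below assumes the documented export.arxiv.org/api/query interface and an example keyword; treat the details as an assumption to verify against the current API documentation.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen
from urllib.parse import urlencode

# Minimal sketch: query the arXiv API for preprints matching an example keyword.
params = urlencode({
    "search_query": "all:quantum computing",  # example keyword search
    "start": 0,
    "max_results": 3,
})
url = "http://export.arxiv.org/api/query?" + params

with urlopen(url) as response:
    feed = ET.parse(response).getroot()

# Results come back as an Atom feed; each <entry> is one preprint.
ns = {"atom": "http://www.w3.org/2005/Atom"}
for entry in feed.findall("atom:entry", ns):
    title = entry.find("atom:title", ns).text.strip()
    link = entry.find("atom:id", ns).text.strip()
    print(f"{title}\n  {link}")
```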

Meanwhile, thousands of institutional repositories hold a variety of useful materials. In addition to research papers, these archives store raw datasets, graphics, notes, and other by-products of investigation. Currently, the Registry of Open Access Repositories lists more than 4,700 entries.

Reach Out Yourself?

If the resources above don’t happen to result in a free and full-text copy of the research you seek, you can also try reaching out to the authors yourself.

To find who authored a paper, you can search indexing platforms like the Web of Science, or research profiling systems like Publons™ or ResearchGate, and then reach out to the authors directly.

So, although the sheer volume of research can pose a challenge to identifying and securing needed papers, plenty of options are available.

Related posts

2024 Journal Citation Reports: changes in Journal Impact Factor category rankings to enhance transparency and inclusivity


New Web of Science Grants Index helps researchers develop more targeted grant proposals


Three ways research offices can lead researchers to more funding


How to Write and Publish a Research Paper for a Peer-Reviewed Journal

  • Open access
  • Published: 30 April 2020
  • Volume 36, pages 909–913 (2021)


  • Clara Busse, ORCID: orcid.org/0000-0002-0178-1000 (1)
  • Ella August, ORCID: orcid.org/0000-0001-5151-1036 (1, 2)


Communicating research findings is an essential step in the research process. Often, peer-reviewed journals are the forum for such communication, yet many researchers are never taught how to write a publishable scientific paper. In this article, we explain the basic structure of a scientific paper and describe the information that should be included in each section. We also identify common pitfalls for each section and recommend strategies to avoid them. Further, we give advice about target journal selection and authorship. In the online resource 1 , we provide an example of a high-quality scientific paper, with annotations identifying the elements we describe in this article.



Introduction

Writing a scientific paper is an important component of the research process, yet researchers often receive little formal training in scientific writing. This is especially true in low-resource settings. In this article, we explain why choosing a target journal is important, give advice about authorship, provide a basic structure for writing each section of a scientific paper, and describe common pitfalls and recommendations for each section. In the online resource 1 , we also include an annotated journal article that identifies the key elements and writing approaches that we detail here. Before you begin your research, make sure you have ethical clearance from all relevant ethical review boards.

Select a Target Journal Early in the Writing Process

We recommend that you select a “target journal” early in the writing process; a “target journal” is the journal to which you plan to submit your paper. Each journal has a set of core readers and you should tailor your writing to this readership. For example, if you plan to submit a manuscript about vaping during pregnancy to a pregnancy-focused journal, you will need to explain what vaping is because readers of this journal may not have a background in this topic. However, if you were to submit that same article to a tobacco journal, you would not need to provide as much background information about vaping.

Information about a journal’s core readership can be found on its website, usually in a section called “About this journal” or something similar. For example, the Journal of Cancer Education presents such information on the “Aims and Scope” page of its website, which can be found here: https://www.springer.com/journal/13187/aims-and-scope .

Peer reviewer guidelines from your target journal are an additional resource that can help you tailor your writing to the journal and provide additional advice about crafting an effective article [ 1 ]. These are not always available, but it is worth a quick web search to find out.

Identify Author Roles Early in the Process

Early in the writing process, identify authors, determine the order of authors, and discuss the responsibilities of each author. Standard author responsibilities have been identified by The International Committee of Medical Journal Editors (ICMJE) [ 2 ]. To set clear expectations about each team member’s responsibilities and prevent errors in communication, we also suggest outlining more detailed roles, such as who will draft each section of the manuscript, write the abstract, submit the paper electronically, serve as corresponding author, and write the cover letter. It is best to formalize this agreement in writing after discussing it, circulating the document to the author team for approval. We suggest creating a title page on which all authors are listed in the agreed-upon order. It may be necessary to adjust authorship roles and order during the development of the paper. If a new author order is agreed upon, be sure to update the title page in the manuscript draft.

In the case where multiple papers will result from a single study, authors should discuss who will author each paper. Additionally, authors should agree on a deadline for each paper and the lead author should take responsibility for producing an initial draft by this deadline.

Structure of the Introduction Section

The introduction section should be approximately three to five paragraphs in length. Look at examples from your target journal to decide the appropriate length. This section should include the elements shown in Fig.  1 . Begin with a general context, narrowing to the specific focus of the paper. Include five main elements: why your research is important, what is already known about the topic, the “gap” or what is not yet known about the topic, why it is important to learn the new information that your research adds, and the specific research aim(s) that your paper addresses. Your research aim should address the gap you identified. Be sure to add enough background information to enable readers to understand your study. Table 1 provides common introduction section pitfalls and recommendations for addressing them.

Figure 1. The main elements of the introduction section of an original research article. Often, the elements overlap.

Methods Section

The purpose of the methods section is twofold: to explain how the study was done in enough detail to enable its replication and to provide enough contextual detail to enable readers to understand and interpret the results. In general, the essential elements of a methods section are the following: a description of the setting and participants, the study design and timing, the recruitment and sampling, the data collection process, the dataset, the dependent and independent variables, the covariates, the analytic approach for each research objective, and the ethical approval. The hallmark of an exemplary methods section is the justification of why each method was used. Table 2 provides common methods section pitfalls and recommendations for addressing them.

Results Section

The focus of the results section should be associations, or lack thereof, rather than statistical tests. Two considerations should guide your writing here. First, the results should present answers to each part of the research aim. Second, return to the methods section to ensure that the analysis and variables for each result have been explained.

Begin the results section by describing the number of participants in the final sample and details such as the number who were approached to participate, the proportion who were eligible and who enrolled, and the number of participants who dropped out. The next part of the results should describe the participant characteristics. After that, you may organize your results by the aim or by putting the most exciting results first. Do not forget to report your non-significant associations. These are still findings.

Tables and figures capture the reader’s attention and efficiently communicate your main findings [ 3 ]. Each table and figure should have a clear message and should complement, rather than repeat, the text. Tables and figures should communicate all salient details necessary for a reader to understand the findings without consulting the text. Include information on comparisons and tests, as well as information about the sample and timing of the study in the title, legend, or in a footnote. Note that figures are often more visually interesting than tables, so if it is feasible to make a figure, make a figure. To avoid confusing the reader, either avoid abbreviations in tables and figures, or define them in a footnote. Note that there should not be citations in the results section and you should not interpret results here. Table 3 provides common results section pitfalls and recommendations for addressing them.

Discussion Section

Opposite the introduction section, the discussion should take the form of a right-side-up triangle beginning with interpretation of your results and moving to general implications (Fig.  2 ). This section typically begins with a restatement of the main findings, which can usually be accomplished with a few carefully-crafted sentences.

Figure 2. Major elements of the discussion section of an original research article. Often, the elements overlap.

Next, interpret the meaning or explain the significance of your results, lifting the reader’s gaze from the study’s specific findings to more general applications. Then, compare these study findings with other research. Are these findings in agreement or disagreement with those from other studies? Does this study impart additional nuance to well-accepted theories? Situate your findings within the broader context of scientific literature, then explain the pathways or mechanisms that might give rise to, or explain, the results.

Journals vary in their approach to strengths and limitations sections: some are embedded paragraphs within the discussion section, while some mandate separate section headings. Keep in mind that every study has strengths and limitations. Candidly reporting yours helps readers to correctly interpret your research findings.

The next element of the discussion is a summary of the potential impacts and applications of the research. Should these results be used to optimally design an intervention? Does the work have implications for clinical protocols or public policy? These considerations will help the reader to further grasp the possible impacts of the presented work.

Finally, the discussion should conclude with specific suggestions for future work. Here, you have an opportunity to illuminate specific gaps in the literature that compel further study. Avoid the phrase “future research is necessary” because the recommendation is too general to be helpful to readers. Instead, provide substantive and specific recommendations for future studies. Table 4 provides common discussion section pitfalls and recommendations for addressing them.

Follow the Journal’s Author Guidelines

After you select a target journal, identify the journal’s author guidelines to guide the formatting of your manuscript and references. Author guidelines will often (but not always) include instructions for titles, cover letters, and other components of a manuscript submission. Read the guidelines carefully. If you do not follow the guidelines, your article will be sent back to you.

Finally, do not submit your paper to more than one journal at a time. Even if this is not explicitly stated in the author guidelines of your target journal, it is considered inappropriate and unprofessional.

Your title should invite readers to continue reading beyond the first page [ 4 , 5 ]. It should be informative and interesting. Consider describing the independent and dependent variables, the population and setting, the study design, the timing, and even the main result in your title. Because the focus of the paper can change as you write and revise, we recommend you wait until you have finished writing your paper before composing the title.

Be sure that the title is useful for potential readers searching for your topic. The keywords you select should complement those in your title to maximize the likelihood that a researcher will find your paper through a database search. Avoid using abbreviations in your title unless they are very well known, such as SNP, because it is more likely that someone will use a complete word rather than an abbreviation as a search term to help readers find your paper.

After you have written a complete draft, use the checklist (Fig. 3 ) below to guide your revisions and editing. Additional resources are available on writing the abstract and citing references [ 5 ]. When you feel that your work is ready, ask a trusted colleague or two to read the work and provide informal feedback. The box below provides a checklist that summarizes the key points offered in this article.

Figure 3. Checklist for manuscript quality.

Data Availability

Not applicable.

References

Michalek AM (2014) Down the rabbit hole…advice to reviewers. J Cancer Educ 29:4–5

International Committee of Medical Journal Editors. Defining the role of authors and contributors: who is an author? http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html. Accessed 15 January, 2020

Vetto JT (2014) Short and sweet: a short course on concise medical writing. J Cancer Educ 29(1):194–195

Mensh B, Kording K (2017) Ten simple rules for structuring papers. PLoS Comput Biol. https://doi.org/10.1371/journal.pcbi.1005619

Lang TA (2017) Writing a better research article. J Public Health Emerg. https://doi.org/10.21037/jphe.2017.11.06


Acknowledgments

Ella August is grateful to the Sustainable Sciences Institute for mentoring her in training researchers on writing and publishing their research.

Code Availability

Not applicable.

Author information

Authors and Affiliations

Department of Maternal and Child Health, University of North Carolina Gillings School of Global Public Health, 135 Dauer Dr, 27599, Chapel Hill, NC, USA

Clara Busse & Ella August

Department of Epidemiology, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI, 48109-2029, USA

Ella August


Corresponding author

Correspondence to Ella August.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

(PDF 362 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Busse, C., August, E. How to Write and Publish a Research Paper for a Peer-Reviewed Journal. J Canc Educ 36, 909–913 (2021). https://doi.org/10.1007/s13187-020-01751-z


Published: 30 April 2020

Issue Date: October 2021

DOI: https://doi.org/10.1007/s13187-020-01751-z


  • Manuscripts
  • Scientific writing


J Postgrad Med, 67(4), Oct–Dec 2021

Published a research paper? What next?

Assistant Professor of Pediatrics at College of Medicine and Health Sciences, National University of Science and Technology, Sohar, Sultanate of Oman

In our earlier editorials, we have already discussed the importance of conducting good-quality medical research, composing an original research paper, and getting the paper published successfully.[ 1 , 2 , 3 ] We have also given a roadmap for reviewing an original research paper.[ 4 ] The current editorial deals with some important post-publication issues that every author should be acquainted with.

Replying to Letters to Editor Received on the Published Manuscript

After the publication of the research paper, the editors may receive one or more “letters to editors,” supporting or criticizing or commenting on the published research paper. If the contents of such letters arouse genuine concerns/issues, the editor will ask for a rebuttal/reply to the same, from the authors of the original paper and publish both of these in a subsequent issue of the journal. It is the responsibility of the corresponding author of the original paper to contact the co-authors and provide a reply that has been drafted and approved by all the authors.[ 5 ] Such correspondence indicates that the paper has aroused sufficient interest in the readers. Replying to such letters gives the authors an opportunity to explain their research findings anew and also address issues that may not have been addressed in their research paper (published earlier).

Editorial Commentaries

The editor may invite an editorial commentary on the accepted research paper, which is usually published in the same issue as the original paper. The commentary is usually written by an expert in the concerned field (who would probably also have reviewed the article and recommended its publication). The purpose of the commentary is to provide a balanced view for interpreting the results of the study and give insights into the clinical applicability/relevance of the study findings. Such commentaries also reflect the experience and the opinion of the expert who is writing the commentary.

MEDLINE and Other Indexation

The prestige of a publication rests in part on its appearance in an “indexed journal” of a highly rated database such as MEDLINE, PubMed, Scopus, Embase, or Web of Science. Such indexation not only increases the prestige of a journal but also provides wider access to its content. MEDLINE, from the US National Library of Medicine (NLM), is one of the most widely used biomedical journal citation databases, containing more than 26 million articles (from 1946 to the present) published in more than 5,200 journals. It is available through PubMed free of charge and by subscription via database vendors (Ovid and EBSCO).[ 6 , 7 ] Publishers submit journals to the National Institutes of Health (NIH)-chartered advisory committee, the Literature Selection Technical Review Committee (LSTRC), which reviews and recommends journals for indexation in MEDLINE on the basis of scientific policy and scientific quality.[ 6 ] PubMed (www.pubmed.gov), developed by the National Center for Biotechnology Information (NCBI) at the US NLM, is a free search engine for literature retrieval, with more than 30 million citations and abstracts in the biomedical and life sciences drawn from several NLM literature resources.[ 6 , 7 ] It provides access to all of MEDLINE, to journals/manuscripts deposited in PubMed Central (PMC), and to the NCBI Bookshelf. Updates occur daily, with reference data supplied directly by publishers, often before a journal issue is released. To be indexed in PubMed, a journal should be selected as a MEDLINE journal or be deposited in PMC.[ 6 ]
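
For readers who want to query these databases programmatically, NCBI's E-utilities expose PubMed through a public web API. The sketch below is a minimal illustration in Python, not part of the original editorial; the search term and result handling are placeholder assumptions.

```python
import requests

# NCBI E-utilities: ESearch returns the PubMed IDs (PMIDs) that match a query.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def search_pubmed(term, retmax=20):
    """Return a list of PMIDs for a PubMed query (illustrative sketch)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    resp = requests.get(ESEARCH, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    # Hypothetical query; substitute your own topic of interest.
    print(search_pubmed("gastric cancer immunotherapy"))
```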

Medical Subject Headings (MeSH)—a controlled, hierarchically organized vocabulary thesaurus—are used by the NLM to index and search the biomedical literature. They summarize an article's content through a set of terms comprising main headings (descriptors) and subheadings (qualifiers), and the vocabulary is updated yearly. Indexers (generally librarians trained to index published MEDLINE articles) assign relevant MeSH terms based on the content and concepts of an article, using terms from the official MeSH list.[ 8 ] Manual assignment of MeSH terms is laborious, subjective, time-consuming, and expensive, which led to the development of the Medical Text Indexer (MTI), a MeSH prediction tool that assists NLM indexers by recommending MeSH terms.[ 9 ]
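
A related illustration: the EFetch endpoint of the same E-utilities returns the MeSH headings that indexers have assigned to a given record. This is a hedged sketch; the PMID shown is a placeholder, and the XML element names reflect the current PubMed record format.

```python
import requests
import xml.etree.ElementTree as ET

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def mesh_terms(pmid):
    """Fetch the MeSH descriptor names assigned to one PubMed record."""
    params = {"db": "pubmed", "id": pmid, "retmode": "xml"}
    resp = requests.get(EFETCH, params=params, timeout=30)
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    # Each MeshHeading carries a DescriptorName (main heading) and may carry
    # QualifierName children (subheadings); only the descriptors are returned here.
    return [d.text for d in root.iter("DescriptorName")]

print(mesh_terms("12345678"))  # placeholder PMID
```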

Scopus (www.scopus.com) is another subscription-based citation database produced by Elsevier Co. and maintained by independent subject matter experts. It indexes about 4,600 health science titles including 100% of MEDLINE and Embase coverage.[ 10 ]

Embase (Excerpta Medica Database) (www.embase.com) is a biomedical and pharmacological bibliographic database. This subscription-based Elsevier database (32 million records from over 8,500 currently published journals, dating back to 1947) supports information managers and pharmacovigilance work on licensed drugs. Emtree is the Embase thesaurus; all journals listed in MEDLINE are also covered in Embase, along with an additional 2,900 journals unique to it.[ 11 ]

The Web of Science (Thomson Reuters) is an interdisciplinary subscription-based database with records (from 1900 onward) drawn from multiple bibliographic databases. It includes the Science Citation Index Expanded (SCI-EXPANDED), a medical database, and supports article retrieval and citation searching.[ 7 ]

Publons (http://publons.com/), a free commercial website, combines publications, citation metrics, peer reviews, and journal editorial work, all in one place. It serves as a platform for publishers to seek and connect with peer reviewers, reports global peer-review activity, and provides peer-review training for early-career researchers.[ 12 ]

Google Scholar (http://scholar.google.com/), a mainstream free academic crawler-based search engine, covers content across academic disciplines, countries, and languages, with over 380 million records. Indexing in Google Scholar enhances accessibility, sharing, and online citation worldwide, particularly for open-access (OA) journals. Google Scholar offers free email alerts of citations to authors whose publications are indexed with it, once the authors register for such alerts. A Google Scholar search also focuses on individual articles rather than journals, improves article retrieval (including unpublished conference material), ranks more frequently cited works higher in search results, and lists the papers citing an original paper (via “Cited by”).[ 13 ]

Posting the PDF of the Final Published Article on Websites/Servers

Posting the complete paper or the portable document format (PDF) of the final published paper on websites/servers (self-archiving) should be done only if the paper has been published in an OA journal or if the journal policy allows such posting; this precaution is taken to prevent copyright infringement. Some journals allow the revised, accepted version of the manuscript (the post-print) to be shared, but not the final typeset PDF. Details of the necessary permissions are usually given on the website of the journal/publisher, and it is the responsibility of the author posting such a version to check whether the journal policy allows him/her to do so. Usually, the journal editors/publishers send the final, published PDF to the corresponding author, who may then share it with the co-authors of the paper. However, it should be remembered that the copyright of the paper usually rests with the publisher, and the publisher provides the PDF to the corresponding author with a rider that it “is for personal and educational purposes only and should not be distributed or printed commercially or distributed systematically.” This is especially true for non-OA journals.

Getting the Research Noticed by the Medical Fraternity

It is important that a research paper gets read by the medical fraternity all over the world. There are various avenues for this: sharing the title and abstract (or their links) with professional colleagues on social media (Telegram, WhatsApp, Facebook, Twitter, LinkedIn) or via email groups/listservs; displaying the title and abstract on the website of the journal (in its table of contents) or on the website of the institution where the work was carried out; posting the title and abstract on websites such as ResearchGate (www.researchgate.net) or Mendeley (www.mendeley.com); discussing the paper at journal clubs; and including the results in subsequent presentations at conferences/seminars by the authors. There may be some limitations on sharing the full text or the PDF version of the published paper, as discussed earlier in this editorial. The journal editor/publisher may also send the corresponding author details on how to increase the visibility of the paper; increased visibility is most likely to translate into better citations and research impact. The visibility of the paper can be enhanced by (after checking the journal's OA policy):

  • The journal's website and its bibliographic linking.
  • An institutional open-archive repository, where one may post the PDF in the archive with a link to the article on the journal's website.
  • Depositing the article in a subject-based OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)-compliant repository (a subject-wise list of repositories is available from http://opcit.eprints.org/explorearchives.shtml#disciplinary).
  • Linking the paper from as many websites as possible using citation and social bookmarking tools such as GetCited (http://www.getcited.org/add/), CiteULike (http://www.citeulike.org/register), Connotea (http://www.connotea.org/register), Zotero (http://www.zotero.org/), and StumbleUpon (http://www.stumbleupon.com/sign_up.php?pre2=hp_join).
  • Linking the article from an appropriate topic in Wikipedia.
  • Depositing the paper with the NLM's PubMed Central, if the authors have received an NIH grant (http://www.nihms.nih.gov/db/sub.cgi), as NIH insists that publicly funded research should be available to everyone without having to pay.
  • Linking the paper from the author's personal/institutional web pages.

The Journal of Postgraduate Medicine approves self-archiving of articles (final accepted version) in OAI (Open Archives Initiative)-compliant institutional or subject-based repositories.[ 14 ] Self-archiving enables maximum visibility, impact, access, and usage, and can be done by the author or via digital archivists in the author's institution/library.[ 15 ] It can be expedited by installing OAI-compliant e-print archives in universities/research institutions, and by self-archiving pre-peer-review preprints (without an embargo period) and post-peer-review post-prints (or a corrigenda file, after the embargo period) on the author's personal website, a company/institutional repository or archive, or not-for-profit subject-based preprint servers or repositories.[ 15 , 16 ] “Embargo period” refers to the time post-publication (commonly 12 to 24 months) after which a subscription article is made freely/openly accessible to users.
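
Because OAI-PMH is a plain HTTP protocol, checking what a repository exposes for harvesting takes only a few lines. The sketch below assumes a hypothetical repository base URL and simply lists Dublin Core titles from the first page of a ListRecords response.

```python
import requests
import xml.etree.ElementTree as ET

# Hypothetical OAI-PMH endpoint; substitute your institution's repository.
BASE_URL = "https://repository.example.edu/oai"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

def list_record_titles(metadata_prefix="oai_dc"):
    """List Dublin Core titles from the first page of a ListRecords response."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    root = ET.fromstring(resp.text)
    return [title.text for title in root.iter(DC_NS + "title")]

print(list_record_titles())
```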

Open Access

Traditionally, publishing an article incurred no charges for submission, peer review, or publication; instead, users were charged a subscription fee for full-article access, which limits free access to the literature. The concept of OA has emerged in the last two decades through pioneers such as BioMed Central and the Public Library of Science (PLoS) with online-only journals.[ 17 ] Fully OA journals make all their articles freely and immediately accessible online (without an embargo period) under a Creative Commons (CC) or equivalent open copyright license permitting anyone to “read, download, copy, distribute, print, search, or link to the full-texts of articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose” through two established routes—“gold” and “green”.[ 18 , 19 ] In the “gold” route, the authors pay a fee—an article processing charge (APC)—to make their published articles freely and immediately accessible. These charges are journal-specific and may range from 500 to 5,000 US dollars, which is often prohibitive and unaffordable for Indian authors; they may become affordable if financial support is provided by the author's institution or by organizations such as the Indian Council of Medical Research (ICMR) and the INCLEN Trust (International Clinical Epidemiology Network). In the “green” route, the author publishes the research article in any journal and then archives it in an institutional repository (a university repository, a central repository such as PubMed Central, or an OA website) in accordance with the journal's self-archiving policies.[ 17 , 19 ] This balances the researcher's freedom to publish and share work with the publisher's control over quality. Publishing in a reputable OA journal provides versatility and visibility, and in return the academic researcher receives a higher research impact through citation counts.[ 20 ]

Many OA journals use CC licenses, an easy alternative to standard copyright, permitting authors to determine broadly how their work may be used without having to handle individual permission requests. A CC license allows copying and redistribution of material in any medium or format (sharing) and remixing, transforming, and building upon the material (adapting). Restrictive elements can be added: “Attribution” requires users to credit the creator of the work, “NonCommercial” prohibits commercial use, “NoDerivatives” prohibits modifications, and “ShareAlike” requires users to apply the same license to any new work they create from the original. These clauses limit re-use but provide useful protection to scholars, research subjects, and the OA nature of the publication. The elements can be combined, as in the Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) and Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) licenses.[ 21 ]

The increase in the number of OA journals has often led to questions about their quality. This, along with the increasing pressure on researchers to “publish or perish,” has fuelled rapid growth in the number of medical journals hoping to attract eager young academic researchers to publish their work.[ 5 ] The Directory of Open Access Journals (DOAJ) is a community-curated database that provides comprehensive access to, and quality control over, OA scientific and scholarly journals. DOAJ aims to increase the visibility and ease of use of OA journals, thereby increasing their usage and impact.[ 22 ]

The Journal Impact Factor and Personal Research Impact Factors

Several journal-/author-/article-level metric tools are available via indexing databases (e.g., Scopus and Web of Science) that enable users to track the scholarly impact of a journal, author, or article. One of the most popular ways of assessing a journal's importance is via its journal impact factor (JIF). The 2-year JIF (in any given year) is the ratio between the number of citations received in that year for publications in that journal in the two preceding years and the total number of “citable items” published in that journal during the two previous years. An impact factor could also consider shorter or longer periods of citations and sources.[ 23 ]
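
Written out, the 2-year calculation is a simple ratio; the numbers below are invented purely to show the arithmetic.

```latex
\mathrm{JIF}_{2023}
  = \frac{\text{citations received in 2023 to items published in 2021--2022}}
         {\text{citable items published in 2021--2022}}
  = \frac{600}{200} = 3.0
```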

The author impact factor (AIF) similarly evaluates the impact of an individual author; however, because an individual's publication and citation counts are generally small, other citation metrics are used.[ 24 ] One prominent measure is the “h-index” (or Hirsch index), which combines papers (indicating quantity/productivity) with citations (indicating quality/impact) to evaluate an author's publication career. It is the largest number h such that the author has h papers with at least h citations each. It enables comparison of researchers from the same field with equally long careers, predicts future scientific achievement, and informs decisions on tenure positions and grants.[ 25 ] The “i10-index” (introduced by Google), in contrast, is a simple tally of a researcher's publications with at least 10 citations. It is straightforward, easy to calculate (using “My Citations” on Google Scholar), and helps to identify the more influential papers among an author's publications (those cited at least 10 times). However, it is restricted to Google Scholar and does not account for an author's total number of publications or total citations, so it does not give a full picture of research productivity.[ 26 ]
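
Both indices are easy to compute from a list of per-paper citation counts. The sketch below is illustrative; the citation counts are invented.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

def i10_index(citations):
    """Number of papers with at least 10 citations (Google Scholar's i10)."""
    return sum(1 for c in citations if c >= 10)

# Hypothetical author with eight papers.
cites = [52, 33, 18, 12, 9, 7, 3, 0]
print(h_index(cites))    # 6: the top six papers each have at least 6 citations
print(i10_index(cites))  # 4: four papers have at least 10 citations
```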

Citation analysis involves measuring the number of citations that a particular work has received, indicating the overall quality of that work, whereas citation count is the total number of an individual's citations. Citation counts measure the impact and performance of individual researchers as well as departments, research institutions, universities, books, journals, and nations.[ 27 ]

Citation-based metrics may take years to accumulate and are not always the best indicator of practical impact in fields such as clinical medicine. Article-level metrics (ALMs) measure the impact/uptake of an individual journal article on the scientific community post-publication and include usage, citations, social bookmarking and dissemination activity, media and blog coverage, discussion activity and ratings.[ 28 ] They thus measure the dissemination and reach of published research articles in practical fields. PLOS uses the category labels of Viewed, Cited, Saved, Discussed, and Recommended.[ 28 ] ALMs are valuable to researchers (who can track and share the impact of published work), research institutions, funders, and publishers. The PLOS Application Programming Interface (API) for ALMs is freely and publicly available from https://web.archive.org/web/20140408224328/http://api.plos.org/alm/using-the-alm-api/ and allows users with programming skills to extract data for various research purposes.[ 29 ]
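
The PLOS ALM API itself has changed over the years, so, purely as a generic illustration of pulling one article-level signal programmatically, the sketch below uses the public Crossref REST API (not the PLOS API) to fetch a DOI's citation count. The DOI shown is the one cited earlier on this page for the Springer article; substitute any DOI of interest.

```python
import requests

def citation_count(doi):
    """Return Crossref's 'is-referenced-by-count' field for a DOI."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    return resp.json()["message"]["is-referenced-by-count"]

# DOI taken from the article citation earlier on this page; substitute your own.
print(citation_count("10.1007/s13187-020-01751-z"))
```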

ORCID (Open Researcher and Contributor ID) [https://orcid.org/register]

ORCID, via its unique 16-digit identifier, provides a digital name—an iD—that uniquely and persistently identifies researchers and other contributors to research. By connecting iDs to different research activities (grant proposal submissions, manuscripts submitted to journal publishers, and datasets deposited in data repositories) and to affiliations across multiple research information platforms, ORCID enables recognition and reduces the reporting burden for researchers. As a researcher and author, it is important to be recognized and to receive full credit for one's contributions and research work. Because the iD is linked to the person rather than to an institution, researchers can maintain the same iD throughout their career, even when their institutional affiliation changes.[ 30 ] This ensures that researchers receive full credit for their contributions and eliminates mistaken identity, especially when multiple authors share the same name. ORCID also simplifies manuscript submission by allowing users to sign in to multiple journal submission sites with one username and password, and it can be applied to research outputs to identify, validate, and confirm authorship as well as to track research output. Some journals now print an author's ORCID iD in the publication; a single click on the displayed iD gives the reader the author's entire list of publications. ORCID also integrates easily with other databases and identifier systems such as Crossref, ResearcherID, and Scopus.[ 30 , 31 ]
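
ORCID also exposes a public, read-only API from which anyone can list the works attached to an iD. The sketch below is a hedged illustration based on the v3.0 public API; the iD is a placeholder in ORCID's 16-digit format, and the JSON field names should be verified against the current API documentation.

```python
import requests

ORCID_PUBLIC_API = "https://pub.orcid.org/v3.0"

def work_titles(orcid_id):
    """Return the titles of works attached to a public ORCID record."""
    resp = requests.get(
        f"{ORCID_PUBLIC_API}/{orcid_id}/works",
        headers={"Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    groups = resp.json().get("group", [])
    return [
        g["work-summary"][0]["title"]["title"]["value"]
        for g in groups
        if g.get("work-summary")
    ]

print(work_titles("0000-0000-0000-0000"))  # placeholder iD
```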

ResearcherID (developed by Thomson Reuters and used in Web of Science) and the Scopus Author ID (developed by Elsevier and used in Scopus) are similar identifiers provided by subscription-based proprietary systems. A ResearcherID (obtained by creating a ResearcherID account) allows researchers to manage their publication lists, track citations and h-index, and identify potential collaborators. A Scopus Author ID is automatically assigned to an author with a Scopus-indexed publication and enables tracking of publications indexed in the Scopus citation database and building metric reports.[ 32 ]

Concluding Remarks

In the current scenario, increasing importance is being given to research and publications—as a measure of individual and institutional progress as well as a benchmark for recruitment, promotions, and funding. It has thus become essential for an individual to keep doing quality research and to write it up and publish it successfully.[ 33 ] However, publication is not the end of an author's work but the beginning of another important process. The value of a publication lies in its wide accessibility and impact. Bibliographic databases, such as MEDLINE, Embase, and Scopus, compile data from a selection of journals (such journals are “indexed” in that database), thereby improving visibility and access during a literature search. The use of database-specific thesauri/controlled vocabularies such as MeSH (for MEDLINE) and Emtree (for Embase) enables precise search results.[ 7 ] The growth in the number of OA journals and processes such as self-archiving has opened avenues for the wide dissemination of published information.[ 15 ] Many articles are freely available online immediately after publication (gold OA), whereas many journals permit the authors to archive their work in an institutional repository (green OA), subject to journal policies.[ 17 ] The use of the CC license and its restrictive elements allows authors to choose how their work can be used.[ 21 ] Bibliometrics help to measure academic/scholarly activity and scientific impact, but should not be obsessed over.[ 23 ] Their shortcomings in measuring the true impact of journals, articles, and authors have prompted the search for more effective and meaningful ways of evaluating the real influence of research. Free resources such as ORCID link an individual's research across platforms and provide a consolidated record of research activity. It is important that authors are aware of these post-publication resources and use them proactively to disseminate their research and ensure a meaningful research impact.

Financial support and sponsorship

Conflicts of interest.

There are no conflicts of interest.


February 21, 2024


Exploring microstructures for high-performance materials

by Mike Krapfl, Iowa State University


In just the first few months of 2024, the journal Nature has published two scientific papers co-authored by Kun Luo, an Iowa State University postdoctoral research associate in materials science and engineering.

"My research aims to unravel the fundamental mechanisms governing the behavior of diverse materials," Ken Luo wrote in a brief biography, "paving the way for developing innovative and high-performance materials across various industries."

Luo has a background in experimental science, studying superhard materials using techniques in high-pressure physics. He also has expertise in theoretical simulations using machine learning tools to discover the microstructures within materials.

"Throughout my career, I recognized the importance of theoretical simulation in explaining the atomic mechanisms behind the macroscopic behaviors of materials," he said.

At Iowa State, he's working "to continue exploring the mechanisms behind material behaviors."

For these two Nature studies (and another Nature paper published in July 2022, for which he was the first author, "Coherent interfaces govern direct transformation from graphite to diamond"), Luo used the same tools and techniques to contribute findings.

He started with real atomic arrangements from the best electron microscope data available, which provided two-dimensional images. Luo then used those images to manually construct three-dimensional atomic models with computer software.

"Currently, experiments cannot observe the evolution of these microstructures in situ during phase transitions , movements, or deformation processes," Luo said. "Therefore, effective computational simulations can provide us with a solid theoretical basis to uncover the mechanisms behind these phenomena, ultimately leading to convincing conclusions."

Luo said the study described in the 2022 Nature paper about the direct transformation from graphite to diamond resulted in the discovery of a new material called Gradia, which has been patented in the United States.

Gradia has mechanical and electrical properties—such as super-hardness and conductivity—that Luo said could be applied to new technologies.

He said the latest Nature paper about ceramic materials that can be shaped and molded like metals could have applications as heat-resistant or insulating structural materials.

Luo's atomic structure models "are indeed tools of basic science for uncovering novel materials," he said, "and at the same time, they open the gates to more practical applications."

Ke Tong et al, Structural transition and migration of incoherent twin boundary in diamond, Nature (2024). DOI: 10.1038/s41586-023-06908-6 www.nature.com/articles/s41586-023-06908-6

Journal information: Nature

Provided by Iowa State University


Research Papers

Big Data for a Climate Disaster-Resilient Country, Philippines Ebinezer R. Florano

A Veto Players Analysis of Subnational Territorial Reform in Indonesia Michael A. Tumanut

The Politics of Municipal Merger in the Philippines Michael A. Tumanut

2018 AGPA Conference papers 

Management of Social Media for Disaster Risk Reduction and Mitigation in Philippine Local Government Units Erwin A. Alamapy, Maricris Delos Santos, and Xavier Venn Asuncion

An Assessment of the Impact of GAD Programs on the Retention Intentions of Female Uniformed Personnel of the Philippine Navy Michelle C. Castillo

Contextualizing Inclusive Business: Amelioration of ASEAN Economic Community Arman V. Cruz

The impact of mobile financial services in low- and lower middle-income countries Erwin A. Alampay, Goodiel Charles Moshi, Ishita Ghosh, Mina Lyn C. Peralta and Juliana Harshanti

How Cities Are Promoting Clean Energy and Dealing with Problems Along the Way Rizalino B. Cruz
Impact Assessment Methods: Toward Institutional Impact Assessment Romeo B. Ocampo

Philippine Technocracy and Politico-administrative Realities During the Martial Law Period (1972–1986): Decentralization, Local governance and Autonomy Concerns of Prescient Technocrats Alex B. Brillantes, Jr. and Abigail Modino

Policy Reforms to Improve the Quality of Public Services in the Philippines Maria Fe Villamejor-Mendoza

Compliance with, and Effective Implementation of Multilateral Environmental Agreements: Looking Back at the Transboundary Haze Pollution Problem in the ASEAN Region Ebinezer R. Florano, Ph.D.

ASEAN, Food Security, and Land Rights: Enlarging a Democratic Space for Public Services in the ASEAN Maria Faina L. Diola, DPA

Public Finance in the ASEAN: Trend and Patterns Jocelyn C. Cuaresma, DPA

Private Sector Engagement in Climate Change Mitigation and Adaptation: Implications in Regional Governance Maria Fe Villamejor-Mendoza , Ph.D.

Philippine Response to Curb Human Trafficking of Migrant Workers Lizan Perante-Calina

Local Heritage Networking for ASEAN Connectivity Salvacion Manuel-Arlante

Financing Universal Healthcare and the ASEAN: Focus on the Philippine Sin Tax Law Abigail A. Modino

Decentralized Local Governance in Asian Region:Good Practices of Mandaluyong City, Philippines Rose Gay E. Gonzales- Castaneda

Disaster-Resilient Community Index: Measuring the Resiliency of Barangays in Tacloban, Iligan, Dagupan and Marikina Ebinezer R. Florano , Ph.D.

Towards Attaining the Vision “Pasig Green City”: Thinking Strategically, Acting Democratically Ebinezer R. Florano , Ph.D.

Community Governance for Disaster Recovery and Resilience: Four Case Studies in the Philippines  Ebinezer R. Florano , Ph.D.

Mainstreaming Integrated Climate Change Adaptation and Disaster Risk Reduction in Local development Plans in the Philippines Ebinezer R. Florano , Ph.D. 

Building Back a Better Nation: Disaster Rehabilitation and Recovery in the Philippines Ebinezer R. Florano , Ph.D.  and Joe-Mar S. Perez

The New Public Management Then and Now: Lessons from the Transition in Central and Eastern Europe Wolfgang Drechsler and Tiina Randma-Liiv

Optimizing ICT Budgets through eGovernment Projects Harmonization Erwin A. Alampay

ICT Sector Performance Review for Philippines Erwin A. Alampay

The Challenges to the Futures of Public Administration Education Maria Fe Villamejor-Mendoza

Enhancing Trust and Performance in the Philippine Public Enterprises: A Revisit of Recent Reforms and Transformations Maria Fe Villamejor-Mendoza

The Legal Framework for the Philippine Third Sector: Progressive or Regressive? Ma. Oliva Z. Domingo

Roles of Community and Communal Law in Disaster Management in the Philippines: The Case of Dagupan City Ebinezer R. Florano

Revisiting Meritocracy in Asian Settings: Dimensions of colonial Influences and Indigenous Traditions Danilo R. Reyes

The openness of the University of the Philippines Open University: Issues and Prospects Maria Fe Villamejor-Mendoza

Equity and Fairness in Public-Private Partnerships: The Case of Airport Infrastructure Development in the Philippines Maria Fe Villamejor- Mendoza

Restoring Trust and Building Integrity in Government: Issues and Concerns in the Philippines and Areas for Reform Alex B. Brillantes, Jr. and Maricel T. Fernandez

Competition in Electricity Markets: The Case of the Philippines  Maria Fe Villamejor-Mendoza

Economic Reforms for Philippine Competitiveness, UP Open University Maria Fe Villamejor-Mendoza and G.H. Ambat (Eds) 

Open Access to Educational Resources: The Wave of the Future? Maria Fe Villamejor-Mendoza

Climate Change Governance in the Philippines and Means of Implementation diagram Ebinezer R. Florano

Mobile 2.0: M-money for the BoP in the Philippines Erwin A. Alampay and Gemma Bala

When Social Networking Websites Meet Mobile Commerce Erwin A. Alampay 

Monitoring Employee Use of the Internet in Philippine Organizations Erwin A. Alampay 

Living the Information Society Erwin A. Alampay

Analysing Socio-Demographic Differences in the Access & Use of ICTs in the Philippines Using the Capability Approach, Electronic Journal of Information Systems in Developing Countries Erwin A. Alampay

Measuring Capabilities in the Information Society Erwin A. Alampay

Modes of Learning and Performance Among U.P. Open University Graduates, Electronic Journal of Information Systems in Developing Countries Victoria A. Bautista and Ma. Anna T. Quimbo


A Columbia Surgeon’s Study Was Pulled. He Kept Publishing Flawed Data.

The quiet withdrawal of a 2021 cancer study by Dr. Sam Yoon highlights scientific publishers’ lack of transparency around data problems.


By Benjamin Mueller

Benjamin Mueller covers medical science and has reported on several research scandals.

Feb. 15, 2024

The stomach cancer study was shot through with suspicious data. Identical constellations of cells were said to depict separate experiments on wholly different biological lineages. Photos of tumor-stricken mice, used to show that a drug reduced cancer growth, had been featured in two previous papers describing other treatments.

Problems with the study were severe enough that its publisher, after finding that the paper violated ethics guidelines, formally withdrew it within a few months of its publication in 2021. The study was then wiped from the internet, leaving behind a barren web page that said nothing about the reasons for its removal.

As it turned out, the flawed study was part of a pattern. Since 2008, two of its authors — Dr. Sam S. Yoon, chief of a cancer surgery division at Columbia University’s medical center, and a more junior cancer biologist — have collaborated with a rotating cast of researchers on a combined 26 articles that a British scientific sleuth has publicly flagged for containing suspect data. A medical journal retracted one of them this month after inquiries from The New York Times.

[Photo: a covered walkway at Columbia University Irving Medical Center.]

Memorial Sloan Kettering Cancer Center, where Dr. Yoon worked when much of the research was done, is now investigating the studies. Columbia’s medical center declined to comment on specific allegations, saying only that it reviews “any concerns about scientific integrity brought to our attention.”

Dr. Yoon, who has said his research could lead to better cancer treatments, did not answer repeated questions. Attempts to speak to the other researcher, Changhwan Yoon, an associate research scientist at Columbia, were also unsuccessful.

The allegations were aired in recent months in online comments on a science forum and in a blog post by Sholto David, an independent molecular biologist. He has ferreted out problems in a raft of high-profile cancer research, including dozens of papers at a Harvard cancer center that were subsequently referred for retractions or corrections.

From his flat in Wales, Dr. David pores over published images of cells, tumors and mice in his spare time and then reports slip-ups, trying to close the gap between people’s regard for academic research and the sometimes shoddier realities of the profession.

When evaluating scientific images, it is difficult to distinguish sloppy copy-and-paste errors from deliberate doctoring of data. Two other imaging experts who reviewed the allegations at the request of The Times said some of the discrepancies identified by Dr. David bore signs of manipulation, like flipped, rotated or seemingly digitally altered images.

Armed with A.I.-powered detection tools, scientists and bloggers have recently exposed a growing body of such questionable research, like the faulty papers at Harvard’s Dana-Farber Cancer Institute and studies by Stanford’s president that led to his resignation last year.

But those high-profile cases were merely the tip of the iceberg, experts said. A deeper pool of unreliable research has gone unaddressed for years, shielded in part by powerful scientific publishers driven to put out huge volumes of studies while avoiding the reputational damage of retracting them publicly.

The quiet removal of the 2021 stomach cancer study from Dr. Yoon’s lab, a copy of which was reviewed by The Times, illustrates how that system of scientific publishing has helped enable faulty research, experts said. In some cases, critical medical fields have remained seeded with erroneous studies.

“The journals do the bare minimum,” said Elisabeth Bik, a microbiologist and image expert who described Dr. Yoon’s papers as showing a worrisome pattern of copied or doctored data. “There’s no oversight.”

Memorial Sloan Kettering, where portions of the stomach cancer research were done, said no one — not the journal nor the researchers — had ever told administrators that the paper was withdrawn or why it had been. The study said it was supported in part by federal funding given to the cancer center.

Dr. Yoon, a stomach cancer specialist and a proponent of robotic surgery, kept climbing the academic ranks, bringing his junior researcher along with him. In September 2021, around the time the study was published, he joined Columbia, which celebrated his prolific research output in a news release. His work was financed in part by half a million dollars in federal research money that year, adding to a career haul of nearly $5 million in federal funds.

The decision by the stomach cancer study’s publisher, Elsevier, not to post an explanation for the paper’s removal made it less likely that the episode would draw public attention or affect the duo’s work. That very study continued to be cited in papers by other scientists.

And as recently as last year, Dr. Yoon’s lab published more studies containing identical images that were said to depict separate experiments, according to Dr. David’s analyses.

The researchers’ suspicious publications stretch back 16 years. Over time, relatively minor image copies in papers by Dr. Yoon gave way to more serious discrepancies in studies he collaborated on with Changhwan Yoon, Dr. David said. The pair, who are not related, began publishing articles together around 2013.

But neither their employers nor their publishers seemed to start investigating their work until this past fall, when Dr. David published his initial findings on For Better Science, a blog, and notified Memorial Sloan Kettering, Columbia and the journals. Memorial Sloan Kettering said it began its investigation then.

None of those flagged studies was retracted until last week. Three days after The Times asked publishers about the allegations, the journal Oncotarget retracted a 2016 study on combating certain pernicious cancers. In a retraction notice, the journal said the authors’ explanations for copied images “were deemed unacceptable.”

The belated action was symptomatic of what experts described as a broken system for policing scientific research.

A proliferation of medical journals, they said, has helped fuel demand for ever more research articles. But those same journals, many of them operated by multibillion-dollar publishing companies, often respond slowly or do nothing at all once one of those articles is shown to contain copied data. Journals retract papers at a fraction of the rate at which they publish ones with problems.

Springer Nature, which published nine of the articles that Dr. David said contained discrepancies across five journals, said it was investigating concerns. So did the American Association for Cancer Research, which published 10 articles under question from Dr. Yoon’s lab across four journals.

It is difficult to know who is responsible for errors in articles. Eleven of the scientists’ co-authors, including researchers at Harvard, Duke and Georgetown, did not answer emailed inquiries.

The articles under question examined why certain stomach and soft-tissue cancers withstood treatment, and how that resistance could be overcome.

The two independent image specialists said the volume of copied data, along with signs that some images had been rotated or similarly manipulated, suggested considerable sloppiness or worse.

“There are examples in this set that raise pretty serious red flags for the possibility of misconduct,” said Dr. Matthew Schrag, a Vanderbilt University neurologist who commented as part of his outside work on research integrity.

One set of 10 articles identified by Dr. David showed repeated reuse of identical or overlapping black-and-white images of cancer cells supposedly under different experimental conditions, he said.

“There’s no reason to have done that unless you weren’t doing the work,” Dr. David said.

One of those papers, published in 2012, was formally tagged with corrections. Unlike later studies, which were largely overseen by Dr. Yoon in New York, this paper was written by South Korea-based scientists, including Changhwan Yoon, who then worked in Seoul.

An immunologist in Norway randomly selected the paper as part of a screening of copied data in cancer journals. That led the paper’s publisher, the medical journal Oncogene, to add corrections in 2016.

But the journal did not catch all of the duplicated data, Dr. David said. And, he said, images from the study later turned up in identical form in another paper that remains uncorrected.

Copied cancer data kept recurring, Dr. David said. A picture of a small red tumor from a 2017 study reappeared in papers in 2020 and 2021 under different descriptions, he said. A ruler included in the pictures for scale wound up in two different positions.

The 2020 study included another tumor image that Dr. David said appeared to be a mirror image of one previously published by Dr. Yoon’s lab. And the 2021 study featured a color version of a tumor that had appeared in an earlier paper atop a different section of ruler, Dr. David said.

“This is another example where this looks intentionally done,” Dr. Bik said.

The researchers were faced with more serious action when the publisher Elsevier withdrew the stomach cancer study that had been published online in 2021. “The editors determined that the article violated journal publishing ethics guidelines,” Elsevier said.

Roland Herzog, the editor of Molecular Therapy, the journal where the article appeared, said that “image duplications were noticed” as part of a process of screening for discrepancies that the journal has since continued to beef up.

Because the problems were detected before the study was ever published in the print journal, Elsevier’s policy dictated that the article be taken down and no explanation posted online.

But that decision appeared to conflict with industry guidelines from the Committee on Publication Ethics. Posting articles online “usually constitutes publication,” those guidelines state. And when publishers pull such articles, the guidelines say, they should keep the work online for the sake of transparency and post “a clear notice of retraction.”

Dr. Herzog said he personally hoped that such an explanation could still be posted for the stomach cancer study. The journal editors and Elsevier, he said, are examining possible options.

The editors notified Dr. Yoon and Changhwan Yoon of the article’s removal, but neither scientist alerted Memorial Sloan Kettering, the hospital said. Columbia did not say whether it had been told.

Experts said the handling of the article was symptomatic of a tendency on the part of scientific publishers to obscure reports of lapses.

“This is typical, sweeping-things-under-the-rug kind of nonsense,” said Dr. Ivan Oransky, co-founder of Retraction Watch, which keeps a database of 47,000-plus retracted papers. “This is not good for the scientific record, to put it mildly.”

Susan C. Beachy contributed research.

Benjamin Mueller reports on health and medicine. He was previously a U.K. correspondent in London and a police reporter in New York.



Critical Writing Program: Decision Making - Spring 2024: Researching the White Paper


Researching the White Paper

The process of researching and composing a white paper shares some similarities with the kind of research and writing one does for a high school or college research paper. What’s important for writers of white papers to grasp, however, is how much this genre differs from a research paper.  First, the author of a white paper already recognizes that there is a problem to be solved, a decision to be made, and the job of the author is to provide readers with substantive information to help them make some kind of decision--which may include a decision to do more research because major gaps remain. 

Thus, a white paper author would not “brainstorm” a topic. Instead, the white paper author would get busy figuring out how the problem is defined by those who are experiencing it as a problem. Typically that research begins in popular culture--social media, surveys, interviews, newspapers. Once the author has a handle on how the problem is being defined and experienced, its history and its impact, what people in the trenches believe might be the best or worst ways of addressing it, the author then will turn to academic scholarship as well as “grey” literature (more about that later).  Unlike a school research paper, the author does not set out to argue for or against a particular position, and then devote the majority of effort to finding sources to support the selected position.  Instead, the author sets out in good faith to do as much fact-finding as possible, and thus research is likely to present multiple, conflicting, and overlapping perspectives. When people research out of a genuine desire to understand and solve a problem, they listen to every source that may offer helpful information. They will thus have to do much more analysis, synthesis, and sorting of that information, which will often not fall neatly into a “pro” or “con” camp:  Solution A may, for example, solve one part of the problem but exacerbate another part of the problem. Solution C may sound like what everyone wants, but what if it’s built on a set of data that have been criticized by another reliable source?  And so it goes. 

For example, if you are trying to write a white paper on the opioid crisis, you may focus on the value of  providing free, sterilized needles--which do indeed reduce disease, and also provide an opportunity for the health care provider distributing them to offer addiction treatment to the user. However, the free needles are sometimes discarded on the ground, posing a danger to others; or they may be shared; or they may encourage more drug usage. All of those things can be true at once; a reader will want to know about all of these considerations in order to make an informed decision. That is the challenging job of the white paper author.     
The research you do for your white paper will require that you identify a specific problem and seek popular culture sources to help define the problem, its history, and its significance and impact for the people affected by it. You will then delve into academic and grey literature to learn how scholars and others with professional expertise answer these same questions. In this way, you will create a layered, complex portrait that provides readers with a substantive exploration useful for deliberating and decision-making. You will also likely need to find or create images, including tables, figures, illustrations or photographs, and you will document all of your sources.


  • Last Updated: Feb 15, 2024 12:28 PM
  • URL: https://guides.library.upenn.edu/spring2024/decision-making


  • Open access
  • Published: 19 February 2024

Genomic data in the All of Us Research Program

The All of Us Research Program Genomics Investigators

Nature (2024)

6,878 Accesses | 452 Altmetric

Subjects:

  • Genetic variation
  • Genome-wide association studies

Comprehensively mapping the genetic basis of human disease across diverse individuals is a long-standing goal for the field of human genetics 1 , 2 , 3 , 4 . The All of Us Research Program is a longitudinal cohort study aiming to enrol a diverse group of at least one million individuals across the USA to accelerate biomedical research and improve human health 5 , 6 . Here we describe the programme’s genomics data release of 245,388 clinical-grade genome sequences. This resource is unique in its diversity as 77% of participants are from communities that are historically under-represented in biomedical research and 46% are individuals from under-represented racial and ethnic minorities. All of Us identified more than 1 billion genetic variants, including more than 275 million previously unreported genetic variants, more than 3.9 million of which had coding consequences. Leveraging linkage between genomic data and the longitudinal electronic health record, we evaluated 3,724 genetic variants associated with 117 diseases and found high replication rates across both participants of European ancestry and participants of African ancestry. Summary-level data are publicly available, and individual-level data can be accessed by researchers through the All of Us Researcher Workbench using a unique data passport model with a median time from initial researcher registration to data access of 29 hours. We anticipate that this diverse dataset will advance the promise of genomic medicine for all.

Comprehensively identifying genetic variation and cataloguing its contribution to health and disease, in conjunction with environmental and lifestyle factors, is a central goal of human health research 1 , 2 . A key limitation in efforts to build this catalogue has been the historic under-representation of large subsets of individuals in biomedical research including individuals from diverse ancestries, individuals with disabilities and individuals from disadvantaged backgrounds 3 , 4 . The All of Us Research Program (All of Us) aims to address this gap by enrolling and collecting comprehensive health data on at least one million individuals who reflect the diversity across the USA 5 , 6 . An essential component of All of Us is the generation of whole-genome sequence (WGS) and genotyping data on one million participants. All of Us is committed to making this dataset broadly useful—not only by democratizing access to it across the scientific community but also by returning value to the participants themselves, providing individual DNA results, such as genetic ancestry, hereditary disease risk and pharmacogenetics according to clinical standards, to those who wish to receive these research results.

Here we describe the release of WGS data from 245,388 All of Us participants and demonstrate the impact of this high-quality data in genetic and health studies. We carried out a series of data harmonization and quality control (QC) procedures and conducted analyses characterizing the properties of the dataset including genetic ancestry and relatedness. We validated the data by replicating well-established genotype–phenotype associations including low-density lipoprotein cholesterol (LDL-C) and 117 additional diseases. These data are available through the All of Us Researcher Workbench, a cloud platform that embodies and enables programme priorities, facilitating equitable data and compute access while ensuring responsible conduct of research and protecting participant privacy through a passport data access model.

The All of Us Research Program

To accelerate health research, All of Us is committed to curating and releasing research data early and often 6 . Less than five years after national enrolment began in 2018, this fifth data release includes data from more than 413,000 All of Us participants. Summary data are made available through a public Data Browser, and individual-level participant data are made available to researchers through the Researcher Workbench (Fig. 1a and Data availability).

Figure 1

a , The All of Us Research Hub contains a publicly accessible Data Browser for exploration of summary phenotypic and genomic data. The Researcher Workbench is a secure cloud-based environment of participant-level data in a Controlled Tier that is widely accessible to researchers. b , All of Us participants have rich phenotype data from a combination of physical measurements, survey responses, EHRs, wearables and genomic data. Dots indicate the presence of the specific data type for the given number of participants. c , Overall summary of participants under-represented in biomedical research (UBR) with data available in the Controlled Tier. The All of Us logo in a is reproduced with permission of the National Institutes of Health’s All of Us Research Program.

Participant data include a rich combination of phenotypic and genomic data (Fig. 1b ). Participants are asked to complete consent for research use of data, sharing of electronic health records (EHRs), donation of biospecimens (blood or saliva, and urine), in-person provision of physical measurements (height, weight and blood pressure) and surveys initially covering demographics, lifestyle and overall health 7 . Participants are also consented for recontact. EHR data, harmonized using the Observational Medical Outcomes Partnership Common Data Model 8 ( Methods ), are available for more than 287,000 participants (69.42%) from more than 50 health care provider organizations. The EHR dataset is longitudinal, with a quarter of participants having 10 years of EHR data (Extended Data Fig. 1 ). Data include 245,388 WGSs and genome-wide genotyping on 312,925 participants. Sequenced and genotyped individuals in this data release were not prioritized on the basis of any clinical or phenotypic feature. Notably, 99% of participants with WGS data also have survey data and physical measurements, and 84% also have EHR data. In this data release, 77% of individuals with genomic data identify with groups historically under-represented in biomedical research, including 46% who self-identify with a racial or ethnic minority group (Fig. 1c , Supplementary Table 1 and Supplementary Note ).

Scaling the All of Us infrastructure

The genomic dataset generated from All of Us participants is a resource for research and discovery and serves as the basis for return of individual health-related DNA results to participants. Consequently, the US Food and Drug Administration determined that All of Us met the criteria for a significant risk device study. As such, the entire All of Us genomics effort from sample acquisition to sequencing meets clinical laboratory standards 9 .

All of Us participants were recruited through a national network of partners, starting in 2018, as previously described 5 . Participants may enrol through All of Us-funded health care provider organizations or direct volunteer pathways, and all biospecimens, including blood and saliva, are sent to the central All of Us Biobank for processing and storage. Genomics data for this release were generated from blood-derived DNA. The programme began return of actionable genomic results in December 2022. As of April 2023, approximately 51,000 individuals were sent notifications asking whether they wanted to view their results, and approximately half have accepted. Return continues on an ongoing basis.

The All of Us Data and Research Center maintains all participant information and biospecimen ID linkage to ensure participant confidentiality; coded identifiers (participant and aliquot level) are used to track each sample through the All of Us genomics workflow. This workflow facilitates weekly automated aliquot and plating requests to the Biobank, supplies relevant metadata for the sample shipments to the Genome Centers, and contains a feedback loop to inform action on samples that fail QC at any stage. Further, the consent status of each participant is checked before sample shipment to confirm that they are still active. Although all participants with genomic data are consented for the same general research use category, the programme accommodates different preferences for the return of genomic data to participants, and only data for those individuals who have consented for return of individual health-related DNA results are distributed to the All of Us Clinical Validation Labs for further evaluation and health-related clinical reporting. All participants in All of Us who choose to receive health-related DNA results have the option to schedule a genetic counselling appointment to discuss their results. Individuals with positive findings who choose to obtain results are required to schedule an appointment with a genetic counsellor to receive those findings.

Genome sequencing

To satisfy the requirements for clinical accuracy, precision and consistency across DNA sample extraction and sequencing, the All of Us Genome Centers and Biobank harmonized laboratory protocols, established standard QC methodologies and metrics, and conducted a series of validation experiments using previously characterized clinical samples and commercially available reference standards 9 . Briefly, PCR-free barcoded WGS libraries were constructed with the Illumina Kapa HyperPrep kit. Libraries were pooled and sequenced on the Illumina NovaSeq 6000 instrument. After demultiplexing, initial QC analysis is performed with the Illumina DRAGEN pipeline (Supplementary Table 2 ) leveraging lane, library, flow cell, barcode and sample level metrics as well as assessing contamination, mapping quality and concordance to genotyping array data independently processed from a different aliquot of DNA. The Genome Centers use these metrics to determine whether each sample meets programme specifications and then submits sequencing data to the Data and Research Center for further QC, joint calling and distribution to the research community ( Methods ).

This effort to harmonize sequencing methods, multi-level QC and use of identical data processing protocols mitigated the variability in sequencing location and protocols that often leads to batch effects in large genomic datasets 9 . As a result, the data are not only of clinical-grade quality, but also consistent in coverage (≥30× mean) and uniformity across Genome Centers (Supplementary Figs. 1 – 5 ).

Joint calling and variant discovery

We carried out joint calling across the entire All of Us WGS dataset (Extended Data Fig. 2 ). Joint calling leverages information across samples to prune artefact variants, which increases sensitivity, and enables flagging samples with potential issues that were missed during single-sample QC 10 (Supplementary Table 3 ). Scaling conventional approaches to whole-genome joint calling beyond 50,000 individuals is a notable computational challenge 11 , 12 . To address this, we developed a new cloud variant storage solution, the Genomic Variant Store (GVS), built on a schema designed for querying and rendering variants: variants are stored in GVS and rendered on demand to an analysable variant file, rather than the variant file itself serving as the primary storage mechanism (Code availability). We carried out QC on the joint call set on the basis of the approach developed for gnomAD 3.1 (ref.  13 ). This included flagging samples with outlying values in eight metrics (Supplementary Table 4 , Supplementary Fig. 2 and Methods ).

To calculate the sensitivity and precision of the joint call dataset, we included four well-characterized samples. We sequenced the National Institute of Standards and Technology reference materials (DNA samples) from the Genome in a Bottle consortium 13 and carried out variant calling as described above. We used the corresponding published set of variant calls for each sample as the ground truth in our sensitivity and precision calculations 14 . The overall sensitivity for single-nucleotide variants was over 98.7% and precision was more than 99.9%. For short insertions or deletions, the sensitivity was over 97% and precision was more than 99.6% (Supplementary Table 5 and Methods ).

The joint call set included more than 1 billion genetic variants. We annotated the joint call dataset on the basis of functional annotation (for example, gene symbol and protein change) using Illumina Nirvana 15 . We defined coding variants as those inducing an amino acid change on a canonical ENSEMBL transcript and found 272,051,104 non-coding and 3,913,722 coding variants that have not been described previously in dbSNP 16 v153 (Extended Data Table 1 ). A total of 3,912,832 (99.98%) of the coding variants are rare (allelic frequency < 0.01) and the remaining 883 (0.02%) are common (allelic frequency > 0.01). Of the coding variants, 454 (0.01%) are common in one or more of the non-European computed ancestries in All of Us, rare among participants of European ancestry, and have an allelic number greater than 1,000 (Extended Data Table 2 and Extended Data Fig. 3 ). The distributions of pathogenic, or likely pathogenic, ClinVar variant counts per participant, stratified by computed ancestry, filtered to only those variants that are found in individuals with an allele count of <40 are shown in Extended Data Fig. 4 . The potential medical implications of these known and new variants with respect to variant pathogenicity by ancestry are highlighted in a companion paper 17 . In particular, we find that the European ancestry subset has the highest rate of pathogenic variation (2.1%), which was twice the rate of pathogenic variation in individuals of East Asian ancestry 17 . The lower frequency of variants in East Asian individuals may be partially explained by the fact that the sample size in that group is small and there may be knowledge bias in the variant databases that is reducing the number of findings in some of the less-studied ancestry groups.
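
To make the allele-frequency bookkeeping above concrete, the sketch below (not programme code; the per-ancestry record layout is an assumption) classifies a coding variant as rare or common and flags the case of a variant that is common in at least one non-European computed ancestry, rare among participants of European ancestry and with an allele number greater than 1,000.

```python
# Minimal sketch of the allele-frequency classification described above.
# The per-ancestry dictionaries are an assumed record layout for illustration.

def classify_coding_variant(af: dict[str, float], an: dict[str, int]) -> tuple[str, bool]:
    status = "rare" if af["ALL"] < 0.01 else "common"
    # Flag variants that are common in at least one non-European computed ancestry,
    # rare among participants of European ancestry, with allele number > 1,000.
    flagged = af.get("eur", 1.0) < 0.01 and any(
        anc not in ("eur", "ALL") and freq >= 0.01 and an.get(anc, 0) > 1000
        for anc, freq in af.items()
    )
    return status, flagged

print(classify_coding_variant(
    af={"ALL": 0.004, "eur": 0.0002, "afr": 0.015},   # illustrative frequencies
    an={"afr": 60_000, "eur": 250_000},               # illustrative allele numbers
))  # -> ('rare', True)
```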

Genetic ancestry and relatedness

Genetic ancestry inference confirmed that 51.1% of the All of Us WGS dataset is derived from individuals of non-European ancestry. Briefly, the ancestry categories are based on the same labels used in gnomAD 18 . We trained a classifier on a 16-dimensional principal component analysis (PCA) space of a diverse reference based on 3,202 samples and 151,159 autosomal single-nucleotide polymorphisms. We projected the All of Us samples into the PCA space of the training data, based on the same single-nucleotide polymorphisms from the WGS data, and generated categorical ancestry predictions from the trained classifier ( Methods ). Continuous genetic ancestry fractions for All of Us samples were inferred using the same PCA data, and participants’ patterns of ancestry and admixture were compared to their self-identified race and ethnicity (Fig. 2 and Methods ). Continuous ancestry inference carried out using genome-wide genotypes yields highly concordant estimates.

figure 2

a , b , Uniform manifold approximation and projection (UMAP) representations of All of Us WGS PCA data with self-described race ( a ) and ethnicity ( b ) labels. c , Proportion of genetic ancestry per individual in six distinct and coherent ancestry groups defined by Human Genome Diversity Project and 1000 Genomes samples.

Kinship estimation confirmed that All of Us WGS data consist largely of unrelated individuals with about 85% (215,107) having no first- or second-degree relatives in the dataset (Supplementary Fig. 6 ). As many genomic analyses leverage unrelated individuals, among the individuals with first- or second-degree relatives we identified the smallest set of samples whose removal leaves no related pairs, retaining one individual from each kindred. This procedure yielded a maximal independent set of 231,442 individuals (about 94%) with genome sequence data in the current release ( Methods ).

Genetic determinants of LDL-C

As a measure of data quality and utility, we carried out a single-variant genome-wide association study (GWAS) for LDL-C, a trait with well-established genomic architecture ( Methods ). Of the 245,388 WGS participants, 91,749 had one or more LDL-C measurements. The All of Us LDL-C GWAS identified 20 well-established genome-wide significant loci, with minimal genomic inflation (Fig. 3 , Extended Data Table 3 and Supplementary Fig. 7 ). We compared the results to those of a recent multi-ethnic LDL-C GWAS in the National Heart, Lung, and Blood Institute (NHLBI) TOPMed study that included 66,329 ancestrally diverse (56% non-European ancestry) individuals 19 . We found a strong correlation between the effect estimates for NHLBI TOPMed genome-wide significant loci and those of All of Us ( R 2  = 0.98, P  < 1.61 × 10 −45 ; Fig. 3 , inset). Notably, the per-locus effect sizes observed in All of Us are decreased compared to those in TOPMed, which is in part due to differences in the underlying statistical model, differences in the ancestral composition of these datasets and differences in laboratory value ascertainment between EHR-derived data and epidemiology studies. A companion manuscript extended this work to identify common and rare genetic associations for three diseases (atrial fibrillation, coronary artery disease and type 2 diabetes) and two quantitative traits (height and LDL-C) in the All of Us dataset and identified very high concordance with previous efforts across all of these diseases and traits 20 .

figure 3

Manhattan plot demonstrating robust replication of 20 well-established LDL-C genetic loci among 91,749 individuals with 1 or more LDL-C measurements. The red horizontal line denotes the genome-wide significance threshold of P = 5 × 10 −8 . Inset, effect estimate ( β ) comparison between NHLBI TOPMed LDL-C GWAS ( x  axis) and All of Us LDL-C GWAS ( y  axis) for the subset of 194 independent variants clumped (window 250 kb, r 2  = 0.5) that reached genome-wide significance in NHLBI TOPMed.

Genotype-by-phenotype associations

As another measure of data quality and utility, we tested replication rates of previously reported phenotype–genotype associations in the five predicted genetic ancestry populations present in the Phenotype/Genotype Reference Map (PGRM): AFR, African ancestry; AMR, Latino/admixed American ancestry; EAS, East Asian ancestry; EUR, European ancestry; SAS, South Asian ancestry. The PGRM contains published associations in the GWAS catalogue in these ancestry populations that map to International Classification of Diseases-based phenotype codes 21 . This replication study specifically looked across 4,947 variants, calculating replication rates for powered associations in each ancestry population. The overall replication rates for associations powered at 80% were: 72.0% (18/25) in AFR, 100% (13/13) in AMR, 46.6% (7/15) in EAS, 74.9% (1,064/1,421) in EUR, and 100% (1/1) in SAS. With the exception of the EAS ancestry results, these powered replication rates are comparable to those of the published PGRM analysis, where the replication rates of several single-site EHR-linked biobanks ranged from 76% to 85%. These results demonstrate the utility of the data, highlight opportunities for further work to understand the specifics of the All of Us population and the potential contribution of gene–environment interactions to genotype–phenotype mapping, and motivate the development of methods for multi-site EHR phenotype data extraction, harmonization and genetic association studies.
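
As a worked illustration of the replication-rate arithmetic, the sketch below recomputes the per-ancestry rates from the replicated/powered counts quoted above; it is not the analysis code used by the programme.

```python
# Minimal sketch: per-ancestry replication rates for associations with >= 80% power.
# The (replicated, powered) counts are taken from the text above.

def replication_rate(replicated: int, powered: int) -> float:
    return replicated / powered if powered else float("nan")

powered_results = {
    "AFR": (18, 25), "AMR": (13, 13), "EAS": (7, 15), "EUR": (1064, 1421), "SAS": (1, 1),
}
for ancestry, (rep, pwr) in powered_results.items():
    print(f"{ancestry}: {replication_rate(rep, pwr):.1%} ({rep}/{pwr})")
```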

More broadly, the All of Us resource highlights the opportunities to identify genotype–phenotype associations that differ across diverse populations 22 . For example, the Duffy blood group locus ( ACKR1 ) is more prevalent in individuals of AFR ancestry and individuals of AMR ancestry than in individuals of EUR ancestry. Although the phenome-wide association study of this locus highlights the well-established association of the Duffy blood group with lower white blood cell counts both in individuals of AFR and AMR ancestry 23 , 24 , it also revealed genetic-ancestry-specific phenotype patterns, with minimal phenotypic associations in individuals of EAS ancestry and individuals of EUR ancestry (Fig. 4 and Extended Data Table 4 ). Conversely, rs9273363 in the HLA-DQB1 locus is associated with increased risk of type 1 diabetes 25 , 26 and diabetic complications across ancestries, but only associates with increased risk of coeliac disease in individuals of EUR ancestry (Extended Data Fig. 5 ). Similarly, the TCF7L2 locus 27 strongly associates with increased risk of type 2 diabetes and associated complications across several ancestries (Extended Data Fig. 6 ). Association testing results are available in Supplementary Dataset 1 .

figure 4

Results of genetic-ancestry-stratified phenome-wide association analysis among unrelated individuals highlighting ancestry-specific disease associations across the four most common genetic ancestries of participants. Bonferroni-adjusted phenome-wide significance threshold (<2.88 × 10 −5 ) is plotted as a red horizontal line. AFR ( n  = 34,037, minor allele fraction (MAF) 0.82); AMR ( n  = 28,901, MAF 0.10); EAS ( n  = 32,55, MAF 0.003); EUR ( n  = 101,613, MAF 0.007).

The cloud-based Researcher Workbench

All of Us genomic data are available in a secure, access-controlled cloud-based analysis environment: the All of Us Researcher Workbench. Unlike traditional data access models that require per-project approval, access in the Researcher Workbench is governed by a data passport model based on a researcher’s authenticated identity, institutional affiliation, and completion of self-service training and compliance attestation 28 . After gaining access, a researcher may create a new workspace at any time to conduct a study, provided that they comply with all Data Use Policies and self-declare their research purpose. This information is regularly audited and made accessible publicly on the All of Us Research Projects Directory. This streamlined access model is guided by the principles that: participants are research partners and maintaining their privacy and data security is paramount; their data should be made as accessible as possible for authorized researchers; and we should continually seek to remove unnecessary barriers to accessing and using All of Us data.

For researchers at institutions with an existing institutional data use agreement, access can be gained as soon as they complete the required verification and compliance steps. As of August 2023, 556 institutions have agreements in place, allowing more than 5,000 approved researchers to actively work on more than 4,400 projects. The median time for a researcher from initial registration to completion of these requirements is 28.6 h (10th percentile: 48 min, 90th percentile: 14.9 days), a fraction of the weeks to months it can take to assemble a project-specific application and have it reviewed by an access board with conventional access models.

Given that the size of the project’s phenotypic and genomic dataset is expected to reach 4.75 PB in 2023, the use of a central data store and cloud analysis tools will save funders an estimated US$16.5 million per year when compared to the typical approach of allowing researchers to download genomic data. Storing one copy per institution of this data at 556 registered institutions would cost about US$1.16 billion per year. By contrast, storing a central cloud copy costs about US$1.14 million per year, a 99.9% saving. Importantly, cloud infrastructure also democratizes data access particularly for researchers who do not have high-performance local compute resources.

Here we present the All of Us Research Program’s approach to generating diverse clinical-grade genomic data at an unprecedented scale. We present the data release of about 245,000 genome sequences as part of a scalable framework that will grow to include genetic information and health data for one million or more people living across the USA. Our observations permit several conclusions.

First, the All of Us programme is making a notable contribution to improving the study of human biology through purposeful inclusion of under-represented individuals at scale 29 , 30 . Of the participants with genomic data in All of Us, 45.92% self-identified as a non-European race or ethnicity. This diversity enabled identification of more than 275 million new genetic variants across the dataset not previously captured by other large-scale genome aggregation efforts with diverse participants that have submitted variation to dbSNP v153, such as NHLBI TOPMed 31 freeze 8 (Extended Data Table 1 ). In contrast to gnomAD, All of Us permits individual-level genotype access with detailed phenotype data for all participants. Furthermore, unlike many genomics resources, All of Us is uniformly consented for general research use and enables researchers to go from initial account creation to individual-level data access in as little as a few hours. The All of Us cohort is significantly more diverse than those of other large contemporary research studies generating WGS data 32 , 33 . This enables a more equitable future for precision medicine (for example, through constructing polygenic risk scores that are appropriately calibrated to diverse populations 34 , 35 as the eMERGE programme has done leveraging All of Us data 36 , 37 ). Developing new tools and regulatory frameworks to enable analyses across multiple biobanks in the cloud to harness the unique strengths of each is an active area of investigation addressed in a companion paper to this work 38 .

Second, the All of Us Researcher Workbench embodies the programme’s design philosophy of open science, reproducible research, equitable access and transparency to researchers and to research participants 26 . Importantly, for research studies, no group of data users should have privileged access to All of Us resources based on anything other than data protection criteria. Although the All of Us Researcher Workbench initially targeted onboarding US academic, health care and non-profit organizations, it has recently expanded to international researchers. We anticipate further genomic and phenotypic data releases at regular intervals with data available to all researcher communities. We also anticipate additional derived data and functionality to be made available, such as reference data, structural variants and a service for array imputation using the All of Us genomic data.

Third, All of Us enables studying human biology at an unprecedented scale. The programmatic goal of sequencing one million or more genomes has required harnessing the output of multiple sequencing centres. Previous work has focused on achieving functional equivalence in data processing and joint calling pipelines 39 . To achieve clinical-grade data equivalence, All of Us required protocol equivalence at both sequencing production level and data processing across the sequencing centres. Furthermore, previous work has demonstrated the value of joint calling at scale 10 , 18 . The new GVS framework developed by the All of Us programme enables joint calling at extreme scales (Code availability). Finally, the provision of data access through cloud-native tools enables scalable and secure access and analysis to researchers while simultaneously enabling the trust of research participants and transparency underlying the All of Us data passport access model.

The clinical-grade sequencing carried out by All of Us enables not only research, but also the return of value to participants through clinically relevant genetic results and health-related traits to those who opt in to receiving this information. In the years ahead, we anticipate that this partnership with All of Us participants will enable researchers to move beyond large-scale genomic discovery to understanding the consequences of implementing genomic medicine at scale.

The All of Us cohort

All of Us aims to engage a longitudinal cohort of one million or more US participants, with a focus on including populations that have historically been under-represented in biomedical research. Details of the All of Us cohort have been described previously 5 . Briefly, the primary objective is to build a robust research resource that can facilitate the exploration of biological, clinical, social and environmental determinants of health and disease. The programme will collect and curate health-related data and biospecimens, and these data and biospecimens will be made broadly available for research uses. Health data are obtained through the electronic medical record and through participant surveys. Survey templates can be found on our public website: https://www.researchallofus.org/data-tools/survey-explorer/ . Adults 18 years and older who have the capacity to consent and reside in the USA or a US territory at present are eligible. Informed consent for all participants is conducted in person or through an eConsent platform that includes primary consent, HIPAA Authorization for Research use of EHRs and other external health data, and Consent for Return of Genomic Results. The protocol was reviewed by the Institutional Review Board (IRB) of the All of Us Research Program. The All of Us IRB follows the regulations and guidance of the NIH Office for Human Research Protections for all studies, ensuring that the rights and welfare of research participants are overseen and protected uniformly.

Data accessibility through a ‘data passport’

Authorization for access to participant-level data in All of Us is based on a ‘data passport’ model, through which authorized researchers do not need IRB review for each research project. The data passport is required for gaining data access to the Researcher Workbench and for creating workspaces to carry out research projects using All of Us data. At present, data passports are authorized through a six-step process that includes affiliation with an institution that has signed a Data Use and Registration Agreement, account creation, identity verification, completion of ethics training, and attestation to a data user code of conduct. Reported results follow the All of Us Data and Statistics Dissemination Policy, which disallows disclosure of group counts under 20 without prior approval, to protect participant privacy 40 .

At present, All of Us gathers EHR data from about 50 health care organizations that are funded to recruit and enrol participants as well as transfer EHR data for those participants who have consented to provide them. Data stewards at each provider organization harmonize their local data to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, and then submit it to the All of Us Data and Research Center (DRC) so that it can be linked with other participant data and further curated for research use. OMOP is a common data model standardizing health information from disparate EHRs to common vocabularies and organized into tables according to data domains. EHR data are updated from the recruitment sites and sent to the DRC quarterly. Updated data releases to the research community occur approximately once a year. Supplementary Table 6 outlines the OMOP concepts collected by the DRC quarterly from the recruitment sites.

Biospecimen collection and processing

Participants who consented to participate in All of Us donated fresh whole blood (4 ml EDTA and 10 ml EDTA) as a primary source of DNA. The All of Us Biobank managed by the Mayo Clinic extracted DNA from 4 ml EDTA whole blood, and DNA was stored at −80 °C at an average concentration of 150 ng µl −1 . The buffy coat isolated from 10 ml EDTA whole blood has been used for extracting DNA in the case of initial extraction failure or absence of 4 ml EDTA whole blood. The Biobank plated 2.4 µg DNA with a concentration of 60 ng µl −1 in duplicate for array and WGS samples. The samples are distributed to All of Us Genome Centers weekly, and a negative (empty well) control and National Institute of Standards and Technology controls are incorporated every two months for QC purposes.

Genome Center sample receipt, accession and QC

On receipt of DNA sample shipments, the All of Us Genome Centers carry out an inspection of the packaging and sample containers to ensure that sample integrity has not been compromised during transport and to verify that the sample containers correspond to the shipping manifest. QC of the submitted samples also includes DNA quantification, using routine procedures to confirm volume and concentration (Supplementary Table 7 ). Any issues or discrepancies are recorded, and affected samples are put on hold until resolved. Samples that meet quality thresholds are accessioned in the Laboratory Information Management System, and sample aliquots are prepared for library construction processing (for example, normalized with respect to concentration and volume).

WGS library construction, sequencing and primary data QC

The DNA sample is first sheared using a Covaris sonicator and is then size-selected using AMPure XP beads to restrict the range of library insert sizes. Using the PCR Free Kapa HyperPrep library construction kit, enzymatic steps are completed to repair the jagged ends of DNA fragments, add proper A-base segments, and ligate indexed adapter barcode sequences onto samples. Excess adaptors are removed using AMPure XP beads for a final clean-up. Libraries are quantified using quantitative PCR with the Illumina Kapa DNA Quantification Kit and then normalized and pooled for sequencing (Supplementary Table 7 ).

Pooled libraries are loaded on the Illumina NovaSeq 6000 instrument. The data from the initial sequencing run are used to QC individual libraries and to remove non-conforming samples from the pipeline. The data are also used to calibrate the pooling volume of each individual library and re-pool the libraries for additional NovaSeq sequencing to reach an average coverage of 30×.

After demultiplexing, WGS analysis occurs on the Illumina DRAGEN platform. The DRAGEN pipeline consists of highly optimized algorithms for mapping, aligning, sorting, duplicate marking and haplotype variant calling and makes use of platform features such as compression and BCL conversion. Alignment uses the GRCh38dh reference genome. QC data are collected at every stage of the analysis protocol, providing high-resolution metrics required to ensure data consistency for large-scale multiplexing. The DRAGEN pipeline produces a large number of metrics that cover lane, library, flow cell, barcode and sample-level metrics for all runs as well as assessing contamination and mapping quality. The All of Us Genome Centers use these metrics to determine pass or fail for each sample before submitting the CRAM files to the All of Us DRC. For mapping and variant calling, all Genome Centers have harmonized on a set of DRAGEN parameters, which ensures consistency in processing (Supplementary Table 2 ).

Every step through the WGS procedure is rigorously controlled by predefined QC measures. Various control mechanisms and acceptance criteria were established during WGS assay validation. Specific metrics for reviewing and releasing genome data are: mean coverage (threshold of ≥30×), genome coverage (threshold of ≥90% at 20×), coverage of hereditary disease risk genes (threshold of ≥95% at 20×), aligned Q30 bases (threshold of ≥8 × 10 10 ), contamination (threshold of ≤1%) and concordance to independently processed array data.
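
A minimal sketch of this release gate is shown below; the metric field names are assumptions, and the concordance check against independently processed array data is omitted for brevity.

```python
# Minimal sketch (not the production LIMS logic): check a sample's WGS metrics
# against the release thresholds listed above. Field names are assumptions.

RELEASE_THRESHOLDS = {
    "mean_coverage": 30.0,            # >= 30x mean coverage
    "genome_coverage_20x": 0.90,      # >= 90% of the genome covered at 20x
    "hdr_gene_coverage_20x": 0.95,    # >= 95% of hereditary disease risk genes at 20x
    "aligned_q30_bases": 8e10,        # >= 8 x 10^10 aligned Q30 bases
}
MAX_CONTAMINATION = 0.01              # <= 1%

def passes_release_qc(metrics: dict) -> bool:
    lower_bounds_ok = all(metrics[name] >= cutoff for name, cutoff in RELEASE_THRESHOLDS.items())
    return lower_bounds_ok and metrics["contamination"] <= MAX_CONTAMINATION

sample = {"mean_coverage": 34.2, "genome_coverage_20x": 0.93,
          "hdr_gene_coverage_20x": 0.97, "aligned_q30_bases": 9.1e10,
          "contamination": 0.002}
print(passes_release_qc(sample))  # True for this illustrative sample
```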

Array genotyping

Samples are processed for genotyping at three All of Us Genome Centers (Broad, Johns Hopkins University and University of Washington). DNA samples are received from the Biobank and the process is facilitated by the All of Us genomics workflow described above. All three centres used an identical array product, scanners, resource files and genotype calling software for array processing to reduce batch effects. Each centre has its own Laboratory Information Management System that manages workflow control, sample and reagent tracking, and centre-specific liquid handling robotics.

Samples are processed using the Illumina Global Diversity Array (GDA) with Illumina Infinium LCG chemistry using the automated protocol and scanned on Illumina iSCANs with Automated Array Loaders. Illumina IAAP software converts raw data (IDAT files; 2 per sample) into a single GTC file per sample using the BPM file (defines strand, probe sequences and illumicode address) and the EGT file (defines the relationship between intensities and genotype calls). Files used for this data release are: GDA-8v1-0_A5.bpm, GDA-8v1-0_A1_ClusterFile.egt, gentrain v3, reference hg19 and gencall cutoff 0.15. The GDA array assays a total of 1,914,935 variant positions including 1,790,654 single-nucleotide variants, 44,172 indels, 9,935 intensity-only probes for CNV calling, and 70,174 duplicates (same position, different probes). Picard GtcToVcf is used to convert the GTC files to VCF format. Resulting VCF and IDAT files are submitted to the DRC for ingestion and further processing. The VCF file contains assay name, chromosome, position, genotype calls, quality score, raw and normalized intensities, B allele frequency and log R ratio values. Each genome centre is running the GDA array under Clinical Laboratory Improvement Amendments-compliant protocols. The GTC files are parsed and metrics are uploaded to in-house Laboratory Information Management Systems for QC review.

At batch level (each set of 96-well plates run together in the laboratory at one time), each genome centre includes positive control samples that are required to have >98% call rate and >99% concordance to existing data to approve release of the batch of data. At the sample level, the call rate and sex are the key QC determinants 41 . Contamination is also measured using BAFRegress 42 and reported out as metadata. Any sample with a call rate below 98% is repeated one time in the laboratory. Genotyped sex is determined by plotting normalized x versus normalized y intensity values for a batch of samples. Any sample discordant with ‘sex at birth’ reported by the All of Us participant is flagged for further detailed review and repeated one time in the laboratory. If several sex-discordant samples are clustered on an array or on a 96-well plate, the entire array or plate will have data production repeated. Samples identified with sex chromosome aneuploidies are also reported back as metadata (XXX, XXY, XYY and so on). A final processing status of ‘pass’, ‘fail’ or ‘abandon’ is determined before release of data to the All of Us DRC. An array sample will pass if the call rate is >98% and the genotyped sex and sex at birth are concordant (or the sex at birth is not applicable). An array sample will fail if the genotyped sex and the sex at birth are discordant. An array sample will have the status of abandon if the call rate is <98% after at least two attempts at the genome centre.
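
The sample-level decision logic described above can be summarized in a short sketch; the field names and the handling of a ‘not applicable’ sex at birth are assumptions, not the production LIMS code.

```python
# Minimal sketch of the array sample pass/fail/abandon decision described above.

def array_sample_status(call_rate: float, genotyped_sex: str, sex_at_birth: str,
                        attempts: int) -> str:
    sex_applicable = sex_at_birth not in ("not applicable", "unknown")
    if sex_applicable and genotyped_sex != sex_at_birth:
        return "fail"                      # genotyped sex discordant with sex at birth
    if call_rate > 0.98:
        return "pass"
    # Call rate <= 98%: the sample is repeated once in the laboratory;
    # after at least two attempts it is abandoned.
    return "abandon" if attempts >= 2 else "repeat"

print(array_sample_status(0.991, "F", "F", attempts=1))   # pass
print(array_sample_status(0.962, "M", "M", attempts=2))   # abandon
```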

Data from the arrays are used for participant return of genetic ancestry and non-health-related traits for those who consent, and they are also used to facilitate additional QC of the matched WGS data. Contamination is assessed in the array data to determine whether DNA re-extraction is required before WGS. Re-extraction is prompted by level of contamination combined with consent status for return of results. The arrays are also used to confirm sample identity between the WGS data and the matched array data by assessing concordance at 100 unique sites. To establish concordance, a fingerprint file of these 100 sites is provided to the Genome Centers to assess concordance with the same sites in the WGS data before CRAM submission.

Genomic data curation

As seen in Extended Data Fig. 2 , we generate a joint call set for all WGS samples and make these data available in their entirety and by sample subsets to researchers. A breakdown of the frequencies, stratified by computed ancestries for which we had more than 10,000 participants can be found in Extended Data Fig. 3 . The joint call set process allows us to leverage information across samples to improve QC and increase accuracy.

Single-sample QC

If a sample fails single-sample QC, it is excluded from the release and is not reported in this document. These tests detect sample swaps, cross-individual contamination and sample preparation errors. In some cases, we carry out these tests twice (at both the Genome Center and the DRC), for two reasons: to confirm internal consistency between sites; and to mark samples as passing (or failing) QC on the basis of the research pipeline criteria. The single-sample QC process accepts a higher contamination rate than the clinical pipeline (0.03 for the research pipeline versus 0.01 for the clinical pipeline), but otherwise uses identical thresholds. The list of specific QC processes, passing criteria, error modes addressed and an overview of the results can be found in Supplementary Table 3 .
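
A minimal sketch of the contamination criterion, assuming a per-sample contamination estimate is already available:

```python
# The research pipeline tolerates a higher estimated contamination fraction than
# the clinical pipeline, but otherwise applies identical thresholds (see above).

CONTAMINATION_CUTOFF = {"research": 0.03, "clinical": 0.01}

def contamination_pass(estimate: float, pipeline: str) -> bool:
    return estimate <= CONTAMINATION_CUTOFF[pipeline]

print(contamination_pass(0.02, "research"))  # True
print(contamination_pass(0.02, "clinical"))  # False
```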

Joint call set QC

During joint calling, we carry out additional QC steps using information that is available across samples including hard thresholds, population outliers, allele-specific filters, and sensitivity and precision evaluation. Supplementary Table 4 summarizes both the steps that we took and the results obtained for the WGS data. More detailed information about the methods and specific parameters can be found in the All of Us Genomic Research Data Quality Report 36 .

Batch effect analysis

We analysed cross-sequencing centre batch effects in the joint call set. To quantify the batch effect, we calculated Cohen’s d (ref.  43 ) for four metrics (insertion/deletion ratio, single-nucleotide polymorphism count, indel count and single-nucleotide polymorphism transition/transversion ratio) across the three genome sequencing centres (Baylor College of Medicine, Broad Institute and University of Washington), stratified by computed ancestry and seven regions of the genome (whole genome, high-confidence calling, repetitive, GC content of >0.85, GC content of <0.15, low mappability, the ACMG59 genes and regions of large duplications (>1 kb)). Using random batches as a control set, all comparisons had a Cohen’s d of <0.35. Here we report any Cohen’s d results >0.5, which we chose before this analysis and is conventionally the threshold of a medium effect size 44 .
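
For reference, the batch-effect statistic is the standard Cohen’s d with a pooled standard deviation; the sketch below computes it for a single metric between two centres using illustrative values.

```python
# Minimal sketch of the batch-effect statistic: Cohen's d between two sequencing
# centres for one per-sample metric (for example, indel count), using a pooled s.d.

import math

def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    def mean(x): return sum(x) / len(x)
    def var(x, m): return sum((v - m) ** 2 for v in x) / (len(x) - 1)
    ma, mb = mean(group_a), mean(group_b)
    pooled_var = ((len(group_a) - 1) * var(group_a, ma) +
                  (len(group_b) - 1) * var(group_b, mb)) / (len(group_a) + len(group_b) - 2)
    return (ma - mb) / math.sqrt(pooled_var)

# Illustrative per-sample indel counts from two centres (hypothetical values):
print(cohens_d([5_010, 4_990, 5_020], [4_950, 4_940, 4_960]))
```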

We found that there was an effect size in indel counts (Cohen’s d of 0.53) in the entire genome, between Broad Institute and University of Washington, but this was being driven by repetitive and low-mappability regions. We found no batch effects with Cohen’s d of >0.5 in the ratio metrics or in any metrics in the high-confidence calling, low or high GC content, or ACMG59 regions. A complete list of the batch effects with Cohen’s d of >0.5 are found in Supplementary Table 8 .

Sensitivity and precision evaluation

To determine sensitivity and precision, we included four well-characterized National Institute of Standards and Technology Genome in a Bottle control samples (HG-001, HG-003, HG-004 and HG-005). The samples were sequenced with the same protocol as All of Us. Of note, these samples were not included in data released to researchers. We used the corresponding published set of variant calls for each sample as the ground truth in our sensitivity and precision calculations. We use the high-confidence calling region, defined by Genome in a Bottle v4.2.1, as the source of ground truth. To be called a true positive, a variant must match the chromosome, position, reference allele, alternate allele and zygosity. In cases of sites with multiple alternative alleles, each alternative allele is considered separately. Sensitivity and precision results are reported in Supplementary Table 5 .
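
A minimal sketch of this benchmarking comparison, assuming calls and truth variants have already been restricted to the high-confidence region and normalized to one record per alternative allele:

```python
# Minimal sketch of the benchmarking comparison described above: a call is a true
# positive only if chromosome, position, reference allele, alternate allele and
# zygosity all match the Genome in a Bottle truth set.

def benchmark(calls: set, truth: set):
    """Each element is (chrom, pos, ref, alt, zygosity); multi-allelic sites are
    represented as one tuple per alternative allele."""
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return sensitivity, precision

truth = {("chr1", 1000, "A", "G", "het"), ("chr1", 2000, "T", "C", "hom")}
calls = {("chr1", 1000, "A", "G", "het"), ("chr1", 3000, "G", "A", "het")}
print(benchmark(calls, truth))  # (0.5, 0.5) for this illustrative example
```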

Genetic ancestry inference

We computed categorical ancestry for all WGS samples in All of Us and made these available to researchers. These predictions are also the basis for population allele frequency calculations in the Genomic Variants section of the public Data Browser. We used the high-quality set of sites to determine an ancestry label for each sample. The ancestry categories are based on the same labels used in gnomAD 18 , the Human Genome Diversity Project (HGDP) 45 and 1000 Genomes 1 : African (AFR); Latino/admixed American (AMR); East Asian (EAS); Middle Eastern (MID); European (EUR), composed of Finnish (FIN) and Non-Finnish European (NFE); Other (OTH), not belonging to one of the other ancestries or is an admixture; South Asian (SAS).

We trained a random forest classifier 46 on a training set of HGDP and 1000 Genomes samples, using autosomal variants obtained from gnomAD 11 . We generated the first 16 principal components (PCs) of the training sample genotypes (using the hwe_normalized_pca in Hail) at the high-quality variant sites for use as the feature vector for each training sample. We used the truth labels from the sample metadata, which can be found alongside the VCFs. Note that we do not train the classifier on the samples labelled as Other. We use the classifier’s label probabilities (‘confidence’) across the other ancestries to determine the ancestry of samples labelled Other.

To determine the ancestry of All of Us samples, we project the All of Us samples into the PCA space of the training data and apply the classifier. As a proxy for the accuracy of our All of Us predictions, we look at the concordance between the survey results and the predicted ancestry. The concordance between self-reported ethnicity and the ancestry predictions was 87.7%.
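
A minimal sketch of this classification step, using scikit-learn in place of the production pipeline; the inputs (reference PCs, reference labels and projected All of Us PCs) are assumed to exist already, and the 0.75 confidence cut-off for assigning ‘Other’ is an illustrative assumption rather than the programme’s value.

```python
# Minimal sketch: train a random forest on 16 reference PCs and predict a
# categorical ancestry for projected All of Us samples.

from sklearn.ensemble import RandomForestClassifier

def assign_ancestry(train_pcs, train_labels, aou_pcs, min_prob=0.75):
    # train_pcs: (n_ref, 16) PCs of HGDP/1000 Genomes reference samples
    # train_labels: ancestry labels for the reference samples (excluding 'oth')
    # aou_pcs: (n_aou, 16) All of Us samples projected into the same PCA space
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(train_pcs, train_labels)
    probs = clf.predict_proba(aou_pcs)                       # per-class 'confidence'
    labels = clf.classes_[probs.argmax(axis=1)].astype(object)
    # Low-confidence samples fall back to 'oth'; the 0.75 cut-off is illustrative.
    labels[probs.max(axis=1) < min_prob] = "oth"
    return labels
```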

PC data from All of Us samples and the HGDP and 1000 Genomes samples were used to compute individual participant genetic ancestry fractions for All of Us samples using the Rye program. Rye uses PC data to carry out rapid and accurate genetic ancestry inference on biobank-scale datasets 47 . HGDP and 1000 Genomes reference samples were used to define a set of six distinct and coherent ancestry groups—African, East Asian, European, Middle Eastern, Latino/admixed American and South Asian—corresponding to participant self-identified race and ethnicity groups. Rye was run on the first 16 PCs, using the defined reference ancestry groups to assign ancestry group fractions to individual All of Us participant samples.

Relatedness

We calculated the kinship score using the Hail pc_relate function and reported any pairs with a kinship score above 0.1. The kinship score is half of the fraction of the genetic material shared (ranges from 0.0 to 0.5). We determined the maximal independent set 41 for related samples. Using a kinship score threshold of 0.1, we identified a maximally unrelated set of 231,442 samples (94%).
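
A minimal sketch of this procedure in Hail is shown below; the input path, the PCA dimension (k=16) and the minor-allele-frequency cut-off are assumptions for illustration.

```python
# Minimal sketch: estimate pairwise kinship with pc_relate, keep pairs with
# kinship > 0.1 and derive a maximal independent set of unrelated samples.

import hail as hl

hl.init()
mt = hl.read_matrix_table("aou_wgs.mt")                          # assumed path

rel = hl.pc_relate(mt.GT, min_individual_maf=0.01, k=16, statistics="kin")
related_pairs = rel.filter(rel.kin > 0.1)

# keep=False returns the set of samples to remove so that no related pair remains;
# the retained samples form the maximal independent set of unrelated individuals.
to_remove = hl.maximal_independent_set(related_pairs.i, related_pairs.j, keep=False)
unrelated = mt.filter_cols(hl.is_defined(to_remove[mt.col_key]), keep=False)
```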

LDL-C common variant GWAS

The phenotypic data were extracted from the Curated Data Repository (CDR, Controlled Tier Dataset v7) in the All of Us Researcher Workbench. The All of Us Cohort Builder and Dataset Builder were used to extract all LDL cholesterol measurements from the Lab and Measurements criteria in EHR data for all participants who have WGS data. The most recent measurements were selected as the phenotype and adjusted for statin use 19 , age and sex. A rank-based inverse normal transformation was applied for this continuous trait to increase power and deflate type I error. Analysis was carried out on the Hail MatrixTable representation of the All of Us WGS joint-called data, including removal of monomorphic variants, variants with a call rate of <95% and variants with extreme Hardy–Weinberg equilibrium values ( P  < 10 −15 ). A linear regression was carried out with REGENIE 48 , which accounts for relatedness, on variants with a minor allele frequency >5%, further adjusting for the first five ancestry PCs. The final analysis included 34,924 participants and 8,589,520 variants.
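
A minimal sketch of the rank-based inverse normal transformation applied to the adjusted LDL-C values is given below; the 0.5 rank offset is an assumption, as the exact offset is not stated above.

```python
# Minimal sketch: rank-based inverse normal transformation of a continuous
# phenotype (for example, statin-, age- and sex-adjusted LDL-C residuals).

import numpy as np
from scipy.stats import norm, rankdata

def inverse_normal_transform(values: np.ndarray) -> np.ndarray:
    ranks = rankdata(values)                       # average ranks for ties
    return norm.ppf((ranks - 0.5) / len(values))   # map ranks to standard normal

adjusted_ldl = np.array([1.2, -0.3, 0.8, 2.1, -1.5])   # illustrative residuals
print(inverse_normal_transform(adjusted_ldl))
```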

Genotype-by-phenotype replication

We tested replication rates of known phenotype–genotype associations in three of the four largest populations: EUR, AFR and EAS. The AMR population was not included because it has no registered GWAS. This method is a conceptual extension of the original GWAS × phenome-wide association study, which replicated 66% of powered associations in a single EHR-linked biobank 49 . The PGRM is an expansion of this work by Bastarache et al., based on associations in the GWAS catalogue 50 in June 2020 (ref.  51 ). After directly matching the Experimental Factor Ontology terms to phecodes, the authors identified 8,085 unique loci and 170 unique phecodes that compose the PGRM. They showed replication rates in several EHR-linked biobanks ranging from 76% to 85%. For this analysis, we used the EUR- and AFR-based maps, considering only catalogue associations that reached P  < 5 × 10 −8 significance.

The main tools used were the Python package Hail for data extraction, plink for genomic associations, and the R packages PheWAS and pgrm for further analysis and visualization. The phenotypes, participant-reported sex at birth, and year of birth were extracted from the All of Us CDR (Controlled Tier Dataset v7). These phenotypes were then loaded into a plink-compatible format using the PheWAS package, and related samples were removed by sub-setting to the maximally unrelated dataset ( n  = 231,442). Only samples with EHR data were kept, filtered by selected loci, annotated with demographic and phenotypic information extracted from the CDR and ancestry prediction information provided by All of Us, ultimately resulting in 181,345 participants for downstream analysis. The variants in the PGRM were filtered to a population-specific allele frequency of >1% or a population-specific allele count of >100, leaving 4,986 variants. Results for which there were at least 20 cases in the ancestry group were included. Then, a series of Firth logistic regression tests with phecodes as the outcome and variants as the predictor was carried out, adjusting for age, sex (for non-sex-specific phenotypes) and the first three genomic PC features as covariates. The PGRM was annotated with power calculations based on the case counts and reported allele frequencies; associations with power of 80% or greater were considered powered for this analysis.

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The All of Us Research Hub has a tiered data access data passport model with three data access tiers. The Public Tier dataset contains only aggregate data with identifiers removed. These data are available to the public through Data Snapshots ( https://www.researchallofus.org/data-tools/data-snapshots/ ) and the public Data Browser ( https://databrowser.researchallofus.org/ ). The Registered Tier curated dataset contains individual-level data, available only to approved researchers on the Researcher Workbench. At present, the Registered Tier includes data from EHRs, wearables and surveys, as well as physical measurements taken at the time of participant enrolment. The Controlled Tier dataset contains all data in the Registered Tier and additionally genomic data in the form of WGS and genotyping arrays, previously suppressed demographic data fields from EHRs and surveys, and unshifted dates of events. At present, Registered Tier and Controlled Tier data are available to researchers at academic institutions, non-profit institutions, and both non-profit and for-profit health care institutions. Work is underway to begin extending access to additional audiences, including industry-affiliated researchers. Researchers have the option to register for Registered Tier and/or Controlled Tier access by completing the All of Us Researcher Workbench access process, which includes identity verification and All of Us-specific training in research involving human participants ( https://www.researchallofus.org/register/ ). Researchers may create a new workspace at any time to conduct any research study, provided that they comply with all Data Use Policies and self-declare their research purpose. This information is made accessible publicly on the All of Us Research Projects Directory at https://allofus.nih.gov/protecting-data-and-privacy/research-projects-all-us-data .

Code availability

The GVS code is available at https://github.com/broadinstitute/gatk/tree/ah_var_store/scripts/variantstore . The LDL GWAS pipeline is available as a demonstration project in the Featured Workspace Library on the Researcher Workbench ( https://workbench.researchallofus.org/workspaces/aou-rw-5981f9dc/aouldlgwasregeniedsubctv6duplicate/notebooks ).

The 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526 , 68–74 (2015).

Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577 , 179–189 (2020).

Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570 , 514–518 (2019).

Lewis, A. C. F. et al. Getting genetic ancestry right for science and society. Science 376 , 250–252 (2022).

All of Us Program Investigators. The “All of Us” Research Program. N. Engl. J. Med. 381 , 668–676 (2019).

Ramirez, A. H., Gebo, K. A. & Harris, P. A. Progress with the All of Us Research Program: opening access for researchers. JAMA 325 , 2441–2442 (2021).

Ramirez, A. H. et al. The All of Us Research Program: data quality, utility, and diversity. Patterns 3 , 100570 (2022).

Overhage, J. M., Ryan, P. B., Reich, C. G., Hartzema, A. G. & Stang, P. E. Validation of a common data model for active safety surveillance research. J. Am. Med. Inform. Assoc. 19 , 54–60 (2012).

Venner, E. et al. Whole-genome sequencing as an investigational device for return of hereditary disease risk and pharmacogenomic results as part of the All of Us Research Program. Genome Med. 14 , 34 (2022).

Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536 , 285–291 (2016).

Tiao, G. & Goodrich, J. gnomAD v3.1 New Content, Methods, Annotations, and Data Availability ; https://gnomad.broadinstitute.org/news/2020-10-gnomad-v3-1-new-content-methods-annotations-and-data-availability/ .

Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625 , 92–100 (2022).

Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37 , 561–566 (2019).

Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37 , 555–560 (2019).

Stromberg, M. et al. Nirvana: clinical grade variant annotator. In Proc. 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics 596 (Association for Computing Machinery, 2017).

Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29 , 308–311 (2001).

Venner, E. et al. The frequency of pathogenic variation in the All of Us cohort reveals ancestry-driven disparities. Commun. Biol. https://doi.org/10.1038/s42003-023-05708-y (2024).

Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581 , 434–443 (2020).

Selvaraj, M. S. et al. Whole genome sequence analysis of blood lipid levels in >66,000 individuals. Nat. Commun. 13 , 5995 (2022).

Wang, X. et al. Common and rare variants associated with cardiometabolic traits across 98,622 whole-genome sequences in the All of Us research program. J. Hum. Genet. 68 , 565–570 (2023).

Bastarache, L. et al. The phenotype-genotype reference map: improving biobank data science through replication. Am. J. Hum. Genet. 110 , 1522–1533 (2023).

Bianchi, D. W. et al. The All of Us Research Program is an opportunity to enhance the diversity of US biomedical research. Nat. Med. https://doi.org/10.1038/s41591-023-02744-3 (2024).

Van Driest, S. L. et al. Association between a common, benign genotype and unnecessary bone marrow biopsies among African American patients. JAMA Intern. Med. 181 , 1100–1105 (2021).

Chen, M.-H. et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 182 , 1198–1213 (2020).

Chiou, J. et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. Nature 594 , 398–402 (2021).

Hu, X. et al. Additive and interaction effects at three amino acid positions in HLA-DQ and HLA-DR molecules drive type 1 diabetes risk. Nat. Genet. 47 , 898–905 (2015).

Grant, S. F. A. et al. Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat. Genet. 38 , 320–323 (2006).

All of Us Research Program. Framework for Access to All of Us Data Resources v1.1 (2021); https://www.researchallofus.org/wp-content/themes/research-hub-wordpress-theme/media/data&tools/data-access-use/AoU_Data_Access_Framework_508.pdf .

Abul-Husn, N. S. & Kenny, E. E. Personalized medicine and the power of electronic health records. Cell 177 , 58–69 (2019).

Mapes, B. M. et al. Diversity and inclusion for the All of Us research program: A scoping review. PLoS ONE 15 , e0234962 (2020).

Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590 , 290–299 (2021).

Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562 , 203–209 (2018).

Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607 , 732–740 (2022).

Kurniansyah, N. et al. Evaluating the use of blood pressure polygenic risk scores across race/ethnic background groups. Nat. Commun. 14 , 3202 (2023).

Hou, K. et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 55 , 549–558 (2022).

Linder, J. E. et al. Returning integrated genomic risk and clinical recommendations: the eMERGE study. Genet. Med. 25 , 100006 (2023).

Lennon, N. J. et al. Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations. Nat. Med. https://doi.org/10.1038/s41591-024-02796-z (2024).

Deflaux, N. et al. Demonstrating paths for unlocking the value of cloud genomics through cross cohort analysis. Nat. Commun. 14 , 5419 (2023).

Regier, A. A. et al. Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. Nat. Commun. 9 , 4038 (2018).

All of Us Research Program. Data and Statistics Dissemination Policy (2020); https://www.researchallofus.org/wp-content/themes/research-hub-wordpress-theme/media/2020/05/AoU_Policy_Data_and_Statistics_Dissemination_508.pdf .

Laurie, C. C. et al. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet. Epidemiol. 34 , 591–602 (2010).

Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91 , 839–848 (2012).

Cohen, J. Statistical Power Analysis for the Behavioral Sciences (Routledge, 2013).

Andrade, C. Mean difference, standardized mean difference (SMD), and their use in meta-analysis. J. Clin. Psychiatry 81 , 20f13681 (2020).

Cavalli-Sforza, L. L. The Human Genome Diversity Project: past, present and future. Nat. Rev. Genet. 6 , 333–340 (2005).

Ho, T. K. Random decision forests. In Proc. 3rd International Conference on Document Analysis and Recognition (IEEE Computer Society Press, 2002).

Conley, A. B. et al. Rye: genetic ancestry inference at biobank scale. Nucleic Acids Res. 51 , e44 (2023).

Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53 , 1097–1103 (2021).

Denny, J. C. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat. Biotech. 31 , 1102–1111 (2013).

Buniello, A. et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47 , D1005–D1012 (2019).

Bastarache, L. et al. The Phenotype-Genotype Reference Map: improving biobank data science through replication. Am. J. Hum. Genet. 110 , 1522–1533 (2023).

Acknowledgements

The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers (OT2 OD026549; OT2 OD026554; OT2 OD026557; OT2 OD026556; OT2 OD026550; OT2 OD026552; OT2 OD026553; OT2 OD026548; OT2 OD026551; OT2 OD026555); Interagency agreement AOD 16037; Federally Qualified Health Centers HHSN 263201600085U; Data and Research Center: U2C OD023196; Genome Centers (OT2 OD002748; OT2 OD002750; OT2 OD002751); Biobank: U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: U24 OD023163; Communications and Engagement: OT2 OD023205; OT2 OD023206; and Community Partners (OT2 OD025277; OT2 OD025315; OT2 OD025337; OT2 OD025276). In addition, the All of Us Research Program would not be possible without the partnership of its participants. All of Us and the All of Us logo are service marks of the US Department of Health and Human Services. E.E.E. is an investigator of the Howard Hughes Medical Institute. We acknowledge the foundational contributions of our friend and colleague, the late Deborah A. Nickerson. Debbie’s years of insightful contributions throughout the formation of the All of Us genomics programme are permanently imprinted, and she shares credit for all of the successes of this programme.

Author information

Authors and Affiliations

Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA

Alexander G. Bick & Henry R. Condon

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA

Ginger A. Metcalf, Eric Boerwinkle, Richard A. Gibbs, Donna M. Muzny, Eric Venner, Kimberly Walker, Jianhong Hu, Harsha Doddapaneni, Christie L. Kovar, Mullai Murugan, Shannon Dugan, Ziad Khan & Eric Boerwinkle

Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA

Kelsey R. Mayo, Jodell E. Linder, Melissa Basford, Ashley Able, Ashley E. Green, Robert J. Carroll, Jennifer Zhang & Yuanyuan Wang

Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA

Lee Lichtenstein, Anthony Philippakis, Sophie Schwartz, M. Morgan T. Aster, Kristian Cibulskis, Andrea Haessly, Rebecca Asch, Aurora Cremer, Kylee Degatano, Akum Shergill, Laura D. Gauthier, Samuel K. Lee, Aaron Hatcher, George B. Grant, Genevieve R. Brandt, Miguel Covarrubias, Eric Banks & Wail Baalawi

Verily, South San Francisco, CA, USA

Shimon Rura, David Glazer, Moira K. Dillon & C. H. Albach

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA

Robert J. Carroll, Paul A. Harris & Dan M. Roden

All of Us Research Program, National Institutes of Health, Bethesda, MD, USA

Anjene Musick, Andrea H. Ramirez, Sokny Lim, Siddhartha Nambiar, Bradley Ozenberger, Anastasia L. Wise, Chris Lunt, Geoffrey S. Ginsburg & Joshua C. Denny

School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA

I. King Jordan, Shashwat Deepali Nagar & Shivam Sharma

Neuroscience Institute, Institute of Translational Genomic Medicine, Morehouse School of Medicine, Atlanta, GA, USA

Robert Meller

Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA

Mine S. Cicek, Stephen N. Thibodeau & Mine S. Cicek

Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA

Kimberly F. Doheny, Michelle Z. Mawhinney, Sean M. L. Griffith, Elvin Hsu, Hua Ling & Marcia K. Adams

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA

Evan E. Eichler, Joshua D. Smith, Christian D. Frazar, Colleen P. Davis, Karynne E. Patterson, Marsha M. Wheeler, Sean McGee, Mitzi L. Murray, Valeria Vasta, Dru Leistritz, Matthew A. Richardson, Aparna Radhakrishnan & Brenna W. Ehmen

Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA

Evan E. Eichler

Broad Institute of MIT and Harvard, Cambridge, MA, USA

Stacey Gabriel, Heidi L. Rehm, Niall J. Lennon, Christina Austin-Tse, Eric Banks, Michael Gatzen, Namrata Gupta, Katie Larsson, Sheli McDonough, Steven M. Harrison, Christopher Kachulis, Matthew S. Lebo, Seung Hoan Choi & Xin Wang

Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA

Gail P. Jarvik & Elisabeth A. Rosenthal

Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA

Dan M. Roden

Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA

Center for Individualized Medicine, Biorepository Program, Mayo Clinic, Rochester, MN, USA

Stephen N. Thibodeau, Ashley L. Blegen, Samantha J. Wirkus, Victoria A. Wagner, Jeffrey G. Meyer & Mine S. Cicek

Color Health, Burlingame, CA, USA

Scott Topper, Cynthia L. Neben, Marcie Steeves & Alicia Y. Zhou

School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA

Eric Boerwinkle

Laboratory for Molecular Medicine, Massachusetts General Brigham Personalized Medicine, Cambridge, MA, USA

Christina Austin-Tse, Emma Henricks & Matthew S. Lebo

Department of Laboratory Medicine and Pathology, University of Washington School of Medicine, Seattle, WA, USA

Christina M. Lockwood, Brian H. Shirts, Colin C. Pritchard, Jillian G. Buchan & Niklas Krumm

Manuscript Writing Group

  • Alexander G. Bick
  • , Ginger A. Metcalf
  • , Kelsey R. Mayo
  • , Lee Lichtenstein
  • , Shimon Rura
  • , Robert J. Carroll
  • , Anjene Musick
  • , Jodell E. Linder
  • , I. King Jordan
  • , Shashwat Deepali Nagar
  • , Shivam Sharma
  •  & Robert Meller

All of Us Research Program Genomics Principal Investigators

  • Melissa Basford
  • , Eric Boerwinkle
  • , Mine S. Cicek
  • , Kimberly F. Doheny
  • , Evan E. Eichler
  • , Stacey Gabriel
  • , Richard A. Gibbs
  • , David Glazer
  • , Paul A. Harris
  • , Gail P. Jarvik
  • , Anthony Philippakis
  • , Heidi L. Rehm
  • , Dan M. Roden
  • , Stephen N. Thibodeau
  •  & Scott Topper

Biobank, Mayo

  • Ashley L. Blegen
  • , Samantha J. Wirkus
  • , Victoria A. Wagner
  • , Jeffrey G. Meyer
  •  & Stephen N. Thibodeau

Genome Center: Baylor-Hopkins Clinical Genome Center

  • Donna M. Muzny
  • , Eric Venner
  • , Michelle Z. Mawhinney
  • , Sean M. L. Griffith
  • , Elvin Hsu
  • , Marcia K. Adams
  • , Kimberly Walker
  • , Jianhong Hu
  • , Harsha Doddapaneni
  • , Christie L. Kovar
  • , Mullai Murugan
  • , Shannon Dugan
  • , Ziad Khan
  •  & Richard A. Gibbs

Genome Center: Broad, Color, and Mass General Brigham Laboratory for Molecular Medicine

  • Niall J. Lennon
  • , Christina Austin-Tse
  • , Eric Banks
  • , Michael Gatzen
  • , Namrata Gupta
  • , Emma Henricks
  • , Katie Larsson
  • , Sheli McDonough
  • , Steven M. Harrison
  • , Christopher Kachulis
  • , Matthew S. Lebo
  • , Cynthia L. Neben
  • , Marcie Steeves
  • , Alicia Y. Zhou
  • , Scott Topper
  •  & Stacey Gabriel

Genome Center: University of Washington

  • Gail P. Jarvik
  • , Joshua D. Smith
  • , Christian D. Frazar
  • , Colleen P. Davis
  • , Karynne E. Patterson
  • , Marsha M. Wheeler
  • , Sean McGee
  • , Christina M. Lockwood
  • , Brian H. Shirts
  • , Colin C. Pritchard
  • , Mitzi L. Murray
  • , Valeria Vasta
  • , Dru Leistritz
  • , Matthew A. Richardson
  • , Jillian G. Buchan
  • , Aparna Radhakrishnan
  • , Niklas Krumm
  •  & Brenna W. Ehmen

Data and Research Center

  • Lee Lichtenstein
  • , Sophie Schwartz
  • , M. Morgan T. Aster
  • , Kristian Cibulskis
  • , Andrea Haessly
  • , Rebecca Asch
  • , Aurora Cremer
  • , Kylee Degatano
  • , Akum Shergill
  • , Laura D. Gauthier
  • , Samuel K. Lee
  • , Aaron Hatcher
  • , George B. Grant
  • , Genevieve R. Brandt
  • , Miguel Covarrubias
  • , Melissa Basford
  • , Alexander G. Bick
  • , Ashley Able
  • , Ashley E. Green
  • , Jennifer Zhang
  • , Henry R. Condon
  • , Yuanyuan Wang
  • , Moira K. Dillon
  • , C. H. Albach
  • , Wail Baalawi
  •  & Dan M. Roden

All of Us Research Demonstration Project Teams

  • Seung Hoan Choi
  • , Elisabeth A. Rosenthal

NIH All of Us Research Program Staff

  • Andrea H. Ramirez
  • , Sokny Lim
  • , Siddhartha Nambiar
  • , Bradley Ozenberger
  • , Anastasia L. Wise
  • , Chris Lunt
  • , Geoffrey S. Ginsburg
  •  & Joshua C. Denny

Contributions

The All of Us Biobank (Mayo Clinic) collected, stored and plated participant biospecimens. The All of Us Genome Centers (Baylor-Hopkins Clinical Genome Center; Broad, Color, and Mass General Brigham Laboratory for Molecular Medicine; and University of Washington School of Medicine) generated and QCed the whole-genome sequencing data. The All of Us Data and Research Center (Vanderbilt University Medical Center, Broad Institute of MIT and Harvard, and Verily) generated the WGS joint call set, carried out quality assurance and QC analyses and developed the Researcher Workbench. All of Us Research Demonstration Project Teams contributed analyses. The other All of Us Genomics Investigators and NIH All of Us Research Program Staff provided crucial programmatic support. Members of the manuscript writing group (A.G.B., G.A.M., K.R.M., L.L., S.R., R.J.C. and A.M.) wrote the first draft of this manuscript, which was revised with contributions and feedback from all authors.

Corresponding author

Correspondence to Alexander G. Bick.

Ethics declarations

Competing interests

D.M.M., G.A.M., E.V., K.W., J.H., H.D., C.L.K., M.M., S.D., Z.K., E. Boerwinkle and R.A.G. declare that Baylor Genetics is a Baylor College of Medicine affiliate that derives revenue from genetic testing. Eric Venner is affiliated with Codified Genomics, a provider of genetic interpretation. E.E.E. is a scientific advisory board member of Variant Bio, Inc. A.G.B. is a scientific advisory board member of TenSixteen Bio. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks Timothy Frayling and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Historic availability of EHR records in All of Us v7 Controlled Tier Curated Data Repository (n = 413,457).

For better visibility, the plot shows growth starting in 2010.

Extended Data Fig. 2 Overview of the Genomic Data Curation Pipeline for WGS samples.

The Data and Research Center (DRC) performs additional single sample quality control (QC) on the data as it arrives from the Genome Centers. The variants from samples that pass this QC are loaded into the Genomic Variant Store (GVS), where we jointly call the variants and apply additional QC. We apply a joint call set QC process, which is stored with the call set. The entire joint call set is rendered as a Hail Variant Dataset (VDS), which can be accessed from the analysis notebooks in the Researcher Workbench. Subsections of the genome are extracted from the VDS and rendered in different formats with all participants. Auxiliary data can also be accessed through the Researcher Workbench. This includes variant functional annotations, joint call set QC results, predicted ancestry, and relatedness. Auxiliary data are derived from GVS (arrow not shown) and the VDS. The Cohort Builder directly queries GVS when researchers request genomic data for subsets of samples. Aligned reads, as cram files, are available in the Researcher Workbench (not shown). The graphics of the dish, gene and computer and the All of Us logo are reproduced with permission of the National Institutes of Health’s All of Us Research Program.
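For orientation only, the sketch below shows how an analysis notebook in the Researcher Workbench might open the joint call set as a Hail VDS and extract a small genomic region; the path and interval used here are hypothetical placeholders, not the programme's documented workflow.

```python
# Minimal sketch, not the programme's actual access pattern. Assumptions:
# Hail is available in the notebook, and VDS_PATH points at the released
# Variant Dataset; both the path and the interval below are placeholders.
import hail as hl

hl.init(default_reference="GRCh38")

VDS_PATH = "gs://example-bucket/aou_srwgs_joint_callset.vds"  # placeholder path
vds = hl.vds.read_vds(VDS_PATH)

# Work with a small region of interest instead of densifying the whole call set.
interval = [hl.parse_locus_interval("chr1:1000000-1100000", reference_genome="GRCh38")]
region = hl.vds.filter_intervals(vds, interval, keep=True)

# Convert the sparse VDS representation to a dense MatrixTable for analysis.
mt = hl.vds.to_dense_mt(region)
print(mt.count())  # (number of variant rows, number of samples)
```

Restricting to an interval before densifying keeps the extracted MatrixTable small, which mirrors the caption's point that subsections of the genome are extracted from the VDS rather than the entire call set.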

Extended Data Fig. 3 Proportion of allelic frequencies (AF), stratified by computed ancestries with over 10,000 participants.

Bar counts are not cumulative (e.g., "pop AF < 0.01" does not include "pop AF < 0.001").
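To make the non-cumulative binning concrete, here is a tiny illustrative sketch using toy allele frequencies (not programme data): each variant is assigned to exactly one half-open bin, so a variant counted under pop AF < 0.001 does not also appear in the pop AF < 0.01 bar.

```python
# Illustrative only: toy allele frequencies, not All of Us data.
import pandas as pd

af = pd.Series([0.0004, 0.003, 0.04, 0.2, 0.7], name="pop_AF")

# Half-open, mutually exclusive bins: each variant contributes to exactly one bar.
bins = [0.0, 0.001, 0.01, 0.1, 1.0]
labels = ["AF < 0.001", "0.001 <= AF < 0.01", "0.01 <= AF < 0.1", "AF >= 0.1"]
counts = pd.cut(af, bins=bins, labels=labels, right=False).value_counts().sort_index()
print(counts)
```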

Extended Data Fig. 4 Distribution of pathogenic and likely pathogenic ClinVar variants.

Stratified by ancestry and filtered to variants with allele count (AC) < 40 among 245,388 short-read WGS samples.
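As a hedged illustration of the filter this caption describes, the sketch below assumes a per-variant summary table with hypothetical columns clinvar_significance, allele_count and ancestry; the column names are placeholders, not the actual data dictionary of the release.

```python
# Sketch only: `variants` is a hypothetical per-variant summary table;
# the column names below are placeholders, not the release schema.
import pandas as pd

def rare_plp_counts(variants: pd.DataFrame) -> pd.Series:
    """Count pathogenic / likely pathogenic ClinVar variants with AC < 40, per ancestry."""
    is_plp = variants["clinvar_significance"].isin(["Pathogenic", "Likely pathogenic"])
    is_rare = variants["allele_count"] < 40
    return variants.loc[is_plp & is_rare].groupby("ancestry").size()
```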

Extended Data Fig. 5 Ancestry-specific HLA-DQB1 (rs9273363) locus associations in 231,442 unrelated individuals.

Phenome-wide association (PheWAS) results highlight ancestry-specific consequences across ancestries.

Extended Data Fig. 6 Ancestry-specific TCF7L2 (rs7903146) locus associations in 231,442 unrelated individuals.

Phenome-wide association (PheWAS) results highlight diabetes-related consequences across ancestries.

Supplementary information

Supplementary Information

Supplementary Figs. 1–7, Tables 1–8 and Note.

Reporting Summary

Supplementary Dataset 1

Associations of ACKR1, HLA-DQB1 and TCF7L2 loci with all Phecodes stratified by genetic ancestry.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article

The All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature (2024). https://doi.org/10.1038/s41586-023-06957-x


Received: 22 July 2022

Accepted: 08 December 2023

Published: 19 February 2024

DOI: https://doi.org/10.1038/s41586-023-06957-x

