How to publish your research
A step-by-step guide to getting published.
Publishing your research is an important step in your academic career. While there isn’t a one-size-fits-all approach, this guide is designed to take you through the typical steps in publishing a research paper.
Discover how to get your paper published, from choosing the right journal and understanding what a peer reviewed article is, to responding to reviewers and navigating the production process.
Step 1: Choosing a journal
Choosing which journal to publish your research paper in is one of the most significant decisions you have to make as a researcher. Where you decide to submit your work can make a big difference to the reach and impact your research has.
It’s important to take your time to consider your options carefully and analyze each aspect of journal submission – from shortlisting titles to deciding on your preferred method of publication, for example open access.
Don’t forget to think about publishing options beyond the traditional journal format – for example, the open research platform F1000Research, which offers rapid, open publication for a wide range of outputs.
Why choose your target journal before you start writing?
The first step in publishing a research paper should always be selecting the journal you want to publish in. Choosing your target journal before you start writing means you can tailor your work to build on research that’s already been published in that journal. This can help editors to see how a paper adds to the ‘conversation’ in their journal.
In addition, many journals only accept specific article formats. So, by choosing a journal before you start, you can write your article to its specifications and audience, and ultimately improve your chances of acceptance.
To save time and for peace of mind, you can consider using manuscript formatting experts while you focus on your research.
How to select the journal to publish your research in
Choosing which journal to publish your research in can seem like an overwhelming task. So, for all the details of how to navigate this important step in publishing your research paper, take a look at our choosing a journal guide. This will take you through the selection process, from understanding the aims and scope of the journals you’re interested in to making sure you choose a trustworthy journal.
Don’t forget to explore our Journal Suggester to see which Taylor & Francis journals could be right for your research.
Go to guidance on choosing a journal
Step 2: Writing your paper
Writing an effective, compelling research paper is vital to getting your research published. But if you’re new to putting together academic papers, it can feel daunting to start from scratch.
The good news is that if you’ve chosen the journal you want to publish in, you’ll have lots of examples already published in that journal to base your own paper on. We’ve gathered advice on every aspect of writing your paper, to make sure you get off to a great start.
How to write your paper
How you write your paper will depend on your chosen journal, your subject area, and the type of paper you’re writing. Everything from the style and structure you choose to the audience you should have in mind while writing will differ, so it’s important to think about these things before you get stuck in.
Our writing your paper guidance will take you through everything you need to know to put together your research article and prepare it for submission. This includes getting to know your target journal, understanding your audiences, and how to choose appropriate keywords.
You can also use this guide to take you through your research publication journey.
You should also make sure you’re aware of all the Editorial Policies for the journal you plan to submit to. Don’t forget that you can contact our editing services to help you refine your manuscript.
Discover advice and guidance for writing your paper
Step 3: Making your submission
Once you’ve chosen the right journal and written your manuscript, the next step in publishing your research paper is to make your submission.
Each journal will have specific submission requirements, so make sure you visit Taylor & Francis Online and carefully check through the instructions for authors for your chosen journal.
How to submit your manuscript
To submit your manuscript you’ll need to ensure that you’ve gone through all the steps in our making your submission guide. This includes thoroughly understanding your chosen journal’s instructions for authors, writing an effective cover letter, navigating the journal’s submission system, and making sure your research data is prepared as required.
You can also use our guide to avoiding common obstacles to help you complete a seamless submission.
To make sure you’ve covered everything before you hit ‘submit’ you can also take a look at our ‘ready to submit’ checklist (don’t forget, you should only submit to one journal at a time).
Understand the process of making your submission
Step 4: Navigating the peer review process
Now you’ve submitted your manuscript, you need to get to grips with one of the most important parts of publishing your research paper – the peer review process.
What is peer review?
Peer review is the impartial assessment of your research article by independent experts in your field. Reviewers, sometimes called ‘referees’, are asked to judge the validity, significance, and originality of your work.
This process ensures that a peer-reviewed article has been through a rigorous process to make sure the methodology is sound, the work can be replicated, and it fits with the aims and scope of the journal that is considering it for publication. It acts as an important form of quality control for research papers.
Peer review is also a very useful source of feedback, helping you to improve your paper before it’s published. It is intended to be a collaborative process, where authors engage in a dialogue with their peers and receive constructive feedback and support to advance their work.
Almost all research articles go through peer review, although in some cases the journal may operate post-publication peer review, which means that reviews and reader comments are invited after the paper is published.
If you’d like to feel more confident before getting your work peer reviewed by the journal, you may want to consider using an in-depth technical review service from experts.
Understanding peer review
Peer review can be a complex process to get your head around. That’s why we’ve put together a comprehensive guide to understanding peer review. This explains everything from the many different types of peer review to the step-by-step peer review process and how to revise your manuscript. It also has helpful advice on what to do if your manuscript is rejected.
Visit our peer review guide for authors
Step 5: The production process
If your paper is accepted for publication, it will then head into production. At this stage of the process, the paper will be prepared for publishing in your chosen journal.
A lot of the work to produce the final version of your paper will be done by the journal production team, but your input will be required at various stages of the process.
What do you need to do during production?
During production, you’ll have a variety of tasks to complete and decisions to make. For example, you’ll need to check and correct proofs of your article and consider whether or not you want to produce a video abstract to accompany it.
Take a look at our guide to the production process to find out what you’ll need to do in this final step to getting your research published.
Your research is published – now what?
You’ve successfully navigated publishing a research paper – congratulations! But the process doesn’t stop there. Now your research is published in a journal for the world to see, you’ll need to know how to access your article and make sure it has an impact.
Below you’ll find helpful tips and post-publication support to boost your research impact, from how to communicate about your research to how to request corrections or translations.
How to access your published article
When you publish with Taylor & Francis, you’ll have access to a new section on Taylor & Francis Online called Authored Works. This will give you and all other named authors perpetual access to your article, regardless of whether or not you have a subscription to the journal you have published in.
You can also order print copies of your article.
How to make sure your research has an impact
Taking the time to make sure your research has an impact can help drive your career progression, build your networks, and secure funding for new research. So, it’s worth investing in.
Creating a real impact with your work can be a challenging and time-consuming task, which can feel difficult to fit into an already demanding academic career.
To help you understand what impact means for you and your work, take a look at our guide to research impact. It covers why impact is important, the different types of impact you can have, how to achieve impact – including tips on communicating with a variety of audiences – and how to measure your success.
Keeping track of your article’s progress
Through your Authored Works access, you’ll be able to get real-time insights about your article, such as views, downloads and citation numbers.
In addition, when you publish an article with us, you’ll be offered the option to sign up for email updates. These emails will be sent to you three, six and twelve months after your article is published to let you know how many views and citations the article has had.
Corrections and translations of published articles
Sometimes after an article has been published it may be necessary to make a change to the Version of Record. Take a look at our dedicated guide to corrections, expressions of concern, retractions and removals to find out more.
You may also be interested in translating your article into another language. If that’s the case, take a look at our information on article translations.
How to Publish a Research Paper
Last Updated: August 17, 2023
This article was co-authored by Matthew Snipp, PhD and by wikiHow staff writer, Christopher M. Osborne, PhD. C. Matthew Snipp is the Burnet C. and Mildred Finley Wohlford Professor of Humanities and Sciences in the Department of Sociology at Stanford University. He is also the Director of the Institute for Research in the Social Sciences’ Secure Data Center. He has been a Research Fellow at the U.S. Bureau of the Census and a Fellow at the Center for Advanced Study in the Behavioral Sciences. He has published 3 books and over 70 articles and book chapters on demography, economic development, poverty and unemployment. He is also currently serving on the National Institute of Child Health and Development’s Population Science Subcommittee. He holds a PhD in Sociology from the University of Wisconsin–Madison. There are 7 references cited in this article, which can be found at the bottom of the page.
Publishing a research paper in a peer-reviewed journal is an important activity within the academic community. It allows you to network with other scholars, get your name and work into circulation, and further refine your ideas and research. Getting published isn’t easy, but you can improve your odds by submitting a technically sound and creative yet straightforward piece of research. It’s also vital to find a suitable academic journal for your topic and writing style, so you can tailor your research paper to it and increase your chances of publication and wider recognition.
Submitting (and Resubmitting) Your Paper
- Have two or three people review your paper. At least one should be a non-expert in the major topic — their “outsider’s perspective” can be particularly valuable, as not all reviewers will be experts on your specific topic.
- Journal articles in the sciences often follow a specific organizational format, such as: Abstract; Introduction; Methods; Results; Discussion; Conclusion; Acknowledgements/References. Those in the arts and humanities are usually less regimented.
- Submit your article to only one journal at a time. Work your way down your list, one at a time, as needed.
- When submitting online, use your university email account. This connects you with a scholarly institution, which adds credibility to your work.
- Accept with Revision — only minor adjustments are needed, based on the feedback provided by the reviewers.
- Revise and Resubmit — more substantial changes (as described) are needed before publication can be considered, but the journal is still very interested in your work.
- Reject and Resubmit — the article is not currently viable for consideration, but substantial alterations and refocusing may be able to change this outcome.
- Reject — the paper isn’t and won’t be suitable for this publication, but that doesn’t mean it might not work for another journal.
- Do not get over-attached to your original submission. Instead, remain flexible and rework the paper in light of the feedback you receive. Use your skills as a researcher and a writer to create a superior paper.
- However, you don’t have to “roll over” and meekly follow reviewer comments that you feel are off the mark. Open a dialogue with the editor and explain your position, respectfully but confidently. Remember, you’re an expert on this specific topic!
- Remember, a rejected paper doesn’t necessarily equal a bad paper. Numerous factors, many of them completely out of your control, go into determining which articles are accepted.
- Move on to your second-choice journal for submission. You might even ask for guidance on finding a better fit from the editor of the first journal.
Choosing the Right Journal for Submission
- Read academic journals related to your field of study.
- Search online for published research papers, conference papers, and journal articles.
- Ask a colleague or professor for a suggested reading list.
- “Fit” is critical here — the most renowned journal in your field might not be the one best suited to your specific work. At the same time, though, don’t sell yourself short by assuming your paper could never be good enough for that top-shelf publication.
- However, always prioritize peer-reviewed journals — in which field scholars anonymously review submitted works. This is the basic standard for scholarly publishing.
- You can increase your readership dramatically by publishing in an open access journal, as your paper will then be freely available as part of an online repository of peer-reviewed scholarly papers.
Strengthening Your Submission
- “This paper explores how George Washington’s experiences as a young officer may have shaped his views during difficult circumstances as a commanding officer.”
- “This paper contends that George Washington’s experiences as a young officer on the 1750s Pennsylvania frontier directly impacted his relationship with his Continental Army troops during the harsh winter at Valley Forge.”
- This is especially true for younger scholars who are breaking into the field. Leave the grand (yet still only 20-30 page) explorations to more established scholars.
- Your abstract should make people eager to start reading the article, but never disappointed when they finish the article.
- Get as many people as you can to read over your abstract and provide feedback before you submit your paper to a journal.
- Do not immediately revise your paper if you are upset or frustrated with the journal's requests for change. Set your paper aside for several days, then come back to it with "fresh eyes." The feedback you received will have percolated and settled, and will now find a comfortable place within your article. Remember this is a big project and final refinements will take time.
- https://owl.excelsior.edu/research/revising-and-editing-a-research-paper/
- http://www.canberra.edu.au/library/start-your-research/research_help/publishing-research
- http://www.apa.org/monitor/sep02/publish.aspx
- Matthew Snipp, PhD. Research Fellow, U.S. Bureau of the Census. Expert Interview. 26 March 2020.
- https://www.timeshighereducation.com/news/how-to-get-your-first-research-paper-published/2015485.article#survey-answer
- https://www.webarchive.org.uk/wayback/archive/20140615095526/http://www.jisc.ac.uk/media/documents/publications/briefingpaper/2010/bppublishingresearchpapersv1final.pdf
- https://libguides.usc.edu/writingguide/abstract
About This Article
To publish a research paper, ask a colleague or professor to review your paper and give you feedback. Once you've revised your work, familiarize yourself with different academic journals so that you can choose the publication that best suits your paper. Make sure to look at the "Author's Guide" so you can format your paper according to the guidelines for that publication. Then, submit your paper and don't get discouraged if it is not accepted right away. You may need to revise your paper and try again.
Elsevier Connect
7 steps to publishing in a scientific journal
Before you hit “submit,” here’s a checklist (and pitfalls to avoid)
As scholars, we strive to do high-quality research that will advance science. We come up with what we believe are unique hypotheses, base our work on robust data and use an appropriate research methodology. As we write up our findings, we aim to provide theoretical insight, and share theoretical and practical implications about our work. Then we submit our manuscript for publication in a peer-reviewed journal.
In my seven years of research and teaching, I have observed several shortcomings in the manuscript preparation and submission process that often lead to research being rejected for publication. Being aware of these shortcomings will increase your chances of having your manuscript published and also boost your research profile and career progression.
In this article, intended for doctoral students and other young scholars, I identify common pitfalls and offer helpful solutions to prepare more impactful papers. While there are several types of research articles, such as short communications, review papers and so forth, these guidelines focus on preparing a full article (including a literature review), whether based on qualitative or quantitative methodology, from the perspective of the management, education, information sciences and social sciences disciplines.
Writing for academic journals is a highly competitive activity, and it’s important to understand that there could be several reasons behind a rejection. Furthermore, the journal peer-review process is an essential element of publication because no writer could identify and address all potential issues with a manuscript.
1. Do not rush submitting your article for publication.
In my first article for Elsevier Connect – “Five secrets to surviving (and thriving in) a PhD program” – I emphasized that scholars should start writing during the early stages of their research or doctoral study career. This does not mean submitting your manuscript for publication the moment you have crafted its conclusion. Authors sometimes rely on the fact that they will always have an opportunity to address their work’s shortcomings after the feedback received from the journal editor and reviewers has identified them.
A proactive approach and attitude will reduce the chance of rejection and disappointment. In my opinion, a logical flow of activities dominates every research activity and should be followed for preparing a manuscript as well. First, carefully re-read your manuscript at different times and perhaps in different places. Re-reading is essential and helps identify the most common problems and shortcomings in the manuscript, which might otherwise be overlooked. Second, I find it very helpful to share my manuscripts with my colleagues and other researchers in my network and to request their feedback. In doing so, I highlight any sections of the manuscript that I would like reviewers to be absolutely clear on.
2. Select an appropriate publication outlet.
I also ask colleagues about the most appropriate journal to submit my manuscript to; finding the right journal for your article can dramatically improve the chances of acceptance and ensure it reaches your target audience.
Elsevier provides an innovative Journal Finder search facility on its website. Authors enter the article title, a brief abstract and the field of research to get a list of the most appropriate journals for their article. For a full discussion of how to select an appropriate journal see Knight and Steinbach (2008).
Less experienced scholars sometimes choose to submit their research work to two or more journals at the same time. Research ethics and policies of all scholarly journals suggest that authors should submit a manuscript to only one journal at a time. Doing otherwise can cause embarrassment and lead to copyright problems for the author, the university employer and the journals involved.
Learn about publishing at Elsevier
3. Read the aims and scope and author guidelines of your target journal carefully.
Once you have read and re-read your manuscript carefully several times, received feedback from your colleagues, and identified a target journal, the next important step is to read the aims and scope of the journals in your target research area. Doing so will improve the chances of having your manuscript accepted for publishing. Another important step is to download and absorb the author guidelines and ensure your manuscript conforms to them. Some publishers report that one paper in five does not follow the style and format requirements of the target journal, which might specify requirements for figures, tables and references.
Rejection can come at different times and in different formats. For instance, if your research objective is not in line with the aims and scope of the target journal, or if your manuscript is not structured and formatted according to the target journal layout, or if your manuscript does not have a reasonable chance of being able to satisfy the target journal’s publishing expectations, the manuscript can receive a desk rejection from the editor without being sent out for peer review. Desk rejections can be disheartening for authors, making them feel they have wasted valuable time and might even cause them to lose enthusiasm for their research topic. Sun and Linton (2014), Hierons (2016) and Craig (2010) offer useful discussions on the subject of “desk rejections.”
4. Make a good first impression with your title and abstract.
The title and abstract are incredibly important components of a manuscript as they are the first elements a journal editor sees. I have been fortunate to receive advice from editors and reviewers on my submissions, and feedback from many colleagues at academic conferences, and this is what I’ve learned:
- The title should summarize the main theme of the article and reflect your contribution to the theory.
- The abstract should be crafted carefully and encompass the aim and scope of the study; the key problem to be addressed and theory; the method used; the data set; key findings; limitations; and implications for theory and practice.
Dr. Angel Borja goes into detail about these components in “11 steps to structuring a science paper editors will take seriously.”
Learn more in Elsevier's free Researcher Academy
5. Have a professional editing firm copy-edit (not just proofread) your manuscript, including the main text, list of references, tables and figures.
The key characteristic of scientific writing is clarity. Before submitting a manuscript for publication, it is highly advisable to have a professional editing firm copy-edit your manuscript. An article submitted to a peer-reviewed journal will be scrutinized critically by the editorial board before it is selected for peer review. According to a statistic shared by Elsevier, between 30 percent and 50 percent of articles submitted to Elsevier journals are rejected before they even reach the peer-review stage, and one of the top reasons for rejection is poor language. A properly written, edited and presented text will be error free and understandable and will project a professional image that will help ensure your work is taken seriously in the world of publishing. On occasion, the major revisions conducted at the request of a reviewer will necessitate another round of editing.
Authors can facilitate the editing of their manuscripts by taking precautions at their end. These include proofreading their own manuscript for accuracy and wordiness (avoid unnecessary or normative descriptions like “it should be noted here” and “the authors believe”) and sending it for editing only when it is complete in all respects and ready for publishing. Professional editing companies charge hefty fees, and it is simply not financially viable to have them conduct multiple rounds of editing on your article. Tools like the spelling and grammar checker in Microsoft Word or Grammarly are certainly worth using, but the benefits of professional editing are undeniable. For more on the difference between proofreading and editing, see the description in Elsevier’s WebShop.
6. Submit a cover letter with the manuscript.
Never underestimate the importance of a cover letter addressed to the editor or editor-in-chief of the target journal. Last year, I attended a conference in Boston. A “meet the editors” session revealed that many submissions do not include a cover letter, but the editors-in-chief present, who represented renowned and ISI-indexed Elsevier journals, argued that the cover letter gives authors an important opportunity to convince them that their research work is worth reviewing.
Accordingly, the content of the cover letter is also worth spending time on. Some inexperienced scholars paste the article’s abstract into their letter thinking it will be sufficient to make the case for publication; it is a practice best avoided. A good cover letter first outlines the main theme of the paper; second, argues the novelty of the paper; and third, justifies the relevance of the manuscript to the target journal. I would suggest limiting the cover letter to half a page. More importantly, peers and colleagues who read the article and provided feedback before the manuscript’s submission should be acknowledged in the cover letter.
7. Address reviewer comments very carefully.
Editors and editors-in-chief usually couch the acceptance of a manuscript as subject to a “revise and resubmit” based on the recommendations provided by the reviewer or reviewers. These revisions may necessitate either major or minor changes in the manuscript. Inexperienced scholars should understand a few key aspects of the revision process. First, it is important to address the revisions diligently; second, it is imperative to address all the comments received from the reviewers and avoid oversights; third, the resubmission of the revised manuscript must happen by the deadline provided by the journal; fourth, the revision process might comprise multiple rounds.
The revision process requires two major documents. The first is the revised manuscript highlighting all the modifications made following the recommendations received from the reviewers. The second is a letter listing the authors’ responses illustrating they have addressed all the concerns of the reviewers and editors. These two documents should be drafted carefully. The authors of the manuscript can agree or disagree with the comments of the reviewers (typically agreement is encouraged) and are not always obliged to implement their recommendations, but they should in all cases provide a well-argued justification for their course of action.
Given the ever-increasing number of manuscripts submitted for publication, the process of preparing a manuscript well enough to have it accepted by a journal can be daunting. High-impact journals accept fewer than 10 percent of the articles submitted to them, although the acceptance ratio for special issues or special topics sections is normally over 40 percent. Scholars might have to resign themselves to having their articles rejected and then reworking them to submit them to a different journal before the manuscript is accepted.
The advice offered here is not exhaustive but it’s also not difficult to implement. These recommendations require proper attention, planning and careful implementation; however, following this advice could help doctoral students and other scholars improve the likelihood of getting their work published, and that is key to having a productive, exciting and rewarding academic career.
I would like to thank Professor Heikki Karjaluoto, Jyväskylä University School of Business and Economics for providing valuable feedback on this article.
Aijaz Shaikh, PhD
Aijaz Shaikh has a PhD in Marketing from the Jyväskylä University School of Business & Economics (AMBA accredited), Finland, and an MSc from Hanken School of Economics (AACSB/EQUIS/AMBA accredited), Finland. He is a member of the editorial review board and a special issue guest editor of the International Journal of E-Business Research. His academic specialties are marketing (consumer behaviour), information technology adoption, and mobile financial services.
How to Write and Publish a Research Paper for a Peer-Reviewed Journal
- Open Access
- Published: 30 April 2020
- Volume 36, pages 909–913 (2021)
- Clara Busse, ORCID: orcid.org/0000-0002-0178-1000 (1) &
- Ella August, ORCID: orcid.org/0000-0001-5151-1036 (1, 2)
Communicating research findings is an essential step in the research process. Often, peer-reviewed journals are the forum for such communication, yet many researchers are never taught how to write a publishable scientific paper. In this article, we explain the basic structure of a scientific paper and describe the information that should be included in each section. We also identify common pitfalls for each section and recommend strategies to avoid them. Further, we give advice about target journal selection and authorship. In Online Resource 1, we provide an example of a high-quality scientific paper, with annotations identifying the elements we describe in this article.
Writing a scientific paper is an important component of the research process, yet researchers often receive little formal training in scientific writing. This is especially true in low-resource settings. In this article, we explain why choosing a target journal is important, give advice about authorship, provide a basic structure for writing each section of a scientific paper, and describe common pitfalls and recommendations for each section. In Online Resource 1, we also include an annotated journal article that identifies the key elements and writing approaches that we detail here. Before you begin your research, make sure you have ethical clearance from all relevant ethical review boards.
Select a Target Journal Early in the Writing Process
We recommend that you select a “target journal” early in the writing process; a “target journal” is the journal to which you plan to submit your paper. Each journal has a set of core readers and you should tailor your writing to this readership. For example, if you plan to submit a manuscript about vaping during pregnancy to a pregnancy-focused journal, you will need to explain what vaping is because readers of this journal may not have a background in this topic. However, if you were to submit that same article to a tobacco journal, you would not need to provide as much background information about vaping.
Information about a journal’s core readership can be found on its website, usually in a section called “About this journal” or something similar. For example, the Journal of Cancer Education presents such information on the “Aims and Scope” page of its website, which can be found here: https://www.springer.com/journal/13187/aims-and-scope .
Peer reviewer guidelines from your target journal are an additional resource that can help you tailor your writing to the journal and provide additional advice about crafting an effective article [ 1 ]. These are not always available, but it is worth a quick web search to find out.
Identify Author Roles Early in the Process
Early in the writing process, identify authors, determine the order of authors, and discuss the responsibilities of each author. Standard author responsibilities have been identified by The International Committee of Medical Journal Editors (ICMJE) [ 2 ]. To set clear expectations about each team member’s responsibilities and prevent errors in communication, we also suggest outlining more detailed roles, such as who will draft each section of the manuscript, write the abstract, submit the paper electronically, serve as corresponding author, and write the cover letter. It is best to formalize this agreement in writing after discussing it, circulating the document to the author team for approval. We suggest creating a title page on which all authors are listed in the agreed-upon order. It may be necessary to adjust authorship roles and order during the development of the paper. If a new author order is agreed upon, be sure to update the title page in the manuscript draft.
In the case where multiple papers will result from a single study, authors should discuss who will author each paper. Additionally, authors should agree on a deadline for each paper and the lead author should take responsibility for producing an initial draft by this deadline.
Structure of the Introduction Section
The introduction section should be approximately three to five paragraphs in length. Look at examples from your target journal to decide the appropriate length. This section should include the elements shown in Fig. 1 . Begin with a general context, narrowing to the specific focus of the paper. Include five main elements: why your research is important, what is already known about the topic, the “gap” or what is not yet known about the topic, why it is important to learn the new information that your research adds, and the specific research aim(s) that your paper addresses. Your research aim should address the gap you identified. Be sure to add enough background information to enable readers to understand your study. Table 1 provides common introduction section pitfalls and recommendations for addressing them.
Fig. 1 The main elements of the introduction section of an original research article. Often, the elements overlap
The purpose of the methods section is twofold: to explain how the study was done in enough detail to enable its replication and to provide enough contextual detail to enable readers to understand and interpret the results. In general, the essential elements of a methods section are the following: a description of the setting and participants, the study design and timing, the recruitment and sampling, the data collection process, the dataset, the dependent and independent variables, the covariates, the analytic approach for each research objective, and the ethical approval. The hallmark of an exemplary methods section is the justification of why each method was used. Table 2 provides common methods section pitfalls and recommendations for addressing them.
The focus of the results section should be associations, or lack thereof, rather than statistical tests. Two considerations should guide your writing here. First, the results should present answers to each part of the research aim. Second, return to the methods section to ensure that the analysis and variables for each result have been explained.
Begin the results section by describing the number of participants in the final sample and details such as the number who were approached to participate, the proportion who were eligible and who enrolled, and the number of participants who dropped out. The next part of the results should describe the participant characteristics. After that, you may organize your results by the aim or by putting the most exciting results first. Do not forget to report your non-significant associations. These are still findings.
Tables and figures capture the reader’s attention and efficiently communicate your main findings [ 3 ]. Each table and figure should have a clear message and should complement, rather than repeat, the text. Tables and figures should communicate all salient details necessary for a reader to understand the findings without consulting the text. Include information on comparisons and tests, as well as information about the sample and timing of the study in the title, legend, or in a footnote. Note that figures are often more visually interesting than tables, so if it is feasible to make a figure, make a figure. To avoid confusing the reader, either avoid abbreviations in tables and figures, or define them in a footnote. Note that there should not be citations in the results section and you should not interpret results here. Table 3 provides common results section pitfalls and recommendations for addressing them.
In contrast to the introduction section, the discussion should take the form of a right-side-up triangle, beginning with the interpretation of your results and moving to general implications (Fig. 2). This section typically begins with a restatement of the main findings, which can usually be accomplished with a few carefully crafted sentences.
Fig. 2 Major elements of the discussion section of an original research article. Often, the elements overlap
Next, interpret the meaning or explain the significance of your results, lifting the reader’s gaze from the study’s specific findings to more general applications. Then, compare these study findings with other research. Are these findings in agreement or disagreement with those from other studies? Does this study impart additional nuance to well-accepted theories? Situate your findings within the broader context of scientific literature, then explain the pathways or mechanisms that might give rise to, or explain, the results.
Journals vary in their approach to strengths and limitations sections: some are embedded paragraphs within the discussion section, while some mandate separate section headings. Keep in mind that every study has strengths and limitations. Candidly reporting yours helps readers to correctly interpret your research findings.
The next element of the discussion is a summary of the potential impacts and applications of the research. Should these results be used to optimally design an intervention? Does the work have implications for clinical protocols or public policy? These considerations will help the reader to further grasp the possible impacts of the presented work.
Finally, the discussion should conclude with specific suggestions for future work. Here, you have an opportunity to illuminate specific gaps in the literature that compel further study. Avoid the phrase “future research is necessary” because the recommendation is too general to be helpful to readers. Instead, provide substantive and specific recommendations for future studies. Table 4 provides common discussion section pitfalls and recommendations for addressing them.
Follow the Journal’s Author Guidelines
After you select a target journal, identify the journal’s author guidelines to guide the formatting of your manuscript and references. Author guidelines will often (but not always) include instructions for titles, cover letters, and other components of a manuscript submission. Read the guidelines carefully. If you do not follow the guidelines, your article will be sent back to you.
Finally, do not submit your paper to more than one journal at a time. Even if this is not explicitly stated in the author guidelines of your target journal, it is considered inappropriate and unprofessional.
Your title should invite readers to continue reading beyond the first page [ 4 , 5 ]. It should be informative and interesting. Consider describing the independent and dependent variables, the population and setting, the study design, the timing, and even the main result in your title. Because the focus of the paper can change as you write and revise, we recommend you wait until you have finished writing your paper before composing the title.
Be sure that the title is useful for potential readers searching for your topic. The keywords you select should complement those in your title to maximize the likelihood that a researcher will find your paper through a database search. Avoid using abbreviations in your title unless they are very well known, such as SNP, because it is more likely that someone will use a complete word rather than an abbreviation as a search term to help readers find your paper.
After you have written a complete draft, use the checklist (Fig. 3) below to guide your revisions and editing. Additional resources are available on writing the abstract and citing references [ 5 ]. When you feel that your work is ready, ask a trusted colleague or two to read the work and provide informal feedback. The box below provides a checklist that summarizes the key points offered in this article.
Fig. 3 Checklist for manuscript quality
Michalek AM (2014) Down the rabbit hole…advice to reviewers. J Cancer Educ 29:4–5
International Committee of Medical Journal Editors. Defining the role of authors and contributors: who is an author? http://www.icmje.org/recommendations/browse/roles-and-responsibilities/defining-the-role-of-authors-and-contributors.html. Accessed 15 January 2020
Vetto JT (2014) Short and sweet: a short course on concise medical writing. J Cancer Educ 29(1):194–195
Mensh B, Kording K (2017) Ten simple rules for structuring papers. PLoS Comput Biol. https://doi.org/10.1371/journal.pcbi.1005619
Lang TA (2017) Writing a better research article. J Public Health Emerg. https://doi.org/10.21037/jphe.2017.11.06
Ella August is grateful to the Sustainable Sciences Institute for mentoring her in training researchers on writing and publishing their research.
Authors and Affiliations
Department of Maternal and Child Health, University of North Carolina Gillings School of Global Public Health, 135 Dauer Dr, Chapel Hill, NC 27599, USA
Clara Busse & Ella August
Department of Epidemiology, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI, 48109-2029, USA
Correspondence to Ella August.
Conflict of Interest
The authors declare that they have no conflict of interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
(PDF 362 kb)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
About this article
Busse, C., August, E. How to Write and Publish a Research Paper for a Peer-Reviewed Journal. J Canc Educ 36, 909–913 (2021). https://doi.org/10.1007/s13187-020-01751-z
Published: 30 April 2020
Issue Date: October 2021
How to Write and Publish a Research Paper in 7 Steps
What comes next after you're done with your research? Publishing the results in a journal of course! We tell you how to present your work in the best way possible.
This post is part of a series, which serves to provide hands-on information and resources for authors and editors.
Things have gotten busy in scholarly publishing: these days, a new article is published in one of the 50,000 most important peer-reviewed journals every few seconds, while each one takes on average 40 minutes to read. Hundreds of thousands of papers reach the desks of editors and reviewers worldwide each year, and 50% of all submissions end up rejected at some stage.
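As a rough sanity check on the “every few seconds” figure, assume on the order of three million peer-reviewed articles per year; that number is an illustrative assumption, not a figure from this post:

```python
# Illustrative assumption: ~3 million peer-reviewed articles per year
# (the exact figure varies by source).
articles_per_year = 3_000_000
seconds_per_year = 365 * 24 * 60 * 60  # 31,536,000

# Average interval between newly published articles, in seconds
seconds_per_article = seconds_per_year / articles_per_year
print(f"Roughly one new article every {seconds_per_article:.1f} seconds")
```

At that volume, a new article appears about every ten seconds, which is consistent with the claim above.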
In a nutshell: there is a lot of competition, and the people who decide upon the fate of your manuscript are short on time and overworked. But there are ways to make their lives a little easier and improve your own chances of getting your work published!
Well, it may seem obvious, but before submitting an academic paper, always make sure that it is an excellent reflection of the research you have done and that you present it in the most professional way possible. Incomplete or poorly presented manuscripts create a great deal of frustration for editors, who may reject them without even sending them out to reviewers!
This post will discuss 7 steps to the successful publication of your research paper:
- Check whether your research is publication-ready
- Choose an article type
- Choose a journal
- Construct your paper
- Decide the order of authors
- Check and double-check
- Submit your paper
1. Check Whether Your Research Is Publication-Ready
Should you publish your research at all?
If your work holds academic value, then, of course, a well-written scholarly article could open doors to your research community. However, if you are not yet sure whether your research is ready for publication, here are some key questions to ask yourself, depending on your field of expertise:
- Have you done or found something new and interesting? Something unique?
- Is the work directly related to a current hot topic?
- Have you checked the latest results or research in the field?
- Have you provided solutions to any difficult problems?
- Have the findings been verified?
- Have the appropriate controls been performed if required?
- Are your findings comprehensive?
If the answers to all relevant questions are “yes”, you need to prepare a good, strong manuscript. Remember, a research paper is only useful if it is clearly understood, reproducible, and if it is read and used.
2. Choose an Article Type
The first step is to determine which type of paper is most appropriate for your work and what you want to achieve. The following list contains the most important, usually peer-reviewed article types in the natural sciences:
Full original research papers disseminate completed research findings. On average, this type of paper is 8-10 pages long and contains five figures and 25-30 references. Full original research papers are an important part of the process of developing your career.
Review papers present a critical synthesis of a specific research topic. These papers are usually much longer than original papers and will contain numerous references. More often than not, they will be commissioned by journal editors. Reviews present an excellent way to solidify your research career.
Letters, Rapid or Short Communications are often published for the quick and early communication of significant and original advances. They are much shorter than full articles and usually limited in length by the journal. Journals specifically dedicated to short communications or letters are also published in some fields. In these the authors can present short preliminary findings before developing a full-length paper.
3. Choose a Journal
Are you looking for the right place to publish your paper? Find out here whether a De Gruyter journal might be the right fit.
Submit to journals that you already read, that you have a good feel for. If you do so, you will have a better appreciation of both its culture and the requirements of the editors and reviewers.
Other factors to consider are:
- The specific subject area
- The aims and scope of the journal
- The type of manuscript you have written
- The significance of your work
- The reputation of the journal
- The reputation of the editors within the community
- The editorial/review and production speeds of the journal
- The community served by the journal
- The coverage and distribution
- The accessibility ( open access vs. closed access)
4. Construct Your Paper
Each element of a paper has its own purpose, so you should make these sections easy to index and search.
Don’t forget that requirements can differ greatly between publications, so always apply the journal’s specific instructions (or guide) for authors to your manuscript, even to the first draft: text layout, citation style, nomenclature, figures and tables, etc. It will save you time, and the editor’s.
Also, even in these days of Internet-based publishing, space is still at a premium, so be as concise as possible. As a good journalist would say: “Never use three words when one will do!”
Let’s look at the typical structure of a full research paper, but bear in mind certain subject disciplines may have their own specific requirements so check the instructions for authors on the journal’s home page.
4.1 The Title
It’s important to use the title to tell the reader what your paper is all about! You want to attract their attention, a bit like a newspaper headline does. Be specific and to the point. Keep it informative and concise, and avoid jargon and abbreviations (unless they are universally recognized like DNA, for example).
4.2 The Abstract
This could be termed as the “advertisement” for your article. Make it interesting and easily understood without the reader having to read the whole article. Be accurate and specific, and keep it as brief and concise as possible. Some journals (particularly in the medical fields) will ask you to structure the abstract in distinct, labeled sections, which makes it even more accessible.
A clear abstract will influence whether or not your work is considered and whether an editor should invest more time on it or send it for review.
4.3 Keywords

Keywords are used by abstracting and indexing services, such as PubMed and Web of Science. They are the labels of your manuscript, which make it “searchable” online by other researchers.
Include words or phrases (usually four to eight) that are closely related to your topic but not so niche that no one will search for them. Make sure to only use established abbreviations. Think about which scientific terms and their variations your potential readers are likely to use and search for. You can also do a test run of your selected keywords in one of the common academic search engines. Do articles similar to your own appear? Yes? Then that’s a good sign.
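If you want to automate such a test run, Crossref’s public REST API is one convenient target: its `/works` endpoint accepts a free-text `query` parameter, and the response reports how many indexed works match. The endpoint and parameters are real; the keyword list below is purely an illustration. A minimal sketch:

```python
import urllib.parse


def crossref_query_url(keywords, rows=5):
    """Build a query URL for Crossref's public /works endpoint."""
    query = urllib.parse.quote_plus(" ".join(keywords))
    return f"https://api.crossref.org/works?query={query}&rows={rows}"


# Hypothetical candidate keywords for a paper on mobile banking adoption
url = crossref_query_url(["mobile banking", "technology adoption"])
print(url)

# Fetching this URL returns JSON; message["total-results"] gives the number
# of matching works. Very few hits may mean the keywords are too niche,
# while millions may mean they are too broad.
```

Comparing the hit counts for a few candidate keyword sets is a quick, free way to gauge whether researchers searching those terms would actually find your paper.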
4.4 The Introduction

This first part of the main text should introduce the problem, as well as any existing solutions you are aware of and their main limitations. Also, state what you hope to achieve with your research.
Do not confuse the introduction with the results, discussion or conclusion.
4.5 The Methods

Every research article should include a detailed Methods section (also referred to as “Materials and Methods”) to provide the reader with enough information to be able to judge whether the study is valid and reproducible.
Include detailed information so that a knowledgeable reader can reproduce the experiment. However, use references and supplementary materials to indicate previously published procedures.
4.6 The Results

In this section, you will present the essential or primary results of your study. To display them in a comprehensible way, you should use subheadings as well as illustrations such as figures, graphs, tables and photos, as appropriate.
4.7 The Discussion

Here you should tell your readers what the results mean.
Do state how the results relate to the study’s aims and hypotheses and how the findings relate to those of other studies. Explain all possible interpretations of your findings and the study’s limitations.
Do not make “grand statements” that are not supported by the data. Also, do not introduce any new results or terms. Moreover, do not ignore work that conflicts or disagrees with your findings. Instead …
Be brave! Address conflicting study results and convince the reader you are the one who is correct.
4.8 The Conclusion

Your conclusion isn’t just a summary of what you’ve already written. It should take your paper one step further and answer any unresolved questions.
Sum up what you have shown in your study and indicate possible applications and extensions. The main question your conclusion should answer is: What do my results mean for the research field and my community?
4.9 Acknowledgments and Ethical Statements
It is extremely important to acknowledge anyone who has helped you with your paper, including researchers who supplied materials or reagents (e.g. vectors or antibodies); and anyone who helped with the writing or English, or offered critical comments about the content.
Learn more about academic integrity in our blog post “Scholarly Publication Ethics: 4 Common Mistakes You Want To Avoid” .
Remember to state why people have been acknowledged and ask their permission . Ensure that you acknowledge sources of funding, including any grant or reference numbers.
Furthermore, if you have worked with animals or humans, you need to include information about the ethical approval of your study and, if applicable, whether informed consent was given. Also, state whether you have any competing interests regarding the study (e.g. because of financial or personal relationships.)
4.10 References

The end is in sight, but don’t relax just yet!

In fact, there are often more mistakes in the references than in any other part of the manuscript. They are also one of the most annoying and time-consuming problems for editors.
Remember to cite the main scientific publications on which your work is based. But do not inflate the manuscript with too many references. Avoid excessive – and especially unnecessary – self-citations. Also, avoid excessive citations of publications from the same institute or region.
5. Decide the Order of Authors
In the sciences, the most common way to order the names of the authors is by relative contribution.
Generally, the first author conducts and/or supervises the data analysis and the proper presentation and interpretation of the results. They put the paper together and usually submit the paper to the journal.
Co-authors make intellectual contributions to the data analysis and contribute to data interpretation. They review each paper draft. All of them must be able to present the paper and its results, as well as to defend the implications and discuss study limitations.
Do not leave out authors who should be included or add “gift authors”, i.e. authors who did not contribute significantly.
6. Check and Double-Check
As a final step before submission, ask colleagues to read your work and be constructively critical .
Make sure that the paper is appropriate for the journal – take a last look at their aims and scope. Check if all of the requirements in the instructions for authors are met.
Ensure that the cited literature is balanced. Are the aims, purpose and significance of the results clear?
Conduct a final check for language, either by a native English speaker or an editing service.
7. Submit Your Paper
When you and your co-authors have double-, triple-, and quadruple-checked the manuscript, submit it via e-mail or the journal’s online submission system. Along with your manuscript, include a cover letter that highlights why your paper would appeal to the journal and confirms that all authors have approved the submission.
It is up to the editors and the peer-reviewers now to provide you with their (ideally constructive and helpful) comments and feedback. Time to take a breather!
If the paper gets rejected, do not despair – it happens to literally everybody. If the journal suggests major or minor revisions, take the chance to provide a thorough response and make improvements as you see fit. If the paper gets accepted, congrats!
It’s now time to get writing and share your hard work – good luck!
[Title Image by Nick Morrison via Unsplash]
David Sleeman worked as Senior Journals Manager in the field of Physical Sciences at De Gruyter.
Publishing our work allows us to share ideas and work collaboratively to advance the field of computer science.
Algorithms and Theory
Google’s mission presents many exciting algorithmic and optimization challenges across different product areas including Search, Ads, Social, and Google Infrastructure. These include optimizing internal systems such as scheduling the machines that power the numerous computations done each day, as well as optimizations that affect core products and users, from online allocation of ads to page-views to automatic management of ad campaigns, and from clustering large-scale graphs to finding best paths in transportation networks. Beyond employing new algorithmic ideas to impact millions of users, Google researchers contribute to state-of-the-art research in these areas by publishing in top conferences and journals.
Google is deeply engaged in Data Management research across a variety of topics with deep connections to Google products. We are building intelligent systems to discover, annotate, and explore structured data from the Web, and to surface them creatively through Google products, such as Search (e.g., structured snippets , Docs, and many others). The overarching goal is to create a plethora of structured data on the Web that maximally help Google users consume, interact and explore information. Through those projects, we study various cutting-edge data management research issues including information extraction and integration, large scale data analysis, effective data exploration, etc., using a variety of techniques, such as information retrieval, data mining and machine learning.
A major research effort involves the management of structured data within the enterprise. The goal is to discover, index, monitor, and organize this type of data in order to make it easier to access high-quality datasets. This type of data carries different, and often richer, semantics than structured data on the Web, which in turn raises new opportunities and technical challenges in their management.
Furthermore, Data Management research across Google allows us to build technologies that power Google's largest businesses through scalable, reliable, fast, and general-purpose infrastructure for large-scale data processing as a service. Some examples of such technologies include F1, the database serving our ads infrastructure; Mesa, a petabyte-scale analytic data warehousing system; and Dremel, for petabyte-scale data processing with interactive response times. Dremel is available for external customers to use as part of Google Cloud’s BigQuery.
Data Mining and Modeling
The proliferation of machine learning means that learned classifiers lie at the core of many products across Google. However, questions in practice are rarely so clean that an out-of-the-box algorithm can simply be applied. A big challenge is in developing metrics, designing experimental methodologies, and modeling the space to create parsimonious representations that capture the fundamentals of the problem. These problems cut across Google’s products and services, from designing experiments for testing new auction algorithms to developing automated metrics to measure the quality of a road map.
Data mining lies at the heart of many of these questions, and the research done at Google is at the forefront of the field. Whether it is finding more efficient algorithms for working with massive data sets, developing privacy-preserving methods for classification, or designing new machine learning approaches, our group continues to push the boundary of what is possible.
Distributed Systems and Parallel Computing
No matter how powerful individual computers become, there are still reasons to harness the power of multiple computational units, often spread across large geographic areas. Sometimes this is motivated by the need to collect data from widely dispersed locations (e.g., web pages from servers, or sensors for weather or traffic). Other times it is motivated by the need to perform enormous computations that simply cannot be done by a single CPU.
From our company’s beginning, Google has had to deal with both issues in our pursuit of organizing the world’s information and making it universally accessible and useful. We continue to face many exciting distributed systems and parallel computing challenges in areas such as concurrency control, fault tolerance, algorithmic efficiency, and communication. Some of our research involves answering fundamental theoretical questions, while other researchers and engineers are engaged in the construction of systems to operate at the largest possible scale, thanks to our hybrid research model .
Economics and Electronic Commerce
Google is a global leader in electronic commerce. Not surprisingly, it devotes considerable attention to research in this area. Topics include 1) auction design, 2) advertising effectiveness, 3) statistical methods, 4) forecasting and prediction, 5) survey research, 6) policy analysis, and a host of others. This research involves interdisciplinary collaboration among computer scientists, economists, statisticians, and analytic marketing researchers both at Google and academic institutions around the world.
A major challenge is in solving these problems at very large scales. For example, the advertising market has billions of transactions daily, spread across millions of advertisers. It presents a unique opportunity to test and refine economic principles as applied to a very large number of interacting, self-interested parties with a myriad of objectives.
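As a toy illustration of the auction-design side of this work, the following sketches a sealed-bid second-price (Vickrey) auction, the classic mechanism in which the highest bidder wins but pays the second-highest bid. The function name and bid values here are illustrative assumptions, not any production ad-auction system:

```python
def second_price_auction(bids):
    """Run a sealed-bid second-price (Vickrey) auction.

    bids maps each bidder to a bid amount. The highest bidder wins
    but pays the second-highest bid, which makes truthful bidding
    a dominant strategy for every bidder.
    """
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]  # the second-highest bid sets the price
    return winner, price

# "b" bids highest (5.0) and wins, but pays the runner-up bid (4.0)
winner, price = second_price_auction({"a": 3.0, "b": 5.0, "c": 4.0})
```

Scaling this idea to billions of daily transactions with self-interested bidders is where the research challenges described above begin.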
It is remarkable how some of the fundamental problems Google grapples with are also some of the hardest research problems in the academic community. At Google, this research translates directly into practice, influencing how production systems are designed and used.
Education Innovation
Our Education Innovation research area includes publications on online learning at scale, educational technology (which is any technology that supports teaching and learning), curriculum and programming tools for computer science education, diversity and broadening participation in computer science, and the hiring and onboarding process at Google.
We aim to transform scientific research itself. Many scientific endeavors can benefit from large scale experimentation, data gathering, and machine learning (including deep learning). We aim to accelerate scientific research by applying Google’s computational power and techniques in areas such as drug discovery, biological pathway modeling, microscopy, medical diagnostics, material science, and agriculture. We collaborate closely with world-class research partners to help solve important problems with large scientific or humanitarian benefit.
Hardware and Architecture
The machinery that powers many of our interactions today — Web search, social networking, email, online video, shopping, game playing — is made of the smallest and the most massive computers. The smallest part is your smartphone, a machine that is over ten times faster than the iconic Cray-1 supercomputer. The capabilities of these remarkable mobile devices are amplified by orders of magnitude through their connection to Web services running on building-sized computing systems that we call Warehouse-scale computers (WSCs).
Google’s engineers and researchers have been pioneering both WSC and mobile hardware technology with the goal of providing Google programmers and our Cloud developers with a unique computing infrastructure in terms of scale, cost-efficiency, energy-efficiency, resiliency and speed. The tight collaboration among software, hardware, mechanical, electrical, environmental, thermal and civil engineers results in some of the most impressive and efficient computers in the world.
Human-Computer Interaction and Visualization
HCI researchers at Google have enormous potential to impact the experience of Google users as well as conduct innovative research. Grounded in user behavior understanding and real use, Google’s HCI researchers invent, design, build and trial large-scale interactive systems in the real world. We declare success only when we positively impact our users and user communities, often through new and improved Google products. HCI research has fundamentally contributed to the design of Search, Gmail, Docs, Maps, Chrome, Android, YouTube, serving over a billion daily users. We are engaged in a variety of HCI disciplines such as predictive and intelligent user interface technologies and software, mobile and ubiquitous computing, social and collaborative computing, interactive visualization and visual analytics. Many projects heavily incorporate machine learning with HCI, and current projects include predictive user interfaces; recommenders for content, apps, and activities; smart input and prediction of text on mobile devices; user engagement analytics; user interface development tools; and interactive visualization of complex data.
Information Retrieval and the Web
The science surrounding search engines is commonly referred to as information retrieval, in which algorithmic principles are developed to match user interests to the best information about those interests.
Google started as a result of our founders' attempt to find the best match between user queries and Web documents, and to do it really fast. In the process, they uncovered a few basic principles: 1) the best pages tend to be those that are linked to the most; 2) the best description of a page is often derived from the anchor text associated with the links to it. Theories were developed to exploit these principles to optimize the task of retrieving the best documents for a user query.
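The first of those principles is the intuition behind link-based ranking algorithms such as PageRank. The following is a minimal power-iteration sketch over a toy three-page link graph; it is an illustrative simplification under simple assumptions (uniform teleportation, a fixed iteration count), not a production ranking system:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to. Returns
    a dict of scores summing to ~1.0; pages with more (and more
    highly ranked) inbound links score higher.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page keeps a baseline (1 - damping) / n of "teleport" mass
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            targets = outs if outs else pages  # dangling pages spread evenly
            for q in targets:
                new[q] += damping * rank[p] / len(targets)
        rank = new
    return rank

# "c" is linked to by both other pages, so it ends up ranked highest
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Real systems compute this over graphs with billions of nodes and combine it with many other signals, including the anchor-text principle above.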
Search and Information Retrieval on the Web has advanced significantly from those early days: 1) the notion of "information" has greatly expanded from documents to much richer representations such as images and videos; 2) users are increasingly searching on their mobile devices, which have very different interaction characteristics from desktop search; 3) users are increasingly looking for direct information, such as answers to a question, or seeking to complete tasks, such as booking an appointment. Through our research, we are continuing to enhance and refine the world's foremost search engine by aiming to scientifically understand the implications of those changes and address the new challenges they bring.
Machine Intelligence
Google is at the forefront of innovation in Machine Intelligence, with active research exploring virtually all aspects of machine learning, including deep learning and more classical algorithms. Exploring theory as well as application, much of our work on language, speech, translation, visual processing, ranking and prediction relies on Machine Intelligence. In all of those tasks and many others, we gather large volumes of direct or indirect evidence of relationships of interest, applying learning algorithms to understand and generalize.
Machine Intelligence at Google raises deep scientific and engineering challenges, allowing us to contribute to the broader academic research community through technical talks and publications in major conferences and journals. Contrary to much of current theory and practice, the statistics of the data we observe shift rapidly, the features of interest change as well, and the volume of data often requires enormous computation capacity. When learning systems are placed at the core of interactive services in a fast-changing and sometimes adversarial environment, techniques such as deep learning and statistical modeling need to be combined with ideas from control and game theory.
Machine Perception
Research in machine perception tackles the hard problems of understanding images, sounds, music and video. In recent years, our computers have become much better at such tasks, enabling a variety of new applications such as: content-based search in Google Photos and Image Search, natural handwriting interfaces for Android, optical character recognition for Google Drive documents, and recommendation systems that understand music and YouTube videos. Our approach is driven by algorithms that benefit from processing very large, partially-labeled datasets using parallel computing clusters. A good example is our recent work on object recognition using a novel deep convolutional neural network architecture known as Inception that achieves state-of-the-art results on academic benchmarks and allows users to easily search through their large collection of Google Photos. The ability to mine meaningful information from multimedia is broadly applied throughout Google.
Machine Translation
Machine Translation is an excellent example of how cutting-edge research and world-class infrastructure come together at Google. We focus our research efforts on developing statistical translation techniques that improve with more data and generalize well to new languages. Our large scale computing infrastructure allows us to rapidly experiment with new models trained on web-scale data to significantly improve translation quality. This research backs the translations served at translate.google.com, allowing our users to translate text, web pages and even speech. Deployed within a wide range of Google services like Gmail, Books, Android, and web search, Google Translate is a high-impact, research-driven product that bridges language barriers and makes it possible to explore the multilingual web in 90 languages. Exciting research challenges abound as we pursue human quality translation and develop machine translation systems for new languages.
Mobile devices are the prevalent computing device in many parts of the world, and over the coming years it is expected that mobile Internet usage will outpace desktop usage worldwide. Google is committed to realizing the potential of the mobile web to transform how people interact with computing technology. Google engineers and researchers work on a wide range of problems in mobile computing and networking, including new operating systems and programming platforms (such as Android and ChromeOS); new interaction paradigms between people and devices; advanced wireless communications; and optimizing the web for mobile settings. In addition, many of Google’s core product teams, such as Search, Gmail, and Maps, have groups focused on optimizing the mobile experience, making it faster and more seamless. We take a cross-layer approach to research in mobile systems and networking, cutting across applications, networks, operating systems, and hardware. The tremendous scale of Google’s products and the Android and Chrome platforms make this a very exciting place to work on these problems.
Some representative projects include mobile web performance optimization; new features in Android to greatly reduce network data usage and energy consumption; new platforms for developing high performance web applications on mobile devices; wireless communication protocols that will yield vastly greater performance over today’s standards; and multi-device interaction based on Android, which is now available on a wide variety of consumer electronics.
Natural Language Processing
Natural Language Processing (NLP) research at Google focuses on algorithms that apply at scale, across languages, and across domains. Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more.
Our work spans the range of traditional NLP tasks, with general-purpose syntax and semantic algorithms underpinning more specialized systems. We are particularly interested in algorithms that scale well and can be run efficiently in a highly distributed environment.
Our syntactic systems predict part-of-speech tags for each word in a given sentence, as well as morphological features such as gender and number. They also label relationships between words, such as subject, object, modification, and others. We focus on efficient algorithms that leverage large amounts of unlabeled data, and recently have incorporated neural net technology.
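As a toy illustration of what such a syntactic system outputs, here is a minimal lexicon-based part-of-speech tagger. The lexicon and the default rule are hand-written assumptions for the sketch; the systems described above learn these predictions from large amounts of labeled and unlabeled data instead:

```python
# Tiny hand-written lexicon mapping words to part-of-speech tags.
# Real systems learn this mapping from large corpora rather than
# using a fixed table.
LEXICON = {
    "the": "DET",
    "cat": "NOUN",
    "sat": "VERB",
    "on": "ADP",
    "mat": "NOUN",
}

def tag(sentence):
    """Return (word, tag) pairs, defaulting unknown words to NOUN."""
    return [(w, LEXICON.get(w, "NOUN")) for w in sentence.lower().split()]
```

A learned tagger replaces the table lookup with a statistical model, and, as noted above, also predicts morphological features such as gender and number.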
On the semantic side, we identify entities in free text, label them with types (such as person, location, or organization), cluster mentions of those entities within and across documents (coreference resolution), and resolve the entities to the Knowledge Graph.
Recent work has focused on incorporating multiple sources of knowledge and information to aid with analysis of text, as well as applying frame semantics at the noun phrase, sentence, and document level.
Networking
Networking is central to modern computing, from connecting cell phones to massive Cloud-based data stores to the interconnect for data centers that deliver seamless storage and fine-grained distributed computing at the scale of entire buildings. With an understanding that our distributed computing infrastructure is a key differentiator for the company, Google has long focused on building network infrastructure to support our scale, availability, and performance needs.
Our research combines building and deploying novel networking systems at massive scale, with recent work focusing on fundamental questions around data center architecture, wide area network interconnects, Software Defined Networking control and management infrastructure, as well as congestion control and bandwidth allocation. By publishing our findings at premier research venues, we continue to engage both academic and industrial partners to further the state of the art in networked systems.
Quantum Computing
Quantum Computing merges two great scientific revolutions of the 20th century: computer science and quantum physics. Quantum physics is the theoretical basis of the transistor, the laser, and other technologies which enabled the computing revolution. But on the algorithmic level, today's computing machinery still operates on "classical" Boolean logic. Quantum Computing is the design of hardware and software that replaces Boolean logic by quantum law at the algorithmic level. For certain computations such as optimization, sampling, search or quantum simulation this promises dramatic speedups. We are particularly interested in applying quantum computing to artificial intelligence and machine learning. This is because many tasks in these areas rely on solving hard optimization problems or performing efficient sampling.
Robotics
Having a machine learning agent interact with its environment requires true unsupervised learning, skill acquisition, active learning, exploration and reinforcement, all ingredients of human learning that are still not well understood or exploited through the supervised approaches that dominate deep learning today.
Our goal is to improve robotics via machine learning, and improve machine learning via robotics. We foster close collaborations between machine learning researchers and roboticists to enable learning at scale on real and simulated robotic systems.
Security, Privacy and Abuse Prevention
The Internet and the World Wide Web have brought many changes that provide huge benefits, in particular by giving people easy access to information that was previously unavailable, or simply hard to find. Unfortunately, these changes have raised many new challenges in the security of computer systems and the protection of information against unauthorized access and abusive usage. At Google, our primary focus is the user and their safety. We have people working on nearly every aspect of security, privacy, and anti-abuse including access control and information security, networking, operating systems, language design, cryptography, fraud detection and prevention, spam and abuse detection, denial of service, anonymity, privacy-preserving systems, disclosure controls, as well as user interfaces and other human-centered aspects of security and privacy. Our security and privacy efforts cover a broad range of systems including mobile, cloud, distributed, sensors and embedded systems, and large-scale machine learning.
Delivering Google's products to our users requires computer systems that have a scale previously unknown to the industry. Building on our hardware foundation, we develop technology across the entire systems stack, from operating system device drivers all the way up to multi-site software systems that run on hundreds of thousands of computers. We design, build and operate warehouse-scale computer systems that are deployed across the globe. We build storage systems that scale to exabytes, approach the performance of RAM, and never lose a byte. We design algorithms that transform our understanding of what is possible. Thanks to the distributed systems we provide our developers, they are some of the most productive in the industry. And we write and publish research papers to share what we have learned, and because peer feedback and interaction helps us build better systems that benefit everybody.
Speech Technology
Our goal in Speech Technology Research is to make speaking to devices--those around you, those that you wear, and those that you carry with you--ubiquitous and seamless.
Our research focuses on what makes Google unique: computing scale and data. Using large scale computing resources pushes us to rethink the architecture and algorithms of speech recognition, and experiment with the kind of methods that have in the past been considered prohibitively expensive. We also look at parallelism and cluster computing in a new light to change the way experiments are run, algorithms are developed and research is conducted. The field of speech recognition is data-hungry, and using more and more data to tackle a problem tends to help performance but poses new challenges: how do you deal with data overload? How do you leverage unsupervised and semi-supervised techniques at scale? Which class of algorithms merely compensate for lack of data and which scale well with the task at hand? Increasingly, we find that the answers to these questions are surprising, and steer the whole field into directions that would never have been considered, were it not for the availability of significantly higher orders of magnitude of data.
We are also in a unique position to deliver very user-centric research. Researchers are able to conduct live experiments to test and benchmark new algorithms directly in a realistic controlled environment. Whether these are algorithmic performance improvements or user experience and human-computer interaction studies, we focus on solving real problems and with real impact for users.
We have a huge commitment to the diversity of our users, and have made it a priority to deliver the best performance to every language on the planet. We currently have systems operating in more than 55 languages, and we continue to expand our reach to more users. The challenges of internationalizing at scale are immense and rewarding. Many speakers of the languages we reach have never had the experience of speaking to a computer before, and breaking this new ground brings up new research on how to better serve this wide variety of users. Combined with the unprecedented translation capabilities of Google Translate, we are now at the forefront of research in speech-to-speech translation and one step closer to a universal translator.
Indexing and transcribing the web’s audio content is another challenge we have set for ourselves, and is nothing short of gargantuan, both in scope and difficulty. The videos uploaded every day on YouTube range from lectures, to newscasts, music videos and, of course, cat videos. Making sense of them takes the challenges of noise robustness, music recognition, speaker segmentation, language detection to new levels of difficulty. The potential payoff is immense: imagine making every lecture on the web accessible to every language. This is the kind of impact for which we are striving.
Health & Bioscience
Research in health and biomedical sciences has a unique potential to improve people’s lives, and includes work ranging from basic science that aims to understand biology, to diagnosing individuals’ diseases, to epidemiological studies of whole populations.
We recognize that our strengths in machine learning, large-scale computing, and human-computer interaction can help accelerate the progress of research in this space. By collaborating with world-class institutions and researchers and engaging in both early-stage research and late-stage work, we hope to help people live healthier, longer, and more productive lives.
Learn more about how we do research
We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work
How to Get an Article Published: Checklist
It’s sometimes said that writing your article is the easiest part of the process. Publishing it, and getting it into the hands of other researchers and interested parties, is another matter altogether.
In this post, we’ll go over details on how to get an article published. We’ll also include a checklist for manuscript submission. When you follow these guidelines, you will increase your chances of getting your hard work published in a journal.
Our first step will be to make sure you are submitting your article to the right journal, and following their guidelines. So, it’s important that we include a disclaimer here. Every journal will have different procedures and requirements, so it’s critical that you follow their guidelines exactly. This article is simply meant to be a general guide.
How to Get Published in a Journal
Before you even consider getting published, it’s critical that you select the right journal. Is your article a good fit? You can find out by looking at the journal’s scope and aim, as well as evaluating recent articles. Do they complement your work? Would your article “make sense” in the context of the other articles being published? Do you want to publish open access? If so, make sure your chosen journal has that option.
Journal Guidelines Checklist
Once you’ve found the right journal, simply follow this checklist. Again, check with your specific journal to make sure you’re following their procedure.
1 – Write your paper in accordance with the Instructions for Authors (IFAs)
These will tell you everything that you need to know about what the editorial board of your journal wants to see in your article, including details like style guides and word counts.
2 – Is your article well-written?
Make sure that it flows logically, you’ve described your research completely, and that you’re using clear and accessible language. Have a friend, peer or editing expert make suggestions that will improve the overall quality and clarity of your paper.
3 – Double-check citations
Double, then triple-check the accuracy of your citations. Have any of the articles you’ve cited been withdrawn? Do you have citations that support your text? Do you have any needed permissions for copyrighted material?
4 – Write a killer title and abstract
Here’s your chance to get your audience’s attention. You want to make your title descriptive and catchy, but not overly so. Likewise with your abstract: think of it as a way to “sell” your article, so that the reader of your abstract will want to explore the rest of your paper.
5 – Find your keywords
Help your audience find you and your work by selecting effective keywords. Consider keyword research to find the most popularly searched phrases that you can include naturally within your abstract, article description, and even the title.
6 – Double-check IFAs
As mentioned previously, know the IFAs of your chosen journal before you even begin writing your article, if possible. Many journals offer templates to help you with formatting, as well as instructions on style, reference section requirements, etc.
7 – Write a Killer Cover Letter
One of the most overlooked elements of the publishing process, your cover letter can make or break your success in publishing in a journal. You want to explain, in a creative way, why your paper is the perfect article for the journal, and why the editorial board should consider your work for publication. Take your time on this step. There are various examples of successfully published cover letters and templates available online.
8 – Have everything?
Make sure you have all the files you need. This will depend on what type of peer review the journal uses. For example, for a double-blind peer review, you’ll need to provide an anonymized version of your manuscript. Additionally, make sure your keywords are clearly identified on the title page; that illustrations, figures, and their captions are attached and included; and that tables include their legends, descriptions, titles, and any corresponding footnotes.
9 – Who’s submitting the manuscript?
One author should be designated as the contact or corresponding author. Make sure complete information is included, such as email address, the address of the affiliated organization or site, and the full mailing address for the corresponding author.
10 – Have you included everybody?
Make sure you have all the information you need about any co-authors, including names, contact information, institution, etc. It can be tricky to define authorship, so check out our blog on that topic.
11 – Have your ORCID iD?
This is an identifier that links all of your work together. Make sure you include this identifier when you submit your work.
12 – Disclosure Statements and Competing Interests
Have any? Make sure these are clearly disclosed when you submit your manuscript. Each journal will have a particular format. Make sure you’re being transparent about any conflicts of interest.
We disseminate the results of our work as broadly as possible to benefit the public good. Tens of thousands of RAND publications are available as free downloads, dating back to 1946.
Military Logistics Military Officers Military Personnel Military Personnel Retention Military Recruitment Military Reserves Military Satellites Military Ships and Naval Vessels Military Spouses Military Strategy Military Technology Military Transformation Military Veterans Minority Populations Minority Students Missile Defense Modeling and Simulation Mortality Musculoskeletal Disorders Nation Building National Defense Authorization Act National Security Legislation National Security Organizations Native American and Alaska Native Populations Natural Hazards Naval Warfare Neighborhood Influences on Health Neighborhoods Netherlands Network Analysis Neurological Disorders New Mexico New York New York City North Atlantic Treaty Organization North Carolina North Korea Nuclear Deterrence Nuclear Disarmament Nuclear Weapons and Warfare Nurses and Nursing Nursing Homes Obstetrics Occupational Safety and Health Occupational Training Occupations Oceania Ohio Oklahoma Older Adults Open Source Intelligence Operational Readiness Opioids Optimization Modeling Outer Space Pacific Ocean Pain Management Pakistan Palliative Care Pandemic Parenting Patient Experience Patient Safety Patient-Centered Care Peacekeeping and Stability Operations Pediatric Medicine Pennsylvania People with Disabilities Performance Measurement Personal Finance Personal Savings Personal Wealth Peru Pharmaceutical Drugs Philippines Physicians Pittsburgh Police-Community Relations Politics and Government Postsecondary Education Posttraumatic Stress Disorder Poverty Pregnancy Prenatal Health Care Preschool Preschool Children Prescription Drug Misuse Preventive Health Care Primary Care Principals Prisoner Reentry Program Evaluation Psychological Warfare Public Health Public Health Preparedness Public Safety Legislation Public Sector Governance Quality Rating and Improvement System (Early Childhood Education) Racial Discrimination Racial Equity Recidivism Refugees Regression Analysis Religion Residential Housing 
Retirement and Retirement Benefits Return-to-Work Programs and Policies Robust Decision Making Russia School Finance School Security School Violence School-Based Health Care Science, Technology, and Innovation Policy Scientific Professions Secondary Education Security Cooperation Sentencing Sexual Abuse Sexual Assault Sexual Harassment Sleep Small Businesses Smoking Cessation Social and Emotional Learning Social Determinants of Health Social Media Social Media Analysis Social Program Evaluation Social Services and Welfare Socioeconomic Status South Asia South Korea Southeast Asia Soviet Union Space Exploration Space Science and Technology Space Warfare Spain Special Operations Forces Spouse Abuse Standards-Based Education Reform Statistical Analysis Methodology STEM Education Substance Use Substance Use Disorder Treatment Substance Use Disorders Substance Use Harm Reduction Substance Use Prevention Suicide Summer Learning Survey Research Methodology Syria Taiwan Teacher Training Teachers and Teaching Technical Professions Telecommunications Telemedicine Terrorism Risk Management Terrorism Threat Assessment Texas The Islamic State (Terrorist Organization) Threat Assessment Trauma Uganda Ukraine Underserved Students United Kingdom United Nations United States United States Air Force United States Army United States Coast Guard United States Department of Defense United States Marine Corps United States Navy United States Space Force Unmanned Aerial Vehicles Vaccination Value-Based Purchasing in Health Care Veterans Education Veterans Employment Veterans Health Care Victims of Crime Violent Extremism Vocational Education Wages and Compensation Warfare and Military Operations Wargaming Washington Water Resources Management Women's Health Workers' Compensation Workforce Development Workforce Diversity Workforce Management Workplace Discrimination Workplace Well-Being Young Adults
- All Series All Series Commercial Books Conference Proceedings Corporate Publications Dissertations External Publications Infographics Perspectives Presentations Research Briefs Research Reports Testimonies Tools Visualizations
Typologies of Duocentric Networks Among Low-Income Newlywed Couples
This study analyzed the combined social networks of 207 mixed-sex newlywed couples. We identified five distinct network types with different levels of relationship quality.
Aug 28, 2023
David P. Kennedy, Thomas N. Bradbury, Benjamin Karney
Relationship of POLST to Hospitalization and ICU Care Among Nursing Home Residents in California
This paper links Physician Orders for Life Sustaining Treatment (POLST) document status with actual intensity of care for nursing home residents in California.
David Zingmond, David Powell, Lee A. Jennings, Jose J. Escarce, Li-Jung Liang, Punam Parikh, Neil S. Wenger
Telehealth and In-Person Mental Health Service Utilization and Spending, 2019 to 2022
During the COVID-19 pandemic, the use of telehealth for mental health care grew rapidly, but little is known about post-pandemic trends in mental health services. We studied 2019–2022 data on telehealth versus in-person mental health service utilization and spending among U.S. adults before the 2023 expiration of the public health emergency.
Jonathan H. Cantor, Ryan K. McBain, Dena M. Bravata, Christopher M. Whaley
Saving the Government Money: Recent Examples from RAND's Defense-Related Federally Funded Research and Development Centers
This publication lists and summarizes recent projects undertaken by RAND's three federally funded research and development centers (FFRDCs) that have helped save the government money or identified ways to do so.
Aug 25, 2023
Migration Narratives in Northern Central America: How Competing Stories Shape Policy and Public Opinion in Guatemala, Honduras, and El Salvador
This report identifies and compares migration narratives within El Salvador, Guatemala, and Honduras to understand how they influence critical policy debates and decisions in the region.
Ariel G. Ruiz Soto, Natalia Banulescu-Bogdan, Aaron Clark-Ginsberg, Alejandra Lopez, Alejandro Vélez Salas
Population Benchmarking for the U.S. Department of the Air Force: Impact of Eligibility Requirements and Propensity to Serve on Demographic Representation
Department of the Air Force accession eligibility criteria affect women and racial and ethnic minority candidates differently than they affect White men. Understanding the eligible population is crucial to growing a diverse workforce.
Aug 24, 2023
Tiffany Berglund, Louis T. Mariano, Christopher E. Maerzluft
Exploring Alternative Health Care Payment Models for California’s Workers’ Compensation System: Alternatives, Recommendations, and Next Steps
Researchers considered possible alternative payment models for use in the California workers’ compensation (WC) system, including discussion of the unique constraints and factors embedded within the California WC environment.
Denise D. Quigley, Petra W. Rasmussen, Nabeel Qureshi, Michael Dworsky, Melony E. Sorbero
The Role of Benchmark Assessments in Coherent Instructional Systems: Findings from the 2022 American Instructional Resources Survey
Researchers draw on the American Instructional Resources Survey to examine the prevalence of benchmark assessments and explore educators' perceptions about the role that benchmark assessments play in their schools' instructional systems.
Aug 23, 2023
Ashley Woo, Melissa Kay Diliberti
Deepfakes and Scientific Knowledge Dissemination
We field the first study to understand the vulnerabilities of education stakeholders to science deepfakes and the characteristics that moderate vulnerability.
Christopher Joseph Doss, Jared Mondschein, Dule Shu, Tal Wolfson, Denise Kopecky, Valerie A. Fitton-Kane, Lance Bush, Conrad Tucker
Evaluation Via Simulation of Statistical Corrections for Network Nonindependence
Many statistical techniques assume independence of data points, but networked data violate this assumption. Our network simulations indicate that eight commonly used techniques intended to correct this problem in networked data fail to do so.
Luke J. Matthews, Megan S. Schuler, Raffaele Vardavas, Joshua Breslau, Ioana Popescu
Accounting for Climate Resilience in Infrastructure Investment Decisionmaking: A Data-Driven Approach for Department of the Air Force Project Prioritization
A new approach allows comparison of infrastructure projects based on their ability to improve installation resilience to climate-related hazards, supporting the Department of the Air Force’s climate resilience investment decisionmaking.
Aug 22, 2023
Anu Narayanan, Scott R. Stephenson, Michael T. Wilson, Maria McCollester, Sarah Weilant, Emmi Yonekura, Sascha Ishikawa, Jay Balagna, Krista Romita Grocholski, Nihar Chhatiawala
Military Academy Students Can Now Retain Parental Rights: Department of Defense Options for Managing the Change
As of 2023, DoD allows enrolled cadets and midshipmen to retain parental rights. This brief explores potential DoD policy changes that could help cadet and midshipman parents care for their children, succeed in school, and become exemplary officers.
Laura L. Miller, Stephanie Rennane, Jaime L. Hastings, Anthony Jacques, Tara Laila Blagg, Daniel B. Ginsberg, Barbara Bicksler
Ensuring Parental Rights of Military Service Academy Cadets and Midshipmen: Policy and Cost Implications
This report characterizes legal, policy, practice, and cost implications of U.S. Department of Defense options to comply with a new congressional requirement allowing service academy cadets and midshipmen who become parents to retain parental rights.
Creating Readiness Metrics for the Army Civilian Workforce: A Way Ahead for Integrating Readiness into Civilian Workforce Planning
Using qualitative methods, the authors develop a definition of and a model for measuring civilian workforce readiness to inform Army policies and practices, and they identify potential data sources for readiness metrics.
Aug 21, 2023
Irina A. Chindea, Susan M. Gates, Katherine C. Hastings, Jennifer Lamping Lewis, Emmi Yonekura, Samantha Cherney, Christine DeMartini, Molly Dunigan, Jonah Kushner, Barbara Bicksler
Research Careers at the RAND Corporation
This overview describes research careers at the RAND Corporation.
Weakened States Pose Problems for War Scenarios
Any question of conflict between the United States and China must take into account diminishing state legitimacy and capacity, the privatization of violence, and the rise of non-state actors and identities.
Timothy R. Heath
The Role of a State Public Health Initiative in Shaping Provider Knowledge, Beliefs and Practices: A Mixed Methods Study of a Cytomegalovirus Prevention Program
Explores the impact of Utah's public health education intervention to increase pregnancy care provider awareness of congenital cytomegalovirus preventative hygiene measures, knowledge, and counseling behaviors.
Three Essays in Policy Analysis Among Vulnerable Populations
Three essays that explore policy analysis among vulnerable populations.
Income Share Agreements: Market Structure, Communication, and Equity Implications of a Student Loan Alternative
Income share agreements (ISAs) provide a way to finance postsecondary education, but not much is known about how they work. In this report, RAND researchers examine ISA structure, analyze ISA communication, and assess implications for equity.
Melanie A. Zaber, Elizabeth D. Steiner, Hana Gebremariam, Asya Spears, Zohan Hasan Tariq, Katherine Grace Carman
Deterring Russia and Iran: Improving Effectiveness and Finding Efficiencies
To support defense planners in crafting effective deterrence strategies, RAND researchers assess the impact of U.S. forward presence, exercises and short-term deployments, and security cooperation on deterrence in Europe and the Middle East.
Aug 17, 2023
Jeffrey Martini, Andrew Radin, Alyssa Demus, Krystyna Marcinek, Dara Massicot, Katherine Pfrommer, Ashley L. Rhoades, Chandler Sachs, Karen M. Sudkamp, David E. Thaler, David Woodworth, Sean M. Zeigler
Prevalence of Veteran Support for Extremist Groups and Extremist Beliefs: Results from a Nationally Representative Survey of the U.S. Veteran Community (2023)
Teachers' Views on School Safety: Consensus on Many Security Measures, But Stark Division About Arming Teachers (2023)
The Time for International Space Traffic Management Is Now (2023)
America's Opioid Ecosystem: How Leveraging System Interactions Can Help Curb Addiction, Overdose, and Other Harms (2023)
Labor Trafficking in the United States: Current and Future Research (2023)
The Trade-Offs of Ukraine's Recovery: Fighting for the Future (2023)
Discover the Stories Behind the Research
RAND Review, our flagship magazine, covers the big issues with an eye for the important details. Check out the latest feature stories, Q&As with RAND experts, and fun facts from our history on the RAND Review Blog.
When you choose to publish with PLOS, your research makes an impact. Make your work accessible to all, without restrictions, and accelerate scientific discovery with options like preprints and published peer review that make your work more Open.
Understanding the Publishing Process
What’s happening with my paper? The publication process explained
The path to publication can be unsettling when you’re unsure what’s happening with your paper. Learn about typical journal workflows to see the detailed steps required to ensure a rigorous and ethical publication.
Your team has prepared the paper, written a cover letter and completed the submission form. From here, it can sometimes feel like a waiting game while the journal has your paper. It can be unclear exactly who is currently handling your paper as most individuals are only involved in a few steps of the overall process. Journals are responsible for overseeing the peer review, publication and archival process: editors, reviewers, technical editors, production staff and other internal staff all have their roles in ensuring submissions meet rigorous scientific and ethical reporting standards.
Read on for an inside look at how a conventional peer-reviewed journal helps authors transform their initial submission to a certified publication.
Note that the description below is based on the process at PLOS journals. It is likely that at other journals, various roles (e.g. technical editor) may in fact also be played by the editor, and some journals may not have journal staff at all, with all roles played by volunteer academics. As such, please consider the processes and waypoints, rather than who performs them, as the key information.
Internal Checks on New Submissions
Estimated time: 10 days.
When a journal first receives your submission, there are typically two separate checks to confirm that the paper is appropriate and ready for peer review:
- Technical check. Performed by a technical editor to ensure that the submission has been properly completed and is ready for further assessment. Blurry figures, missing ethical statements, and incomplete author affiliations are common issues that are addressed at this initial stage. Typically, there are three technical checks: upon initial submission, alongside the first decision letter, and upon acceptance.
- Editorial screening . Once a paper passes the first check, an editor with subject expertise assesses the paper and determines whether it is within the journal’s scope and if it could potentially meet the required publication criteria. While there may be requests for further information and minor edits from the author as needed, the paper will either be desk rejected by the editor or allowed to proceed to peer review.
Both editors will also make notes of items to be followed up on at later stages. The publication process involves a careful balance of when each check occurs. Early checks need to be thorough so that editors with relevant expertise can focus on the scientific content and more advanced reporting standards, but no one wants to be asked to reformat references only to have their paper desk rejected a few days later.
Peer Review

Estimated time: 1 month.
Depending on the journal’s editorial structure, the editor who performed the initial assessment may also oversee peer review or another editor with more specific expertise may be assigned. Regardless of the journal’s specific process, the various roles and responsibilities during peer review include:
When you have questions or are unsure who your manuscript is currently with, reach out to the journal staff for help (e.g. [email protected]). They will be your lifeline, connecting you to all the other contributors working to assess the manuscript.
Whether an editor needs a reminder that all reviews are complete or a reviewer has asked for an extension, the journal acts as a central hub of communication for those involved with the publication process. As editors and reviewers are used to hearing from journal staff about their duties, any messages you send to the journal can be forwarded to them with proper context and instructions on how to proceed appropriately. Additionally, journal staff will be able to inform you of any delays, such as reviewer availability during summer and holiday periods.
Editorial Decision

Estimated time: 1 day.
Editors evaluate peer reviewer feedback and their own expert assessment of the manuscript to reach a decision. After your editor submits a decision on your manuscript, the journal may review it before formally processing the decision and sending it on to you.
A technical editor may scan the manuscript and the review comments to ensure that journal standards have been followed. At this stage, the technical editor will also add requests to ensure the paper, if published, will adhere to journal requirements for data sharing, copyright, ethical reporting and the like.
Performing the second technical check at this stage and adding the journal requirements to the decision letter ultimately saves time by allowing authors to resolve the journal’s queries while making revisions based on comments from the reviewers.
Revised Submission Received
Estimated time: 3 days.
Upon receiving your revised submission, a technical editor will assess the revisions to confirm that the requests from the journal have been properly addressed. Before the paper is returned to the editor for their consideration, the journal needs to be confident that the paper won’t have any issues related to the metadata and reporting standards that could prevent publication. The editor may contact you to resolve any serious issues, though minor items can wait until the paper is accepted.
Subsequent Peer Review
Estimated time: 2 weeks, highly variable.
When your resubmitted paper has passed the required checks, it’ll be assigned back to the same editor who handled it during the first round of peer review. At this point, your paper has gone through two sets of journal checks and one round of peer review. If all has gone well so far, the paper should feel quite solid both in terms of scientific content and proper reporting standards.
When the editor receives your revised paper, they are asked to check if all reviewer comments have been adequately addressed and if the paper now adheres to the journal’s publication criteria. Depending on the situation, some editors may feel confident making this decision based on their own expertise while others may re-invite the previous reviewers for their opinions.
Individual responsibilities are the same as in the initial round of peer review, but later rounds are generally expected to proceed more quickly unless the revision has introduced new concerns.
Provisional Acceptance

Estimated time: 1 week.
Your editor is satisfied with the scientific quality of your work and has chosen to accept it in principle. Before it can proceed to production and typesetting, the journal office will perform its third and final technical check, requesting any formatting changes or additional details that may be required.
When fulfilling these final journal requests, double-check the final files to confirm all information is correct. If you need to make changes beyond those specifically required in the decision letter, inform the journal and explain why you made the unrequested changes. Any change that could affect the scientific meaning of the work will need to be approved by the handling editor. Including your rationale for the changes will help avoid delays, but extensive changes at this point may require another round of formal review.
Formal Acceptance and Publication
Estimated time: 2 weeks.
After a technical editor has confirmed that all requests from the provisional acceptance letter have been addressed, you will receive your formal acceptance letter. This letter indicates that your paper is being passed from the Editorial department to the production department—that all information has been editorially approved. The scientific content has been approved through peer review, and the journal’s publication requirements have been met.
Congratulations to you and your co-authors! Your article will be available as soon as the journal transforms the submission into a typeset, consistently structured scientific manuscript, ready to be read and cited by your peers.
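The per-stage time estimates quoted in this section can be tallied into a rough end-to-end timeline. The sketch below is illustrative only: the stage names and day counts come from the estimates above (treating “1 month” as 30 days and “2 weeks” as 14), and real submissions vary widely, especially during peer review.

```python
# Rough end-to-end timeline for the journal workflow described above.
# Day counts mirror the per-stage estimates quoted in the text;
# actual durations vary widely, especially for peer review.
stages = [
    ("Internal checks on new submission", 10),
    ("Peer review (first round)", 30),
    ("Decision processing", 1),
    ("Revised submission check", 3),
    ("Subsequent peer review", 14),
    ("Final technical check", 7),
    ("Formal acceptance and publication", 14),
]

total_days = sum(days for _, days in stages)
for name, days in stages:
    print(f"{name:<36} ~{days} days")
print(f"Approximate total: ~{total_days} days (about {total_days / 30:.1f} months)")
```

Even this optimistic tally lands at roughly two and a half months, which is one reason journals bundle their technical queries into the decision letter: authors can resolve them alongside reviewer revisions rather than sequentially.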
What Is a Research Paper?
Olivia Valdes was the Associate Editorial Director for ThoughtCo. She worked with Dotdash Meredith from 2017 to 2021.
- B.A., American Studies, Yale University
A research paper is a common form of academic writing . Research papers require students and academics to locate information about a topic (that is, to conduct research ), take a stand on that topic, and provide support (or evidence) for that position in an organized report.
The term research paper may also refer to a scholarly article that contains the results of original research or an evaluation of research conducted by others. Most scholarly articles must undergo a process of peer review before they can be accepted for publication in an academic journal.
Define Your Research Question
The first step in writing a research paper is defining your research question . Has your instructor assigned a specific topic? If so, great—you've got this step covered. If not, review the guidelines of the assignment. Your instructor has likely provided several general subjects for your consideration. Your research paper should focus on a specific angle on one of these subjects. Spend some time mulling over your options before deciding which one you'd like to explore more deeply.
Try to choose a research question that interests you. The research process is time-consuming, and you'll be significantly more motivated if you have a genuine desire to learn more about the topic. You should also consider whether you have access to all of the resources necessary to conduct thorough research on your topic, such as primary and secondary sources .
Create a Research Strategy
Approach the research process systematically by creating a research strategy. First, review your library's website. What resources are available? Where will you find them? Do any resources require a special process to gain access? Start gathering those resources—especially those that may be difficult to access—as soon as possible.
Second, make an appointment with a reference librarian . A reference librarian is nothing short of a research superhero. He or she will listen to your research question, offer suggestions for how to focus your research, and direct you toward valuable sources that directly relate to your topic.
Now that you've gathered a wide array of sources, it's time to evaluate them. First, consider the reliability of the information. Where is the information coming from? What is the origin of the source? Second, assess the relevance of the information. How does this information relate to your research question? Does it support, refute, or add context to your position? How does it relate to the other sources you'll be using in your paper? Once you have determined that your sources are both reliable and relevant, you can proceed confidently to the writing phase.
Why Write Research Papers?
The research process is one of the most taxing academic tasks you'll be asked to complete. Luckily, the value of writing a research paper goes beyond that A+ you hope to receive. Here are just some of the benefits of research papers.
- Learning Scholarly Conventions: Writing a research paper is a crash course in the stylistic conventions of scholarly writing. During the research and writing process, you'll learn how to document your research, cite sources appropriately, format an academic paper, maintain an academic tone, and more.
- Organizing Information: In a way, research is nothing more than a massive organizational project. The information available to you is near-infinite, and it's your job to review that information, narrow it down, categorize it, and present it in a clear, relevant format. This process requires attention to detail and major brainpower.
- Managing Time: Research papers put your time management skills to the test. Every step of the research and writing process takes time, and it's up to you to set aside the time you'll need to complete each step of the task. Maximize your efficiency by creating a research schedule and inserting blocks of "research time" into your calendar as soon as you receive the assignment.
- Exploring Your Chosen Subject: We couldn't forget the best part of research papers—learning about something that truly excites you. No matter what topic you choose, you're bound to come away from the research process with new ideas and countless nuggets of fascinating information.
The best research papers are the result of genuine interest and a thorough research process. With these ideas in mind, go forth and research. Welcome to the scholarly conversation!
The Process of Publishing a Research Paper in a Journal
The publication of a research paper in a journal is a long and painstaking process. It involves many stages that need to be completed at the author’s end before submission to a journal . After submission, there are further steps at the publisher’s end over which the author has no control. In order to get a successful publication in good time, it is important for an author to understand the various steps involved in the process.
It all starts with the draft manuscript. A properly edited research paper with accurate references, a good title, a short but precise abstract, and a detailed cover letter is the first step.
Any research paper submitted for publication in a journal first goes through editorial screening. The authors must ensure their research paper matches the focus area and objectives of the selected journal so that it is not rejected at this first stage. The best way to go about it is to follow the journal’s instructions with precision and consistency. Research papers that clear editorial screening are then forwarded for peer review.
Peer review is often a time-consuming process. Two or more reviewers are usually chosen, one of whom might be picked from the experts the authors suggested as potential reviewers in their initial submission. Peer reviewers are professionals in their fields of expertise with other commitments, so they often take time to respond. Reviewers may recommend immediate acceptance without changes or immediate rejection without reconsideration, although reconsideration after minor or major changes is the most common response.
The final decision on any research paper is taken by the editor, who returns the paper to the author with comments from the editorial team or peer review. The author has to respond to the editor with a revised manuscript along with a detailed letter that explains exactly what changes were made and gives a compelling academic or scientific reason why certain suggestions were not accommodated.
Depending on the extent of the changes involved, the editor may decide on their own or send the research paper out for a second round of peer review. Even though these steps delay publication, they help improve the quality of the published work and hence are very important.
When the paper is finally accepted by the editor, it goes into production for final checking and reformatting to fit the journal’s conventions and style. The journal may return to the author for a final proofread of the typeset manuscript prepared for publication.
In case of rejection, the journal will convey why the research paper was rejected. The author can take note and either rewrite the research paper to fit the journal or submit it to another journal for consideration.
A clear understanding of a journal’s publication process helps authors prepare accordingly and ensures a smooth path to publication.
Research Paper Format | APA, MLA, & Chicago Templates
Published on November 19, 2022 by Jack Caulfield . Revised on January 20, 2023.
The formatting of a research paper differs depending on which style guide you’re following. In addition to citations, APA, MLA, and Chicago provide format guidelines for things like font choices, page layout, heading format, and the reference page.
Scribbr offers free Microsoft Word templates for the most common formats. Simply download and get started on your paper.
APA | MLA | Chicago author-date | Chicago notes & bibliography
- Generate an automatic table of contents
- Generate a list of tables and figures
- Ensure consistent paragraph formatting
- Insert page numbering
Table of contents
- Formatting an APA paper
- Formatting an MLA paper
- Formatting a Chicago paper
- Frequently asked questions about research paper formatting
The main guidelines for formatting a paper in APA Style are as follows:
- Use a standard font like 12 pt Times New Roman or 11 pt Arial.
- Set 1 inch page margins.
- Apply double line spacing.
- If submitting for publication, insert an APA running head on every page.
- Indent every new paragraph ½ inch.
Watch the video below for a quick guide to setting up the format in Google Docs.
The image below shows how to format an APA Style title page for a student paper.
If you are submitting a paper for publication, APA requires you to include a running head on each page. The image below shows you how this should be formatted.
For student papers, no running head is required unless you have been instructed to include one.
APA provides guidelines for formatting up to five levels of heading within your paper. Level 1 headings are the most general, level 5 the most specific.
APA Style requires author-date in-text citations throughout the text and an APA Style reference page at the end. The image below shows how the reference page should be formatted.
Note that the format of reference entries is different depending on the source type. You can easily create your citations and reference list using the free APA Citation Generator.
The main guidelines for writing an MLA style paper are as follows:
- Use an easily readable font like 12 pt Times New Roman.
- Use title case capitalization for headings .
Check out the video below to see how to set up the format in Google Docs.
On the first page of an MLA paper, a heading appears above your title, featuring some key information:
- Your full name
- Your instructor’s or supervisor’s name
- The course name or number
- The due date of the assignment
A header appears at the top of each page in your paper, including your surname and the page number.
Works Cited page
MLA in-text citations appear wherever you refer to a source in your text. The MLA Works Cited page appears at the end of your text, listing all the sources used. It is formatted as shown below.
You can easily create your MLA citations and save your Works Cited list with the free MLA Citation Generator.
The main guidelines for writing a paper in Chicago style (also known as Turabian style) are:
- Use a standard font like 12 pt Times New Roman.
- Use 1 inch margins or larger.
- Place page numbers in the top right or bottom center.
Chicago doesn’t require a title page , but if you want to include one, Turabian (based on Chicago) presents some guidelines. Lay out the title page as shown below.
Bibliography or reference list
Chicago offers two citation styles : author-date citations plus a reference list, or footnote citations plus a bibliography. Choose one style or the other and use it consistently.
The reference list or bibliography appears at the end of the paper. Both styles present this page similarly in terms of formatting, as shown below.
To format a paper in APA Style , follow these guidelines:
- Use a standard font like 12 pt Times New Roman or 11 pt Arial
- Set 1 inch page margins
- Apply double line spacing
- Include a title page
- If submitting for publication, insert a running head on every page
- Indent every new paragraph ½ inch
- Apply APA heading styles
- Cite your sources with APA in-text citations
- List all sources cited on a reference page at the end
The main guidelines for formatting a paper in MLA style are as follows:
- Use an easily readable font like 12 pt Times New Roman
- Include a four-line MLA heading on the first page
- Center the paper’s title
- Use title case capitalization for headings
- Cite your sources with MLA in-text citations
- List all sources cited on a Works Cited page at the end
The main guidelines for formatting a paper in Chicago style are to:
- Use a standard font like 12 pt Times New Roman
- Use 1 inch margins or larger
- Place page numbers in the top right or bottom center
- Cite your sources with author-date citations or Chicago footnotes
- Include a bibliography or reference list
To automatically generate accurate Chicago references, you can use Scribbr’s free Chicago reference generator .
Caulfield, J. (2023, January 20). Research Paper Format | APA, MLA, & Chicago Templates. Scribbr. Retrieved August 28, 2023, from https://www.scribbr.com/research-paper/research-paper-format/
How to Identify the Right Journal for Publishing Your Research
By Dr. Mansureh Kebritchi
Publishing a research project in an academic periodical can be a challenging task for many researchers. With the ever-increasing number of academic periodicals, selecting a right-fit journal has become a daunting task for both novice and seasoned authors. Authors may wonder what types of periodicals are available for publication, how to find them, how to evaluate their credibility, and how to select the right fit for publishing their studies. The sections below answer these questions to support you, as a potential author, in publishing your study.
Types of Periodicals
There are various types of periodicals in the literature. Periodicals are published at fixed intervals between issues and include magazines, newspapers, and scholarly journals. Scholarly journals can be categorized into non-peer-reviewed and peer-reviewed journals. Peer-reviewed, or refereed, journals publish articles that have been reviewed and approved by at least two reviewers who are experts in the field.
Peer-reviewed journals are the best place for publishing scholarly manuscripts. Peer-reviewed journals are categorized into two groups: closed-access and open-access journals. Articles published in closed-access journals are available only to readers who are subscribed to the journals, while articles published in open-access journals are open to the public. With the latter, the publishing is often paid for by the authors.
It is important to note that open-access journals can sometimes be considered predatory or unacceptable journals in academia. If you elect to publish in an open-access journal, take the needed precautions to verify that the journal is reputable and that a peer review process is in place. Use the criteria provided in the next section to evaluate the credibility of open-access journals.
After deciding which type of periodical to publish in, you may wonder where to find suitable titles. You can use several websites and directories to find journals in your field and identify whether they are peer-reviewed and whether they are closed or open access. The following directories list journal features including peer-review status, access type, and acceptance rate. To access the directories, log into the UOPX eCampus, then click on the Library tab, University Library, and Databases A-Z.
· Cabell’s Scholarly Analytics
· Ulrich’s Periodicals Directory
The following websites provide you with a list of appropriate journals in your field.
· ERIC Journals provides a list of credible journals
· Journal Guide gives the option of searching journals based on your keywords and abstract
· Edanz Journal Selector helps you search for journals
· Journal selectors from a few publishers:
Elsevier Journal Finder
Taylor & Francis Journal selector
Sage Journal recommender
Wiley Journal Finder
Criteria for Evaluating the Credibility of Journals
It is essential to publish your study in an acceptable journal to impact your field and gain recognition and a voice in the community of scholars and practitioners. To evaluate the credibility of a journal, check the following criteria on the journal’s website.
Please note that a combination of all these criteria should be used to properly evaluate a journal.
Peer-Review Procedure. The journal should clearly explain its peer-review procedure. A thorough peer-review procedure is one of the most essential factors affecting a journal’s credibility: it verifies that the submitted manuscript is rigorous, has sound methods and results, builds on past studies, and contributes to the body of knowledge in the field. Peer review is a time-consuming process conducted by volunteers who are experts in the field, so a very quick turnaround time may indicate a superficial peer-review procedure.
Reputation and Ranking. Examining a journal’s ranking and reputation is one way to evaluate the journal. Various metrics may be used to rank journals, with a higher score indicating a higher ranking. Some of the most popular journal metrics are:
· Impact Factor is based on the average number of citations per article published in the journal over the preceding two years, as indexed in the Journal Citation Reports (JCR).
· Scimago Journal Ranking (SJR) is a measure of the scientific influence of a journal that is calculated based on a number of citations indexed in the Scopus database; the score is weighted meaning that citations from more prestigious journals have a higher weight.
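As a concrete illustration of the first metric, a journal’s two-year Impact Factor divides the citations its recent articles received in a given year by the number of citable items it published in the previous two years. The sketch below uses made-up numbers, not data from any real journal:

```python
# Illustrative sketch of the two-year Journal Impact Factor calculation.
# All numbers are hypothetical, for demonstration only.

def impact_factor(citations_this_year, items_prev_two_years):
    """Citations received this year to articles published in the
    previous two years, divided by the number of citable items
    published in those two years."""
    return citations_this_year / items_prev_two_years

# Example: 450 citations in 2023 to articles from 2021-2022,
# and 300 citable items published in 2021-2022.
print(round(impact_factor(450, 300), 2))  # 1.5
```

A journal with this score averaged 1.5 citations per recent article; how "high" that is varies widely by discipline, which is why the metric should only be compared within a field.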
Indexed. The journal should be indexed in credible databases such as ERIC, ProQuest, EBSCO, etc. One of the main reasons for publication is sharing your study with a larger audience. Journals that are indexed in credible databases provide more audience to review your article.
Editorial Board Members. The journal should list its editorial board members affiliated with known universities and academic institutions.
Previous Authors. The journal's previous authors should be affiliated with various academic institutions.
Charges and Fees. Credible journals usually do not charge authors for publication. However, charging a fee is not by itself a sign that a journal is unacceptable; consider the other criteria in this list when evaluating credibility. Note that some credible publishers now charge a fee to make an individual article in an otherwise closed-access journal available as open access.
Solicitations. Be wary of journal solicitations. Some unknown or predatory journals send solicitations, often via email. Note that some credible journals also send paper invitations via email, so use the other criteria provided here to evaluate the journal’s credibility.
Predatory or Unacceptable Journals. These are journals without adequate credibility and should be avoided. Unacceptable journals often do not peer review the submitted manuscripts and may not pass the above evaluation criteria. All journals listed in the Directory of Open Access Journals (DOAJ) or Cabell’s Journalytics directory are acceptable. To identify predatory journals, you may use the following sites:
· Think. Check. Submit , a range of tools to identify predatory journals.
· How to Spot a Predatory Journal (checklist) , a checklist to identify predatory journals.
· Predatory reports , a directory of Predatory Journals.
Criteria for Selecting an Appropriate Journal for Your Manuscript
After evaluating the credibility of journals, you may further examine the selected credible journals to identify whether they fit your manuscript. You may use the following criteria to identify the appropriate journals for your manuscript publication.
Scope, Objectives, and Method. Check the objectives of the journal and ensure your manuscript and the journal’s objectives are aligned. This is one of the most important factors in selecting a right-fit journal: your target journal might be credible and meet all the other criteria, yet still be a poor fit if its scope, aims, and objectives do not match your manuscript’s. Additionally, ensure that your target journal is interested in your research method. Some journals accept only a particular research method, while others publish all types of methods as long as the focus of the study matches their aims. If you plan to publish a literature review, pay closer attention and verify that the journal publishes literature reviews.
Issues per Year. A higher number of annual issues increases the chance of acceptance. If you plan to publish in a specific timeframe, you may select a journal that publishes issues within your timeframe.
Acceptance Rate. A higher acceptance rate increases the possibility of being accepted.
Turnaround Time. Some journals have a long turnaround time. Be sure to check the turnaround time as you may submit your manuscript to only one journal at a time.
Author’s Copyrights. Check the author’s copyrights in your target journals. The article copyright, which includes the rights to distribute and reproduce the article, is usually transferred to closed-access journals, while open-access journals may have different policies.
Mansureh Kebritchi, Ph.D.
ABOUT THE AUTHOR
Mansureh Kebritchi, Ph.D., is an accomplished educational researcher with over 4,470 peer-reviewed citations as of August 2023. Dr. Kebritchi is the founder and chair of the Center for Educational and Instructional Technology Research at the College of Doctoral Studies, University of Phoenix. In addition, she is the founder and leader of Dissertation to Publication and Research to Publication Workshops, Research Methodology Group , and Alumni Research and Support Group . She has a deep passion for educational research and extensive experience as a research methodologist and instructional designer. Dr. Kebritchi is dedicated to mentoring doctoral students and supporting faculty members in their pursuit of conducting and publishing research in the field of education.
UW Faculty Members Encouraged to Submit Papers for Release in Advance
Institutional Communications Bureau of Mines Building, Room 137 Laramie, WY 82071 Phone: (307) 766-2929 Email: [email protected]
Published August 28, 2023
University of Wyoming faculty members who have upcoming research papers that will be published in the following journals -- Nature, Science or the Proceedings of the National Academy of Sciences -- are encouraged to let UW Institutional Communications know well in advance.
UW Institutional Communications is primarily interested in writing media releases on research published in these journals but will publicize papers appearing in other peer-reviewed scientific journals as well.
For scheduling purposes, UW faculty members are encouraged to submit their papers as soon as they know they will be published -- typically three to four weeks in advance. It is easier to work on a news release and obtain photos weeks ahead of time than to learn of a paper only a few days before it is published. UW Institutional Communications honors all publishing embargoes, so there is no need to worry about sending papers well in advance of the publication dates, says Ron Podell, a communications specialist in UW Institutional Communications.
To submit a paper, call Podell at (307) 460-2764 or email [email protected] .
Published on 28.8.2023 in Vol 25 (2023)
Initiatives, Concepts, and Implementation Practices of the Findable, Accessible, Interoperable, and Reusable Data Principles in Health Data Stewardship: Scoping Review
Authors of this article:
- Esther Thea Inau 1 , MSc ;
- Jean Sack 2 , MLS ;
- Dagmar Waltemath 1 , PhD ;
- Atinkut Alamirrew Zeleke 1 , PhD
1 Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
2 International Health Department, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
Esther Thea Inau, MSc
Department of Medical Informatics
Institute for Community Medicine
University Medicine Greifswald
Phone: 49 3834867548
Email: [email protected]
Background: Thorough data stewardship is a key enabler of comprehensive health research. Processes such as data collection, storage, access, sharing, and analytics require researchers to follow elaborate data management strategies properly and consistently. Studies have shown that findable, accessible, interoperable, and reusable (FAIR) data leads to improved data sharing in different scientific domains.
Objective: This scoping review identifies and discusses concepts, approaches, implementation experiences, and lessons learned in FAIR initiatives in health research data.
Methods: The Arksey and O’Malley stage-based methodological framework for scoping reviews was applied. PubMed, Web of Science, and Google Scholar were searched to access relevant publications. Articles written in English, published between 2014 and 2020, and addressing FAIR concepts or practices in the health domain were included. The 3 data sources were deduplicated using a reference management software. In total, 2 independent authors reviewed the eligibility of each article based on defined inclusion and exclusion criteria. A charting tool was used to extract information from the full-text papers. The results were reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines.
Results: A total of 2.18% (34/1561) of the screened articles were included in the final review. The authors reported FAIRification approaches, which include interpolation, inclusion of comprehensive data dictionaries, repository design, semantic interoperability, ontologies, data quality, linked data, and requirement gathering for FAIRification tools. Challenges and mitigation strategies associated with FAIRification, such as high setup costs, data politics, technical and administrative issues, privacy concerns, and difficulties encountered in sharing health data despite its sensitive nature were also reported. We found various workflows, tools, and infrastructures designed by different groups worldwide to facilitate the FAIRification of health research data. We also uncovered a wide range of problems and questions that researchers are trying to address by using the different workflows, tools, and infrastructures. Although the concept of FAIR data stewardship in the health research domain is relatively new, almost all continents have been reached by at least one network trying to achieve health data FAIRness. Documented outcomes of FAIRification efforts include peer-reviewed publications, improved data sharing, facilitated data reuse, return on investment, and new treatments. Successful FAIRification of data has informed the management and prognosis of various diseases such as cancer, cardiovascular diseases, and neurological diseases. Efforts to FAIRify data on a wider variety of diseases have been ongoing since the COVID-19 pandemic.
Conclusions: This work summarizes projects, tools, and workflows for the FAIRification of health research data. The comprehensive review shows that implementing the FAIR concept in health data stewardship carries the promise of improved research data management and transparency in the era of big data and open research publishing.
International Registered Report Identifier (IRRID): RR2-10.2196/22505
The vast amount of data obtained from research would benefit the larger scientific community more if it were easily findable, accessible, interoperable, and reusable (FAIR) [ 1 ]. However, most research data are still maintained in individualized silos across the health care continuum instead of being managed in interoperable and integrated knowledge bases [ 2 ]. The COVID-19 global crisis partially improved data sharing to support the secondary and integral use of the available data across the globe and disciplines [ 3 , 4 ]. Secondary data reuse has been shown to be a key enabler for more extensive and valuable research dimensions, especially in situations where the data are scarce, sparse, heterogeneous, and sensitive with regard to privacy [ 5 , 6 ].
In this study, we conducted a scoping review to analyze the approaches used to FAIRify health research data, the various software used, the challenges faced and the mitigation strategies used to navigate these challenges, and the networks actively involved. The results of this work will provide valuable insights for stakeholders to reference when seeking to influence organizational practices that promote FAIR practices, FAIRify health research data, develop related software, seek collaborations with networks actively involved in the implementation of FAIR data principles, or evaluate the FAIRness of a particular set of health research data.
This scoping review adopted the framework outlined by Arksey and O’Malley [ 7 ]. It includes the following steps: (1) identifying the research question; (2) identifying relevant studies; (3) study selection; (4) charting the collected data; and (5) collating, summarizing, and reporting the results.
Stage 1: Identifying the Research Questions
Our pilot literature exploration included published works in PubMed, Google Scholar, and Web of Science. We used FAIR data principle keywords to match Medical Subject Headings (MeSH) used to tag PubMed peer-reviewed literature, along with combinations of terms used in clinical research, public health, health care, pharmacology, and patient data. The bibliographies of key papers were scrutinized for other complementary publications, and recurrent alerts for this exploration were set up on the 3 databases.
Our informal desk review showed that there is indeed a growing interest in following the phases of the research life cycle [ 8 , 9 ]. These findings motivated us to better understand the approaches used in the implementation of the FAIR data principles and their impact on the way research in health will be conducted, subsequently leading to our research questions. We decided that the review should only include studies that showed either an actual approach to implement the FAIR data principles in the health domain or the recorded results of the implementation of the FAIR data principles. The review excluded studies that introduced or provided an overview of the FAIR principles. Studies that showed the implementation of the principles in a domain other than health were also excluded.
The general objective of this work was to conduct a scoping review to identify concepts, approaches, implementation experience, and lessons learned from the FAIR data principle initiatives in the health domain. The following research questions were formulated to meet the objectives:
- What approaches are being used or piloted in the implementation of the FAIR data principles in the health data domain since the conception of these principles in 2014?
- What are the challenges and risks with regard to the approaches used in the practical implementation of the FAIR data principles in the health data domain?
- What are the suggested concepts and approaches to mitigate concerns regarding the implementation of the FAIR data principles in the health data domain?
- Which are the active public and private research and service networks involved in the implementation of the FAIR data principles in the health data domain?
- What are the reported outcomes in terms of data sharing, data reuse, and research publication after the implementation of the FAIR data principles in the health data domain?
Stage 2: Identifying Relevant Studies
With the aid of an experienced research librarian (JS), we identified relevant studies from 3 primary electronic databases: PubMed, Web of Science, and Google Scholar. The keywords for the scoping review search strategies were categorized into terms related to the FAIR data principles, data sharing, and health. Open terms were used for the construction of the search strategy for this study. The Boolean operators “AND” and “OR” were used to guide the search strategy. The following descriptors, keywords, and their combinations were used to construct the strategies: “health*,” “pharma*,” “research and development” (MeSH term), “research” (MeSH term), “biomedical research” (MeSH term), “data collection” (MeSH term), “metadata” (MeSH term), “registr*” (MeSH term), “registr*,” “Open access publishing,” “data curation,” “data preservation,” “data provenance,” “data*,” “data sharing,” “open science,” “repositor*,” “data management” (MeSH term), “FAIR data Principles” (title or abstract), “FAIR Principles,” “FAIR guiding Principles,” “Data stewardship,” and “Data management systems.” The search strategy we formulated for this purpose can be found in Multimedia Appendix 1 .
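To make the structure of such a strategy concrete, the sketch below composes a Boolean query from three illustrative keyword groups. The group contents are abbreviated from the list above and the helper function is our own, not the review's actual tooling; terms within a group are joined with OR, and the groups are combined with AND:

```python
# Hypothetical sketch of composing a Boolean search strategy from
# keyword groups (FAIR terms, data-sharing terms, health terms).

fair_terms = ['"FAIR data principles"', '"FAIR guiding principles"', '"data stewardship"']
sharing_terms = ['"data sharing"', '"open science"', '"data curation"', 'repositor*']
health_terms = ['health*', 'pharma*', '"biomedical research"']

def or_group(terms):
    """Join a group of synonyms with OR and parenthesize it."""
    return "(" + " OR ".join(terms) + ")"

# Combine the groups with AND so a record must match one term per group.
query = " AND ".join(or_group(g) for g in [fair_terms, sharing_terms, health_terms])
print(query)
```

The same pattern extends to the full keyword list; individual databases differ in wildcard and field-tag syntax, so the string would be adapted per database.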
The PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) was used to report the findings [ 10 ]. The operational definition of “health” for this scoping review is based on the 2018 European Union (EU) General Data Protection Regulation (GDPR) and the health ecosystem components framed by the World Health Organization [ 11 , 12 ]. Accordingly, health data in this review is defined in the context of data from service and research practice in health services (clinical records, electronic health records and electronic medical records, prescribing, diagnostics, laboratory, health insurance, disease surveillance, immunization records, public health reporting, vital statistics, registries, clinical trials, clinical research, and public health research) [ 13 ].
As inclusion criteria, we considered literature published between January 1, 2014, and December 31, 2020. The start year of 2014 was chosen as FAIR concept initiatives and official publications first became available in that year. Moreover, to be included as a potential paper, the literature must be published in English and within the scope of FAIR principle application in the health domain (defined by the operational definition). Deduplication was performed by exporting all search results from web-based databases and gray literature sources to a reference management software. Unique search results were exported to a screening tool to facilitate an independent screening process.
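The deduplication step, merging records exported from several databases and dropping duplicates, can be sketched as follows. This is an illustrative example rather than the review's actual workflow: it keys each record on its DOI when present, falling back to a normalized title, and the field names are assumptions:

```python
# Hedged sketch of deduplicating merged database exports.
import re

def dedup_key(record):
    """Prefer a case-normalized DOI; otherwise fall back to a
    punctuation-stripped, lowercased title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return ("title", title)

def deduplicate(records):
    seen, unique = set(), []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

records = [
    {"title": "FAIR Data in Health", "doi": "10.1000/abc"},
    {"title": "FAIR data in health!", "doi": "10.1000/ABC"},  # same DOI, different case
    {"title": "Another Study", "doi": ""},
]
print(len(deduplicate(records)))  # 2
```

Reference managers such as the one used in the review perform essentially this matching, typically with fuzzier title comparison.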
Stage 3: Study Selection
The Rayyan software (Rayyan Systems, Inc) was chosen as the primary screening tool to expedite the initial screening of abstracts and titles using a semiautomated process while incorporating a high level of usability [ 14 ]. Initial screening based on the inclusion criteria can be cumbersome. Rayyan provides a platform for the collaborative screening of publication abstracts and titles [ 14 ]. According to the inclusion and exclusion criteria, nonrelevant studies were excluded from the review at this point. Where the relevance of the publication was unclear from the title or abstract, the reviewer read the full publication to determine its eligibility. Further changes to the search criteria to improve the search findings were made at this stage as necessary. The eligible publications screened in the first stage were then independently read in full by 2 researchers to further determine the relevance of the publication content to the research questions. When an agreement could not be reached during the initial and full-text screening stages, an independent researcher was consulted. This was the basis on which a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram was then generated [ 10 ].
Stage 4: Data Charting
A pretested data charting form shown in the protocol published before this review was used by the reviewers to determine which variables to extract [ 13 ]. This form provided flexibility for iterative updates during the data charting process. The “descriptive-analytical” approach, as described by Arksey and O’Malley [ 7 ], was used in the data collection process. In this process, the researchers critically examined the identified articles and documents that met the eligibility criteria and extracted the relevant data from each publication using the pretested charting form. The data were organized into a chart with 2 main sections. In the first section (Overview) we categorized the metadata of the included publications. In the second section (Research Questions) we extracted and included data based on our predetermined objectives [ 13 ].
Stage 5: Collating, Summarizing, and Reporting the Results
This scoping review was focused on the range of data identified and curated. Quantitative assessment was limited to a count of the number of sources reporting a particular FAIR thematic issue or recommendation. After charting the relevant data from the studies on spreadsheets, the results were collated and described using summary statistics, charts, and figures. We also mapped the themes derived from the research questions (eg, FAIR implementation approaches, available FAIR networks, and FAIR infrastructural and security challenges) and other emerging themes during charting and analysis. Our results and implications for future research, practice, and policy were discussed accordingly.
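The review's only quantitative step, counting how many sources report a given FAIR theme, can be sketched as follows. The source IDs and theme labels here are hypothetical illustrations, not taken from the charted data.

```python
from collections import Counter

# Hypothetical mapping of source ID -> FAIR themes charted for that source.
charted = {
    "S01": ["implementation approaches", "infrastructure challenges"],
    "S02": ["FAIR networks"],
    "S03": ["implementation approaches", "security challenges"],
    "S04": ["implementation approaches"],
}

# Count how many sources report each theme (the review's quantitative assessment).
theme_counts = Counter(theme for themes in charted.values() for theme in themes)

print(theme_counts["implementation approaches"])  # 3
```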
No ethics approval was required for this work as only secondary data from published sources were included in the scoping review. The public was not invited to participate in any stage of this work.
The PRISMA Chart
Our first search resulted in 1561 records. The deduplication process led us to eliminate 5.25% (82/1561) of the records. We read through the titles and abstracts with the help of Rayyan software and eliminated 89.94% (1404/1561) of the records.
We then analyzed the full texts of the remaining 75 records and eliminated 41 (55%) based on relevance to the FAIR data principles in the health domain, as shown in Figure 1 .
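The screening flow above can be recomputed directly from the reported counts; this sketch only restates the arithmetic given in the text.

```python
# Recomputation of the PRISMA flow counts reported in this review.
identified = 1561
duplicates = 82                                     # 5.25% of 1561
after_dedup = identified - duplicates               # 1479
excluded_title_abstract = 1404                      # screened out in Rayyan
full_text = after_dedup - excluded_title_abstract   # 75 records read in full
excluded_full_text = 41                             # not relevant to FAIR in health
included = full_text - excluded_full_text           # 34 publications included

assert round(100 * duplicates / identified, 2) == 5.25
assert round(100 * excluded_title_abstract / identified, 2) == 89.94
assert round(100 * excluded_full_text / full_text) == 55
print(included)  # 34
```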
We included the year of publication, domain focus, countries where the work was conducted, and type of method that was used to conduct research or surveillance. We considered only the first author where there were multiple authors involved in the publication. On this basis, the United States topped the list with 38% (13/34) of publications [ 15 - 27 ]. Both Germany [ 28 - 32 ] and the Netherlands [ 33 - 37 ] had 15% (5/34) of related publications. France had 12% (4/34) of related publications [ 38 - 41 ]. We found only 3% (1/34) of related publications each from Austria [ 42 ], Belgium [ 43 ], Greece [ 44 ], Portugal [ 45 ], Turkey [ 6 ], Uganda [ 46 ], and the United Kingdom [ 47 ]. We also found that most of the work on FAIR efforts in the health research domain was conducted in 2020, as shown in Table 1 .
On the basis of the Systematized Nomenclature of Medicine–Clinical Terms (SNOMED-CT), a comprehensive, multilingual clinical health care terminology [ 48 ], we classified the research areas dealing with FAIR principles ( Table 2 ). We listed the parents for medical specialties. However, we were not able to successfully map themes related to population health [ 15 ], demography [ 25 ], and general health data research [ 6 , 30 , 36 ] as we found SNOMED-CT to be lacking in these areas. Similarly, pharmacy [ 23 ] and pharmacovigilance [ 44 ] were not listed as medical specialties. We also found that biomedical themes were not exhausted in SNOMED-CT.
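The classification step can be sketched as a mapping from research areas to SNOMED-CT concept identifiers, with the gaps reported above left unmapped. The SCTIDs below are placeholders, not real codes.

```python
# Sketch of the classification step: map each publication's research area to a
# SNOMED-CT concept ID where one exists. "SCTID-A"/"SCTID-B" are placeholders,
# not real codes; themes the review could not map are recorded as None.
THEME_TO_SCTID = {
    "cardiology": "SCTID-A",     # placeholder, not a real SCTID
    "oncology": "SCTID-B",       # placeholder, not a real SCTID
    "population health": None,   # no suitable SNOMED-CT concept found
    "demography": None,          # no suitable SNOMED-CT concept found
}

unmapped = [theme for theme, code in THEME_TO_SCTID.items() if code is None]
print(unmapped)  # ['population health', 'demography']
```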
Regarding the study types, all the publications we reviewed were qualitative in nature except for the studies by van Panhuis et al [ 15 ] and Looten and Simon [ 39 ], which were mixed. In addition, all the publications we reviewed described work in which the FAIR principles had already been implemented, except for the study by Mons [ 35 ], which is a conceptual work.
a SCTID: SNOMED-CT identifier.
b N/A: not applicable.
Approaches Used or Piloted in the Implementation of the FAIR Data Principles in the Health Data Domain Since the Conception of These Principles in 2014
A total of 50% (17/34) of the publications contained details on the approaches in use or under development in the implementation of the FAIR data principles.
We examined the approaches used in FAIRification and how each approach helped achieve a particular FAIR data principle. We then grouped similar approaches to better understand how the different approaches build upon each other. The publications we examined for this purpose can be found in Multimedia Appendix 2 [ 6 , 16 - 19 , 21 - 25 , 27 , 29 , 31 - 35 , 42 , 44 , 45 , 47 ].
Data Harmonization and Standardization
Approaches to data harmonization and standardization are key to facilitating data discoverability and reuse. Kassam-Adams et al [ 21 ] asserted that it is useful to distinguish between the “standardization” and “harmonization” of variables drawn from multiple studies to facilitate data reuse. “Standardizing” has been defined as establishing common variable names and response values for essentially identical data points collected in different studies (eg, child age in years and values assigned to item responses within an established measure). “Harmonizing,” by contrast, has been defined as the process of deriving a new common variable from existing data that measured the same or similar constructs (eg, educational level as defined in different countries, or intrusive thoughts about a traumatic event as assessed by the different posttraumatic stress disorder symptom measures used in 88% (30/34) of the studies from 5 countries) [ 21 ].
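The distinction drawn by Kassam-Adams et al can be sketched in code; the variable names, recoding tables, and study records below are hypothetical illustrations, not the authors' actual mappings.

```python
# "Standardizing": map study-specific names for essentially identical data
# points onto one common variable name (names here are hypothetical).
NAME_MAP = {"age_yrs": "child_age_years", "childAge": "child_age_years"}

def standardize(record):
    return {NAME_MAP.get(key, key): value for key, value in record.items()}

# "Harmonizing": derive a new common variable from existing variables that
# measured similar constructs differently (eg, country-specific education
# codes recoded into a shared scale; recodes are hypothetical).
def harmonize_education(record, country):
    recodes = {
        "US": {"HS": "secondary", "BA": "tertiary"},
        "DE": {"Abitur": "secondary", "Diplom": "tertiary"},
    }
    return recodes[country].get(record["education"], "other")

rec = standardize({"age_yrs": 9, "education": "HS"})
print(rec["child_age_years"], harmonize_education(rec, "US"))  # 9 secondary
```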
The American Heart Association Precision Medicine Platform aims to facilitate data findability through a transparent cloud-based platform with explicit harmonization approaches: identifying common parameters across all data sets and allowing forum users to interactively find or merge data of interest [ 19 ].
Both Lacey et al [ 24 ] and Kassam-Adams et al [ 21 ] recognized data dictionaries as a critical step toward achieving data FAIRness, in the California Teachers Study and in trauma studies, respectively. A data dictionary is a centralized repository of information about data. It describes the content, format, and structure of a data set and conveys meaning, relationships to other data, origin, and use [ 49 ].
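A minimal data dictionary, in the sense defined above, can be sketched as structured metadata alongside a simple completeness check; all field names and entries are illustrative.

```python
# A minimal data dictionary: metadata describing the content, format, meaning,
# origin, and relationships of a data set's variables. Entries are illustrative.
data_dictionary = {
    "participant_id": {
        "type": "string",
        "description": "Pseudonymized participant identifier",
        "origin": "assigned at enrollment",
        "relates_to": [],
    },
    "sbp_mmhg": {
        "type": "integer",
        "description": "Systolic blood pressure",
        "unit": "mmHg",
        "origin": "clinic measurement",
        "relates_to": ["dbp_mmhg"],
    },
}

def documented(dataset_columns, dictionary):
    """Check that every published column is described in the dictionary."""
    return all(col in dictionary for col in dataset_columns)

print(documented(["participant_id", "sbp_mmhg"], data_dictionary))  # True
print(documented(["participant_id", "hr_bpm"], data_dictionary))    # False
```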
A case study on an emergency department catalog was conducted to FAIRify the emergency department data sets to improve data searchability. Interestingly, most data sets did not meet the requirements of this case study as they were not accompanied by a publicly available data dictionary [ 27 ]. The standardization of metadata has also been discussed by Kugler and Fitch [ 25 ] and Caufield et al [ 16 ]. We observed that standards for data publication need to be upgraded to require that a data dictionary accompanies every published data set [ 27 ].
Unique Identifiers for Data Objects
Navale et al [ 17 ] have discussed the role of digital object identifiers and globally unique identifiers in data findability and in patient deidentification in research studies. The role of unique identifiers in facilitating record linkage in Integrated Public Use Microdata Series (IPUMS) population surveys has also been discussed in the study by Kugler and Fitch [ 25 ], in which the unique identifiers link the same individual as they appear across multiple demographic studies. This has enabled researchers to study life courses [ 25 ].
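The record-linkage role of unique identifiers described for IPUMS can be illustrated schematically; the identifiers and survey records below are made up for the example.

```python
from collections import defaultdict

# Illustrative IPUMS-style linkage: a persistent unique identifier lets the
# same individual be followed across survey waves (IDs and records are
# hypothetical).
waves = [
    {"pid": "P001", "year": 2010, "employed": False},
    {"pid": "P001", "year": 2015, "employed": True},
    {"pid": "P002", "year": 2010, "employed": True},
]

# Group observations by person to reconstruct a "life course".
life_courses = defaultdict(list)
for rec in waves:
    life_courses[rec["pid"]].append((rec["year"], rec["employed"]))

print(sorted(life_courses["P001"]))  # [(2010, False), (2015, True)]
```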
Some tools have been developed based on requirement analyses from multistakeholder efforts, such as the SCALEUS FAIR Data (SCALEUS-FD) tool [ 45 ]. SCALEUS-FD produced a specification for the description of data sets that meets key functional requirements, uses existing vocabularies, and is expressed using the Resource Description Framework (RDF). Similar work has also been discussed by Bhatia et al [ 27 ] and Dumontier et al [ 18 ], who presented the design of an open technological solution built upon the FAIRification process proposed by GO FAIR. Closing the gaps in this process for health data sets provides the health research community with a common, standards-based, legally compliant FAIRification workflow for health data management [ 18 ]. The actual implementation of the proposed architecture was initiated as an open-source activity, developing a set of software tools addressing different steps of the FAIRification workflow. GO FAIR also used the results from focus group discussions to gather requirements, together with a literature review of the General Data Protection Regulation (GDPR) and national legislations, to architecturally design an open technological solution built upon the FAIRification process for multinational health data sets [ 6 ]. Finally, Suhr et al [ 29 ] described functional and quality requirements based on many years of experience implementing web portal data management for biomedical collaborative research centers.
Data Linkage and Semantic Web
Data linkage allows for the identification of the same individuals as they appear across multiple study cohorts even after pseudonymization [ 25 ]. Linked data refer to an ecosystem of technologies, recommendations, and standards that aim at the interconnection of heterogeneous data in 1 unified processing realm [ 50 ]. Many linked data (semantic web) standards and recommendations are based on RDF, which uses Uniform Resource Identifiers (URIs) to unambiguously identify data items. URIs make data uniquely identifiable and, thus, findable and accessible through the internet [ 51 ].
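The role of URIs described above can be shown schematically without an RDF library: each data item is named by a URI, so statements about it remain unambiguous when data sets are interlinked. The namespace and resources below are hypothetical, and a real implementation would use RDF tooling rather than plain tuples.

```python
# Schematic RDF-style triples: subject, predicate, and (where applicable)
# object are URIs, making each data item globally and unambiguously
# identifiable. All URIs below are illustrative only.
EX = "https://example.org/study/"

triples = {
    (EX + "dataset42", EX + "title", "Cohort blood-pressure measurements"),
    (EX + "dataset42", EX + "creator", EX + "researcher7"),
    (EX + "researcher7", EX + "name", "A. Researcher"),
}

def describe(subject_uri):
    """Return all (predicate, object) pairs for one URI-identified resource."""
    return {(p, o) for s, p, o in triples if s == subject_uri}

print(len(describe(EX + "dataset42")))  # 2
```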
Natsiavas et al [ 44 ] listed RDF, RDF Schema, and the Web Ontology Language as the main languages used to define knowledge in the semantic web paradigm in pharmacovigilance work on OpenPVSignal. This enables both syntactic and semantic interoperability (SI) by defining the rules for communicating data, the semantic structures to represent knowledge, and the interlinking of data with third-party data sets or ontologies [ 44 ]. Schaaf et al [ 32 ] highlighted that SI is not considered in FAIR and showed that integrating metadata repositories into clinical registries to define data elements in the system is an important step toward unified documentation across multiple registries and overall interoperability.
We identified various studies (6/34, 18%) that discussed semantics and ontologies in the context of FAIR [ 26 , 32 , 33 , 35 , 47 ]. Mons [ 35 ] described the need for rich FAIR metadata to enable controlled, computational access for analysis or visualization as well as expert data annotation in the wake of the global COVID-19 crisis. European life-science infrastructure for biological information (ELIXIR) aims to ensure that FAIRified COVID-19 data are well annotated and accessible for reuse by the scientific research community, as well as providing a registry to collect COVID-19–related workflows [ 47 ]. Semantic modeling in pharmacovigilance was also described by Celebi et al [ 33 ] as a necessary activity comprising semantic harmonization and integration, requiring the reuse or creation of models compliant with the FAIR principles and requirements gathered.
Data Pseudonymization and Anonymization
Data pseudonymization and anonymization are steps taken to FAIRify data that are unique to the health research domain because of the sensitive nature of the data. Tools have been developed for this purpose [ 21 , 30 ]. We also found studies that highlighted the need for data owners to provide a description of the data pseudonymization method, especially where different methods are used to pseudonymize data [ 31 ]. The method of pseudonymization and anonymization is dependent on the purposes for which the collected data are intended. One-way encryption of identifiers, fuzzing, generalization, and longitudinal consistency are among the pseudonymization techniques that may be used. The Health Insurance Portability and Accountability Act deidentification standard of protected health information provides for the data types that should be erased from a health data set to minimize the risk of reidentification of data subjects [ 6 ]. None of these measures can ensure that the risk of reidentification is zero, and as pseudonymization and anonymization tools continue to develop, so do new technologies that facilitate brute-force attacks [ 6 , 52 ].
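Two of the pseudonymization techniques named above, one-way encryption of identifiers and generalization, can be sketched with standard library tools. The salt, identifiers, and age bands are illustrative; a real deployment needs a securely managed secret and a formal reidentification risk assessment.

```python
import hashlib

# Hypothetical project secret; in practice this must never be hard-coded.
SALT = b"project-specific-secret"

def one_way_id(patient_id: str) -> str:
    """One-way encryption of a direct identifier via a salted hash."""
    return hashlib.sha256(SALT + patient_id.encode()).hexdigest()[:16]

def generalize_age(age: int) -> str:
    """Generalization: replace an exact age with a 10-year band."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

record = {"patient_id": "MRN-12345", "age": 47}
pseudo = {
    "pid": one_way_id(record["patient_id"]),   # stable but not reversible
    "age_band": generalize_age(record["age"]),
}
print(pseudo["age_band"])  # 40-49
```

Note that, as the text warns, neither technique reduces the reidentification risk to zero; salted hashing in particular remains vulnerable to brute-force attacks when the identifier space is small.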
Use Case–Based Approach
It is also interesting to see the development of a template that includes instructions for writing FAIR-compliant systematic reviews of rare disease treatments. Doing so enables the assembly of a Treatabolome database that complements existing diagnostic and management support tools with treatment awareness data [ 38 ]. We found studies that discussed various aspects of FAIR repository design [ 22 - 25 , 36 , 42 ].
Efforts have been made to FAIRify a wide range of data that inform the management and prognosis of various diseases such as Huntington disease, cancer, posttraumatic stress disorder, and cardiovascular diseases. Although efforts to FAIRify data on a wider variety of diseases are ongoing, researchers have highlighted the need to FAIRify the scarce data on rare diseases and find and reuse the already existing data to benefit researchers, pharmaceutical companies, health care practitioners, and patients [ 32 ].
IT Infrastructure, Workflows, and Tools for FAIRification
A total of 59% (20/34) of the publications described IT infrastructure, workflows, and tools for FAIRification.
Various workflows, tools, and infrastructures have been designed by groups worldwide to facilitate the FAIRification of health research data, as shown in Multimedia Appendix 3 [ 6 , 15 - 17 , 19 , 21 , 24 , 25 , 28 , 29 , 32 , 33 , 36 , 37 , 40 , 43 - 47 ]. However, the steps involved in the workflows, tools, and infrastructures vary. We also uncovered a wide range of problems and questions that researchers are trying to address by FAIRifying their data. We examined 4 workflow design purposes:
- To provide the health research community with a common, standards-based, legally compliant FAIRification workflow for health data management. The actual 10-step implementation of the proposed architecture has been initiated as an open-source activity for developing a set of software tools addressing different steps of the FAIRification workflow [ 6 ].
- To describe the 4-step FAIRification of a highly cited drug-repurposing workflow (OpenPREDICT) by FAIRifying data sets as well as applying semantic technologies to represent and store data on the versions of the general protocol, the concrete workflow instructions, and their execution traces [ 33 ].
- To describe a 4-step method to revolutionize the management of multiple sclerosis to a personalized, individualized, and precise level using FAIR data [ 43 ].
- To provide a 5-step guidance and give detailed instructions on how to write FAIR-compliant, homogeneous systematic reviews for rare disease treatments to facilitate the extraction of data sets that are easily transposable into machine-actionable information [ 38 ].
We found 11 tools designed for various purposes, as shown in Multimedia Appendix 3 [ 6 , 15 - 17 , 19 , 21 , 24 , 25 , 28 , 29 , 32 , 33 , 36 , 37 , 40 , 43 - 47 ]. The Tycho 2.0 tool was instrumental in illustrating the value of investing in a domain-specific open-data resource for accelerating science and creating new global health knowledge through data FAIRification [ 15 ]. Another tool expands the value of clinical case reports as vital biomedical knowledge resources by structuring extensive metadata for clinical events and case descriptions. This standardization of metadata templates for clinical case reports aids clinicians and clinical researchers in gaining a better understanding of disease presentations, including their key symptomatology, diagnostic approaches, and treatment [ 16 ]. SCALEUS-FD is a semantic web tool, built following linked data principles, that helps researchers overcome difficulties in sharing their data by publishing FAIR-compliant data and metadata to facilitate interoperability and reuse [ 45 ]. Another tool was developed to present a novel ontology aiming to support the semantic enrichment and rigorous communication of pharmacovigilance signal information in a systematic way, focusing on the FAIR data principles and exploiting automatic reasoning capabilities on the interlinked pharmacovigilance signal report data [ 44 ]. This ontology uses RDF Schema and Web Ontology Language to define concepts as well as high-level semantic relations between them, as opposed to the free-text format based on which pharmacovigilance signal reports are made publicly available from organizations, which does not facilitate systematic search and automatic interlinking of information. Semantic web technologies and ontologies are key to the standardization and FAIRification of clinical data for training using machine learning algorithms that can be used to build prediction models for personalized therapy [ 34 ].
A FAIR data archive was built to better examine the nature and course of children’s responses to acute trauma exposure by combining data from multiple studies, describing key study- and participant-level variables, harmonizing key variables, and examining retention at follow-up across studies [ 21 ]. Another data archive in the form of a registry was developed to describe the steps toward the architectural extension and implementation of the FAIR data principles in the Open Source Registry for Rare Diseases via a web-based FAIR Data Point. The FAIR Data Point allows institutional data owners to give access to their data sets in a FAIR manner and can be integrated into a larger interoperable system. At the time of writing, the focus was on building a first prototype [ 32 ]. The second registry we found was the FAIR French National Registry for patients with facioscapulohumeral muscular dystrophy, whose original design has allowed strong involvement of both patients and physicians since its inception in 2013 [ 40 ].
Tools that facilitate collaborative efforts and data discoverability were also highlighted in our findings. Menoci is a modular web portal for data collection; experiment documentation; and data publication, sharing, and preservation in biomedical research projects [ 29 ]. This software focuses mainly on the collection and integration of data and the comprehensive documentation and workflow support of all steps, from planning to sharing and publishing. Another similar tool is the FAIR-based American Heart Association Precision Medicine Platform, whose goals are to democratize data, make it easy to search across orthogonal data sets, provide a secure workspace to leverage the power of cloud computing, and provide a forum for users to share insights. The tool thereby addresses the challenges researchers face when accessing large public data sets today in finding, accessing, downloading, and interpreting each poorly harmonized data set individually [ 19 ]. We also saw a portal that enables easy access to clinical, radiological, and genomic patient data and instantaneously executes multi-domain hypothesis creation and testing. This portal is applicable for medical training as well as clinical and research purposes [ 26 ].
It was interesting to learn that the infrastructures we found were designed for various purposes. In the clinical care area, we found a FAIR system designed to use clinical data elements for electronic data submission, processing, validation, and storage within designated repositories [ 17 ]. For research management purposes, we found a pipeline built for the creation of FAIR data integration infrastructure for data creation, storage, and processing [ 28 ]. We also found 3 infrastructures that were developed for epidemiology research. One is a FAIR cloud-based approach for storing, analyzing, and sharing cancer epidemiological cohort data in a common, secure, and shared environment adopted by the California Teachers Study in 2014 [ 24 ]. The second is a point-and-click website (ClinEpiDB) that supports third-party discovery and reuse of primary epidemiological research data by incorporating resources, tools, vocabularies, and infrastructure. It also facilitates access to and interrogation of high-quality, large-scale data sets, which enables collaboration and discovery that improves global health [ 46 ]. The third uses a model of data quality control and data stewardship that puts large and complex sensitive cohort data (YOUth) at the forefront of proper FAIR data infrastructure and management procedures [ 36 ].
We also found works conducted by the European life-science data infrastructure to facilitate collaborative research for improved access to research infrastructures and research data–sharing platforms in the EU in the wake of COVID-19 [ 47 ]. We noted that some investigators of the IPUMS data claim that their data collection approach was already consistent with the FAIR data principles before the FAIR concept was even conceived [ 25 ].
It is noteworthy that some of the reviewed FAIRification infrastructures, workflows, and tools have not been tested outside the pilot environment in which they were developed [ 6 , 45 , 47 ]. Thus, their applicability to real-world environments still needs to be demonstrated. Some of the built systems are also yet to be evaluated [ 34 ]. The development of some of the FAIRification tools and strategies has been based on the results of requirement-gathering exercises from the community [ 6 , 45 ].
Our investigation revealed that comprehensive user training, including tutorials, conferences, or similar formats, is necessary to support the uptake of the FAIR principles as researchers continue to create or request FAIR data for reuse [ 25 ].
Challenges and Mitigation Strategies in Approaches
A total of 38% (13/34) of the publications reported challenges faced or anticipated in the practical implementation of the FAIR guiding principles.
This section summarizes the reported challenges, risks, and mitigation strategies. It covers research questions 2 and 3 of our previously published protocol [ 13 ]. These are shown in Multimedia Appendix 4 [ 6 , 19 , 23 , 24 , 27 , 30 , 32 , 33 , 35 , 36 , 39 , 42 , 47 ].
Löbe et al [ 30 ] pointed out several problems with the FAIRification of medical records. One persistent challenge is determining a suitable granularity at which data should be assigned their own identifiers in cases in which external systems need to reference single data elements only. Löbe et al [ 30 ] also showed that frequent additions and updates to medical records, and subsequent changes such as appending data to existing data, make iterative versioning a difficult task. Another reported challenge is the lack of “eternal persistence” of globally unique identifiers assigned by data repository software systems when updates or system changes are made. The authors also pointed out that there was no agreement among digital identifier registry software, such as Health Level 7 Object Identifiers, digital object identifiers, and URIs, on features such as descriptive human-readable components as part of identifiers.
Despite the high-quality metadata available for the YOUth study results, the data are scattered across the Yoda file system in a confusion of folder titles, settings files, and headers in binary formats. To aggregate, validate, and store the metadata in an explicit metadata structure and facilitate mapping toward the Data Documentation Initiative standard, the authors developed a script in collaboration with the metadata specialist of the Utrecht University Library [ 36 ].
Findability challenges, such as the lack of tools to support content-based searching for data or catalogs without a data dictionary for a particular data set or a search function for a particular data element, were identified [ 27 ]. In these circumstances, users often have little insight into the content of the data unless they download the full data sets or follow links to the original data source, which can sometimes be broken. The inclusion of data dictionaries has been identified as a critical step toward improving data findability and accessibility, for example, the Emergency Department Catalog, a search tool designed to improve the “FAIRness” of electronic health databases [ 27 ].
Most of the challenges reported in the articles were accessibility-related. A long-standing, protective, and siloed research data management culture was reported as the main bottleneck for data sharing rather than the technical capabilities [ 23 , 24 , 30 ].
Cohort studies with stored data on local network drives, usually at the researchers’ institutions, create data silos that prevent real-time collaboration [ 24 ]. The manual work of merging, updating, and distributing individual and summary data sets for analysis and data sharing has been described as time-consuming and inefficient. Researchers have outlined that pivoting from every study investigator analyzing their own copy of the data to all investigators using shared resources requires a conceptual shift in focus from the individual investigator to the broader user community [ 23 , 24 ]. To facilitate the cultural transformation of the workforce in biopharma research and development, a change was envisioned from “it’s my lab and my data” to “it’s the company’s data,” supported by incentives ranging from peer recognition to financial rewards [ 23 ]. A similar approach entailing engagement throughout the enterprise, from the departments to the executives, was also considered by Wise et al [ 23 ]. In this approach, a platform implemented through a combination of top-down commitment and investment from senior management and a bottom-up approach from scientists and managers has transformed engagement; increased access to cohort data; removed many barriers to data reuse; and further accommodated data storage, cleaning, updating, analyzing, and sharing.
Cohort studies such as the YOUth study with a broad scope of data and potential interest to a broad range of researchers encounter frequent data requests [ 36 ]. The handling of a data request is a multistep procedure involving multiple actors. The authors highlighted that, by developing a system on the existing main data storage facility, they combined the request, staging, and transfer of data within a single system, which simplifies the process for all actors involved.
Regulatory burdens on data collection, such as data processing agreements, are also reported as challenges faced when handling data in diverse formats from different sources across different legal entities. A study on the impact analysis of the policy for access to administrative data in France reported that extrinsic factors influence the accessibility of claims data, such as human factors (eg, data scientists with experience in claims data) and economic factors (eg, data infrastructure that is Health Insurance Portability and Accountability Act– and GDPR-compliant) [ 39 ].
Data politics are shown to hinder access to “Real-World Observation” data. The Virus Outbreak Data Network Implementation Network (VODAN-IN) reported that the COVID-19 pandemic is highly politicized and that there is little chance that countries (or even institutions) will “share” their Real-World Observation data with even the World Health Organization [ 35 ]. The VODAN-IN sought collaborations with institutions that work on established knowledge bases and genuine partnerships worldwide, which has facilitated the enhancement of infrastructure and methods for “distributed deep learning” as a mitigation approach. Mons [ 35 ] also indicates that, based on the policy “as distributed as possible, as centralized as necessary,” the network strives to ensure that the algorithms and services can work effectively with both FAIR data and metadata.
The complexities of access to personal medical data lie in their sensitive nature for the individual patient [ 53 ]. The comprehensive GDPR sets several conditions and restrictions on data collection, including detailed research consent stating the objective, the persons accessing the data, and the circumstances of data processing, which can prevent subsequent data sharing [ 30 ]. To fulfill GDPR requirements, developing and using further harmonized metadata vocabularies was suggested on topics such as the legal basis for data collection or the different variants of informed consent. Wise et al [ 23 ] also indicated that the GDPR “right to be forgotten” requirement should not be overlooked in the FAIR implementation plan. Wise et al [ 23 ] postulate that digital transformation will enable artificial intelligence analysis, machine learning, and data recognition.
Data owners’ privacy breaches during data sharing and safety concerns in cloud computing were reported as accessibility challenges [ 19 , 30 ]. Balancing the safekeeping of highly privacy-sensitive data and the protection of intellectual property and at the same time facilitating the scientific community’s access to these rich, unique data are among the reported challenges faced during data FAIRification by the American Heart Association Precision Medicine Platform and the YOUth longitudinal cohort study infrastructure in the Netherlands [ 19 , 36 ].
To mitigate community perceptions about the security of cloud computing environments, the American Heart Association platform has configured the cloud computing infrastructure, computation, and software platform with various security, confidentiality, and authentication settings to comply with widely adopted national and international standards and regulations. Another implemented solution is to have a third-party auditor assess the platform’s compatibility with national laws for analyzing biomedical data in a cloud-based environment, encrypt the data during transmission to or storage on the platform, and grant summary views of the results to the general community and detailed information to other research groups [ 19 ].
The fear of FAIR ecosystem monopolization is another concern that needs to be addressed. It is suggested that quality control and a minimal certification scheme for all components of the ecosystem must be in place as part of the effort to avoid monopolization of the FAIR ecosystem and its application by any particular party [ 35 ].
The interoperability challenge is reported in the scope of underuse of existing standards and lack of standard compliance in general. Löbe et al [ 30 ] pointed out that there are myriad standards, conventions, and best practices in biomedical research, but in many cases, researchers use the freedom of science to act on their own ideas rather than using existing standards. In contrast, data from health care systems are encoded with many different standards and governance models, and as a result, discovering, accessing, and linking such data is imperative in response to COVID-19 [ 47 ]. Moreover, the lack of domain-specific templates has been shown to force the use of a custom model, which limits interoperability [ 24 ].
The successful deployment of FAIR will require a standardized information architecture, and communities should reach a robust consensus on the ontologies they use to capture specific types of data. Löbe et al [ 30 ] argue that, although international medical terminologies such as the International Classification of Diseases, 10th Revision; Logical Observation Identifiers Names and Codes; or SNOMED-CT are very well suited to describe clinical concepts in detail, the vocabularies only partially fulfill the requirements of the FAIR data principles.
Furthermore, the terminologies overlap. Celebi et al [ 33 ] reported that creating a new ontology from scratch rather than creating a unified model based on the existing ontologies presents a challenge in the semantic modeling of unified workflow models. Regarding semantic modeling, Celebi et al [ 33 ] reported that the execution of the FAIRification process in the OpenPREDICT project was straightforward but the semantic modeling of the unified workflow model was challenging. Reusing existing semantic vocabularies to represent the unified model proved to be an extensive task. Moreover, some workflows have consistency issues such as missing joints and licensing elements, in addition to not conforming to the documentation intended for the specific project.
Reusability is particularly challenging in the context of provenance; data quality; and enabling factors such as incentives, return on investment (ROI), and infrastructure. Provenance is a broad topic, and the demarcation of medical data acquisition is not sharp. Hence, data owners should detail the circumstances of data collection and processing (ie, data sources, data validation rules, format conversions, data cleaning, derived or aggregated data, measurement tools, scripts, software libraries, and observers). The provision of simple web-based visual analytics tools that give prospective users an overview of the depth of available data can increase reusability [ 30 ]. It is also recommended to develop robust distributed provenance information schemes that can balance full path reconstruction with keeping the process privacy compliant [ 42 ].
Data quality is an important concern when relying on external data. Very large quantities of data have been generated in relation to the global COVID-19 pandemic, and ensuring data quality while managing the risk of false and misleading information being disseminated as “fact” proved challenging [ 35 ]. Even though data donors are required to sign an agreement regarding the accuracy of the data they are sharing, the quality of the shared data is not always validated. Efforts made to ensure data quality through technical validations and manual maintenance should be explicitly mentioned [ 30 , 35 ]. The high quality of the metadata annotation was considered a key point for reuse. For example, data are published for broad access and reuse in many ELIXIR nodes, which aim to provide data management support to projects launched nationally and at the EU level [ 47 ].
Apart from the typical reusability challenges with regard to data quality and provenance, general financial and human factors (incentives) were reported as impeding factors for reusability. The uneven distribution of the effort and benefits of FAIRification at the expense of the data owners was also reported as a challenge. Data owners deserve an incentive for their sharing efforts. Incentives may be required to support data sharing, enhance FAIR awareness through activities such as training, and promote methods to assess and support organizational and cultural shifts in management that lead to the achievement of positive perceptions of FAIR data sharing [ 30 ]. In the YOUth cohort success story, collaborative data management, by identifying executive and managerial key players in partner organizations; building good relationships; and adopting attitudes of patience, persistence, and forgiveness, helped overcome organizational differences and misunderstandings [ 36 ].
The costs of cultural and platform changes affect the key business categories of people, processes, technology, and data. More specifically, Lacey et al [ 24 ] described an up-front investment that was required to develop a data model, configure the user interface, and also convert decades’ worth of existing data sets into a single integrated data warehouse for data obtained from studies on large-scale cancer epidemiology cohorts. Although the initial costs of setting up FAIRification infrastructures and tools are quite significant, the costs of maintaining the infrastructure and tools once the initial setup is completed are much lower [ 24 , 54 ]. Executive management will need to be convinced that, apart from being a high-priority, urgent endeavor, FAIR implementation will generate a long-term ROI. Differing levels of uptake of FAIRification concepts among researchers may be observed during the transition [ 23 , 24 ].
Partner organizations with a clear ambition to support proper data management and accessibility, together with the means and institutional support to realize this ambition by providing a dedicated research IT division and high-quality data managers, were a crucial prerequisite to the success claimed by the YOUth project [ 36 ]. Figure 2 presents a summary of the challenges and mitigation strategies used in the implementation of FAIR data principles in health data stewardship.
Networks Involved in the Implementation of the FAIR Data Principles
A total of 35% (12/34) of the publications contained details on networks involved in the implementation of the FAIR data principles. The operational definition of “network” for this scoping review is “scientific communities, research institutions, repositories or data archives, consortia, funding agencies, and citizens who are actively engaged in advocating FAIR principle data stewardship in the health care domains.” Multimedia Appendix 5 [ 15 , 18 - 20 , 24 , 29 , 35 , 42 , 47 ] provides an overview of these networks.
Most continents have been reached by at least one FAIR network. All the networks we observed had different sources of funding. A common theme is the community approach through collaboration with parties sharing similar interests, such as the VODAN-IN, which was established to provide a platform for FAIR data exchange during the COVID-19 pandemic and a reference point for data stewardship in future pandemics [ 35 ]. The Research Data Alliance (RDA) is a community-driven initiative that was established in 2013 and consists of several work groups whose overall aim is to FAIRify health research data [ 55 ]. The RDA also established a working group to support the VODAN-IN [ 35 ].
Another network with a similar approach is ELIXIR, which has documented willingness to collaborate with appropriate stakeholders to facilitate discovery, access, and data linkage. ELIXIR has also expressed a willingness to further the development and application of normalization and interoperability of well-annotated medical and real-world data by seeking collaborations with researcher-driven initiatives and by developing open reproducible tools and workflows for COVID-19 research [ 47 ]. Both networks are explicitly seeking collaborations with researchers who share similar interests.
A challenge facing COVID-19 research is that data from health care systems are encoded using many different standards and governance models. ELIXIR’s response aims to ensure that COVID-19 data are well annotated and accessible for reuse by the research community and society [ 47 ]. They have created databases and archives in which researchers are encouraged to deposit and share their raw sequence data and COVID-19 data in a manner compliant with the EU GDPR. Other networks that have highlighted the need for a communal approach in FAIRification endeavors are the World Wide Web Consortium Semantic Web for Health Care and Life Sciences Interest Group and the RDA [ 18 , 36 ].
The active networks have also identified challenges faced by data-driven research. The American Heart Association established the Precision Medicine Platform to address the current challenges in accessing large public data sets: the lack of harmonization across multiple data sets makes it difficult for researchers to combine data sources and evaluate results. The aim is to provide transparent and explicit harmonization to access both harmonized and raw data [ 19 ]. Although not yet completed, this network plans to involve the community for better data diversification. Similarly, the Biobanking and BioMolecular resources Research Infrastructure–European Research Infrastructure Consortium extended the FAIR principles to FAIR health principles in biological material management by providing comprehensive provenance information for the complete chain, from donor to biological material to data, as well as incentives for enriching existing resources and reusing them [ 42 ].
The Dutch Research Council, Data Archiving and Network Services organization, and UK Data Service have come together to develop a high-quality research data infrastructure for sensitive cohort data [ 36 ].
The VODAN-IN welcomed organizations that work on established knowledge bases and biomedical research results for collaboration in FAIRification efforts [ 35 ]. The VODAN-IN has a long-term goal of reusing the resulting data and service infrastructure for future outbreaks. Similarly, some of the active networks such as the Committee on Data for Science and Technology have indicated that their areas of future research will apply data to real-world issues and promote the application of principles, policies, and practices that enable open data and advanced data skills for national science systems. Major discussion points in these data-sharing networks are data discoverability, infrastructure development, infrastructure deployment, and collaborative efforts in the FAIRification journey [ 15 , 18 , 20 , 24 , 29 , 33 , 36 , 47 ].
With respect to future work, the World Wide Web Consortium Semantic Web for Health Care and Life Sciences Interest Group has indicated interest in adding new use cases and documenting improvements made to the existing community profile [ 18 ]. Similarly, the American Heart Association has identified the need for better data diversification and has plans to involve the community in this venture [ 19 ].
In the same community spirit, FAIRsharing intends to grow the number of users, adopters, collaborators, and activities, all working in their community-driven resources to enable the FAIRification of standards, knowledge bases, repositories, and data policies [ 29 ].
The VODAN-IN has identified the need to provide a platform for collaborative FAIR data exchange during the COVID-19 pandemic, becoming a reference point for data stewardship in future pandemics [ 35 ]. DataMed’s goal is to be for data what PubMed has been for scientific literature by enabling the discovery of biomedical data sets that are spread across different databases and on the cloud [ 20 ]. The Sherlock Division at the San Diego Supercomputer Center has identified the need to expand the geospatial, comorbidity, and biospecimen tools for query and analysis; automate certain processes; and develop an application programming interface enabling data collection and sharing [ 24 ].
Outcomes of FAIRification and Expected Future Work
A total of 68% (23/34) of the publications contained details on the outcomes of FAIRification endeavors and the expected future work.
Several outcomes were reported or expected by researchers as a result of attempting to incorporate the FAIR guiding principles into their specific use cases. We categorized the outcomes into new findings or treatments, improved data sharing and publications, and ROI, as shown in Multimedia Appendix 6 [ 15 , 16 , 21 - 27 , 29 - 34 , 36 , 40 , 41 , 43 - 47 ]. On the basis of the research findings, we identified the principles needed to achieve the outcomes and the relevant future work that the authors indicated.
New Findings or Treatments
In frontline clinical care, rich metadata acquired from clinical case reports serves as a valuable tool empowering researchers to explore disease progression and therapies to improve health outcomes [ 16 ]. The effort to enrich data with explicit metadata using standardized templates has also facilitated wider data provision for educational documentation. The data can be easily identified by metadata, reducing the time spent collecting and searching for already available data. The standardization of metadata templates reduces the time and costs of data discovery. These savings lead to improved productivity in health research and development [ 23 ]. Tools such as the Tuberculosis Data Exploration Portal allow for collaborative research that uses the wealth of metadata contained within the database to improve patient care, especially in the case of drug-resistant tuberculosis [ 26 ].
Data Sharing and Publications
At the corporate level, implementing the FAIR principles improves robotics and process automation through machine readability, which further enables reuse and scalability [ 23 ]. The difficulties experienced by researchers when sharing their data have led to the creation of tools that lighten the burden of publishing interoperable and reusable data and metadata [ 45 ]. Successful standardization and harmonization of data and metadata in clinical tools such as registries have enhanced data-sharing capabilities. Time will show if disease registries can simultaneously be FAIR and data privacy compliant in the wake of the GDPR implementation [ 32 , 45 ]. The combination of ontologies and semantic web has also allowed for FAIR clinical data sharing for the secondary use of administrative claims data [ 31 ]. In the field of radiation oncology, the radiation oncology ontology and semantic web have proven usable for integrating and querying data from different relational databases for data analysis [ 34 ].
FAIRification can also provide insights into understudied but clinically relevant matters such as pediatric traumatic stress. A sustainable framework for standard variable names, metadata, and harmonization algorithms has been developed to support data reuse [ 21 ]. The findability and accessibility of this archive have allowed investigators worldwide to reuse the published data [ 21 ]. More analyses are expected to be built on this accessible, reusable, harmonized child trauma data set. The research produced by IPUMS users, published as numerous articles, books, and papers, is abundant on Google Scholar, and the pace of IPUMS-based publications continues to accelerate [ 25 ]. The FAIRification process has also yielded publications that expand the notions of FAIR and FAIRification from the relatively static artifacts of data sets to the dynamic processes of workflows [ 33 ].
A database in clinical epidemiology has been developed as an open-access web-based tool that maps data to common ontologies and further creates a unified semantic framework. The terms are reused or requested from existing ontologies when possible. The context of these terms is provided to facilitate reusability [ 46 ]. Once information related to pharmacovigilance safety signals is identified, it is publicly communicated in free-text form by organizations charged with the responsibility of identifying the safety signals for further investigation [ 56 ]. The OpenPVSignal ontology was developed to support the semantic enrichment and rigorous communication of pharmacovigilance signal information in a FAIR manner by use of existing semantic-rich metadata. It also interlinks the respective information with other data sources by applying semantic reasoning, which further enables data reuse [ 44 ].
The improvements made in Project Tycho, version 1, as a result of FAIRification gave rise to Project Tycho, version 2, demonstrating the value of sharing historical epidemiological data for creating new knowledge and technology. This has also facilitated data reuse [ 15 ]. These improvements resulted in 150 published works that cited the Project Tycho release paper, 47 of which were published by authors from 1 of the 100 institutions most commonly affiliated as registered Project Tycho users [ 15 ]. A tool has been developed to facilitate the collection of harmonized, rich data and metadata as well as the standardization and documentation of experimental data along the scientific process. This tool also allows for data sharing with public repositories. Data access is regulated by the Drupal framework [ 25 ].
The American National Cancer Institute established a framework to protect the security of cancer data warehouses under its jurisdiction [ 57 ]. We noted that, although the infrastructure of the data warehouse of the California Teachers Study has unique features in line with this framework, it still shares enough common characteristics to facilitate widespread data harmonization, pooling, and sharing. The California Teachers Study now offers all users a shared and secure workspace with common data, documentation, software, and analytic tools to facilitate use, reuse, and data sharing [ 24 ]. The combination of ontologies and semantic web technologies has also been shown to enable FAIRification of clinical data, which further facilitates data sharing [ 31 , 34 ]. However, our review led us to resonate with previous studies that have highlighted the need to evaluate data access policies to understand the potential leverages of data reuse [ 39 ].
ROI remains an open discussion [ 22 - 24 , 36 ]. Digitizing and standardizing all data is more expensive than digitizing a specific section of interest. In contrast, simultaneously digitizing and standardizing data is more efficient and budget-friendly than iteratively doing the same work for small parts of the data at a time [ 15 ]. Studies also show that poor adherence to the FAIR data principles hampers data use and reuse, but this is correctable at a reasonable cost [ 41 ]. Additional investment in human resources may be required for the purposes of preparing the data for FAIRification, actual FAIRification, and related activities. Once complete, FAIRification will prevent duplication and accelerate new science and discovery in global health [ 15 ]. The short-term impact of FAIRification includes improved data findability; faster data access; and the selection of standardized, machine-readable data for analytics [ 23 ]. The reusability of the data increases their value as different researchers worldwide continue to request these data and may eliminate the need for a new process of data collection [ 21 , 25 ].
Some of the FAIRification efforts have delivered with regard to ROI. Retrospective FAIRification of data and infrastructure may require a significant up-front investment to develop models and prepare data sets that may have been collected and maintained over the years for FAIRification. However, as increasingly more options and examples for FAIRification become available, the required up-front investment may decrease. As FAIR health data are still a relatively new concept in the field of health, it is critical that community members share their experiences, perspectives, and lessons learned in their FAIRification endeavors. There is also a need for further research that will provide insights into the organizational, behavioral, and technical shifts that occur as a result of the transitions to FAIR [ 24 ].
Collaboration with appropriate stakeholders has been described as a key enabler for successful FAIRification efforts [ 47 ]. Active networks repeatedly mention plans to collaborate with appropriate stakeholders to interconnect the different researcher-driven initiatives, include FAIRified data sets from a wider variety of sources, and develop open reproducible tools and workflows for research [ 27 , 47 ]. Future work will also involve expanding tools to allow for automatic semantic data retrieval and integration with third-party tools and systems [ 29 ]. The performance of these systems may also need to be evaluated [ 41 ]. A notorious challenge facing FAIRification is the high cost and effort required for successful data collection [ 30 ]. In the future, it may be astute to use incentives to motivate stakeholders to take on the FAIRification journey. We may also see a rise in the standardization of processes for data access and extraction in the future [ 30 ]. There are plans to further develop the repositories in use so that they can accommodate data sets from a wider variety of sources [ 27 ]. There are also efforts to develop public catalogs that promote external metadata discoverability through which researchers worldwide can access detailed metadata [ 36 ]. There may be a need to research potential conflicts of interest that may arise among the various stakeholders in the wake of FAIR health research data sharing, as well as the measures to mitigate unethical use [ 41 ].
We appreciate the efforts that the various reviewed authors have taken to implement the FAIR data principles in health research data. The overall concept of data management, of course, is not new to the research ecosystem. Unlike other initiatives that focus on human scientists, the FAIR principles emphasize improving the ability of machines to find and use data automatically and supporting their reuse by individuals [ 35 ].
Critical steps toward data FAIRification in the biomedical domain include interpolation, inclusion of comprehensive data dictionaries, repository design, semantic interoperability (SI), ontologies, data quality, linked data, and record linkage, as well as requirement gathering for FAIRification tools. Other concerns such as pseudonymization, the need for collaboration in matters of FAIRification, and data FAIRification in the wake of COVID-19 were discussed. In conducting this study, we also noted the varying levels of understanding and misunderstanding of the FAIR concept. In this section, we discuss 7 recurring themes in our findings.
Involving the intended users in requirement engineering during the development of software tools is a vital first step in ensuring both tool acceptability and applicability [ 58 ]. Requirement gathering for FAIRification tools and strategies helps deal with the sense of mistrust within the community when it comes to matters of data safety and privacy. It may be necessary to encourage community participation by first making the community aware of the potential benefits of FAIRification as well as the steps taken to ensure the safety and privacy of their data with regard to legal regulations.
The importance of a comprehensive data dictionary for data sharing cannot be overstated [ 59 ]. Data dictionaries provide context for data collection, documentation, conversion processes, generation, validation, storage, and use [ 49 ]. Standardized data dictionaries lead to more efficient data handling and analysis, which further improves data interoperability [ 60 ]. Data dictionaries also enhance a user’s understanding of the data and, therefore, help improve data reusability [ 49 ]. The inclusion of data dictionaries is a critical step toward improving data accessibility and overall FAIRness [ 21 , 24 , 61 ].
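The role a data dictionary plays in validation and interoperability can be illustrated with a minimal sketch; the variable names, allowed values, and structure below are our own illustrative assumptions, not a specific dictionary standard:

```python
# A minimal data dictionary sketch: each entry documents a variable's
# meaning, type, allowed values, and unit, and incoming records can be
# validated against it. All names and codes here are illustrative.
DATA_DICTIONARY = {
    "sex": {"description": "Sex at birth", "type": str,
            "allowed": {"male", "female", "unknown"}, "unit": None},
    "age": {"description": "Age at enrollment", "type": int,
            "allowed": range(0, 120), "unit": "years"},
}

def validate(record):
    """Return a list of violations of the data dictionary."""
    errors = []
    for field, spec in DATA_DICTIONARY.items():
        value = record.get(field)
        if not isinstance(value, spec["type"]):
            errors.append(f"{field}: expected {spec['type'].__name__}")
        elif value not in spec["allowed"]:
            errors.append(f"{field}: value {value!r} not allowed")
    return errors

print(validate({"sex": "female", "age": 34}))  # conforming record
print(validate({"sex": "F", "age": 34}))       # undocumented local code
```

Because the dictionary states explicitly which values are allowed and what each field means, a receiving system can both interpret the data and reject undocumented local codes, which is how standardized dictionaries improve interoperability and reusability.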
There is a need for legislation in matters of FAIRification of sensitive health data [ 62 ]. According to the GDPR, patients own their data. This is currently factored in during FAIRification processes in the EU and higher-income countries. However, there are countries where legal frameworks regarding data governance have not yet been established [ 63 ]. Rapid advances in technology allow huge amounts of health data to be collected and manipulated within a short time. Consequently, more comprehensive data protection strategies are needed. However, what is the ethical route to take in the event that the law regarding data sharing is neither permissive nor explicit? What will govern FAIRification or FAIR data sharing in these instances? Is now the time to have a conversation on law amendments that may allow for more FAIRification in a protected environment? What are the restraints if granting agencies and funders of research require open data sharing? The answers to these questions may facilitate voluntary FAIR sharing of health data for purposes of coordinating responses to health threats, health research for better prognosis, surveillance, policy making, and decision-making.
The data maintained in clinical registries play a vital role in patient management, research, policy, and decision-making [ 64 ]. This is especially critical in the domain of rare diseases, where resources are scarce [ 38 ]. It is equally important that the data be linked using similar terminologies for the data values and data types to enable data compatibility [ 65 ]. This eliminates the need to recollect data and reduces the clerical burden that researchers have of resolving incompatibilities and correcting errors. It also allows machines to aid in data analysis across resources as they depend on explicit, unambiguous definitions of data [ 40 ].
Required Organizational Cultural Shift
In the course of conducting this study, we observed the immense effort and time required to FAIRify data and metadata retrospectively [ 32 , 45 ]. It is for these reasons that we reiterate that the concept of FAIR data stewardship should be considered in the early stages of a project. However, this will require significant cultural and organizational changes that may be met with resistance from the various stakeholders involved. For example, semantic enrichment of health data at the source is a critical part of data FAIRification, but studies show that frontline health professionals are still not motivated to add the process of semantic enrichment to their workflows [ 21 ]. Therefore, it is necessary to find innovative ways to add this step and the related technology to the workflows that already exist while incentivizing health care professionals to do the same.
SI allows data generated in different systems to be interchangeable with consistent meaning [ 66 ]. This is a foundational aspect that determines the ability of different systems to effectively work together and share data and domain concepts, context knowledge, and formal data representation [ 67 ]. However, application of the FAIR principles does not guarantee SI, and it is in this vein that Natsiavas et al [ 44 ] suggested that it would be astute for the global health community to establish a collection of preferred standards and ontologies. In the course of this review, we came across studies that attempted to include SI in a registry designed for rare diseases. A metadata repository offered possible uniform descriptions and defined data elements, creating SI in a FAIR infrastructure, which further facilitates data findability, sharing, and reuse [ 32 ].
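The idea of a metadata repository that maps heterogeneous local data elements onto uniformly defined concepts can be sketched as follows; the registry names, local field names, and concept identifiers are our own illustrative assumptions, not actual repository content:

```python
# A shared metadata repository maps two registries' local field names
# onto one uniformly defined concept, so queries can be phrased against
# the common definition. All identifiers here are illustrative.
METADATA_REPOSITORY = {
    "concept:body_weight": {
        "definition": "Body weight measured at enrollment, in kilograms",
        "local_mappings": {
            "registry_a": "wt_kg",
            "registry_b": "patient_weight",
        },
    },
}

def fetch(concept, registry, record):
    """Resolve a shared concept to a registry's local field and read it."""
    field = METADATA_REPOSITORY[concept]["local_mappings"][registry]
    return record[field]

record_a = {"wt_kg": 72.5}
record_b = {"patient_weight": 81.0}
print(fetch("concept:body_weight", "registry_a", record_a))
print(fetch("concept:body_weight", "registry_b", record_b))
```

The uniform description (definition plus unit) travels with the concept rather than with each registry, which is what lets such a repository create SI in a FAIR infrastructure and ease cross-registry findability and reuse.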
Training on FAIR
FAIRified clinical case reports have also been shown to be an attractive educational resource to a larger diverse audience if enriched with standardized metadata [ 16 ].
Several training platforms target FAIR education. The Precision Medicine Platform, for example, provides tutorials that guide researchers through data analyses such as genome-wide association, population demographics, descriptive statistics, and deep learning [ 19 ]. IPUMS supports FAIR through an extensive program of user training and support in the form of brief video tutorials on the use of the data extraction systems. In-person training and workshops at conferences, as well as active user forums, complement the portfolio [ 25 ]. Are these training workshops effective?
We recognize that it is probably not entirely practical to have a single template or workflow that guarantees an outcome of FAIRified data. Many technical and administrative questions must be asked about influences on the FAIRification process.
Is the required expertise for this process available? Are there enough time and finances to train a select few to become FAIR data stewards within the organization with the intent of nurturing skills and expertise within the organization, for example, by implementing a train the trainer concept [ 68 ]? Do time and financial constraints necessitate that it is better to outsource FAIR expertise? What is the budget available for this process? As FAIRification is an iterative continuous process, what is the long-term plan to iteratively improve the FAIRness of the data? As there is more than one way to evaluate FAIR, what method of evaluation should be chosen in a particular context or data domain and why? What if the evaluation method is not applicable to the analysis of some of the software tools used, as is the case with the FAIR metrics [ 28 ]? Is the data management plan FAIR-inclusive? What is the purpose of FAIRification for this specific context? For the purposes of meeting the specific objectives of FAIRification, which of the 15 FAIR subprinciples must be adhered to in said context, and which ones can be left out [ 69 ]?
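One lightweight way to make the question of which subprinciples apply in a given context concrete is a simple coverage checklist; the following sketch uses our own shorthand labels for a few subprinciples and is not an official FAIR metric or evaluation method:

```python
# Minimal FAIRness checklist sketch: mark which context-relevant
# subprinciples a data set currently satisfies and report coverage.
# The selection and labels of subprinciples are illustrative choices.
def fair_coverage(selected_subprinciples, satisfied):
    """Fraction of the context-relevant subprinciples that are met."""
    met = [p for p in selected_subprinciples if p in satisfied]
    missing = [p for p in selected_subprinciples if p not in satisfied]
    return len(met) / len(selected_subprinciples), missing

selected = ["F1 persistent identifier", "F2 rich metadata",
            "A1 retrievable by identifier", "R1.1 clear license"]
score, missing = fair_coverage(selected, {"F1 persistent identifier",
                                          "F2 rich metadata"})
print(f"coverage: {score:.0%}, missing: {missing}")
```

Such a checklist does not replace a formal evaluation method, but it forces the team to state up front which of the 15 subprinciples their FAIRification objectives actually require and which are deliberately out of scope.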
Although many benefits have been demonstrated in the FAIR work that we have reviewed, the related efforts, time required, and costs may be above reach for many researchers. Incentives may be needed to motivate stakeholders to embrace the FAIRification journey [ 30 ]. Subsidizing the costs involved in data FAIRification may also prove to be a worthy investment [ 36 ]. Still, questions may arise regarding the expected ROI for all the participants involved in this process and the legal constraints, implications, or risks that need to be navigated.
The development of the FAIR principles in clinical research may serve as an important step toward the standardization of data elements, but it still does not guarantee data users a concrete minimum viable product. However, at the same time, FAIRification efforts are expected to be iterative, but to what point? There may be a need for the various stakeholders involved to define the minimum viable product.
There may also be a need to discuss the sustainability of the developed tools and infrastructures for FAIRification as well as the FAIRification outcomes. Further discussion is needed on the expertise required to operate the software so that users of these tools and infrastructures are realistically prepared. For example, our further study of the work done by Caufield et al [ 16 ] on using FAIRshake (to evaluate the FAIRness of the metadata acquired from clinical case reports) led us to discover that FAIRshake users need to be proficient Python programmers. It is also worth discussing how the required funding and human resources are maintained over time.
What FAIR Is Not
Our work led us to realize the need to clarify what FAIR is not. We outline these points in the following sections.
FAIR and open data are 2 distinct concepts, and the 2 terms cannot be used interchangeably, although the concepts are becoming increasingly close. We found studies in which these 2 distinct concepts seemed to be treated as the same thing [ 19 ]. FAIR data management does, however, enable more research results to be shared through open publications.
The Pathway to Better Data Quality
We observed that some authors claimed that the FAIRification of their data led to an increased quality in the data available for analytics, such as machine learning [ 23 ]. Previously conducted studies have shown that, conceptually, this is not the case [ 70 ]. However, a study in our review claimed that the data quality and consistency of submissions were enhanced through validation with domain-specific data dictionaries. This claim may still require further investigation.
We also observed that authors claimed that they achieved findability as their data resources were actively promoted by the staff of the organization as well as by happy users in various forums, such as workshops and conference presentations, publications, classrooms, blogs, and social media [ 25 ]. Further evaluation of these claims is required to determine how the findability principle in FAIR is met.
In our review, we identified studies that stated that the data were accessible as they were provided free of charge and with as few restrictions as possible [ 25 ]. However, Mons [ 35 ] indicated that free does not mean FAIR. Further investigation of these claims is needed [ 35 ].
We observed that some authors claimed that they improved the findability of their data by carefully designing the user interface to ensure that users could navigate large data collections to locate the specific data they needed for their research [ 25 ]. The same authors claimed that the data were accessible as the tool’s interface was easy to use, which allowed users to navigate their large data collections to access free data with as few restrictions as possible. There remain more criteria to be fulfilled before claiming data findability or accessibility.
Strengths and Limitations of This Study
We used independent reviews of 3 databases throughout the extraction phase. However, we were not able to critically examine the gray literature as it was focused on specific health research domains; therefore, it did not meet our inclusion criteria. Our strict focus on the health domain may have led us to miss out on other important developments, collaborative data approaches, and data-sharing initiatives in data FAIRification for intersecting domains. Many of the reviewed FAIRification efforts did not provide an objective FAIR evaluation. Therefore, it is not entirely possible for us to have an actual score that depicts the extent to which each of the 15 FAIR subprinciples was fulfilled [ 69 ].
This review covers work until the end of 2020. We expect a sharp rise in publications owing to the relevance of the topic in the wake of the COVID-19 pandemic. Therefore, it may be valuable to further extend this work to a systematic review covering the time range until the present. An evaluation of the quality of the publications remains a potential factor in judging the statements extracted from the included publications. Further research is needed to evaluate the quality of the included papers as well as the quality of the application of the FAIR principles in the included publications.
This work brings together a series of initiatives, concepts, and implementation practices of the FAIR data principles in health data stewardship. The results of this review are useful for identifying gaps and further areas of research. We identified aspects of FAIR that appear to be misunderstood and recommend further training on them. We hope that this work will inform decisions on the FAIRification journey and serve as a comprehensive introduction for the various stakeholders who want to know what to anticipate before embarking on it, including assessments of a research institution’s readiness to do so. This work may also be a valuable resource for developers of FAIRification tools and infrastructures as they strive to meet the needs of the stakeholders involved. Further studies on the reproducibility of FAIRification assessment results, and on solutions to the challenges and risks encountered, would be worthwhile.
The implementation of the FAIR principles carries the promise of improved data management and governance through better data sharing, standardization, harmonization, and deduplication of work [ 24 ]. Adding rich metadata during FAIRification facilitates the discovery of the data, thereby widening their audience. FAIR and open databases allow continuously updated data to serve as a valuable educational resource for clinical investigators, facilitating novel advances in medical science and improved patient care [ 16 ]. All these examples contribute to a worthwhile return on investment. In 2019, the cost of not having FAIR research data across the EU research economy as a whole was estimated at €10.2 billion (at an exchange rate of €1=US $1.07) per annum, with more specific estimates expected to vary from domain to domain [ 71 ]. The open question is what the cost of not having FAIR health research data is now, and how much could be saved by implementing the FAIR data principles in this domain.
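As a quick back-of-the-envelope check, the EU-wide estimate cited above converts to US dollars as follows at the stated exchange rate:

```python
# Convert the 2019 European Commission estimate of the annual cost of not
# having FAIR research data from EUR to USD at the rate quoted in the text.
cost_eur = 10.2e9      # EUR 10.2 billion per annum [71]
eur_to_usd = 1.07      # stated rate: EUR 1 = USD 1.07
cost_usd = cost_eur * eur_to_usd
print(f"~US ${cost_usd / 1e9:.1f} billion per annum")  # ~US $10.9 billion per annum
```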
This work was partially funded by the NFDI4Health–die Nationale Forschungsdateninfrastruktur für personenbezogene Gesundheitsdaten (National Research Data Infrastructure for Personal Health Data) under Deutsche Forschungsgemeinschaft–funded project 442326535. The authors thank Benjamin Winter for help with designing Figure 2 .
Conflicts of Interest
PubMed search strategy.
Approaches being used and piloted in the implementation of the findable, accessible, interoperable, and reusable data principles in the health data domain since 2014.
FAIRification IT infrastructure, workflows, and tools in use or under development.
Challenges and mitigation strategies with regard to the approaches used in the practical implementation of the findable, accessible, interoperable, and reusable data principles in health data.
Active networks involved in findable, accessible, interoperable, and reusable data principle implementation in the health data domain.
Outcomes and future work.
PRISMA-ScR (Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews) checklist.
- Almada M, Midão L, Portela D, Dias I, Núñez-Benjumea FJ, Parra-Calderón CL, et al. [A New Paradigm in Health Research: FAIR Data (Findable, Accessible, Interoperable, Reusable)]. Acta Med Port 2020 Dec 02;33(12):828-834 [ http://www.actamedicaportuguesa.com/revista/index.php/amp/article/view/12910 ] [ CrossRef ] [ Medline ]
- Gamache R, Kharrazi H, Weiner JP. Public and population health informatics: the bridging of big data to benefit communities. Yearb Med Inform 2018 Aug;27(1):199-206 [ http://www.thieme-connect.com/DOI/DOI?10.1055/s-0038-1667081 ] [ CrossRef ] [ Medline ]
- Cosgriff CV, Ebner DK, Celi LA. Data sharing in the era of COVID-19. Lancet Digit Health 2020 May;2(5):e224 [ https://linkinghub.elsevier.com/retrieve/pii/S2589750020300820 ] [ CrossRef ]
- Fegan G, Cheah PY. Solutions to COVID-19 data sharing. Lancet Digit Health 2021 Jan;3(1):e6 [ https://linkinghub.elsevier.com/retrieve/pii/S2589750020302739 ] [ CrossRef ]
- Semler S, Wissing F, Heyder R. German medical informatics initiative. Methods Inf Med 2018 Jul 17;57(S 01):e50-e56 [ http://www.thieme-connect.de/DOI/DOI?10.3414/ME18-03-0003 ] [ CrossRef ]
- Sinaci AA, Núñez-Benjumea FJ, Gencturk M, Jauer ML, Deserno T, Chronaki C, et al. From raw data to FAIR data: the FAIRification workflow for health research. Methods Inf Med 2020 Jun;59(S 01):e21-e32 [ http://hdl.handle.net/10261/236308 ] [ CrossRef ] [ Medline ]
- Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol 2005 Feb;8(1):19-32 [ http://www.tandfonline.com/doi/abs/10.1080/1364557032000119616 ] [ CrossRef ]
- Fecher B, Friesike S, Hebing M. What drives academic data sharing? PLoS One 2015;10(2):e0118053 [ https://dx.plos.org/10.1371/journal.pone.0118053 ] [ CrossRef ] [ Medline ]
- Tang C, Plasek JM, Bates DW. Rethinking data sharing at the dawn of a health data economy: a viewpoint. J Med Internet Res 2018 Nov 22;20(11):e11519 [ https://www.jmir.org/2018/11/e11519/ ] [ CrossRef ] [ Medline ]
- Tricco AC, Lillie E, Zarin W, O'Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med 2018 Oct 02;169(7):467-473 [ https://www.acpjournals.org/doi/abs/10.7326/M18-0850?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub 0pubmed ] [ CrossRef ] [ Medline ]
- Staunton C, Slokenberga S, Mascalzoni D. The GDPR and the research exemption: considerations on the necessary safeguards for research biobanks. Eur J Hum Genet 2019 Aug;27(8):1159-1167 [ https://europepmc.org/abstract/MED/30996335 ] [ CrossRef ] [ Medline ]
- Corvalán C, Hales S, McMichael A, Millennium Ecosystem Assessment (Program), World Health Organization. Ecosystems and Human Well-being Health Synthesis. Geneva, Switzerland: World Health Organization; 2005.
- Inau ET, Sack J, Waltemath D, Zeleke AA. Initiatives, concepts, and implementation practices of FAIR (findable, accessible, interoperable, and reusable) data principles in health data stewardship practice: protocol for a scoping review. JMIR Res Protoc 2021 Feb 02;10(2):e22505 [ https://www.researchprotocols.org/2021/2/e22505/ ] [ CrossRef ] [ Medline ]
- Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan-a web and mobile app for systematic reviews. Syst Rev 2016 Dec 05;5(1):210 [ https://systematicreviewsjournal.biomedcentral.com/articles/10.1186/s13643-016-0384-4 ] [ CrossRef ] [ Medline ]
- van Panhuis WG, Cross A, Burke DS. Project Tycho 2.0: a repository to improve the integration and reuse of data for global population health. J Am Med Inform Assoc 2018 Dec 01;25(12):1608-1617 [ https://europepmc.org/abstract/MED/30321381 ] [ CrossRef ] [ Medline ]
- Caufield JH, Zhou Y, Garlid AO, Setty SP, Liem DA, Cao Q, et al. A reference set of curated biomedical data and metadata from clinical case reports. Sci Data 2018 Nov 20;5:180258 [ https://doi.org/10.1038/sdata.2018.258 ] [ CrossRef ] [ Medline ]
- Navale V, Ji M, Vovk O, Misquitta L, Gebremichael T, Garcia A, et al. Development of an informatics system for accelerating biomedical research. F1000Res 2019;8:1430 [ https://europepmc.org/abstract/MED/32760576 ] [ CrossRef ] [ Medline ]
- Dumontier M, Gray AJ, Marshall MS, Alexiev V, Ansell P, Bader G, et al. The health care and life sciences community profile for dataset descriptions. PeerJ 2016 Aug 16;4:e2331 [ https://peerj.com/articles/2331 ] [ CrossRef ] [ Medline ]
- Kass-Hout TA, Stevens LM, Hall JL. American heart association precision medicine platform. Circulation 2018 Feb 13;137(7):647-649 [ https://www.ahajournals.org/doi/10.1161/CIRCULATIONAHA.117.032041 ] [ CrossRef ]
- Sansone SA, Gonzalez-Beltran A, Rocca-Serra P, Alter G, Grethe JS, Xu H, et al. DATS, the data tag suite to enable discoverability of datasets. Sci Data 2017 Jun 06;4:170059 [ https://doi.org/10.1038/sdata.2017.59 ] [ CrossRef ] [ Medline ]
- Kassam-Adams N, Kenardy JA, Delahanty DL, Marsac ML, Meiser-Stedman R, Nixon RD, et al. Development of an international data repository and research resource: the Prospective studies of Acute Child Trauma and Recovery (PACT/R) Data Archive. Eur J Psychotraumatol 2020;11(1):1729025 [ https://europepmc.org/abstract/MED/32284820 ] [ CrossRef ] [ Medline ]
- Deshpande P, Rasin A, Furst J, Raicu D, Antani S. DiiS: a biomedical data access framework for aiding data driven research supporting FAIR principles. Data 2019 Apr 20;4(2):54 [ https://www.mdpi.com/2306-5729/4/2/54 ] [ CrossRef ]
- Wise J, de Barron AG, Splendiani A, Balali-Mood B, Vasant D, Little E, et al. Implementation & relevance of FAIR data principles in biopharmaceutical R and D. Drug Discov Today 2019 Apr;24(4):933-938 [ https://linkinghub.elsevier.com/retrieve/pii/S1359-6446(18)30303-9 ] [ CrossRef ] [ Medline ]
- Lacey Jr JV, Chung NT, Hughes P, Benbow JL, Duffy C, Savage KE, et al. Insights from adopting a data commons approach for large-scale observational cohort studies: the California teachers study. Cancer Epidemiol Biomarkers Prev 2020 Apr;29(4):777-786 [ http://cebp.aacrjournals.org/lookup/pmidlookup?view=long&pmid=32051191 ] [ CrossRef ] [ Medline ]
- Kugler TA, Fitch CA. Interoperable and accessible census and survey data from IPUMS. Sci Data 2018 Feb 27;5:180007 [ https://doi.org/10.1038/sdata.2018.7 ] [ CrossRef ] [ Medline ]
- Gabrielian A, Engle E, Harris M, Wollenberg K, Juarez-Espinosa O, Glogowski A, et al. TB DEPOT (Data Exploration Portal): a multi-domain tuberculosis data analysis resource. PLoS One 2019;14(5):e0217410 [ https://dx.plos.org/10.1371/journal.pone.0217410 ] [ CrossRef ] [ Medline ]
- Bhatia K, Tanch J, Chen ES, Sarkar IN. Applying FAIR principles to improve data searchability of emergency department datasets: a case study for HCUP-SEDD. Methods Inf Med 2020 Feb;59(1):48-56 [ http://www.thieme-connect.de/DOI/DOI?10.1055/s-0040-1712510 ] [ CrossRef ] [ Medline ]
- Parciak M, Bender T, Sax U, Bauer CR. Applying FAIRness: redesigning a biomedical informatics research data management pipeline. Methods Inf Med 2019 Dec;58(6):229-234 [ http://www.thieme-connect.de/DOI/DOI?10.1055/s-0040-1709158 ] [ CrossRef ] [ Medline ]
- Suhr M, Lehmann C, Bauer CR, Bender T, Knopp C, Freckmann L, et al. Menoci: lightweight extensible web portal enhancing data management for biomedical research projects. BMC Bioinformatics 2020 Dec 17;21(1):582 [ https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03928-1 ] [ CrossRef ] [ Medline ]
- Löbe M, Matthies F, Stäubert S, Meineke FA, Winter A. Problems in FAIRifying medical datasets. Stud Health Technol Inform 2020 Jun 16;270:392-396 [ CrossRef ] [ Medline ]
- Haux C, Knaup P. Using FAIR metadata for secondary use of administrative claims data. Stud Health Technol Inform 2019 Aug 21;264:1472-1473 [ CrossRef ] [ Medline ]
- Schaaf J, Kadioglu D, Goebel J, Behrendt CA, Roos M, van Enckevort D, et al. OSSE Goes FAIR - implementation of the FAIR data principles for an open-source registry for rare diseases. Stud Health Technol Inform 2018;253:209-213 [ Medline ]
- Celebi R, Rebelo Moreira J, Hassan AA, Ayyar S, Ridder L, Kuhn T, et al. Towards FAIR protocols and workflows: the OpenPREDICT use case. PeerJ Comput Sci 2020;6:e281 [ https://europepmc.org/abstract/MED/33816932 ] [ CrossRef ] [ Medline ]
- Traverso A, van Soest J, Wee L, Dekker A. The radiation oncology ontology (ROO): publishing linked data in radiation oncology using semantic web and ontology techniques. Med Phys 2018 Oct;45(10):e854-e862 [ https://onlinelibrary.wiley.com/doi/10.1002/mp.12879 ] [ CrossRef ] [ Medline ]
- Mons B. The VODAN IN: support of a FAIR-based infrastructure for COVID-19. Eur J Hum Genet 2020 Jun;28(6):724-727 [ https://europepmc.org/abstract/MED/32376989 ] [ CrossRef ] [ Medline ]
- Zondergeld JJ, Scholten RH, Vreede BM, Hessels RS, Pijl AG, Buizer-Voskamp JE, et al. FAIR, safe and high-quality data: the data infrastructure and accessibility of the YOUth cohort study. Dev Cogn Neurosci 2020 Oct;45:100834 [ https://linkinghub.elsevier.com/retrieve/pii/S1878-9293(20)30082-7 ] [ CrossRef ] [ Medline ]
- Kalendralis P, Shi Z, Traverso A, Choudhury A, Sloep M, Zhovannik I, et al. FAIR-compliant clinical, radiomics and DICOM metadata of RIDER, interobserver, Lung1 and head-Neck1 TCIA collections. Med Phys 2020 Nov;47(11):5931-5940 [ https://europepmc.org/abstract/MED/32521049 ] [ CrossRef ] [ Medline ]
- Atalaia A, Thompson R, Corvo A, Carmody L, Piscia D, Matalonga L, et al. A guide to writing systematic reviews of rare disease treatments to generate FAIR-compliant datasets: building a Treatabolome. Orphanet J Rare Dis 2020 Aug 12;15(1):206 [ https://ojrd.biomedcentral.com/articles/10.1186/s13023-020-01493-7 ] [ CrossRef ] [ Medline ]
- Looten V, Simon M. Impact analysis of the policy for access of administrative data in France: a before-after study. Stud Health Technol Inform 2020 Jun 16;270:1133-1137 [ CrossRef ] [ Medline ]
- Guien C, Blandin G, Lahaut P, Sanson B, Nehal K, Rabarimeriarijaona S, et al. The french national registry of patients with facioscapulohumeral muscular dystrophy. Orphanet J Rare Dis 2018 Dec 04;13(1):218 [ https://ojrd.biomedcentral.com/articles/10.1186/s13023-018-0960-x ] [ CrossRef ] [ Medline ]
- Jantzen R, Rance B, Katsahian S, Burgun A, Looten V. The need of an open data quality policy: the case of the "Transparency - health" database in the prevention of conflict of interest. Stud Health Technol Inform 2018;247:611-615 [ Medline ]
- Holub P, Kohlmayer F, Prasser F, Mayrhofer MT, Schlünder I, Martin GM, et al. Enhancing reuse of data and biological material in medical research: from FAIR to FAIR-health. Biopreserv Biobank 2018 Apr;16(2):97-105 [ https://europepmc.org/abstract/MED/29359962 ] [ CrossRef ] [ Medline ]
- Peeters LM. Fair data for next-generation management of multiple sclerosis. Mult Scler 2018 Aug;24(9):1151-1156 [ https://journals.sagepub.com/doi/abs/10.1177/1352458517748475?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub 0pubmed ] [ CrossRef ] [ Medline ]
- Natsiavas P, Boyce RD, Jaulent MC, Koutkias V. OpenPVSignall: advancing information search, sharing and reuse on pharmacovigilance signals via FAIR principles and semantic web technologies. Front Pharmacol 2018;9:609 [ https://europepmc.org/abstract/MED/29997499 ] [ CrossRef ] [ Medline ]
- Pereira A, Lopes RP, Oliveira JL. SCALEUS-FD: a FAIR data tool for biomedical applications. Biomed Res Int 2020;2020:3041498 [ https://doi.org/10.1155/2020/3041498 ] [ CrossRef ] [ Medline ]
- Ruhamyankaka E, Brunk BP, Dorsey G, Harb OS, Helb DA, Judkins J, et al. ClinEpiDB: an open-access clinical epidemiology database resource encouraging online exploration of complex studies. Gates Open Res 2019;3:1661 [ https://europepmc.org/abstract/MED/32047873 ] [ CrossRef ] [ Medline ]
- Blomberg N, Lauer KB. Connecting data, tools and people across Europe: ELIXIR's response to the COVID-19 pandemic. Eur J Hum Genet 2020 Jun;28(6):719-723 [ https://europepmc.org/abstract/MED/32415272 ] [ CrossRef ] [ Medline ]
- Benson T, Grieve G. SNOMED CT. In: Principles of Health Interoperability. Cham, Switzerland: Springer; 2016.
- Rashid SM, McCusker JP, Pinheiro P, Bax MP, Santos H, Stingone JA, et al. The semantic data dictionary - an approach for describing and annotating data. Data Intell 2020;2(4):443-486 [ https://europepmc.org/abstract/MED/33103120 ] [ CrossRef ] [ Medline ]
- Heath T, Bizer C. Linked Data Evolving the Web Into a Global Data Space. San Rafael, CA: Morgan & Claypool Publishers; 2011.
- Papadakis I, Kyprianos K, Stefanidakis M. Linked data URIs and libraries: the story so far. D Lib Mag 2015 May;21(5/6) [ http://www.dlib.org/dlib/may15/papadakis/05papadakis.html ] [ CrossRef ]
- El Emam K, Rodgers S, Malin B. Anonymising and sharing individual patient data. BMJ 2015 Mar 20;350:h1139 [ https://europepmc.org/abstract/MED/25794882 ] [ CrossRef ] [ Medline ]
- Price WN2, Cohen IG. Privacy in the age of medical big data. Nat Med 2019 Jan;25(1):37-43 [ https://europepmc.org/abstract/MED/30617331 ] [ CrossRef ] [ Medline ]
- Alharbi E, Skeva R, Juty N, Jay C, Goble C. Exploring the current practices, costs and benefits of FAIR implementation in pharmaceutical research and development: a qualitative interview study. Data Intell 2021;3(4):507-527 [ https://direct.mit.edu/dint/article/3/4/507/107429/Exploring-the-Current-Practices-Costs-and-Benefits ] [ CrossRef ]
- Treloar A. The research data alliance: globally co-ordinated action against barriers to data publishing and sharing. Learn Pub 2014 Sep 01;27(5):9-13 [ http://doi.wiley.com/10.1087/20140503 ] [ CrossRef ]
- Insani WN, Pacurariu AC, Mantel-Teeuwisse AK, Gross-Martirosyan L. Characteristics of drugs safety signals that predict safety related product information update. Pharmacoepidemiol Drug Saf 2018 Jul;27(7):789-796 [ https://europepmc.org/abstract/MED/29797381 ] [ CrossRef ] [ Medline ]
- Atay CE, Garani G. Building a lung and ovarian cancer data warehouse. Healthc Inform Res 2020 Oct;26(4):303-310 [ https://europepmc.org/abstract/MED/33190464 ] [ CrossRef ] [ Medline ]
- Inau E. Common functional user requirements for electronic registries supporting the provision of contraceptive methods. Figshare. 2021. URL: https://figshare.com/articles/thesis/Common_Functional_User_Requirements_for_Electronic_Registries_Supporting_the_Provision_of_Contraceptive_Methods/13550567 [accessed 2022-12-04]
- Mayer CS, Williams N, Huser V. Analysis of data dictionary formats of HIV clinical trials. PLoS One 2020 Oct 5;15(10):e0240047 [ https://dx.plos.org/10.1371/journal.pone.0240047 ] [ CrossRef ] [ Medline ]
- Sharma DK, Solbrig HR, Prud'hommeaux E, Pathak J, Jiang G. Standardized representation of clinical study data dictionaries with CIMI archetypes. AMIA Annu Symp Proc 2016;2016:1119-1128 [ https://europepmc.org/abstract/MED/28269909 ] [ Medline ]
- Thibault JC, Roe DR, Facelli JC, Cheatham TE3. Data model, dictionaries, and desiderata for biomolecular simulation data indexing and sharing. J Cheminform 2014 Jan 30;6(1):4 [ https://dx.doi.org/10.1186/1758-2946-6-4 ] [ CrossRef ] [ Medline ]
- Carrington A, Manuel DG, Bennett C. FAIR access to personal health information in private and public COVID-19 health applications internet. Authorea Preprint posted online Oct 28, 2020. [ https://www.authorea.com/users/341411/articles/468362-fair-access-to-personal-health-information-in-private-and-public-covid-19-health-applications?commit=1664e1e2c0a1012993351dc61b40d0feeed43087 ] [ CrossRef ]
- van Reisen M, Stokmans M, Mawere M, Basajja M, Ong'ayo AO, Nakazibwe P, et al. FAIR practices in Africa. Data Intell 2020 Jan;2(1-2):246-256 [ https://direct.mit.edu/dint/article/2/1-2/246-256/10009 ] [ CrossRef ]
- Inau E, Zeleke AA, Waltemath D. An introduction to clinical registries: types, uptake and future directions. Syst Med 2021;3:547-556 [ https://linkinghub.elsevier.com/retrieve/pii/B9780128012383116666 ] [ CrossRef ]
- Kodra Y, Posada DL, Coi A, Santoro M, Bianchi F, Ahmed F, et al. Data quality in rare diseases registries. In: Rare Diseases Epidemiology: Update and Overview. Cham, Switzerland: Springer; 2017.
- Davies J, Welch J, Milward D, Harris S. A formal, scalable approach to semantic interoperability. Sci Comput Program 2020 Jun;192:102426 [ https://linkinghub.elsevier.com/retrieve/pii/S016764232030037X ] [ CrossRef ]
- de Mello BH, Rigo SJ, da Costa CA, da Rosa Righi R, Donida B, Bez MR, et al. Semantic interoperability in health records standards: a systematic literature review. Health Technol (Berl) 2022;12(2):255-272 [ https://europepmc.org/abstract/MED/35103230 ] [ CrossRef ] [ Medline ]
- Biernacka K, Bierwirth M, Buchholz P, Dolzycka D, Helbig K, Neumann J, et al. Train-the-trainer concept on research data management. Zenodo. 2020 Nov 4. URL: https://zenodo.org/record/4071471 [accessed 2022-12-03]
- Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data 2016 Mar 15;3:160018 [ https://doi.org/10.1038/sdata.2016.18 ] [ CrossRef ] [ Medline ]
- Waltemath D, Inau E, Zeleke AA, Schmidt CO. How FAIR are frameworks for data quality measures in clinical research? 2020 Presented at: 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS); Sep 6-9, 2020; Berlin URL: https://www.egms.de/en/meetings/gmds2020/20gmds127.shtml
- Cost-benefit Analysis for FAIR Research Data – Cost of Not Having FAIR Research Data. Brussels, Belgium: European Commission; 2019.
Edited by A Mavragani; submitted 13.12.22; peer-reviewed by A Winter, M Aanestad, N Howe; comments to author 28.01.23; revised version received 25.03.23; accepted 14.04.23; published 28.08.23
©Esther Thea Inau, Jean Sack, Dagmar Waltemath, Atinkut Alamirrew Zeleke. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 28.08.2023.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
- Published: 23 August 2023
The complete sequence of a human Y chromosome
- Arang Rhie ORCID: orcid.org/0000-0002-9809-8127 1 na1 ,
- Sergey Nurk 1 na1 nAff51 ,
- Monika Cechova 2 , 3 na1 ,
- Savannah J. Hoyt ORCID: orcid.org/0000-0001-7804-3236 4 na1 ,
- Dylan J. Taylor ORCID: orcid.org/0000-0001-5806-4494 5 na1 ,
- Nicolas Altemose ORCID: orcid.org/0000-0002-7231-6026 6 ,
- Paul W. Hook ORCID: orcid.org/0000-0002-3912-1999 7 ,
- Sergey Koren ORCID: orcid.org/0000-0002-1472-8962 1 ,
- Mikko Rautiainen 1 ,
- Ivan A. Alexandrov 8 , 9 nAff52 ,
- Jamie Allen ORCID: orcid.org/0000-0002-8677-2225 10 ,
- Mobin Asri 11 ,
- Andrey V. Bzikadze ORCID: orcid.org/0000-0002-7928-7950 12 ,
- Nae-Chyun Chen 13 ,
- Chen-Shan Chin ORCID: orcid.org/0000-0003-4394-2455 14 , 15 ,
- Mark Diekhans ORCID: orcid.org/0000-0002-0430-0989 11 ,
- Paul Flicek 10 , 16 ,
- Giulio Formenti 17 ,
- Arkarachai Fungtammasan ORCID: orcid.org/0000-0003-2398-0358 18 ,
- Carlos Garcia Giron 10 ,
- Erik Garrison ORCID: orcid.org/0000-0003-3821-631X 19 ,
- Ariel Gershman 7 ,
- Jennifer L. Gerton ORCID: orcid.org/0000-0003-0743-3637 20 , 21 ,
- Patrick G. S. Grady 4 ,
- Andrea Guarracino ORCID: orcid.org/0000-0001-9744-131X 19 , 22 ,
- Leanne Haggerty 10 ,
- Reza Halabian 23 ,
- Nancy F. Hansen ORCID: orcid.org/0000-0002-0950-0699 1 , 24 ,
- Robert Harris 25 ,
- Gabrielle A. Hartley 4 ,
- William T. Harvey 26 ,
- Marina Haukness ORCID: orcid.org/0000-0001-9991-8089 11 ,
- Jakob Heinz 7 ,
- Thibaut Hourlier ORCID: orcid.org/0000-0003-4894-7773 10 ,
- Robert M. Hubley 27 ,
- Sarah E. Hunt ORCID: orcid.org/0000-0002-8350-1235 10 ,
- Stephen Hwang 28 ,
- Miten Jain ORCID: orcid.org/0000-0002-4571-3982 29 ,
- Rupesh K. Kesharwani 30 ,
- Alexandra P. Lewis 26 ,
- Heng Li 31 , 32 ,
- Glennis A. Logsdon ORCID: orcid.org/0000-0003-2396-0656 26 ,
- Julian K. Lucas 3 , 11 ,
- Wojciech Makalowski ORCID: orcid.org/0000-0003-2303-9541 23 ,
- Christopher Markovic 33 ,
- Fergal J. Martin ORCID: orcid.org/0000-0002-1672-050X 10 ,
- Ann M. Mc Cartney 1 ,
- Rajiv C. McCoy ORCID: orcid.org/0000-0003-0615-146X 5 ,
- Jennifer McDaniel 34 ,
- Brandy M. McNulty 3 , 11 ,
- Paul Medvedev 35 , 36 , 37 ,
- Alla Mikheenko 9 , 38 ,
- Katherine M. Munson ORCID: orcid.org/0000-0001-8413-6498 26 ,
- Terence D. Murphy ORCID: orcid.org/0000-0001-9311-9745 39 ,
- Hugh E. Olsen 3 , 11 ,
- Nathan D. Olson 34 ,
- Luis F. Paulin ORCID: orcid.org/0000-0003-2567-3773 30 ,
- David Porubsky ORCID: orcid.org/0000-0001-8414-8966 26 ,
- Tamara Potapova ORCID: orcid.org/0000-0003-2761-1795 20 ,
- Fedor Ryabov ORCID: orcid.org/0000-0001-8728-9465 40 ,
- Steven L. Salzberg ORCID: orcid.org/0000-0002-8859-7432 41 ,
- Michael E. G. Sauria 5 ,
- Fritz J. Sedlazeck ORCID: orcid.org/0000-0001-6040-2691 30 , 42 ,
- Kishwar Shafin ORCID: orcid.org/0000-0001-5252-3434 43 ,
- Valery A. Shepelev 44 ,
- Alaina Shumate 7 ,
- Jessica M. Storer 27 ,
- Likhitha Surapaneni 10 ,
- Angela M. Taravella Oill 45 ,
- Françoise Thibaud-Nissen ORCID: orcid.org/0000-0003-4957-7807 39 ,
- Winston Timp ORCID: orcid.org/0000-0003-2083-6027 7 ,
- Marta Tomaszkiewicz ORCID: orcid.org/0000-0003-1523-200X 25 , 46 ,
- Mitchell R. Vollger ORCID: orcid.org/0000-0002-8651-1615 26 ,
- Brian P. Walenz ORCID: orcid.org/0000-0001-8431-1428 1 ,
- Allison C. Watwood 25 ,
- Matthias H. Weissensteiner 25 ,
- Aaron M. Wenger 47 ,
- Melissa A. Wilson 45 ,
- Samantha Zarate ORCID: orcid.org/0000-0001-5570-2059 13 ,
- Yiming Zhu 30 ,
- Justin M. Zook ORCID: orcid.org/0000-0003-2309-8402 34 ,
- Evan E. Eichler ORCID: orcid.org/0000-0002-8246-4014 26 , 48 ,
- Rachel J. O’Neill ORCID: orcid.org/0000-0002-1525-6821 4 , 49 , 50 ,
- Michael C. Schatz ORCID: orcid.org/0000-0002-4118-4446 5 , 13 ,
- Karen H. Miga ORCID: orcid.org/0000-0002-3670-4507 3 , 11 ,
- Kateryna D. Makova 25 &
- Adam M. Phillippy ORCID: orcid.org/0000-0003-2983-8934 1
Nature (2023)
- Genetic variation
- Genome informatics
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications 1 , 2 , 3 . As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished 4 , 5 . Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY , DAZ and RBMY ; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome 4 and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
The T2T-CHM13v2.0 (T2T-CHM13+Y) assembly, reference analysis set, complete list of resources—including gene annotation, repeat annotation, epigenetic profiles, variant-calling results from 1KGP and SGDP, gnomAD, ClinVar, GWAS and dbSNP datasets—are available for download at https://github.com/marbl/CHM13 . The assembly is also available from NCBI and EBI with GenBank accession GCA_009914755.4 . Annotation and associated resources are also browsable as ‘hs1’ from the UCSC Genome Browser ( http://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_3671779_hs1 ), the Ensembl Genome Browser ( https://projects.ensembl.org/hprc/ ) (assembly name T2T-CHM13v2.0) and NCBI data-hub ( https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_009914755.1/ ). Potential assembly issues are listed and can be tracked at https://github.com/marbl/CHM13-issues . 1KGP and SGDP short-read alignments and variant calls are available within AnVIL at https://anvil.terra.bio/#workspaces/anvil-datastorage/AnVIL_T2T_CHRY . Original data from the Gerton lab underlying this manuscript can be accessed from the Stowers Original Data Repository at http://www.stowers.org/research/publications/libpb-2358 . Sequencing data used in this study are listed in Supplementary Table 1 .
Custom codes developed for data analysis and visualization are available at https://github.com/arangrhie/T2T-HG002Y , https://github.com/snurk/sg_sandbox and https://github.com/schatzlab/t2t-chm13-chry and are deposited with Zenodo 159 . Software and parameters used are stated in the Supplementary Methods with further details.
Skaletsky, H. et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423 , 825–837 (2003).
Miga, K. H. et al. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res. 24 , 697–707 (2014).
Vollger, M. R. et al. Segmental duplications and their variation in a complete human genome. Science 376 , eabj6965 (2022).
Nurk, S. et al. The complete sequence of a human genome. Science 376 , 44–53 (2022).
Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27 , 849–864 (2017).
Gustafson, M. L. & Donahoe, P. K. Male sex determination: current concepts of male sexual differentiation. Annu. Rev. Med. 45 , 505–524 (1994).
- Vogt, P. H. et al. Human Y chromosome azoospermia factors (AZF) mapped to different subregions in Yq11. Hum. Mol. Genet. 5 , 933–943 (1996).
Miga, K. H. et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature 585 , 79–84 (2020).
Logsdon, G. A. et al. The structure, function and evolution of a complete human chromosome 8. Nature 593 , 101–107 (2021).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37 , 1155–1162 (2019).
Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36 , 338–345 (2018).
Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30 , 1291–1305 (2020).
Rautiainen, M. & Marschall, T. GraphAligner: rapid and versatile sequence-to-graph alignment. Genome Biol. 21 , 253 (2020).
Formenti, G. et al. Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation. Nat. Methods 19 , 696–704 (2022).
Kirsche, M. et al. Jasmine and Iris: population-scale structural variant comparison and analysis. Nat. Methods 20 , 408–417 (2023).
Jain, C., Rhie, A., Hansen, N. F., Koren, S. & Phillippy, A. M. Long-read mapping to repetitive reference sequences using Winnowmap2. Nat. Methods 19 , 705–710 (2022).
Mc Cartney, A. M. et al. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat. Methods 19 , 687–695 (2022).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21 , 245 (2020).
Wang, T. et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604 , 437–446 (2022).
Jarvis, E. D. et al. Semi-automated assembly of high-quality diploid human reference genomes. Nature 611 , 519–531 (2022).
Shumate, A. et al. Assembly and annotation of an Ashkenazi human reference genome. Genome Biol. 21 , 129 (2020).
Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3 , 160025 (2016).
Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48 , D835–D844 (2020).
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42 , D1001–D1006 (2014).
Smigielski, E. M., Sirotkin, K., Ward, M. & Sherry, S. T. dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res. 28 , 352–355 (2000).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581 , 434–443 (2020).
Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185 , 3426–3440 (2022).
Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538 , 201–206 (2016).
Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489 , 57–74 (2012).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372 , eabf7117 (2021).
Sanders, A. D. et al. Single-cell analysis of structural variations and complex rearrangements with tri-channel processing. Nat. Biotechnol. 38 , 343–354 (2020).
Hallast, P. et al. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature https://doi.org/10.1038/s41586-023-06425-6 (2023).
Hammer, M. F. et al. Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish priesthood. Hum. Genet. 126 , 707 (2009).
Poznik, G. D. et al. Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nat. Genet. 48 , 593–599 (2016).
Vegesna, R., Tomaszkiewicz, M., Medvedev, P. & Makova, K. D. Dosage regulation, and variation in gene expression and copy number of human Y chromosome ampliconic genes. PLoS Genet. 15 , e1008369 (2019).
NCBI RefSeq v110 Browser. Homo sapiens isolate NA24385 chromosome Y, alternate assembly T2T-CHM13v2.0. https://tinyurl.com/bdfudexn (2022).
Hoyt, S. J. et al. From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. Science 376 , eabk3112 (2022).
Warburton, P. E. et al. Analysis of the largest tandemly repeated DNA families in the human genome. BMC Genomics 9 , 533 (2008).
Halabian, R. & Makałowski, W. A map of 3′ DNA transduction variants mediated by non-LTR retroelements on 3202 human genomes. Biology 11 , 1032 (2022).
Weissensteiner, M. H. et al. Accurate sequencing of DNA motifs able to form alternative (non-B) structures. Genome Res. 33 , 907–922 (2023).
Tyler-Smith, C., Taylor, L. & Müller, U. Structure of a hypervariable tandemly repeated DNA sequence on the short arm of the human Y chromosome. J. Mol. Biol. 203 , 837–848 (1988).
Xue, Y. & Tyler-Smith, C. An exceptional gene: evolution of the TSPY gene family in humans and other great apes. Genes 2 , 36–47 (2011).
Saxena, R. et al. Four DAZ genes in two clusters found in the AZFc region of the human Y chromosome. Genomics 67 , 256–267 (2000).
Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376 , eabl4178 (2022).
Jain, M. et al. Linear assembly of a human centromere on the Y chromosome. Nat. Biotechnol. 36 , 321–323 (2018).
Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376 , eabj5089 (2022).
Kasinathan, S. & Henikoff, S. Non-B-form DNA is enriched at centromeres. Mol. Biol. Evol. 35 , 949–962 (2018).
Nailwal, M. & Chauhan, J. B. Azoospermia factor C subregion of the Y chromosome. J. Hum. Reprod. Sci. 10 , 256 (2017).
Kuroda-Kawaguchi, T. et al. The AZFc region of the Y chromosome features massive palindromes and uniform recurrent deletions in infertile men. Nat. Genet. 29 , 279–286 (2001).
Repping, S. et al. A family of human Y chromosomes has dispersed throughout northern Eurasia despite a 1.8-Mb deletion in the azoospermia factor c region. Genomics 83 , 1046–1052 (2004).
Porubsky, D. et al. Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders. Cell 185 , 1986–2005 (2022).
Teitz, L. S., Pyntikova, T., Skaletsky, H. & Page, D. C. Selection has countered high mutability to preserve the ancestral copy number of Y chromosome amplicons in diverse human lineages. Am. J. Hum. Genet. 103 , 261–275 (2018).
Jobling, M. A. Copy number variation on the human Y chromosome. Cytogenet. Genome Res. 123 , 253–262 (2008).
Navarro-Costa, P., Plancha, C. E. & Gonçalves, J. Genetic dissection of the AZF regions of the human Y chromosome: thriller or filler for male (in)fertility? Biomed Res. Int. 2010 , e936569 (2010).
Evans, H. J., Gosden, J. R., Mitchell, A. R. & Buckland, R. A. Location of human satellite DNAs on the Y chromosome. Nature 251 , 346–347 (1974).
Schmid, M., Guttenbach, M., Nanda, I., Studer, R. & Epplen, J. T. Organization of DYZ2 repetitive DNA on the human Y chromosome. Genomics 6 , 212–218 (1990).
Manz, E., Alkan, M., Bühler, E. & Schmidtke, J. Arrangement of DYZ1 and DYZ2 repeats on the human Y-chromosome: a case with presence of DYZ1 and absence of DYZ2. Mol. Cell. Probes 6 , 257–259 (1992).
Altemose, N. A classical revival: human satellite DNAs enter the genomics era. Semin. Cell Dev. Biol. 128 , 2–14 (2022).
Gripenberg, U. Size variation and orientation of the human Y chromosome. Chromosoma 15 , 618–629 (1964).
Mathias, N., Bayés, M. & Tyler-Smith, C. Highly informative compound haplotypes for the human Y chromosome. Hum. Mol. Genet. 3 , 115–123 (1994).
Altemose, N., Miga, K. H., Maggioni, M. & Willard, H. F. Genomic characterization of large heterochromatic gaps in the human genome assembly. PLoS Comput. Biol. 10 , e1003628 (2014).
Cooke, H. Repeated sequence specific to human males. Nature 262 , 182–186 (1976).
Frommer, M., Prosser, J. & Vincent, P. C. Human satellite I sequences include a male specific 2.47 kb tandemly repeated unit containing one Alu family member per repeat. Nucleic Acids Res. 12 , 2887–2900 (1984).
Babcock, M., Yatsenko, S., Stankiewicz, P., Lupski, J. R. & Morrow, B. E. AT-rich repeats associated with chromosome 22q11.2 rearrangement disorders shape human genome architecture on Yq12. Genome Res. 17 , 451–460 (2007).
Webster, T. H. et al. Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data. GigaScience 8 , giz074 (2019).
Aganezov, S. et al. A complete reference genome improves analysis of human genetic variation. Science 376 , eabl3533 (2022).
Bekritsky, M. A., Colombo, C. & Eberle, M. A. Identifying genomic regions with high quality single nucleotide variant calling. Illumina https://www.illumina.com/content/illumina-marketing/amr/en_US/science/genomics-research/articles/identifying-genomic-regions-with-high-quality-single-nucleotide-.html (2023).
Breitwieser, F. P., Pertea, M., Zimin, A. V. & Salzberg, S. L. Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res. 29 , 954–960 (2019).
Steinegger, M. & Salzberg, S. L. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21 , 115 (2020).
Chrisman, B. et al. The human “contaminome”: bacterial, viral, and computational contamination in whole genome sequences from 1000 families. Sci. Rep. 12 , 9863 (2022).
Kent, W. J. et al. The Human Genome Browser at UCSC. Genome Res. 12 , 996–1006 (2002).
Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01662-6 (2023).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617 , 312–324 (2023).
Jiang, Z., Hubley, R., Smit, A. & Eichler, E. E. DupMasker: a tool for annotating primate segmental duplications. Genome Res. 18 , 1362–1368 (2008).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38 , 2049–2051 (2022).
Skene, P. J. & Henikoff, S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife 6 , e21856 (2017).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29 , 24–26 (2011).
Shafin, K. et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol. 38 , 1044–1053 (2020).
Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 36 , 1174–1182 (2018).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37 , 540–546 (2019).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36 , 983–987 (2018).
Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18 , 1322–1332 (2021).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single molecule sequencing. Nat. Methods 15 , 461–468 (2018).
Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21 , 189 (2020).
Bzikadze, A. V., Mikheenko, A. & Pevzner, P. A. Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res. 32 , 2107–2118 (2022).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25 , 1754–1760 (2009).
Porubsky, D. et al. breakpointR: an R/Bioconductor package to localize strand state changes in Strand-seq data. Bioinformatics 36 , 1260–1261 (2020).
PacBio Revio WGS Dataset. Homo sapiens – GIAB trio HG002-4. https://downloads.pacbcloud.com/public/revio/2022Q4/ (2022).
Poznik, D. yhaplo | Identifying Y-chromosome haplogroups. GitHub https://github.com/23andMe/yhaplo (2022).
Tseng, B. et al. Y-SNP Haplogroup Hierarchy Finder: a web tool for Y-SNP haplogroup assignment. J. Hum. Genet. 67 , 487–493 (2022).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34 , 3094–3100 (2018).
Li, H. Identifying centromeric satellites with dna-brnn. Bioinformatics 35 , 4408–4410 (2019).
Harris, R. S. Improved Pairwise Alignment of Genomic DNA (Pennsylvania State Univ., 2007).
Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22 , 134–141 (2006).
Chin, C.-S. et al. Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nat. Methods https://doi.org/10.1038/s41592-023-01914-y (2023).
Frankish, A. et al. GENCODE 2021. Nucleic Acids Res. 49 , D916–D923 (2021).
Armstrong, J. et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 587 , 246–251 (2020).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20 , 278 (2019).
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24 , 637–644 (2008).
Fiddes, I. T. et al. Comparative Annotation Toolkit (CAT)—simultaneous clade and personal genome annotation. Genome Res. 28 , 1029–1038 (2018).
Shumate, A. & Salzberg, S. L. Liftoff: accurate mapping of gene annotations. Bioinformatics 37 , 1639–1643 (2021).
Dale, R. K., Pedersen, B. S. & Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27 , 3423–3424 (2011).
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592 , 737–746 (2021).
Pruitt, K. D. et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42 , D756–D763 (2014).
Kapustin, Y., Souvorov, A., Tatusova, T. & Lipman, D. Splign: algorithms for computing spliced alignments with identification of paralogs. Biol. Direct 3 , 20 (2008).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30 , 772–780 (2013).
Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6 , 31 (2005).
Zook, J. M. et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat. Biotechnol. 32 , 246–251 (2014).
Numanagić, I. et al. Fast characterization of segmental duplications in genome assemblies. Bioinformatics 34 , i706–i714 (2018).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27 , 573–580 (1999).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0 2013-2015. http://www.repeatmasker.org (2015).
Storer, J., Hubley, R., Rosen, J., Wheeler, T. J. & Smit, A. F. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob. DNA 12 , 2 (2021).
Olson, D. & Wheeler, T. ULTRA: a model based tool to detect tandem repeats. ACM BCB 2018 , 37–46 (2018)
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26 , 841–842 (2010).
Storer, J. M., Hubley, R., Rosen, J. & Smit, A. F. A. Curation guidelines for de novo generated transposable element families. Curr. Protoc. 1 , e154 (2021).
Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12 , 656–664 (2002).
Szak, S. T. et al. Molecular archeology of L1 insertions in the human genome. Genome Biol. 3 , research0052.1 (2002).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215 , 403–410 (1990).
Cer, R. Z. et al. Searching for non-B DNA-forming motifs using nBMST (non-B DNA motif search tool). Curr. Protoc. Hum. Genet. 73 , 18.7.1–18.7.22 (2012).
Zou, X. et al. Short inverted repeats contribute to localized mutability in human somatic cells. Nucleic Acids Res. 45 , 11213–11221 (2017).
Svetec Miklenić, M. et al. Size-dependent antirecombinogenic effect of short spacers on palindrome recombinogenicity. DNA Repair 90 , 102848 (2020).
Sahakyan, A. B. et al. Machine learning model for sequence-driven DNA G-quadruplex formation. Sci. Rep. 7 , 14535 (2017).
Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6 , e251 (2020).
Dotmatics. GraphPad Prism v.9.1.0 for Windows. https://www.graphpad.com (16 March 2021).
Vollger, M. R. SafFire. GitHub https://github.com/mrvollger/SafFire (2022).
Pendleton, A. L. et al. Comparison of village dog and wolf genomes highlights the role of the neural crest in dog domestication. BMC Biol. 16 , 64 (2018).
Hach, F. et al. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nat. Methods 7 , 576–577 (2010).
Escalona, M. et al. Whole-genome sequence and assembly of the Javan gibbon ( Hylobates moloch ). J. Hered. 114 , 35–43 (2023).
Cortez, D. et al. Origins and functional evolution of Y chromosomes across mammals. Nature 508 , 488–493 (2014).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30 , 1312–1313 (2014).
Dotmatics. Geneious v2019.2.3. https://www.geneious.com/ (2019).
Rambaut, A. FigTree v1.4.4. http://tree.bio.ed.ac.uk/software/figtree/ (2018).
Tyler-Smith, C. & Brown, W. R. A. Structure of the major block of alphoid satellite DNA on the human Y chromosome. J. Mol. Biol. 195 , 457–470 (1987).
Shepelev, V. A. et al. Annotation of suprachromosomal families reveals uncommon types of alpha satellite organization in pericentromeric regions of hg38 human genome assembly. Genomics Data 5 , 139–146 (2015).
Lee, I. et al. Simultaneous profiling of chromatin accessibility and methylation on human cell lines with nanopore sequencing. Nat. Methods 17 , 1191–1199 (2020).
Krumsiek, J., Arnold, R. & Rattei, T. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23 , 1026–1028 (2007).
Rice, P., Longden, I. & Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16 , 276–277 (2000).
Sun, C. et al. Deletion of azoospermia factor a (AZFa) region of human Y chromosome caused by recombination between HERV15 proviruses. Hum. Mol. Genet. 9 , 2291–2296 (2000).
Lassmann, T. Kalign 3: multiple sequence alignment of large datasets. Bioinformatics 36 , 1928–1929 (2020).
Wheeler, T. J. & Eddy, S. R. nhmmer: DNA homology search with profile HMMs. Bioinformatics 29 , 2487–2489 (2013).
Stephens, Z. D. et al. Simulating next-generation sequencing datasets from empirical mutation and sequencing models. PLoS ONE 11 , e0167047 (2016).
Bushnell, B. BBMap: a fast, accurate, splice-aware aligner. OSTI.gov https://www.osti.gov/biblio/1241166 (2017).
Aken, B. L. et al. Ensembl 2017. Nucleic Acids Res. 45 , D635–D642 (2017).
Poznik, G. D. et al. Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females. Science 341 , 562–565 (2013).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20 , 1297–1303 (2010).
Schatz, M. C. et al. Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. Cell Genomics 2 , 100085 (2022).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10 , giab008 (2021).
Talenti, A. & Prendergast, J. nf-LO: a scalable, containerized workflow for genome-to-genome lift over. Genome Biol. Evol. 13 , evab183 (2021).
Guarracino, A., Mwaniki, N., Marco-Sola, S., & Garrison, E. wfmash: whole-chromosome pairwise alignment using the hierarchical wavefront algorithm. GitHub https://github.com/ekg/wfmash (2021).
Sherry, S. T., Ward, M. & Sirotkin, K. dbSNP—database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. 9 , 677–679 (1999).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46 , D1062–D1067 (2018).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47 , D1005–D1012 (2019).
Van der Auwera G. A. & O’Connor B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra (O’Reilly Media, 2020).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9 , 357–359 (2012).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44 , W160–W165 (2016).
Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30 , 1006–1007 (2014).
Marçais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14 , e1005944 (2018).
Ondov, B. D., Bergman, N. H. & Phillippy, A. M. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 12 , 385 (2011).
Rhie, A. Repositories for the analysis of T2T-Y and T2T-CHM13v2.0. Zenodo https://doi.org/10.5281/zenodo.8136598 (2023).
Falconer, E. et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat. Methods 9 , 1107–1112 (2012).
We thank P. Hallast, M. C. Loftus, M. K. Konkel, P. Ebert, T. Marschall and C. Lee for coordination and discussions, J.C.-I. Lee for sharing the GRCh38-Y coordinates used in Y-Finder and members of the Telomere-to-Telomere consortium and HPRC for constructive feedback. This work utilized the computational resources of the National Institutes of Health (NIH) HPC Biowulf cluster ( https://hpc.nih.gov ). Computational resources were partially provided by the e-INFRA CZ project (no. 90140), supported by the Ministry of Education, Youth and Sports of the Czech Republic and Computational Biology Core, Institute for Systems Genomics, University of Connecticut. Certain commercial equipment, instruments and materials are identified to specify adequately experimental conditions or reported results. Such identification does not imply recommendation or endorsement by the NIST, nor does it imply that the equipment, instruments or materials identified are necessarily the best available for the purpose. We thank the Intramural Research Program of NHGRI, NIH no. HG200398 (A.R., S.N., S.K., M.R., A.M.M., B.P.W. and A.M.P.); NIH no. GM123312 (S.J.H., P.G.S.G., G.A.H. and R.J.O.); NIH no. GM130691 (P.M., M.H.W. and K.D.M.); HHMI Hanna Gray Fellowship (N.A.); NIH no. CA266339 (J.G. and T.P.); NIH no. GM147352 (G.A.L.); NIH nos. HG002939 and HG010136 (R.M.H. and J.M.S.); NIH no. HG009190 (P.W.H., A. Gershman and W.T.); NIH nos. HG010263, HG006620 and CA253481 and NSF no. DBI-1627442 (M.C.S.); NIH no. GM136684 (K.D.M.); NIH nos. HG011274 and HG010548 (K.H.M.); NIH nos. HG010961 and HG010040 (H.L.); NIH no. HG007234 (M.D.); NIH no. HG011758 (F.J.S.); NIH no. DA047638 (E.G.); NIH no. GM124827 (M.A.W.); NIH no. GM133747 (R.C.M.); NIH no. CA240199 (R.J.O.); NIH nos. HG002385, HG010169 and HG010971 (E.E.E.); Stowers Institute for Medical Research (J.L.G. and T.P.); National Center for Biotechnology Information of the National Library of Medicine, NIH (F.T.-N. 
and T.D.M.); intramural funding at NIST (J.M.Z.); NIST no. 70NANB20H206 (M.J.); and NIH nos. HG010972 and WT222155/Z/20/Z and the European Molecular Biology Laboratory (J.A., P.F., C.G.G., L.H., T.H., S.E.H., F.J.M. and L.S.). RNA generation was supported by NIST no. 70NANB21H101 and NIH no. 1S10OD028587; the Ministry of Science and Higher Education of the Russian Federation, St. Petersburg State University, no. PURE 73023672 (I.A.A.); the Computation, Bioinformatics, and Statistics Predoctoral Training Program awarded to Penn State by the NIH (A.C.W.); and Achievement Rewards for College Scientists Foundation, The Graduate College at Arizona State University (A.M.T.O.). E.E.E. is an investigator for HHMI.
Present address: Oxford Nanopore Technologies Inc., Oxford, UK
Ivan A. Alexandrov
Present address: Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
These authors contributed equally: Arang Rhie, Sergey Nurk, Monika Cechova, Savannah J. Hoyt, Dylan J. Taylor
Authors and Affiliations
Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Arang Rhie, Sergey Nurk, Sergey Koren, Mikko Rautiainen, Nancy F. Hansen, Ann M. Mc Cartney, Brian P. Walenz & Adam M. Phillippy
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
Monika Cechova, Julian K. Lucas, Brandy M. McNulty, Hugh E. Olsen & Karen H. Miga
Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
Savannah J. Hoyt, Patrick G. S. Grady, Gabrielle A. Hartley & Rachel J. O’Neill
Department of Biology, Johns Hopkins University, Baltimore, MD, USA
Dylan J. Taylor, Rajiv C. McCoy, Michael E. G. Sauria & Michael C. Schatz
Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
Paul W. Hook, Ariel Gershman, Jakob Heinz, Alaina Shumate & Winston Timp
Federal Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
Ivan A. Alexandrov & Alla Mikheenko
European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Jamie Allen, Paul Flicek, Carlos Garcia Giron, Leanne Haggerty, Thibaut Hourlier, Sarah E. Hunt, Fergal J. Martin & Likhitha Surapaneni
UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
Mobin Asri, Mark Diekhans, Marina Haukness, Julian K. Lucas, Brandy M. McNulty, Hugh E. Olsen & Karen H. Miga
Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA, USA
Andrey V. Bzikadze
Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
Nae-Chyun Chen, Samantha Zarate & Michael C. Schatz
GeneDX Holdings Corp, Stamford, CT, USA
Foundation of Biological Data Science, Belmont, CA, USA
Department of Genetics, University of Cambridge, Cambridge, UK
The Rockefeller University, New York, NY, USA
DNAnexus, Inc., Mountain View, CA, USA
Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
Erik Garrison & Andrea Guarracino
Stowers Institute for Medical Research, Kansas City, MO, USA
Jennifer L. Gerton & Tamara Potapova
University of Kansas Medical Center, Kansas City, MO, USA
Jennifer L. Gerton
Genomics Research Centre, Human Technopole, Milan, Italy
Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
Reza Halabian & Wojciech Makalowski
Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Nancy F. Hansen
Department of Biology, Pennsylvania State University, University Park, PA, USA
Robert Harris, Marta Tomaszkiewicz, Allison C. Watwood, Matthias H. Weissensteiner & Kateryna D. Makova
Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
William T. Harvey, Alexandra P. Lewis, Glennis A. Logsdon, Katherine M. Munson, David Porubsky, Mitchell R. Vollger & Evan E. Eichler
Institute for Systems Biology, Seattle, WA, USA
Robert M. Hubley & Jessica M. Storer
XDBio Program, Johns Hopkins University, Baltimore, MD, USA
Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
Rupesh K. Kesharwani, Luis F. Paulin, Fritz J. Sedlazeck & Yiming Zhu
Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Genome Technology Access Center at the McDonnell Genome Institute, Washington University, St. Louis, MO, USA
Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
Jennifer McDaniel, Nathan D. Olson & Justin M. Zook
Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
UCL Queen Square Institute of Neurology, UCL, London, UK
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
Terence D. Murphy & Françoise Thibaud-Nissen
Masters Program in National Research University Higher School of Economics, Moscow, Russia
Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
Steven L. Salzberg
Department of Computer Science, Rice University, Houston, TX, USA
Fritz J. Sedlazeck
Google Inc., Mountain View, CA, USA
Institute of Molecular Genetics, Moscow, Russia
Valery A. Shepelev
Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
Angela M. Taravella Oill & Melissa A. Wilson
Department of Biomedical Engineering, Pennsylvania State University, State College, PA, USA
Pacific Biosciences, Menlo Park, CA, USA
Aaron M. Wenger
Investigator, Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
Evan E. Eichler
Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
Rachel J. O’Neill
Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
V.A.S. is retired from the Institute of Molecular Genetics. Assembly was carried out by S.N., S.K. and M.R. Validation was performed by A.R., S.K., M.A., A.V.B., G.F., A.F., A.M.M., J.M., A.M., L.F.P., D.P., F.J.S., K.S., P.M., J.M.Z. and K.D.M. ChrY haplogroups were determined by A.R. and A.C.W. Alignment was done by C.-S.C., M.D., R. Harris, M.R.V. and K.D.M. Satellite annotation was performed by N.A., I.A.A., G.A.L., F.R., V.A.S. and K.H.M. N.A., J.G. and T.P. carried out FISH. Repeat annotation was done by S.J.H., P.G.S.G., G.A.H., R.M.H., J.M.S. and R.J.O. Retro-elements were dealt with by R. Halabian and W.M. Non-B DNA was dealt with by M.H.W. and K.D.M. Gene annotation was undertaken by A.R., M.D., P.F., C.G.G., L.H., M.H., J.H., T.H., F.J.M., T.D.M., S.L.S., A.S. and F.T.-N. A.R., R. Harris, W.T.H., P.M., M.T. and K.D.M. dealt with ampliconic genes. Structural annotation was performed by A.R., M.C., H.L., P.M. and K.D.M. Epigenetic analysis was performed by A.R., P.W.H., A. Gershman, W.T. and A.M.W. Mappability was performed by A.M.T.O., M.A.W. and J.M.Z. Non-B DNA was dealt with by M.H.W. and K.D.M. Variants and liftover were carried out by A.R., D.J.T., S.K., J.A., N.-C.C., M.D., E.G., A. Guarracino, N.F.H., W.T.H., S.E.H., S.H., R.C.M., N.D.O., M.E.G.S., L.S., M.R.V., S.Z., J.M.Z., E.E.E. and A.M.P. A.R., S.L.S., B.P.W. and A.M.P. dealt with contamination. Data generation was carried out by M.J., R.K.K., A.P.L., J.K.L., C.M., B.M.M., K.M.M., H.E.O., F.J.S. and Y.Z. Data management was undertaken by A.R., M.D., M.J. and J.K.L. Computational resources were sourced by R.J.O., M.C.S. and A.M.P. A.R., S.N., M.C., S.J.H., D.J.T., N.A., I.A.A., N.-C.C., E.G., J.G., P.G.S.G., A. Guarracino, R. Halabian, W.M., J.M., T.P., F.R., S.L.S., J.M.S., A.M.T.O., A.C.W., M.A.W., S.Z., J.M.Z., E.E.E., R.J.O., M.C.S., K.H.M., K.D.M. and A.M.P. wrote the manuscript draft. A.R. and A.M.P. edited the manuscript, with the assistance of all authors. 
J.M.Z., E.E.E., R.J.O., M.C.S., K.H.M., K.D.M. and A.M.P. supervised the research. Conceptualization was the responsibility of A.R., S.N., M.C., E.E.E., K.H.M., K.D.M. and A.M.P.
Correspondence to Adam M. Phillippy .
S.N. is now an employee of ONT. S.K. has received travel funding for speaking at events hosted by ONT. A.F. is an employee of DNAnexus. C.-S.C. is an employee of GeneDX Holdings Corp. N.-C.C. is an employee of Exai Bio. L.F.P. receives research support from Genetech. F.J.S. receives research support from Pacific Biosciences, ONT, Illumina and Genetech. K.S. is an employee of Google LLC and owns Alphabet stock as part of the standard compensation package. W.T. has two patents (nos. 8,748,091 and 8,394,584) licensed to ONT. E.E.E. is a scientific advisory board member of Variant Bio, Inc. The remaining authors declare no competing interests.
Peer review information.
Nature thanks John Lovell, Mikkel Heide Schierup and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Assembling the X and Y chromosomes of HG002.
a. Chromosome X and Y components of the assembly string graph built from HiFi reads, detected based on node sequence alignments to the T2T-CHM13 and GRCh38 references. Each node is colored according to the excess of paternal-specific (blue) and maternal-specific (red) k-mers, obtained from parental Illumina reads, indicating whether it exclusively belongs to chromosome Y or X, respectively. The most complicated tangles are localized within the heterochromatic satellite region on the Y q-arm. The X and Y subgraphs are connected in PAR1 and PAR2. Graph discontinuities are due to a lack of HiFi sequence coverage in these regions caused by contextual sequencing bias, with 9 out of 11 observed breaks falling within PAR1 on either chromosome (5 out of 5 for chromosome Y). Note that for visualization purposes the length of shorter nodes is artificially increased, making the extent of the tangles appear larger than in reality. b. The effects of manual pruning and semi-automated ONT read integration are illustrated from top to bottom. Top, zoomed-in view of a tangle encoding the P1–P3 palindromic region in Y (approx. 22.86–27.08 Mb, see Fig. 4). Middle, corresponding subgraph following the manual pruning and recompaction. Nodes excluded from the curated "single-copy" list for automated ONT-based repeat resolution are shown in yellow. Three hairpin structures are highlighted, which form almost-perfect inverted tandem repeats encompassing the entire P3 and two P2 (red) palindromes. Node outlines in the palindromes are colored according to the palindromic arms as in Fig. 4. Bottom, corresponding subgraph following the repeat resolution using ONT read-to-graph alignments. Remaining ambiguities were resolved by evaluating ONT read alignments to all candidate reconstructions of the corresponding sub-regions. c. PAR1 subgraph labeled with HiFi read coverage on each node. Gaps (green edges) and uneven node coverage estimates indicate biases in HiFi sequencing across the region.
Figure 1 shows an enrichment of SINE repeats and non-B DNA motifs in PAR1 that may underlie the sequencing gaps in this region.
Extended Data Fig. 2 Validation and polishing of the T2T-Y.
a. Evaluation and polishing workflow performed on the T2T-CHM13v1.1 autosomes + HG002 XY assemblies. b. Venn diagram of the k-mers from the parents and child. On the left, hap-mers 18 represent haplotype-specific k-mers inherited by the child. The darker outlined circle inside the child k-mers represents single-copy k-mers (k-mers occurring once in the assembly and single-copy in the child’s genome). The right figure shows an example of the paternal-specific, “single-copy”, and “marker” k-mers. The marker set includes both multi-copy and single-copy k-mers specific to the paternal haplotype that were inherited by the child. Unlike polishing of the nearly haploid CHM13 assembly 17 , both single-copy k-mers and marker k-mers were used for the marker-assisted alignments to HG002 XY. This helped align more reads within repetitive regions to the correct chromosome for evaluation during polishing. The right panel shows counts of the k-mers and coverage of HiFi and ONT reads using the marker-assisted Winnowmap2 alignment, in addition to alignments from VerityMap, which uses locally unique k-mers for anchoring the reads. c. Aggregated Strand-seq coverage profile across all 65 libraries on GRCh38-Y (top) and T2T-Y (bottom). Each bar represents read counts in every 20 kb bin supporting the reference in the forward direction (light green) or reverse direction (dark green). Multiple spikes in the reverse direction (black asterisks) in GRCh38-Y indicate inversion polymorphisms relative to HG002, likely due to differences between the haplogroups. Such spikes in coverage are not observed on T2T-X and T2T-Y, confirming the structural and directional accuracy of the HG002 assemblies. A 3 kb inversion of the unique sequence between the P5 palindromic arms was identified as erroneous in T2T-Y (red asterisk), but was confirmed to be polymorphic in the population and left uncorrected in this version of the assembly.
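The hap-mer logic described in (b) reduces to set arithmetic on k-mer collections: paternal hap-mers are k-mers seen in the father but not the mother, restricted to those the child actually inherited. A minimal sketch with toy 4-mers (function and variable names are illustrative, not from the published pipeline):

```python
def classify_kmers(mat, pat, child):
    """Classify k-mers into hap-mer sets by set arithmetic.

    mat/pat: k-mers observed in maternal/paternal Illumina reads.
    child:   k-mers observed in the child's reads.
    Returns (maternal hap-mers, paternal hap-mers, shared k-mers).
    """
    mat, pat, child = set(mat), set(pat), set(child)
    paternal_hapmers = (pat - mat) & child   # father-specific, inherited
    maternal_hapmers = (mat - pat) & child   # mother-specific, inherited
    shared = mat & pat & child               # uninformative for phasing
    return maternal_hapmers, paternal_hapmers, shared

# Toy example with 4-mers
mat = {"ACGT", "TTTT", "GGGG"}
pat = {"ACGT", "CCCC", "AAAA"}
child = {"ACGT", "TTTT", "CCCC"}
m, p, s = classify_kmers(mat, pat, child)
print(sorted(m), sorted(p), sorted(s))
# ['TTTT'] ['CCCC'] ['ACGT']
```

In practice these sets come from k-mer databases (e.g. Merqule-style counts) rather than Python sets, but the membership logic is the same.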
Extended Data Fig. 3 Large structural differences between T2T-Y and previous GRCh Y assemblies.
a-b. Ampliconic genes and X-degenerate sequences revealed from alignments between GRCh38-Y (Y axis) and T2T-Y (X axis). a. Dotplot generated using LastZ 93 after softmasking with WindowMasker 94. b. Identity was computed from matches and mismatches over positions with alignments, excluding gaps. c. Structural differences revealed using PRG-TK 95 against GRCh38-Y and GRCh37-Y in the euchromatic region of the Y chromosome.
Extended Data Fig. 4 Repeat discovery and annotation of T2T-Y.
a. Assembly completion allowed for a full assessment of repeats and resulted in the identification of previously unknown satellite arrays (predominantly in PAR1) and subunit repeats that fall within one of three composite repeat units (TSPY, RBMY, DAZ). b. Ideogram of TE density (per 100 kb bin). This is an extension of Fig. 1, with non-SINEs expanded into separate TE classes (SVA, LTR, LINE, DNA/RC). The density scale ranges from low (white, zero) to high (black, relative to total density), and sequence classes are denoted by color. c. Summary (in terms of base coverage per region) across all five TE classes and two specific families: Alu/SINE and L1/LINE. The satellites in (b) were kept separate as two categories: Cen/Sat, the left satellite block including alpha satellites and DYZ19, with all other satellite categories combined per sequence class.
Extended Data Fig. 5 Non-B DNA motifs along the T2T-Y.
HSat3 on the Yq and satellite sequences around the centromere are more enriched for A-phased repeats, direct repeats and STRs, while HSat1B is more enriched for inverted repeats and mirror repeats. Enrichment of non-B DNA sequences was also observed in the PAR region. Notably, the TSPY gene array is enriched for G4 and Z-DNA motifs, as shown in Extended Data Fig. 6b.
Extended Data Fig. 6 Phylogenetic tree analysis of the ampliconic TSPY gene family and pattern of non-B DNA structure.
a. Phylogenetic tree analysis using protein-coding TSPYs from a Sumatran orangutan (Pongo abelii) and a silvery gibbon (Hylobates moloch) as outgroups confirmed that TSPY2 (distal to the array) and the TSPY copies within the array originated from the same branch, distinct from the rest of the TSPY pseudogenes. The rectangular inset shows a cartoon representation of the simplified tree. Numbers next to the triangles indicate the number of TSPY genes in the same branch. b. G4 and Z-DNA structures predicted for a typical TSPY copy inside the TSPY array. All TSPY copies in the array have the same signature, with one G4 peak present ~500 bases upstream of the TSPY (arrow). A higher Quadron score 122 (Q-score) indicates a more stable G4 structure, with scores over 19 considered stable (dotted line).
Extended Data Fig. 7 Recurrent inversions identified with Strand-seq.
a. Five out of 15 individuals carry the inverted variant present in HG002 at the P3 palindrome (white arrow). Although inversions across P1–P2 (yellow and red arrows) are difficult to confirm with Strand-seq because of the high sequence similarity between the palindromic arms, different orientations are observable in these samples. b. Strand states for 65 Strand-seq libraries of HG002. Depending on the mappings of directional Strand-seq reads (+ reads: ‘Crick’, C; – reads: ‘Watson’, W), the reference sequence was assigned one of three states: WC, WW, or CC. WC, roughly equal mixture of plus and minus reads; WW, all reads mapped in minus orientation; CC, all reads mapped in plus orientation. Changes in strand state along a single chromosome are normally caused by a double-strand break (DSB) that occurred at random during DNA replication 160 , and we refer to them as sister-chromatid exchanges (SCEs, yellow thunderbolts). A recurrent change in strand state over the same region in multiple Strand-seq cells indicates misassembly. Similarly, collapsed or incomplete assembly of a given genomic region will result in a recurrent strand state change, as observed for GRCh38-Y (black arrowheads). In contrast, T2T-Y shows strand state changes randomly distributed along each Strand-seq library, with no evidence of misassembly or collapse. c. Strand-seq profile of selected libraries over T2T-Y, summarized in bins (bin size: 500 kb, step size: 50 kb). Teal, Crick read counts; orange, Watson read counts. As chrY is haploid, reads are expected to map only in the Watson or Crick orientation. Light gray rectangles highlight regions where SCEs were detected in the heterochromatic Yq12 despite a lower coverage of Strand-seq reads. Modified breakpointR parameters were used (windowsize = 500000, minReads = 20) to refine the SCEs presented in panels b and c.
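The WW/CC/WC state call in (b) is, at its core, a threshold on the fraction of Watson (minus) versus Crick (plus) reads in a region. A minimal sketch of that classification (the thresholds below are illustrative, not the breakpointR defaults):

```python
def strand_state(watson, crick, min_reads=20, frac=0.9):
    """Call a Strand-seq strand state from directional read counts.

    watson/crick: counts of minus-/plus-strand reads in a region.
    Returns 'WW', 'CC', or 'WC' (roughly equal mixture), or None
    when coverage is too low to make a call.
    """
    total = watson + crick
    if total < min_reads:
        return None
    if watson / total >= frac:
        return "WW"   # all reads mapped in minus orientation
    if crick / total >= frac:
        return "CC"   # all reads mapped in plus orientation
    return "WC"       # mixture: one Watson and one Crick template

print(strand_state(95, 5))    # 'WW'
print(strand_state(48, 52))   # 'WC'
print(strand_state(2, 98))    # 'CC'
```

On a haploid chromosome such as chrY, a transition between WW and CC along one library marks a candidate SCE; the same transition recurring across many libraries at the same position is the misassembly signal described above.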
Extended Data Fig. 8 Satellite annotation and recent expansion events in the Yq heterochromatin.
a. A plot showing the top repeat periodicities detected by NTRprism 44 in 50 kb blocks tiled across T2T-Y, with centromeric satellite annotations overlaid on the X axis. Large arrays are labeled with their historic nomenclature 1 , HSat subfamilies 61 , and predominant repeat periodicities. b. An exact 2000-mer match dotplot of the Yq region (a dot is plotted when an identical 2000-base sequence is found at positions X and Y). The lower triangle has DYZ1/DYZ2 annotations overlaid as yellow and blue bars, respectively. Circled patterns in the upper triangle correspond to recent iterative duplication events, which are illustrated below the X axis. c. A reconstruction of a possible sequence of recent iterative duplications that could explain the observed dotplot patterns. d. A 2000-mer dotplot comparison of two ~800 kb HSat1B sub-arrays that were part of a recent large duplication event, along with self-self comparisons of the same arrays, revealing sites of more recent and smaller-scale deletions and expansions (annotated in yellow and red, with a possible sequence of events illustrated by the schematic on the right).
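An exact k-mer match dotplot like the one in (b) can be computed by hashing every k-mer to its positions and emitting a dot for each pair of positions sharing a k-mer. A toy sketch (the figure uses k = 2000 on megabase-scale arrays; a tiny k and sequence are used here):

```python
from collections import defaultdict

def exact_kmer_dots(seq, k):
    """Coordinates for an exact-match self dotplot: a dot at (i, j)
    whenever the k-mer starting at i equals the k-mer starting at j."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    dots = []
    for pos in positions.values():
        for i in pos:
            for j in pos:
                dots.append((i, j))
    return sorted(dots)

print(exact_kmer_dots("ACGTACGT", 4))
# [(0, 0), (0, 4), (1, 1), (2, 2), (3, 3), (4, 0), (4, 4)]
```

The main diagonal is the trivial self-match; off-diagonal dots, such as (0, 4) here, are the repeat signal, and the parallel off-diagonal lines circled in the figure are runs of such dots produced by tandem duplications.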
Extended Data Fig. 9 Genomic similarity in the PARs and XTR, and improved MAPQ in the PARs with a sex chromosome complement-informed reference.
a. Dotplots from LASTZ alignments of CHM13-X, HG002-X, and HG002-Y (T2T-Y) at over 96% sequence identity. Dashed gray lines represent the start and end of the approximate PAR or XTR boundaries. Disconnected diagonal lines indicate the presence of genomic diversity between each paired region. More genomic differences are observed in PAR1 between HG002-Y and CHM13-X. b-c. Average mapping quality (MAPQ) across GRCh38-X from simulated reads of an XX (b) and an XY (c) sample. Top, a default version of GRCh38 (with two copies of identical PARs on X and Y). Middle, a version of GRCh38 informed by the sex chromosome complement (SCC) of the sample (the entire Y hard-masked for the XX sample vs. only the PARs on the Y hard-masked for the XY sample). Bottom, the difference in average MAPQ between the SCC and default approaches. MAPQ was averaged in 50 kb windows, sliding 10 kb across the chromosome. A positive value means the MAPQ score is higher with the SCC reference alignment than with the default alignment.
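The sliding-window averaging used for the MAPQ tracks in (b-c) is a standard fixed-window, fixed-step mean. A minimal sketch (the figure uses window = 50 kb and step = 10 kb; tiny sizes here for readability):

```python
def windowed_mean(values, window, step):
    """Mean of per-base values in fixed windows slid along a sequence.

    Returns (window start, mean) pairs for every full window.
    """
    out = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        out.append((start, sum(chunk) / window))
    return out

# Toy 20-"base" MAPQ track: uniquely mapped then ambiguous region
mapq = [60] * 10 + [0] * 10
print(windowed_mean(mapq, window=10, step=5))
# [(0, 60.0), (5, 30.0), (10, 0.0)]
```

Subtracting the default-reference track from the SCC-reference track window-by-window then gives the difference panel at the bottom of (b) and (c).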
Extended Data Fig. 10 Number of variants called from 1KGP and SGDP individuals.
a. More variants are called on the X-PARs when using the sex chromosome complement reference approach (calling variants in diploid mode on the PARs) than with the non-masked approach (calling variants in haploid mode on the PARs). The 1KGP results for GRCh38-Y are from Aganezov et al. 66 , which was performed on CHM13v1.0+GRCh38-Y. b. Number of variants called from each 1KGP XY sample on GRCh38-Y and T2T-Y. c. Number of variants called in the syntenic region between the two Y chromosomes. A large number of additional variants is called for each sample, attributable to the newly added, non-syntenic sequences on T2T-Y. Within the syntenic regions, a reduction in the number of variants is observed for each population except for samples from R1 haplogroups, as shown in Fig. 6c. d. Aggregated total number of variants for the 279 SGDP samples per chromosome. e. SGDP genome-wide counts of variants per sample (n = 279) demonstrate increased variation in African samples regardless of reference. Each box plot represents the 1st, 2nd (median), and 3rd quartiles of the number of variants in each population. Whiskers extend to 1.5× the interquartile range. Data outside the whisker ranges are shown as dots. For the SGDP samples, variants were called using T2T-CHM13+Y or GRCh38 as the reference. All variants shown in this figure were filtered for “high quality (PASS)”.
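The box plot convention in (e) (quartile box, 1.5× IQR whiskers, outliers as dots) can be made concrete with a small helper; quartiles below use simple linear interpolation between order statistics (one common convention, and an assumption here, since the legend does not state the method):

```python
def box_stats(xs):
    """Quartiles, and outliers beyond the 1.5x-IQR whisker bounds."""
    xs = sorted(xs)

    def q(p):
        # Linear interpolation between the two nearest order statistics
        i = p * (len(xs) - 1)
        lo, hi = int(i), min(int(i) + 1, len(xs) - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (i - lo)

    q1, med, q3 = q(0.25), q(0.5), q(0.75)
    iqr = q3 - q1
    lo_bound, hi_bound = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lo_bound or x > hi_bound]
    return q1, med, q3, outliers

print(box_stats([1, 2, 3, 4, 5, 100]))
# (2.25, 3.5, 4.75, [100])
```

With per-sample variant counts as input, each population's box is just `box_stats` applied to that population's list, and the dots are the returned outliers.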
Extended Data Fig. 11 Human contaminants in bacterial reference genomes.
a. Number of distinct RefSeq accessions in every 10 kb window containing 64-mers of GRCh38-Y (top), T2T-Y (middle), and T2T-Y only (bottom). Here, RefSeq sequences with more than 20 matching 64-mers or matching over 10% of the Y chromosome are included. b. Length distribution of the sequences from (a) on a log scale. The majority of the shorter (<1 kb) sequences contain 64-mers found in HSat1B or HSat3. c. Number of bacterial RefSeq entries, by strain, identified as containing sequences present in T2T-Y but not GRCh38-Y, visualized with Krona 158.
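The contamination screen in (a) amounts to counting distinct k-mers shared between a chromosome window and each RefSeq entry, then applying a hit threshold. A toy sketch (function name and the small demo k are illustrative; the screen described above uses k = 64 and a 20-hit threshold):

```python
def shared_kmer_hits(window_seq, refseq_seq, k=64, min_hits=20):
    """Count distinct k-mers shared between a chromosome window and a
    RefSeq sequence, and flag the entry if the count meets min_hits."""
    win = {window_seq[i:i + k] for i in range(len(window_seq) - k + 1)}
    ref = {refseq_seq[i:i + k] for i in range(len(refseq_seq) - k + 1)}
    hits = len(win & ref)
    return hits, hits >= min_hits

# Demo with short sequences and k = 4 so the overlap is visible
hits, flagged = shared_kmer_hits("ACGTACGTAA", "TTACGTACGT", k=4, min_hits=2)
print(hits, flagged)  # 4 True
```

At scale this is done with k-mer indexes rather than Python sets, but the shared-k-mer counting and thresholding are the same idea.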
This file contains Supplementary Methods, Figs. 1–19 and Notes 1–4.
Peer review file
Supplementary tables
This file contains Supplementary Tables 1–32.
This file contains Supplementary Data 1–3.
Cite this article.
Rhie, A., Nurk, S., Cechova, M. et al. The complete sequence of a human Y chromosome. Nature (2023). https://doi.org/10.1038/s41586-023-06457-y
Received: 02 December 2022
Accepted: 19 July 2023
Published: 23 August 2023
DOI: https://doi.org/10.1038/s41586-023-06457-y